Preg question type

From MoodleDocs
Note: Although this plugin is currently (February 2020) listed as available only for Moodle branches 2.3 to 3.1, the Stats tab of the plugin page shows more than 30 sites running it on Moodle 3.2 to 3.8. These officially unsupported branches can run a working version of the plugin, but the authoring tools will not be available. The plugin developers are working on an update that fixes the JavaScript problem responsible for this failure. If you enable DEVELOPER debugging mode, you may see some warnings; they pose no serious risk to your server and should be fixed in future versions of the plugin.



Preg is a question type that uses regular expressions (regexes) to check student responses (although you can use it without regexes, just for its hinting features). Regular expressions offer great power and flexibility, both to teachers when writing questions and to students when writing answers. This first part guides you through using this documentation; use it at your discretion. More details on regex syntax are available at http://www.nusphere.com/kb/phpmanual/reference.pcre.pattern.syntax.htm. There are many good regex manuals, so that information is not repeated here.


Ways to use Preg questions and this documentation

I don't want to know anything about regular expressions, but hints about the next word (or its next letter) look useful

Then you can use Preg simply as a Short answer question type with advanced hinting, without knowing anything about regular expressions. To do this, choose:

  • Notation => Moodle shortanswer
  • Engine => Finite state automata
  • Exact matching => Yes

After that, you can simply copy the answers from your Short answer questions. You may want to read the section on hinting to learn more about the hinting settings.

I have a vague knowledge of regular expressions, but I want to use pattern matching

If you find regular expressions hard to write but want to use their power for pattern matching, the authoring tools can help you a lot when creating questions. They show the meaning of your regex in different ways: the inner structure of the expression (syntax tree), a visual path of the match (explaining graph) and a text description. They also let you test the regex against several strings to see whether it works as expected. Experiment with your regexes, watch how the authoring tools change in response, and you will eventually get the regex you want.

Read the section on authoring tools; then (probably after experimenting with the tools on your own) continue with the section on understanding regular expressions (this is optional, but it can be interesting and very helpful). You should also read the section on how the questions work to better understand the settings and how they affect your questions.

I can make the effort to learn regular expressions well and do everything they allow

If you don't know regexes yet but want to understand them and create complex expressions with ease, then instead of trying brute force, spend your time reading and understanding the section on understanding regular expressions. Then read about the authoring tools and use them to experiment with creating regexes. With these tools you can check whether you really understand regexes and whether they behave as you expected. The syntax tree can be especially useful when you are trying to get precedence and arity right. Once you understand the principles of regexes well, read the section on how the questions work and the regular expression reference (to learn what is possible; don't worry about understanding or remembering everything - just come back periodically to learn a little more). You should then be able to write regexes without much need for the authoring tools, except for the testing tool that you will use to test your expressions.

I know regular expressions well enough to write them on my own without further guidance

You should read about how the questions work to understand the settings and how questions behave with them. You may also be interested in testing regexes, described in the section on authoring tools. Finally, the regular expression reference may be of some use.


How Preg questions work

Basically, this question type is an extended version of the Short answer question. It extends its features in several ways (you can use them in almost any combination):

  • Pattern matching - using regular expressions, you can create powerful patterns that describe the possible student responses
  • Hinting - when students are stuck on a question, you can let them ask for the next correct word (lexeme) or character (possibly with a penalty)

Settings that affect how the question works

The case sensitivity setting applies to all regular expressions you specify as answers. Note that you can also change case sensitivity for parts of a regex.

Exact matching affects the question as follows:

Yes
the student's entire response, from the first character to the last, must match your regular expression
No
it is enough for the response to contain a part that matches your regex: for example, if the correct answer is "whole", then "the whole string" will be a correct response

You can also make your regexes match the student's entire response using special regex syntax.
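The difference between the two values of Exact matching can be sketched with Python's re module. This is only an illustration of the idea, not the code Preg runs; the helper name matches is hypothetical.

```python
import re

pattern = r"whole"

def matches(response: str, exact: bool) -> bool:
    """Mimic the 'Exact matching' setting with Python's re module."""
    if exact:
        # Yes: the entire response must match the pattern
        return re.fullmatch(pattern, response) is not None
    # No: it is enough that some part of the response matches
    return re.search(pattern, response) is not None

print(matches("the whole string", exact=False))  # True
print(matches("the whole string", exact=True))   # False
print(matches("whole", exact=True))              # True
```

Anchoring the pattern yourself, for example as ^whole$, should give the same effect as exact matching in most regex dialects.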

Note: To write questions in Spanish, choose English as the language.


Regular expression
the usual notation for regular expressions. Precisely, it is a Perl-compatible regex dialect. You can write a regex across multiple lines for readability - line breaks will be ignored.
Regular expression (extended)
useful for really complex regexes. It is similar to the PHP 'x' modifier. It ignores any unescaped whitespace in your regex that is not inside a character class (use \s instead), so you can format your regex freely with spaces. It also ignores line breaks, with one useful exception: everything from a '#' character to the end of the line is treated as a comment (the # must be unescaped and outside a character class).
Moodle shortanswer
use it to avoid regex syntax completely: just copy the answers from your Short answer questions. The '*' wildcard is supported. With the FA engine you get access to the hinting features. You can skip everything said here about regexes, but be sure to read the section on hinting to understand the settings that control how hints behave in your questions.
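To get a feel for the extended notation, here is a sketch that uses Python's re.VERBOSE flag, which behaves much like the PHP 'x' modifier. It is an analogy only: Preg itself runs on PHP, and details may differ between dialects.

```python
import re

# The same decimal-number regex, written compactly and in extended form
compact = re.compile(r"[+\-]?([0-9]+)?\.([0-9]+)")

extended = re.compile(r"""
    [+\-]?        # optional sign
    ([0-9]+)?     # optional integral part
    \.            # decimal point (escaped, so it is a literal dot)
    ([0-9]+)      # fractional part
""", re.VERBOSE)

for text in ["123.34", "-.5", "12"]:
    # Both forms accept and reject exactly the same strings
    print(text, bool(compact.fullmatch(text)), bool(extended.fullmatch(text)))
```

Whitespace and comments in the extended form exist only for the reader; they do not change what the regex matches.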

Matching engine specifies the program module that performs regex matching. There is no 'best' matching engine - the choice depends on the features you want to use. The engines differ in stability and in the features they offer.

PHP preg extension
use it when you don't need hinting and other engines reject your expressions as too complex, or when you run into bugs in them. It is based on the native PHP preg_ functions. It supports 100% Perl-compatible regex features, is very stable and has been tested thoroughly. However, it does not support partial matching, so (unless PHP developers add partial matching support) hinting is NOT possible. It does support subpattern capturing. Choose it when you need complex regex features that the other engines do not support.
Finite state automata (FA)
can be used to give hints to your students. The FA engine is custom PHP code; it supports many (but not all) regex features and is tested thoroughly (it passes all tests of the AT&T testregex suite and most tests of the PCRE testinput1 and testinput4 suites for the features it supports, which is a lot), but in rare cases it still contains bugs. For now, the unsupported features are lookaround assertions and some types of conditional subpatterns.

Hinting

Hinting is supported by the FA engine in the adaptive and interactive behaviours.

Partial matching

Hinting starts with partial matching. A partially correct response is a string that starts with correct characters (matching your regex) but where the match breaks off at some character. Suppose you asked 'What colours are the flag of France and the flag of the Netherlands?' and entered the regex

 "are blue, white(,| and) red"

and suppose a student answered

 "they are blue, vhite and red"

In this situation the partial match is

 "are blue, "

Note that the regex is unanchored ("Exact matching" is set to "No"), so the match need not start at the first character of the student's response (as in the example above, where "they " is skipped). With partial matching alone, the student will see which parts are correct and which are incorrect:

 they are blue, vhite and red

no exact matching


the colors of the French flag
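The idea of a partial match can be sketched in a few lines of Python. This is a deliberately naive illustration, not how the FA engine works: instead of running an automaton, it lists the two strings accepted by the regex are blue, white(,| and) red and looks for the longest prefix of either one anywhere in the response. The names ACCEPTED, common_prefix_len and partial_match are hypothetical.

```python
# Every string accepted by the regex: are blue, white(,| and) red
ACCEPTED = ["are blue, white, red", "are blue, white and red"]

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def partial_match(response: str) -> tuple[int, int]:
    """Return (start, length) of the longest partial match in the response."""
    best = (0, 0)
    # The regex is unanchored, so the match may start at any position
    for start in range(len(response)):
        for target in ACCEPTED:
            length = common_prefix_len(response[start:], target)
            if length > best[1]:
                best = (start, length)
    return best

response = "they are blue, vhite and red"
start, length = partial_match(response)
print(repr(response[start:start + length]))  # 'are blue, '
```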


General hinting rules

The Preg question type does not add hinted characters to the student's response (unlike the REGEXP question type); it shows them separately instead, for several reasons:

  1. It is the student's responsibility to decide whether to add the hinted characters to the response (and possibly something more).
  2. It makes the student think a little about each hint: if the response were modified automatically, it would be too easy to simply press the hint button repeatedly, which is usually not the desired behaviour.

When possible, hinting chooses a character that leads to the shortest path to completing the match. Consider this response to the previous regular expression:

 are blue, white; red

There are two possible hint characters: "," or " " (which leads to the path " and"). The question will choose "," because it leads to the shortest path to completing the match, while " " leads to a path that is 3 characters longer.
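The shortest-path rule can be illustrated with a small Python sketch. It assumes the regex are blue, white(,| and) red and, instead of walking an automaton as the real engine presumably does, simply lists the two strings that regex accepts. The names ACCEPTED and next_char_hint are hypothetical.

```python
# Every string accepted by the regex: are blue, white(,| and) red
ACCEPTED = ["are blue, white, red", "are blue, white and red"]

def next_char_hint(matched: str) -> str:
    """Pick the next character on the shortest path to a full match."""
    # Keep only accepted strings that extend what has been matched so far
    candidates = [s for s in ACCEPTED
                  if s.startswith(matched) and len(s) > len(matched)]
    # The shortest candidate needs the fewest extra characters
    shortest = min(candidates, key=len)
    return shortest[len(matched)]

print(repr(next_char_hint("are blue, white")))  # ','
```

Once the student has typed a space after "white", only the " and" path remains, so the same function would hint "a".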

Not every regular expression needs to earn a 100% grade. Suppose you added an expression for students with a bad memory:

 are white(,| and) red

with a grade of 60% and feedback about the forgotten blue. You would not want hints that lead the student to answer

  are white, red

if the student had entered

  are white, oh I forgot the other colors

Hint grade border controls this. Only regular expressions with a grade greater than or equal to the hint grade border are used for partial matching and hinting. If you set the hint grade border to 1, only regular expressions with a 100% grade are used for hinting; if you set it to 0.5, regular expressions with grades from 50% to 100% are used for hinting and those from 0% to 49% are not. Regular expressions not used for hinting only take effect when they fully match the student's response.
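The effect of this setting amounts to a simple filter over the answers. The sketch below uses a hypothetical data layout (a list of dictionaries) and a hypothetical function name; it only illustrates the rule, not Preg's internal code.

```python
answers = [
    {"regex": r"are blue, white(,| and) red", "grade": 1.0},
    {"regex": r"are white(,| and) red", "grade": 0.6},
]

def hinting_answers(answers: list[dict], border: float) -> list[dict]:
    """Answers whose grade reaches the border may be used for hinting."""
    return [a for a in answers if a["grade"] >= border]

print(len(hinting_answers(answers, 1.0)))  # 1
print(len(hinting_answers(answers, 0.5)))  # 2
```

With the border at 1, the 60% answer still grades a response that matches it completely, but it never produces hints.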

Next character hint

When next character hinting is available, the student sees a 'next character hint' button; pressing it shows the next correct character, highlighted by a background colour:

 they are blue, wvhite and red

Typically you should set the hint penalty higher than the usual question penalty, because they are applied separately: the usual penalty applies to an attempt without a hint, while the hint penalty applies to an attempt that requests a hint.

Next lexeme (word) hint

A lexeme is an atomic (indivisible) part of a language. In a natural language, a word, a number or a punctuation mark (or a group of punctuation marks such as '?!' or '...') is a lexeme. In a programming language, it could be a keyword, a variable name, a constant or an operator. Note that spaces are usually not considered lexemes but separators between them, since they carry no particular meaning.

The next lexeme hint shows the student either the completion of the current lexeme (if the partial match ends inside it) or the next lexeme (if the student has completed the current one). For example

  are blue

or

  are blue,

or

  are blue, white
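A rough sketch of the next lexeme rule, assuming a single correct answer and a very simple notion of lexeme (a run of word characters or a run of punctuation). The real scanners come from the formal languages block and are certainly more elaborate; TARGET, LEXEME and next_lexeme_hint are hypothetical names.

```python
import re

TARGET = "are blue, white, red"
# A lexeme here is a word/number or a run of punctuation; spaces only separate
LEXEME = re.compile(r"\w+|[^\w\s]+")

def next_lexeme_hint(matched: str) -> str:
    """Text that completes the current lexeme or adds the next one.

    Assumes that matched is a prefix of TARGET.
    """
    for m in LEXEME.finditer(TARGET):
        if m.end() > len(matched):
            # First lexeme that ends beyond the part matched so far
            return TARGET[len(matched):m.end()]
    return ""

print(repr(next_lexeme_hint("are bl")))     # 'ue'
print(repr(next_lexeme_hint("are blue")))   # ','
print(repr(next_lexeme_hint("are blue,")))  # ' white'
```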

Starting with version 2.3, the Preg question type supports next lexeme hinting through the formal languages block. You should choose the language in which you expect the response to your question, since lexeme boundaries differ between languages. For now the following languages are supported (more will follow):

English
the English language scanner recognizes words, numbers and punctuation marks. Also select English for questions written in Spanish;
C/C++ language
the C (or C++) programming language;
printf language
a special language for string formatting in the C/C++ programming language; you will probably have it disabled.

The site administrator can control which languages are available to teachers by default, to avoid confusing them.

Note that "lexeme" is typically not the word you would like students to see on the hint button. Each language defines its own word for it. You can enter another word in the question description if you don't like the current ones.


Feedback and subpattern capturing

Any pair of parentheses in a regex is considered a subpattern, and when matching, the engine remembers (captures) not only the whole match but also the parts corresponding to each subpattern. Subpatterns can be nested. If a subpattern is repeated (i.e. has a quantifier), only the last of the repeated matches is captured. If you want to change the order of evaluation without defining a subpattern to capture (which speeds up processing), use (?: ) instead of just ( ). Lookaround assertions don't create subpatterns.

Subpatterns are counted from left to right by their opening parentheses. Precisely, 0 is the whole regex, 1 is the first subpattern and so on. You can insert them in the answer's feedback using simple placeholders: {$0} will be replaced by the whole match, {$1} by the value of the first subpattern and so on. That can improve the quality of your feedback. Placeholders won't work in the general feedback, because different answers can have different numbers of subpatterns.

Let's look at a regex defining a decimal number with an optional integral part:

[+\-]?([0-9]+)?\.([0-9]+)

It has two subpatterns: the first captures the integral part, the second the fractional part of the number. If you wrote the feedback

The number is: {$0} Integral part is {$1} and fractional part is {$2}

and a student entered

123.34

the student will see

The number is: 123.34 Integral part is 123 and fractional part is 34

If no integral part is given, {$1} will be replaced by an empty string. There is no way (for now) to remove "Integral part is" in that case - the placeholder syntax would become complex and error-prone.
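The placeholder substitution described above can be imitated in Python as follows. This is a sketch of the behaviour only; the function name fill_feedback is hypothetical and Preg's own implementation (in PHP) may differ in details.

```python
import re

def fill_feedback(regex: str, response: str, feedback: str) -> str:
    """Replace {$n} placeholders with the text captured by subpattern n."""
    match = re.search(regex, response)
    if match is None:
        return feedback  # nothing matched, leave the placeholders alone

    def substitute(placeholder: re.Match) -> str:
        index = int(placeholder.group(1))
        if index > match.re.groups:
            return placeholder.group(0)  # no such subpattern in this regex
        # A subpattern that took no part in the match yields an empty string
        return match.group(index) or ""

    return re.sub(r"\{\$(\d+)\}", substitute, feedback)

regex = r"[+\-]?([0-9]+)?\.([0-9]+)"
feedback = "The number is: {$0} Integral part is {$1} and fractional part is {$2}"
print(fill_feedback(regex, "123.34", feedback))
# The number is: 123.34 Integral part is 123 and fractional part is 34
```

For the response .5 the same call leaves an empty gap after "Integral part is", which is the limitation mentioned above.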

Catching missing and misplaced things

Joseph Rezeau's REGEXP question type has a missing words feature that lets you define an answer which matches when something is absent from the response (and gives appropriate feedback to the student).

You can achieve similar effects with negative assertions combined with anchoring the start of the match. The regular expression that catches the missing word "necessary" would be

 ^(?!.*\bnecessary\b.*)

where

  • (?!.*\bnecessary\b.*) is a negative lookahead assertion, which allows a match only if the word "necessary" does not appear ahead of the current point in the string;
  • ^ is an assertion too; it anchors the match to the start of the response (otherwise there would be places in the response, after the word "necessary", where a match is possible even though the word is present).

If this description is difficult for you, just surround the regex that should be missing with ^(?! and ). Don't try the '--' syntax; it is specific to Joseph Rezeau's REGEXP question type!
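You can try the missing-word pattern outside Moodle. The sketch below uses Python's re module, whose lookahead syntax is the same as in Perl-compatible regexes for this case; the variable name is arbitrary.

```python
import re

missing_necessary = re.compile(r"^(?!.*\bnecessary\b.*)")

# The pattern matches only when the word "necessary" is absent
print(bool(missing_necessary.search("it is required")))     # True
print(bool(missing_necessary.search("it is necessary")))    # False
print(bool(missing_necessary.search("unnecessary steps")))  # True
```

The last line shows why the \b word boundaries matter: "unnecessary" does not count as the word "necessary".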

You can also do a rough search for misplaced words (it actually works only if everything else is correct) using syntax like this:

  (?<!I\s)\bam\b(?!\s+victor)

This expression catches a misplaced "am" in the sentence "I am victor" by looking for an "am" that has no "I" before it (the "(?<!I\s)" part) and no "victor" after it (the "(?!\s+victor)" part). "\s+" allows any number of spaces between the words; the lookbehind uses a single "\s" because Perl-compatible lookbehind assertions must have a fixed length. If you want to catch the first (last) word (punctuation mark, etc.), use the simple assertions for the start or end of the string ("^" or "$") instead of words in the corresponding assertions. For instance, to look for a misplaced "I" you should write something like

  (?<!^)\bI\b(?!\s+am)

which looks for an "I" that is not preceded by the start of the string and not followed by "am".
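The two misplaced-word checks can be tried in Python. Note the assumptions: the sketch writes the lookbehind as (?<!...), which is the negative lookbehind syntax, and uses a single \s inside it, because Python's re (like PCRE) requires lookbehind assertions to have a fixed width. The helper name is_misplaced is hypothetical.

```python
import re

misplaced_am = re.compile(r"(?<!I\s)\bam\b(?!\s+victor)")
misplaced_i = re.compile(r"(?<!^)\bI\b(?!\s+am)")

def is_misplaced(pattern: re.Pattern, response: str) -> bool:
    """True if the pattern finds the word in a wrong position."""
    return pattern.search(response) is not None

for response in ["I am victor", "am I victor", "I victor am"]:
    print(response,
          is_misplaced(misplaced_am, response),
          is_misplaced(misplaced_i, response))
```

In the correct sentence neither pattern matches. In "I victor am" only the "am" pattern matches, since "I" is still at the start of the string.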

Note that if you have several answers that catch missing and misplaced things, only one of them will actually take effect for any given student response.

Since the Preg 2.3 release you can combine hinting with catching missing words. Make sure that the answers that look for missing things (and others that exist only to give specific feedback) have a fraction (grade) lower than the hint grade border (see the Hinting section). You don't want hints generated for these answers, as they do not describe a correct response, so this is a feature rather than a problem.

Templates

Preg 2.8 introduces a new feature called Templates.

A template is a more convenient and semantic way to write frequently used patterns. A template is written as a regular expression comment and is replaced by a specific regex before execution. Some templates can be parametrized; any regular expression can be used as a parameter value.

  • Simple template => (?###template_name)
  • Parametrized template => (?###template_name<)param1(?###,)param2(?###,)...paramN(?###>)

Templates can make a regex shorter and easier to understand. For particularly complex regexes (usually ones with parentheses) you may consider using the extended notation so that you can use line breaks and spaces to format the regex better.

For now, templates are hard-coded in the Preg question type, but there are plans to add support for custom user templates in future releases.

At the moment the following templates are available:

word
Parameters: none
Description: one or more 'word' characters (letters, digits and underscore)
Example: ((?###word)\s+)+ will match any number of words with any number of spaces between them

integer
Parameters: none
Description: optional sign + or -, followed by one or more digits
Example: (?###integer) will match any integer value

parens_req
Parameters: the text you want to see in parentheses
Description: something in at least one pair of correctly closed round parentheses
Example: (?###parens_req<)a(?###>) will match (a) ((a)) and (((((a)))))

parens_opt
Parameters: the text you want to see in parentheses
Description: something optionally placed in any number of pairs of correctly closed round parentheses
Example: (?###parens_opt<)a(?###>) will match a (a) and (((a)))

brackets_req
Parameters: the text you want to see in brackets
Description: something in at least one pair of correctly closed square brackets
Example: (?###brackets_req<)(?###word)(?###>) will match [abc] [[cat]]

brackets_opt
Parameters: the text you want to see in brackets
Description: something optionally placed in any number of pairs of correctly closed square brackets
Example: (?###brackets_opt<)(?###word)(?###>) will match cat [dog] [[[Fido]]]

custom_parens_req
Parameters: 1. pattern for the opening parenthesis; 2. text inside the custom parentheses; 3. pattern for the closing parenthesis
Description: similar to parens_req, but lets you specify custom parentheses (possibly longer than one character)
Example: (?###custom_parens_req<)<(?###,)a(?###,)>(?###>) will match <a> <<<a>>>

custom_parens_opt
Parameters: 1. pattern for the opening parenthesis; 2. text inside the custom parentheses; 3. pattern for the closing parenthesis
Description: similar to parens_opt, but lets you specify custom parentheses (possibly longer than one character)
Example: (?###custom_parens_opt<)/\*(?###,)(?###word)(?###,)\*/(?###>) will match /*something*/ /*/*/*word*/*/*/

Templates can be used inside other templates as parameters. For example, you can write

 (?###parens_opt<)(?###word)(?###>)

It will match the strings "a", "(a)", "(((((long_word_in_many_parens)))))" and so on.
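To make the idea of "a comment that is replaced by a regex" concrete, here is a sketch that expands the two simple templates in Python. The expansions \w+ and [+-]?\d+ are assumptions inferred from the template descriptions, not Preg's actual definitions, and the names SIMPLE_TEMPLATES and expand are hypothetical. Parametrized templates are left untouched by this sketch.

```python
import re

# Hypothetical expansions, inferred from the template descriptions
SIMPLE_TEMPLATES = {"word": r"\w+", "integer": r"[+-]?\d+"}

def expand(regex: str) -> str:
    """Replace every simple (?###name) template with its expansion."""
    def replace(m: re.Match) -> str:
        # Wrap in a non-capturing group so quantifiers apply to the whole template
        return "(?:" + SIMPLE_TEMPLATES[m.group(1)] + ")"
    return re.sub(r"\(\?###(\w+)\)", replace, regex)

expanded = expand(r"((?###word)\s+)+")
print(expanded)  # ((?:\w+)\s+)+
print(bool(re.fullmatch(expanded, "one two  three ")))  # True
```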

Authoring tools

The authoring tools are there to help you write, test and understand your regexes. For now they can show you the meaning of a regex (and its parts) and test it. The authoring tools are opened by pressing the "edit" icon next to the regex field.

authoring tools icon

Four authoring tools are available:

syntax tree
shows you the inner structure of a regular expression
explaining graph
shows you graphically how your expression will work
description
states the meaning of your expression in English
testing tool
lets you enter strings and see how they match your regex

Installation notes and known technical problems

For the syntax tree and explaining graph tools to work, you (or your site admin) have to install Graphviz on the server and fill in the 'pathtodot' setting of your Moodle installation at Site Administration > Server > System Paths. Graphviz is used to draw the pictures. Be sure to use Graphviz 2.36 or newer (earlier versions had a bug in SVG output that led to incorrect pictures).

The syntax tree and explaining graph may not work correctly in old Opera versions - for some reason the images are not updated on user actions. Fortunately, the newer version 16 for Windows works with the authoring tools pretty well. On Linux you will have to use a different browser.

Regular expression area

Here you can edit your regular expression. Clicking "Show" sends the regex to all tools - the syntax tree, explaining graph, description and testing results will be updated. "Save" closes the authoring tools form and saves the regex and test strings in the main question editing form. "Cancel" closes the authoring tools form and discards all changes made there.

You can select a part of the regular expression there, and the corresponding parts of the syntax tree, explaining graph and description, as well as the matched part of the test strings, will be highlighted. It is possible to select a piece of regex text that does not correspond to a logically complete part of the regular expression. In that case your selection will be widened to the nearest logically complete part.

Matching options

Here you can change the options that affect matching - matching engine, regex notation, exact matching and case sensitivity.

  • Matching engine changes the code that performs matching - you can use the testing tool to see whether it suits your needs.
  • Regular expression notation changes the way regexes are written - all tools will show you how the regex is interpreted in the chosen notation.
  • Case sensitivity sets the base case sensitivity of the expression; you can see the result in the explaining graph - case-insensitive nodes are gray, case-sensitive ones are white.
  • Exact matching adds new parts to your regex to ensure that the student's entire response matches it. These added parts are shown on a gray background in the tools - see the picture below.

exact matching

Panning and zooming images

The syntax tree and explaining graph tools generate pictures, which can be too large to view at once. Starting from Preg 2.6, these tools let you pan and zoom the image.

To pan the image, press the left mouse button on a free area (not on a node - that would select it) and drag the mouse without releasing the button. On the explaining graph you should switch Rectangle selection mode off in order to pan, since in that mode pressing the mouse button starts drawing a rectangle.

To zoom the image, use the mouse wheel while the pointer is over the image.

Syntax tree

Regular expressions, like all expressions, are trees of operators and operands. The syntax tree shows the inner structure of the expression graphically: what is inside what. It is most useful if you know how to read regular expressions or are learning to do so.

If you don't understand the concepts of operators and precedence well, the tree may mean little to you. It is still useful for finding out where you need parentheses: compare the trees for ab+ (a) and (ab)+ (b) in the picture below.

parenthesis in the structure of regex
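The difference between ab+ and (ab)+ is easy to check by hand. The following Python sketch matches both against two test strings; Python's re treats these two expressions the same way as Perl-compatible engines do.

```python
import re

# In ab+ the quantifier applies only to "b"; in (ab)+ it applies to "ab"
print(bool(re.fullmatch(r"ab+", "abbb")))    # True
print(bool(re.fullmatch(r"ab+", "abab")))    # False
print(bool(re.fullmatch(r"(ab)+", "abab")))  # True
print(bool(re.fullmatch(r"(ab)+", "abbb")))  # False
```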

The tree shows the names and numbers of all subpatterns, so you can check their numbering and the backreferences to them.

numbered and named subpatterns in tree

The part of the expression you selected is shown in a green rectangle. You can select nodes of the tree by clicking on them when the Collapsing mode check box is unchecked.

part of the tree is selected

Starting from Preg 2.6, the syntax tree tool has a Collapsing mode, since a syntax tree can be quite large and you usually need only part of it. When Collapsing mode is on, clicking a tree node collapses all its child nodes into a single ellipsis node (see image below). Clicking a collapsed node again expands it. Switching Collapsing mode off does not expand collapsed nodes; it only returns you to the usual Selection mode. In the picture you can see two collapsed nodes, with a tooltip over one of them showing the collapsed part of the regular expression.

collapsed tree

Explaining graph

The graph shows how the regular expression works. Its nodes are matched characters; its edges show paths through the nodes from the beginning to the end.

alternatives and concatenation

Oval nodes represent individual characters, character sequences (so that the graph does not become extremely big) or single special character classes (in which case the line colour changes). Complex character classes are shown as rectangles. Simple assertions are checked between nodes, so they are written on the edges.

graph for regex ^\dabc[!,0-9]$

Dotted rectangles show the repeated parts of your expression.

graph for regex \d*

Solid-line rectangles show subpatterns. When an expression is matched, the engine remembers which part of the string matched each subpattern. You can insert it in the feedback or use it in a backreference in the expression. If you do not need to remember a subpattern, you can use (?: ) instead of ( ) parentheses, which speeds up matching.

de)f

A green rectangle shows the selected part of the expression. With "Rectangle selection mode" switched on, you can select part of the graph with a rubber-band rectangle and see the corresponding part of the regex selected in all tools (including the regular expression text).

selection in the tree and graph

Description

The description tool tries to formulate a sentence that describes how the expression is supposed to work. The selected part of the expression is shown with a yellow background.

Testing tool

You can enter a set of strings there, one per line. These strings will be matched against your expression. You'll see coloured strings showing which parts matched the expression, so you can check whether it works as you expected. You will also see green check marks for the strings that match the entire regular expression (and would be graded for that regex) and red crosses for the strings that don't give a full match. The PHP preg matcher can't show partial matches, so it shows only full matches or nothing (so as not to mislead you into thinking the entire string is wrong).

If you selected a part of the regex, you will be able to see which part of each string matches that part (usually in yellow, but that may depend on your theme). The FA matcher shows this for any part of the regex, the PHP preg matcher only for capturing subpatterns.

The test strings are saved in the database when you save the regex (they are lost if you close the window with the "Cancel" button) and later save the question.

Understanding regular expressions

Understanding regular expressions in general

Regular expressions - like any expressions - are just a bunch of operators with their operands. Don't worry - you learned to master arithmetic expressions in childhood, and regular ones are just as easy if you look at them from the right angle. Learn (or recall) only 4 new words, and you will master regexes with very wide possibilities. Let's go?

Look at a simple math expression: x+y*2. There are two operators: '+' and '*'. The operands of '*' are 'y' and '2'. The operands of '+' are 'x' and the result of 'y*2'. Easy?

Thinking about that expression more deeply, we find that there is a definite order of evaluation, governed by operator precedence. The '*' takes precedence over '+', so it is evaluated first. You can change the evaluation order by using parentheses: (x+y)*2 will evaluate '+' first and multiply the result by 2. Still easy?
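If you want to see this tree structure for an arithmetic expression, Python's standard ast module can parse it for you. The sketch below only reports the operator at the root of the tree, that is, the one evaluated last; the function name top_operator is hypothetical.

```python
import ast

def top_operator(expression: str) -> str:
    """Name of the operator at the root of the tree (evaluated last)."""
    tree = ast.parse(expression, mode="eval")
    # tree.body is the outermost operation of the expression
    return type(tree.body.op).__name__

print(top_operator("x+y*2"))    # Add
print(top_operator("(x+y)*2"))  # Mult
```

In x+y*2 the multiplication sits deeper in the tree and is evaluated first; the parentheses in (x+y)*2 swap the two operators.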

One more thing to learn about operators is their arity - this is just the number of operands they require. In the example above '+' and '*' are binary operators - they both take two operands. Most arithmetic operators are binary, but the minus also has a unary (single operand) form, as in this equation: y=-x. Note that the unary and binary minus work differently.

Now any expression is just a Lego game, where you lay out a sequence of operators, each with the correct number of operands (arity), and control their evaluation order through precedence and parentheses. Arithmetic expressions are for evaluating numbers. Regular expressions are for finding patterns in strings, so they naturally use other operands and operators - but they are governed by the same rules of precedence and arity.

Regular expressions

Regular expressions are a powerful mechanism for searching strings using patterns. Their operands are characters, or sets of characters, that are allowed in a particular position: A is a regular expression that matches the single character 'A'. The operators define how individual characters are combined into a pattern: sequence (the concatenation operator), alternative, and repetition (called a quantifier). Concatenation is so simple that it has no character of its own - just write some characters in sequence and they are concatenated. It still has a precedence, though, which determines whether a quantifier repeats a single character or a sequence of them. The alternative is written as a vertical bar. There are many forms of quantifier - the most commonly used are the question mark (repeat zero or one times), the asterisk (zero or more times) and the plus (one or more times). You can also specify the minimum and maximum number of repetitions in curly braces - this is a quantifier too.

The special characters that define operators should be escaped when used as operands - preceded by a backslash. Mathematical expressions never have escaping problems since their operands (numbers, variables) are constructed from different characters than operators (+,- etc), but when constructing a pattern for matching you should be able to use any character as an operand.

Character classes allow you to specify several possible characters for one position. They can be defined in many different ways: by enumeration of characters in square brackets [as3], by ranges in square brackets [a-z], by special sequences (\d means any digit, \W anything except a letter, digit and underscore, [[:alpha:]] any letter etc). An important type of operand is the simple assertion: simple assertions allow you to test certain conditions, such as the start of the string ^, the end of the string $ or a word boundary \b.

You can find a list and more examples of operands and operators in the reference section.

Precedence and order of evaluation

A quantifier has precedence over concatenation, and concatenation has precedence over alternative. Let's look at what this means:

  1. quantifiers over concatenation means that quantifiers are executed first and, when used without parentheses, repeat only a single character:
    • "many times*" matches "many time" followed by zero or more "s";
    • "(many times)*" matches "many times" zero or more times - changing the previous regex by using parentheses allows us to define the repetition of a string;
  2. concatenation over alternative means that you can define multi-character alternatives without parentheses (for single character alternatives it's better to use character classes, not the alternative operator):
    • "first|second|third" matches "first" or "second" or "third";
    • "(first |second |)part" matches "first part" or "second part" or just "part" - typical use of an empty alternative (note that the space is inside the alternatives so that it is not required before just "part");
  3. quantifier over alternative means that you should use parentheses to repeat an alternative set:
    • "first|second*" matches "first" or "secon" followed by zero or more "d", like "secondddddd";
    • "(first|second)*" matches "first" or "second", repeated zero or more times in any order, like "firstsecondfirstfirst". Note that quantifiers repeat the whole alternative, not a definite selection from it, i.e.:
    • "(1|2){2}" matches "11" or "12" or "21" or "22", not just "11" or "22";
    • "1{2}|2{2}" matches "11" or "22" only.
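The precedence rules above can be checked outside Moodle. The following is a minimal sketch in Python, assuming that Python's re module treats these particular patterns the same way as the Preg engines do; the helper name accepts is hypothetical and only stands in for an exact-match question.

```python
import re

def accepts(pattern: str, text: str) -> bool:
    """Hypothetical helper: True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Quantifier over concatenation: only the final "s" is repeated.
assert accepts(r"many times*", "many timesss")
assert not accepts(r"many times*", "many timesmany times")
assert accepts(r"(many times)*", "many timesmany times")

# Concatenation over alternative: whole words are the alternatives.
assert accepts(r"first|second|third", "second")
assert accepts(r"(first |second |)part", "part")

# Quantifier over alternative: only the final "d" is repeated.
assert accepts(r"first|second*", "seconddd")
assert not accepts(r"first|second*", "firstfirst")
assert accepts(r"(first|second)*", "firstsecondfirstfirst")

# A quantifier repeats the whole alternative, not one fixed choice.
assert accepts(r"(1|2){2}", "12")
assert not accepts(r"1{2}|2{2}", "12")
```

Every assertion passes silently; changing a pattern or a string shows quickly where parentheses are needed.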

The internal structure of a regular expression can be seen well in the syntax tree (authoring tool). The operators that are executed first are placed lower in the tree (or to the right in the horizontal view); the operator that is executed last is the root of the tree. You can compare the tree and the explaining graphs for the examples above in the authoring tools if this section doesn't seem clear enough to you. Remember that "executing" a regular expression operator means linking its operands in the string: sequential linking, alternative linking, or repeating.

Anchoring

Anchoring is used to set restrictions on the matching process by using simple assertions:

  • if a regular expression starts with the ^ the match should start at the start of the student's response;
  • if a regular expression ends with the $ the match should end at the end of the student's response;
  • otherwise a regex match can be found anywhere inside a student's response.

Note that simple assertions are concatenated with the regex, and concatenation has precedence over alternative, which makes their usage slightly tricky:

  • "^start|end$" will match "start" from the start of the string or "end" at the end of it;
  • "^(start|end)$" uses parentheses to match exactly "start" or "end";
  • "^start$|^end$" is another way to get exact match (all top-level alternatives are anchored).

If you set the exact matching option to "yes" (which is the default value), the question will add ^ and $ to each regular expression for you (this will not affect subpattern usage). However, you may prefer to use some non-anchored regexes to catch common errors and give feedback, and manually anchored expressions for grading.
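The difference between the three anchored forms can be sketched in Python. This is an illustration only, assuming Python's re module behaves like the Preg engines for these patterns; the helper found is hypothetical and mimics a non-exact match, where the regex may match anywhere in the response.

```python
import re

def found(pattern: str, response: str) -> bool:
    """Hypothetical helper: True when the pattern matches somewhere in the response."""
    return re.search(pattern, response) is not None

loose = r"^start|end$"         # "start" at the beginning OR "end" at the end
strict = r"^(start|end)$"      # exactly "start" or exactly "end"
strict_alt = r"^start$|^end$"  # the same, with every alternative anchored

assert found(loose, "start of something")
assert found(loose, "the very end")
assert not found(loose, "restart ends")

for pattern in (strict, strict_alt):
    assert found(pattern, "start")
    assert found(pattern, "end")
    assert not found(pattern, "start of something")
    assert not found(pattern, "the very end")
```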

Regular expression reference

Operands

Here's an incomplete list of operands that define character sets.

  1. Simple characters (with no special meaning) match themselves.
  2. Escaped special characters match corresponding special characters. Escaping means preceding special characters by the backslash "\". For example, the regex "\|" matches the string "|", the regex "a\*b\[" matches the string "a*b[". Backslash is a special character too and should be escaped: "\\" matches "\".
    • the full list of characters that need escaping: \ ^ $ . [ ] | ( ) ? * + { }
    • NOTE! when you are unsure whether to escape some character, it is safe to place "\" before any character except letters and digits. Do not escape letters and digits unless you know what you are doing - they get special meaning when escaped and lose it when not.
    • If you have too many characters that need escaping in some fragment, you can use \Q ... \E sequence instead. Anything between \Q and \E is treated literally as characters:
      • "\Q^(abc)$\E." matches "^(abc)$" followed by any character - there are NO simple assertions and subpatterns;
      • "\Q^(abc)$." matches "^(abc)$." because there is no "\E" and all characters after "\Q" are treated as literals till the end of the regex.
  3. Dot meta-character (".") matches any possible character (except newline, but students can't enter it anywhere); escape it as "\." if you need to match a single dot. It loses its special meaning inside a character class.
  4. Character classes match any character defined in them. Character classes are defined by square brackets. The particular ways to define a character class are:
    • "[ab,!]" matches "a", "b", "," or "!";
    • "[a-szC-F0-9]" contains ranges (defined by a hyphen between 2 characters) "a-s", "C-F" and "0-9" mixed with the single character "z"; it matches any character from "a" to "s", "z", from "C" to "F" and from "0" to "9";
    • "[^a-z-]" starts with the "^" that means a negative character set: it matches any character except from "a" to "z" and "-" (note that the second hyphen is not placed between 2 characters so defines itself);
    • "[\-\]\\]" contains escaping inside a character set: it matches "-", "]" and "\"; other characters lose their special meaning inside a character set and need not be escaped, but if you want to include "^" in a character set, it must not be the first character there;
  5. Escape sequences for common character sets (can be used both inside or outside character classes):
    • "\w" for any word character (letter, underscore or digit) and "\W" for any non-word character;
    • "\s" for any space character and "\S" for any non-space character;
    • "\d" for any digit and "\D" for any non-digit.
  6. Unicode properties are special escape sequences "\p{xx}" (positive) or "\P{xx}" (negative) for matching specific unicode characters, which can be used both inside and outside character classes (the complete list of "xx" variations can be found at http://www.nusphere.com/kb/phpmanual/reference.pcre.pattern.syntax.htm):
    • "\p{Ll}" matches any lowercase letter;
    • "\P{Lu}" matches any non-uppercase letter.
  7. POSIX character classes are used for the same purpose as unicode properties (a complete list of them can be found on the Internet too), but may not work with non-ASCII characters. They are allowed only inside character classes:
    • "[[:alnum:]]" matches any alpha-numeric character;
    • "[[:^digit:]]" matches any non-digit character.
  8. Simple assertions are not characters but conditions to test; unlike other operands, they don't consume characters while matching (they have this meaning only outside character classes):
    • "^" matches at the start of the string, fails otherwise;
    • "$" matches at the end of the string, fails otherwise;
    • "\b" matches on a word boundary, i.e. either between word (\w) and non-word (\W) characters, or in the start (end) of the string if it starts (ends) with a word character;
    • "\B" matches not on a word boundary, negative to "\b".
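Several of the operands listed above can be tried in Python, whose re module shares this part of the syntax. This is a sketch under that assumption; it leaves out \Q...\E, unicode properties and POSIX classes, which Python's re module does not support. The helper name accepts is hypothetical.

```python
import re

def accepts(pattern: str, text: str) -> bool:
    """Hypothetical helper: True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Escaped special characters match themselves.
assert accepts(r"a\*b\[", "a*b[")
assert accepts(r"\\", "\\")          # one literal backslash

# Character classes: enumeration, ranges, negation, escaping inside a class.
assert accepts(r"[ab,!]", ",")
assert accepts(r"[a-szC-F0-9]", "z")
assert not accepts(r"[a-szC-F0-9]", "t")
assert accepts(r"[^a-z-]", "A")
assert not accepts(r"[^a-z-]", "-")
assert accepts(r"[\-\]\\]", "]")

# Escape sequences for common character sets.
assert accepts(r"\d\D\s\w", "7x _")

# Simple assertions test a position and consume no characters.
assert re.search(r"\bcat\b", "a cat sat") is not None
assert re.search(r"\bcat\b", "concatenate") is None
assert re.search(r"\Bcat\B", "concatenate") is not None
```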

Still, a pattern that matches only one character isn't very useful. So here come the operators that allow us to define an expression that matches strings of several characters.

Operators

Here's a list of the common regex operators:

  1. Concatenation - a binary operator so simple that it doesn't require any special character to be written. It is still an operator and has its own precedence, which is important if you want to understand where to use parentheses. Concatenation allows you to write several operands in sequence:
    • "ab" matches "ab";
    • "a[0-9]" matches "a" followed by any digit, for example, "a5";
  2. Alternative - a binary operator that lets you define a set of alternatives:
    • "a|b" matches "a" or "b";
    • "ab|cd|de" matches "ab" or "cd" or "de";
    • "ab|cd|" matches "ab" or "cd" or emptiness (useful as a part in more complex expressions);
    • "(aa|bb)c" matches "aac" or "bbc" - using parentheses to outline alternative set;
    • "(aa|bb|)c" matches "aac" or "bbc" or "c" - typical usage of the emptiness;
  3. Quantifiers - unary operators that let you define the repetition of their operand:
    • "x*" matches "x" zero or more times;
    • "x+" matches "x" one or more times;
    • "x?" matches "x" zero or one times;
    • "x{2,4}" matches "x" from 2 to 4 times;
    • "x{2,}" matches "x" two or more times;
    • "x{,2}" matches "x" from 0 to 2 times;
    • "x{2}" matches "x" exactly 2 times;
    • "(ab)*" matches "ab" zero or more times, i.e. if you want to use a quantifier on more than one character, you should use parentheses;
    • "(a|b){2}" matches "aa" or "ab" or "ba" or "bb", i.e. it is a repeated alternative, not a repetition of "a" or "b".
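The quantifier forms can be compared side by side with a small Python sketch. It assumes Python's re module handles these forms like the Preg engines do, and the helper name accepts is hypothetical.

```python
import re

def accepts(pattern: str, text: str) -> bool:
    """Hypothetical helper: True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

assert accepts(r"x*", "")                    # zero repetitions are allowed
assert not accepts(r"x+", "")                # at least one is required
assert accepts(r"x?", "x") and not accepts(r"x?", "xx")
assert accepts(r"x{2,4}", "xxx") and not accepts(r"x{2,4}", "xxxxx")
assert accepts(r"x{2,}", "xxxxxx")
assert accepts(r"x{2}", "xx") and not accepts(r"x{2}", "xxx")

# Parentheses make the quantifier apply to more than one character.
assert accepts(r"(ab)*", "ababab")
assert not accepts(r"ab*", "ababab")

# An empty alternative makes a part optional.
assert accepts(r"(aa|bb|)c", "c")

# The alternative is chosen anew on every repetition.
assert accepts(r"(a|b){2}", "ba")
```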

Subpatterns and backreferences

Subpatterns are operators that remember the substrings captured by the regex. The simplest way to define a subpattern is to use parentheses: the regex "a(bc)d" contains a subpattern "bc". Subpatterns are numbered by their opening parentheses, with 0 standing for the whole regex, so that "(bc)" subpattern is the 1st. If we write, say, "a(b(c)(d))e", there are subpatterns "bcd", which is the 1st, "c", which is the 2nd, and "d", which is the 3rd. Subpatterns are usually used with backreferences, which have numbers too. Backreferences are operands that match the same string that was matched by the subpattern with the same number. The simplest syntax for a backreference is a backslash followed by a number: "\1" means a backreference to the 1st subpattern. The regular expression "([ab])\1" matches the strings "aa" and "bb", but neither "ab" nor "ba", because the backreference must match the same character as the subpattern did. Consider a small example: the declaration and initialization of an integer variable in the C programming language:

  • "int ([_\w][_\w\d]*); \1 = -?\d+;" matches, for example, "int _var; _var = -10;". Of course, there can be any number of spaces between "int", variable name etc, so a more correct regex will look like:
  • "\s*int\s+([_\w][_\w\d]*)\s*;\s*\1\s*=\s*-?\d+\s*;\s*" - this will match, say, " int var2  ; var2=123  ; ". It looks a bit frightening, but it is easier to write this regex once than to try to understand it afterwards.
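The numbering of subpatterns and the two regexes above can be tried in Python, which uses the same syntax for numbered subpatterns and backreferences. This is a sketch for experimenting, not part of the Preg plugin; the variable names are chosen for illustration.

```python
import re

# Subpatterns are counted by their opening parentheses; 0 is the whole match.
m = re.fullmatch(r"a(b(c)(d))e", "abcde")
assert m.group(0) == "abcde"
assert (m.group(1), m.group(2), m.group(3)) == ("bcd", "c", "d")

# A backreference must repeat exactly what the subpattern captured.
assert re.fullmatch(r"([ab])\1", "aa") is not None
assert re.fullmatch(r"([ab])\1", "ab") is None

# The C declaration example: strict spacing, then flexible spacing.
simple = r"int ([_\w][_\w\d]*); \1 = -?\d+;"
flexible = r"\s*int\s+([_\w][_\w\d]*)\s*;\s*\1\s*=\s*-?\d+\s*;\s*"

m = re.fullmatch(simple, "int _var; _var = -10;")
assert m is not None and m.group(1) == "_var"
assert re.fullmatch(simple, "int a; b = 1;") is None   # different variable names

m = re.fullmatch(flexible, " int var2  ; var2=123  ; ")
assert m is not None and m.group(1) == "var2"
```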

Finally, instead of just numbers, subpatterns and backreferences can have names via a little more complicated syntax:

  1. "(?<name1>...)" means a subpattern with name "name1";
  2. "(?'name2'...)" means a subpattern with name "name2";
  3. "(?P<name3>...)" means a subpattern with name "name3";
  4. "\k<name4>" means a backreference to the subpattern named "name4";
  5. "\k'name5'" means a backreference to the subpattern named "name5";
  6. "\g{name6}" means a backreference to the subpattern named "name6";
  7. "\k{name7}" means a backreference to the subpattern named "name7";
  8. "(?P=name8)" means a backreference to the subpattern named "name8".

This is very useful when you work with complicated regexes and often modify them by adding or removing subpatterns - the names stay the same.
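A short Python sketch shows why names are more robust than numbers. Python's re module accepts only the "(?P<name>...)" and "(?P=name)" forms from the list above, so the sketch is limited to those; the variable names are chosen for illustration.

```python
import re

# Named subpattern "var" and a named backreference to it.
pattern = r"int (?P<var>[_\w][_\w\d]*); (?P=var) = -?\d+;"
m = re.fullmatch(pattern, "int total; total = 42;")
assert m.group("var") == "total"
assert m.group(1) == "total"       # the name is an alias for number 1 here

# Adding a subpattern in front shifts the number, but the name still works.
shifted = r"(static )?" + pattern
m = re.fullmatch(shifted, "static int total; total = 42;")
assert m.group("var") == "total"
assert m.group(2) == "total"       # the same subpattern is now number 2
```

A numbered backreference "\1" in the shifted regex would have pointed at "static " instead of the variable name.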

Duplicate subpattern names and numbers

There is a useful syntax for combining subpatterns with alternation. If you create a group "(?|...)", then every alternative inside that group will have the same subpattern numbering. Consider the regex "(?|(a(b))|(c(d)))" - there are 2 alternatives with 2 subpatterns in each. Subpatterns "ab" and "cd" are 1st ones, "b" and "d" are 2nd ones.

Subpattern calls

Another way to use a subpattern is to call it. When hitting a subpattern call, the matching engine goes to the beginning of the target subpattern, and then starts to match it over again, until its end (not the end of the whole regex). If a subpattern call is placed outside the subpattern it refers to, it is almost equivalent to copy-pasting the subpattern, except its number (it will stay the same).

The most common usage of subpattern calls is the problem of matching a string in parentheses, allowing for unlimited nested parentheses. Without recursive subpattern calls, it is impossible to handle an arbitrary nesting depth. Note that a regex for arbitrarily nested parentheses is quite complex, and using such regexes heavily may make your expressions quite obscure. The Preg question type provides several templates to make regexes with parentheses more readable and easier to write.

The syntax of a subpattern call is:

  • (?R) recursive call of the whole pattern
  • (?n) call subpattern by absolute number
  • (?+n) call subpattern by relative number
  • (?-n) call subpattern by relative number
  • (?&name) call subpattern by name
  • (?P>name) call subpattern by name
  • \g<name> call subpattern by name
  • \g'name' call subpattern by name
  • \g<n> call subpattern by absolute number
  • \g'n' call subpattern by absolute number

The first one is explicitly recursive. The rest of the variants cause recursion if placed inside the subpattern they refer to, for example: "a(b(?1)?c)d" contains recursion, "a(bc)(?1)d" does not.

When using the finite state automata engine, subpattern calls behave slightly differently from PCRE in that the called subpatterns are NOT treated as atomic groups. Generally the behaviour implemented in the Preg question type is more intuitive and helpful. Please take a look at the PCRE docs for more information.

Conditional subpatterns

Conditional subpatterns let you write "if-then-else"-like constructions. A conditional subpattern consists of a condition, a positive branch and an optional negative branch. If there is no explicit negative branch, it is implied to be empty, like (?:). The general syntax is: "(?(condition)yes-pattern)" or "(?(condition)yes-pattern|no-pattern)".

The more specific options are:

  • (?(n)... absolute reference condition - is the n'th subpattern captured?
  • (?(+n)... relative reference condition
  • (?(-n)... relative reference condition
  • (?(<name>)... named reference condition - is the subpattern with the given name captured?
  • (?('name')... named reference condition
  • (?(name)... named reference condition
  • (?(R)... overall recursion condition - if there is no subpattern named 'R', the condition is true if a recursive call to the whole pattern or any subpattern has been made
  • (?(Rn)... specific group recursion condition - the condition is true if the most recent recursion is into the n'th subpattern
  • (?(R&name)... specific recursion condition - the condition is true if the most recent recursion is into the subpattern with the given name
  • (?(DEFINE)... define subpattern for reference
  • (?(assert)... complex assertion condition - the condition is true if the assert (positive/negative lookahead/lookbehind) matches

At "top level", all these recursion test conditions are false.

The last type of conditional subpattern (the complex assertion condition) is not yet supported by the FA engine.
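The two simplest condition types, an absolute number and a name, can be tried in Python, whose re module supports only those two. This is a sketch under that limitation; the patterns and variable names are made up for illustration.

```python
import re

# No negative branch: a closing parenthesis is required only if one was opened.
optional_parens = r"(\()?\d+(?(1)\))"
assert re.fullmatch(optional_parens, "(42)") is not None
assert re.fullmatch(optional_parens, "42") is not None
assert re.fullmatch(optional_parens, "(42") is None
assert re.fullmatch(optional_parens, "42)") is None

# With a negative branch: "<word>" or "word;", selected by the named subpattern.
tag_or_statement = r"(?P<open><)?\w+(?(open)>|;)"
assert re.fullmatch(tag_or_statement, "<tag>") is not None
assert re.fullmatch(tag_or_statement, "tag;") is not None
assert re.fullmatch(tag_or_statement, "tag>") is None
assert re.fullmatch(tag_or_statement, "<tag;") is None
```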

Complex assertions

Assertions test some part of the string without including it in the matched text, but they affect whether a match occurs:

  • positive lookahead assertion "a+(?=b)" matches one or more "a" followed by "b", without including "b" in the match;
  • negative lookahead assertion "a+(?!b)" matches one or more "a" not followed by "b";
  • positive lookbehind assertion "(?<=b)a+" matches one or more "a" preceded by "b";
  • negative lookbehind assertion "(?<!b)a+" matches one or more "a" not preceded by "b".
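What each assertion leaves in the matched text is easiest to see by printing the match. The sketch below uses Python's re module, which shares this lookahead and lookbehind syntax; the helper first_match is hypothetical.

```python
import re
from typing import Optional

def first_match(pattern: str, text: str) -> Optional[str]:
    """Hypothetical helper: the text of the first match, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Positive lookahead: "b" must follow, but it is not part of the match.
assert first_match(r"a+(?=b)", "caaab") == "aaa"
assert first_match(r"a+(?=b)", "caaa") is None

# Negative lookahead: the engine backs off one "a" so that "b" does not follow.
assert first_match(r"a+(?!b)", "aaab") == "aa"

# Positive lookbehind: "b" must precede, but it is not part of the match.
assert first_match(r"(?<=b)a+", "baa caa") == "aa"
assert first_match(r"(?<=b)a+", "caa") is None

# Negative lookbehind: the match starts at the first "a" not preceded by "b".
assert first_match(r"(?<!b)a+", "baa") == "a"
```

The negative cases show that an assertion can shorten or shift a match instead of rejecting it, which matters when such a regex is used without anchoring.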

Local case-sensitivity modifiers

Starting from Preg 2.1 you can set case-(in)sensitivity for parts of your regular expressions by using the standard syntax of Perl-compatible regular expressions:

  • "(?i)" will turn case-sensitivity off;
  • "(?-i)" will turn case-sensitivity on.

This overrides the general case-sensitivity setting, which is chosen at the question level. So you can make some answers case-sensitive and others not, or even do this for parts of answers. For example, you can set the question to "use case" and have a 50% answer starting with "(?i)" to give a lower grade when the case doesn't match but everything else is correct.

When placed in parentheses, local modifiers work up to the closest ")". When placed on the top level (not inside parentheses) they work up to the end of the expression, i.e. with case sensitivity on for the question:

  • "abc(de(?i)gh)xyz" will have the "gh" part case-insensitive;
  • "abc(de)(?i)ghxyz" will have the "ghxyz" part case-insensitive.
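The scoping of local modifiers can be approximated in Python, with one caveat: recent versions of Python's re module reject a bare "(?i)" in the middle of a pattern, so this sketch swaps in the scoped forms "(?i:...)" and "(?-i:...)", which PCRE also accepts. It illustrates the same effect as the two examples above, not the exact syntax used in them.

```python
import re

# Equivalent of "abc(de(?i)gh)xyz": only "gh" ignores case.
inner = r"abc(de(?i:gh))xyz"
assert re.fullmatch(inner, "abcdeGHxyz") is not None
assert re.fullmatch(inner, "abcDEghxyz") is None

# Equivalent of "abc(de)(?i)ghxyz": everything from "gh" onwards ignores case.
tail = r"abc(de)(?i:ghxyz)"
assert re.fullmatch(tail, "abcdeGhXyZ") is not None
assert re.fullmatch(tail, "ABCdeghxyz") is None

# The reverse: a case-insensitive regex with one case-sensitive part.
mixed = re.compile(r"abc(?-i:DEF)ghi", re.IGNORECASE)
assert mixed.fullmatch("ABCDEFGHI") is not None
assert mixed.fullmatch("abcdefghi") is None
```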

Error reporting

The native PHP preg extension functions only report whether or not there is an error in the regular expression, so the PHP preg extension engine can't tell you much about the error.

The FA engine uses a custom regular expression parser, so it supports advanced error reporting. There are several classes of potential errors:

  • more than two top-level alternatives in a conditional subpattern "(?(?=f)first|second|third)";
  • unopened closing parenthesis "abc)";
  • unclosed opening parenthesis of any sort (subpatterns, assertions, etc) "(?:qwerty";
  • quantifier without an operand, i.e. at the start of (sub)expression with nothing to repeat "+" or "a(+)";
  • unclosed brackets of character classes "[a-fA-F\d";
  • setting and unsetting the same modifier at the same time "(?i-i)";
  • unknown unicode properties "\p{Squirrel}";
  • unknown posix classes "[[:hamster:]]";
  • unknown (*...) sequence "(*QWERTY)";
  • incorrect character set range "[z-a]";
  • incorrect quantifier ranges "{5,3}";
  • \ at end of pattern "ab\";
  • \c at end of pattern "ab\c";
  • invalid escape sequence;
  • POSIX class outside of a character set "[:digit:]";
  • reference to a nonexistent subpattern "(abc)\2";
  • unknown, wrong or unsupported modifier "(?z)";
  • missing ) after comment "(?#comment";
  • missing conditional subpattern name ending;
  • missing ) after (?C;
  • missing subpattern name ending;
  • missing backreference name ending;
  • missing backreference name beginning;
  • missing ) after control sequence;
  • wrong conditional subpattern number, digits expected;
  • assertion or condition expected "(?()a|b)";
  • character code too big "\x{ffffffff}";
  • character code disallowed "\x{d800}";
  • invalid condition "(?(0)";
  • too big number in (?C...) "(?C256)";
  • two named subpatterns have the same name "(?<name>a)(?<name>b)";
  • backreference to the whole expression "abc\g{0}";
  • different subpattern names for subpatterns of the same number "(?|(?<name1>a)|(?<name2>b))";
  • subpattern name expected "(?<>abc)";
  • \c should be followed by an ascii character "\cй";
  • \L, \l, \N{name}, \U, and \u are unsupported;
  • unrecognized character after (?<.

Authors:

  1. Idea, design, question type and behaviour code, hinting, error reporting, regular expression testing (authoring tools) - Oleg Sychev.
  2. Regex parsing, FA regex matching engine, matcher testing, backup and restore, unicode support, templates - Valeriy Streltsov.
  3. Explaining graph (authoring tool) - Vladimir Ivanov.
  4. Syntax tree (authoring tool) - Grigory Terekhov.
  5. Regex description (authoring tool) - Dmitriy Pahomov.

We gladly welcome contributors and testers (see the section on development plans) - there is still far more work to do than we have time for. Thanks to:

  • Joseph Rezeau - for being a devoted tester of Preg question type releases and the original author of many of the ideas we have implemented in the Preg question type;
  • Tim Hunt - for his kind and helpful answers and comments, which helped us write this question type, and for joint work on the extra_question_fields and extra_answer_fields code, which is useful to many question type developers;
  • Bondarenko Vitaly - for converting a vast set of regular expression matching tests.

You, too, could help us a lot - no matter how you use Preg and its capabilities.

How to contribute

This project is free software, so it's hard to get any feedback. You shouldn't expect to get software that ideally suits your needs without telling anyone about those needs, or without giving the authors some encouragement or some simple support. Sometimes as little as writing where you work and how you use (or what prevents you from using) the Preg question type may help a lot.

This software is considered a scientific project, so the following would be really useful and appreciated:

  • evidence that the results of our work (i.e. the Preg question type) are really useful to people and have been used in a production environment;
  • cooperative work to research its effectiveness for various applications - basically you need to write about how you use Preg and run a survey of your teachers and/or students about it - but it can include co-authoring a conference thesis or a journal article;
  • cooperation in writing an article, or help with publishing it in English-language journals (information and help with grants for further work are welcome too).

If you are considering any way of helping, do not hesitate to write to me about it and ask any questions about the details. You may receive individual help during such work too (for example, while doing cooperative research I may give you tips on how to improve your regexes, etc).

I am a high school teacher, researcher and programmer who has a lot to do in his main paid job and does not have much free time to spend on developing this question type. If you could help me in some way, though, I may be able to spend more time and effort on it. Some examples:

  • publishing a thesis or paper that describes your use of the Preg question type, and that I could cite, would improve the standing of the project and my rating as a researcher/developer, so please publish and let me know the reference if you feel grateful for this software;
  • if you would take some more work and organise publishing a paper (or at least thesis) with me as co-author, that would help even more - please inform me immediately if you consider this;
  • if publishing is hard, you could just write me what your organisation is and how you use preg - that'll help and I would be able to better determine what should be done next;
  • join the testing efforts - there are many settings in the question, and regexes can be quite complex, so it's hard to do all testing by developers themselves.

Development plans

There is no definite schedule or order of development for these features - it depends on the available time and developers. Many features require complex code to achieve the results. If you want to help us with a specific feature, please contact the question type maintainer (Oleg Sychev) using http://moodle.org messaging.

  • Templates editor, allowing users to create custom templates
  • Support for complex assertions
  • Support for approximate matching to catch typos in answers
  • Improve the set of authoring tools to make writing regular expressions easier
  • Add more languages for next lexeme hinting
  • Develop more help and examples for people who don't know much about regular expressions.
