Note: You are currently viewing documentation for Moodle 3.3. Up-to-date documentation for the latest stable version of Moodle is probably available here: Preg question type.

Preg question type

From MoodleDocs
(copied from English 3.8)
 
{{Infobox plugin
|type = question type
|entry = https://moodle.org/plugins/qtype_preg
|set = https://moodle.org/plugins/browse.php?list=set&id=28
|tracker = https://bitbucket.org/oasychev/moodle-plugins/issues?status=new&status=open
|discussion = https://bitbucket.org/oasychev/moodle-plugins
|maintainer = [[user:Oleg Sychev|Oleg Sychev]]
|float = right
}}
{{Questions}}
{{Note|Even though this plugin is currently (February 2020) listed as available for Moodle branches 2.3 to 3.1 only, the plugin Stats tab shows that half of the more than 30 sites that have this plugin installed run Moodle branches 3.2 to 3.8. These officially unsupported branches can have a working regex plugin, but they will not have the 'Authoring tools' available. The plugin developers are working to update this plugin and fix the JavaScript problem that causes this issue. If you enable DEVELOPER level debugging, you may see some warnings that do not represent any danger to your server. They should be fixed in future releases.}}
 
Preg is a question type that uses regular expressions (regexes) to check students' responses (though you can also use it without regexes for its hinting features). Regular expressions give vast capabilities and flexibility both to teachers when making questions and to students when writing answers to them. The [[#Ways to use Preg questions and this docs|first section]] will guide you through this documentation; please use it with discretion. More details about regex syntax can be found at http://www.nusphere.com/kb/phpmanual/reference.pcre.pattern.syntax.htm. There are many good regex manuals, so I'm not going to repeat them here.


Authors:
# Idea, design, question type and behaviours code, hinting, error reporting, regular expression testing (authoring tool) - Oleg Sychev.
# Regex parsing, DFA regular expression matching engine - Dmitriy Kolesov.
# Regex parsing, FA regex matching engine, matchers testing, backup&restore, unicode support, templates - Valeriy Streltsov.
# Assertions support for FA matcher - Elena Lepilkina.
# Explaining graph (authoring tool) - Vladimir Ivanov.
# Syntax tree (authoring tool) - Grigory Terekhov.
# Regex description (authoring tool) - '''looking for maintainer'''.
We would gladly accept testers and contributors (see the [[#Development plans|development plans]] section) - there is still more work to be done than we have time for.

Thanks to:
* Joseph Rezeau - for being a devoted tester of Preg question type releases and the original author of many ideas that have been implemented in Preg question type;
* Tim Hunt - for his polite and useful answers and comments that helped in writing this question type, and for joint work on the extra_question_fields and extra_answer_fields code, which is useful to many question type developers;
* Bondarenko Vitaly - for converting a vast set of regular expression matching tests;
* Dmitriy Pahomov - for being the first author of Regex description (authoring tool).
You, too, can always [[#The ways to give back|help us]] a lot - regardless of the way you use Preg and your capabilities.

==Ways to use Preg questions and this docs==

===I don't (want to) know anything about regular expressions but next word (character) hinting seems useful===
Then you can use Preg question type just as Shortanswer with advanced hinting, without any knowledge of regular expressions. To do this, choose:
* '''Notation''' => Moodle shortanswer
* '''Engine''' => Finite state automata
* '''Exact matching''' => Yes
After that, you can just copy answers from your shortanswer questions. You may want to read the section about [[#Hinting|hinting]] to understand more about hinting settings.

===I have a vague knowledge of regular expressions, but want to use pattern matching===
If writing regular expressions is hard for you, but you want to use their power as patterns, authoring tools may help you a lot to create your questions. The tools show you the meaning of your regex in different ways: the internal structure of the expression (syntax tree), the visual path of matching (explaining graph) and a text description. They also allow you to test your regex against several strings and see whether it works as expected. Experiment and play with your regexes, watch the corresponding changes in the authoring tools, and eventually you'll get the regex you want.

Read the section on [[#Authoring tools|authoring tools]], then (probably after some experimenting with the tools on your own) the start of the section about [[#Understanding regular expressions|understanding regular expressions]] (this is optional, but may be interesting and help a lot). You should also read the section about [[#How Preg questions work|how the question works]] to better understand the various settings and how they affect your questions.

===I can make some effort to learn regular expressions well and be able to do anything they allow===
Well, you don't know regexes but want to understand them and create complex expressions easily. Then, instead of blind trial and error, you had better spend some time and effort reading and understanding [[#Understanding regular expressions|this section]]. Then read a little about [[#Authoring tools|authoring tools]] and use them to experiment with creating regexes. With these tools you can check whether you really understand regexes and whether they behave as expected. The syntax tree may be especially useful when you try to grasp the meaning of ''precedence'' and ''arity''. After you understand the principles of regexes well, read the sections about [[#How Preg questions work|how the question works]] and the [[#Regular expressions reference|regular expression reference]] (to know your possibilities; don't bother to understand or remember them all - just look there periodically for something new to learn). Now you should be able to write regexes without much use of the authoring tools, except the testing tool to test your expressions.

===I know regular expressions well enough to write them on my own without further guidance===
You should read about [[#How Preg questions work|how the question works]] to understand the various settings and the question's behaviour under them. You may also be interested in regex testing in the [[#Authoring tools|authoring tools]] section. Finally, the [[#Regular expressions reference|regular expression reference]] may be of some use to you.

==Understanding regular expressions==

===Understanding expressions===
Regular expressions - like any '''expressions''' - are just a bunch of '''operators''' with their '''operands'''. Don't worry - you all learned to master arithmetic expressions in childhood, and regular ones are just as easy if you look at them from the right angle. Learn (or recall) only 4 new words and you will master regexes with very wide possibilities. Miss that angle, and regular expressions may forever remain a vast menace where only a few steps are sure. Let's go?

Look at a simple math expression: '''x+y*2'''. There are two '''operators''' there: '+' and '*'. The '''operands''' of '*' are 'y' and '2'. The '''operands''' of '+' are 'x' and the result of 'y*2'. Easy?

Thinking about that expression more deeply, we find that there is a definite '''order of evaluation''' governed by the operators' '''precedence'''. '*' has precedence over '+', so it is evaluated first. You can change the order of evaluation using brackets: '''(x+y)*2''' will evaluate '+' first and multiply its result by 2. Still easy?

One more thing we should learn about operators is their '''arity''', which is just the number of operands they require. In the example above, '+' and '*' are '''binary''' operators - they both take two operands. Most arithmetic operators are binary, but minus also has a '''unary''' (single operand) form, as in this equation: '''y=-x'''. Note that unary and binary minuses work differently.

Now any expression is just a Lego game, where you set a sequence of '''operators''' with the correct number of '''operands''' for each ('''arity'''), taking heed of their order of evaluation using their '''precedence''' and brackets. Arithmetic expressions are for evaluating numbers. Regular expressions are for finding pattern matches in strings, so they naturally use other operands and operators - but they are governed by the same rules of precedence and arity.

===Regular expressions===
The goal of regular expressions is pattern matching in strings, so their '''operands''' are characters or character sets. '''A''' is a regular expression too, and it matches the single character 'A'. There are several ways to define a character set, described below. Special characters used to write operators must be '''escaped''' (preceded by a backslash) when used as operands. Math expressions never had escaping problems, since their operands (numbers, variables) are built from different characters than the operators (+, - etc.), but when writing a pattern for matching you should be able to use any character as an operand.

Still, a pattern that matches only one character isn't very useful. So here come the '''operators''' that allow us to define expressions matching strings of several characters.

====Operands====
You can use these operands in your expressions:
# ''simple characters'' match themselves;
# ''escaped special characters'': if you need to use a character with a special meaning (like |, * or a bracket) as a usual character to match, precede it with a backslash: '''a\*''' matches a* (while '''a*''' matches a zero or more times); backslash is a special character too and should be escaped: '''\\''' matches \;
# ''character classes'': you can specify a number of possible characters in one place in square brackets:
#* '''[ab,!]''' matches a or b or , or !
#* ''ranges'': '''[a-szC-F0-9]''' - you can specify ranges of letters and digits in character classes, mixing them with single characters
#* ''negative character classes'' start with ^: '''[^ab]''' means any character except a and b
#* ''escaping inside character classes'': '''[\-\]\\]''' matches - or ] or \; other characters lose their special meaning inside a character class and shouldn't be escaped, but if you want to include ^ in a character class it should not be the first character
# ''dot meta-character'': '''.''' matches any possible character (except newline, but a student can't enter it anywhere); escape the dot, '''\.''', if you need to match a single dot.

====Operators====
The most commonly used regular expression operators are (could anyone help expand the descriptions and examples, please?):
# ''concatenation'' - such a simple '''binary''' operator that it doesn't have any character at all. Still, it is an operator and has its precedence, which is important if you want to understand where to use brackets. Concatenation allows you to write several operands in sequence:
#* '''ab''' matches ab
#* '''a[0-9]''' matches a followed by any digit
# ''alternative'' - a '''binary''' operator that lets you define a set of alternatives:
#* '''a|b''' means a or b
#* '''ab|cd|de''' means ab or cd or de
#* empty alternative: '''ab|cd|''' means ab or cd or emptiness (useful as part of more complex expressions)
#* '''(aa|bb)c''' means aac or bbc - use brackets to outline the alternative set
#* '''(aa|bb|)c''' means aac or bbc or c - a typical use of emptiness
# ''quantifiers'' - '''unary''' operators that let you define repetition of the character (or regular expression) used as their operand:
#* '''x*''' means x zero or more times
#* '''x+''' means x one or more times
#* '''x?''' means x zero or one time
#* '''x{2,4}''' means x from 2 to 4 times
#* '''x{2,}''' means x two or more times
#* '''x{,2}''' means x from 0 to 2 times (the same as '''x{0,2}''')
#* '''x{2}''' means x exactly 2 times
#* '''(ab)*''' means ab zero or more times, i.e. if you want to apply a quantifier to more than one character, you should use brackets
#* '''(a|b){2}''' means aa or ab or ba or bb, i.e. it repeats the alternative, not one selected alternative
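The operands and operators above can be tried out quickly outside Moodle. The sketch below uses Python's standard re module, whose syntax for these basic constructs follows the same Perl-style conventions; it is not Preg's matching engine, so treat it only as a way to check your understanding. re.fullmatch requires the whole string to match, roughly like Preg's '''Exact matching''' set to Yes. The names EXAMPLES and check are hypothetical.

```python
import re

# Each entry: (pattern, strings it should fully match, strings it should not).
EXAMPLES = [
    (r"a\*", ["a*"], ["a", "aaa"]),                 # escaped special character
    (r"[ab,!]", ["a", "b", ",", "!"], ["c"]),       # character class
    (r"[^ab]", ["c", "1"], ["a", "b"]),             # negative character class
    (r"[\-\]\\]", ["-", "]", "\\"], ["a"]),         # escaping inside a class
    (r"a[0-9]", ["a0", "a9"], ["ab", "a"]),         # concatenation
    (r"(aa|bb|)c", ["aac", "bbc", "c"], ["abc"]),   # alternative with an empty branch
    (r"x{2,4}", ["xx", "xxx", "xxxx"], ["x", "xxxxx"]),
    (r"(ab)*", ["", "ab", "abab"], ["aba"]),        # quantifier applied to a group
    (r"(a|b){2}", ["aa", "ab", "ba", "bb"], ["a", "abc"]),
]


def check(pattern, good, bad):
    """Return the strings whose match result disagrees with the expectation."""
    compiled = re.compile(pattern)
    wrong = [s for s in good if compiled.fullmatch(s) is None]
    wrong += [s for s in bad if compiled.fullmatch(s) is not None]
    return wrong


if __name__ == "__main__":
    for pattern, good, bad in EXAMPLES:
        problems = check(pattern, good, bad)
        print(f"{pattern!r:14} {'ok' if not problems else problems}")
```

Adding your own rows to EXAMPLES is a cheap way to test a guess about what an operator does before putting it into a question.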


====Precedence and order of evaluation====
'''Quantifier''' has precedence over '''concatenation''', and '''concatenation''' has precedence over '''alternative'''. Let's see what this means:
# ''quantifier over concatenation'' means quantifiers are evaluated first and, without brackets, repeat only a single character:
#* '''ab*''' matches a followed by zero or more b's
#* changing this with brackets lets us define a string repetition: '''(ab)*''' matches ab zero or more times
# ''concatenation over alternative'' means you can define multi-character alternatives without brackets (for single-character alternatives use character classes, not the alternative operator), but you should use brackets when you need to add something to the alternative set:
#* '''ab|cd|de''' matches ab or cd or de
#* '''(aa|bb)c''' matches aac or bbc - use brackets to outline the alternative set
#* '''(aa|bb|)c''' matches aac or bbc or c - a typical use of an empty alternative
# ''quantifier over alternative'' means you should use brackets to repeat an alternative set (not just the last character in it):
#* '''ab|cd*''' matches ab, or c followed by zero or more d's
#* '''(ab|cd)*''' matches ab or cd repeated zero or more times in any order, like ababcdabcdcd etc.
#* note that quantifiers repeat the alternative, not a definite selection from it, i.e.:
#*# '''(a|b){2}''' matches aa or ab or ba or bb, not just aa or bb
#*# use '''a{2}|b{2}''' to match aa or bb only
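To see precedence at work, you can list every short string over a small alphabet that a pattern accepts. This brute-force helper (matching_strings is a hypothetical name, and Python's re stands in for Preg's engines) makes the difference between '''ab*''' and '''(ab)*''', or between '''(a|b){2}''' and '''a{2}|b{2}''', directly visible:

```python
import itertools
import re


def matching_strings(pattern, alphabet, max_len):
    """All strings over `alphabet`, up to `max_len` long, that fully match `pattern`."""
    compiled = re.compile(pattern)
    found = []
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if compiled.fullmatch(s):
                found.append(s)
    return found


if __name__ == "__main__":
    # Quantifier binds tighter than concatenation.
    print(matching_strings(r"ab*", "ab", 3))         # ['a', 'ab', 'abb']
    print(matching_strings(r"(ab)*", "ab", 4))       # ['', 'ab', 'abab']
    # Concatenation binds tighter than alternative; the quantifier takes only d.
    print(matching_strings(r"ab|cd*", "abcd", 3))    # ['c', 'ab', 'cd', 'cdd']
    # The quantifier repeats the whole alternative, not one chosen branch.
    print(matching_strings(r"(a|b){2}", "ab", 2))    # ['aa', 'ab', 'ba', 'bb']
    print(matching_strings(r"a{2}|b{2}", "ab", 2))   # ['aa', 'bb']
```

Enumeration grows exponentially with max_len, so keep the alphabet and lengths tiny; it is a learning aid, not a way to test real answers.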


====Assertions====
''Assertions'' are statements about some part of the string that doesn't actually go into the matched text, but affects whether a match occurs or not.
* ''positive lookahead assertion'': '''a+(?=b)''' matches any number of a's that are followed by b, without including b in the match
* ''negative lookahead assertion'': '''a+(?!b)''' matches any number of a's that are not followed by b
* ''positive lookbehind assertion'': '''(?<=b)a+''' matches any number of a's preceded by b
* ''negative lookbehind assertion'': '''(?<!b)a+''' matches any number of a's that are not preceded by b

==How Preg questions work==
Basically, this question type is an extended version of Shortanswer. It extends its features in several different ways (you can use them in almost any combination):
* '''Pattern matching''' - using regular expressions you can create powerful patterns describing possible student answers
* '''Hinting''' - when students are stuck on the question, you may allow them to ask for the next correct word (lexem) or character (possibly with a penalty)
===Settings affecting question work===

'''Case sensitivity''' sets the case sensitivity for all regular expressions you specify as answers. Note that you can also [[#Local case-sensitivity modifiers|set the case sensitivity for regex parts]].

'''Exact matching''' affects the question in the following way:
; ''Yes'' : the ''entire'' student's response, from the first to the last character, should match your regular expression
; ''No'' : the student's response can just contain a ''part'' that matches your regex: for example, if the correct answer is "whole", then "the whole string" will be a correct student response
You can still set some of your regexes to match the whole student's response using [[#Anchoring|special regex syntax]].

'''Notations''' specify the "language" of your answers.
; ''Regular expression'' : the usual notation for regular expressions. Precisely, it is the Perl-compatible regex dialect. You may write a regex on multiple lines for better readability - line breaks will be ignored.
; ''Regular expression (extended)'' : useful for really complex regexes. It is similar to the PHP 'x' modifier. It ignores any unescaped whitespace in your regexes that is not inside a character class (use \s instead), so you may freely format your regexes with spaces. It also ignores line breaks, with one useful exception: everything after a '#' character until the end of the line is treated as a comment (the # should not be escaped and should not be inside a character class).
; ''Moodle shortanswer'' : use it to avoid regex syntax entirely: just copy answers from your shortanswer questions. The '*' wildcard is supported. By choosing the FA engine you get access to the hinting features. You can skip everything said here about regexes, but be sure to read the [[#Hinting|hinting]] section to understand the various settings you can alter to configure your question's hinting behaviour.

'''Matching engine''' specifies the program module that performs the regex matching. There is no 'best' matching engine - it depends on the features you want to use. Engines differ in stability and in the features they offer.
; ''PHP preg extension'' : should be used when you '''don't need hinting''' and '''other engines reject your expressions''' as too difficult, or you encounter bugs in them. It is based on the native PHP preg_ functions. It supports 100% of Perl-compatible regex features and is very stable and thoroughly tested. But it doesn't support partial matching, so (unless we persuade the PHP developers to add support for partial matching) there is '''no hinting'''. However, it supports subpattern capturing. Choose it when you need complex regex features that other engines don't support.
; ''Finite state automata (FA)'' : can be used to '''perform hinting''' for your students. The FA engine is custom PHP code; it supports many (but not all) regex features and is thoroughly tested (it passes all tests from the AT&T testregex suite and most tests from the PCRE testinput1 and testinput4 suites for the features it supports, which is quite a lot), but may still contain bugs in rare cases. Features unsupported for now are lookaround assertions and some types of conditional subpatterns.

===Matching===
'''Matching''' means finding a part of the student's answer (or the whole answer) that fits the regular expression. This part is called a '''match'''.

You should enter regular expressions as '''answers''' to the question without modifiers or enclosing delimiters (the question adds the modifiers for you: '''u''' always, and '''i''' in case-insensitive mode). You should also enter one correct response (one that matches at least one 100% grade regular expression) to be shown to the student as the '''correct answer'''. The question checks the regular expressions to find the first full match (full for the expression, but not necessarily for the whole response - see [[#Anchoring|anchoring]]) and gives the grade from it. If there is no full match and the engine supports partial matching (see [[#Hinting|hinting]]), then the partial match that is shortest to complete is chosen (for displaying a hint; a zero grade is given) - or the longest one, if the engine can't tell which one is shortest to complete.

===Anchoring===
Anchoring sets restrictions on the matching process:
* if a regular expression starts with '''^''', the match must start at the start of the student's response;
* if a regular expression ends with '''$''', the match must end at the end of the student's response;
* otherwise the match can be located anywhere inside the student's response.

If you set the '''exact matching''' option to yes (the default setting), the question adds ^ and $ to each regular expression for you. However, you may prefer to set it to no and use some non-anchored regexes to catch common errors and give feedback, while using manually anchored expressions for grading.
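The interplay of '''Exact matching''', '''Case sensitivity''' and anchoring can be sketched in Python as below. This is an illustration under assumptions, not Preg's code: the helper names are hypothetical, re.IGNORECASE stands in for the '''i''' modifier, and Python 3 string patterns are Unicode-aware by default (playing the role of '''u'''). The sketch wraps the answer as ^(?:...)$ rather than a plain ^...$: because concatenation has precedence over alternative, '''^ab|cd$''' would mean "ab at the start or cd at the end", not "exactly ab or cd".

```python
import re


def build_pattern(answer_regex, exact_matching, case_sensitive):
    """Hypothetical helper mimicking the settings described above."""
    if exact_matching:
        # The non-capturing group keeps ^ and $ applied to every alternative.
        answer_regex = "^(?:" + answer_regex + ")$"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(answer_regex, flags)


def first_match(answer_regex, response, exact_matching=True, case_sensitive=True):
    """Return the matched text, or None if the response does not match."""
    m = build_pattern(answer_regex, exact_matching, case_sensitive).search(response)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(first_match("whole", "the whole string"))                        # None
    print(first_match("whole", "the whole string", exact_matching=False))  # whole
    print(first_match("ab|cd", "abcd"))                                    # None
    print(first_match("^ab|cd$", "abcd", exact_matching=False))            # ab
    print(first_match("WHOLE", "whole", case_sensitive=False))             # whole
```

The fourth call shows why manual anchoring needs care: with exact matching off, the naively anchored '''^ab|cd$''' still finds a match inside "abcd".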


===Hinting===
Hinting is supported by the FA engine in adaptive and interactive behaviours.

====Partial matching====
Hinting starts with '''partial matching'''. By a partially correct response we mean a string that starts with correct characters (matching your regex), but at some character the match breaks. Assume you entered the regex
   "'''are blue, white(,| and) red'''"
and a student answered
   "they are blue, vhite and red"
In this situation the partial match is
   "are blue, "
Note that the regex is unanchored ("Exact matching" is set to "No"), so the match may not start at the first character of the student's response (as in the example above, where "they " is skipped). When only partial matching is used, the student will see the correct and incorrect parts:
   <span style="text-decoration:line-through; color:#FF0000;">they </span><span style="color:#0000FF;">are blue, </span><span style="text-decoration:line-through; color:#FF0000;">vhite and red</span>
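The partial-matching step can be imitated for the small example above. Preg's FA engine works on an automaton built from the regex; this Python sketch instead assumes the regex's (finite) language has been written out by hand as a list of accepted strings, which only works for toy regexes like this one. The names partial_match, split_for_display and ACCEPTED are hypothetical.

```python
def partial_match(accepted, response):
    """Longest piece of `response` that is a correct beginning of an accepted string.

    Unanchored: every start position in the response is tried.
    Returns (start, matched_text); (0, "") if nothing matches.
    """
    best_start, best_text = 0, ""
    for start in range(len(response)):
        for answer in accepted:
            n = 0
            while (n < len(answer) and start + n < len(response)
                   and response[start + n] == answer[n]):
                n += 1
            if n > len(best_text):
                best_start, best_text = start, response[start:start + n]
    return best_start, best_text


def split_for_display(response, start, text):
    """(skipped prefix, correct part, incorrect rest), as in the coloured display."""
    end = start + len(text)
    return response[:start], response[start:end], response[end:]


# Hand-expanded language of the regex: are blue, white(,| and) red
ACCEPTED = ["are blue, white, red", "are blue, white and red"]

if __name__ == "__main__":
    response = "they are blue, vhite and red"
    start, text = partial_match(ACCEPTED, response)
    print(split_for_display(response, start, text))
    # ('they ', 'are blue, ', 'vhite and red')
```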
====General hinting rules====
Preg question type doesn't add hinted characters to the student's response (unlike the REGEXP question type), showing them separately instead, for a number of reasons:
# It is the student's responsibility whether to add the hinted character (and possibly more) to his response.
# It slightly encourages thinking about the hint, since when the response is modified automatically it is too easy to press '''hint''' repeatedly, which is usually not desirable behaviour.
When possible, hinting chooses a character that leads to the shortest path to complete the match. Consider this response to the previous regular expression:
   <span style="color:#0000FF;">are blue, white</span><span style="text-decoration:line-through; color:#FF0000;">; red</span>
There are two possible hint characters: "," or " " (leading to the " and" path). The question will choose "," because it leads to the shortest path to complete the match, while " " leads to a path 3 characters longer.

It is possible that not all regular expressions give a 100% grade. Suppose you added an expression for students with a bad memory:
   '''are white(,| and) red'''
with a 60% grade and feedback about forgetting ''blue''. You may not want hinting to lead the student to the response
   are white, red
if he entered
   are white, oh I forgot the other colors.
'''Hint grade border''' controls this. Only regular expressions with a grade greater than or equal to the hint grade border will be used for partial matching and hinting. If you set the hint grade border to 1, only 100% grade regular expressions will be used for hinting; if you set it to 0.5, regular expressions with grades from 50% to 100% will be used for hinting and those with 0%-49% will not. Regular expressions not used for hinting work only when they have a full match with the student's response.
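Both rules above (shortest path to completion, and the hint grade border) can be sketched together. As in the partial-matching sketch, this assumes each regex is replaced by a hand-written list of every string it matches, which is a simplification of what an FA engine does; next_character_hint, common_prefix_length and the answer lists are hypothetical names. Each answer carries its grade as a fraction (1.0 = 100%).

```python
def common_prefix_length(a, b):
    """Number of leading characters shared by strings a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def next_character_hint(response, answers, hint_grade_border):
    """Pick the next correct character to show, or None if nothing can be hinted.

    `answers` is a list of (grade, accepted_strings) pairs.
    """
    best = None  # ((characters left to type, -matched length), answer, matched length)
    for grade, accepted in answers:
        if grade < hint_grade_border:       # below the border: never used for hinting
            continue
        for start in range(len(response)):  # unanchored partial match
            for answer in accepted:
                n = common_prefix_length(response[start:], answer)
                left = len(answer) - n
                if n == 0 or left == 0:
                    continue                # no partial match, or nothing left to hint
                key = (left, -n)            # shortest path to complete the match wins
                if best is None or key < best[0]:
                    best = (key, answer, n)
    return None if best is None else best[1][best[2]]


BLUE = ["are blue, white, red", "are blue, white and red"]    # grade 1.0
FORGETFUL = ["are white, red", "are white and red"]           # grade 0.6
ANSWERS = [(1.0, BLUE), (0.6, FORGETFUL)]

if __name__ == "__main__":
    print(repr(next_character_hint("are blue, white; red", ANSWERS, 1.0)))  # ','
    text = "are white, oh I forgot the other colors."
    print(repr(next_character_hint(text, ANSWERS, 1.0)))  # 'b'
    print(repr(next_character_hint(text, ANSWERS, 0.5)))  # 'r'
```

With the border at 1.0 only the 100% answer is used, so the hint steers towards "blue"; lowering it to 0.5 lets the 60% answer win, leading towards "are white, red", which is exactly the situation the hint grade border is meant to prevent.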
 
====Next character hinting====
When next character hinting is available, the student will have a '''hint next character''' button; pressing it shows one next correct character, highlighted by background colouring:
  <span style="text-decoration:line-through; color:#FF0000;">they </span><span style="color:#0000FF;">are blue, </span><span style="background-color:#00FF00">w</span><span style="text-decoration:line-through; color:#FF0000;">vhite and red</span>
You should typically set the hint '''penalty''' higher than the usual question '''penalty''', because they are applied separately: the usual penalty for an attempt without hinting, and the hint penalty for an attempt with hinting.
 
====Next lexem (word) hinting====
'''Lexem''' means an atomic part of a language. In a natural language, a ''word'', a ''number'' and a ''punctuation mark'' (or a group of marks like '?!' or '...') are lexems. In a programming language, a lexem can be a ''keyword'', a ''variable name'', a ''constant'' or an ''operator''. Note that spaces are usually not considered to be lexems, but separators between them, since they don't have any particular meaning.
 
'''Next lexem hint''' will show the student either the completion of the current lexem (if the partial match ends inside it) or the next one (if the student has completed the current lexem). For example:
  <span style="color:#0000FF;">are bl</span><span style="background-color:#00FF00">ue</span>
or
  <span style="color:#0000FF;">are blue</span><span style="background-color:#00FF00">,</span>
or
  <span style="color:#0000FF;">are blue,</span><span style="background-color:#00FF00"> white</span>
 
Since the 2.3 release, Preg question type supports next lexem hinting using the ''formal languages block''. You should choose the language in which you expect a response to your question, since lexem borders differ between languages. For now it supports these languages (there will be more):
; ''simple english'' : the English language scanner recognizes words, numbers and punctuation marks;
; ''C/C++ language'' : the C (or C++) programming language;
; ''printf language'' : a special language for formatting strings in the C/C++ programming language; you will probably keep it disabled.
 
The site administrator can control which languages are available to teachers, to avoid confusion. See the settings of the "Formal languages" block in the plugin settings menu.
 
Note that "lexem" typically isn't the word you would like your students to see on the hint button. Each language defines its own word for it. You can enter another word in the question description if you don't like the default ones.
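The next lexem rule can be sketched for the ''simple english'' case. The tokenizer below (runs of word characters, or runs of punctuation such as "?!" or "...") is an assumption, not the formal languages block's actual scanner, and the sketch also assumes the partial match has already been found and is a prefix of the hinted answer. next_lexem_hint and LEXEM are hypothetical names.

```python
import re

# Assumed scanner: words/numbers, or groups of punctuation marks.
LEXEM = re.compile(r"\w+|[^\w\s]+")


def next_lexem_hint(answer, matched_length):
    """Text to show after the first `matched_length` characters of `answer`."""
    for token in LEXEM.finditer(answer):
        if token.end() > matched_length:
            # Either completes the lexem the partial match stopped inside,
            # or shows the next lexem together with any spaces before it.
            return answer[matched_length:token.end()]
    return ""  # the answer is already complete


ANSWER = "are blue, white, red"

if __name__ == "__main__":
    for typed in ["are bl", "are blue", "are blue,"]:
        print(repr(typed), "->", repr(next_lexem_hint(ANSWER, len(typed))))
```

For the three inputs this reproduces the examples above: 'ue', ',' and ' white'.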


===Subpattern capturing and feedback===
===Subpattern capturing and feedback===
Any pair of round brackets in the regular expressions are considered a '''subpattern''' and when doing matching engine (supporting subpatterns) remember ('''capture''') not only whole match, but it's parts corresponding to all subpatterns. Subpatterns can be nested. If subpattern is repeated (i.e. have quantifier), than only last match of all repeats will be captured. If you want to change order of evaluation without defining a subpattern to capture (which will speed up processing), you should use (?:  ) instead of just (  ). Asserts don't create subpatterns.
Any pair of parentheses in a regex are considered as a '''subpattern''' and when matching the engine remembers ('''captures''') not only the whole match, but its parts corresponding to all subpatterns. Subpatterns can be nested. If a subpattern is repeated (i.e. have quantifier), only last match of all repeats will be captured. If you want to change order of evaluation without defining a subpattern to capture (which will speed up processing), you should use (?:  ) instead of just (  ). Lookaround assertions don't create subpatterns.


Subpatterns are counted from left to right by opening brackets. Precisely '''0''' is the whole match, '''1''' is first subpattern etc. You could insert them in the ''answer's feedback'' using simple placeholders: '''{$0}''' is replaced by the whole match, '''{$1}''' by first subpattern value etc. That can improve the quality of you feedback. Placeholders won't work on the ''general feedback'' because different answers could have different number of subpatterns.
Subpatterns are counted from left to right by opening parentheses. Precisely '''0''' is the whole regex, '''1''' is first subpattern etc. You can insert them in the ''answer's feedback'' using simple placeholders: '''{$0}''' will be replaced by the whole match, '''{$1}''' by the first subpattern value etc. That can improve the quality of you feedbacks. Placeholders won't work in the ''general feedback'' because different answers can have different number of subpatterns.


'''PHP preg engine''' support full subpattern capturing. '''DFA''' engine coudn't do it, so you could use only {$0} placeholder working with DFA engine.
Let's look at a regex defining a decimal number with optional integral part:
 
Let's look at regex defining an decimal number with optional integral part:
  [+\-]?([0-9]+)?\.([0-9]+)
  [+\-]?([0-9]+)?\.([0-9]+)
It has two subpatterns: first capturing integral part, second - fractional part of the number.
Suppose you write the feedback:
  The number is: {$0} Integral part is {$1} and fractional part is {$2}
Then a student enters
  123.34
The student will see
  The number is: 123.34 Integral part is 123 and fractional part is 34
If no integral part is given, {$1} is replaced by an empty string. There is currently no way to omit the words "Integral part is" in that case: the placeholder syntax would become complex and error-prone.
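The placeholder substitution can be imitated in Python as a sketch. The helper `fill_placeholders` is hypothetical and is not Preg's code. Leaving an out-of-range placeholder such as {$3} untouched is this sketch's own assumption, not documented Preg behaviour. Python's `re.fullmatch` stands in for exact matching.

```python
import re

NUMBER = r"[+\-]?([0-9]+)?\.([0-9]+)"
FEEDBACK = "The number is: {$0} Integral part is {$1} and fractional part is {$2}"

def fill_placeholders(feedback: str, match: re.Match) -> str:
    """Replace {$n} with the text captured by subpattern n (hypothetical helper)."""
    def repl(p: re.Match) -> str:
        n = int(p.group(1))
        if n > match.re.groups:
            return p.group(0)          # assumption: unknown placeholders are left as-is
        return match.group(n) or ""    # a subpattern that did not match gives ""
    return re.sub(r"\{\$(\d+)\}", repl, feedback)

print(fill_placeholders(FEEDBACK, re.fullmatch(NUMBER, "123.34")))
print(fill_placeholders(FEEDBACK, re.fullmatch(NUMBER, ".5")))
```

The first call reproduces the feedback shown above. The second shows the empty {$1} that remains when the integral part is missing.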


===Looking for missing and misplaced things===
Joseph Rezeau's REGEXP question type has a '''missing words''' feature, which lets you define an answer that matches when something is absent from the student's response (and gives the student appropriate feedback).
 
A similar effect can be achieved with '''negative assertions''' combined with anchoring the start of the match. The regular expression to look for the missing word '''necessary''' would be
  ^(?!.*\bnecessary\b.*)
where
* '''(?!.*\bnecessary\b.*)''' is a '''negative lookahead assertion''', which allows a match only if the word '''necessary''' does not occur anywhere after the current point in the string;
* '''^''' is an assertion too, which anchors the match to the start of the response (otherwise the match could start at a point after the word "necessary" and succeed even though the word is present).
 
If this description seems difficult, just surround the regex that should be missing with '''^(?!.*''' and ''')'''. Don't try the '--' syntax: it is specific to Joseph Rezeau's REGEXP question type!
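The missing-word check can be tested outside Moodle with Python's `re`, which handles this particular lookahead the same way PCRE does. The helper `missing_pattern` is hypothetical. It builds ^(?!.*X), which is equivalent to the regex above (the trailing .* inside the lookahead is redundant).

```python
import re

def missing_pattern(pattern: str) -> str:
    """Wrap a regex so that it matches only when `pattern` occurs nowhere in the response."""
    return r"^(?!.*" + pattern + ")"

MISSING_NECESSARY = missing_pattern(r"\bnecessary\b")

for response in ["It is necessary to check", "It is required", "This is unnecessary"]:
    found = re.search(MISSING_NECESSARY, response) is not None
    print(f"{response!r}: missing={found}")
```

Because of the \b word boundaries, "unnecessary" does not count as the word "necessary". The word is therefore reported as missing in the last response.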
 
You can also do a rough search for '''misplaced words''' (it only works reliably if everything else in the response is correct) using syntax like this:
  (?<!I\s+)\bam\b(?!\s+victor)
This expression catches a misplaced "am" in the sentence "I am victor": it looks for an "am" that has neither "I" before it (the "(?<!I\s+)" part) nor "victor" after it (the "(?!\s+victor)" part). "\s+" allows any number of spaces between words. If you want to catch the first (last) word (punctuation mark, etc), place the simple assertion for the start/end of the string ("^" or "$") instead of a word in the related assertion. For instance, to look for a misplaced "I" you should write something like
  (?<!^)\bI\b(?!\s+am)
which looks for an "I" that is not at the start of the string and is not followed by "am". Note that PCRE does not accept a lookbehind of unbounded length such as "(?<!I\s+)", so with the PHP preg engine you may need a fixed-length form like "(?<!I\s)".
 
Note that if you have several answers to catch missing and misplaced things, only one of them will actually be used for any given student response.
 
Since the Preg 2.3 release you can combine hints and catching missing words. But you should make sure that the answers that look for missing things (and others that give specific feedback) have a '''fraction''' (grade) lower than the '''hint grade border''' (see [[#Hinting]]). You don't actually want to generate hints for these answers, as they don't define a correct response, so this is not a problem but a feature.
 
===Templates===
 
Preg 2.8 introduces a new feature called Templates.
 
A template is a more convenient and meaningful way to write frequently used patterns. A template is written as a special regular expression comment that is replaced by a predefined regex before execution. Some templates are parametrized; any regular expression can be used as a parameter value.
* '''Simple template''' => (?###template_name)
* '''Parametrized template''' => (?###template_name<)param1(?###,)param2(?###,)...paramN(?###>)
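The idea of replacing a template comment by a regex before execution can be sketched in Python for simple (non-parametrized) templates. The expansions in `SIMPLE_TEMPLATES` are hypothetical and only follow the descriptions in the table below; Preg's real replacement regexes are not given here and may differ. Each expansion is wrapped in a non-capturing group, so that a quantifier placed after a template repeats the whole template.

```python
import re

# Hypothetical expansions for illustration only; Preg's real replacement regexes may differ.
SIMPLE_TEMPLATES = {
    "word": r"\w+",
    "integer": r"[+\-]?\d+",
}

def expand_simple_templates(regex: str) -> str:
    """Replace each (?###name) comment by a non-capturing group with its expansion."""
    def repl(m: re.Match) -> str:
        body = SIMPLE_TEMPLATES.get(m.group(1))
        return m.group(0) if body is None else "(?:" + body + ")"
    return re.sub(r"\(\?###(\w+)\)", repl, regex)

expanded = expand_simple_templates(r"((?###word)\s+)+")
print(expanded)  # ((?:\w+)\s+)+
print(re.fullmatch(expanded, "many words here ") is not None)  # True
```

Parametrized templates such as (?###parens_opt<)...(?###>) are left untouched by this sketch. They would need real parameter parsing and, for arbitrary nesting, recursion.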
 
Templates can be used to make regexes shorter and easier to understand. For particularly complex regexes (usually ones with parentheses) consider using [[Preg_question_type#Settings_affecting_question_work|extended notation]], which lets you use line breaks and spaces to format the regex.
 
For now, templates are hard-coded in the Preg question type, but there are plans to add support for custom user templates in the next releases.
 
At the moment the following templates are available:
{| class="wikitable"
|-
! Template name
! Parameters
! Description
! Example
|-
| word
| None
| One or more 'word' characters (letters, digits and underscore).
| ((?###word)\s+)+  will match any number of words with any number of spaces between them
|-
| integer
| None
| Optional sign + or -, followed by one or more digits
| (?###integer) will match any integer value
|-
| parens_req
| The text you want to see in parentheses
| Something in at least one pair of correctly closed round parentheses
| (?###parens_req<)a(?###>) will match (a)  ((a)) and (((((a)))))
|-
| parens_opt
| The text you want to see in parentheses
| Something optionally placed in any number of pairs of correctly closed round parentheses
| (?###parens_opt<)a(?###>) will match a  (a) and (((a)))
|-
| brackets_req
| The text you want to see in brackets
| Something in at least one pair of correctly closed square brackets
| (?###brackets_req<)(?###word)(?###>) will match [abc] <nowiki>[[cat]]</nowiki>
|-
| brackets_opt
| The text you want to see in brackets
|  Something optionally placed in any number of pairs of correctly closed square brackets
| (?###brackets_opt<)(?###word)(?###>) will match cat [dog] <nowiki>[[[Fido]]]</nowiki>
|-
| custom_parens_req
| 1. Pattern for the opening parenthesis 2. Text inside custom parentheses 3. Pattern for the closing parenthesis
| This template is similar to the parens_req, but allows you to specify custom parentheses (possibly consisting of more than one character)
| (?###custom_parens_req<)<(?###,)a(?###,)>(?###>) will match <a>  <<<a>>>
|-
| custom_parens_opt
| 1. Pattern for the opening parenthesis 2. Text inside custom parentheses 3. Pattern for the closing parenthesis
| This template is similar to the parens_opt, but allows you to specify custom parentheses (possibly consisting of more than one character)
| (?###custom_parens_opt<)/\*(?###,)(?###word)(?###,)\*/(?###>) will match /*something*/  /*/*/*word*/*/*/
|}
 
Templates can be used inside other templates as parameters. For example, you can write
  (?###parens_opt<)(?###word)(?###>)
It will match strings "a", "(a)", "(((((long_word_in_many_parens)))))" and so on.
 
==Authoring tools==
 
Authoring tools are there to help you write, test and understand your regexes. For now they can show you the meaning of a regex (and its parts) and test it. Authoring tools are activated by pressing the "edit" icon next to the regex field.
 
[[Image:qtype preg authortools1.png|authoring tools icon]]
 
There are four authoring tools available:
; '''syntax tree''' : shows you the inner structure of regular expressions
; '''explaining graph''' : shows you how your expression will work in a graphical way
; '''description''' : formulates the meaning of your expression in English
; '''testing tool''' : allows you to enter strings and see how they match your regex
 
===Installation note and known technical issues===
To have the ''syntax tree'' and ''explaining graph'' tools working, you (or your site admin) have to install [http://www.graphviz.org/ Graphviz] on the server and fill in the 'pathtodot' setting of your Moodle installation at Site Administration > Server > System Paths. Graphviz is used to draw the pictures. Be sure to use Graphviz 2.36 or newer (earlier versions had a bug in SVG output which led to incorrect pictures).
 
Syntax tree and explaining graph may not work correctly in old Opera versions - for some reason the images are not updated on user actions. Fortunately, there's a newer version 16 for Windows which works with authoring tools pretty well. On Linux you will have to use something else.
 
===Regular expression area===
Here you can edit your regular expression. Clicking on "Show" sends the regex to all tools - syntax tree, explaining graph, description and testing results will be updated. "Save" closes the authoring tools form and saves the regex and test strings in the main question editing form. "Cancel" closes the authoring tools form and discards all changes made there.
 
You can select part of the regular expression there, and the corresponding parts of the syntax tree, explaining graph, description and matched part of the strings will be highlighted. It is possible to select a part of the regex text that doesn't correspond to a logically complete part of the regular expression. In that case your selection will be widened to the nearest logically complete part.
 
===Matching options===
Here you can change the options affecting matching: matching engine, regex notation, exact matching, and case sensitivity.
* '''Matching engine''' changes the code performing the matching - you can use the testing tool to see whether it suits your needs.
* '''Regular expression notation''' changes the way regexes are written - all tools will show you how the regex is interpreted in this notation.
* '''Case sensitivity''' sets the basic case sensitivity of the expression; you can see the result in the explaining graph, where case-insensitive nodes are gray and case-sensitive ones are white.
* '''Exact matching''' adds parts to your regex to ensure that the student's entire response must match it. These added parts are shown on a gray background in the tools - see the picture below.
[[Image:qtype preg authortools9.png|exact matching]]
 
===Panning and zooming of pictures===
The Syntax tree and Explaining graph tools generate pictures that can be quite large, so starting from Preg 2.6 these tools provide easy ''pan'' and ''zoom'' features.
 
To '''Pan''' the image, press the left mouse button on a free area (not on the nodes - that will select them) and drag the mouse around without releasing the button. In the Explaining graph you should switch ''Rectangle selection mode'' off in order to pan, since in rectangle selection mode pressing the mouse button starts drawing a rectangle.
 
To '''Zoom''' the image, use the mouse wheel while the mouse pointer is over the image.
 
===Syntax tree===
As was said above, regular expressions, like all expressions, are trees of operators and operands. The syntax tree shows the inner structure of an expression graphically: what is inside what. This is most useful if you already understand regular expressions or are [[#Understanding regular expressions|learning to do so]].
 
If you don't understand the concepts of operators and precedence well, it may mean little to you. But it is still useful to find out where you need parentheses: compare the trees for ''ab+'' (a) and ''(ab)+'' (b) in the picture below.
 
[[Image:qtype preg authortools2.png|parenthesis in the structure of regex]]
 
The tree shows you the names and numbers of all subpatterns, so you can check their numbering and the backreferences to them.
 
[[Image:qtype preg authortools8.png|numbered and named subpatterns in tree]]
 
The part of the expression you selected is shown in a green rectangle. You can select nodes of the tree by clicking on them when the Collapsing mode check box is unchecked.
 
[[Image:qtype preg authortools3.jpg|part of the tree is selected]]
 
Starting from Preg 2.6 the Syntax tree tool has a Collapsing mode, since a syntax tree can be quite large and you usually need only part of it. When Collapsing mode is on, clicking a tree node collapses all its child nodes into a single ellipsis node (see image below). Clicking the collapsed node again un-collapses it. Switching off Collapsing mode doesn't un-collapse nodes; it returns you to the usual Selection mode. In the picture you can see two collapsed nodes, with a tooltip showing the collapsed part of the regular expression over one of them.
 
[[Image:qtype preg authortools12.jpg|collapsed tree]]
 
===Explaining graph===
The graph shows how the regular expression works. Its nodes are matched characters; its edges show paths through the nodes from the beginning to the end.
[[Image:qtype preg authortools4.png|alternatives and concatenation]]
 
Oval nodes represent individual characters, character sequences (so that the graph isn't extremely big) or single special character classes (which are drawn with a different line colour). Complex character classes are shown as rectangles. Simple assertions are checked between nodes, so they are written on the edges.
 
[[Image:qtype preg authortools5.jpg|graph for regex ^\dabc[!,0-9]$]]
 
Dotted rectangles show the repeated parts of your expression.
 
[[Image:qtype preg authortools6.jpg|graph for regex \d*]]
 
Solid-line rectangles show subpatterns. When the expression is matched, it remembers which part of the string matched each subpattern; you can insert it in the feedback or use it in a backreference in the expression. If you do not need to remember a subpattern, use (?:  ) instead of (  ) parentheses; that speeds up matching.
 
[[Image:qtype preg authortools71.png|graph for regex (?:(abc)|de)f ]]
 
The green rectangle shows the selected part of the expression. With "Rectangle selection mode" switched on, you can select part of the graph using a rubber-band rectangle and see the corresponding part of the regex selected in all tools (including the regular expression text).
 
[[Image:qtype preg authortools11.png|selection in the tree and graph ]]
 
===Description===
The Description tool tries to formulate a sentence describing how the expression is supposed to work. The selected part of the expression is shown with a yellow background.
 
===Testing tool===
You can enter a set of strings here, one per line. These strings will be matched against your expression. You'll see coloured strings showing which parts of your strings matched the expression, so you can test whether it works as you expected. You will also see green check marks for the strings that match the entire regular expression (and will be graded for that regex) and red crosses for the strings that don't give a full match. The PHP preg matcher can't show partial matches, so it shows only full matches or nothing (so as not to mislead you into thinking the entire string is wrong).
 
If you selected a part of the regex, you will be able to see which parts of the strings match that part (usually in yellow, but that may depend on your theme). The FA matcher shows this for any part of the regex; the PHP preg matcher only for capturing subpatterns.
 
The test strings are saved in the database when you save the regex and then (later) the question; they are lost if you close the window with the "Cancel" button.
 
==Understanding regular expressions==
 
===Understanding expressions in general===
Regular expressions - like any '''expressions''' - are just a bunch of '''operators''' with their '''operands'''. Don't worry - you all mastered arithmetic expressions in childhood, and regular ones are just as easy if you look at them from the right angle. Learn (or recall) only 4 new words and you will master regexes with very wide possibilities. Let's go!
 
Look at a simple math expression: '''x+y*2'''. There are two '''operators''': '+' and '*'. The '''operands''' of '*' are 'y' and '2'. The '''operands''' of '+' are 'x' and the result of 'y*2'. Easy?
 
Thinking about that expression more deeply, we find that there is a definite '''order of evaluation''', governed by the operators' '''precedence'''. The '*' has precedence over '+', so it is evaluated first. You can change the evaluation order by using parentheses: '''(x+y)*2''' will evaluate '+' first and multiply the result by 2. Still easy?
 
One more thing we should learn about operators is their '''arity''' - this is just the number of operands required. In the example above '+' and '*' are '''binary''' operators - they both take two operands. Most arithmetic operators are binary, but minus also has a '''unary''' (single operand) form, as in this equation: '''y=-x'''. Note that the unary and binary minuses work differently.
 
Now any expression is just a Lego game, where you set a sequence of '''operators''' with the correct number of '''operands''' for each (arity), taking heed of their evaluation order by using their '''precedence''' and parentheses. Arithmetic expressions are for evaluating numbers. Regular expressions are for finding patterns in strings, so they naturally use other operands and operators - but they are governed by the same rules of precedence and arity.
 
===Regular expressions===
Regular expressions are a powerful mechanism for searching in strings using patterns. Their '''operands''' are characters, or sets of characters allowed in a particular position: '''A''' is a regular expression that matches the single character 'A'. The '''operators''' in regular expressions define ways to combine individual characters in the pattern: sequence (the ''concatenation'' operator), alternative, and repetition (called a ''quantifier''). Concatenation is such a simple operator that it has no character at all - just write some characters in sequence and they are concatenated. But it still has a precedence, which determines whether you repeat a single character or a sequence of them. Alternative is written as a vertical bar. There are many forms of quantifiers - the most commonly used are the question mark (repeat zero or one times), the asterisk (zero or more times) and the plus (one or more times). You can also specify the minimum and maximum number of repeats in curly braces - this is a quantifier too.
 
The special characters that define operators should be '''escaped''' when used as operands - preceded by a backslash.  Mathematical expressions never have escaping problems since their operands (numbers, variables) are constructed from different characters than operators (+,- etc), but when constructing a pattern for matching you should be able to use ''any'' character as an operand.
 
Character classes allow you to specify several possible characters for one position. They can be defined in many ways: by enumerating characters in square brackets '''[as3]''', by ranges in square brackets '''[a-z]''', or by special sequences ('''\d''' means any digit, '''\W''' anything except a letter, digit or underscore, '''<nowiki>[[:alpha:]]</nowiki>''' any letter etc). An important type of operand is the ''simple assertion'': it lets you test a condition - the start of the string '''^''', the end of the string '''$''' or a word boundary '''\b'''.
 
You can find a list and more examples of operands and operators in the [[#Regular expressions reference|reference]] section.
 
===Precedence and order of evaluation===
A '''quantifier''' has precedence '''over concatenation''' and '''concatenation''' has precedence '''over alternative'''. Let's look at what that means:
# ''quantifiers over concatenation'' means that quantifiers are executed first and will repeat only a single character if used without parentheses:
#* "many times*" matches "manytime" followed by zero or more "s";
#* "(many times)*" matches "many times" zero or more times - changing the previous regex by using parentheses allows us define a string repetition; 
# ''concatenation over alternative'' means that you can define multi-character alternatives without parentheses (for single character alternatives it's better to use character classes, not the alternative operator):
#* "first|second|third" matches "first" or "second" or "third";
#* "(first |second |)part" matches "first part" or "second part" or just "part" - typical use of an empty alternative (note that space is in alternative to not require it before just "part");
# ''quantifier over alternative'' means that you should use parentheses to repeat an alternative set:
#* "first|second*" matches "first" or "secon" followed by zero or more "d" like "secondddddd";
#* "(first|second)*" matches "first" or "second", repeated zero or more time in any order, like "firstsecondfirstfirst". Note that quantifiers repeat the whole alternative, not a definite selection from it, i.e.:
#* "(1|2){2}" matches "11" or "12" or "21" or "22", not just "11" or "22";
#* "1{2}|2{2}" matches "11" or "22" only.
 
The internal structure of a regular expression is best viewed on the [[#Syntax tree|syntax tree]] (authoring tool). The operators that are executed first are placed lower in the tree (or to the right in the horizontal view); the operator that is executed last is the root of the tree. If this section doesn't seem clear, compare the trees and explaining graphs for the examples above in the authoring tools. Remember that "executing" a regular expression operator means linking its operands in the string: sequentially, as alternatives, or by repetition.
 
===Anchoring===
Anchoring is used to set restrictions on the matching process by using simple assertions:
* if a regular expression starts with the '''^''' the match should start at the start of the student's response;
* if a regular expression ends with the '''$''' the match should end at the end of the student's response;
* otherwise a regex match can be found anywhere inside a student's response.
 
Note that simple assertions are concatenated with the rest of the regex, and concatenation has precedence over alternative; this makes their usage slightly tricky:
* "^start|end$" will match "start" from the start of the string or "end" at the end of it;
* "^(start|end)$" using brackets to match exactly with "start" or "end";
* "^start$|^end$" is another way to get exact match (all top-level alternatives are anchored).
 
If you set the '''exact matching''' option to "yes" (which is the default value), the question will add ^ and $ to each regular expression for you (this does not affect subpattern usage). However, you may prefer to use some non-anchored regexes to catch common errors and give feedback, and use manually anchored expressions for grading.
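The anchoring examples can be reproduced with Python's `re.search`, which, like an unanchored Preg regex, finds a match anywhere in the response. The helper `exact` is a hypothetical illustration of one way to get the effect of exact matching. It wraps the regex as ^(?:...)$ so that all top-level alternatives are anchored. The non-capturing group leaves subpattern numbers unchanged. Whether Preg adds its anchors in exactly this form is an assumption.

```python
import re

def exact(regex: str) -> str:
    """Anchor every top-level alternative: one way to get the effect of exact matching."""
    return "^(?:" + regex + ")$"

print(bool(re.search(r"^start|end$", "starting line")))   # True: "start" at the start
print(bool(re.search(r"^start|end$", "the end")))          # True: "end" at the end
print(bool(re.search(r"^(start|end)$", "starting line")))  # False
print(bool(re.search(exact(r"start|end"), "end")))         # True
print(bool(re.search(exact(r"start|end"), "the end")))     # False
```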
 
==Regular expressions reference==
 
===Operands===
Here's an incomplete list of operands that define character sets.
# '''Simple characters''' (with no special meaning) match themselves.
# '''Escaped special characters''' match corresponding special characters. Escaping means preceding special characters by the backslash "\". For example, the regex "\|" matches the string "|", the regex "a\*b\[" matches the string "a*b[". Backslash is a special character too and should be escaped: "\\" matches "\".
#* the full list of characters that need escaping is '''\ ^ $ . [ ] | ( ) ? * + { }'''
#*'''NOTE!''' when you are ''unsure'' whether to escape some character, it is safe to place "\" before any character except letters and digits. ''Do not'' escape letters and digits unless you know what you are doing - they get special meaning when escaped and lose it when not.
#* If you have too many characters that need escaping in some fragment, you can use '''\Q ... \E''' sequence instead. Anything between \Q and \E is treated literally as characters:
#** "\Q^(abc)$\E." matches "^(abc)$" followed by any character - there are NO simple assertions and subpatterns;
#** "\Q^(abc)$." matches "^(abc)$." because there is no "\E" and all characters after "\Q" are treated as literals till the end of the regex.
# '''Dot meta-character''' (".") matches ''any'' possible character (except newline, but students can't enter it anywhere); escape it as "\." if you need to match a literal dot. It loses its special meaning inside a character class.
# '''Character classes''' match any character defined in them. Character classes are defined by square brackets. The particular ways to define a character class are:
#* "[ab,!]" matches "a", "b", "," or "!";
#* "[a-szC-F0-9]" contains ranges (defined by a ''hyphen between 2 characters'') "a-z", "C-F" and "0-9" mixed with the single character "z", it matches any character from "a" to "s", "z", from "C to "F" and from "0" to "9";
#* "[^a-z-]" starts with the "^" that means a '''negative character set''': it matches any character except from "a" to "z" and "-" (note that the second hyphen is not placed between 2 characters so defines itself);
#* "[\-\]\\]" contains ''escaping inside a character set'':  it matches "-", "]" and "\", other characters loose their special meaning inside a character set and can be be not escaped, but if you want to include "^" in a character set it shouldn't be first there;
# '''Escape sequences''' for common character sets (can be used both inside or outside character classes):
#* "\w" for any word character (letter, underscore or digit) and "\W" for any non-word character;
#* "\s" for any space character and "\S" for any non-space character;
#* "\d" for any digit and "\D" for any non-digit.
# '''Unicode properties''' are special escape sequences "\p{xx}" (positive) or "\P{xx}" (negative) for matching specific Unicode characters, which can be used both inside and outside character classes (the complete list of "xx" variants can be found at http://www.nusphere.com/kb/phpmanual/reference.pcre.pattern.syntax.htm):
#* "\p{Ll}" matches any lowercase letter;
#* "\P{Lu}" matches any non-uppercase letter.
# '''POSIX character classes''' are used for the same purpose as unicode properties (and complete list of them can be found on the Internet too), but may not work with non-ASCII characters. They are allowed only inside character classes:
#* <nowiki>"[[:alnum:]]"</nowiki> matches any alpha-numeric character;
#* <nowiki>"[[:^digit:]]"</nowiki> matches any non-digit chararcter.
# '''Simple assertions''' - they are not characters but conditions to test; unlike other operands, they ''don't consume'' characters while matching (they have this meaning only outside character classes):
#* "^" matches in the start of the string, fails otherwise;
#* "$" matches in the end of the string, fails otherwise;
#* "\b" matches on a word boundary, i.e. either between word (\w) and non-word (\W) characters, or in the start (end) of the string if it starts (ends) with a word character;
#* "\B" matches not on a word boundary, negative to "\b".
 
Still, a pattern that matches only one character isn't very useful. So here come the '''operators''' that allow us to define an expression that matches strings of several characters.
 
===Operators===
Here's a list of the common regex operators:
# '''Concatenation''' - such a simple ''binary'' operator that it doesn't require any special character. It is still an operator and has its precedence, which is important if you want to understand where to use parentheses. Concatenation allows you to write several operands in sequence:
#* "ab" matches "ab";
#* "a[0-9]" matches "a" followed by any digit, for example, "a5"
# '''Alternative''' - a ''binary'' operator that lets you define a set of alternatives:
#* "a|b" matches "a" or "b";
#* "ab|cd|de" matches "ab" or "cd" or "de";
#* "ab|cd|" matches "ab" or "cd" or ''emptiness'' (useful as a part in more complex expressions);
#* "(aa|bb)c" matches "aac" or "bbc" - using parentheses to outline alternative set;
#* "(aa|bb|)c" matches "aac" or "bbc" or "c" - typical usage of the emptiness;
# '''Quantifiers''' - ''unary'' operators that let you define repetition of their operand:
#* "x*" matches "x" zero or more times;
#* "x+" matches "x" one or more times;
#* "x?" matches "x" zero or one times;
#* "x{2,4}" matches "x" from 2 to 4 times;
#* "x{2,}" matches "x" two or more times;
#* "x{,2}" matches "x" from 0 to 2 times;
#* "x{2}" matches "x" exactly 2 times;
#* "(ab)*" matches "ab" zero or more times, i.e. if you want to use a quantifier on more than one character, you should use parentheses;
#* "(a|b){2}" matches "aa" or "ab" or "ba" or "bb", i.e. it is a repeated alternative, not a repetition of "a" or "b".
 
===Subpatterns and backreferences===
'''Subpatterns''' are '''operators''' that ''remember'' substrings captured by the regex. The simplest way to define a subpattern is to use parentheses: the regex "a(bc)d" contains a subpattern "bc". Subpatterns are numbered starting from 0 for the whole regex and are counted by their opening parentheses, so "(bc)" is the 1st subpattern. If we write, say, "a(b(c)(d))e", there are subpatterns "bcd" which is 1st, "c" which is 2nd and "d" which is 3rd.
Subpatterns are usually used with '''backreferences''', which also have numbers. Backreferences are '''operands''' that match the same string that was matched by the subpattern with the same number. The simplest syntax for a backreference is a backslash followed by a number: "\1" means a backreference to the 1st subpattern. The regular expression "([ab])\1" matches the strings "aa" and "bb", but neither "ab" nor "ba", because the backreference must match the same character as the subpattern did.
Consider a little example: the declaration and initialization of an integer variable in the C programming language:
* "int ([_\w][_\w\d]*); \1 = -?\d+;" matches, for example, "int _var; _var = -10;". Of course, there can be any number of spaces between "int", variable name etc, so a more correct regex will look like:
* "\s*int\s+([_\w][_\w\d]*)\s*;\s*\1\s*=\s*-?\d+\s*;\s*" - this will match, say, "  int var2  ;    var2=123    ;  ". Looks a bit frightning, but it is easier to write this regex once than to try understand it after.
 
Finally, instead of just numbers, subpatterns and backreferences can have names via a little more complicated syntax:
# "(?<name1>...)" means a subpattern with name "name1";
# "(?'name2'...)" means a subpattern with name "name2";
# "(?P<name3>...)" means a subpattern with name "name3";
# "\k<name4>" means a backreference to the subpattern named "name4";
# "\k'name5'" means a backreference to the subpattern named "name5";
# "\g{name6}" means a backreference to the subpattern named "name6";
# "\k{name7}" means a backreference to the subpattern named "name7";
# "(?P=name8)" means a backreference to the subpattern named "name8".
This is very useful when you work with complicated regexes and often modify them by adding or removing subpatterns - names stay the same.
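Named subpatterns and backreferences can be tried in Python's `re`, which uses the (?P<name>...) and (?P=name) forms from the list above. The example finds an accidentally doubled word. The pattern and the helper `doubled_word` are illustrative only.

```python
import re

# Python's re accepts the (?P<name>...) subpattern and (?P=name) backreference forms.
DOUBLED_WORD = r"\b(?P<word>\w+)\s+(?P=word)\b"

def doubled_word(text: str):
    """Return the first word that is immediately repeated, or None."""
    m = re.search(DOUBLED_WORD, text)
    return m.group("word") if m else None

print(doubled_word("this is is a test"))  # is
print(doubled_word("this is a test"))     # None
```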
 
====Duplicate subpattern numbers and names====
There is a useful syntax for combining subpatterns with alternation. If you create a group "(?|...)", then every alternative inside that group uses the same subpattern numbering. Consider the regex "(?|(a(b))|(c(d)))" - there are 2 alternatives with 2 subpatterns in each. Subpatterns "ab" and "cd" are both the 1st ones, "b" and "d" are both the 2nd ones.
 
===Subpattern calls===
Another way to use a subpattern is to call it. When hitting a subpattern call, the matching engine goes to the beginning of the target subpattern, and then starts to match it over again, until its end (not the end of the whole regex). If a subpattern call is placed outside the subpattern it refers to, it is almost equivalent to copy-pasting the subpattern, except its number (it will stay the same).
 
The most common usage of subpattern calls is matching a string in parentheses, allowing for unlimited nested parentheses. Without recursive subpattern calls, it is impossible to handle an arbitrary nesting depth. Note that a regex for arbitrarily nested parentheses is quite complex, and heavy use of such constructs can make regexes quite obscure. The Preg question type provides several [[Preg_question_type#Templates|templates]] to make regexes with parentheses more readable and easier to write.
 
The syntax of a subpattern call is:
* (?R) recursive call of the whole pattern
* (?n) call subpattern by absolute number
* (?+n) call subpattern by relative number
* (?-n) call subpattern by relative number
* (?&name) call subpattern by name
* (?P>name) call subpattern by name
* \g<name> call subpattern by name
* \g'name' call subpattern by name
* \g<n> call subpattern by absolute number
* \g'n' call subpattern by absolute number
 
The first one is explicitly recursive. The rest of the variants cause recursion if placed inside the subpattern they refer to, for example: "a(b(?1)?c)d" contains recursion, "a(bc)(?1)d" does not.
 
When using the finite state automata engine, subpattern calls behave slightly differently from PCRE in that the called subpatterns are NOT treated as atomic groups. Generally the behaviour implemented in the Preg question type is more intuitive and helpful. Please take a look at the PCRE docs for more information.
 
===Conditional subpatterns===
Conditional subpatterns allow you to write "if-then-else"-like constructions. A conditional subpattern consists of a condition, a positive branch and an optional negative branch. If there is no explicit negative branch, it is implied to be empty, like (?:).
The general syntax is "(?(condition)yes-pattern)" or "(?(condition)yes-pattern|no-pattern)".
 
The more specific options are:
* (?(n)... absolute reference condition - is the n'th subpattern captured?
* (?(+n)... relative reference condition
* (?(-n)... relative reference condition
* (?(<name>)... named reference condition - is the subpattern with the given name captured?
* (?('name')... named reference condition
* (?(name)... named reference condition
* (?(R)... overall recursion condition - if there is no subpattern named 'R', the condition is true if a recursive call to the whole pattern or any subpattern has been made
* (?(Rn)... specific group recursion condition - the condition is true if the most recent recursion is into the n'th subpattern
* (?(R&name)... specific recursion condition - the condition is true if the most recent recursion is into the subpattern with the given name
* (?(DEFINE)... define subpatterns that are only called from elsewhere - the condition is always false, so the group itself matches nothing
* (?(assert)... complex assertion condition - the condition is true if the assertion (positive/negative lookahead/lookbehind) matches
 
At "top level", all these recursion test conditions are false. The last type of condition, the complex assertion condition, is not yet supported by the FA engine.


'''DFA''' engine uses a custom '''regular expression parser''', so it supports advanced error reporting. There are several classes of potential errors reported:
* unclosed square brackets of a character class;
* unclosed opening parenthesis of any sort (different forms of subpatterns and assertions);
* unopened closing parenthesis;
* empty parentheses of any sort (different forms of subpatterns and assertions);
* quantifiers without an operand, i.e. at the start of a (sub)expression with nothing to repeat;
* three or more top-level alternatives in a conditional subpattern.
PCRE (and the preg functions) treat most of them as '''non-errors''', which makes the meaning of many characters context-dependent. For example, the quantifier {2,4} placed at the start of a regular expression loses its meaning as a quantifier and is treated as a five-character sequence instead (that matches the text "{2,4}"). However, such syntax is very error-prone and makes writing regular expressions harder.


For now I vote for reporting errors instead of treating them as literals, even if it means incompatibility with PCRE/preg. If you are for or against this decision, please write your position and reasons in the page comments. It may be best to have two modes, but this literally means two parsers, and that is outside the current scope of development. There are more pressing issues ahead.
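To compare with the error classes reported by Preg's own parser, the sketch below feeds several of the same faulty patterns to Python's standard `re.compile`. This is not Preg's engine. Python takes the strict view too: it rejects each of these patterns with an `re.error` carrying a message and a position, including a leading "{2,4}", which PCRE would accept as literal text. The helper name `report` is hypothetical.

```python
import re

FAULTY = [
    "abc)",         # unopened closing parenthesis
    "(?:qwerty",    # unclosed opening parenthesis
    "+",            # quantifier with nothing to repeat
    "{2,4}",        # same, where PCRE would match the literal text "{2,4}"
    "[a-fA-F\\d",   # unclosed character class
    "[z-a]",        # reversed character range
    "a{5,3}",       # quantifier minimum greater than maximum
    "ab\\",         # backslash at the end of the pattern
    "(abc)\\2",     # reference to a nonexistent subpattern
    "(?#comment",   # comment without a closing parenthesis
]


def report(pattern: str) -> str:
    """Return a one-line diagnostic for pattern, or "ok" if it compiles."""
    try:
        re.compile(pattern)
    except re.error as err:
        return f"{pattern!r}: {err.msg} at position {err.pos}"
    return "ok"


for p in FAULTY:
    print(report(p))
```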
===Complex assertions===
Assertions check some part of the string without including it in the matched text, but they affect where a match can occur:
* '''positive lookahead assertion''' "a+(?=b)" matches one or more "a" followed by "b", without including "b" in the match;
* '''negative lookahead assertion''' "a+(?!b)" matches one or more "a" not followed by "b";
* '''positive lookbehind assertion''' "(?<=b)a+" matches one or more "a" preceded by "b";
* '''negative lookbehind assertion''' "(?<!b)a+" matches one or more "a" not preceded by "b".
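The four assertion kinds can be tried out with Python's standard `re`, which uses the same syntax. This is only an illustration, not Preg's engine. Each call below returns every match in a short test string.

```python
import re

print(re.findall(r"a+(?=b)", "aab aac"))   # ['aa']
print(re.findall(r"a+(?!b)", "aab aac"))   # ['a', 'aa']
print(re.findall(r"(?<=b)a+", "baa caa"))  # ['aa']
print(re.findall(r"(?<!b)a+", "baa caa"))  # ['a', 'aa']
```

The single "a" results come from backtracking and the scanning position, not from the assertion text itself. In "aab", "a+" first takes "aa", the negative lookahead sees "b" and fails, so the engine gives back one "a" and the assertion then sees "a". In "baa", the second "a" is not directly preceded by "b", so the negative lookbehind accepts it.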


===Looking for missing things===
Joseph Rezeau's REGEXP question type has a special syntax for its '''missing words''' feature, which lets you define an answer that works when something is absent from the response (and gives appropriate feedback to the student). So if you want to look out for the missing word '''necessary''' in the response, you would add this answer (WARNING - REGEXP-only syntax on the next line):
  --.*\bnecessary\b.*
where \b denotes a word boundary, while .* ensures that the word could be anywhere in the response.


There is no need for such a feature in the Preg question type, since a similar effect can be achieved with '''negative assertions''' combined with anchoring the start of the match. The equivalent regular expression to look for the missing word '''necessary''' would be
  ^(?!.*\bnecessary\b.*)
where
* '''(?!.*\bnecessary\b.*)''' is a '''negative lookahead assertion''' that allows matching only if the word '''necessary''' does not occur ahead of a given point in the string;
* '''^''' is an assertion too, which anchors the match to the start of the response (otherwise there would be places in the response after the word "necessary" where matching is possible even if the word is present).


If this description is hard to follow, just surround the regex that should be missing with '''^(?!''' and ''')'''.
===Local case-sensitivity modifiers===
Starting from Preg 2.1, you can set case-(in)sensitivity for parts of your regular expressions by using the standard syntax of Perl-compatible regular expressions:
* "(?i)" will turn case-sensitivity off;
* "(?-i)" will turn case-sensitivity on.
This overrides the general case-sensitivity setting, which is chosen at the question level. So you can make some answers case-sensitive and some not, or even do this for parts of answers. For example, you can set the question to "use case" and have a 50% answer starting with "(?i)", which gives a lower grade when the case doesn't match but everything else is correct.


When placed in parentheses, local modifiers work up to the closest ")". When placed at the top level (not inside parentheses), they work up to the end of the expression. For example, with case sensitivity on for the question:
* "abc(de(?i)'''gh''')xyz" will have the bold part case-insensitive;
* "abc(de)(?i)'''ghxyz'''" will have the bold part case-insensitive.
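The two local case-sensitivity examples ("abc(de(?i)gh)xyz" and "abc(de)(?i)ghxyz") can be approximated in Python's standard `re`, used here only for illustration. One difference: recent Python versions reject a bare "(?i)" in the middle of a pattern, so the sketch uses the scoped form "(?i:...)" to mark exactly the part that ignores case. The pattern names are hypothetical.

```python
import re

IN_GROUP = re.compile(r"abc(de(?i:gh))xyz")  # like "abc(de(?i)gh)xyz" in Preg
TO_END = re.compile(r"abc(de)(?i:ghxyz)")    # like "abc(de)(?i)ghxyz" in Preg

print(bool(IN_GROUP.fullmatch("abcdeGHxyz")))  # True: only "gh" ignores case
print(bool(IN_GROUP.fullmatch("abcdeghXYZ")))  # False: "xyz" is case-sensitive
print(bool(TO_END.fullmatch("abcdeGHXYZ")))    # True
print(bool(TO_END.fullmatch("ABCdeghxyz")))    # False: "abc" is case-sensitive
```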
===Error reporting===
Native PHP preg extension functions only report whether or not there is an error in the regular expression, so the '''PHP preg extension''' engine can't tell you much about the error.


'''FA''' engine uses a custom '''regular expression parser''', so it supports advanced error reporting. There are several classes of potential errors:
* more than two top-level alternatives in a conditional subpattern "(?(?=f)first|second|third)";
* unopened closing parenthesis "abc)";
* unclosed opening parenthesis of any sort (subpatterns, assertions, etc) "(?:qwerty";
* quantifier without an operand, i.e. at the start of a (sub)expression with nothing to repeat "+" or "a(+)";
* unclosed brackets of character classes "[a-fA-F\d";
* setting and unsetting the same modifier at the same time "(?i-i)";
* unknown Unicode properties "\p{Squirrel}";
* unknown POSIX classes <nowiki>"[[:hamster:]]"</nowiki>;
* unknown (*...) sequence "(*QWERTY)";
* incorrect character set range "[z-a]";
* incorrect quantifier ranges "{5,3}";
* \ at the end of the pattern "ab\";
* \c at the end of the pattern "ab\c";
* invalid escape sequence;
* POSIX class outside of a character set "[:digit:]";
* reference to a nonexistent subpattern "(abc)\2";
* unknown, wrong or unsupported modifier "(?z)";
* missing ) after comment "(?#comment";
* missing conditional subpattern name ending;
* missing ) after (?C;
* missing subpattern name ending;
* missing backreference name ending;
* missing backreference name beginning;
* missing ) after control sequence;
* wrong conditional subpattern number, digits expected;
* assertion or condition expected "(?()a|b)";
* character code too big "\x{ffffffff}";
* character code disallowed "\x{d800}";
* invalid condition (?(0);
* too big number in (?C...) "(?C256)";
* two named subpatterns have the same name "(?<name>a)(?<name>b)";
* backreference to the whole expression "abc\g{0}";
* different subpattern names for subpatterns of the same number "(?|(?<name1>a)|(?<name2>b))";
* subpattern name expected "(?<>abc)";
* \c should be followed by an ASCII character "\cй";
* \L, \l, \N{name}, \U, and \u are unsupported;
* unrecognized character after (?<.


===Matching engines===
Matching engines are different pieces of program code that do the matching (either using different methods or written by different people). There is no single 'best' matching engine - it depends on the features you want to use and the regular expressions the engine should handle. They have different degrees of stability and offer different features.
====PHP preg extension====
It is based on the native PHP preg functions (which are in turn based on the PCRE library). It supports 100% of Perl-compatible regular expression features, and it is very stable and thoroughly tested. Sadly, the PHP functions don't support partial matching (while PCRE could), so (unless we storm the PHP developers to add support for partial matching) there is '''no hinting''' there. However, it supports subpattern capturing. Choose it when you need complex regex features other engines don't support, subpattern capturing or better performance.


====Deterministic finite state automata (DFA)====
This is custom PHP code using a DFA matching algorithm. It is heavily unit-tested, but considered beta quality for now. Not all preg operands and operators are supported, and support for some of the more exotic ones may still differ from the standard (especially for non-Latin characters). On the bright side, it supports '''hinting'''.


Currently supported operands (there will be more):
* single characters
* escaped special characters
* character classes, including ranges and negated classes
* escape sequences \w\W\s\S\t\d\D (locale-aware, but not Unicode-aware for performance reasons, as in the standard regular expression functions)
* octal and hexadecimal character codes preceded by \o and \x
* meta-character . (any character)


Currently supported operators (there will be more):
* concatenation
* alternation |
* quantifiers * + ? {2,3} {2,} {,2} {2}
* positive lookahead assertions
* changing operator precedence with ( ) (without subpattern capturing) or (?: )


Features that can't be supported by DFA matching at all:
* subpattern capturing
* backreferences


==The ways to give back==
This project is free software, so it's hard to get any feedback. You shouldn't expect to get software that ideally suits your needs without telling anyone about those needs, or offering encouragement or some easy support to the authors. Sometimes as little as writing where you work and how you use Preg question type (or what prevents you from using it) may help a lot.


This software is considered a scientific project, so the following things would be really useful and appreciated:
* evidence that the results of our work (i.e. Preg question type) are really useful to people and have been used in a production environment;
* cooperative work researching its effectiveness for various applications - basically you need to write about how you use Preg and run a survey of your teachers and/or students about it - but it can include co-authoring a conference thesis or a journal article;
* cooperation in writing an article or help in publishing it in English-language journals (information and help with grants for further work are welcome too).


If you are considering any way of helping, do not hesitate to write to me about it and ask any questions about the details. You may receive individual help during such work too (for example, when doing cooperative research I may give you tips on how to improve your regexes, etc).


I am a high school teacher, researcher and programmer who must do a lot at my main paid job and doesn't have much free time to spend developing this question type. If you could help me in some ways, I may be able to spend more time and effort on it, though. Some examples:
* publishing a thesis or paper describing your usage of the Preg question type that I could reference would improve the rating of the project and my rating as a researcher/developer, so please publish and let me know the reference if you feel grateful for this software;
* if you would take on some more work and organise publishing a paper (or at least a thesis) with me as a co-author, that would '''help even more''' - please inform me immediately if you are considering this;
* if publishing is hard, you could just write to me about what your organisation is and how you use Preg - that will help, and I will be able to better determine what should be done next;
* join the testing efforts - there are many settings in the question, and regexes can be quite complex, so it's hard for the developers to do all the testing themselves.


==Development plans==
There is no definite schedule or order of development for these features - it depends on the available time and developers. Many features require complex code to achieve the results. If you want to help us with a specific feature, please contact the question type maintainer (Oleg Sychev) using http://moodle.org messaging.
* Update the DFA matching engine to support all operators the DFA algorithm can handle
* Templates editor, allowing users to create custom templates
* Improve Unicode support of custom matching engines
* Support for complex assertions
* Add automatic generation of the shortest possible correct answer in user-readable form
* Support for approximate matching to catch typos in answers
* Add generation of a 'description' for a regular expression to facilitate its editing
* Improve the set of authoring tools to make writing regular expressions easier
* Develop NFA and backtracking matching engines
* Add more languages for next lexem hinting
* Develop more help and examples for people who don't know much about regular expressions.


[[Category:Contributed code]]
[[es:Tipo de pregunta Preg]]

Latest revision as of 19:51, 10 February 2020


Note: Even though this plugin is currently (February 2020) listed as available for Moodle branches 2.3 to 3.1 only, the plugin Stats tab shows that half of the more than 30 sites that have this plugin installed are running Moodle branches 3.2 to 3.8. These officially unsupported branches can have a working regex plugin, but they will not have the 'Authoring tools' available. The plugin developers are working on updating this plugin and fixing the JavaScript problem that causes this issue. If you enable DEVELOPER level debugging, you may see some warnings that do not represent any danger to your server. They should be fixed in future releases.


Preg is a question type that uses regular expressions (regexes) to check students' responses (though you can use it without regexes for its hinting features). Regular expressions give vast capabilities and flexibility both to teachers when making questions and to students when writing answers to them. The first section should guide you in using these docs; please use it with discretion. More details about regex syntax can be found at http://www.nusphere.com/kb/phpmanual/reference.pcre.pattern.syntax.htm. There are many good regex manuals, so I'm not going to repeat them here.

Authors:

  1. Idea, design, question type and behaviours code, hinting, error reporting, regular expression testing (authoring tool) - Oleg Sychev.
  2. Regex parsing, FA regex matching engine, matchers testing, backup&restore, unicode support, templates - Valeriy Streltsov.
  3. Assertions support for FA matcher - Elena Lepilkina.
  4. Explaining graph (authoring tool) - Vladimir Ivanov.
  5. Syntax tree (authoring tool) - Grigory Terekhov.
  6. Regex description (authoring tool) - looking for maintainer.
  7. Assertions support - Elena Lepilkina.

We would gladly accept testers and contributors (see the development plans section) - there is still more work to be done than we have time for. Thanks to:

  • Joseph Rezeau - for being a devoted tester of Preg question type releases and the original author of many ideas that have been implemented in Preg question type;
  • Tim Hunt - for his polite and useful answers and comments that helped in writing this question type, and also for joint work on the extra_question_fields and extra_answer_fields code, which is useful to many question type developers;
  • Bondarenko Vitaly - for conversion of a vast set of regular expression matching tests;
  • Dmitriy Pahomov - for being the first author of Regex description (authoring tool).

You, too, could always help us a lot - regardless of the way you use Preg and your capabilities.


Ways to use Preg questions and this docs

I don't (want to) know anything about regular expressions but next word (character) hinting seems useful

Then you can use Preg question type just as Shortanswer with advanced hinting, without any knowledge about regular expressions. To do this, you need to choose

  • Notation => Moodle shortanswer
  • Engine => Finite state automata
  • Exact matching => Yes

After that, you can just copy answers from your shortanswer questions. You may want to read the section about hinting to understand more about the hinting settings.

I have a vague knowledge of regular expressions, but want to use pattern matching

If writing regular expressions is hard for you, but you want to use their strength as patterns, the authoring tools may help you a lot in creating your questions. The tools show you the meaning of your regex in different ways: the internal structure of the expression (syntax tree), the visual path of matching (explaining graph) and a text description. They also allow you to test your regex against several strings and see if it works as expected. Experiment and play with your regexes, watch the corresponding changes in the authoring tools, and eventually you'll get the regex you want.

Read the section on authoring tools, then (probably after some experimenting with the tools on your own) the start of the section about understanding regular expressions (this is optional, but may be interesting and help a lot). You should also read the section about how the question works, to better understand the various settings and how they affect your questions.

I can make some effort to learn regular expressions well and be able to do anything they allow

Well, you don't know regexes yet, but you want to understand them and create complex expressions easily. Then, instead of blind trial and error, you had better spend some time and effort carefully reading the section about understanding regular expressions. Then read a little about the authoring tools and use them to experiment with creating regexes. With these tools you can check whether you really understand regexes well and whether your expressions behave as expected. The syntax tree may be especially useful when you try to get precedence and arity right. After you understand the principles of regexes well, read the sections about how the question works and the regular expression reference (to know your possibilities; don't bother to understand or remember it all - just look there periodically for something new to learn). Now you should be able to write regexes without much use of the authoring tools, except the testing tool to test your expressions.

I know regular expressions well enough to write them on my own without further guidance

You should read about question working to understand various settings and question behaviour under them. You also may be interested in regex testing in the authoring tools section. Finally, regular expression reference may be of some use for you.


How Preg questions work

Basically, this question type is an extended version of Shortanswer. It extends its features in several different ways (you could use them in almost any combination):

  • Pattern matching - using regular expressions you can create powerful patterns describing possible students answers
  • Hinting - when students are stuck doing the question, you may allow them to ask for a next correct word (lexem) or a character (with possible penalty)

Settings affecting question work

Case sensitivity sets the case sensitivity for all regular expressions you specify as answers. Note that you can also set the case sensitivity for parts of a regex.

Exact matching affects the question in the following way:

Yes
the student's entire response, from the first to the last character, must match your regular expression
No
student's response can just contain a part that matches your regex: for example, if the correct answer is "whole" then "the whole string" will be a correct student response

You still can set some of your regexes to match the whole student's response using special regex syntax.
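In Python's standard `re`, which is used here only as an analogy for Preg's engines, the two "Exact matching" values correspond roughly to `re.fullmatch` (Yes) and `re.search` (No). The function name `response_matches` is hypothetical.

```python
import re


def response_matches(regex: str, response: str, exact_matching: bool) -> bool:
    """Model of the "Exact matching" setting using Python's re (an approximation)."""
    if exact_matching:
        return re.fullmatch(regex, response) is not None  # whole response must match
    return re.search(regex, response) is not None         # any part may match


print(response_matches("whole", "the whole string", exact_matching=False))  # True
print(response_matches("whole", "the whole string", exact_matching=True))   # False
```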

Notations specify the "language" of your answers.

Regular expression
the usual notation for regular expressions; precisely, it is the Perl-compatible regex dialect. You may write a regex on multiple lines for readability - line breaks will be ignored.
Regular expression (extended)
useful for really complex regexes. It is similar to the PHP 'x' modifier: it ignores any unescaped whitespace in your regexes that is not inside a character class (use \s instead), so you may freely format your regexes with spaces. It also ignores line breaks, with one useful exception: everything from the '#' character until the end of the line is treated as a comment (the # should not be escaped and should not be inside a character class).
Moodle shortanswer
use it to avoid regex syntax altogether: just copy answers from your shortanswer questions. The '*' wildcard is supported. By choosing the FA engine you get access to the hinting features. You can skip everything said about regexes here, but be sure to read the hinting section to understand the various settings you can alter to configure your question's hinting behaviour.
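Python's `re.VERBOSE` flag behaves much like the extended notation. It ignores unescaped whitespace outside character classes and treats "#" as the start of a comment that runs to the end of the line. As an illustration only, the decimal-number regex used in the subpattern capturing section could be formatted like this:

```python
import re

NUMBER = re.compile(r"""
    [+\-]?       # optional sign
    ([0-9]+)?    # subpattern 1: integral part (optional)
    \.           # decimal point
    ([0-9]+)     # subpattern 2: fractional part
""", re.VERBOSE)

print(NUMBER.fullmatch("-12.5").groups())  # ('12', '5')
print(NUMBER.fullmatch(".5").groups())     # (None, '5')
```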

Matching engine specifies the program module that performs the regex matching. There is no 'best' matching engine - it depends on the features you want to use. Engines have different stability and offer different features to use.

PHP preg extension
should be used when you don't need hinting and other engines reject your expressions as too difficult, or you encounter bugs in them. It is based on the native PHP preg_ functions. It supports 100% of Perl-compatible regex features, and it is very stable and thoroughly tested. But it doesn't support partial matching, so (unless we storm the PHP developers to add support for partial matching) there is no hinting. However, it supports subpattern capturing. Choose it when you need complex regex features that other engines don't support.
Finite state automata (FA)
can be used to perform hinting for your students. The FA engine is custom PHP code; it supports many (but not all) regex features and is thoroughly tested (it passes all tests from the AT&T testregex suite and most tests from the PCRE testinput1 and testinput4 suites for the features it supports, which is a lot), but it may still contain bugs in rare cases. Features unsupported for now are lookaround assertions and some types of conditional subpatterns.

Hinting

Hinting is supported by the FA engine in adaptive and interactive behaviours.

Partial matching

Hinting starts with partial matching. A partially correct response is a string that starts with correct characters (matching your regex), but the match breaks at some character. Assume you entered the regex

 "are blue, white(,| and) red"

and a student answered

 "they are blue, vhite and red"

In this situation the partial match is

 "are blue, "

Note that the regex is unanchored ("Exact matching" is set to "No"), so the match may not start with the first character of the student's response (as in the example above, where "they " is skipped). With partial matching alone, the student will see the correct and incorrect parts:

 they are blue, vhite and red

General hinting rules

Preg question type doesn't add hinted characters to the student's response (unlike the REGEXP question type); it shows them separately instead, for a number of reasons:

  1. It is the student's decision whether to add the hinted character (and possibly some more) to the response.
  2. It slightly encourages thinking about the hint: if the response were modified automatically, it would be too easy to press the hint button repeatedly, which is not usually desirable behaviour.

When possible, hinting chooses a character that leads to the shortest path to complete the match. Consider this response to the previous regular expression:

 are blue, white; red

There are two possible hint characters: "," or " " (leading to the " and" path). The question will choose "," because it leads to the shortest path to complete the match, while " " leads to a path 3 characters longer.
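The toy model below illustrates partial matching and next-character hinting. It assumes that the regex "are blue, white(,| and) red" has been expanded by hand into the finite list of strings it accepts. Preg's FA engine does not work this way: it walks a finite automaton, which also handles regexes that accept infinitely many strings. The model only illustrates the rules described above: unanchored start, longest partial match, and shortest completion on a tie. The names are hypothetical.

```python
COLOUR_STRINGS = ["are blue, white, red", "are blue, white and red"]


def partial_match(response: str, answers: list[str]) -> tuple[int, int, str]:
    """Return (start, length, answer) for the best partial match.

    The match is unanchored ("Exact matching" = No), so it may skip a prefix
    of the response such as "they ". The longest match wins; on a tie, the
    answer with the shortest remaining completion wins.
    """
    best = (0, 0, min(answers, key=len))
    for start in range(len(response) + 1):
        for answer in answers:
            n = 0
            while (n < len(answer) and start + n < len(response)
                   and response[start + n] == answer[n]):
                n += 1
            remaining, best_remaining = len(answer) - n, len(best[2]) - best[1]
            if n > best[1] or (n == best[1] and remaining < best_remaining):
                best = (start, n, answer)
    return best


def next_char_hint(response: str, answers: list[str] = COLOUR_STRINGS) -> str:
    """The character that extends the best partial match ("" if complete)."""
    _, length, answer = partial_match(response, answers)
    return answer[length] if length < len(answer) else ""


print(partial_match("they are blue, vhite and red", COLOUR_STRINGS)[:2])  # (5, 10)
print(next_char_hint("they are blue, vhite and red"))  # w
print(next_char_hint("are blue, white; red"))          # ,
```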

Not all regular expressions have to give a 100% grade. Suppose you added an expression for students with bad memory:

 are white(,| and) red

with 60% grade and feedback about forgetting blue. You may not want hinting to lead student to the response

  are white, red

if he entered

  are white, oh I forgot the other colors.

The hint grade border controls this. Only regular expressions with a grade greater than or equal to the hint grade border will be used for partial matching and hinting. If you set the hint grade border to 1, only 100% grade regular expressions will be used for hinting; if you set it to 0,5, regular expressions with grades from 50% to 100% will be used for hinting and those with 0%-49% will not. Regular expressions not used for hinting work only when they fully match the student's response.
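The selection rule for the hint grade border can be sketched as a simple filter over (regex, grade) pairs. This is a hypothetical model that uses Python's `re` and assumes Exact matching = Yes. It is not Preg's grading code. Answers below the border still grade responses they match completely, but they never produce hints.

```python
import re

# Hypothetical answers for the colours question: (regex, grade).
GRADED_ANSWERS = [
    (r"are blue, white(,| and) red", 1.0),
    (r"are white(,| and) red", 0.6),  # "bad memory" answer with its own feedback
]


def hinting_regexes(answers, hint_grade_border):
    """Regexes whose grade reaches the border take part in partial matching and hinting."""
    return [regex for regex, grade in answers if grade >= hint_grade_border]


def best_full_match_grade(response, answers):
    """Any answer, hinting or not, can grade a response it matches completely."""
    grades = [grade for regex, grade in answers if re.fullmatch(regex, response)]
    return max(grades, default=0.0)


print(len(hinting_regexes(GRADED_ANSWERS, 1)))          # 1
print(len(hinting_regexes(GRADED_ANSWERS, 0.5)))        # 2
print(best_full_match_grade("are white, red", GRADED_ANSWERS))  # 0.6
```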

Next character hinting

When next character hinting is available, the student will have a hint next character button; pressing it shows one next correct character, highlighted by background colouring:

 they are blue, wvhite and red

You should typically set the hint penalty higher than the usual question penalty, because they are applied separately: the usual penalty for an attempt without hinting, and the hint penalty for an attempt with hinting.

Next lexem (word) hinting

Lexem means an atomic part of a language. For natural language a word, a number, a punctuation mark (or group of marks like '?!' or '...') are lexemes. For a programming language it can be a keyword, a variable name, a constant, an operator. Note that spaces are usually not considered to be lexems, but separators between them, since they don't have any particular meaning.

The next lexem hint will show the student either the completion of the current lexem (if the partial match ends inside it) or the next lexem (if the student has completed the current one), like

  are blue

or

  are blue,

or

  are blue, white

Since the 2.3 release, Preg question type allows next lexem hinting using the formal languages block. You should choose the language in which you expect a response to your question, since lexem borders are different in different languages. For now it supports these languages (there will be more):

simple english
the English language scanner recognizes words, numbers and punctuation marks;
C/C++ language
the C (or C++) programming language;
printf language
a special language for formatting strings in the C/C++ programming language; you will probably keep it disabled.

The site administrator can control which languages are available to teachers, to avoid confusion. See the settings of the "Formal languages" block in the plugin settings menu.

Note that "lexem" typically isn't the word you would like your students to see on the hinting button. Each language defines its own word for it. You can enter another word in the question description if you don't like the default ones.
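A rough sketch of next lexem hinting for a "simple english"-style language follows. It assumes a hypothetical scanner that treats numbers, words and runs of punctuation (such as "?!" or "...") as lexems and spaces as separators. Given how many characters of the correct answer the student already has, it returns the rest of the current lexem or the whole next one. Preg's real scanners in the formal languages block are more elaborate.

```python
import re

# Hypothetical lexem scanner: numbers, words, and runs of punctuation marks.
LEXEM = re.compile(r"\d+|\w+|[^\w\s]+")


def next_lexem_hint(answer: str, matched: int) -> str:
    """Return the hinted text, given that answer[:matched] is already correct.

    If the partial match ends inside a lexem, the rest of that lexem is
    returned; otherwise the whole next lexem is returned ("" if none remain).
    """
    for lexem in LEXEM.finditer(answer):
        if lexem.end() > matched:
            return answer[max(lexem.start(), matched):lexem.end()]
    return ""


ANSWER = "are blue, white and red"
print(next_lexem_hint(ANSWER, 6))  # ue    (student typed "are bl")
print(next_lexem_hint(ANSWER, 8))  # ,     (student typed "are blue")
print(next_lexem_hint(ANSWER, 9))  # white (student typed "are blue,")
```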

Subpattern capturing and feedback

Any pair of parentheses in a regex is considered a subpattern. When matching, the engine remembers (captures) not only the whole match but also its parts corresponding to all subpatterns. Subpatterns can be nested. If a subpattern is repeated (i.e. has a quantifier), only the last repetition will be captured. If you want to change the order of evaluation without defining a subpattern to capture (which will speed up processing), you should use (?: ) instead of just ( ). Lookaround assertions don't create subpatterns.

Subpatterns are counted from left to right by their opening parentheses. Precisely, 0 is the whole regex, 1 is the first subpattern, etc. You can insert them in an answer's feedback using simple placeholders: {$0} will be replaced by the whole match, {$1} by the first subpattern value, etc. That can improve the quality of your feedback. Placeholders won't work in the general feedback, because different answers can have different numbers of subpatterns.

Let's look at a regex defining a decimal number with optional integral part:

[+\-]?([0-9]+)?\.([0-9]+)

It has two subpatterns: first capturing integral part, second - fractional part of the number. If you wrote the feedback:

The number is: {$0} Integral part is {$1} and fractional part is {$2}

Then a student entered

123.34

He will see

The number is: 123.34 Integral part is 123 and fractional part is 34

If no integral part is given, {$1} will be replaced by an empty string. There is no way (for now) to omit "Integral part is" in that case - the placeholder syntax would become complex and prone to errors.
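The placeholder substitution can be sketched with Python's `re`, used only for illustration. The code assumes that an unmatched subpattern becomes an empty string (as described above) and that a placeholder for a nonexistent subpattern is left unchanged. The second rule is a guess, not documented Preg behaviour. The name `fill_feedback` is hypothetical.

```python
import re

NUMBER = re.compile(r"[+\-]?([0-9]+)?\.([0-9]+)")
FEEDBACK = "The number is: {$0} Integral part is {$1} and fractional part is {$2}"


def fill_feedback(feedback: str, match: re.Match) -> str:
    """Replace {$n} placeholders with the text captured by subpattern n."""
    def substitute(placeholder: re.Match) -> str:
        n = int(placeholder.group(1))
        if n > match.re.groups:
            return placeholder.group(0)  # no such subpattern: keep text (assumption)
        return match.group(n) or ""      # unmatched subpattern: empty string
    return re.sub(r"\{\$(\d+)\}", substitute, feedback)


print(fill_feedback(FEEDBACK, NUMBER.fullmatch("123.34")))
# The number is: 123.34 Integral part is 123 and fractional part is 34
```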

Looking for missing and misplaced things

Joseph Rezeau's REGEXP question type has a missing words feature, which lets you define an answer that works when something is absent from the response (and give appropriate feedback to the student).

A similar effect can be achieved with negative assertions combined with anchoring the start of the match. The regular expression to look for the missing word necessary would be

 ^(?!.*\bnecessary\b.*)

where

  • (?!.*\bnecessary\b.*) is a negative lookahead assertion that allows matching only if the word necessary does not occur ahead of a given point in the string;
  • ^ is an assertion too, which anchors the match to the start of the response (otherwise there would be places in the response after the word "necessary" where matching is possible even if the word is present).

If this description is hard to follow, just surround the regex that should be missing with ^(?! and ). Don't try the '--' syntax; it is specific to Joseph Rezeau's REGEXP question type!
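The missing-word regex can be checked with Python's standard `re`, used here only for illustration. Without `re.MULTILINE`, "^" matches only at position 0, so the whole pattern succeeds exactly when "necessary" does not occur as a separate word anywhere in a single-line response. The function name is hypothetical.

```python
import re

MISSING_NECESSARY = re.compile(r"^(?!.*\bnecessary\b.*)")


def word_is_missing(response: str) -> bool:
    return MISSING_NECESSARY.search(response) is not None


print(word_is_missing("Testing is needed"))         # True
print(word_is_missing("Testing is necessary."))     # False
print(word_is_missing("This step is unnecessary"))  # True: \b rejects "unnecessary"
```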

You can also roughly search for misplaced words (it will actually work only if everything else is correct) using syntax like this:

  (?<!I\s+)\bam\b(?!\s+victor)

This expression catches a misplaced "am" in the sentence "I am victor" by checking that "am" doesn't have "I" before it (the "(?<!I\s+)" part) and doesn't have "victor" after it (the "(?!\s+victor)" part). "\s+" allows any number of spaces between words. Note that PCRE, and therefore the PHP preg extension engine, does not allow unbounded repetition such as \s+ inside a lookbehind; with that engine, write "(?<!I\s)" instead. If you want to catch the first (last) word (punctuation mark, etc), you should use simple assertions for the start/end of the string ("^" or "$") instead of words in the related assertions. For instance, to look for a misplaced "I", you should write something like

  (?<!^)\bI\b(?!\s+am)

which looks for an "I" that is not preceded by the start of the string and not followed by "am".
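Both misplaced-word patterns can be tried in Python's standard `re`, used here only for illustration. Python's `re` requires fixed-width lookbehinds, so the sketch narrows "(?<!I\s+)" to "(?<!I\s)", which allows exactly one whitespace character between the words. A search result of `None` means nothing misplaced was found.

```python
import re

MISPLACED_AM = re.compile(r"(?<!I\s)\bam\b(?!\s+victor)")
MISPLACED_I = re.compile(r"(?<!^)\bI\b(?!\s+am)")

for sentence in ["I am victor", "am I victor"]:
    print(sentence, bool(MISPLACED_AM.search(sentence)), bool(MISPLACED_I.search(sentence)))
# I am victor False False
# am I victor True True
```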

Note that if you have several answers to catch missing and misplaced things, only one will actually work for any given student response.

Since the Preg 2.3 release, you can combine hints and catching missing words. But you should make sure that the answers that look for missing things (and others that give specific feedback) have a fraction (grade) lower than the hint grade border (see Hinting). You actually don't want to generate hints for these answers, as they don't define a correct situation, so this is not a problem but a feature.

Templates

Preg 2.8 introduces a new feature called Templates.

A template is a more convenient and semantic way to write frequently used patterns. A template is written as a special regular expression comment that is replaced with the corresponding regex before matching. Some templates can be parametrized; any regular expression can be used as a parameter value.

  • Simple template => (?###template_name)
  • Parametrized template => (?###template_name<)param1(?###,)param2(?###,)...paramN(?###>)

Templates can be used to make regexes shorter and easier to understand. For particularly complex regexes (usually ones with parentheses), you may consider using the extended notation, so that you can use line breaks and spaces to format the regex better.

For now, templates are hard-coded in the Preg question type, but there are plans to add support for custom user templates in future releases.

At the moment the following templates are available:

  • word (no parameters): one or more 'word' characters (letters, digits and underscore). Example: ((?###word)\s+)+ will match any number of words, each followed by one or more spaces.
  • integer (no parameters): an optional sign + or -, followed by one or more digits. Example: (?###integer) will match any integer value.
  • parens_req (parameter: the text you want to see in parentheses): something in at least one pair of correctly closed round parentheses. Example: (?###parens_req<)a(?###>) will match "(a)", "((a))" and "(((((a)))))".
  • parens_opt (parameter: the text you want to see in parentheses): something optionally placed in any number of pairs of correctly closed round parentheses. Example: (?###parens_opt<)a(?###>) will match "a", "(a)" and "(((a)))".
  • brackets_req (parameter: the text you want to see in brackets): something in at least one pair of correctly closed square brackets. Example: (?###brackets_req<)(?###word)(?###>) will match "[abc]" and "[[cat]]".
  • brackets_opt (parameter: the text you want to see in brackets): something optionally placed in any number of pairs of correctly closed square brackets. Example: (?###brackets_opt<)(?###word)(?###>) will match "cat", "[dog]" and "[[[Fido]]]".
  • custom_parens_req (parameters: 1. pattern for the opening parenthesis, 2. text inside the custom parentheses, 3. pattern for the closing parenthesis): similar to parens_req, but lets you specify custom parentheses (possibly longer than one character). Example: (?###custom_parens_req<)<(?###,)a(?###,)>(?###>) will match "<a>" and "<<<a>>>".
  • custom_parens_opt (parameters: the same as for custom_parens_req): similar to parens_opt, but lets you specify custom parentheses (possibly longer than one character). Example: (?###custom_parens_opt<)/\*(?###,)(?###word)(?###,)\*/(?###>) will match "/*something*/" and "/*/*/*word*/*/*/".

Templates can be used inside other templates as parameters. For example, you can write

 (?###parens_opt<)(?###word)(?###>)

It will match strings "a", "(a)", "(((((long_word_in_many_parens)))))" and so on.
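Conceptually, expanding a simple template is a textual substitution done before matching. The Python sketch below illustrates that idea for the two templates without parameters only. The replacement regexes (`\w+` and `[+-]?\d+`) are assumptions derived from the template descriptions above, not the code Preg actually uses, and the helper `expand_templates` is hypothetical; parametrized templates are left untouched.

```python
import re

# Assumed expansions for the two non-parametrized templates.
SIMPLE_TEMPLATES = {
    "word": r"\w+",
    "integer": r"[+-]?\d+",
}

# Matches "(?###name)" but not "(?###name<)" or "(?###,)".
TEMPLATE_CALL = re.compile(r"\(\?###(\w+)\)")

def expand_templates(regex):
    """Replace every (?###name) with a non-capturing group holding its regex."""
    def substitute(m):
        name = m.group(1)
        if name not in SIMPLE_TEMPLATES:
            raise ValueError("unknown template: " + name)
        # A function replacement is inserted literally, so backslashes survive.
        return "(?:" + SIMPLE_TEMPLATES[name] + ")"
    return TEMPLATE_CALL.sub(substitute, regex)

print(expand_templates(r"((?###word)\s+)+"))   # ((?:\w+)\s+)+
```

Wrapping each expansion in `(?:...)` keeps a following quantifier applied to the whole template and does not shift subpattern numbers.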

Authoring tools

Authoring tools are there to help you write, test and understand your regexes. For now they can show you the meaning of a written regex (and its parts) and test it. Authoring tools are activated by pressing the "edit" icon near the regex field.

authoring tools icon

There are four authoring tools available:

syntax tree
shows you the inner structure of regular expressions
explaining graph
shows you how your expression will work in a graphical way
description
formulates the meaning of your expression in English
testing tool
allows you to enter strings and see how they match your regex

Installation note and known technical issues

To have the syntax tree and explaining graph tools working, you (or your site admin) have to install Graphviz[1] on the server and fill in the 'pathtodot' setting of your Moodle installation at Site Administration > Server > System Paths. Graphviz is used to draw the pictures. Be sure to use Graphviz 2.36 or newer (earlier versions had a bug in SVG output which led to incorrect pictures).

The syntax tree and explaining graph may not work correctly in old Opera versions: for some reason the images are not updated on user actions. Fortunately, the newer Opera 16 for Windows works with the authoring tools pretty well. On Linux you will have to use another browser.

Regular expression area

Here you can edit your regular expression. Clicking on "Show" sends the regex to all tools - syntax tree, explaining graph, description and testing results will be updated. "Save" closes the authoring tools form and saves the regex and test strings in the main question editing form. "Cancel" closes the authoring tools form and discards all changes made there.

You can select part of the regular expression there, and the corresponding parts of the syntax tree, explaining graph, description and matched parts of the strings will be highlighted. It is possible to select a part of the regex text that doesn't correspond to a logically complete part of the regular expression. In that case your selection will be widened to the nearest logically complete part.

Matching options

Here you can change the options affecting matching: matching engine, regex notation, exact matching, and case sensitivity.

  • Matching engine changes the code performing the matching; you can use the Testing tool to see if it suits your needs.
  • Regular expression notation changes the way regexes are written; all tools will show you how the regex is interpreted in this notation.
  • Case sensitivity affects the basic case sensitivity of the expression; you can see the result in the explaining graph, where case-insensitive nodes are gray and case-sensitive ones are white.
  • Exact matching adds new parts to your regex to ensure that the entire student's response matches it. These added parts are shown on a gray background in the tools (see the picture below).

exact matching

Panning and zooming of pictures

The Syntax tree and Explaining graph tools generate pictures that can be too large to view at once, so starting from Preg 2.6 these tools let you pan and zoom them easily.

To pan the image, press the left mouse button on a free area (not on the nodes, since that selects them) and drag the mouse around without releasing the button. On the Explaining graph you should turn Rectangle selection mode off in order to pan, since in rectangle selection mode pressing the mouse button starts drawing a rectangle.

To zoom the image, use the mouse wheel while the mouse pointer is over the image.

Syntax tree

As was said above, regular expressions, like all expressions, are trees of operators and operands. The syntax tree shows the inner structure of the expression graphically: what is inside what. This is most useful if you know how to read regular expressions or are learning to do so.

If you don't understand the concepts of operators and precedence well, it may mean little to you. But it is still useful for finding out where you need parentheses: compare the trees for ab+ (a) and (ab)+ (b) in the picture below.

parenthesis in the structure of regex

The tree shows the names and numbers of all subpatterns, so you can check their numbering and the backreferences to them.

numbered and named subpatterns in tree

The part of the expression you selected is shown in a green rectangle. You can select nodes of the tree by clicking on them when the Collapsing mode check box is unchecked.

part of the tree is selected

Starting from Preg 2.6 the Syntax tree tool has a Collapsing mode, since a syntax tree can be quite large and you usually need only part of it. When Collapsing mode is on, clicking a tree node collapses all its child nodes into a single ellipsis node (see the image below). Clicking the collapsed node again un-collapses it. Switching off Collapsing mode doesn't un-collapse nodes; it returns you to the usual Selection mode. In the picture you can see two collapsed nodes, with a tooltip over one of them showing the collapsed part of the regular expression.

collapsed tree

Explaining graph

The graph shows how the regular expression works. Its nodes are matched characters; its edges show paths through the nodes from the beginning to the end.

alternatives and concatenation

Oval nodes represent individual characters, character sequences (so that the graph isn't extremely big) or single special character classes (in which case they change line colour). Complex character classes are shown as rectangles. Simple assertions are checked between nodes, so they are written on the edges.

graph for regex ^\dabc[!,0-9]$

Dotted rectangles show the repeated parts of your expression.

graph for regex \d*

Solid-line rectangles show subpatterns. When an expression is matched, it remembers which part of the string matched each subpattern. You can insert that part into the feedback or use it in a backreference in the expression. If you do not need to remember subpatterns, you may use (?: ) instead of ( ) parentheses, which will speed up matching.
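The difference between capturing and non-capturing parentheses can be seen in a small Python sketch; Python's `re` treats `( )` and `(?: )` the same way as PCRE in this respect. The regex and string are made-up examples.

```python
import re

capturing = re.fullmatch(r"a(b|c)(de)f", "acdef")
non_capturing = re.fullmatch(r"a(?:b|c)(?:de)f", "acdef")

# Both regexes match, but only ( ) remembers the matched parts.
print(capturing.groups())      # ('c', 'de')
print(non_capturing.groups())  # ()
```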


The green rectangle shows the selected part of the expression. When you switch on "Rectangle selection mode", you can select part of the graph using a rubber-band rectangle and see the corresponding part of the regex selected in all tools (including the regular expression text).

selection in the tree and graph

Description

The Description tool tries to formulate a sentence describing how the expression is supposed to work. The selected part of the expression is shown with a yellow background.

Testing tool

You can enter a set of strings there, one per line. These strings will be matched against your expression. You'll see coloured strings showing which parts of your strings matched the expression, so you can test whether it works as you expected. You will also see green check marks for the strings that match the entire regular expression (and will be graded for that regex) and red crosses for the strings that don't give a full match. The PHP preg matcher can't show partial matches, so it only shows full matches or nothing (so as not to mislead you into thinking that the entire string is wrong).

If you have selected a part of the regex, you will be able to see which part of each string matches that part (usually in yellow, but that may depend on your theme). The FA matcher shows this for any part of the regex, the PHP preg matcher only for capturing subpatterns.

The test strings will be saved in the database when you save the regex and, later, the question (they will be lost if you close the window with the "Cancel" button).
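The distinction behind the check marks, a full match versus a match found somewhere inside the string, can be sketched in Python. This is only an approximation with Python's `re`: it does not reproduce how the FA matcher computes partial matches or colours, and `run_tests` is a hypothetical helper name.

```python
import re

def run_tests(regex, strings):
    """Return (string, whole_match, found_part) for every test string.

    whole_match is True when the entire string matches the regex;
    found_part is the first substring the regex matches, or None.
    """
    pattern = re.compile(regex)
    results = []
    for s in strings:
        whole_match = pattern.fullmatch(s) is not None
        found = pattern.search(s)
        results.append((s, whole_match, found.group(0) if found else None))
    return results

for s, whole_match, part in run_tests(r"\d+ apples", ["3 apples", "I have 12 apples!", "apples"]):
    mark = "OK" if whole_match else "--"
    print(mark, repr(s), "matched part:", part)
```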

Understanding regular expressions

Understanding expressions in general

Regular expressions, like any expressions, are just a bunch of operators with their operands. Don't worry: you all learned to master arithmetic expressions in childhood, and regular ones are just as easy if you look at them from the right angle. Learn (or recall) only 4 new words and you are a master of regexes with very wide possibilities. Let's go!

Look at a simple math expression: x+y*2. There are two operators: '+' and '*'. The operands of '*' are 'y' and '2'. The operands of '+' are 'x' and the result of 'y*2'. Easy?

Thinking about that expression more deeply, we can find that there is a definite order of evaluation, governed by operator precedence. '*' has precedence over '+', so it is evaluated first. You can change the evaluation order by using parentheses: (x+y)*2 will evaluate '+' first and multiply the result by 2. Still easy?

One more thing we should learn about operators is their arity: this is just the number of operands they require. In the example above '+' and '*' are binary operators: they both take two operands. Most arithmetic operators are binary, but minus also has a unary (single operand) form, like in this equation: y=-x. Note that the unary and binary minuses work differently.

Now any expression is just a Lego game, where you set a sequence of operators with the correct number of operands for each (arity), taking heed of their evaluation order by using their precedence and parentheses. Arithmetic expressions are for evaluating numbers. Regular expressions are for finding patterns in strings, so they naturally use other operands and operators, but they are governed by the same rules of precedence and arity.
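The parallel between arithmetic and regex precedence can be checked in a few lines of Python. The values of x and y are arbitrary, and Python's `re` stands in for the regex engine; the quantifier "+" (one or more times) is explained in the next section.

```python
import re

x, y = 1, 3
# '*' binds tighter than '+'; parentheses change the order of evaluation.
assert x + y * 2 == 7
assert (x + y) * 2 == 8

# In the same way, a quantifier binds tighter than concatenation.
assert re.fullmatch(r"ab+", "abbb")        # "+" repeats only "b"
assert not re.fullmatch(r"ab+", "abab")
assert re.fullmatch(r"(ab)+", "abab")      # parentheses make "+" repeat "ab"
print("precedence works the same way in both worlds")
```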

Regular expressions

Regular expressions are a powerful mechanism for searching in strings using patterns. So their operands are characters, or sets of characters, that are allowed at a particular position: "A" is a regular expression that matches a single character 'A'. The operators in regular expressions define ways to combine individual characters in the pattern: sequence (the concatenation operator), alternative, and repetition (called a quantifier). Concatenation is such a simple operator that it doesn't have any character at all: just write some characters in sequence and they'll be concatenated. But it still has a precedence, which lets the question tell whether you wanted to repeat a single character or a sequence of them. Alternative is written as a vertical bar "|". There are many forms of quantifiers; the most commonly used are the question mark (repeat zero or one times), the asterisk (zero or more times) and the plus (one or more times). You may also specify the minimum and maximum number of repeats in curly braces; this is a quantifier too.

The special characters that define operators should be escaped, i.e. preceded by a backslash, when used as operands. Mathematical expressions never have escaping problems since their operands (numbers, variables) are constructed from different characters than operators (+, - etc.), but when constructing a pattern for matching you should be able to use any character as an operand.

Character classes allow you to specify several possible characters for one place. They can be defined in many different ways: by enumerating characters in square brackets [as3], by ranges in square brackets [a-z], or by special sequences (\d means any digit, \W anything except a letter, digit and underscore, [[:alpha:]] any letter, etc.). Simple assertions are an important type of operand: they allow you to test some conditions, such as the start of the string ^, the end of the string $ or a word boundary \b.

You can find a list and more examples of operands and operators in the reference section.

Precedence and order of evaluation

A quantifier has precedence over concatenation, and concatenation has precedence over alternative. Let's look at what this means:

  1. Quantifiers over concatenation means that quantifiers are executed first and will repeat only a single character if used without parentheses:
    • "many times*" matches "many time" followed by zero or more "s";
    • "(many times)*" matches "many times" zero or more times: adding parentheses to the previous regex lets us define a repeated string;
  2. Concatenation over alternative means that you can define multi-character alternatives without parentheses (for single-character alternatives it's better to use character classes, not the alternative operator):
    • "first|second|third" matches "first" or "second" or "third";
    • "(first |second |)part" matches "first part", "second part" or just "part": a typical use of an empty alternative (note that the space is inside the alternatives, so it isn't required before a bare "part");
  3. Quantifier over alternative means that you should use parentheses to repeat a set of alternatives:
    • "first|second*" matches "first" or "secon" followed by zero or more "d" like "secondddddd";
    • "(first|second)*" matches "first" or "second" repeated zero or more times in any order, like "firstsecondfirstfirst". Note that quantifiers repeat the whole alternative, not a definite selection from it, i.e.:
    • "(1|2){2}" matches "11" or "12" or "21" or "22", not just "11" or "22";
    • "1{2}|2{2}" matches "11" or "22" only.
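To experiment with these precedence rules outside Moodle, the examples can be checked with Python's `re` module, whose `fullmatch()` requires the whole string to match (roughly what Preg's exact matching option does). The non-matching strings are made-up counterexamples.

```python
import re

# (pattern, string that should fully match, string that should not)
EXAMPLES = [
    (r"many times*", "many timesss", "many timesmany times"),
    (r"(many times)*", "many timesmany times", "many timesss"),
    (r"(first |second |)part", "part", "first second part"),
    (r"first|second*", "seconddddd", "firstsecond"),
    (r"(first|second)*", "firstsecondfirstfirst", "seconddd"),
    (r"(1|2){2}", "12", "1"),
    (r"1{2}|2{2}", "22", "12"),
]

for pattern, good, bad in EXAMPLES:
    assert re.fullmatch(pattern, good), (pattern, good)
    assert not re.fullmatch(pattern, bad), (pattern, bad)
print("all precedence examples behave as described")
```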

The internal structure of a regular expression can be seen clearly in the syntax tree (authoring tool). The operators executed first are placed lower in the tree (or to the right in the horizontal view); the operator executed last is the root of the tree. You can compare the trees and explaining graphs for the examples above in the authoring tools if this section doesn't seem clear enough. Remember that "executing" a regular expression operator means linking its operands in the string: sequential linking, alternative linking, or repetition.

Anchoring

Anchoring is used to set restrictions on the matching process by using simple assertions:

  • if a regular expression starts with the ^ the match should start at the start of the student's response;
  • if a regular expression ends with the $ the match should end at the end of the student's response;
  • otherwise a regex match can be found anywhere inside a student's response.

Note that simple assertions are concatenated with the rest of the regex and concatenation has precedence over alternative, which makes their usage slightly tricky:

  • "^start|end$" will match "start" at the start of the string or "end" at the end of it;
  • "^(start|end)$" uses parentheses to match exactly "start" or "end";
  • "^start$|^end$" is another way to get exact match (all top-level alternatives are anchored).
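Python's `re.search()` looks for a match anywhere in the string, like a non-anchored answer, so it can be used to check the three anchoring variants above. The test strings are made up.

```python
import re

# "^start|end$": either "start" at the beginning OR "end" at the end.
assert re.search(r"^start|end$", "started late")   # only the first branch is anchored at the start
assert re.search(r"^start|end$", "the end")
# "^(start|end)$" and "^start$|^end$": the whole string must be "start" or "end".
assert not re.search(r"^(start|end)$", "started late")
assert re.search(r"^(start|end)$", "end")
assert not re.search(r"^start$|^end$", "the end")
assert re.search(r"^start$|^end$", "start")
print("anchoring examples behave as described")
```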

If you set the exact matching option to "yes" (which is the default value), the question will add ^ and $ to each regular expression for you (this does not affect subpattern usage). However, you may prefer to use some non-anchored regexes to catch common errors and give feedback, and use manually anchored expressions for grading.

Regular expressions reference

Operands

Here's an incomplete list of operands that define character sets.

  1. Simple characters (with no special meaning) match themselves.
  2. Escaped special characters match corresponding special characters. Escaping means preceding special characters by the backslash "\". For example, the regex "\|" matches the string "|", the regex "a\*b\[" matches the string "a*b[". Backslash is a special character too and should be escaped: "\\" matches "\".
    • the full list of characters that need escaping is \ ^ $ . [ ] | ( ) ? * + { }
    • NOTE! When you are unsure whether to escape some character, it is safe to place "\" before any character except letters and digits. Do not escape letters and digits unless you know what you are doing: they get a special meaning when escaped and lose it when not.
    • If you have too many characters that need escaping in some fragment, you can use \Q ... \E sequence instead. Anything between \Q and \E is treated literally as characters:
      • "\Q^(abc)$\E." matches "^(abc)$" followed by any character - there are NO simple assertions and subpatterns;
      • "\Q^(abc)$." matches "^(abc)$." because there is no "\E" and all characters after "\Q" are treated as literals till the end of the regex.
  3. The dot meta-character (".") matches any character (except newline, but students can't enter it anywhere); escape it as "\." if you need to match a literal dot. It loses its special meaning inside a character class.
  4. Character classes match any character defined in them. Character classes are defined by square brackets. The particular ways to define a character class are:
    • "[ab,!]" matches "a", "b", "," or "!";
    • "[a-szC-F0-9]" contains ranges (defined by a hyphen between 2 characters) "a-s", "C-F" and "0-9" mixed with the single character "z"; it matches any character from "a" to "s", "z", from "C" to "F" and from "0" to "9";
    • "[^a-z-]" starts with the "^" that means a negative character set: it matches any character except those from "a" to "z" and "-" (note that the second hyphen is not placed between 2 characters, so it defines itself);
    • "[\-\]\\]" contains escaping inside a character set: it matches "-", "]" and "\". Other characters lose their special meaning inside a character set and need not be escaped, but if you want to include "^" in a character set it shouldn't be the first character there;
  5. Escape sequences for common character sets (can be used both inside and outside character classes):
    • "\w" for any word character (letter, underscore or digit) and "\W" for any non-word character;
    • "\s" for any space character and "\S" for any non-space character;
    • "\d" for any digit and "\D" for any non-digit.
  6. Unicode properties are special escape sequences "\p{xx}" (positive) or "\P{xx}" (negative) for matching specific Unicode characters, which can be used both inside and outside character classes (the complete list of "xx" variants can be found at http://www.nusphere.com/kb/phpmanual/reference.pcre.pattern.syntax.htm):
    • "\p{Ll}" matches any lowercase letter;
    • "\P{Lu}" matches any character that is not an uppercase letter.
  7. POSIX character classes are used for the same purpose as Unicode properties (a complete list of them can be found on the Internet too), but may not work with non-ASCII characters. They are allowed only inside character classes:
    • "[[:alnum:]]" matches any alphanumeric character;
    • "[[:^digit:]]" matches any non-digit character.
  8. Simple assertions are not characters but conditions to test; unlike other operands, they don't consume characters while matching (they have this meaning only outside character classes):
    • "^" matches at the start of the string, fails otherwise;
    • "$" matches at the end of the string, fails otherwise;
    • "\b" matches on a word boundary, i.e. either between word (\w) and non-word (\W) characters, or at the start (end) of the string if it starts (ends) with a word character;
    • "\B" matches not on a word boundary, negative to "\b".
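Several of these operands can be tried in Python's `re` module, which handles escaping, the dot, character classes and "\b" in the same way for these examples. Python's `re` has no "\Q...\E", "\p{...}" or POSIX classes, so those are not shown; `re.escape()` is used as a rough equivalent of "\Q...\E". The test characters are made up.

```python
import re

# Escaped special characters match themselves.
assert re.fullmatch(r"\|", "|")
assert re.fullmatch(r"a\*b\[", "a*b[")
# re.escape() plays the role of \Q...\E here.
assert re.fullmatch(re.escape("^(abc)$") + ".", "^(abc)$!")
# The dot matches any character except newline.
assert re.fullmatch(r"a.c", "a-c") and not re.fullmatch(r"a.c", "a\nc")
# Ranges a-s and C-F, the single character z, and the range 0-9.
assert all(re.fullmatch(r"[a-szC-F0-9]", c) for c in "asDz7")
assert not any(re.fullmatch(r"[a-szC-F0-9]", c) for c in "tyBG")
# Negative set, and escaping inside a set.
assert re.fullmatch(r"[^a-z-]", "Q") and not re.fullmatch(r"[^a-z-]", "-")
assert all(re.fullmatch(r"[\-\]\\]", c) for c in "-]\\")
# \b is a zero-width test: "cat" inside "concat" is not a whole word.
print(re.findall(r"\bcat\b", "cat concat cat."))   # ['cat', 'cat']
```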

Still, a pattern that matches only one character isn't very useful. So here come the operators that allow us to define an expression that matches strings of several characters.

Operators

Here's a list of the common regex operators:

  1. Concatenation is such a simple binary operator that it doesn't require any special character. It is still an operator and has its precedence, which is important if you want to understand where to use parentheses. Concatenation allows you to write several operands in sequence:
    • "ab" matches "ab";
    • "a[0-9]" matches "a" followed by any digit, for example, "a5"
  2. Alternative - a binary operator that lets you define a set of alternatives:
    • "a|b" matches "a" or "b";
    • "ab|cd|de" matches "ab" or "cd" or "de";
    • "ab|cd|" matches "ab" or "cd" or emptiness (useful as a part in more complex expressions);
    • "(aa|bb)c" matches "aac" or "bbc" - using parentheses to outline alternative set;
    • "(aa|bb|)c" matches "aac" or "bbc" or "c" - typical usage of the emptiness;
  3. Quantifiers are unary operators that let you define repetition of whatever is used as their operand:
    • "x*" matches "x" zero or more times;
    • "x+" matches "x" one or more times;
    • "x?" matches "x" zero or one times;
    • "x{2,4}" matches "x" from 2 to 4 times;
    • "x{2,}" matches "x" two or more times;
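    • "x{,2}" matches "x" from 0 to 2 times (note that the PHP preg engine treats "{,2}" as literal characters, so "x{0,2}" is the portable form);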
    • "x{2}" matches "x" exactly 2 times;
    • "(ab)*" matches "ab" zero or more times, i.e. if you want to use a quantifier on more than one character, you should use parentheses;
    • "(a|b){2}" matches "aa" or "ab" or "ba" or "bb", i.e. it is a repeated alternative, not a repetition of "a" or "b".
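The operator examples can be verified with Python's `re.fullmatch()`; the helper `whole()` is a hypothetical name, and "{0,2}" style quantifiers are used rather than "{,2}" to stay portable.

```python
import re

def whole(pattern, s):
    """True if the entire string s matches pattern (like Preg's exact matching)."""
    return re.fullmatch(pattern, s) is not None

assert whole(r"a[0-9]", "a5")
assert whole(r"ab|cd|", "")                      # the empty alternative
assert [s for s in ["aac", "bbc", "c", "abc"] if whole(r"(aa|bb|)c", s)] == ["aac", "bbc", "c"]
assert [n for n in range(7) if whole(r"x{2,4}", "x" * n)] == [2, 3, 4]
assert [n for n in range(5) if whole(r"x{0,2}", "x" * n)] == [0, 1, 2]
assert whole(r"(ab)*", "ababab") and not whole(r"ab*", "ababab")
assert all(whole(r"(a|b){2}", s) for s in ["aa", "ab", "ba", "bb"])
print("operator examples behave as described")
```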

Subpatterns and backreferences

Subpatterns are operators that remember the substrings captured by the regex. The simplest way to define a subpattern is to use parentheses: the regex "a(bc)d" contains the subpattern "bc". Subpatterns are numbered starting from 0 for the whole regex and counted by their opening parentheses, so "(bc)" is the 1st subpattern. If we write, say, "a(b(c)(d))e", there are the subpatterns "bcd" (1st), "c" (2nd) and "d" (3rd). Subpatterns are usually used with backreferences, which have numbers too. Backreferences are operands that match the same string that was matched by the subpattern with the same number. The simplest syntax for a backreference is a backslash followed by a number: "\1" means a backreference to the 1st subpattern. The regular expression "([ab])\1" matches the strings "aa" and "bb", but neither "ab" nor "ba", because the backreference must match the same character as the subpattern did. Consider a small example: the declaration and initialization of an integer variable in the C programming language:

  • "int ([_a-zA-Z]\w*); \1 = -?\d+;" matches, for example, "int _var; _var = -10;" (the first character of a C identifier must be a letter or an underscore, while "\w" also includes digits, so the first character gets its own class). Of course, there can be any number of spaces between "int", the variable name etc., so a more correct regex will look like:
  • "\s*int\s+([_a-zA-Z]\w*)\s*;\s*\1\s*=\s*-?\d+\s*;\s*": this will match, say, " int var2  ; var2=123  ; ". It looks a bit frightening, but it is easier to write this regex once than to understand it afterwards.
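A quick check of the declaration regex in Python's `re`, with the variable name written as "[_a-zA-Z]\w*" so that it cannot start with a digit. The first two strings come from the text above; the last two are made-up counterexamples.

```python
import re

# \1 must repeat exactly the variable name captured by group 1.
DECLARATION = re.compile(r"\s*int\s+([_a-zA-Z]\w*)\s*;\s*\1\s*=\s*-?\d+\s*;\s*")

print(bool(DECLARATION.fullmatch("int _var; _var = -10;")))       # True
print(bool(DECLARATION.fullmatch(" int var2  ; var2=123  ; ")))   # True
print(bool(DECLARATION.fullmatch("int a; b = 1;")))               # False: different name
print(bool(DECLARATION.fullmatch("int 9x; 9x = 1;")))             # False: starts with a digit
```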

Finally, instead of just numbers, subpatterns and backreferences can have names via a little more complicated syntax:

  1. "(?<name1>...)" means a subpattern with name "name1";
  2. "(?'name2'...)" means a subpattern with name "name2";
  3. "(?P<name3>...)" means a subpattern with name "name3";
  4. "\k<name4>" means a backreference to the subpattern named "name4";
  5. "\k'name5'" means a backreference to the subpattern named "name5";
  6. "\g{name6}" means a backreference to the subpattern named "name6";
  7. "\k{name7}" means a backreference to the subpattern named "name7";
  8. "(?P=name8)" means a backreference to the subpattern named "name8".

This is very useful when you work with complicated regexes and often modify them by adding or removing subpatterns: names stay the same while numbers may change.
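Python's `re` accepts the "(?P<name>...)" and "(?P=name)" forms from the list above (items 3 and 8), so named subpatterns can be tried there. The pattern and strings are made-up examples.

```python
import re

# A named subpattern "var" and a named backreference to it.
ASSIGN_TWICE = re.compile(r"(?P<var>[_a-zA-Z]\w*) = 1; (?P=var) = 2;")

m = ASSIGN_TWICE.fullmatch("count = 1; count = 2;")
print(m.group("var"))   # count
print(m.group(1))       # count (a named subpattern still has a number)
print(ASSIGN_TWICE.fullmatch("count = 1; total = 2;"))   # None
```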

Duplicate subpattern numbers and names

There is a useful syntax for combining subpatterns with alternation. If you create a group "(?|...)", then every alternative inside that group will have the same subpattern numbering. Consider the regex "(?|(a(b))|(c(d)))": there are 2 alternatives with 2 subpatterns in each. Subpatterns "ab" and "cd" are both the 1st, "b" and "d" are both the 2nd.

Subpattern calls

Another way to use a subpattern is to call it. When hitting a subpattern call, the matching engine goes to the beginning of the target subpattern and matches it over again until its end (not the end of the whole regex). If a subpattern call is placed outside the subpattern it refers to, it is almost equivalent to copy-pasting the subpattern, except for its number (which stays the same).

The most common use of subpattern calls is matching a string in parentheses that allows unlimited nesting of parentheses. Without recursive subpattern calls it is impossible to handle an arbitrary nesting depth. Note that a regex for arbitrarily nested parentheses is quite complex, and actively using such constructs may make your regexes quite obscure. The Preg question type provides several templates to make regexes with parentheses more readable and easier to write.

The syntax of a subpattern call is:

  • (?R) recursive call for the whole pattern
  • (?n) call subpattern by absolute number
  • (?+n) call subpattern by relative number (counting forward from the call)
  • (?-n) call subpattern by relative number (counting backward from the call)
  • (?&name) call subpattern by name
  • (?P>name) call subpattern by name
  • \g<name> call subpattern by name
  • \g'name' call subpattern by name
  • \g<n> call subpattern by absolute number
  • \g'n' call subpattern by absolute number

The first one is explicitly recursive. The rest of the variants cause recursion if placed inside the subpattern they refer to, for example: "a(b(?1)?c)d" contains recursion, "a(bc)(?1)d" does not.

When using the finite state automata engine, subpattern calls behave slightly differently from PCRE in that the called subpatterns are NOT treated as atomic groups. Generally the behaviour implemented in the Preg question type is more intuitive and helpful. Please take a look at the PCRE docs for more information.

Conditional subpatterns

Conditional subpatterns allow you to write "if-then-else"-like constructions. Basically a conditional subpattern consists of a condition, a positive branch and an optional negative branch. If there is no explicit negative branch, it is implied to be empty, like (?:). The general syntax is "(?(condition)yes-pattern)" or "(?(condition)yes-pattern|no-pattern)".

The more specific options are:

  • (?(n)... absolute reference condition - is the n'th subpattern captured?
  • (?(+n)... relative reference condition
  • (?(-n)... relative reference condition
  • (?(<name>)... named reference condition - is the subpattern with the given name captured?
  • (?('name')... named reference condition
  • (?(name)... named reference condition
  • (?(R)... overall recursion condition - if there is no subpattern named 'R', the condition is true if a recursive call to the whole pattern or any subpattern has been made
  • (?(Rn)... specific group recursion condition - the condition is true if the most recent recursion is into the n'th subpattern
  • (?(R&name)... specific recursion condition - the condition is true if the most recent recursion is into the subpattern with the given name
  • (?(DEFINE)... define subpattern for reference
  • (?(assert)... complex assertion condition - the condition is true if the assert (positive/negative lookahead/lookbehind) matches

At "top level", all these recursion test conditions are false.

The last type of conditional subpattern (with an assertion as the condition) is not yet supported by the FA engine.
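Python's `re` supports the numbered form "(?(n)yes|no)", so the basic idea can be tried there. This made-up example accepts a number optionally wrapped in one pair of parentheses: the closing parenthesis is required only if subpattern 1 (the opening one) was captured.

```python
import re

# Group 1 optionally captures "("; (?(1)\)) then demands ")" only if it did.
NUMBER = re.compile(r"(\()?\d+(?(1)\))")

for s in ["123", "(123)", "(123", "123)"]:
    print(s, NUMBER.fullmatch(s) is not None)
# Expected: 123 True, (123) True, (123 False, 123) False
```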

Complex assertions

Complex assertions test some part of the string without including it in the match, but they affect whether and where the match occurs:

  • positive lookahead assertion "a+(?=b)" matches one or more "a" followed by "b", without including "b" in the match;
  • negative lookahead assertion "a+(?!b)" matches one or more "a" that are not followed by "b";
  • positive lookbehind assertion "(?<=b)a+" matches one or more "a" preceded by "b";
  • negative lookbehind assertion "(?<!b)a+" matches one or more "a" that are not preceded by "b".
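The four assertions can be tried with Python's `re`, which uses the same syntax for these fixed-length cases. Note the third check: because of backtracking, "a+(?!b)" can still match a shorter run of "a" that happens not to be followed by "b". The test strings are made up.

```python
import re

# Positive lookahead: "b" is checked but not consumed.
assert re.search(r"a+(?=b)", "caaab").group(0) == "aaa"
# Negative lookahead.
assert re.search(r"a+(?!b)", "aac").group(0) == "aa"
assert re.search(r"a+(?!b)", "aab").group(0) == "a"   # "aa" is followed by "b", "a" is not
# Positive and negative lookbehind.
assert re.search(r"(?<=b)a+", "baa").group(0) == "aa"
assert re.search(r"(?<!b)a+", "caa").group(0) == "aa"
assert re.search(r"(?<!b)a+", "ba") is None
print("assertion examples behave as described")
```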

Local case-sensitivity modifiers

Starting from Preg 2.1 you can set case-(in)sensitivity for parts of your regular expressions by using the standard syntax of Perl-compatible regular expressions:

  • "(?i)" will turn case-sensitivity off;
  • "(?-i)" will turn case-sensitivity on.

This overrides the general case sensitivity, which is chosen at the question level. So you can make some answers case-sensitive and some not, or even do this for parts of answers. For example, you can set the question to "use case" and have a 50% answer starting with "(?i)" to give a lower grade when the case doesn't match but everything else is correct.

When placed in parentheses, local modifiers work up to the closest ")". When placed on the top level (not inside parentheses) they work up to the end of the expression, i.e. with case sensitivity on for the question:

  • in "abc(de(?i)gh)xyz" only "gh" is case-insensitive;
  • in "abc(de)(?i)ghxyz" the whole "ghxyz" is case-insensitive.
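Recent Python versions reject a bare "(?i)" in the middle of a pattern, so this Python sketch uses the scoped form "(?i:...)", which PCRE also understands, to reproduce the first example above. The second pattern and all test strings are made up.

```python
import re

# Only "gh" ignores case, as in "abc(de(?i)gh)xyz".
LOCAL = re.compile(r"abc(de(?i:gh))xyz")
assert LOCAL.fullmatch("abcdeGHxyz")
assert not LOCAL.fullmatch("abcDEghxyz")   # "de" is still case-sensitive

# The opposite: case-insensitive overall, one case-sensitive part.
MIXED = re.compile(r"(?-i:Moodle) rocks", re.IGNORECASE)
assert MIXED.fullmatch("Moodle ROCKS")
assert not MIXED.fullmatch("moodle rocks")
print("local case modifiers behave as described")
```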

Error reporting

Native PHP preg extension functions only report whether there is an error in the regular expression, so the PHP preg extension engine can't tell you much about the error.

The FA engine uses a custom regular expression parser, so it supports advanced error reporting. There are several classes of potential errors:

  • more than two top-level alternatives in a conditional subpattern "(?(?=f)first|second|third)";
  • unopened closing parenthesis "abc)";
  • unclosed opening parenthesis of any sort (subpatterns, assertions, etc) "(?:qwerty";
  • quantifier without an operand, i.e. at the start of (sub)expression with nothing to repeat "+" or "a(+)";
  • unclosed brackets of character classes "[a-fA-F\d";
  • setting and unsetting the same modifier at the same time "(?i-i)";
  • unknown unicode properties "\p{Squirrel}";
  • unknown posix classes "[[:hamster:]]";
  • unknown (*...) sequence "(*QWERTY)";
  • incorrect character set range "[z-a]";
  • incorrect quantifier ranges "{5,3}";
  • \ at end of pattern "ab\";
  • \c at end of pattern "ab\c";
  • invalid escape sequence;
  • POSIX class outside of a character set "[:digit:]";
  • reference to a nonexistent subpattern "(abc)\2";
  • unknown, wrong or unsupported modifier "(?z)";
  • missing ) after comment "(?#comment";
  • missing conditional subpattern name ending;
  • missing ) after (?C;
  • missing subpattern name ending;
  • missing backreference name ending;
  • missing backreference name beginning;
  • missing ) after control sequence;
  • wrong conditional subpattern number, digits expected;
  • assertion or condition expected "(?()a|b)";
  • character code too big "\x{ffffffff}";
  • character code disallowed "\x{d800}";
  • invalid condition (?(0);
  • too big number in (?C...) "(?C256)";
  • two named subpatterns have the same name "(?<name>a)(?<name>b)";
  • backreference to the whole expression "abc\g{0}";
  • different subpattern names for subpatterns of the same number "(?|(?<name1>a)|(?<name2>b))";
  • subpattern name expected "(?<>abc)";
  • \c should be followed by an ascii character "\cй";
  • \L, \l, \N{name}, \U, and \u are unsupported;
  • unrecognized character after (?<.
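To illustrate why a custom parser can say more than "error / no error", here is a minimal Python sketch (not the plugin's actual PHP code) that detects a few of the error classes listed above and reports where each one occurs: unopened closing parentheses, unclosed opening parentheses, unclosed character-class brackets and a \ at the end of the pattern. The function name find_bracket_errors and its message strings are hypothetical. The sketch assumes a simplified syntax: it ignores (?#...) comments, \Q...\E quoting and the x (extended) modifier.

```python
def find_bracket_errors(pattern: str) -> list[tuple[str, int]]:
    """Return (message, position) pairs for bracket-related errors in a regex.

    Hypothetical, simplified sketch: (?#...) comments, \\Q...\\E quoting
    and the x modifier are not handled.
    """
    errors: list[tuple[str, int]] = []
    open_parens: list[int] = []  # positions of '(' not yet closed
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            if i + 1 == n:
                errors.append(("\\ at end of pattern", i))
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "[":
            start = i
            i += 1
            if i < n and pattern[i] == "^":
                i += 1  # negated class
            if i < n and pattern[i] == "]":
                i += 1  # a ']' right after '[' or '[^' is a literal
            while i < n and pattern[i] != "]":
                if pattern[i] == "\\":
                    i += 2  # escaped character inside the class
                elif pattern.startswith("[:", i) and ":]" in pattern[i + 2:]:
                    i = pattern.index(":]", i + 2) + 2  # skip a POSIX class like [:digit:]
                else:
                    i += 1
            if i >= n:
                errors.append(("unclosed brackets of character class", start))
            i += 1  # step past the closing ']'
            continue
        if ch == "(":
            open_parens.append(i)
        elif ch == ")":
            if open_parens:
                open_parens.pop()
            else:
                errors.append(("unopened closing parenthesis", i))
        i += 1
    # Anything still on the stack was never closed.
    errors.extend(("unclosed opening parenthesis", pos) for pos in open_parens)
    return errors
```

For example, find_bracket_errors("(a[b)") returns [("unclosed brackets of character class", 2), ("unclosed opening parenthesis", 0)]: the ")" is swallowed by the unclosed character class, so the parenthesis at position 0 is reported as unclosed too. Reporting positions like this is what lets a parser point the teacher at the exact spot in the regex.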

The ways to give back

This project is free software, so it is hard to get any feedback. You shouldn't expect to get software that ideally suits your needs without telling anyone about those needs, or without giving the authors some encouragement or simple support. Sometimes something as small as writing where you work and how you use the Preg question type (or what prevents you from using it) helps a lot.

This software is considered a scientific project, so the following would be really useful and appreciated:

  • evidence that the results of our work (i.e. the Preg question type) are useful to people and have been used in production environments;
  • cooperative research into its effectiveness for various applications: basically, you write about how you use Preg and survey your teachers and/or students about it, though this can extend to co-authoring a conference thesis or a journal article;
  • cooperation in writing an article, or help publishing it in English-language journals (information about and help with grants for further work are welcome too).

If you are considering any way of helping, do not hesitate to write to me about it and ask any questions about the details. You may also receive individual help during such work (for example, during cooperative research I may give you tips on how to improve your regexes).

I am a high school teacher, researcher and programmer who has a lot to do at my main paid job and not much free time to spend developing this question type. If you can help me in some of the following ways, though, I may be able to spend more time and effort on it:

  • publishing a thesis or paper describing your use of the Preg question type, which I could then cite, would improve the project's rating and my rating as a researcher/developer, so if you feel grateful for this software, please publish and let me know the reference;
  • if you could take on a bit more work and organise publishing a paper (or at least a thesis) with me as a co-author, that would help even more; please let me know right away if you are considering this;
  • if publishing is difficult, you could simply write to me about your organisation and how you use Preg; that helps, and lets me better decide what should be done next;
  • join the testing effort: the question has many settings and regexes can be quite complex, so it is hard for the developers to do all the testing themselves.

Development plans

There is no definite schedule or order of development for the features below; it depends on the time and developers available. Many of these features require complex code. If you want to help us with a specific feature, please contact the question type maintainer (Oleg Sychev) via http://moodle.org messaging.

  • Templates editor, allowing users to create custom templates
  • Support for complex assertions
  • Support for approximate matching to catch typos in answers
  • Improve the set of authoring tools to make writing regular expressions easier
  • Add more languages for next lexem hinting
  • Develop more help and examples for people who don't know much about regular expressions.