Note: You are currently viewing documentation for Moodle 2.4. Up-to-date documentation for the latest stable version of Moodle may be available here: Preg question type.

Preg question type

From MoodleDocs
Revision as of 21:04, 15 September 2010 by Oleg Sychev (talk | contribs) (→‎Matching)

Preg question type is a question type using regular expression pattern matching to find if studen response is correct. It is use Perl-compatible regular expressions dialect. For detailed description of regular expression syntax see http://www.nusphere.com/kb/phpmanual/reference.pcre.pattern.syntax.htm

Authors:

  1. idea, design, question type code - Oleg Sychev;
  2. parsing regular expression, DFA regular expression matching engine - Dmitriy Kolesov.

We would gladly accept testers and contributors (see Development plans section) - there is still more to be done than we have time.

Matching

Matching means finding a part of the student answer (or a whole answer) that suited the regular expression. This part called a match.

You should enter regular expressions as answers to the question without modifiers or enclosing characters (modifiers would be added for you by question - u added always and i in case-insensitive mode). You should also enter one correct response (that matches at least one 100% grade regular expression) to be shown to the student as correct answer. The question would get use all regular expressions in order to find first full match (full for expression, but not necessary all response - see anchoring) and give a grade from it. If there is no full match and engine supports partial matching (see hinting) than partial match that is shortest to complete would be choosen (for displaying a hint, zero grade is given) - or the longest one, if engine coudn't tell which one would be shortest to complete.

Operands

You could construct you expression from these operands:

  1. simple characters match with themselves
  2. escaped special characters if you need to use character with special meaning (like |, * or bracket) just as usual character to match you should preceed it by backslash: a\* matches with a* (while a* matches with a zero or more times), backslash is a special character too and should be escaped \\ matches with \
  3. character classes you could specify a number of possible characters in one place in square brackets:
    • [ab,!] matches with a or b or , or !
    • ranges: [a-szC-F0-9] you could specify ranges for letters and digits in character classes, mixing them with single characters
    • negative character classes starts with ^ [^ab] means any characters except a and b
    • escaping inside character classes: [\-\]\\] match with - or ] or \
  4. dot meta-character . match with any possible character (except newline, but student coudn't enter it anywhere), you should escape dot \. if you need to match single dot.

Operators

Most common regular expression operators used (could anyone help expand descriptions and examples please?):

  1. alternative let you define a set of alternatives:
    • a|b mean a or b
    • ab|cd|de mean ab or cd or de
    • ab|cd| mean ab or cd or emptiness (useful as a part more complex expressions
    • (aa|bb)c mean aac or bbc - use brackets to outline alternative set
    • (aa|bb|)c mean aac or bbc or c - typical use of emptiness
  2. quantifiers let you define repetition of a character (or regular expression):
    • x* mean x zero or more times
    • x+ mean x one or more times
    • x? mean x zero or one times
    • x{2,4} mean x from 2 to 4 times
    • x{2,} mean x two or more times
    • x{,2} mean x from 0 to 2 times
    • x{2} mean x exactly 2 times
    • (ab)* mean ab zero or more times, i.e. if you want to use quantifier on more than one character, you should use brackets
    • (a|b){2} mean aa or ab or ba or bb, i.e. it is repeated alternative, not selection one alternative and repeating it
  3. asserts - these are assertions about some part of the string that doesn't actually goes into matching text
    • positive lookahead assert a+(?=b) matches with any number of a ending with b without including b in the match
    • negative lookahead assert a+(?!b) matches with any number of a that is not followed by b
    • positive lookbehind assert (?<=b)a+ matches with any number of a preceeded by b
    • negative lookbehind assert (?<!b)a+ matches with any number of a that is not preceeded by b

Anchoring

Anchoring sets restrictions on the matching process:

  • if a regular expression starts with ^ the match should start at the start of the student's response;
  • if a regular exhression ends with $ the match should ends at the end of the student's reponse;
  • otherwise regular expression match could be contained anywhere inside student response.

If you set exact matching options to yes (default setting), the question would add ^ and $ in each regular expression for you. However, you may prefer to use some non-anchored regexes to catch common errors and give feedback while using manually anchored expression for grading.

Hinting

Some matching engines could support hinting (not easy thing to do on the PHP at all) in adaptive mode.

Hinting starts with partial matching. When a student enters a partially correct answer, partial matching could find that response starts matching and on some character broke it. Consider you enter expression:

 are blue, white(,| and) red

and student answered

 they are blue, vhite and red

Partial matching will find that partial match is

 are blue, 

Remember, the regular expresion in unanchored so the match shouldn't start with the start of the student response. While using just partial matching the student will be shown correct and incorrect parts:

 they are blue, vhite and red

When hinting is available, student will have hint button by pressing which he receive a hint with one next correct character, highlighted by background coloring:

 they are blue, wvhite and red

You should typically set hint penalty more than usual question penalty, because they are applied separately: usual penalty for an attempt without hinting, while hint penalty for an attempt with hinting. Preg question doesn't add hint character to the student's response (like regex question do it), showing it separately instead for a number of reasons:

  1. it is student's responsibility whether he want to add hinted character to the his response (and some more possibly);
  2. it slightly facilitates thinking about hint, since when the response is modified it is too easy to repeatedly press hint, which is not a desirable behavour usually.

When possible (if question engine supports it), hinting choosing a character that leads to shortest path to complete a match. Consider this response to the previous regular expression:

 are blue, white; red

There are two possible hint characters: ',' or ' '. The question will choose ',' since it leads to the shortest path to complete a match, while ' ' leads to a path 3 characters longer.

It is possible that not all regular expressions will give 100% grade. Consider you add an expression for the students with bad memory:

 are white(,| and) red

with 60% grade and feedback about forgetting blue. You may not want hinting to lead student to the response

  are white, red

if he entered

  are white, oh I forgot other colors.

Hint grade border controls this. Only regular expressions with grade greater or equal than hint grade border would be used for partial matching and hinting. If you set hint grade border to 1, only 100% grade regular expression would be used to hinting, if you set it to 0,5 regular expressions with 50%-100% grades would be used and 0%-49% would not. Regular expressions not used for hinting works only when they have a full match in the student response.

Matching engines

Matching engines means different program code that do matching (either by different methods or written by different people). There are no single 'best' matching engine - it depends on the features you want to use and regular expressions engine should handle. They have a different degree of stability and offer different features to use.

PHP preg extension

It is based on native PHP preg functions (which is in turn based on the PCRE library). It is supporting 100% perl-compatible regular expression features, been very stable and thoroughly tested. Sadly, PHP functions doesn't support partial matching (while PCRE could), so (unless we storm PHP developers to add support for partial matching) there is no hinting there. Choose it when you need complex regexp features other engines don't support or better performace.

Deterministic finite state automata (DFA)

This is a custom PHP code using DFA matching algorithm. It is heavily unit-tested, but considered beta-quality for now. Not all PHP operands and operators are supported, and for some (more exotic) ones support could still differs from standard (especially for non-latin characters). On the bright side it is support hinting.

Currently supported operands (there would be more):

  • single characters
  • escaped special characters
  • character classes, including ranges and negative classes
  • escape sequences \w\W\s\S\t\d\D (WARNING: they probably will not work for Unicode (non-latin) characters for now)
  • octal and hexadecimal character codes preceeded by \o and \x
  • meta-character . (any character)

Currently supported operators (there would be more):

  • concatenation
  • alternative |
  • quantifiers * + ? {2,3} {2,} {,2} {2}
  • positive lookahead assertions
  • changing operator precedence ( ) (without subpattern capturing) or (?: )

Features that couldn't be supported by DFA matching at all:

  • subpattern capturing
  • backreferences

Development plans

There is no definite shedule or order of development for those features - it depends on the available time and developers. Many features require complex code to achieve results. If you want to help us with specific feature, please contact question type maintainer (Oleg Sychev) using http://moodle.org messaging.

  • Update DFA matching engine to support all operations DFA algorithm could
  • Develop NFA and backtracking matching engines
  • Add possibility to insert match from response (or subpattern match) in the feedback
  • Add automatic generation of shortest possible correct answer in user-readable form
  • Add negative matching answers (AKA "missing words" from Joseph Rezeau question) - depends on developing and checking in extra_answer_fields() code
  • Add generation of 'description' for regular expression to facilitate it's editing
  • Develop more help and examples for the people that don't know much about regular expressions.