Student projects/Email interface

From MoodleDocs
Revision as of 06:03, 22 July 2007

Programmer: Peter Boswood

(This is very much still a work in progress, data here may be inaccurate or outdated!)

Summary

Moodle uses email to communicate with end users in a variety of ways, for example for authentication and for subscriptions in the Forum module. In some instances, rather than simply ignoring users' replies to these emails, it may be possible to "interpret" them intelligently, carrying out specific actions depending on the module that sent the email.

Previous Work

Moodle currently has functions for handling email bounces - see Email processing for more details. This framework allows the module of a returned email to be uniquely identified, and allows arguments to be passed as part of the email and supplied to the module's handling code.

This framework currently uses base64 to encode the data placed in the return address of the email. base64 requires case sensitivity, which most MTAs do not support by default. It will need to be changed to base32 instead, which significantly decreases the amount of space available in the argument section. This can be mitigated using an "email session cookie table", as suggested here.

Database structures

A database table will be necessary in order to keep track of the relevant data attached to each email "session". The following design is proposed:

mdl_email_sessions:

  • id : int(10), key field, auto-inc - required by DML
  • b32key : int(10), unsigned - session number of the email, encoded in a base32 part of the VERP field. An index has to be created on this to allow for fast lookup when an email is received.
  • data : text(medium) - field to store data of any length mapped to this email session. This will probably be either a plain ASCII parameter or a serialized object.
  • timestamp : int(10), unsigned - timestamp of the sent email which registered this session. This allows cron to remove "old" sessions once a defined time interval has passed and they are no longer valid.

An entry is inserted into the session table if either bounce handling or the email interface is enabled. We must ensure our data field is large enough to handle decently sized serialized objects - a MEDIUMTEXT field in MySQL stores up to 16,777,215 bytes (about 16 MB), which is more than enough.

By uniquely identifying each email by a session ID, we can solve the problem of handling repeats (mentioned under Security issues in Email processing) by removing the appropriate row from the table after the handler has run. Replies without an appropriate row could simply be discarded.
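The "use once, then delete" behaviour described above can be sketched as follows. This is only an illustration: it uses a plain PHP array as a stand-in for mdl_email_sessions, and the function names email_session_create and email_session_consume are hypothetical, not part of Moodle. It also assumes 64-bit PHP so that the unsigned int(10) range fits in an integer.

```php
<?php
// Hypothetical sketch: $sessions is an in-memory stand-in for mdl_email_sessions,
// keyed by the random session key (the b32key column).

// Register a session for an outgoing email and return its random key.
function email_session_create(array &$sessions, $data, int $now): int {
    do {
        // Random key in the unsigned 32-bit range (assumes 64-bit PHP).
        $key = random_int(1, 0xFFFFFFFF);
    } while (isset($sessions[$key]));
    $sessions[$key] = ['data' => serialize($data), 'timestamp' => $now];
    return $key;
}

// Look up the session for an incoming reply, run the handler once, then
// delete the row. Returns false when no row exists, so repeated replies
// and unknown keys are simply discarded.
function email_session_consume(array &$sessions, int $key, callable $handler): bool {
    if (!isset($sessions[$key])) {
        return false;
    }
    $handler(unserialize($sessions[$key]['data']));
    unset($sessions[$key]);
    return true;
}
```

A second call to email_session_consume with the same key returns false without calling the handler, which is the repeat protection the paragraph describes.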

VERP structure

The structure of the VERP field has been changed several times since coding started - here is the latest version:

  • 4 characters: prefix for MTA ("mdl+", etc.)
  • 2 characters: encoded (base32 + pack('C')) module id to determine which processing function to call - module id 0 stands for Moodle 'core'.
  • 16 characters: encoded (base32 + pack('C')) user id - this is only used for bounce processing, and is not passed upwards to handling functions. Any user identification is expected to be stored within the session data field.
  • 16 characters: encoded (base32 + pack('C')) session id - this is used to identify the session row in mdl_email_sessions, and matches the b32key.
  • 26 characters: hash validating and securing the VERP data. HMAC-MD5-26 is used for this.
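As a rough illustration of the layout above, the sketch below assembles a 64-character local part (4 + 2 + 16 + 16 + 26), which is also the maximum local-part length allowed by SMTP. Several details are assumptions made for this example only: the function names, the lowercase unpadded base32 alphabet, packing the user id and session id into 10 big-endian bytes so that each encodes to exactly 16 characters, and computing the HMAC over the three encoded fields. The real code may differ on all of these.

```php
<?php
// Unpadded, lowercase base32 (RFC 4648 alphabet); PHP has no built-in encoder.
function b32_encode(string $bytes): string {
    $alphabet = 'abcdefghijklmnopqrstuvwxyz234567';
    $bits = '';
    foreach (str_split($bytes) as $ch) {
        $bits .= str_pad(decbin(ord($ch)), 8, '0', STR_PAD_LEFT);
    }
    $out = '';
    foreach (str_split($bits, 5) as $chunk) {
        // The last chunk may be short; pad it with zero bits on the right.
        $out .= $alphabet[bindec(str_pad($chunk, 5, '0', STR_PAD_RIGHT))];
    }
    return $out;
}

// Assumption: ids are widened to 10 bytes so base32 yields exactly 16 characters.
function int_to_10_bytes(int $n): string {
    return str_pad(pack('J', $n), 10, "\0", STR_PAD_LEFT); // 'J' = 64-bit big-endian
}

// Hypothetical builder for the VERP local part described above.
function verp_local_part(int $moduleid, int $userid, int $sessionkey, string $secret): string {
    $payload = b32_encode(pack('C', $moduleid))        // 1 byte   -> 2 characters
             . b32_encode(int_to_10_bytes($userid))    // 10 bytes -> 16 characters
             . b32_encode(int_to_10_bytes($sessionkey));
    // Raw MD5 HMAC is 16 bytes, which base32-encodes to 26 characters.
    $mac = b32_encode(hash_hmac('md5', $payload, $secret, true));
    return 'mdl+' . $payload . $mac;
}
```

Here $secret stands for whatever site-specific key is used to sign addresses; its source is not specified in this document.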

Security concerns

Rather than using the auto-incremental id field as the session key, which would reduce the amount of data we need to store per email in the session table, a randomly generated number is used instead. Because identity is stored within the session data, even if the site identifier were known and a hash collision computed, it is extremely unlikely that an attacker would know another session key that is in use. The user id field in the VERP address is used only for bounce detection, and even this requires that a session key was generated beforehand (and that the MD5 hash matches). This prevents denial of service attacks that try to push a user's email address over its bounce quota.
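The hash check on an incoming address might look roughly like the sketch below. It assumes the same illustrative conventions as the field layout (a 4-character "mdl+" prefix, 34 characters of encoded fields, a 26-character base32 HMAC-MD5 computed over those fields with a site secret); the function names are hypothetical. The address is lowercased first because MTAs may not preserve case, and hash_equals is used so the comparison runs in constant time.

```php
<?php
// Unpadded, lowercase base32 (RFC 4648 alphabet); PHP has no built-in encoder.
function b32_encode(string $bytes): string {
    $alphabet = 'abcdefghijklmnopqrstuvwxyz234567';
    $bits = '';
    foreach (str_split($bytes) as $ch) {
        $bits .= str_pad(decbin(ord($ch)), 8, '0', STR_PAD_LEFT);
    }
    $out = '';
    foreach (str_split($bits, 5) as $chunk) {
        $out .= $alphabet[bindec(str_pad($chunk, 5, '0', STR_PAD_RIGHT))];
    }
    return $out;
}

// Hypothetical verifier: returns the still-encoded [module, user, session]
// fields when the hash matches, or null when the address must be rejected.
function verp_verify(string $local, string $secret): ?array {
    $local = strtolower($local); // base32 survives case folding by the MTA
    if (strlen($local) !== 64 || substr($local, 0, 4) !== 'mdl+') {
        return null;
    }
    $payload  = substr($local, 4, 34);  // 2 + 16 + 16 characters
    $mac      = substr($local, 38, 26); // HMAC-MD5 in base32
    $expected = b32_encode(hash_hmac('md5', $payload, $secret, true));
    if (!hash_equals($expected, $mac)) {
        return null;
    }
    return [substr($payload, 0, 2), substr($payload, 2, 16), substr($payload, 18, 16)];
}
```

Only after this check succeeds would the session key be looked up in mdl_email_sessions, so forged addresses never reach the bounce counter.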

Core functions

  • Framework - non-module code to support framework of the email interface.
    • Functions handling email sending - encoding of the reply-to address header.
      • Currently implemented as generate_email_process_address in lib/moodlelib.php.
      • Could be changed to something more specific - e.g. generate_email_session?
    • Functions handling email receiving - decode of the destination (reply-to) header.
      • Currently implemented as the command line mail parsing script admin/process_email.php.
      • This currently handles bounces as well as decoding and calling appropriate modules, and may not need to be changed much.
  • Module code - changes to specific modules to actually use the email interface.
    • Forum module
      • Subscribers to forums receiving "single messages" can have their emailed replies directly added as a reply to that specific post.
      • Subscribers to forums receiving "digest messages" may need some further interface to specify which post they are replying to, e.g. "3>>Message".
    • (Investigate other modules)

Major issues

Handling bounce messages (Delivery Status Notifications)

The framework is able to distinguish between messages sent by a user in response to an email and automated DSN / "bounce" messages sent by an MTA. This is done by checking for a null return path, or for the existence of specific content-types which are identified as DSN specific by RFC 1894. Once a message has been identified as a bounce, it is simply discarded - the session entry is not removed, as it is possible for these messages to be generated if an email address is only temporarily unreachable.
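A minimal sketch of such a check is shown below. It assumes the message headers have already been parsed into an associative array, and the function name looks_like_dsn is hypothetical. It treats a Return-Path of "<>" as the null return path, and looks for the message/delivery-status content-type from RFC 1894 or a report-type=delivery-status parameter on the enclosing report. The actual framework may test more cases.

```php
<?php
// Hypothetical check: $headers maps header names to their raw values.
function looks_like_dsn(array $headers): bool {
    $h = array_change_key_case($headers, CASE_LOWER);

    // A null return path is written as an empty pair of angle brackets.
    if (trim($h['return-path'] ?? '') === '<>') {
        return true;
    }

    // Normalise the content-type so quoting and spacing do not matter.
    $ctype = str_replace(['"', ' '], '', strtolower($h['content-type'] ?? ''));
    return strpos($ctype, 'message/delivery-status') !== false
        || strpos($ctype, 'report-type=delivery-status') !== false;
}
```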

Pruning the session table

Each session entry has an associated timestamp to identify when it was created - this can be used to prune the table of all entries before a certain period. A config variable - $CFG->mailsessiontime - stores the number of hours email sessions persist (defaulting to 1 month). A cron function (cleanup function) written in admin/cron.php is used to prune the session table.
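The pruning rule can be sketched as below, again using a plain array in place of the session table. The function name and the default of 720 hours (30 days, as one reading of "1 month") are assumptions for illustration; the real cleanup would delete rows from mdl_email_sessions using the value of $CFG->mailsessiontime.

```php
<?php
// Hypothetical sketch: keep only sessions younger than $hours hours.
// Each row is an array with a 'timestamp' key holding a Unix timestamp.
function prune_email_sessions(array $sessions, int $now, int $hours = 720): array {
    $cutoff = $now - $hours * 3600;
    // array_filter preserves keys, so surviving sessions keep their session keys.
    return array_filter($sessions, function (array $row) use ($cutoff): bool {
        return $row['timestamp'] >= $cutoff;
    });
}
```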

Handling HTML

Some email clients send HTML encodings of the actual email message - should we parse these or simply provide functions to allow module code to parse them? Should we also handle MIME attachments like this?

Backwards compatibility

Existing code is already in place that implements an email interface (in base64) as well as an email bounce handling facility - this has been retained and new code made available so that existing modules depending on these do not need to be modified.

Digest messages for the Forum module

Users can choose for messages to be queued up in a single "digest" that is sent to them regularly. A method needs to be chosen to determine which message a reply to this digest is intended for.

User interface ideas

  • In "digest" style emails, we need to be able to identify which discussion the reply is intended for.
    • HTML formatted emails can have mailto: links which specify a pre-existing body which contains data on which discussion this is intended for.
    • Plain text emails could be handled by requiring the user to insert some separator (">>1" to refer to message one) before inserting his post.
    • Plain text emails can have some sort of separators between posts, and the reply must be inserted after the correct separator, e.g.:

      (Original email)
      !Message 1 Separator!
      Text of message one goes here.
      !Message 2 Separator!
      Text of message two goes here.

      (Replied email)
      > !Message 1 Separator!
      > Text of message one goes here.
      User inserts message one reply here if he wants to reply to this specific message.
      > !Message 2 Separator!
      > Text of message two goes here.

  • We need to be able to separate the "reply" of the email from the quote and any prefix added by the mail client.
    • HTML could contain specific elements (hr's) with id tags identifying the current discussion that the user would post between.
    • Plain text email could also emulate the above with a rather more crude "insert reply below this line" approach.
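The separator idea for plain text digests could be parsed along the lines of the sketch below. It assumes the separator text is exactly "!Message N Separator!" as in the example, that mail clients quote the original with a leading ">", and that any unquoted text before the first separator (such as an "On ... wrote:" prefix) should be ignored. The function name split_digest_reply is hypothetical.

```php
<?php
// Hypothetical parser: returns [message number => reply text] for a replied digest.
function split_digest_reply(string $body): array {
    $replies = [];
    $current = null; // number of the most recent quoted separator, if any
    foreach (preg_split('/\R/', $body) as $line) {
        // A quoted separator line switches the message being replied to.
        if (preg_match('/^\s*>+\s*!Message (\d+) Separator!\s*$/', $line, $m)) {
            $current = (int) $m[1];
            continue;
        }
        // Skip quoted text, blank lines, and anything before the first separator.
        if (preg_match('/^\s*>/', $line) || trim($line) === '' || $current === null) {
            continue;
        }
        $replies[$current] = isset($replies[$current])
            ? $replies[$current] . "\n" . trim($line)
            : trim($line);
    }
    return $replies;
}
```

A reply typed under the quoted "Message 1" block is returned under key 1, so the Forum handler would know which post to attach it to.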

Timeline and current progress

July 9th was used as a milestone for completion of the framework and some changes to the Forum module. At that stage there was a working prototype of the framework and appropriate Forum module code, but many further refinements to the framework could still be made, specifically parsing and utility functions that all modules could use to help separate data in returned email bodies from redundant lines.

Another milestone should be reached by the end of July - a working prototype of the framework and the Forum module processing functions that:

  • Correctly sends emails with appropriate bodies instructing users how to use the interface.
  • Recognises responses to single-post mode mails, strips them of extraneous content, and replies to that specific post (including preserving the title of the post if necessary).
  • Recognises responses to digest mode mails, including which post the reply is intended for, strips them of extraneous content and replies to that specific post.
  • Sends some sort of response email for replies to the email interface that encounter errors (such as the email session having expired).

(This will not necessarily be bug-free!)

The time until August 20th can be used to test the code, fix/debug it, and hopefully write a few more handlers for other modules - once the framework utility functions are in place, this should be fairly easy!
