Difference between revisions of "Student projects/Email interface"


Note: You are currently viewing documentation for Moodle 2.0. Up-to-date documentation for the latest stable version is available here: Student projects/Email interface.

'''Programmer:''' Peter Boswood
''(This is very much still a work in progress, data here may be inaccurate or outdated!)''
Moodle uses email to communicate with users in a variety of ways, for example for authentication and for Forum module subscriptions. Rather than simply ignoring users' replies to these emails, it may be possible to interpret them intelligently and carry out specific actions depending on the module that sent the original email.
==Previous Work==
Moodle already has functions for handling email bounces; see [[Email processing]] for more details. This framework allows the module that sent a returned email to be uniquely identified, and allows arguments to be embedded in the email's return address and passed to the module's handling code.
This framework currently uses base64 to encode the data carried in the email's return address. Base64 is case-sensitive, and most MTAs do not preserve case by default, so it will need to be replaced with base32. Because base32 carries only 5 bits per character (against 6 for base64), this significantly reduces the space available for arguments. This can be mitigated with an "email session cookie table", as suggested [http://moodle.org/mod/forum/discuss.php?d=24077#114289 here].
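To make the trade-off concrete, the Python sketch below (not Moodle's PHP code) compares unpadded base64 and base32 encodings of a 16-byte MD5 digest, and shows that lowercase base32 survives an MTA that changes the case of the local part. The helper names `b32`, `b64` and `unb32` are illustrative only.

```python
import base64
import hashlib


def b32(data: bytes) -> str:
    # Unpadded, lowercase base32: only a-z and 2-7, so case folding by an MTA is harmless.
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


def b64(data: bytes) -> str:
    # Unpadded base64: mixes upper- and lowercase letters, so case folding corrupts it.
    return base64.b64encode(data).decode("ascii").rstrip("=")


def unb32(text: str) -> bytes:
    # Restore the stripped padding and decode regardless of letter case.
    return base64.b32decode(text + "=" * (-len(text) % 8), casefold=True)


digest = hashlib.md5(b"example").digest()  # 16 bytes, the size of an (HMAC-)MD5 value
print(len(b64(digest)), len(b32(digest)))  # 22 26

# An MTA that uppercases the address does not break the base32 value.
assert unb32(b32(digest).upper()) == digest
```

The same 16 bytes cost four more characters in base32, which is the space pressure that motivates keeping most data in a session table instead of in the address.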
==Database structures==
A database table will be necessary in order to keep track of the relevant data attached to each email "session". The following design is proposed:
* id : int(10), key field, auto-inc - required by DML
* b32key : int(10), unsigned - session number of the email, encoded in a base32 part of the VERP field. An index has to be created on this to allow for fast lookup when an email is received.
* data : text(medium) - stores data of any length mapped to this email session, probably either a plain ASCII parameter or a serialized object.
* timestamp : int(10), unsigned - time at which the email that registered this session was sent, so that cron can remove "old" sessions after a defined interval, once they are no longer valid.
An entry is inserted into the session table if either bounce handling or the email interface is enabled. The data field must be large enough for sizeable serialized objects; a MySQL MEDIUMTEXT column holds up to 16,777,215 bytes (about 16 MB), which is more than enough.
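A minimal sketch of the proposed table, using Python's built-in sqlite3 module in place of MySQL. The table name `email_sessions`, the JSON string stored in `data` (standing in for a serialized PHP object) and the helpers `create_session`/`find_session` are assumptions for illustration, not Moodle's actual schema or API.

```python
import secrets
import sqlite3
import time

# Hypothetical schema mirroring the proposed fields; SQLite types stand in
# for MySQL's int(10) unsigned and mediumtext.
SCHEMA = """
CREATE TABLE email_sessions (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    b32key    INTEGER NOT NULL,
    data      TEXT,
    timestamp INTEGER NOT NULL
);
CREATE INDEX email_sessions_b32key ON email_sessions (b32key);
"""


def create_session(db: sqlite3.Connection, data: str) -> int:
    """Register a session for an outgoing email and return its random key."""
    key = secrets.randbits(32)  # fits an unsigned 32-bit int(10) column
    db.execute(
        "INSERT INTO email_sessions (b32key, data, timestamp) VALUES (?, ?, ?)",
        (key, data, int(time.time())),
    )
    return key


def find_session(db: sqlite3.Connection, key: int):
    """Look up the data attached to a session key (served by the b32key index)."""
    row = db.execute(
        "SELECT data FROM email_sessions WHERE b32key = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
key = create_session(db, '{"discussion": 42}')
print(find_session(db, key))  # {"discussion": 42}
```

Because keys are random, two sessions could in principle draw the same value; a UNIQUE index with a retry on insert failure would be one way to rule that out.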
Because each email is uniquely identified by a session ID, the problem of [[Email processing#Security_issues|handling repeats]] can be solved by removing the session's row from the table after the handler has run. Replies with no matching row can simply be discarded.
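The handle-once idea could look like the following sketch: read the session and delete its row inside one transaction, and treat a missing row as a repeat. It uses sqlite3 and a hypothetical `consume_session` helper; checking the DELETE's row count guards against a second handler having removed the same row first.

```python
import sqlite3


def consume_session(db: sqlite3.Connection, key: int):
    """Return a session's data and delete its row, so a replayed reply finds nothing."""
    with db:  # the read and the delete commit together
        row = db.execute(
            "SELECT id, data FROM email_sessions WHERE b32key = ?", (key,)
        ).fetchone()
        if row is None:
            return None  # unknown or already-used session: discard the reply
        cur = db.execute("DELETE FROM email_sessions WHERE id = ?", (row[0],))
        if cur.rowcount != 1:
            return None  # another handler consumed this session first
        return row[1]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE email_sessions "
    "(id INTEGER PRIMARY KEY, b32key INTEGER, data TEXT, timestamp INTEGER)"
)
db.execute(
    "INSERT INTO email_sessions (b32key, data, timestamp) "
    "VALUES (7, 'reply-to-post-42', 0)"
)
print(consume_session(db, 7))  # reply-to-post-42
print(consume_session(db, 7))  # None (the repeat is rejected)
```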
==VERP structure==
The structure of the VERP field has been changed several times since coding started - here is the latest version:
* 4 characters: prefix for MTA ("mdl+", etc.)
* 2 characters: encoded (base32 + pack('C')) module id to determine which processing function to call - module id 0 stands for Moodle 'core'.
* 16 characters: encoded (base32 + pack('C')) user id - this is only used for bounce processing, and is not passed upwards to handling functions. Any user identification is expected to be stored within the session data field.
* 16 characters: encoded (base32 + pack('C')) session id - this is used to identify the session row in mdl_email_sessions, and matches the b32key.
* 26 characters: hash validating and securing the VERP data, computed with HMAC-MD5 (its 128-bit digest base32-encodes to exactly 26 characters).
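The layout above can be sketched in Python as follows. The fields add up to 4 + 2 + 16 + 16 + 26 = 64 characters, which is also the maximum local-part length allowed by RFC 5321. Several details are assumptions rather than Moodle's code: user and session ids are packed as 10-byte big-endian integers (so that they fill exactly 16 base32 characters), the HMAC covers the lowercase text of the first 38 characters, and `secret` stands for a site-specific key. Python's `struct.pack("B", n)` is the equivalent of PHP's `pack('C', n)`.

```python
import base64
import hashlib
import hmac
import struct

PREFIX = "mdl+"  # 4-character MTA prefix


def b32(data: bytes) -> str:
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


def unb32(text: str) -> bytes:
    return base64.b32decode(text + "=" * (-len(text) % 8), casefold=True)


def pack_id(value: int) -> str:
    # Assumption: a 10-byte big-endian integer encodes to exactly 16 base32 characters.
    return b32(value.to_bytes(10, "big"))


def build_local_part(secret: bytes, module_id: int, user_id: int, session_id: int) -> str:
    # Module id 0 would stand for Moodle core.
    body = PREFIX + b32(struct.pack("B", module_id)) + pack_id(user_id) + pack_id(session_id)
    mac = hmac.new(secret, body.encode("ascii"), hashlib.md5).digest()
    return body + b32(mac)  # 4 + 2 + 16 + 16 + 26 = 64 characters


def parse_local_part(secret: bytes, local: str):
    """Return (module_id, user_id, session_id), or None if the address was tampered with."""
    local = local.lower()  # tolerate MTAs that change case
    if len(local) != 64 or not local.startswith(PREFIX):
        return None
    body, mac = local[:38], local[38:]
    expected = b32(hmac.new(secret, body.encode("ascii"), hashlib.md5).digest())
    if not hmac.compare_digest(mac, expected):  # constant-time comparison
        return None
    module_id = struct.unpack("B", unb32(body[4:6]))[0]
    user_id = int.from_bytes(unb32(body[6:22]), "big")
    session_id = int.from_bytes(unb32(body[22:38]), "big")
    return module_id, user_id, session_id


secret = b"site-secret"  # hypothetical site key
addr = build_local_part(secret, 3, 1001, 987654321)
print(len(addr))                              # 64
print(parse_local_part(secret, addr.upper()))  # (3, 1001, 987654321)
```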
==Security concerns==
Rather than using the auto-incrementing id field as the session key, which would reduce the amount of data stored per email in the session table, a randomly generated number is used instead. Because the user's identity is stored in the session data rather than in the address, even an attacker who knew the site identifier and could compute a matching hash would be extremely unlikely to know a session key belonging to another user. The user id field in the VERP address is used only for bounce detection, and a bounce is only accepted if it carries a previously generated session key with a matching MD5 hash. This prevents denial-of-service attacks in which forged bounces push a user's email address over its bounce quota.
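The bounce-quota protection described above can be modelled with a small in-memory sketch. `BounceTracker` and its methods are hypothetical, and the HMAC check on the VERP address is assumed to have already passed; the point is that a bounce only counts against a user when it carries a random session key previously issued to that same user.

```python
import secrets
from typing import Dict


class BounceTracker:
    """Hypothetical in-memory model of sessions and per-user bounce counts."""

    def __init__(self) -> None:
        self.sessions: Dict[int, int] = {}  # random session key -> user id
        self.bounces: Dict[int, int] = {}   # user id -> number of recorded bounces

    def new_session(self, user_id: int) -> int:
        # Random rather than sequential keys: seeing one key reveals nothing about others.
        while True:
            key = secrets.randbits(32)
            if key not in self.sessions:
                self.sessions[key] = user_id
                return key

    def record_bounce(self, session_key: int, claimed_user_id: int) -> bool:
        # Count the bounce only if this key was issued to this user; a forged
        # bounce with a guessed key or a different user id is ignored.
        if self.sessions.get(session_key) != claimed_user_id:
            return False
        self.bounces[claimed_user_id] = self.bounces.get(claimed_user_id, 0) + 1
        return True  # the session is kept, since the failure may be temporary


tracker = BounceTracker()
key = tracker.new_session(user_id=5)
print(tracker.record_bounce(key, 5))  # True
print(tracker.record_bounce(key, 6))  # False
```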
==Core functions==
* Framework - non-module code to support framework of the email interface.
** Functions handling email sending - encoding of the reply-to address header.
*** Currently implemented as generate_email_process_address in lib/moodlelib.php.
*** Could be changed to something more specific - e.g. generate_email_session?
** Functions handling email receiving - decoding of the destination (reply-to) header.
*** Currently implemented as the command line mail parsing script admin/process_email.php.
*** This currently handles bounces as well as decoding and calling appropriate modules, and may not need to be changed much.
* Module code - changes to specific modules to actually use the email interface.
** Forum module
*** Subscribers to forums receiving "single messages" can have their emailed replies directly added as a reply to that specific post.
*** Subscribers to forums receiving "digest messages" may need some further interface to specify which post they are replying to, e.g. "3>>Message".
** ''(Investigate other modules)''
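As a rough illustration of the receiving side, the sketch below routes a decoded reply to a per-module handler, with module id 0 reserved for core as in the VERP layout. The registry, the `register` decorator and the Forum module's id of 9 are all hypothetical; admin/process_email.php is PHP and its real structure may differ.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: module id -> handler taking (session data, reply body).
Handler = Callable[[str, str], str]
HANDLERS: Dict[int, Handler] = {}


def register(module_id: int) -> Callable[[Handler], Handler]:
    """Decorator that records a handler under its module id."""
    def wrap(fn: Handler) -> Handler:
        HANDLERS[module_id] = fn
        return fn
    return wrap


@register(0)  # module id 0: Moodle core
def core_handler(data: str, body: str) -> str:
    return f"core: {data}"


@register(9)  # hypothetical id for the Forum module
def forum_handler(data: str, body: str) -> str:
    return f"forum reply to {data}: {body.strip()}"


def dispatch(module_id: int, data: str, body: str) -> Optional[str]:
    """Call the handler for module_id, or return None so the message is discarded."""
    handler = HANDLERS.get(module_id)
    if handler is None:
        return None
    return handler(data, body)


print(dispatch(9, "post 42", "Thanks!\n"))  # forum reply to post 42: Thanks!
```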
==Major issues==
===Handling bounce messages (Delivery Status Notifications)===
The framework can distinguish between messages sent by a user in reply to an email and automated DSN ("bounce") messages sent by an MTA. It does this by checking for a null return path, or for the content types that RFC 1894 defines specifically for DSNs. Once a message has been identified as a DSN, it is simply discarded; the session entry is not removed, because these messages may also be generated when an address is only temporarily unreachable.
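A DSN check along these lines can be written with Python's standard `email` package. This is a heuristic sketch, and `is_dsn` is a hypothetical name: it treats a `Return-Path: <>` header, a `multipart/report` body with `report-type=delivery-status`, or any `message/delivery-status` part as a bounce. (RFC 1894 has since been obsoleted by RFC 3464.)

```python
from email import message_from_string
from email.message import Message


def is_dsn(msg: Message) -> bool:
    """Heuristic: null return path, or a delivery-status report in the message."""
    if msg.get("Return-Path", "").strip() == "<>":
        return True
    report_type = str(msg.get_param("report-type", "")).lower()
    if msg.get_content_type() == "multipart/report" and report_type == "delivery-status":
        return True
    # walk() visits the message itself and every nested MIME part.
    return any(part.get_content_type() == "message/delivery-status" for part in msg.walk())


user_reply = message_from_string(
    "Return-Path: <alice@example.org>\nContent-Type: text/plain\n\nThanks!\n"
)
bounce = message_from_string(
    'Content-Type: multipart/report; report-type=delivery-status; boundary="b"\n\n'
    "--b\nContent-Type: text/plain\n\nDelivery failed.\n"
    "--b\nContent-Type: message/delivery-status\n\n"
    "Reporting-MTA: dns; mx.example.org\n\n"
    "Final-Recipient: rfc822; bob@example.org\nAction: failed\nStatus: 5.1.1\n"
    "--b--\n"
)
print(is_dsn(user_reply), is_dsn(bounce))  # False True
```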
===Pruning the session table===
Each session entry has a timestamp recording when it was created, which can be used to prune all entries older than a certain age. A config variable, $CFG->mailsessiontime, stores the number of hours email sessions persist (defaulting to one month). A cleanup function called from admin/cron.php prunes the session table.
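Pruning can be sketched as a single DELETE against a cutoff computed from `$CFG->mailsessiontime`. The Python function `prune_sessions` and the sqlite3 table are illustrative assumptions; one month is taken here as 720 hours.

```python
import sqlite3
import time
from typing import Optional


def prune_sessions(db: sqlite3.Connection, mailsessiontime_hours: int,
                   now: Optional[int] = None) -> int:
    """Delete sessions created more than mailsessiontime_hours ago; return the count removed."""
    now = int(time.time()) if now is None else now
    cutoff = now - mailsessiontime_hours * 3600
    with db:  # commit the delete as one transaction
        cur = db.execute("DELETE FROM email_sessions WHERE timestamp < ?", (cutoff,))
    return cur.rowcount


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE email_sessions "
    "(id INTEGER PRIMARY KEY, b32key INTEGER, data TEXT, timestamp INTEGER)"
)
db.executemany(
    "INSERT INTO email_sessions (b32key, data, timestamp) VALUES (?, ?, ?)",
    [(1, "old", 0), (2, "recent", 1_000_000)],
)
print(prune_sessions(db, 720, now=3_000_000))  # 1
```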
===Handling HTML===
Some email clients send an HTML version of the message. Should we parse these ourselves, or simply provide functions that allow module code to parse them? Should MIME attachments be handled the same way?
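If the framework did take on HTML parsing, one option is to prefer the text/plain alternative and fall back to stripping tags from text/html. The sketch below does this with Python's `email` package (`get_body` skips attachments) and `html.parser`; `reply_text` is a hypothetical name, and real HTML replies would also need quoted blocks removed.

```python
from email import message_from_string, policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text content of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def reply_text(raw: str) -> str:
    """Return the reply body, preferring text/plain over text/html."""
    msg = message_from_string(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    content = part.get_content()
    if part.get_content_type() == "text/html":
        parser = _TextExtractor()
        parser.feed(content)
        parser.close()  # flush any buffered text
        content = "".join(parser.chunks)
    return content.strip()


alternative = (
    'MIME-Version: 1.0\nContent-Type: multipart/alternative; boundary="x"\n\n'
    "--x\nContent-Type: text/plain\n\nPlain reply\n"
    "--x\nContent-Type: text/html\n\n<p>HTML reply</p>\n--x--\n"
)
html_only = "Content-Type: text/html\n\n<p>Hello <b>there</b></p>\n"
print(reply_text(alternative))  # Plain reply
print(reply_text(html_only))    # Hello there
```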
===Backwards compatibility===
Code implementing an email interface (using base64) and email bounce handling is already in place. It has been retained, and the new code added alongside it, so that existing modules depending on it do not need to be modified.
===Digest messages for the Forum module===
Users can choose for messages to be queued up in a single "digest" that is sent to them regularly. A method needs to be chosen to determine which message a reply to this digest is intended for.
==User interface ideas==
* In "digest" style emails, we need to be able to identify which discussion the reply is intended for.
** HTML-formatted emails can contain mailto: links with a pre-filled body identifying the discussion the reply is intended for.
** Plain text emails could be handled by requiring the user to type a marker (e.g. ">>1" to refer to message one) before their post.
** Plain text emails can include separators between posts, with the reply inserted after the correct separator, e.g.:
<pre>
(Original email)
!Message 1 Separator!
Text of message one goes here.
!Message 2 Separator!
Text of message two goes here.

(Replied email)
> !Message 1 Separator!
> Text of message one goes here.
User inserts message one reply here if he wants to reply to this specific message.
> !Message 2 Separator!
> Text of message two goes here.
</pre>
* We need to be able to separate the "reply" of the email from the quote and any prefix added by the mail client.
** HTML emails could contain marker elements (e.g. hr elements with id attributes identifying the current discussion), and the user would write their reply between them.
** Plain text email could also emulate the above with a rather more crude "insert reply below this line" approach.
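The separator approach for plain-text digests could be parsed roughly as follows: a quoted separator line switches the current message number, other quoted lines are dropped, and the user's unquoted lines are collected under the most recent separator. The `!Message N Separator!` format comes from the example above; the function name `split_digest_reply` and the rule that text before the first separator is ignored are assumptions.

```python
import re
from typing import Dict, List, Optional

# Matches a quoted separator such as "> !Message 2 Separator!".
SEPARATOR = re.compile(r"^>\s*!Message (\d+) Separator!\s*$")


def split_digest_reply(body: str) -> Dict[int, str]:
    """Map message number -> the user's unquoted text written after its separator."""
    replies: Dict[int, List[str]] = {}
    current: Optional[int] = None
    for line in body.splitlines():
        match = SEPARATOR.match(line)
        if match:
            current = int(match.group(1))
            continue
        if line.startswith(">") or current is None:
            continue  # quoted original text, or text before the first separator
        replies.setdefault(current, []).append(line)
    result: Dict[int, str] = {}
    for number, lines in replies.items():
        text = "\n".join(lines).strip()
        if text:  # ignore separators the user left unanswered
            result[number] = text
    return result


reply = (
    "> !Message 1 Separator!\n"
    "> Text of message one goes here.\n"
    "My reply to one.\n"
    "> !Message 2 Separator!\n"
    "> Text of message two goes here.\n"
)
print(split_digest_reply(reply))  # {1: 'My reply to one.'}
```

Under this rule a top-posted reply (written above all quoted text) would be ignored, so the instructions sent with the digest would need to tell users where to type.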
==Timeline and current progress==
July 9th was used as a milestone for completion of the framework and some changes to the Forum module - at this stage, there was a working prototype of the framework and appropriate Forum module code, but many further refinements to the framework could still be made, specifically parsing and utility functions that all modules could use to help separate data in returned email bodies from redundant lines.
Another milestone should be reached by the end of July - a working prototype of the framework and the Forum module processing functions that:
* Correctly sends emails with appropriate bodies instructing users how to use the interface
* Recognises responses to single-post mode emails, strips them of extraneous content, and posts them as replies to that specific post (preserving the post's title if necessary).
* Recognises responses to digest mode emails, including which post each reply is intended for, strips them of extraneous content and posts them as replies to that specific post.
* Some sort of response email for replies to the email interface that encounter errors (such as the email session having expired).
''(This will not necessarily be bug-free!)''
The time until August 20th can be used to test the code, fix/debug it, and hopefully write a few more handlers for other modules - once the framework utility functions are in place, this should be fairly easy!
Back to: [[Student projects]]

Latest revision as of 03:54, 15 September 2011

This development related page is now located in the Dev docs.

See the Student projects/Email interface page in the Dev docs.