Scheduled Tasks Proposal

From MoodleDocs

Revision as of 05:16, 5 January 2010

Moodle 2.0

Introduction

This proposal is meant both to provide a replacement for the Moodle cron job and to provide a means of scheduling one-off tasks to run outside the user's request lifecycle.

Terminology

  • *Subtask*: an individual piece of cron processing that should be run (equivalent to what forum_cron does now, or maybe even smaller)
  • *Moodle cron instance*: a single cron.php process

Rationale

The Moodle cron job currently delegates all scheduling to each subtask that is run. For example, the forum cron is responsible for checking when it last ran and deciding whether it should run again. This decision process should be centralised, and individual cron subtasks should be called by a central controller.

Additionally, there is no central locking of subtasks. Some subtasks that expect to take a long time to run (for example, statistics) implement their own locking, but this is not centralised. Each Moodle cron instance runs to completion no matter how long it takes, and it processes tasks in the order in which they are programmed, regardless of whether other Moodle cron instances are running that might be processing subtasks in parallel.

Finally, we need to be able to run unrelated tasks in parallel so that the entire Moodle queue isn't held up by a single long-running job.

Goals

  • Centralised locking for all tasks
  • A consistent way for all plugin types to register with Moodle (at installation/upgrade) when they want their jobs run
  • More sophisticated scheduling than just intervals in seconds (e.g. every Sunday at 11pm), based on Unix cron
  • An administration screen in Moodle to allow site administrators to adjust the scheduling of individual tasks

Approach

Plugin cron registration

Each plugin will be able to provide a db/tasks.php (alongside access.php, events.php, etc.) that lists all the cron jobs it wants run. This will look something like the following:

<?php
$tasks = array(
    array(
        'function'    => 'yourmodule_cron_somedescription',
        'minute'      => '*',
        'hour'        => '*',
        'day'         => '*',
        'month'       => '*',
        'dayofweek'   => '*',
        'description' => 'langstringkey', // this must correspond to get_string('langstringkey', 'yourmodule');
    ),
);
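As a sketch of what Moodle might do with this file at install/upgrade time, the function below checks a $tasks array before its entries are stored. The name normalise_task_definitions is hypothetical, not an existing Moodle API. Rejecting duplicate 'function' values follows from callfunction being unique in the scheduled_tasks table; defaulting an omitted schedule field to '*' is an assumption, not something the proposal specifies.

```php
<?php
// Hypothetical helper, not an existing Moodle API: checks the $tasks array
// from a plugin's db/tasks.php before its entries are stored.
function normalise_task_definitions(array $tasks): array {
    $timefields = array('minute', 'hour', 'day', 'month', 'dayofweek');
    $seen = array();
    $normalised = array();
    foreach ($tasks as $index => $task) {
        if (empty($task['function']) || !is_string($task['function'])) {
            throw new InvalidArgumentException("Task $index has no 'function'");
        }
        $function = $task['function'];
        // scheduled_tasks.callfunction is unique because it doubles as the lock name.
        if (isset($seen[$function])) {
            throw new InvalidArgumentException("Duplicate task function '$function'");
        }
        $seen[$function] = true;
        if (empty($task['description'])) {
            throw new InvalidArgumentException("Task '$function' has no 'description'");
        }
        // Assumption: an omitted schedule field means '*' (every value).
        foreach ($timefields as $field) {
            $task[$field] = isset($task[$field]) ? trim((string)$task[$field]) : '*';
        }
        $normalised[] = $task;
    }
    return $normalised;
}
```

For example, a definition that only sets 'hour' => '3' would come back with '*' in the other four schedule fields.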

The fields follow normal Unix cron syntax, except that the three-letter names that Unix cron accepts in the month and day-of-week fields are not supported. The following is taken straight from the Unix crontab(5) man page:


              field          allowed values
              -----          --------------
              minute         0-59
              hour           0-23
              day of month   1-31
              month          1-12 (or names, see below)
              day of week    0-7 (0 or 7 is Sun, or use names)

       A field may be an asterisk (*), which always stands for ``first-last''.

       Ranges of numbers are allowed. Ranges are two numbers separated with a hyphen.
       The specified range is inclusive. For example, 8-11 for an ``hours'' entry specifies
       execution at hours 8, 9, 10 and 11.

       Lists are allowed. A list is a set of numbers (or ranges) separated by commas.
       Examples: ``1,2,5,9'', ``0-4,8-12''.

       Step values can be used in conjunction with ranges. Following a range with ``/<number>''
       specifies skips of the number's value through the range. For example,
       ``0-23/2'' can be used in the hours field to specify command execution every other hour
       (the alternative in the V7 standard is ``0,2,4,6,8,10,12,14,16,18,20,22''). Steps
       are also permitted after an asterisk, so if you want to say ``every two hours'', just use ``*/2''.
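To make the syntax concrete, here is a minimal sketch that expands these fields and finds the next matching minute, which is one way the nextruntime column could be filled in. All function names are hypothetical. Names for months and weekdays are rejected, as the proposal requires; 7 is treated as Sunday alongside 0; and when both day-of-month and day-of-week are restricted, either may match, as in Unix cron. Times are read in UTC with gmdate() purely to keep the example deterministic, and the next-run search is a simple minute-by-minute scan.

```php
<?php
// Hypothetical sketch (not Moodle code): numeric cron fields only, no
// month/day names. Times are read in UTC with gmdate() to keep the example
// deterministic; a real implementation would have to pick a timezone.

// Expands one field such as '*', '*/2', '8-11', '1,2,5,9' or '0-4,8-12'
// into the sorted list of integers it covers.
function cron_expand_field(string $spec, int $min, int $max): array {
    $values = array();
    foreach (explode(',', $spec) as $part) {
        if (!preg_match('/^(\*|\d+|\d+-\d+)(?:\/(\d+))?$/', $part, $m)) {
            throw new InvalidArgumentException("Invalid cron field '$spec'");
        }
        $step = isset($m[2]) ? (int)$m[2] : 1;
        if ($m[1] === '*') {
            list($start, $end) = array($min, $max);
        } elseif (strpos($m[1], '-') !== false) {
            list($start, $end) = array_map('intval', explode('-', $m[1]));
        } else {
            $start = $end = (int)$m[1];
        }
        if ($step < 1 || $start < $min || $end > $max || $start > $end) {
            throw new InvalidArgumentException("Out-of-range cron field '$spec'");
        }
        for ($v = $start; $v <= $end; $v += $step) {
            $values[$v] = true;
        }
    }
    $result = array_keys($values);
    sort($result);
    return $result;
}

// Pre-expands all five fields into lookup sets.
function cron_parse_schedule(array $task): array {
    $dow = cron_expand_field($task['dayofweek'], 0, 7);
    if (in_array(7, $dow, true)) {
        $dow[] = 0; // 0 and 7 both mean Sunday
    }
    return array(
        'minute' => array_flip(cron_expand_field($task['minute'], 0, 59)),
        'hour'   => array_flip(cron_expand_field($task['hour'], 0, 23)),
        'day'    => array_flip(cron_expand_field($task['day'], 1, 31)),
        'month'  => array_flip(cron_expand_field($task['month'], 1, 12)),
        'dow'    => array_flip($dow),
        // Unix cron rule: if neither day field starts with '*', either may match.
        'dayor'  => $task['day'][0] !== '*' && $task['dayofweek'][0] !== '*',
    );
}

function cron_schedule_matches(array $s, int $time): bool {
    if (!isset($s['minute'][(int)gmdate('i', $time)], $s['hour'][(int)gmdate('G', $time)],
               $s['month'][(int)gmdate('n', $time)])) {
        return false;
    }
    $daymatch = isset($s['day'][(int)gmdate('j', $time)]);
    $dowmatch = isset($s['dow'][(int)gmdate('w', $time)]);
    return $s['dayor'] ? ($daymatch || $dowmatch) : ($daymatch && $dowmatch);
}

// First whole minute strictly after $after that matches, or null if none
// is found within $maxdays (brute force, kept simple on purpose).
function cron_next_run_time(array $task, int $after, int $maxdays = 366): ?int {
    $s = cron_parse_schedule($task);
    $limit = $after + $maxdays * 86400;
    for ($t = $after - ($after % 60) + 60; $t <= $limit; $t += 60) {
        if (cron_schedule_matches($s, $t)) {
            return $t;
        }
    }
    return null;
}
```

For example, a task with minute '0', hour '23' and dayofweek '0' (every Sunday at 11pm) gets a nextruntime of the following Sunday at 23:00 UTC.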

Database

scheduled_tasks:

Field          Datatype              Comment
-----          --------              -------
id             integer               sequence
plugintype     varchar(50)           plugin type; should match the path-style declarations in get_plugin_types (e.g. question/type, not qtype). Null for core tasks.
pluginname     varchar(50)           name of the plugin. Null for core tasks.
callfunction   varchar(200), unique  the function to call. Must be unique, as it will be used for locking.
lastruntime    int(10)               Unix timestamp
nextruntime    int(10)               Unix timestamp
blocking       int(1)                0 or 1: whether this task, when running, blocks everything else from running.

The original database specification had extra custom fields (custom1, custom2 and so on) that the cron jobs could write information into. I've removed these from this specification until we have a solid use case for them. The original spec also had a priority field, but this was removed after the conversation in Jizerka, which led to the proposal to simply let some core jobs block all others rather than prioritising individual tasks.

Locking

Penny and Tim originally thought that the best approach was to do something that would cause an exception to be thrown, for example trying to insert a row that violates a unique constraint and catching the exception. However, this would cause far too much noise in the logs. Matt Oquist came up with a different approach in MDL-21110. We could change this slightly to allow alternative implementations by means of an abstract class and a factory method (as suggested by Sam Marshall), but this probably isn't needed for the initial implementation.
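A minimal sketch of the abstract class and factory method idea follows. The class names cron_lock_factory and memory_cron_lock_factory are hypothetical, and this does not reproduce Matt Oquist's code from MDL-21110. It shows two properties: a failed lock attempt simply returns false, so contention produces no exceptions or log noise; and each lock carries an expiry so that a crashed run cannot hold it forever (the expiry is an assumption, not part of the proposal). The in-memory implementation only works within one PHP process; a usable one would have to be shared by every cron.php process.

```php
<?php
// Hypothetical sketch, not Moodle code: an abstract lock API plus a factory
// method, so that alternative implementations can be swapped in later.
abstract class cron_lock_factory {
    // Tries to take the named lock for at most $maxseconds.
    // Returns false (never throws) if someone else holds it.
    abstract public function get_lock(string $name, int $maxseconds): bool;

    abstract public function release_lock(string $name): void;

    // Factory method: picks an implementation by name.
    public static function create(string $type = 'memory'): self {
        switch ($type) {
            case 'memory':
                return new memory_cron_lock_factory();
            default:
                throw new InvalidArgumentException("Unknown lock type '$type'");
        }
    }
}

// Single-process implementation, for illustration and tests only.
class memory_cron_lock_factory extends cron_lock_factory {
    /** @var int[] lock name => Unix time at which the lock expires */
    private $locks = array();

    public function get_lock(string $name, int $maxseconds): bool {
        $now = time();
        // A lock past its expiry is treated as abandoned (e.g. a crashed run).
        if (isset($this->locks[$name]) && $this->locks[$name] > $now) {
            return false;
        }
        $this->locks[$name] = $now + $maxseconds;
        return true;
    }

    public function release_lock(string $name): void {
        unset($this->locks[$name]);
    }
}
```

Callers would only ever see cron_lock_factory, so a database-backed or file-based implementation could be added behind create() without changing them.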


Black magic

Cron.php will need to be rewritten to look something like this:

<?php
while ($nexttask = cron_get_next_task()) {
    cron_call_function($nexttask);
}

Behind cron_get_next_task() sits some black magic that hands out the next task. It:

  • Checks how long the existing process is allowed to run for
  • Figures out if there's already a "blocking" task running
  • Figures out the next task that's scheduled
  • Tries to get a lock on it
  • Returns that task
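The five steps above can be sketched as follows. The class cron_task_dispatcher and its methods are hypothetical; tasks are plain arrays shaped loosely like scheduled_tasks rows, and a shared ArrayObject stands in for whichever locking mechanism is chosen, so two dispatcher objects sharing it behave like two Moodle cron instances. The current time is passed in so the logic can be tested.

```php
<?php
// Hypothetical sketch of the steps above, not Moodle code. The shared
// ArrayObject stands in for the real locking mechanism.
class cron_task_dispatcher {
    private $tasks = array();   // callfunction => task row
    private $locks;             // shared by all "cron instances": callfunction => true
    private $starttime;
    private $maxruntime;

    public function __construct(array $tasks, ArrayObject $locks, int $starttime, int $maxruntime) {
        foreach ($tasks as $task) {
            $this->tasks[$task['callfunction']] = $task;
        }
        $this->locks = $locks;
        $this->starttime = $starttime;
        $this->maxruntime = $maxruntime;
    }

    // Returns the next task to run, or null when this process should stop.
    public function get_next_task(int $now): ?array {
        // 1. Has this process used up the time it is allowed to run for?
        if ($now - $this->starttime >= $this->maxruntime) {
            return null;
        }
        // 2. Is a blocking task already running (i.e. holding its lock)?
        foreach ($this->tasks as $name => $task) {
            if (!empty($task['blocking']) && isset($this->locks[$name])) {
                return null;
            }
        }
        // 3. Find the tasks that are due, earliest nextruntime first.
        $due = array_filter($this->tasks, function ($task) use ($now) {
            return $task['nextruntime'] <= $now;
        });
        uasort($due, function ($a, $b) {
            return $a['nextruntime'] <=> $b['nextruntime'];
        });
        // 4. Try to lock each candidate in turn; 5. return the first one locked.
        // (A real lock must make this check-and-set atomic across processes.)
        foreach ($due as $name => $task) {
            if (!isset($this->locks[$name])) {
                $this->locks[$name] = true;
                return $task;
            }
        }
        return null;
    }

    // Records the run, schedules the next one and releases the lock.
    public function finish_task(string $name, int $now, int $nextruntime): void {
        $this->tasks[$name]['lastruntime'] = $now;
        $this->tasks[$name]['nextruntime'] = $nextruntime;
        unset($this->locks[$name]);
    }
}
```

In cron.php the loop would then call get_next_task(time()) until it returns null, calling each task's function and then finish_task() with a freshly computed nextruntime. This sketch only stops new work while a blocking task runs; whether a blocking task should also wait for already-running tasks to finish is left open.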

Unresolved issues/ideas

  • Do we need to allow different subtasks to be scheduled on different servers?
  • We need a way to separate subtasks so that subtasks that write to the same areas of the database never run at the same time. We could do this by having each cron job declare which areas of Moodle it writes to, but this is problematic.
  • We also have to deal with the order of some subtasks; we could perhaps do this by introducing dependencies.
  • When cron runs for the first time in a long while, we should lock the entire cron and let it run to completion, because the order of subtasks really matters then.
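If dependencies were introduced, a topological sort would give a run order that respects them. The sketch below uses Kahn's algorithm; the function name and the idea of each subtask listing the callfunctions it depends on are hypothetical and not part of the tasks.php format above. A dependency cycle is reported as an error rather than silently ignored.

```php
<?php
// Hypothetical sketch: $dependencies maps each subtask's callfunction to the
// callfunctions that must run before it. Kahn's algorithm (topological sort).
function cron_order_by_dependencies(array $dependencies): array {
    $indegree = array();    // task => number of unfinished prerequisites
    $dependents = array();  // task => tasks waiting on it
    foreach ($dependencies as $task => $deps) {
        $deps = array_unique($deps);
        $indegree[$task] = count($deps);
        foreach ($deps as $dep) {
            if (!array_key_exists($dep, $dependencies)) {
                throw new InvalidArgumentException("$task depends on unknown task $dep");
            }
            $dependents[$dep][] = $task;
        }
    }
    // Start with the subtasks that depend on nothing, in declaration order.
    $ready = array_keys(array_filter($indegree, function ($n) {
        return $n === 0;
    }));
    $order = array();
    while ($ready) {
        $task = array_shift($ready);
        $order[] = $task;
        foreach ($dependents[$task] ?? array() as $next) {
            if (--$indegree[$next] === 0) {
                $ready[] = $next;
            }
        }
    }
    // Anything left over is part of a cycle and can never be scheduled.
    if (count($order) !== count($dependencies)) {
        throw new RuntimeException('Circular dependency between subtasks');
    }
    return $order;
}
```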

Pseudocode proposal

Moved to the talk page

Audit of current cron

Main section               Subtask                                Frequency               Notes
------------               -------                                ---------               -----
session_gc                                                        every run
mod/assignment             plugins (none)                         every minute
mod/assignment             message submissions                    every minute            checks last run time
mod/chat                   update chat times                      every five minutes
mod/chat                   update_events                          every five minutes
mod/chat                   delete old chat_users and add quits    every five minutes
mod/chat                   delete old messages                    every five minutes
mod/data                                                          every minute            no _cron function (includes file unnecessarily)
mod/forum                  mail posts                             every minute            checks last run time
mod/forum                  digest processing                      every minute
mod/forum                  delete old read tracking               every minute
mod/scorm                  reparse all scorms                     every five minutes      does hourly checking
mod/wiki                   delete expired locks                   every hour
blocks/rss_client          update feeds                           every five minutes
quiz/report/statistics     delete old statistics                  every run
admin/reports              none                                   every run
language_cache                                                    every run
remove expired enrolments                                         every run
main gradebook             lock pending grades (*2)               every run
main gradebook             clean old grade history                every run               has a TODO to not process as often
event queue                                                       every run               potentially large
portfolio cron             clean expired exports                  every run               potentially large
longtimenosee                                                     20%
deleteunconfirmedusers                                            20%
deleteincompleteusers                                             20%
deleteoldlogs                                                     20%
deletefiltercache                                                 20%
notifyloginfailures                                               20%
metacourse syncing                                                20%
createpasswordemails                                              20%
tag cron                                                          20%
clean contexts                                                    20%
gc_cache_flags                                                    20%
build_context_path                                                20%
scheduled backups                                                 daily (admin defined)
make rss feeds                                                    every run
auth/mnet                  keepalives                             every run
auth/mnet                  delete old sessions                    every run
auth/ldap                  sync users                             custom                  not scheduled (external cronjob)
auth/cas                   sync users                             custom                  not scheduled (external cronjob)
auth/db                    sync users                             custom                  not scheduled (external cronjob)
enrol/authorize            clears old data                        daily (admin defined)
enrol/authorize            notifies administrators of old data    daily (admin defined)
enrol/authorize            process orders & email teachers        every run
enrol/flatfile             read file and sync users               every run               !?!
enrol/imsenterprise        read file and sync users               every run               !?!?!
enrol/manual               notify people of pending unenrolments  daily
statistics                                                        daily (admin defined)   huge
grade/import               none                                   every run
grade/export               none                                   every run
grade/reports              none                                   every run
fetch blog entries                                                every run
file gc (optional)                                                daily, else every run?
local cron

Tasks

  • Audit all existing cronjobs (done)
  • Implement the locking code, either Matt's or something similar, and write robust tests for it (this may be hard to test; can we test race conditions using SimpleTest?)
  • Write the black magic that hands out the next task to be run for a given cronjob
  • Rewrite cron.php to use the black magic
  • Migrate all the existing cronjobs to the new system
  • Test thoroughly