Scheduled Tasks Proposal
Moodle 2.0
Status
This work is now complete (Moodle 2.7) and the API docs for the new system are here: https://docs.moodle.org/dev/Task_API
Introduction
This proposal is meant both to provide a replacement for the Moodle cron job and to provide a means of scheduling once off tasks to be run outside of the user's request lifecycle.
Terminology
- Subtask: an individual piece of cron processing that should be run (equivalent to forum_cron now, or maybe even smaller)
- Moodle cron instance: a cron.php process
Rationale
The moodle cronjob currently delegates all scheduling to each subtask that is run. For example, the forum cron is responsible for checking when it last ran, and making decisions about whether or not it should be run again. This sort of decision process should be centralised, and individual cron subtasks should be called by the central controller.
Additionally, there is no central locking of subtasks. At the moment, some subtasks that expect that they might take a long time to run implement their own locking (for example statistics), but it's not centralised. Each moodle cron instance runs to completion, no matter how long it takes, and it processes tasks in the order that they're programmed, regardless of whether any other moodle cron instances are running that might be processing subtasks in parallel.
The existing events API seems like it should provide a way to schedule tasks to be run outside of a user's request cycle, but in reality this just adds to the existing cron problem. We need to have a way in Moodle to schedule once off tasks to be run "at the next available time" which are picked up by cron. This can be used to process the event queue, but also for some code to just register a new once off cron event "on the fly" and be picked up on the next run.
Finally, we need to be able to run non-related tasks in parallel so that the entire moodle queue isn't held up by single long running jobs.
Goals
- Centralised locking for all tasks
- A consistent way for all plugin types to register with Moodle (at installation/upgrade) when they want their jobs run
- More sophisticated scheduling rather than just intervals in seconds (eg every sunday at 11pm or similar) based on unix cron
- An administration screen in Moodle to allow site administrators to adjust the scheduling of individual tasks
- An easy way for core and module code to schedule a once off task to be run as soon as possible
Approach
Plugin cron registration
Each plugin will be able to provide a db/tasks.php (alongside access.php and events.php etc) that lists all the cronjobs that it wants to have run. This will look something like the following:
    <?php
    $tasks = array(
        array(
            'function' => 'yourmodule_cron_somedescription',
            'minute' => '*',
            'hour' => '*',
            'day' => '*',
            'month' => '*',
            'dayofweek' => '*',
            'description' => 'langstringkey', // this must correspond to get_string('langstringkey', 'yourmodule');
        ),
    );
The fields are the same as normal unix cron, with the exception that you cannot use 3 letter words for the month and day of week fields like you can for unix cron. The following is straight from the unix manpage about cron:
    field          allowed values
    -----          --------------
    minute         0-59
    hour           0-23
    day of month   1-31
    month          1-12 (or names, see below)
    day of week    0-7 (0 or 7 is Sun, or use names)
A field may be an asterisk (*), which always stands for ``first-last''.
Ranges of numbers are allowed. Ranges are two numbers separated with a hyphen. The specified range is inclusive. For example, 8-11 for an ``hours'' entry specifies execution at hours 8, 9, 10 and 11.
Lists are allowed. A list is a set of numbers (or ranges) separated by commas. Examples: ``1,2,5,9'', ``0-4,8-12''.
Step values can be used in conjunction with ranges. Following a range with ``/<number>'' specifies skips of the number's value through the range. For example, ``0-23/2'' can be used in the hours field to specify command execution every other hour (the alternative in the V7 standard is ``0,2,4,6,8,10,12,14,16,18,20,22''). Steps are also permitted after an asterisk, so if you want to say ``every two hours'', just use ``*/2''.
The unix crontab manpage goes on to say that one can use 3 letter words in the month and dayofweek fields (eg Sun or Feb). I don't think this is necessary for our implementation.
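To make the field syntax concrete, here is a minimal sketch of how a single field could be matched against a value. The function name cron_field_matches and its parameters are hypothetical and are not part of the proposal. It covers only the numeric forms described above (asterisk, ranges, lists and steps) and does not treat 7 as an alias for Sunday.

```php
<?php
// Hypothetical helper (not from the proposal): does one cron field
// expression match a given value? $min/$max are the field's bounds,
// e.g. 0 and 59 for minutes.
function cron_field_matches(string $expr, int $value, int $min, int $max): bool {
    // A list is a set of numbers or ranges separated by commas.
    foreach (explode(',', $expr) as $part) {
        // Split off an optional step, e.g. '0-23/2' or '*/2'.
        $pieces = explode('/', $part, 2);
        $range = $pieces[0];
        $step = isset($pieces[1]) ? (int) $pieces[1] : 1;
        if ($step < 1) {
            continue; // Ignore malformed steps in this sketch.
        }
        if ($range === '*') {
            // An asterisk stands for first-last.
            $start = $min;
            $end = $max;
        } else if (strpos($range, '-') !== false) {
            // Inclusive range 'a-b'.
            list($start, $end) = array_map('intval', explode('-', $range, 2));
        } else {
            // Single number.
            $start = $end = (int) $range;
        }
        if ($value >= $start && $value <= $end && ($value - $start) % $step === 0) {
            return true;
        }
    }
    return false;
}
```

A task would be due when all five of its fields match the current minute, hour, day, month and day of week; how day of month and day of week combine is left open here.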
Database
scheduled_tasks:
Field | Datatype | Comment |
---|---|---|
id | integer | sequence |
component | char(255) | the component that defines this task (e.g. used for uninstall) |
classname | char(255) (unique) | the class that extends the scheduled_task. Must be unique, as it will be used for the locking. Autoloading will be used to find the class to load. |
lastruntime | int(10) | unix timestamp |
nextruntime | int(10) | unix timestamp |
blocking | int(1) | 0 or 1 - whether this task, when running, blocks everything else from running. |
minute | varchar(25) | |
hour | varchar(25) | |
day | varchar(25) | |
month | varchar(25) | |
dayofweek | varchar(25) | |
faildelay | integer(10) | Used to throttle failing tasks. |
customised | integer(1) | 0 or 1 - whether this time differs from what is in code |
This table is for all the normal, regularly scheduled tasks. The time fields are initially populated when a plugin is installed or upgraded, but can be overridden by an administrator, which sets the "customised" flag to 1. If the administrator later decides to revert their customisation, the original values from code are repopulated into this table.
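The override and revert behaviour of the "customised" flag might be sketched as follows. This is only an illustration: rows are plain arrays shaped like scheduled_tasks records, and the names SCHEDULE_FIELDS, task_apply_override and task_reset_to_default are hypothetical.

```php
<?php
// Hypothetical sketch: the five time fields of a scheduled_tasks row.
const SCHEDULE_FIELDS = ['minute', 'hour', 'day', 'month', 'dayofweek'];

// Apply an administrator's override and flag the row as customised.
function task_apply_override(array $row, array $override): array {
    foreach (SCHEDULE_FIELDS as $field) {
        if (isset($override[$field])) {
            $row[$field] = $override[$field];
        }
    }
    $row['customised'] = 1;
    return $row;
}

// Restore the values declared in code and clear the flag.
function task_reset_to_default(array $row, array $defaults): array {
    foreach (SCHEDULE_FIELDS as $field) {
        $row[$field] = $defaults[$field];
    }
    $row['customised'] = 0;
    return $row;
}
```

An install or upgrade step could then skip any row whose customised flag is 1, so that administrator changes survive upgrades. That behaviour is an assumption, as the text above does not spell it out.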
adhoc_tasks:
Field | Datatype | Comment |
---|---|---|
id | integer | sequence |
component | char(255) | the component that defines this task (e.g. used for uninstall) |
classname | char(255) | the class that extends the adhoc_task. Not unique in this table; the lock uses classname combined with id. Autoloading will be used to find the class to load. |
nextruntime | int(10) | unix timestamp |
customdata | text | any data or serialised information |
blocking | int(1) | 0 or 1 - whether this task, when running, blocks everything else from running. |
This table is for the once off tasks, which are run and then deleted. There isn't a unique constraint on 'classname' in this table, because the same once off task may be scheduled twice before the first one is processed. In this case, the named lock will be obtained using a combination of classname and the id.
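As an illustration of the two locking schemes, the lock name could be derived as below. The function name task_lock_key and the 'scheduled_' and 'adhoc_' prefixes are assumptions made for this sketch; the proposal only says that scheduled tasks lock on classname and once off tasks on classname combined with id.

```php
<?php
// Hypothetical helper: build the named-lock key for a task row.
// Scheduled tasks have a unique classname, so that alone is enough.
// Once off tasks may be queued twice, so their row id is appended.
function task_lock_key(array $row, bool $adhoc): string {
    if ($adhoc) {
        return 'adhoc_' . $row['classname'] . '_' . $row['id'];
    }
    return 'scheduled_' . $row['classname'];
}
```

Two queued copies of the same once off task then get distinct keys, so one cron instance can process the first while another picks up the second.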
The original database specification had extra custom fields in the main scheduled_tasks table, called custom1, custom2 and so on, that the cronjobs could insert information into. I've removed these from this specification until such time as we have a solid use-case for them. We certainly need something like them for the once off tasks, but in that case I think it's better to just have a text field that can contain either a single value or a serialised blob of information.
The original spec also had a priority field, but this has been removed after the conversation in Jizerka, which led to the proposal to just let some jobs block all others, rather than prioritising individual tasks. This decision may be later reverted especially for once off tasks, where some may be really urgent.
Locking
Penny and Tim originally thought that the best approach was to try and do something that would cause an exception to be thrown - for example, try to insert into a row that had a unique constraint on it, and catch the exception. However, this will cause far too much noise in the logs. Matt Oquist came up with a different approach in MDL-21110. We could potentially change this slightly to allow alternative implementations, by means of an abstract class and factory method (this was suggested by Sam Marshall), but this probably isn't needed for the initial implementation.
Black magic
Cron.php will need to be rewritten to look something like this:
    <?php
    while (!major_change_happened() && ($nexttask = cron_get_next_task())) {
        cron_call_function($nexttask);
    }
With some black magic to hand out the next task, which does:
- Checks how long the existing process is allowed to run for
- Figures out if there's already a "blocking" task running
- Figures out the next task that's scheduled
- Tries to get a lock on it
- Returns that task
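The steps above can be sketched over in-memory data as follows. This is not the proposed implementation: the function name sketch_get_next_task, its parameters, and the use of a plain array in place of real named locks are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch of the "black magic".
// $tasks: rows with 'classname', 'nextruntime' and 'blocking' keys.
// $locks: classname => true for every lock currently held
//         (a stand-in for a real locking mechanism).
// $now / $deadline: unix timestamps; $deadline is when this cron
//         process must stop picking up new work.
function sketch_get_next_task(array $tasks, array &$locks, int $now, int $deadline) {
    // 1. Stop if this cron process has used up its allowed run time.
    if ($now >= $deadline) {
        return null;
    }
    // 2. Stop if a blocking task is currently running.
    foreach ($tasks as $task) {
        if ($task['blocking'] && isset($locks[$task['classname']])) {
            return null;
        }
    }
    // 3. Find the tasks that are due, most overdue first.
    $due = array_filter($tasks, function ($task) use ($now) {
        return $task['nextruntime'] <= $now;
    });
    usort($due, function ($a, $b) {
        return $a['nextruntime'] <=> $b['nextruntime'];
    });
    foreach ($due as $task) {
        // 4. Try to get a lock; skip tasks another process already holds.
        if (isset($locks[$task['classname']])) {
            continue;
        }
        $locks[$task['classname']] = true;
        // 5. Return the locked task to the caller.
        return $task;
    }
    return null;
}
```

The sketch leaves open whether a blocking task must also wait for already-running tasks to finish before it starts, which the list above does not specify.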
Note about "major_change_happened()": this exists because of the following situation. Task A starts and takes an hour to complete. Because it runs for an hour, any API using static caches will hold records that are up to an hour stale by the time the task finishes. If this cron process just starts the next task in the queue, and something major has happened in the meantime (e.g. a course was deleted or lots of enrolments were added), it may do a lot of wrong things due to the stale data in its static caches. The partial solution we came up with is to add a function that can be called to mark the time of the last "major" change; then, before starting any new task, we check whether any major changes have happened since this cron started. If there were major changes, we can just exit, which will clear the static caches for this cron process, and the next one will continue from where we were in the queue. We can implement this with a record in the config table storing the timestamp of the last major change.
- I am not so sure major_change_happened is the right abstraction. It sounds tempting, but how on earth could one implement it reliably? An alternative is to change it to while (time() < $timestart + CRON_TIME_LIMIT) { ... }, where CRON_TIME_LIMIT is about 1 minute. Then exit the script with either 0, if there are no tasks waiting, or 1, if there are still tasks in the queue. Then, ideally, one would run cron in a wrapper script that immediately re-runs it if the exit code is 1, or waits the appropriate delay if it is 0. That is probably simpler overall.--Tim Hunt (talk) 16:41, 5 December 2013 (WST)
- The problem with a wrapper script is that it is not easy for people on shared hosting etc. to run one, whereas now they have the option of hitting the cron url via some other automated process to trigger cron. Adding retry logic to that makes things too complicated. Also, time to run is not really the important factor: a quick enrolment change could be enough to invalidate the data in the static caches, and then you want to force those caches to be purged before running any more cron jobs.--Damyon Wiese (talk) 19 February 2014 (WST)
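For reference, the timestamp-based check described in the note above could look roughly like this. It is a sketch under stated assumptions: the $config array stands in for the config table, the key name lastmajorchange and the function mark_major_change are hypothetical, and major_change_happened is given explicit parameters here although the loop above calls it with none.

```php
<?php
// Hypothetical sketch: $config stands in for Moodle's config table.

// Called by code that makes a "major" change, e.g. deleting a course.
function mark_major_change(array &$config, int $now): void {
    $config['lastmajorchange'] = $now;
}

// True if a major change was recorded at or after the time this cron
// process started. A change in the same second counts, to be conservative.
function major_change_happened(array $config, int $cronstart): bool {
    return isset($config['lastmajorchange'])
        && $config['lastmajorchange'] >= $cronstart;
}
```

A cron process would record its start time once and call the check before picking up each new task, exiting when it returns true so that the next process starts with empty static caches.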
Unresolved issues/ideas
- It might be nice at some point to find a way to allow different subtasks to run on different servers by designation. This could eventually be added to the administration screens as an extra setting (IP address).
- We obviously need some way to avoid different tasks trampling on each other. We ran through a number of ideas already, from differentiating between read/write operations, to having dependencies or conflicts between tasks, to having each task say which database tables it uses. Finally we decided it would be best to just have some tasks that are able to simply block all others from being run. Anything to do with authentication and enrolment must block other tasks from running, as otherwise there could be the problem of, for example, forum posts being emailed out just before someone is unenrolled from a course.
- When the first cron in a long time is running, we should lock the entire cron and let it run to completion, because the order is really important then. This means that there also needs to be some global lastcronruntime flag somewhere (like in the config table).
Pseudo code proposal
Moved to the talk page
Audit of current cron
Main section | Subtask | Frequency | Notes |
---|---|---|---|
session_gc | | every run | |
mod/assignment | plugins (none) | every minute | |
mod/assignment | message submissions | every minute | checks last run time |
mod/chat | update chat times | every five minutes | |
mod/chat | update_events | every five minutes | |
mod/chat | delete old chat_users and add quits | every five minutes | |
mod/chat | delete old messages | every five minutes | |
mod/data | | every minute | no _cron function (includes file unnecessarily) |
mod/forum | mail posts | every minute | checks last run time |
mod/forum | digest processing | every minute | |
mod/forum | delete old read tracking | every minute | |
mod/scorm | reparse all scorms | every five minutes | does hourly checking |
mod/wiki | delete expired locks | every hour | |
blocks/rss_client | update feeds | every five minutes | |
quiz/report/statistics | delete old statistics | every run | |
admin/reports | none | every run | |
language_cache | | every run | |
remove expired enrolments | | every run | |
main gradebook | lock pending grades (*2) | every run | |
main gradebook | clean old grade history | every run | has a TODO to not process as often |
event queue | | every run | potentially large |
portfolio cron | clean expired exports | every run | potentially large |
longtimenosee | | 20% | |
deleteunconfirmedusers | | 20% | |
deleteincompleteusers | | 20% | |
deleteoldlogs | | 20% | |
deletefiltercache | | 20% | |
notifyloginfailures | | 20% | |
metacourse syncing | | 20% | |
createpasswordemails | | 20% | |
tag cron | | 20% | |
clean contexts | | 20% | |
gc_cache_flags | | 20% | |
build_context_path | | 20% | |
scheduled backups | | daily (admin defined) | |
make rss feeds | | every run | |
auth/mnet | keepalives | every run | |
auth/mnet | delete old sessions | every run | |
auth/ldap | sync users | custom | not scheduled (external cronjob) |
auth/cas | sync users | custom | not scheduled (external cronjob) |
auth/db | sync users | custom | not scheduled (external cronjob) |
enrol/authorize | clears old data | daily (admin defined) | |
enrol/authorize | notifies administrators of old data | daily (admin defined) | |
enrol/authorize | process orders & email teachers | every run | |
enrol/flatfile | read file and sync users | every run | !?! |
enrol/imsenterprise | read file and sync users | every run | !?!?! |
enrol/manual | notify people of pending unenrolments | daily | |
statistics | | daily (admin defined) | huge |
grade/import | none | every run | |
grade/export | none | every run | |
grade/reports | none | every run | |
fetch blog entries | | every run | |
file gc | | (optional) daily, else every run? | |
local cron | | | |
Tasks
- Audit all existing cronjobs (done)
- Implement the locking code, either Matt's or something similar and write robust tests for it (this will be hard to test perhaps - can we test race conditions using simpletest?) (done)
- Write the black magic that hands out the next task to be run for a given cron process (done)
- Rewrite cron.php to use the black magic (done)
- Migrate all the existing cronjobs to the new system (done)
- Write screens to allow administrators to reschedule tasks (done)
- Write code to convert between unix-cron-syntax and user-friendly syntax, in both directions (done)
- Write code to capture requests to schedule once off tasks (done)
- Update portfolio code to use once off tasks rather than events API
- Evaluate integration of event API to once off tasks
- Test thoroughly (done)
The above tasks are now in MDL-25499 as sub-tasks to implement this proposal.