Development:Backup 2.0
Note: This page is a work-in-progress. Feedback and suggested improvements are welcome. Please join the discussion on moodle.org or use the page comments.
Summary & Objectives
The backup & restore functionality has been present since Moodle 1.1 (29 August 2003, more than six years ago!) and, although it has been continuously improved and renewed, the core has remained basically the same.
Many things have evolved since then both in Moodle and PHP-land so this project is an attempt to rework the backup & restore functionality in order to achieve these goals:
- Modernise the code base, using up-to-date PHP/DB techniques: OOP, exceptions, temporary DB tables, transactions... aiming to reduce the current "spaghetti code" in various places.
- Unify current developments, so all backup alternatives (manual, export, silent, scheduled...) will be executed by the same code base (also for restore/import of course).
- Improve the process, making it more reliable: able to detect problem situations in advance, roll back (if possible) to previous states, and gain wherever possible in both speed (some interim in-memory caches) and memory use (better parsing techniques).
- Improve Security, so the Moodle permissions structure will be 100% respected and any user data disclosure will be avoided, of course, whilst giving the admins the flexibility to change default behaviours by configuring them in a standard way.
- New features requested from the Community will be added. Things like fine grained control of various restore parts, the ability to backup not only course structures, but sections or individual activities, anonymised backups and so on will début with Moodle 2.0. See the requirements section for more info.
Requirements
This section lists the ideas added to this page in the past few months. Each requirement, if necessary, will have its own explanation/page to have it properly defined.
Overview
- Note about 1.9 pending tasks: Creation of users & creation and assignment of roles/caps. Must be in 1.9.8.
About 2.0 backup/restore:
- Change format dramatically, splitting the current monolithic moodle.xml into smaller pieces (1xplugin, 1xusers, 1xroles, 1x...). This has been working experimentally since 1.9 and has proved to be great for speed (it saves ~20 repeated parsings of the whole file). No dependencies on previous stuff at all.
- Replace the current "xmlize" approach when restoring files with a new progressive_parser. Each part of restore will decide how it wants to receive the information and who is in charge of processing it. Great for memory.
- Allow 1-activity, 1-section and 1-course backup. Use different shells/envelopes for any type of backup, keeping the "content" the same.
- Unify all multiple backup variations (course backup, course export, silent backup, scheduled backup) into a single codebase. Same for multiple restore.
- Backup and restore execution function (only ONE function, the executor!!) will be "blind": They won't perform decisions based on (changing) roles/capabilities/configuration settings any more. One master "backup_director" object will direct them once the object has performed all validations and ensured the process is executable. That director will handle/decide/instruct the executor about output/logging and everything else.
- Store complete information about any backup/restore performed in the site. This will include all the information for the director (array-serialised) and the results of the executor. And obviously all the logged info cached when running.
- Important UI changes:
- UI won't do any processing! (like pre-calculating, or post-checks and friends). It will be completely separate from the "director" and the "executor".
- Complete site/user defaults
- View the backup/restore form in "course" organisation (inline? nah for 2.0 IMO) instead of grouped by activity mode. Surely over multiple configuration "steps".
- Bunch of new detail-options (restore section descriptions, overwrite course settings..., usually requested).
- Reuse code like crazy. Some things like the "id-remapping" on restore, the logs output, the "flush" output and so on will be central, and any point of restore will be able to request those "services" or "utilities" when desired in an easy way (mini PHP-API to define things in a declarative way VS one/multiple helpful backup/restore "utilities" classes).
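As an illustration of the central "id-remapping" service mentioned in the last point, here is a minimal sketch. It is written in Python only for brevity (Moodle itself is PHP), and the `IdMapper` class and its method names are assumptions made for this example, not part of any existing Moodle API.

```python
class IdMapper:
    """Central old-id -> new-id registry shared by all restore steps (hypothetical)."""

    def __init__(self):
        self._map = {}  # (item name, old id) -> new id

    def set_mapping(self, itemname, old_id, new_id):
        """Record that an item from the backup file got a new id on this site."""
        self._map[(itemname, old_id)] = new_id

    def get_new_id(self, itemname, old_id, default=None):
        """Look up the new id, or return `default` if the item was never mapped."""
        return self._map.get((itemname, old_id), default)


mapper = IdMapper()

# A "users" step records that user 12 in the backup became user 340 here.
mapper.set_mapping("user", 12, 340)

# Later, an activity step remaps the author of a restored post.
post = {"userid": 12, "message": "Hello"}
post["userid"] = mapper.get_new_id("user", post["userid"])
```

Because every restore step asks the same object, no plugin needs to keep its own private mapping tables.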
General architecture
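The director/executor split described in the Overview could look roughly like the sketch below. It is written in Python only for brevity (Moodle itself is PHP). The class names, the capability strings and the `users` site default are all hypothetical and serve only to show where decisions are taken.

```python
from dataclasses import dataclass, field


@dataclass
class BackupPlan:
    """Everything the executor needs, fixed before execution starts."""
    course_id: int
    include_users: bool
    log: list = field(default_factory=list)


class BackupDirector:
    """Takes all decisions up front (hypothetical stand-in for "backup_director")."""

    def __init__(self, user_caps, site_defaults):
        self.user_caps = user_caps          # set of capability names (assumed)
        self.site_defaults = site_defaults  # dict of admin defaults (assumed)

    def build_plan(self, course_id, want_users):
        if "backup" not in self.user_caps:
            raise PermissionError("user may not back up this course")
        # User data is included only if requested AND permitted AND enabled.
        include_users = bool(
            want_users
            and "backup_userinfo" in self.user_caps
            and self.site_defaults.get("users", True)
        )
        return BackupPlan(course_id, include_users)


def execute(plan):
    """The "blind" executor: follows the plan, never checks caps or settings."""
    parts = ["course"]
    if plan.include_users:
        parts.append("users")
    plan.log.append(f"backed up {parts} for course {plan.course_id}")
    return parts


director = BackupDirector({"backup"}, {"users": True})
plan = director.build_plan(course_id=5, want_users=True)
result = execute(plan)  # users are left out: the capability is missing
```

The plan object is also a natural candidate for what gets stored after each run, together with the executor's log.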
Known problems
- Improve XML parsing: One of the major bottlenecks in backup and, especially, in restore reliability under Moodle 1.x has been the large amount of memory needed to handle those operations (see bugs like MDL-14302, MDL-15489, MDL-9838... and many others). While the whole XML file (moodle.xml) is parsed in a SAX way (hence, "streamed" and requiring small amounts of memory), we usually group some XML contents into "parts" in order to delegate the operations over those "parts" to different plugins (modules, blocks...). Problems arrive when some of those "parts" are big enough that processing them with the xml_parse_into_struct() function and the xmlize library uses exaggerated amounts of memory (for example, processing a 12.5MB file requires 311MB of memory, crazy!). So we need to switch to an alternative method to parse those "parts" using much less memory and to build the corresponding in-memory object with acceptable throughput (speed).
- Upwards compatibility of Moodle 1.9.x backups: While the Backup 2.0 multiple formats document describes how different formats will be supported by the restore process in Moodle 2.0, it's going to be a highly complex task to transform Moodle 1.9.x backups into the new, improved, Moodle 2.x backup format. Too many things have changed between both versions to be able to achieve this easily. Surely it's computable (if upgrade is able to apply the correct logic, so will restore) but it is going to be handled as a separate development that will allow us to 1) focus on the 2.0 backup/restore and 2) be free (both in mind and implementation) from any 1.9.x XML format dependency.
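To illustrate the XML parsing problem above, here is a minimal sketch of handing "parts" to their consumers one at a time instead of building one large in-memory structure. It uses Python's standard `xml.etree.ElementTree.iterparse` purely as a stand-in; the `iter_parts` helper and the sample XML are assumptions for this example and do not reflect the real progressive_parser or the real backup format.

```python
import io
import xml.etree.ElementTree as ET


def iter_parts(source, part_tag):
    """Yield each <part_tag> element as a flat dict, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == part_tag:
            # Hand over only the small piece the consumer asked for.
            yield {child.tag: child.text for child in elem}
            # Discard the subtree just handled; only an empty shell remains.
            elem.clear()


sample = io.BytesIO(
    b"<users>"
    b"<user><id>1</id><name>ann</name></user>"
    b"<user><id>2</id><name>bob</name></user>"
    b"</users>"
)

names = [part["name"] for part in iter_parts(sample, "user")]
```

Each consumer sees one small dict at a time, so peak memory depends on the size of a single part rather than on the size of the whole file.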
Implementation plan
See also
To recap
- All the existing/required caps: define behaviour for each one.
- All the zones and splitter.
- Files backup/restore.
- 1.9 => 2.0 transformations.
- Shells, separate structure and activities/blocks.
- Backup by "context". Each "context"(or inner items) will have:
- users
- files
- logs
- blocks
- role assignments
- role overrides
- completion info
- conditional info
- comments
- tags
- questions and categories
- filters
- metadata
(alternatively, all these parts can be found at the end of the backup file, globally for all contexts)
Some points
- Need stamps for users (perhaps not because of MDL-16658) and roles (like question/categories ones) to guarantee uniqueness.
- Need to decide about backup file names (moving from .zip to .xxx). Multiple extensions or prefixed info to "tell more" about the backup type.
- 1.9 => 2.0 restore:
- Really complex, needs to mimic all current logic in upgrade.php scripts:
- xml transformations --> relatively easy
- files migration (course and moddata)
- wiki, resources, workshop transformations (complex)
- incoming question bank changes.
- surely other stuff (comments, messages, blogs)
- Possible solutions:
- How:
- Reuse as much as possible code from upgrade scripts. Not everything is re-usable (bulk changes).
- Re-implement all the logic in restore.
- Forgetting about it
- When:
- In one step, so Moodle 2.0 will natively-on-the-fly be able to process 1.9 files. Impossible!
- Keep it within Moodle, so conversion will be executed transparently immediately before restore (in the original plan at Development:Backup 2.0 multiple formats).
- Completely separate utility, so files need to be transformed before being accepted by restore.
- Never
- Where:
- Restore everything into some duplicate tables in order to do the conversion there.
- Do that into memory/some specialised tables not duplicating anything.
- Nowhere
- Scheduled backups (it's just a "timed" backup iterator). Do we want it to continue being part of Moodle, or to be a separate utility? Using Moodle cron or its own one?
- Incremental backup/restore. This is also an external thing. We provide constant "content" by separating envelopes and content, but all the "diff", "patch" stuff is not related to Moodle at all.
- Structures above course level. They are a pain. They can lead to non-restorable courses or to courses that behave differently.
- ... more ideas/things..