Backup 2.0 general architecture

Warning: This page is no longer in use. The information contained on the page should NOT be seen as relevant or reliable.

This page briefly explains the rationale behind Backup 2.0. It isn't a complete and exhaustive description of the whole thing, but a summary of the main concepts involved.

Nomenclature

When talking about backup, there are some basic terms that it's better to have clearly defined. Here they are:

Backup (or restore) Type: Simply, what is being handled, and, at the time of writing this, can be: 1 activity, 1 section or 1 course.
Backup Format: The "physical" representation of the information to be handled and its organization in one transferable file. The main format is, of course, the "moodle2" format (the one that guarantees the maximum amount of data is preserved when moving Moodle contents), with others in the pipeline, like the "imscc" format and friends...
Interactivity: Whether the process, once launched, requires user interaction or not. Basically, an interactive backup/restore is needed in order to give the user a UI to adjust all the settings (options) before the process is launched.
Execution: Whether the execution of the process is immediate or delayed (scheduled?). This has implications for the output of the process, logging and others.
Backup Mode: Can be considered the "purpose" of the backup; based on it, various options will be defined and locked. Think of modes as some sort of presets or hardcoded rules altering the settings and the process.
Backup Defaults: How settings are configured initially, each time a backup/restore is requested. They are defined for each (backup) Type and Format and are altered depending on the Mode selected.
Backup UI: The front-end displayed to the user in Interactive backups, allowing them to configure the backup settings. This includes any feedback from the backup system before the execution process is started. Strongly form-based.
Backup Output: The front-end displayed to the user in Interactive and Immediate backups after the process starts, showing information about the different steps being executed, the final status...
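
The terms above can be pictured as one small value object describing a backup request. This is only a minimal sketch in plain PHP, not Moodle code: the class name backup_spec_sketch, its string values and the shows_output() helper are hypothetical, chosen to mirror the Type / Format / Interactivity / Execution / Mode vocabulary.

```php
<?php
// Hypothetical sketch: class name and string values are illustrative, not the Moodle API.
final class backup_spec_sketch {
    const TYPES      = ['activity', 'section', 'course'];
    const FORMATS    = ['moodle2', 'imscc'];
    const EXECUTIONS = ['immediate', 'delayed'];

    public $type;
    public $format;
    public $interactive;
    public $execution;
    public $mode;

    public function __construct(string $type, string $format, bool $interactive,
                                string $execution, string $mode) {
        // Reject values outside the vocabulary defined above.
        if (!in_array($type, self::TYPES, true)) {
            throw new InvalidArgumentException("Unknown backup type: $type");
        }
        if (!in_array($format, self::FORMATS, true)) {
            throw new InvalidArgumentException("Unknown backup format: $format");
        }
        if (!in_array($execution, self::EXECUTIONS, true)) {
            throw new InvalidArgumentException("Unknown execution: $execution");
        }
        $this->type = $type;
        $this->format = $format;
        $this->interactive = $interactive;
        $this->execution = $execution;
        $this->mode = $mode; // free-form here; the list of modes is not modelled
    }

    // Backup Output applies only to interactive and immediate backups.
    public function shows_output(): bool {
        return $this->interactive && $this->execution === 'immediate';
    }
}

$spec = new backup_spec_sketch('course', 'moodle2', true, 'immediate', 'general');
```

Validating the values in the constructor keeps an impossible combination from ever reaching the rest of the process.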

Backup

The components

These are the main components involved in backup and their main responsibilities in the process:

Backup Controller: The main object in the life of one backup operation. It contains all the information needed to execute a backup successfully (you can see some more details in the next section): the complete specs of the backup (type, mode, execution, interactivity, source...), all the settings, the logging and output options, the whole backup plan to be executed, and the various steps that any backup must fulfil (load defaults, UI invocation, apply security constraints...). Everything happens in, or is done by, the backup controller. One important feature is that it must be 100% serializable, so all its elements (before process execution) must also be serializable. This guarantees that execution can be delayed by safely storing the whole controller in the DB, while completely avoiding the use of sessions for transient backup information.
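
To illustrate why full serializability matters, here is a minimal sketch in plain PHP. It is not the real controller: controller_sketch, its methods and the $fakedb array (standing in for a DB table) are all hypothetical. It only shows the idea of storing the whole object between requests instead of relying on the session.

```php
<?php
// Hypothetical stand-in for a backup controller; not the Moodle class.
class controller_sketch {
    private $type;
    private $id;
    private $settings = [];
    private $status = 'created';

    public function __construct(string $type, int $id) {
        $this->type = $type;
        $this->id = $id;
    }

    public function set_setting(string $name, $value): void {
        $this->settings[$name] = $value;
    }

    public function get_setting(string $name) {
        return $this->settings[$name];
    }

    public function get_status(): string {
        return $this->status;
    }

    public function execute(): void {
        // The real work would happen here; the sketch only flips the status.
        $this->status = 'finished';
    }
}

// Simulated DB table keyed by a backup id (a plain array in this sketch).
$fakedb = [];

// First request: configure the controller, then store it whole.
$bc = new controller_sketch('course', 42);
$bc->set_setting('users', false);
$fakedb['sketch-1'] = serialize($bc);
unset($bc);

// Later request (or a scheduled job): load it back and execute.
$restored = unserialize($fakedb['sketch-1']);
$restored->execute();
```

Because nothing lives in the session, the second half could just as well run in a different process, which is what delayed execution needs.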

Backup Loggers: A typical chain of loggers, able to log information to various places (DB, file, error_log...), with the usual logging levels and so on. It should be configurable globally from the admin options. Each backup controller has one chain of loggers instantiated.
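
A chain of loggers can be sketched as follows. This is an assumption-laden illustration, not the actual implementation: the class names, the numeric levels and the in-memory logger are hypothetical, and a real chain would write to the DB, a file or error_log instead.

```php
<?php
// Hypothetical logger chain; class names and level values are illustrative only.
abstract class logger_sketch {
    const DEBUG = 10;
    const INFO = 20;
    const WARNING = 30;
    const ERROR = 40;

    private $threshold;
    private $next = null;

    public function __construct(int $threshold) {
        $this->threshold = $threshold;
    }

    // Link the next logger and return it, so further links can be chained.
    public function set_next(logger_sketch $next): logger_sketch {
        $this->next = $next;
        return $next;
    }

    // Write the message if it is severe enough, then pass it down the chain.
    public function process(string $message, int $level): void {
        if ($level >= $this->threshold) {
            $this->write($message, $level);
        }
        if ($this->next !== null) {
            $this->next->process($message, $level);
        }
    }

    abstract protected function write(string $message, int $level): void;
}

// Keeps messages in memory; a real logger would target the DB, a file, etc.
class memory_logger_sketch extends logger_sketch {
    public $lines = [];

    protected function write(string $message, int $level): void {
        $this->lines[] = "[$level] $message";
    }
}

$all = new memory_logger_sketch(logger_sketch::DEBUG);
$errors = new memory_logger_sketch(logger_sketch::ERROR);
$all->set_next($errors);

$all->process('plan loaded', logger_sketch::INFO);
$all->process('missing file', logger_sketch::ERROR);
```

Each logger decides on its own whether a message is relevant, so one call at the head of the chain can feed several destinations with different verbosity.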

Backup Destinations (not implemented yet!): The basic idea behind backup destinations is to be able to send the final backup file to a variety of systems. For example, in Moodle-to-HUB backups, we could send the backup file to the corresponding Moodle HUB, to other repositories, by email, as a direct download... or any combination of these. For now, they will only be able to send the backup files to different file areas within Moodle, or perhaps they won't be used at all initially. To be decided.

Backup Settings: Anything altering backup behaviour must be considered a setting, no matter whether it has a visual representation in the UI or not, or whether it can be configured by the user or not. Other components in backup rely on setting values to conditionally perform different actions. Note that each setting has one level (root, course, section, activity), and settings loosely follow the observer pattern, so relations and dependencies between them can be specified easily (and the UI must respect them).
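
The observer-like relation between settings can be sketched like this. Everything here is hypothetical: the class setting_sketch, the setting names, and in particular the dependency rule (a disabled parent switches its dependents off and locks them) are assumptions made for the example, not the rules Moodle applies.

```php
<?php
// Hypothetical sketch of settings with levels and dependencies; not the Moodle classes.
class setting_sketch {
    const LEVEL_ROOT = 1;
    const LEVEL_COURSE = 2;
    const LEVEL_SECTION = 3;
    const LEVEL_ACTIVITY = 4;

    private $name;
    private $level;
    private $value;
    private $locked = false;
    private $dependents = [];

    public function __construct(string $name, int $level, bool $value) {
        $this->name = $name;
        $this->level = $level;
        $this->value = $value;
    }

    public function get_level(): int { return $this->level; }
    public function get_value(): bool { return $this->value; }
    public function is_locked(): bool { return $this->locked; }

    // Register a setting that must be notified when this one changes.
    public function add_dependent(setting_sketch $setting): void {
        $this->dependents[] = $setting;
    }

    public function set_value(bool $value): void {
        if ($this->locked) {
            throw new RuntimeException("Setting {$this->name} is locked");
        }
        $this->value = $value;
        $this->notify_dependents();
    }

    // Rule assumed for this sketch: a disabled parent switches dependents off and locks them.
    public function on_parent_changed(bool $parentvalue): void {
        if (!$parentvalue) {
            $this->value = false;
            $this->locked = true;
        } else {
            $this->locked = false;
        }
        $this->notify_dependents();
    }

    private function notify_dependents(): void {
        foreach ($this->dependents as $dependent) {
            $dependent->on_parent_changed($this->value);
        }
    }
}

$users = new setting_sketch('users', setting_sketch::LEVEL_ROOT, true);
$userinfo = new setting_sketch('forum_userinfo', setting_sketch::LEVEL_ACTIVITY, true);
$users->add_dependent($userinfo);

$users->set_value(false); // forum_userinfo is now off and locked
```

A UI built on top of such objects only has to ask each setting whether it is locked, instead of re-implementing the dependency rules itself.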

Backup UI (not implemented yet!): Everything that, once all the settings have been created and pre-defined (observing defaults, security and dependencies), organizes them into one usable front-end, allowing the user to interact with them and so change the final behaviour of the process.

Backup Output: In charge of showing information about the process, this singleton can receive requests from the backup loggers and/or by direct invocation. The current implementation is pretty simple (it just prints one list of messages), but it could be extended to support progress bars and/or other conveniences.

Backup Structure: The base component on top of which all the rest of backup is built. It implements one simple PHP API that allows defining (virtually) any structure suitable to be sent to XML, fetching any information from iterators (using the visitor pattern to recursively process the whole tree) and performing other operations in a transparent and consistent way. In three words, "the heart of backup" (well, they are four).
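
As a rough illustration of the idea, the following plain PHP sketch defines a nested structure and walks it recursively with a visitor-style processor that emits XML. The names element_sketch, processor_sketch and xml_processor_sketch are hypothetical, and the sketch fills values directly instead of fetching them from iterators as the real component does.

```php
<?php
// Hypothetical sketch of a nested structure processed by a visitor; not the Moodle API.
interface processor_sketch {
    public function open(element_sketch $element): void;
    public function close(element_sketch $element): void;
}

class element_sketch {
    private $name;
    private $value;
    private $children = [];

    public function __construct(string $name, ?string $value = null) {
        $this->name = $name;
        $this->value = $value;
    }

    public function add_child(element_sketch $child): element_sketch {
        $this->children[] = $child;
        return $this; // allow chained add_child() calls
    }

    public function get_name(): string { return $this->name; }
    public function get_value(): string { return (string) $this->value; }
    public function is_leaf(): bool { return count($this->children) === 0; }

    // Visit this element, then every child recursively, then close it.
    public function process(processor_sketch $processor): void {
        $processor->open($this);
        foreach ($this->children as $child) {
            $child->process($processor);
        }
        $processor->close($this);
    }
}

// Turns the visited tree into indented XML with lowercase tag names.
class xml_processor_sketch implements processor_sketch {
    public $xml = '';
    private $depth = 0;

    public function open(element_sketch $element): void {
        $indent = str_repeat('  ', $this->depth);
        $name = $element->get_name();
        if ($element->is_leaf()) {
            $value = htmlspecialchars($element->get_value(), ENT_XML1);
            $this->xml .= "$indent<$name>$value</$name>\n";
        } else {
            $this->xml .= "$indent<$name>\n";
            $this->depth++;
        }
    }

    public function close(element_sketch $element): void {
        if (!$element->is_leaf()) {
            $this->depth--;
            $this->xml .= str_repeat('  ', $this->depth) . '</' . $element->get_name() . ">\n";
        }
    }
}

$forum = (new element_sketch('forum'))
    ->add_child(new element_sketch('name', 'Q & A'))
    ->add_child((new element_sketch('discussions'))
        ->add_child(new element_sketch('discussion', 'Hello')));

$writer = new xml_processor_sketch();
$forum->process($writer);
echo $writer->xml;
```

Keeping the tree definition separate from the processor means the same structure could be walked by a different processor, for example one that only collects ids.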

Backup Plan: The execution plan, dependent on the Type and Format of the backup. It is split into tasks, each one having one or more steps. Note there are two types of steps: execution steps, which only execute (custom) PHP code and are used for anything that is not XML output, and structure steps, which use the "backup structure" component and are basically used to generate XML output. Note they are spread throughout the Moodle code, so each block and activity, for each format available, will have its own plans, tasks and steps. Also in three words, "backup's skeleton" (grrr, they are two now).
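
The plan / task / step hierarchy can be sketched in a few lines of plain PHP. All class names, task names and messages below are hypothetical, and the structure step only reports which file it would write instead of generating XML.

```php
<?php
// Hypothetical sketch of a plan made of tasks made of steps; not the Moodle classes.
abstract class step_sketch {
    protected $name;

    public function __construct(string $name) {
        $this->name = $name;
    }

    // Run the step and return a short description of what was done.
    abstract public function execute(): string;
}

// Execution step: runs arbitrary PHP code, produces no XML.
class execution_step_sketch extends step_sketch {
    private $code;

    public function __construct(string $name, callable $code) {
        parent::__construct($name);
        $this->code = $code;
    }

    public function execute(): string {
        call_user_func($this->code);
        return "{$this->name}: executed custom code";
    }
}

// Structure step: would use the structure component to write one XML file.
class structure_step_sketch extends step_sketch {
    private $filename;

    public function __construct(string $name, string $filename) {
        parent::__construct($name);
        $this->filename = $filename;
    }

    public function execute(): string {
        return "{$this->name}: would write {$this->filename}";
    }
}

class task_sketch {
    private $name;
    private $steps = [];

    public function __construct(string $name) {
        $this->name = $name;
    }

    public function add_step(step_sketch $step): void {
        $this->steps[] = $step;
    }

    // Execute the steps in the order they were added.
    public function execute(): array {
        $messages = [];
        foreach ($this->steps as $step) {
            $messages[] = $this->name . ' / ' . $step->execute();
        }
        return $messages;
    }
}

class plan_sketch {
    private $tasks = [];

    public function add_task(task_sketch $task): void {
        $this->tasks[] = $task;
    }

    // Execute every task in order and collect all the step messages.
    public function execute(): array {
        $messages = [];
        foreach ($this->tasks as $task) {
            $messages = array_merge($messages, $task->execute());
        }
        return $messages;
    }
}

$made = [];
$root = new task_sketch('root_task');
$root->add_step(new execution_step_sketch('create_temp_dir', function () use (&$made) {
    $made[] = 'temp dir'; // stands in for real filesystem work
}));
$course = new task_sketch('course_task');
$course->add_step(new structure_step_sketch('course_info', 'course/course.xml'));

$plan = new plan_sketch();
$plan->add_task($root);
$plan->add_task($course);
$messages = $plan->execute();
```

Since each block or activity contributes its own tasks and steps, the plan itself stays a plain ordered container.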

Other components: Last, some more components that are good to know about, all of them being non-instantiable classes with statically executed methods:
- Backup checks: Various fixed checks performed by the controller, security being the most important.
- Backup dbops: Operations involving DB access should go here; not 100% of them, but at least enough to avoid "spaghetti code" in the (critical) components above.
- Backup helper: Classes and methods that are not part of other components but are used throughout the backup process. Some iterator implementations to help backup structures, some functions used more than once...
- Backup factories: Some methods in charge of dynamically instantiating different objects throughout the backup process, mainly supporting backup plans and tasks.
- Backup interfaces: Very basic interfaces that enforce the implementation of some methods in order to achieve functionality in a consistent way (checksum-able, process-able...).
- Backup xml-writer: Very low-level XML writer implementation, with error detection, support for memory/file output and dynamic content transformations. Used extensively in the process to output all the XML contents.
- ... (note there are some more components, but they don't seem important enough to be introduced here).

The process

Any backup will go through the following steps:

Backup controller instantiation, which automatically:
1. Performs various checks and prepares loggers, destinations and other components
2. Loads the backup plan (so settings become available)
3. Applies defaults to settings (based on $CFG config options and backup type/format/mode)
4. Applies security constraints to settings (modifying settings if necessary)
If the backup is interactive, loop over the next steps until finished:
1. Save / load the controller as necessary (to make it persistent across multiple requests)
2. Show the UI (observing settings status, values and dependencies)
3. Process UI changes
4. Re-apply security constraints to settings (with an error if something is violated)
Finish the UI if it was used (interactive)
Save the controller if execution is delayed (cron will pick it up later, once implemented)
Execute the controller plan (so everything will be generated)

So, just as an example for impatient people, literally following the steps above, and knowing that the UI implementation and other bits are still missing, here is your first course backup (just replace the XXs and execute it). Please ignore any output in the browser / command line.

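<?php

require_once('config.php');
require_once($CFG->dirroot . '/backup/util/includes/backup_includes.php');

$course = XX; // id of the course to back up
$user   = XX; // id of the user performing the backup

// Instantiation runs the checks, loads the plan and applies defaults and security constraints.
$bc = new backup_controller(backup::TYPE_1COURSE, $course, backup::FORMAT_MOODLE,
                            backup::INTERACTIVE_YES, backup::MODE_GENERAL, $user);
$bc->finish_ui();              // the UI is still missing, so just close the interactive phase
$bc->execute_plan();           // generate the whole backup
$results = $bc->get_results(); // keep the results around for inspection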

You should end up with one directory under $CFG->dataroot/temp/backup containing the complete backup contents. Time to take a look at it.

The results

Temp note: First of all, did I say that the execution above should be performed in one course having some forum(s)? As of right now, forum is the only activity with backup implemented... so it's a good recommendation. ;-) On the other hand, all blocks, filters, comments, ratings, role assignments and overrides... are fully supported, so use them here and there.

Temp note2: There are some missing bits in the explanations below. The most noticeable are questions and their categories, and complete gradebook information. They will be added once decided and implemented.

This section summarizes the final results of one "moodle2" format backup. As listed in the requirements of the project, we are moving from one "monolithic" moodle.xml file to one multi-file format, better structured and easier for restore to handle. At the same time, while the contents will be practically the same (the XML generated continues to be one structured "database dump"), one of the main visual differences is that we have, finally, moved to lowercase tag names (Moodle 1.x used to have everything in uppercase).

Also, we have introduced some new "references" files that provide information about which part of the backup is using which instances of some elements so, on restore, if we are restoring one course partially, only the "used" files / groups / users / scales... will be restored. This was one of the major design limits in Moodle 1.x backup, where those items were always restored completely and not selectively based on that extra "usage" information.
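
The role of the "references" files can be sketched as a simple lookup. The data and names below are hypothetical (sample ids, sample part names, and arrays standing in for the parsed XML files); the sketch only shows how usage information lets a partial restore pick the items it needs from the shared stores.

```php
<?php
// Hypothetical sample data: arrays stand in for the parsed XML files of a backup.

// Shared stores: item type => id => item.
$stores = [
    'file'  => [10 => 'intro.png', 11 => 'syllabus.pdf', 12 => 'unused.zip'],
    'scale' => [1 => 'Satisfactory', 2 => 'Stars'],
];

// References of each backup part: item type => ids that the part uses.
$inforefs = [
    'activity_a' => ['file' => [10], 'scale' => [2]],
    'activity_b' => ['file' => [11, 12]],
];

// Return only the stored items referenced by the parts selected for restore.
function items_needed_sketch(array $stores, array $inforefs, array $selectedparts): array {
    $needed = [];
    foreach ($selectedparts as $part) {
        foreach ($inforefs[$part] ?? [] as $type => $ids) {
            foreach ($ids as $id) {
                if (isset($stores[$type][$id])) {
                    $needed[$type][$id] = $stores[$type][$id];
                }
            }
        }
    }
    return $needed;
}

// Restoring only activity_a needs one file and one scale, nothing else.
$needed = items_needed_sketch($stores, $inforefs, ['activity_a']);
```

Without the per-part references, the only safe choice would be to restore every stored item, which is the Moodle 1.x behaviour described above.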

From an organizational point of view, folders follow the schema of available Backup Types (see above). So, in the generated backup file we are always going to find one or more of these folders:

course: (present only in 1 course backups). Will contain information about the course (course.xml), role uses (assignments and overrides), comments, filters, logs and, of course, blocks (we comment on them below) and the corresponding "references" (inforef.xml) file, pointing to the uses of some key components, as explained above.
sections: (present in 1 course and 1 section backups). Will contain one directory for each section included in the backup (numbered with the section->id). Each section directory will contain information about the section (section.xml) and its corresponding "references" file (just referencing file uses).
activities: (present in all backups). Will contain one directory for each activity included in the backup (numbered with the cm->id). Each activity directory will contain information about the activity (activityname.xml), its module characteristics (module.xml), other details (role uses, filters, comments, user completion, logs, activity grade items...) and, once more, blocks and the corresponding references file.

Special mention is necessary for the "blocks" directory (present under each course / activity dir). As blocks are 1st class citizens in Moodleland, they can have a lot of information associated, so they have their own space in backups. There will be one directory for each block belonging to that course/activity (numbered with block_instance->id). Each block directory will contain information about the block (block.xml), optionally information about the DB related to the block (blockname.xml), roles, comments and the corresponding references file.

Finally we have to talk about the backup root directory and what we can find there:

moodle_backup.xml: it's the root file in the backup. It contains information about the backup itself: versions, details, summary of contents and full list of settings, as configured on backup. We can consider it like the "map" that shows/describes the whole backup and it will be highly useful in the restore process.

files.xml and "files" directory. They contain all the files used in the backup process, no matter which course/section/activity is using them. The internal structure is fairly similar to the current Moodle file storage structure, although this could diverge (safely) in the future. Of course, it's important to note that we have all those "reference" (inforef.xml) files spread across the course/section/activity/block directories above, so it will be easy to pick the needed ones on restore.

other files (users, roles, scales, groups, outcomes...). All these follow the same pattern as the files.xml file above, as all of them are merely "stores" of the information used by any part of the backup. Once more, reference files will point to the required ones on restore.
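
To summarize the layout, this small sketch lists the main paths one would expect for each backup type, following the description above. It is an approximation under stated assumptions: the function name is hypothetical, directories are named by the bare id, the references file is assumed to be called inforef.xml everywhere, and blocks plus the other stores (users, roles, scales...) are left out.

```php
<?php
// Hypothetical helper listing the main expected paths; directory naming by bare id is an assumption.
function expected_paths_sketch(string $type, array $sectionids, array $cmids): array {
    // Always present at the backup root.
    $paths = ['moodle_backup.xml', 'files.xml', 'files/'];

    // The course directory exists only in 1 course backups.
    if ($type === 'course') {
        $paths[] = 'course/course.xml';
        $paths[] = 'course/inforef.xml';
    }

    // Sections exist in 1 course and 1 section backups.
    if ($type === 'course' || $type === 'section') {
        foreach ($sectionids as $id) {
            $paths[] = "sections/$id/section.xml";
            $paths[] = "sections/$id/inforef.xml";
        }
    }

    // Activities exist in all backups.
    foreach ($cmids as $id) {
        $paths[] = "activities/$id/module.xml";
        $paths[] = "activities/$id/inforef.xml";
    }

    return $paths;
}

// A 1 activity backup has no course or sections directories.
$paths = expected_paths_sketch('activity', [5], [7]);
```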

That's all!

Documentation