Backup 2.0 for developers


Note: This page is a work-in-progress. Feedback and suggested improvements are welcome. Please join the discussion on moodle.org or use the page comments.

Introduction

This page explains, from a development perspective, how to implement the backup feature for Moodle 2.x plugins, mainly modules and blocks.

Note that, at the time of writing, the backup & restore subsystem itself is still under development, so some bits are missing, especially the way to handle subplugins (question types, data fields...) under the new backup infrastructure. Modules that use such artifacts, like data, assignment, quiz and workshop, are therefore not good candidates right now. This will be addressed (and these docs updated) once we have determined how each subplugin is expected to work (from a DB / backup perspective).

Everything in backup is about tasks and steps (see Backup 2.0 general architecture for more information), all of them making up one backup plan. Each module instance and each block instance will be backed up by one backup task instance that is, basically, a collection of backup steps. How the steps are organized within the task tells the backup system what to do and in which order.

Another important point is that Moodle backup 2.0 supports multiple backup formats ("moodle2", "imscc"...), each one having its own tasks and steps, completely unrelated to those of the other formats. For now, all the explanations below focus on the "moodle2" format, which is the one required to keep any module or block transportable between Moodle instances without any loss of data. The other formats will probably end up with their own documents, since each one can have its own particularities.

When talking about backup steps, we must differentiate two types of steps:

Execution steps: These simply execute arbitrary PHP code. They are useful to prepare structures, create directories, or do anything else that does not involve generating XML files. Normally you won't need them.
Structure steps: These use a PHP API (detailed below) to completely define the XML structure to be exported and its contents. Hopefully, "normal" modules will only need one step of this type to have the backup functionality 100% implemented.
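
The plan / task / step organization described above can be pictured with a small model. This is only an illustrative Python sketch, not Moodle's PHP API: the class names `Step`, `Task` and `Plan`, the `kind` field and the log messages are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One unit of work inside a task (hypothetical model, not Moodle's class)."""
    name: str
    kind: str  # "execution" or "structure"
    run: Callable[[List[str]], None]


@dataclass
class Task:
    """An ordered collection of steps, e.g. one task per module or block instance."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def execute(self, log: List[str]) -> None:
        # Steps run strictly in the order in which they were added to the task.
        for step in self.steps:
            step.run(log)


@dataclass
class Plan:
    """The whole backup: an ordered collection of tasks."""
    tasks: List[Task] = field(default_factory=list)

    def execute(self) -> List[str]:
        log: List[str] = []
        for task in self.tasks:
            task.execute(log)
        return log


def build_demo_plan() -> Plan:
    # Execution step: arbitrary code, no XML involved.
    prepare = Step("prepare_dir", "execution",
                   lambda log: log.append("created temp directory"))
    # Structure step: stands in for "describe and write the XML of the activity".
    structure = Step("choice_structure", "structure",
                     lambda log: log.append("wrote choice.xml"))
    return Plan(tasks=[Task("choice_instance_1", [prepare, structure])])


if __name__ == "__main__":
    print(build_demo_plan().execute())
```

The point of the model is only that the order of steps inside a task, and of tasks inside the plan, fully determines what happens and when.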

That said, in the following sections we will go through all the steps necessary to create the backup of one simple module and one block, in order to cover all the possibilities.

How to backup one module

For this section we have selected a simple module (choice) that requires practically all the backup "machinery" to be used, so each piece will be explained as we progress in the development. The only point not covered here is the "subplugin" facilities, covered later in a separate section.

Prerequisites

In order to implement backup for the module, some prerequisites should be fulfilled. They are just recommendations but, especially while getting used to backup, it's good to follow them. Let's see:

Learn about the module. If you aren't the creator of the module, spend some time playing with it, creating and using instances and exploring each of its features. By doing this you will end up with some "real" data in the module instances, which will be really useful when testing and debugging how the module backup is generated.
Draw a schema of the module DB structures. While you are playing with the module, keep looking at the DB to see how records are saved and what the relations between the module's tables are. Since a backup is, basically, a "selective dump" of those tables, knowing as much as possible about them is highly recommended. You should end up with a tree structure that will be the basis for the generated XML file.
Note which tables contain user-related info and which ones don't. One of the core features that must be present in each module is the ability to include or skip user-related information, so you will need that information later.

Tip: If the module already existed before Moodle 2.0, it can be a good idea to take a look at its 1.9 backuplib.php file. The structure is already defined there, which helps to understand the organization better and, at the same time, to keep the XML structure as similar as possible. That will definitely make things easier when converting 1.9 backup files to the new backup 2.0 format.

Note: It's important to highlight that the structure schema, once decided, should be as stable as possible over time, because any change in the structure makes restore much more complex to implement. There is no problem with adding or deleting fields, or with adding new elements to the structure. But the structure (tree) itself must remain as stable as possible, so please be careful when deciding it.

So, applying these prerequisites to our candidate module (choice), here is the corresponding schema. We'll use it throughout the whole process.

Schema

In the schema, you should try to put as much information as possible. That will make the coding process quicker and easier later, while keeping the final results free from errors and missing bits. So, once more, don't start coding immediately; instead, spend some time with the prerequisites above, understanding how the module works and designing the final structure that represents it best.

The (correct) candidate

The tree on the left shows the ER/DB structure of the choice module, where one choice has one or more options and each option is answered by users one or more times (note: ignore cardinality accuracy in the previous phrase).

And that's the schema that best represents the structure of the module, with each element properly nested, so we won't have any problem with restore, since the order required by restore (1, 2, 3) is given naturally.

Looks easy, right? But keep reading… you will be surprised! :-)

The (chosen) candidate

Instead, we are going to use the (complete this time) diagram below. See the rationale after the image.

The main reason to use this tree (instead of the "correct" one) is that this is the structure that Moodle 1.9 backup has used for the choice module for ages and, of course, it has done its job well. As commented above, we must try to reduce the number of structural changes in a module's backup in order to keep restore operations working over time (the same applies to the conversion of 1.9 backup files to the new 2.0 format). Finally, this is a good example of how one module can have different XML representations; in each case we should try to pick the best one (which is not what happens here). So, once more, spending some time analyzing the activity is worth it.

Let's analyze the schema in some detail:

Detecting user information: We must be able to identify the entities (tables) in the diagram that are used to store user information. One of the core features of the backup subsystem since its early days has been the ability to produce backups with and without user information, so we need that info. Hence the "no user info" mark in the choice and choice_options elements (they are configuration, user-independent), while choice_answers is marked as "user info" (it contains users' answers to the choice).
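
The effect of the "user info" / "no user info" marks can be sketched as follows. This is a simplified Python model under stated assumptions, not the Moodle API: the sample rows, the `include_userinfo` flag and the `build_backup` function are hypothetical names used only for illustration.

```python
from typing import Any, Dict

# Hypothetical rows, shaped like the choice module tables described above.
CHOICE = {"id": 1, "name": "Favourite colour"}
OPTIONS = [{"id": 10, "text": "Red"}, {"id": 11, "text": "Blue"}]
ANSWERS = [{"id": 100, "userid": 5, "optionid": 10}]


def build_backup(include_userinfo: bool) -> Dict[str, Any]:
    """Return the data to export; user-related tables are added only on request."""
    data: Dict[str, Any] = {
        "choice": dict(CHOICE),                 # no user info: always exported
        "options": [dict(o) for o in OPTIONS],  # no user info: always exported
        "answers": [],                          # user info: empty unless requested
    }
    if include_userinfo:
        data["answers"] = [dict(a) for a in ANSWERS]
    return data


if __name__ == "__main__":
    print(build_backup(include_userinfo=False))
    print(build_backup(include_userinfo=True))
```

Both variants carry the same configuration data; only the tables marked as "user info" differ.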

Determining the correct order of backup: This, while simple, is critical too (especially from a restore perspective). Since restore reads the XML file progressively and performs actions in that order, we must guarantee that the "read order" is the correct one, fulfilling any possible dependency. Back to our schema, it's clear that we need to back up the "choice" element first, and then the "choice_options" and "choice_answers" ones. Moreover, "choice_options" must be backed up before "choice_answers", since the latter needs to refer to values of the former (the "optionid" information). Note that, as commented some paragraphs above, the "correct" alternative gave us that order information easily. That doesn't matter, as long as we have been able to establish it in the "chosen" schema too.
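
Why the read order matters can be shown with a toy restore loop. This is a minimal Python sketch assuming a restore that processes records one by one and gives each option a new id; the `restore` function, the record layout and the starting id 1000 are invented for the example and are not how Moodle implements restore.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, Dict[str, int]]


def restore(stream: List[Record]) -> List[Dict[str, int]]:
    """Process records strictly in file order, remapping old ids to new ones."""
    option_map: Dict[int, int] = {}  # old option id -> new option id
    restored_answers: List[Dict[str, int]] = []
    next_id = 1000
    for kind, row in stream:
        if kind == "option":
            option_map[row["id"]] = next_id
            next_id += 1
        elif kind == "answer":
            # Raises KeyError if the referenced option has not been read yet.
            new_optionid = option_map[row["optionid"]]
            restored_answers.append(
                {"userid": row["userid"], "optionid": new_optionid})
    return restored_answers


# Options before answers: every dependency is already resolved when needed.
good_order: List[Record] = [
    ("option", {"id": 10}),
    ("answer", {"userid": 5, "optionid": 10}),
]
# Answers before options: the answer refers to an option that is still unknown.
bad_order: List[Record] = list(reversed(good_order))

if __name__ == "__main__":
    print(restore(good_order))
    try:
        restore(bad_order)
    except KeyError as missing:
        print("cannot restore answer, unknown option id:", missing)
```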

Attributes and elements: Now it's time to decide which fields will be attributes in the resulting XML file and which ones will be child elements (tags). The rule is simple: all the "id" fields should be defined as attributes. Note that this is just an arbitrary rule without much rationale behind it since, from a restore perspective, everything (attributes and child tags) will be handled in the same way (as object attributes). So, in our schema, all the "id" fields have been marked as "attr".
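
The "id as attribute, everything else as child tag" rule can be illustrated with Python's standard `xml.etree.ElementTree` module. This is only a sketch of the resulting XML shape: the `to_element` helper, the tag name and the sample field values are assumptions for the example, and Moodle generates its XML through its own PHP API.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def to_element(tag: str, record: Dict[str, object]) -> ET.Element:
    """Emit 'id' as an XML attribute and every other field as a child tag."""
    element = ET.Element(tag, {"id": str(record["id"])})
    for name, value in record.items():
        if name == "id":
            continue  # already written as an attribute
        child = ET.SubElement(element, name)
        child.text = str(value)
    return element


# Hypothetical option row; the field names are illustrative.
option = to_element("option", {"id": 10, "text": "Red", "maxanswers": 0})
xml_text = ET.tostring(option, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

For the sample row this yields one `option` element carrying `id="10"` as an attribute, with `text` and `maxanswers` as child tags.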

Not needed elements: If we have designed the schema properly, we'll detect that some fields aren't necessary, because that information is already included in some parent element. In our schema, the field "choiceid", pointing to "choice->id" in both the options and the answers elements, has been marked as "not needed", since the parent "choice" already contains it. Something similar happens with the "course" field in the choice element: it doesn't need to be included in the backup file, since something above it (the course element, outside the module scope) already has it defined. Finally, there is one field marked as "needed" (the "optionid" of each answer) that shows us, once more, that the schema we are using is not the best. In the "chosen" one, since "choice_answers" isn't nested under "choice_options", we must keep that field in the backup, or restore won't know which option each answer belongs to. With the "correct" schema, where answers are nested under options, that field is not needed. To summarize, every field must be included in the backup, except those already present in parent elements.
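
The difference between the two layouts, in terms of which fields must be kept, can be sketched like this. It is an illustrative Python model with made-up rows; the helper names `strip_fields` and `nest_under_options` are hypothetical.

```python
from typing import Dict, Iterable, List, Set

Row = Dict[str, int]

# Hypothetical choice_answers rows.
answers: List[Row] = [
    {"id": 100, "choiceid": 1, "userid": 5, "optionid": 10},
    {"id": 101, "choiceid": 1, "userid": 6, "optionid": 11},
]


def strip_fields(rows: Iterable[Row], redundant: Set[str]) -> List[Row]:
    """Drop the fields whose value is already implied by a parent element."""
    return [{k: v for k, v in row.items() if k not in redundant} for row in rows]


# "Chosen" layout: answers hang directly from choice, next to the options,
# so "choiceid" is implied by the parent but "optionid" must be kept.
chosen = strip_fields(answers, {"choiceid"})


def nest_under_options(rows: Iterable[Row]) -> Dict[int, List[Row]]:
    """'Correct' layout: answers nested under their option, so optionid is implied."""
    nested: Dict[int, List[Row]] = {}
    for row in rows:
        nested.setdefault(row["optionid"], []).extend(
            strip_fields([row], {"choiceid", "optionid"}))
    return nested


if __name__ == "__main__":
    print(chosen)
    print(nest_under_options(answers))
```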

Detecting file areas used by the module: A module can use various file areas in different elements and fields. We need to know exactly which file area is handled by which element, and the (optional) itemid used for that file area. In general, any text field, or anything that looks like an attachment, is likely to have a (hidden) file area associated. In our schema, we have detected one file area (choice_intro), corresponding to the introduction of the module, where images or any other files can be embedded. Also, in general, the "xxx_intro" file areas usually have no itemid, since the module's context is enough to identify them without ambiguity. So we mark choice->intro as the "choice_intro" file area with "no itemid". From our experience playing with the module, we know there are no other file areas.

Annotating some important bits: Due to the modularity of backup, and in order to know exactly which information must be saved (because it's used) and which can be skipped, it's important to annotate some elements during the backup process. So, back to our schema, we have marked the "choice_answers->userid" field as "annotation" (so backup will automatically add all the information for that user). Here is the list of elements that we must not forget to annotate (or we could end up with non-restorable backups). Note that we must always annotate "id" values and not other types of data:
- user: Any field pointing to one user->id present along the schema (as said above, our schema has one).
- grouping: Any field pointing to one grouping->id
- group: Any field pointing to one group->id
- role: Any field pointing to one role->id
- scale: Any field pointing to one scale->id
- outcome: Any field pointing to one outcome->id
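
The idea behind annotation, collecting the ids that the exported data refers to so that the matching records can be included too, can be sketched as below. This is a minimal Python model under stated assumptions: the `ANNOTATED_FIELDS` mapping and the `collect_annotations` function are hypothetical and do not reproduce how Moodle stores annotations.

```python
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, Dict[str, int]]

# Which field of which element points to which kind of id (hypothetical mapping).
ANNOTATED_FIELDS: Dict[str, Dict[str, str]] = {
    "answer": {"userid": "user"},
}


def collect_annotations(records: Iterable[Record]) -> Dict[str, Set[int]]:
    """Gather every referenced id, grouped by item type (user, group, role...)."""
    found: Dict[str, Set[int]] = {}
    for element, row in records:
        for fieldname, itemtype in ANNOTATED_FIELDS.get(element, {}).items():
            found.setdefault(itemtype, set()).add(row[fieldname])
    return found


sample: Tuple[Record, ...] = (
    ("option", {"id": 10}),
    ("answer", {"id": 100, "userid": 5, "optionid": 10}),
    ("answer", {"id": 101, "userid": 6, "optionid": 10}),
    ("answer", {"id": 102, "userid": 5, "optionid": 10}),
)

if __name__ == "__main__":
    print(collect_annotations(sample))
```

Only ids are collected, never other kinds of data, and each id is recorded once no matter how many records refer to it.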

And this is all the information we need to know before starting to code. Once you are used to backup and restore, you will surely be able to start coding sooner, but don't forget the importance of choosing a good and stable structure before anything else. It's really the critical part of any module's backup.

That said, let's see how to code all this information in order to get one cool backup.

Documentation