Backup 1.9 conversion for developers

Revision as of 14:12, 24 May 2011 by David Mudrak

Tutorial for developers on backup 1.9 conversion
Project state: Ready to use
Tracker issue: MDL-22414
Discussion: N/A
Assignee: David Mudrak


Introduction

This page follows up the tutorial at Backup 2.0 for developers. It describes how to implement support for converting Moodle 1.9 backups into the new Moodle 2.x format.

In short, since Moodle 2.1 there is a new type of core component called backup converters. A converter is a tool that takes a directory containing some Moodle course data and converts it into another format. At the moment we are working on a converter called moodle1 that supports the 1.9 => 2.x conversion path. In the future, more converters can be written (e.g. supporting custom formats, Blackboard, IMS CC etc.). Converters can be chained. So if there is a converter that supports IMS => 1.9 conversion and another one converting 1.9 => 2.x, Moodle can automatically restore IMS backups. The core of the moodle1 converter is implemented in backup/converter/moodle1/.

The moodle1 converter is still under heavy development. The first milestone is to be able to convert all activity modules without user data.

Getting familiar with the required changes of the structure

This tutorial uses the Choice module as an example because it demonstrates the basic workflow of the conversion. Let us start by performing a backup in both 1.9 and 2.1. In both versions, create an empty course containing a single simple Choice module instance. Back it up without any user data, roles, files etc., and include just the module instance you created.

In Moodle 1.9, you will end up with one monolithic file, moodle.xml. In Moodle 2.1, the module data will be stored in a file like activities/choice_x/choice.xml. The task of the converter you are going to write is to convert the data contained in moodle.xml into choice.xml.

Getting the list of XML paths

Looking at moodle.xml in the 1.9 backup, you can see that the module data are stored in XML nodes at /MOODLE_BACKUP/COURSE/MODULES/MOD. You are interested only in those MODs having <MODTYPE>choice</MODTYPE>. To make your life easier, the core of the moodle1 converter injects one virtual node into the path. From our module's point of view, its data therefore appear to be in /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE. It is as if all Choice data were wrapped in yet another tag in moodle.xml.
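The path rewriting described above can be illustrated with a small stand-alone PHP sketch. The function virtual_mod_path() is hypothetical and is not part of the moodle1 converter. It only shows which path a module handler would see once a virtual node, named after the upper-cased MODTYPE value, is injected right below MOD.

```php
<?php
// Hypothetical helper, not part of Moodle: shows how a path inside a 1.9 MOD
// element looks to a module handler once the virtual node named after the
// module type is injected directly below MOD.
function virtual_mod_path(string $path, string $modtype): string
{
    $mod = '/MOODLE_BACKUP/COURSE/MODULES/MOD';
    if ($path !== $mod && strpos($path, $mod . '/') !== 0) {
        return $path; // not inside a MOD element, leave it alone
    }
    return $mod . '/' . strtoupper($modtype) . substr($path, strlen($mod));
}

// A choice option stored at MOD/OPTIONS/OPTION is presented at MOD/CHOICE/OPTIONS/OPTION.
echo virtual_mod_path('/MOODLE_BACKUP/COURSE/MODULES/MOD/OPTIONS/OPTION', 'choice'), "\n";
```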

Now look into the Moodle 1.9 code and locate the file mod/choice/backuplib.php. Its code shows which tags the MOD element in moodle.xml contains (our code sees this element as MOD/CHOICE). Each tag holds the corresponding field from the mdl_choice table:

   ID              $choice->id
   NAME            $choice->name
   TEXT            $choice->text
   FORMAT          $choice->format
   PUBLISH         $choice->publish
   SHOWRESULTS     $choice->showresults
   DISPLAY         $choice->display
   ALLOWUPDATE     $choice->allowupdate
   SHOWUNANSWERED  $choice->showunanswered
   LIMITANSWERS    $choice->limitanswers
   TIMEOPEN        $choice->timeopen
   TIMECLOSE       $choice->timeclose
   TIMEMODIFIED    $choice->timemodified

Below these, the choice options are dumped into the OPTIONS section, with each option's details wrapped in an OPTION tag:

   ID              $cho_opt->id
   TEXT            $cho_opt->text
   MAXANSWERS      $cho_opt->maxanswers
   TIMEMODIFIED    $cho_opt->timemodified

The file moodle.xml will be parsed by a progressive parser. This means it is read sequentially. Each time an interesting path is reached, the data contained in that element are dispatched to a handler. A DOM-like parser, in contrast, would convert the whole file into a huge in-memory tree structure. To catch the choice data in moodle.xml, we will have to handle the /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE and /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE/OPTIONS/OPTION paths.
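The following simplified, stand-alone PHP sketch shows how progressive parsing differs from loading a DOM tree. It is built on PHP's expat-based xml_parser_create() functions. It is not Moodle's progressive_parser: the function progressive_dispatch() and its dispatch rules are assumptions for illustration only.

The sketch walks the document once and remembers only the currently open elements. It hands the leaf values of each registered path to a callback as soon as they are complete. For brevity, the sample input already contains the virtual CHOICE node and omits most fields.

```php
<?php
// Simplified stand-in for a progressive (streaming) parser, built on PHP's
// expat-based xml_parser_* functions. Hypothetical illustration, NOT Moodle's
// progressive_parser. Leaf values directly below a registered path are collected
// and dispatched as one chunk once they are complete, i.e. when a nested child
// element (such as OPTIONS) starts or when the registered element itself closes.
// Simplification: leaf values appearing after a nested element are ignored.
function progressive_dispatch(string $xml, array $paths, callable $dispatch): void
{
    $stack = [];     // names of the currently open elements
    $haschild = [];  // parallel to $stack: has the element got child elements?
    $pending = [];   // registered path => leaf values collected so far
    $text = '';      // character data of the current element

    $flush = function (string $path) use (&$pending, $dispatch): void {
        if (isset($pending[$path])) {
            $dispatch($path, $pending[$path]);
            unset($pending[$path]);
        }
    };

    $start = function ($parser, string $name, array $attrs)
            use (&$stack, &$haschild, &$pending, &$text, $paths, $flush): void {
        if ($stack) {
            $haschild[count($stack) - 1] = true;
            // Entering a grandchild of a registered element: its leaves are complete.
            $flush('/' . implode('/', array_slice($stack, 0, -1)));
        }
        $stack[] = $name;
        $haschild[] = false;
        $path = '/' . implode('/', $stack);
        if (in_array($path, $paths, true)) {
            $pending[$path] = [];
        }
        $text = '';
    };

    $end = function ($parser, string $name)
            use (&$stack, &$haschild, &$pending, &$text, $flush): void {
        $path = '/' . implode('/', $stack);
        $parent = '/' . implode('/', array_slice($stack, 0, -1));
        if (!array_pop($haschild) && isset($pending[$parent])) {
            $pending[$parent][$name] = trim($text);  // leaf of a registered element
        }
        $flush($path);  // dispatch the registered element if still pending
        array_pop($stack);
        $text = '';
    };

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler($parser, $start, $end);
    xml_set_character_data_handler($parser, function ($parser, string $data) use (&$text): void {
        $text .= $data;
    });

    // Feed the input in small pieces, as if reading a big file sequentially.
    $pieces = str_split($xml, 64);
    $last = count($pieces) - 1;
    foreach ($pieces as $i => $piece) {
        if (!xml_parse($parser, $piece, $i === $last)) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    }
}

// Sample input: for brevity it already contains the virtual CHOICE node.
$base = '/MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE';
$xml = '<MOODLE_BACKUP><COURSE><MODULES><MOD><CHOICE><ID>1</ID><NAME>Lunch</NAME>'
     . '<OPTIONS><OPTION><ID>10</ID><TEXT>Pizza</TEXT></OPTION>'
     . '<OPTION><ID>11</ID><TEXT>Soup</TEXT></OPTION></OPTIONS>'
     . '</CHOICE></MOD></MODULES></COURSE></MOODLE_BACKUP>';

$seen = [];
progressive_dispatch($xml, [$base, "$base/OPTIONS/OPTION"], function (string $path, array $data) use (&$seen) {
    $seen[] = [basename($path), $data];
});
print_r($seen);  // the CHOICE chunk first, then one chunk per OPTION
```

The CHOICE chunk is dispatched before any OPTION chunk. That ordering is what makes it possible to open the output file when the parent data arrive and append the children later.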

Getting to know how data change during the 1.9 => 2.x upgrade

Now let us open mod/choice/db/upgrade.php in your Moodle 2.1 code. It contains the upgrade logic executed when a 1.9 site is upgraded to 2.x. Reading the code, we realize that:

  • the database field text in the {choice} table is renamed to intro
  • the database field format in the {choice} table is renamed to introformat
  • a new field completionsubmit is added to the {choice} table with the default value 0

Getting to know the structure of choice.xml

Finally, look at mod/choice/backup/moodle2/ in your Moodle 2.1 code, particularly the file backup_choice_stepslib.php. Here you can see how the structure of choice.xml is defined and what data it contains:

   $choice = new backup_nested_element('choice', array('id'), array(
       'name', 'intro', 'introformat', 'publish',
       'showresults', 'display', 'allowupdate', 'showunanswered',
       'limitanswers', 'timeopen', 'timeclose', 'timemodified',
       'completionsubmit'));

The <choice> element has a child element <options> that in turn contains a set of <option> elements:

   $option = new backup_nested_element('option', array('id'), array(
       'text', 'maxanswers', 'timemodified'));

Great! Now we have all the information needed to start with coding.

Writing the conversion handler

Before we start, please create your own local branch that tracks David's https://github.com/mudrd8mz/moodle/tree/backup-convert tree. That is the pre-integration branch where the development of backup conversion is happening. Your work will have to be merged there before it gets into the moodle.git master. Use GitHub's pull request feature once you want your work to be included there.

To add this branch into your current moodle.git clone, you may want to use something like this:

   $ cd ~/public_html/moodle21
   $ git remote add mudrd8mz git://github.com/mudrd8mz/moodle.git
   $ git fetch mudrd8mz
   $ git checkout --track -b backup-convert mudrd8mz/backup-convert

Alternatively, if you want to create a pristine installation just for this project, you can simply run:

   $ cd ~/public_html
   $ git clone git://github.com/mudrd8mz/moodle.git moodle21convert
   $ cd moodle21convert
   $ git checkout --track -b backup-convert origin/backup-convert

The backup conversion workflow

To be able to write the module conversion handler, you should understand what's going on in the background. This is a simplified description of the workflow:

Figure: UML sequence diagram of the backup conversion machinery

  • In its constructor, the restore_controller instance detects the format of the data to be restored. If it is not the standard moodle2 format, the controller knows that conversion will be needed (grep for backup::STATUS_REQUIRE_CONV).
  • The restore_controller::convert() method is called (@todo is it? where? by whom?), and it in turn calls convert_helper::to_moodle2_format(). This helper method constructs a chain of converters and tries to find the most effective conversion path from the current format to the target moodle2 format. This is not very interesting yet, as we have just one converter now ... well, not even that yet :-p
  • For each converter in the chain, convert_factory::get_converter() creates a new instance, and its public convert() method is called.
  • The core functionality of the moodle1 converter is defined in backup/converter/moodle1/lib.php in the class moodle1_converter and its subclasses. When the class is instantiated, it prepares a progressive parser and a parser processor and registers all available handlers of the parsed data. The incoming convert() call sets up the directory to write to and runs the instance's execute() method.
  • The execute() method starts the parser of the 1.9 moodle.xml. That file is parsed sequentially. Whenever the parser reaches a node to which some handler is attached, it dispatches the parsed data via dispatch_chunk(). The parser also triggers notify_path_start() and notify_path_end() when it enters and leaves a registered path element.
  • The parser processor re-dispatches the parsed data and events via the path_start_reached(), process_chunk() and path_end_reached() methods defined by moodle1_converter.
  • Finally, the parsed data and events are re-dispatched once more and handled by moodle1_handler subclasses via their on_xxx_start(), process_xxx() and on_xxx_end() methods. These are the places where the actual conversion of data must happen and where the new data are written into the new XML files.
  • The moodle1_handler subclasses can use xml_writer's begin_tag(), full_tag() and end_tag() methods to construct the XML file contents. Alternatively, the helper method write_xml() dumps a complete tree-ish structure in one step.

Once the end of the parsed moodle.xml file is reached, the working directory with the new XML files is renamed so that it replaces the previous format. If there were a chain of converters, it would now be the next converter's turn. For us, at the moment, the job is done. Once the directory contains valid moodle2 format, the normal restore process is executed as if the course backup came from a 2.0 server.
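The last dispatch step in the workflow relies on a naming convention: the alias of a convert_path determines which on_xxx_start(), process_xxx() and on_xxx_end() methods are called. The following PHP sketch imitates that routing with a hypothetical alias_dispatcher class. It is a simplified illustration, not the code of moodle1_converter.

The sketch keeps whatever process_xxx() returned and passes it to on_xxx_end(). If process_xxx() returned nothing, the cooked data are passed instead, matching how the end handler is described later on this page.

```php
<?php
// Hypothetical sketch of alias-based routing, NOT Moodle's moodle1_converter code.
// Parser events for a registered path are forwarded to the handler methods named
// after the path alias: on_xxx_start(), process_xxx() and on_xxx_end().
class alias_dispatcher
{
    private $handler;
    private $lastdata = [];  // alias => data to hand over to on_xxx_end()

    public function __construct($handler)
    {
        $this->handler = $handler;
    }

    public function path_start(string $alias): void
    {
        $method = "on_{$alias}_start";
        if (method_exists($this->handler, $method)) {  // optional method
            $this->handler->$method();
        }
    }

    public function chunk(string $alias, array $cooked): void
    {
        $method = "process_{$alias}";                  // mandatory method
        $result = $this->handler->$method($cooked);
        // Keep what process_xxx() returned, or the cooked data if it returned nothing.
        $this->lastdata[$alias] = ($result === null) ? $cooked : $result;
    }

    public function path_end(string $alias): void
    {
        $method = "on_{$alias}_end";
        if (method_exists($this->handler, $method)) {  // optional method
            $this->handler->$method($this->lastdata[$alias] ?? null);
        }
        unset($this->lastdata[$alias]);
    }
}

// A tiny handler that just records what it was asked to do.
class demo_choice_handler
{
    public $log = [];

    public function process_choice($data)
    {
        $this->log[] = 'process_choice ' . $data['name'];
    }

    public function on_choice_end($data)
    {
        $this->log[] = 'on_choice_end ' . $data['name'];
    }
}

$handler = new demo_choice_handler();
$dispatcher = new alias_dispatcher($handler);
$dispatcher->path_start('choice');                   // no on_choice_start(): skipped
$dispatcher->chunk('choice', ['name' => 'Lunch']);
$dispatcher->path_end('choice');
print_r($handler->log);
```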

Highlights and places to look at

While working on the converter, always keep in mind:

  • The file moodle.xml is parsed sequentially by a progressive parser. When the current node's data become available, the handler must either write them immediately into a new XML file or stash them for later processing. The stashed data can then be processed by the handler's on_xxx_end() event handler method or by another handler instance executed later.
  • Minimise the memory footprint of the conversion job. Do not accumulate incoming data in memory unless it is necessary, especially data that may be huge (like all forum posts). Write the data to the new XML file as soon as possible (ideally on the fly, but that is not always possible).
  • Look at the various examples of moodle1 handler classes in backup/converter/moodle1/handlerlib.php.

The handler library template

The moodle1_converter itself converts only part of the moodle.xml file, via handlers defined in handlerlib.php (for example the course sections, course modules etc.). To convert module-specific data, moodle1_handlers_factory searches for handlers in the modules' (and generally plugins') directories. The Choice module conversion logic is stored in the library mod/choice/backup/moodle1/lib.php. The basic template for such a file is:

<?php
 
// This file is part of Moodle - http://moodle.org/
//
// Moodle is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// Moodle is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with Moodle.  If not, see <http://www.gnu.org/licenses/>.
 
/**
 * Provides support for the conversion of moodle1 backup to the moodle2 format
 *
 * @package    mod
 * @subpackage choice
 * @copyright  2011 Your Name <your@email>
 * @license    http://www.gnu.org/copyleft/gpl.html GNU GPL v3 or later
 */
 
defined('MOODLE_INTERNAL') || die();
 
/**
 * Choice conversion handler
 */
class moodle1_mod_choice_handler extends moodle1_mod_handler {
 
    /**
     * Declare the paths in moodle.xml we are able to convert
     *
     * The method returns list of {@link convert_path} instances.
     * For each path returned, the corresponding conversion method must be
     * defined.
     *
     * Note that the path /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE does not
     * actually exist in the file. The last element with the module name was
     * appended by the moodle1_converter class.
     *
     * @return array of {@link convert_path} instances
     */
    public function get_paths() {
        return array(
            new convert_path('choice', '/MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE'),
            new convert_path('choice_option', '/MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE/OPTIONS/OPTION'),
        );
    }
 
    /**
     * This is executed every time we have one /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE
     * data available
     */
    public function process_choice($data) {
    }
 
    /**
     * This is executed every time we have one /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE/OPTIONS/OPTION
     * data available
     */
    public function process_choice_option($data) {
    }
}

Let us look at this template more closely so that you understand the bits and can use it as a starting point for your own work.

As you can see, the file defines a single class that extends moodle1_mod_handler. All moodle1 handler classes must define a public get_paths() method that returns a list of convert_path instances. In our example, we declare that the handler is interested in the two paths we identified above. We register them with the "aliases" choice and choice_option. The developer can choose any reasonable alias here and provide it as the first parameter of the convert_path constructor.

For each convert_path instance, a corresponding processing method must exist in the class. This method must be named process_xxx() where xxx is the alias declared in the get_paths().
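A missing process_xxx() method only shows up once the converter reaches that path, so a quick sanity check can be handy while developing. The helper below is hypothetical and is not a Moodle API. It takes the aliases you register in get_paths() and reports those without a matching process_xxx() method.

```php
<?php
// Hypothetical development aid, not part of Moodle: list the aliases that have
// no matching process_xxx() method on the given handler object.
function missing_process_methods($handler, array $aliases): array
{
    $missing = [];
    foreach ($aliases as $alias) {
        if (!method_exists($handler, 'process_' . $alias)) {
            $missing[] = $alias;
        }
    }
    return $missing;
}

// A handler that forgot to implement process_choice_option().
class incomplete_choice_handler
{
    public function process_choice($data)
    {
    }
}

$missing = missing_process_methods(new incomplete_choice_handler(), ['choice', 'choice_option']);
print_r($missing);
```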

Setting up the development environment

At this stage of the development, this seems to be the most effective way of testing and debugging your code:

  • copy the moodle.xml file with the backup of the 1.9 course you created into $CFG->dirroot/backup/converter/moodle1/simpletest/files/moodle.xml (overwrite the existing one)
  • modify the process_choice() method so that it just dumps the $data:
    public function process_choice($data) {
        print_object($data); // DONOTCOMMIT
    }
  • put the following into your config.php:
    $CFG->keeptempdirectoriesonbackup = true;
  • execute the unit tests in the path "backup/converter/moodle1"; one of the test methods actually runs the conversion of the file you copied

If you are lucky enough, you should see an array dumped for each choice instance defined in the moodle.xml file. Also look at $CFG->dataroot/temp/backup/, where several directories should have been created. One of the most recently created ones contains the converted backup in the unpacked MBZ format. Each execution of the conversion process creates a new directory. Do not continue unless this works for you.

Data structure transformations

As you can see from the print_object() output, the process_choice() method is passed the data contained in the corresponding <MOD>...</MOD> part of the moodle.xml file. We know that we need to rename the fields text and format, add a new field completionsubmit and drop the field modtype. The modtype field was already used to expand the XML path, so that it looks as if moodle.xml defined the data inside <MOD><CHOICE>...</CHOICE></MOD>. These transformations are pretty common, and the converter can do them for us.

You can provide so-called recipes that are applied to the element data before they are dispatched to the process_xxx() methods. These recipes are used to "pre-cook" the raw data from moodle.xml. In fact, one such recipe is already applied by default: converting the tag names to lower case. Three other types of recipes are supported at the moment: adding new fields, renaming fields and dropping fields. All you need to do is declare which recipes should be applied to the parsed data, and in what order. You can do so by passing a third parameter to the constructor of the convert_path instances in the get_paths() method:

public function get_paths() {
    return array(
        new convert_path(
            'choice', '/MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE',
            array(
                'renamefields' => array(
                    'text' => 'intro',
                    'format' => 'introformat',
                ),
                'newfields' => array(
                    'completionsubmit' => 0,
                ),
                'dropfields' => array(
                    'modtype'
                ),
            )
        ),
        new convert_path('choice_option', '/MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE/OPTIONS/OPTION'),
    );
}

As you can see, the third parameter is an associative array. The keys of the array are the names of the recipes. The renamefields recipe will be applied first and will rename text to intro and format to introformat. The newfields recipe will add a new field completionsubmit with the value set to 0. And finally, the dropfields recipe will delete the field modtype from the data.
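Here is a stand-alone PHP sketch that cooks one parsed chunk, to make the effect of the recipes concrete. The function apply_recipes() is hypothetical. It assumes the order described above: lower-casing first, then renamefields, newfields and dropfields. Moodle's real convert_path class may implement the details differently.

```php
<?php
// Hypothetical sketch of recipe "cooking", not Moodle's convert_path code.
// Steps: lower-case the tag names, then rename, add and drop fields.
function apply_recipes(array $raw, array $recipes): array
{
    $data = array_change_key_case($raw, CASE_LOWER);
    foreach ($recipes['renamefields'] ?? [] as $old => $new) {
        if (array_key_exists($old, $data)) {
            $data[$new] = $data[$old];  // the renamed field moves to the end
            unset($data[$old]);
        }
    }
    foreach ($recipes['newfields'] ?? [] as $field => $default) {
        $data[$field] = $default;
    }
    foreach ($recipes['dropfields'] ?? [] as $field) {
        unset($data[$field]);
    }
    return $data;
}

// A (shortened) MOD chunk as it could come from a 1.9 moodle.xml.
$raw = ['ID' => '1', 'MODTYPE' => 'choice', 'NAME' => 'Lunch', 'TEXT' => 'Pick one', 'FORMAT' => '1'];
$cooked = apply_recipes($raw, [
    'renamefields' => ['text' => 'intro', 'format' => 'introformat'],
    'newfields'    => ['completionsubmit' => 0],
    'dropfields'   => ['modtype'],
]);
print_r($cooked);
```

The raw array is left untouched. That is why a handler can still receive it next to the cooked data, as the following paragraph explains.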

If you need to know what the data looked like before the recipes were applied, your process_xxx() method can get them via its second parameter:

public function process_choice($data, $raw) {
    print_object($data); // DONOTCOMMIT
    print_object($raw); // DONOTCOMMIT
}

This may be required if, for example, you need to calculate a value of a new field from the value of some field that was dropped by the recipe.

Writing the transformed data into the new XML file

Now the choice data are transformed and passed to the process_choice() method, and we can write them into the new XML file in the MBZ format directory. As you know, the module instance data are saved in a file like activities/choice_x/choice.xml. Here x is the course module id (cmid) of our current instance (note that we do not know this cmid yet). Following the principle "write data as soon as possible", let us go and write them into the file:

public function process_choice($data) {
 
    // get the course module id and context id
    $instanceid = $data['id'];
    $cminfo     = $this->get_cminfo($instanceid);
    $moduleid   = $cminfo['id'];
    $contextid  = $this->converter->get_contextid(CONTEXT_MODULE, $moduleid);
 
    // we now have all information needed to start writing into the file
    $this->open_xml_writer("activities/choice_{$moduleid}/choice.xml");
    $this->xmlwriter->begin_tag('activity', array('id' => $instanceid, 'moduleid' => $moduleid,
        'modulename' => 'choice', 'contextid' => $contextid));
    $this->xmlwriter->begin_tag('choice', array('id' => $instanceid));
 
    unset($data['id']); // we already write it as attribute, do not repeat it as child element
    foreach ($data as $field => $value) {
        $this->xmlwriter->full_tag($field, $value);
    }
 
    $this->xmlwriter->begin_tag('options');
}

Let us look closely at what is going on here. At the top, we prepare the values needed for the attributes of the <activity> tag, which is the root element of the file. Our parent class defines a method get_cminfo() that returns a structure containing the cmid of our instance. The get_contextid() method is a bit more tricky. Moodle 1.9 backup files store no context information at all, but the Moodle 2.0 format contains it. So the converter must actually generate fictitious context ids for the various levels (course, modules, blocks, ...) as if they were stored in moodle.xml.

In the next step, we open a new XML file for writing. Then we use xml_writer's begin_tag() method to dump the first lines of the file. Then, in the foreach loop, we write all the cooked data into the opened file. Finally, we write the opening <options> tag and leave the file open so that process_choice_option() can continue adding its own data to it.

The process_xxx() method should return $data if it modifies them, so that they can be used later by the on_choice_end() method if needed. As long as we do not do any other transformation, we do not return anything. In that case, $this->converter keeps just the cooked data for eventual later processing.
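The fictitious context ids mentioned above need to be unique and stable: asking twice for the same module must give the same id. One simple way to achieve that is a counter-based registry, sketched below in PHP. The class fake_context_registry and its string level names are hypothetical. This is not necessarily how the converter's get_contextid() is implemented.

```php
<?php
// Hypothetical sketch: hand out a new context id the first time a
// (level, instance) pair is requested and the same id on every later request.
class fake_context_registry
{
    private $ids = [];  // "level:instanceid" => context id
    private $next;

    public function __construct(int $first = 1)
    {
        $this->next = $first;
    }

    public function get_contextid(string $level, int $instanceid): int
    {
        $key = $level . ':' . $instanceid;
        if (!isset($this->ids[$key])) {
            $this->ids[$key] = $this->next++;
        }
        return $this->ids[$key];
    }
}

$contexts = new fake_context_registry();
$a = $contexts->get_contextid('module', 42);
$b = $contexts->get_contextid('course', 2);
$c = $contexts->get_contextid('module', 42);  // same module, same id as $a
echo "$a $b $c\n";
```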

Writing a complete tree-ish structure into the XML file in one step

As the parser continues through the moodle.xml file, our handler's method process_choice_option() is called each time the data of a single <OPTION>...</OPTION> element are available (because we registered it as a path we are interested in). In this case, the situation is much simpler. No transformations are needed apart from converting tags to lower case, which is done automatically. Therefore we can write the whole array, including the closing </option> tag, in one step. For such cases, there is a nice helper method available:

public function process_choice_option($data) {
    $this->write_xml('option', $data, array('/option/id'));
}

This write_xml() method writes the given $data tree into the <option>...</option> element. The value of $data['id'] is written as the option's attribute instead of as a child element.
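The idea behind write_xml() can be sketched with PHP's built-in XMLWriter class. The function write_tree() below is a hypothetical re-implementation for illustration only, not Moodle's code. It writes an associative array as an element, recursing into nested arrays, and turns every field whose path is listed in $attrpaths into an attribute. Attributes are written first because XMLWriter only accepts them while the start tag is still open.

```php
<?php
// Hypothetical sketch of the write_xml() idea using PHP's XMLWriter, not Moodle's
// xml_writer. Associative arrays only: repeated elements cannot be expressed
// with duplicate array keys.
function write_tree(XMLWriter $w, string $name, array $data, array $attrpaths, string $parent = ''): void
{
    $path = $parent . '/' . $name;
    $w->startElement($name);
    // First pass: scalar fields listed in $attrpaths become attributes.
    foreach ($data as $field => $value) {
        if (!is_array($value) && in_array("$path/$field", $attrpaths, true)) {
            $w->writeAttribute($field, (string)$value);
        }
    }
    // Second pass: everything else becomes child elements.
    foreach ($data as $field => $value) {
        if (is_array($value)) {
            write_tree($w, $field, $value, $attrpaths, $path);
        } elseif (!in_array("$path/$field", $attrpaths, true)) {
            $w->writeElement($field, (string)$value);
        }
    }
    $w->endElement();
}

$w = new XMLWriter();
$w->openMemory();
write_tree($w, 'option', ['id' => 10, 'text' => 'Pizza', 'maxanswers' => 0, 'timemodified' => 0], ['/option/id']);
$xml = $w->outputMemory();
echo $xml, "\n";
```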

Finishing the job

When all options are written, we need to generate the closing tags, close the file and release our XML writer. For this, we can use the event handling method that is executed as the parser reaches the closing </MOD> tag in moodle.xml. You can also think of it as reaching the closing </CHOICE> tag in the virtual </CHOICE></MOD> path. If your handler defines a method like on_xxx_end() (where xxx is again the registered alias of the path), it is triggered by that event.

public function on_choice_end() {
    // close choice.xml
    $this->xmlwriter->end_tag('options');
    $this->xmlwriter->end_tag('choice');
    $this->xmlwriter->end_tag('activity');
    $this->close_xml_writer();
}

Note that this on_xxx_end() method can take a parameter $data that contains the data previously returned by process_xxx(). If the processing method does not return anything, the on_xxx_end() method receives just the cooked data.
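Putting the three methods together, the following stand-alone PHP sketch mimics the whole life cycle of choice.xml:

  • the file is opened in process_choice()
  • each option is appended by process_choice_option()
  • everything is closed in on_choice_end()

The class choice_stream_writer is hypothetical. PHP's XMLWriter, writing to memory, stands in for Moodle's xml_writer. The module and context ids are passed to the constructor instead of being looked up. on_choice_end() returns the XML string only so that the result can be inspected.

```php
<?php
// Hypothetical stand-alone sketch of the streaming pattern, not Moodle code:
// open in process_choice(), append in process_choice_option(), close in
// on_choice_end(). PHP's XMLWriter replaces Moodle's xml_writer here.
class choice_stream_writer
{
    private $w;          // XMLWriter kept open between the calls
    private $moduleid;   // passed in; the real handler uses get_cminfo()
    private $contextid;  // passed in; the real handler asks the converter

    public function __construct(int $moduleid, int $contextid)
    {
        $this->moduleid = $moduleid;
        $this->contextid = $contextid;
    }

    public function process_choice(array $data): void
    {
        $this->w = new XMLWriter();
        $this->w->openMemory();
        $this->w->startElement('activity');
        $this->w->writeAttribute('id', (string)$data['id']);
        $this->w->writeAttribute('moduleid', (string)$this->moduleid);
        $this->w->writeAttribute('modulename', 'choice');
        $this->w->writeAttribute('contextid', (string)$this->contextid);
        $this->w->startElement('choice');
        $this->w->writeAttribute('id', (string)$data['id']);
        unset($data['id']);  // already written as an attribute
        foreach ($data as $field => $value) {
            $this->w->writeElement($field, (string)$value);
        }
        $this->w->startElement('options');  // left open for the options
    }

    public function process_choice_option(array $data): void
    {
        $this->w->startElement('option');
        $this->w->writeAttribute('id', (string)$data['id']);
        unset($data['id']);
        foreach ($data as $field => $value) {
            $this->w->writeElement($field, (string)$value);
        }
        $this->w->endElement();  // </option>
    }

    public function on_choice_end(): string
    {
        $this->w->endElement();  // </options>
        $this->w->endElement();  // </choice>
        $this->w->endElement();  // </activity>
        $xml = $this->w->outputMemory();
        $this->w = null;         // release the writer
        return $xml;
    }
}

$h = new choice_stream_writer(5, 17);
$h->process_choice(['id' => 1, 'name' => 'Lunch']);
$h->process_choice_option(['id' => 10, 'text' => 'Pizza']);
$xml = $h->on_choice_end();
echo $xml, "\n";
```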

And we are done! Execute the unit tests in the path backup/converter/moodle1 again. You should find the correctly converted choice.xml file in the newly created directory under $CFG->dataroot/temp/backup/. That was easy, huh? :-)