Backup 1.9 conversion for developers

Tutorial for developers on backup 1.9 conversion
Project state: Ready to use
Tracker issue: MDL-22414
Discussion: N/A
Assignee: David Mudrak


Introduction



This page follows up on the Backup 2.0 for developers tutorial and describes how to implement support for converting Moodle 1.9 backups into the new Moodle 2.x format.

In short, since Moodle 2.1 there is a new type of core component called backup converters. A converter is a tool that takes a directory containing some Moodle course data and converts it into another format. At the moment we are working on a converter called moodle1 that supports the 1.9 => 2.x conversion path. In the future, more converters can be written (e.g. supporting custom formats, Blackboard, IMS CC etc). Converters can be chained, so if there is one converter that supports the IMS => 1.9 conversion and another one converting 1.9 => 2.x, Moodle can automatically restore IMS backups. The core of the moodle1 converter is implemented in backup/converter/moodle1/.
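To illustrate what chaining means, here is a minimal standalone PHP sketch that finds the shortest chain of converters with a breadth-first search. It is not the convert_helper code: the function name, the description of each converter as a (from, to) pair and the IMS converter called imscc1 are all hypothetical.

```php
<?php
// Hypothetical sketch: find the shortest chain of converters leading from the
// format of the data to be restored to the target format. Converter names and
// format names are illustrative only.

/**
 * @param array  $converters converter name => array(source format, target format)
 * @param string $from       format of the data to be restored
 * @param string $to         target format, e.g. 'moodle2'
 * @return array|null        converter names to run in order, or null if no path
 */
function sketch_find_conversion_chain(array $converters, $from, $to) {
    if ($from === $to) {
        return array(); // nothing to convert
    }
    // breadth-first search returns the chain with the fewest steps
    $queue = array(array($from, array()));
    $visited = array($from => true);
    while ($queue) {
        list($format, $chain) = array_shift($queue);
        foreach ($converters as $name => $path) {
            list($src, $dst) = $path;
            if ($src !== $format || isset($visited[$dst])) {
                continue;
            }
            $newchain = array_merge($chain, array($name));
            if ($dst === $to) {
                return $newchain;
            }
            $visited[$dst] = true;
            $queue[] = array($dst, $newchain);
        }
    }
    return null;
}

$converters = array(
    'imscc1'  => array('imscc', 'moodle1'),   // hypothetical IMS => 1.9 converter
    'moodle1' => array('moodle1', 'moodle2'), // 1.9 => 2.x
);
echo implode(' -> ', sketch_find_conversion_chain($converters, 'imscc', 'moodle2')), "\n";
```

With the two converters above, IMS data would first pass through imscc1 and then through moodle1.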

The moodle1 converter is still under heavy development at the moment. The first milestone is to be able to convert all activity modules without user data.

Getting familiar with the required structural changes

For the purpose of this tutorial, the Choice module is used as an example, as it demonstrates the basic workflow of the conversion. Let us start by performing a backup in both 1.9 and 2.1. In both versions, create an empty course with a single, simple Choice module instance inside. Choose the backup mode without any user data, roles, files etc. and include just the module instance you created.

In the case of Moodle 1.9, you will end up with one monolithic file, moodle.xml. In Moodle 2.1, the module data will be stored in a file like activities/choice_x/choice.xml. The task of the converter you are going to write is to convert the data contained in moodle.xml into choice.xml.

Getting the list of XML paths

Looking at moodle.xml in the 1.9 backup, you can see that the module data are stored in XML nodes at /MOODLE_BACKUP/COURSE/MODULES/MOD and that you are interested only in those MODs having <MODTYPE>choice</MODTYPE>. To make your life easier, the core of the moodle1 converter injects one virtual node into the path so that, to our module, it appears as if its data were in /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE. In other words, it is as if all Choice data were wrapped by yet another tag in moodle.xml.
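The effect of the virtual node can be pictured as a simple path rewrite. The function below is a hypothetical illustration, not the moodle1_converter implementation: once the MODTYPE of a <MOD> element is known, it appends the upper-cased module name right after MOD for every path inside that element.

```php
<?php
// Hypothetical sketch of the "virtual node" idea: paths inside <MOD> get the
// upper-cased module type appended after MOD, paths elsewhere are unchanged.

function sketch_inject_modtype($path, $modtype) {
    $modroot = '/MOODLE_BACKUP/COURSE/MODULES/MOD';
    $virtual = $modroot . '/' . strtoupper($modtype);
    if ($path === $modroot) {
        return $virtual; // the <MOD> element itself
    }
    if (strpos($path, $modroot . '/') === 0) {
        return $virtual . substr($path, strlen($modroot)); // anything below <MOD>
    }
    return $path; // outside <MOD>
}

echo sketch_inject_modtype('/MOODLE_BACKUP/COURSE/MODULES/MOD/OPTIONS/OPTION', 'choice'), "\n";
// prints /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE/OPTIONS/OPTION
```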

Now look into the Moodle 1.9 code and locate the file mod/choice/backuplib.php. Reading its code, you can see that the MOD element in moodle.xml (that will be presented as the MOD/CHOICE element to our code) contains the following tags holding the corresponding fields from the mdl_choice table:

   ID              $choice->id
   NAME            $choice->name
   TEXT            $choice->text
   FORMAT          $choice->format
   PUBLISH         $choice->publish
   SHOWRESULTS     $choice->showresults
   DISPLAY         $choice->display
   ALLOWUPDATE     $choice->allowupdate
   SHOWUNANSWERED  $choice->showunanswered
   LIMITANSWERS    $choice->limitanswers
   TIMEOPEN        $choice->timeopen
   TIMECLOSE       $choice->timeclose
   TIMEMODIFIED    $choice->timemodified

Below these, the choice options are dumped into the OPTIONS section, with each option's details wrapped by the OPTION tag:

   ID              $cho_opt->id
   TEXT            $cho_opt->text
   MAXANSWERS      $cho_opt->maxanswers
   TIMEMODIFIED    $cho_opt->timemodified

The file moodle.xml will be parsed by a progressive parser. That basically means it is read sequentially and, each time some interesting path is reached, the data contained by that element are dispatched to a handler (as opposed to DOM-like parsers, where the whole file would be converted into a huge in-memory tree structure). To catch the choice data in moodle.xml, we will have to handle the /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE and /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE/OPTIONS/OPTION paths.
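If progressive parsing is new to you, the following standalone sketch may help. It uses PHP's SAX-style xml extension (xml_parser_create() and the handler-setting functions) to stream through a tiny, made-up excerpt of moodle.xml and to hand each <OPTION> to a callback as soon as its closing tag is read. It is a simplified illustration, not Moodle's progressive parser: the function name is hypothetical and only leaf elements directly below the watched path are collected.

```php
<?php
// Hypothetical sketch of progressive (SAX-style) parsing: the input is read
// sequentially and, when the closing tag of the watched path is reached, the
// leaf values collected for that element are passed to a callback and dropped.

function sketch_parse_progressively($xml, $watchedpath, callable $callback) {
    $stack    = array(); // names of the currently open elements
    $chunk    = null;    // leaf values of the watched element being parsed
    $text     = '';      // character data of the current element
    $lastopen = null;    // set by every start tag, cleared by every end tag

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler($parser,
        function ($p, $name, $attrs) use (&$stack, &$chunk, &$text, &$lastopen, $watchedpath) {
            $stack[] = $name;
            $text = '';
            $lastopen = $name;
            if ('/' . implode('/', $stack) === $watchedpath) {
                $chunk = array(); // entering a watched element
            }
        },
        function ($p, $name) use (&$stack, &$chunk, &$text, &$lastopen, $watchedpath, $callback) {
            $path = '/' . implode('/', $stack);
            $isleaf = ($lastopen !== null); // no child was opened since our start tag
            $lastopen = null;
            if ($path === $watchedpath) {
                $callback($chunk); // dispatch the whole element at once
                $chunk = null;     // and do not keep it in memory
            } else if ($chunk !== null && $isleaf && dirname($path) === $watchedpath) {
                $chunk[$name] = $text;
            }
            array_pop($stack);
            $text = '';
        }
    );
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data;
    });
    xml_parse($parser, $xml, true);
}

$xml = '<MOODLE_BACKUP><COURSE><MODULES><MOD><MODTYPE>choice</MODTYPE><OPTIONS>'
     . '<OPTION><ID>1</ID><TEXT>Yes</TEXT></OPTION>'
     . '<OPTION><ID>2</ID><TEXT>No</TEXT></OPTION>'
     . '</OPTIONS></MOD></MODULES></COURSE></MOODLE_BACKUP>';

$options = array();
sketch_parse_progressively($xml, '/MOODLE_BACKUP/COURSE/MODULES/MOD/OPTIONS/OPTION',
    function (array $data) use (&$options) {
        $options[] = $data;
    });
echo count($options), " options dispatched\n";
```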

Getting to know how data change during the 1.9 => 2.x upgrade

Now let us open mod/choice/db/upgrade.php in your Moodle 2.1 code. It contains all the upgrade logic executed when a 1.9 site is upgraded to 2.x. Reading the code, we realize that:

  • the database field text in the {choice} table is renamed to intro
  • the database field format in the {choice} table is renamed to introformat
  • a new field completionsubmit is added to the {choice} table with the default value 0

Getting to know the structure of choice.xml

Finally, look at mod/choice/backup/moodle2/ in your Moodle 2.1 code, particularly the file backup_choice_stepslib.php. Here you can see how the structure of choice.xml is defined and what data it contains:

   $choice = new backup_nested_element('choice', array('id'), array(
       'name', 'intro', 'introformat', 'publish',
       'showresults', 'display', 'allowupdate', 'allowunanswered',
       'limitanswers', 'timeopen', 'timeclose', 'timemodified',
       'completionsubmit'));

The <choice> element has a child element <options> that in turn contains a set of <option> elements:

   $option = new backup_nested_element('option', array('id'), array(
       'text', 'maxanswers', 'timemodified'));

Great! Now we have all the information needed to start with coding.

Summary

Do not continue with your task unless

  • you have read the backuplib.php in the 1.9 source of the module and you know how the <MOD> node in moodle.xml is constructed (do not rely on just looking at some example file)
  • you have read the db/upgrade.php and you know all the steps that need to be performed
  • you have read backup/moodle2/backup_xxx_stepslib.php and you know how the activity file in 2.0 is constructed (again, do not rely on examples)

Writing the conversion handler

Before we start, please create your own local branch that tracks David's https://github.com/mudrd8mz/moodle/tree/backup-convert tree. That is the pre-integration branch where the development of backup conversion is happening, and your work will have to be merged there before it gets into the moodle.git master (use GitHub's pull request feature when you want your work to be included there).

To add this branch into your current moodle.git clone, you may want to use something like this:

   $ cd ~/public_html/moodle21
   $ git remote add mudrd8mz https://github.com/mudrd8mz/moodle.git
   $ git fetch mudrd8mz
   $ git checkout --track -b backup-convert mudrd8mz/backup-convert

Alternatively, if you want to create a pristine installation just for this project, you can simply run

   $ cd ~/public_html
   $ git clone https://github.com/mudrd8mz/moodle.git moodle21convert
   $ cd moodle21convert
   $ git checkout --track -b backup-convert origin/backup-convert

The backup conversion workflow

To be able to write the module conversion handler, you should understand what's going on in the background. This is a simplified description of the workflow:

[Figure: UML sequence diagram of the backup conversion machinery]

  • In its constructor, the restore_controller instance detects the format of the data to be restored and if it realizes it is not the standard moodle2 format, it knows the conversion will be needed (grep for backup::STATUS_REQUIRE_CONV)
  • The restore_controller::convert() method is called (@todo is it? where? by who?) and it in turn calls convert_helper::to_moodle2_format(). This helper method tries to find the most effective conversion path between the current format and the target moodle2 format as it constructs the chain of converters, though this is not very interesting yet, as we have just one converter now ... well, not even that yet :-p
  • For each converter in the chain, convert_factory::get_converter() creates new instance of it and its public convert() method is called.
  • The core functionality of moodle1 converter is defined in backup/converter/moodle1/lib.php in the class moodle1_converter and its subclasses. When the class is instantiated, it prepares a progressive parser and a parser processor and it registers all available handlers of the parsed data. The incoming convert() call sets up the directory to write to and runs the instance's execute() method.
  • The execute() method starts the parser of the 1.9 moodle.xml. That file is parsed sequentially and, whenever the parser reaches a node to which some handler is attached, it dispatches the parsed data via dispatch_chunk(). The parser also triggers notify_path_start() and notify_path_end() when it enters and leaves a registered path element.
  • The parser processor re-dispatches the parsed data and events via the path_start_reached(), process_chunk() and path_end_reached() methods defined by moodle1_converter.
  • Finally, the parsed data and events are re-dispatched once more and they are handled by moodle1_handler subclasses via their on_xxx_start(), process_xxx() and on_xxx_end() methods. These are the places where the actual conversion of data must happen and the new data are written into the new XML files.
  • The moodle1_handler subclasses can use xml_writer's begin_tag(), full_tag() and end_tag() methods to construct the XML file contents. A helper method that dumps a complete tree-ish structure at once is available, too: write_xml().

After the end of the parsed moodle.xml file is reached, the working directory with the new XML files is renamed so that it replaces the previous format. If there was a chain of converters, it would now be the next converter's turn. For us, at the moment, the job is done. Once the directory contains valid moodle2 format data, the normal restore process is executed as if the course backup came from a 2.0 server.
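The last steps of the workflow, where data and events are routed to on_xxx_start(), process_xxx() and on_xxx_end(), can be pictured with the standalone sketch below. Only the on_/process_ naming convention comes from this page; the sketch_dispatcher class, its method names and the toy handler are hypothetical, and the real converter does more work than this. The sketch also mimics the rule that on_xxx_end() receives whatever process_xxx() returned, or the cooked data if nothing was returned.

```php
<?php
// Hypothetical sketch of alias-based dispatching to handler methods named
// on_<alias>_start(), process_<alias>() and on_<alias>_end().

class sketch_dispatcher {
    protected $handler;
    protected $pending = array(); // alias => data kept until the end event

    public function __construct($handler) {
        $this->handler = $handler;
    }

    // the parser entered a registered path
    public function path_start($alias) {
        $method = "on_{$alias}_start";
        if (method_exists($this->handler, $method)) {
            $this->handler->$method();
        }
    }

    // the cooked data of a registered path are available
    public function chunk($alias, array $cooked) {
        $this->pending[$alias] = $cooked;
        $method = "process_{$alias}";
        if (method_exists($this->handler, $method)) {
            $returned = $this->handler->$method($cooked);
            if (is_array($returned)) {
                $this->pending[$alias] = $returned; // modified data replace the cooked ones
            }
        }
    }

    // the parser left a registered path
    public function path_end($alias) {
        $data = isset($this->pending[$alias]) ? $this->pending[$alias] : array();
        unset($this->pending[$alias]);
        $method = "on_{$alias}_end";
        if (method_exists($this->handler, $method)) {
            $this->handler->$method($data);
        }
    }
}

// a toy handler that only logs which of its methods were called
class sketch_choice_handler {
    public $log = array();
    public function process_choice($data)     { $this->log[] = 'process:' . $data['name']; }
    public function on_choice_options_start() { $this->log[] = 'options-start'; }
    public function on_choice_options_end()   { $this->log[] = 'options-end'; }
    public function on_choice_end($data)      { $this->log[] = 'end:' . $data['name']; }
}

$handler = new sketch_choice_handler();
$dispatcher = new sketch_dispatcher($handler);
$dispatcher->chunk('choice', array('name' => 'Lunch'));
$dispatcher->path_start('choice_options');
$dispatcher->path_end('choice_options');
$dispatcher->path_end('choice');
echo implode(' > ', $handler->log), "\n";
```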

Highlights and places to look at

While working on the converter, always keep in mind:

  • The file moodle.xml is parsed sequentially by a progressive parser. When the current node's data are available, the handler must either write them immediately into a new XML file or stash them for later processing, either by the on_xxx_end() event handler method or by another handler instance that is executed later.
  • Minimise the memory footprint of the conversion job. Do not accumulate the incoming data in memory unless it is necessary, especially data that may be huge (like all forum posts). Write the data to the new XML file as soon as possible (ideally on-the-fly, but that is not always possible).
  • Look at various examples of moodle1 handler classes in backup/converter/moodle1/handlerlib.php

The handler library template

The moodle1_converter itself converts only part of the moodle.xml file via handlers defined in handlerlib.php (for example the course sections, course modules etc). To convert module-specific data, moodle1_handlers_factory searches for the handlers in the modules' (and generally plugins') directories. The Choice module conversion logic is stored in the library mod/choice/backup/moodle1/lib.php. The basic template for such a file is:

<?php

// This file is part of Moodle - http://moodle.org/
//
// Moodle is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// Moodle is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with Moodle.  If not, see <http://www.gnu.org/licenses/>.

/**
 * Provides support for the conversion of moodle1 backup to the moodle2 format
 *
 * @package    mod
 * @subpackage choice
 * @copyright  2011 Your Name <your@email>
 * @license    http://www.gnu.org/copyleft/gpl.html GNU GPL v3 or later
 */

defined('MOODLE_INTERNAL') || die();

/**
 * Choice conversion handler
 */
class moodle1_mod_choice_handler extends moodle1_mod_handler {

    /**
     * Declare the paths in moodle.xml we are able to convert
     *
     * The method returns list of {@link convert_path} instances. For each path returned,
     * at least one of on_xxx_start(), process_xxx() and on_xxx_end() methods must be
     * defined. The method process_xxx() is not executed if the associated path element is
     * empty (i.e. it contains no elements or only sub-paths).
     *
     * Note that the path /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE does not
     * actually exist in the file. The last element with the module name was
     * appended by the moodle1_converter class.
     *
     * @return array of {@link convert_path} instances
     */
    public function get_paths() {
        return array(
            new convert_path('choice', '/MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE'),
            new convert_path('choice_option', '/MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE/OPTIONS/OPTION'),
        );
    }

    /**
     * This is executed every time we have one /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE
     * data available
     */
    public function process_choice($data) {
    }

    /**
     * This is executed every time we have one /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE/OPTIONS/OPTION
     * data available
     */
    public function process_choice_option($data) {
    }
}

Let us look at this code template more closely so that you understand its parts and can use it as a template for your own work.

As you can see, the file defines a single class that extends moodle1_mod_handler. All moodle1 handler classes must define get_paths() public method that returns a list of convert_path instances. In our example, we declare that the handler is interested in those two paths we have identified above. We register them with "aliases" choice and choice_option. The developer can choose any reasonable alias here and provide it as the first parameter of the convert_path constructor.

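For each convert_path instance, the class must define at least one of the corresponding methods (see the phpdoc of get_paths() above). The processing method must be named process_xxx(), where xxx is the alias declared in get_paths(); the on_xxx_start() and on_xxx_end() event methods follow the same naming convention.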
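Because the methods are found by name only, a mistyped alias simply results in data that nobody handles. A simple way to catch this early is to check that every alias returned by get_paths() has at least one matching method. The helper below is a hypothetical development aid, not part of Moodle; it relies only on PHP's method_exists():

```php
<?php
// Hypothetical development aid: list the aliases for which a handler defines
// none of on_<alias>_start(), process_<alias>() and on_<alias>_end().

function sketch_aliases_without_methods($handler, array $aliases) {
    $missing = array();
    foreach ($aliases as $alias) {
        $found = false;
        foreach (array("on_{$alias}_start", "process_{$alias}", "on_{$alias}_end") as $method) {
            if (method_exists($handler, $method)) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            $missing[] = $alias;
        }
    }
    return $missing;
}

// a toy handler that forgot to implement process_choice_option()
class sketch_handler {
    public function process_choice($data) {}
    public function on_choice_options_start() {}
}

$missing = sketch_aliases_without_methods(new sketch_handler(),
    array('choice', 'choice_options', 'choice_option'));
echo implode(', ', $missing), "\n"; // choice_option
```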

Setting-up the development environment

Use a real 1.9 backup file and unzip it into a folder within your $CFG->dataroot.'/temp/backup/' directory. Let us assume you have unzipped the backup into a folder called refcourse.

Put the following script into the root of your Moodle (as 'convert.php' for example):

<?php
define('CLI_SCRIPT', 1);
require(dirname(__FILE__).'/config.php');
require_once($CFG->dirroot.'/backup/util/helper/convert_helper.class.php');

convert_helper::to_moodle2_format('refcourse', 'moodle1');

(If you want to execute the script from your browser rather than via CLI, just remove the CLI_SCRIPT declaration.)

Put the following into your config.php:

$CFG->keeptempdirectoriesonbackup = true;

The convert.php script in the Moodle root converts the 'refcourse' directory into the Moodle 2.0 MBZ format. To test it now, modify the process_choice() method so that it just dumps the $data:

public function process_choice($data) {
    print_object($data); // DONOTCOMMIT
}

and run convert.php. If you are lucky enough, you should see an array dumped for each choice instance defined in the moodle.xml file. Also look at $CFG->dataroot/temp/backup/: you should see that 'refcourse' now contains the converted version of the backup, and that the original 1.9 data are left there for reference, too. Do not continue unless this works for you.

Data structure transformations

As you can see from the print_object() output, the process_choice() method is passed the data contained in the corresponding <MOD>...</MOD> part of the moodle.xml file. We know that we need to rename the fields text and format, add a new field completionsubmit and drop the field modtype (that one was already used to expand the XML path so that it looks as if moodle.xml defined the data inside <MOD><CHOICE>...</CHOICE></MOD>). These transformations are pretty common and the converter can do them on our behalf.

You can provide so-called recipes that will be applied to the element data before they are dispatched to the process_xxx() methods. These recipes are used to "pre-cook" the raw data from moodle.xml. In fact, one such recipe has already been applied by default: converting the tag names to lower case. Three other types of recipes are supported at the moment: adding new fields, renaming fields and dropping fields. All you need to do is declare which recipes should be applied to the parsed data, and in what order. You can do so by adding the third parameter to the constructor of the convert_path instances in the get_paths() method:

public function get_paths() {
    return array(
        new convert_path(
            'choice', '/MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE',
            array(
                'renamefields' => array(
                    'text' => 'intro',
                    'format' => 'introformat',
                ),
                'newfields' => array(
                    'completionsubmit' => 0,
                ),
                'dropfields' => array(
                    'modtype'
                ),
            )
        ),
        new convert_path('choice_option', '/MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE/OPTIONS/OPTION'),
    );
}

As you can see, the third parameter is an associative array. The keys of the array are the names of the recipes. The renamefields recipe will be applied first and will rename text to intro and format to introformat. The newfields recipe will add a new field completionsubmit with the value set to 0. And finally, the dropfields recipe will delete the field modtype from the data.
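The effect of these recipes on a single <MOD> record can be sketched as follows. This is a hypothetical, simplified stand-in for the converter's internal cooking of the data, not the convert_path code; it lower-cases the tag names and then applies renamefields, newfields and dropfields in the order described above. The sample values are made up.

```php
<?php
// Hypothetical sketch of "cooking" raw moodle.xml data with recipes.

function sketch_apply_recipes(array $raw, array $recipes) {
    // default recipe: tag names to lower case
    $data = array_change_key_case($raw, CASE_LOWER);

    // renamefields: keep the value, change the key
    $rename = isset($recipes['renamefields']) ? $recipes['renamefields'] : array();
    $cooked = array();
    foreach ($data as $name => $value) {
        $cooked[isset($rename[$name]) ? $rename[$name] : $name] = $value;
    }

    // newfields: add fields with their default values
    if (isset($recipes['newfields'])) {
        foreach ($recipes['newfields'] as $name => $default) {
            $cooked[$name] = $default;
        }
    }

    // dropfields: remove legacy fields
    if (isset($recipes['dropfields'])) {
        foreach ($recipes['dropfields'] as $name) {
            unset($cooked[$name]);
        }
    }
    return $cooked;
}

$raw = array('ID' => '5', 'MODTYPE' => 'choice', 'NAME' => 'Lunch',
             'TEXT' => 'What shall we eat?', 'FORMAT' => '1');
$cooked = sketch_apply_recipes($raw, array(
    'renamefields' => array('text' => 'intro', 'format' => 'introformat'),
    'newfields'    => array('completionsubmit' => 0),
    'dropfields'   => array('modtype'),
));
print_r($cooked);
```

Note that $raw stays untouched, which is why the raw data can still be offered to the handler alongside the cooked ones.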

If you need to know what the data looked like before the recipes were applied, your process_xxx() method can get them via its second parameter:

public function process_choice($data, $raw) {
    print_object($data); // DONOTCOMMIT
    print_object($raw); // DONOTCOMMIT
}

This may be required if, for example, you need to calculate a value of a new field from the value of some field that was dropped by the recipe.

Writing the transformed data into the new XML file

Now we have the choice data transformed and passed to process_choice() method and we can write them into the new XML file in the MBZ format directory. As you know, the module instance data are saved in a file like activities/choice_x/choice.xml where x is the course moduleid of our current instance (note that we do not know this cmid yet). Following the principle "write data as soon as possible", let us go and write them into the file:

/**
 * This is executed every time we have one /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE
 * data available
 */
public function process_choice($data) {

    // get the course module id and context id
    $instanceid = $data['id'];
    $cminfo     = $this->get_cminfo($instanceid);
    $moduleid   = $cminfo['id'];
    $contextid  = $this->converter->get_contextid(CONTEXT_MODULE, $moduleid);

    // we now have all information needed to start writing into the file
    $this->open_xml_writer("activities/choice_{$moduleid}/choice.xml");
    $this->xmlwriter->begin_tag('activity', array('id' => $instanceid, 'moduleid' => $moduleid,
        'modulename' => 'choice', 'contextid' => $contextid));
    $this->xmlwriter->begin_tag('choice', array('id' => $instanceid));

    unset($data['id']); // we already write it as attribute, do not repeat it as child element
    foreach ($data as $field => $value) {
        $this->xmlwriter->full_tag($field, $value);
    }
}

Let us look closely at what is going on here. At the top, we prepare the values needed for the attributes of the <activity> tag, which is the root element of the file. Our parent class defines a method get_cminfo() that returns a structure containing the cmid needed for our instance. The get_contextid() method is a bit more tricky. In Moodle 1.9 backup files, there is no context information stored at all, but the Moodle 2.0 format contains it. So the converter must actually generate fictitious context ids for various levels (course, modules, blocks, ...) as if they were stored in moodle.xml.

In the next step, we open a new XML file for writing. Then we use xml_writer's method begin_tag() to dump the first lines of the file. Then, in the foreach loop, we write all cooked data into the opened file. Note that we leave the file open so that other methods in our class can continue adding their own data to it.
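To see roughly what ends up in choice.xml, the sketch below reproduces the same sequence of opening tags and leaf elements with PHP's built-in XMLWriter class instead of Moodle's xml_writer. The function name and the moduleid and contextid values are made up, and unlike process_choice() the sketch closes the tags immediately so that it produces a complete document.

```php
<?php
// Hypothetical sketch: write the <activity><choice> part of choice.xml with
// PHP's XMLWriter, mirroring the begin_tag()/full_tag() calls above.

function sketch_write_choice(array $data, $moduleid, $contextid) {
    $w = new XMLWriter();
    $w->openMemory();
    $w->setIndent(true);
    $w->startDocument('1.0', 'UTF-8');

    $w->startElement('activity');
    $w->writeAttribute('id', (string)$data['id']);
    $w->writeAttribute('moduleid', (string)$moduleid);
    $w->writeAttribute('modulename', 'choice');
    $w->writeAttribute('contextid', (string)$contextid);

    $w->startElement('choice');
    $w->writeAttribute('id', (string)$data['id']);
    unset($data['id']); // already written as an attribute
    foreach ($data as $field => $value) {
        $w->writeElement($field, (string)$value);
    }

    // in the real handler, these closing tags are written later in on_choice_end()
    $w->endElement(); // choice
    $w->endElement(); // activity
    $w->endDocument();
    return $w->outputMemory();
}

$xml = sketch_write_choice(array('id' => 5, 'name' => 'Lunch', 'completionsubmit' => 0), 12, 34);
echo $xml;
```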

The process_xxx() method should return $data if it modifies them (so that the modified data can be used later by the on_choice_end() method, if needed). As we do not do any further transformation here, we do not return anything. In that case, $this->converter stores just the cooked data for possible later processing.

On-path-start and on-path-end events

So far we've worked with the method process_choice() that is executed every time we have the attached element's data available. For each registered path, two more methods are supported in our handler. Assuming that 'xxx' is the alias for the path we declared in get_paths(), the methods on_xxx_start() and on_xxx_end() will be executed every time we reach the opening tag and closing tag of that path, respectively.

The on_xxx_start() does not accept any parameters and it is executed when the opening element for the given path is reached by the parser. The on_xxx_end() may accept one array $data that contains the element's data processed previously by the process_xxx() method. Typical usage for these two handlers is to write wrapping tags into the target XML file. Let us demonstrate this case.

Let us register yet another path in the get_paths() method (note the plural: choice_options):

...
new convert_path('choice_options', '/MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE/OPTIONS'),
...

Note that <OPTIONS> itself does not contain any data (leaves in the XML tree); it is just a wrapper for the nested <OPTION> elements. So there is no point in having a process_choice_options() method, because it would never be executed. However, we register this path so that we can declare two event listeners for it:

/**
 * This is executed when the parser reaches the <OPTIONS> opening element
 */
public function on_choice_options_start() {
    $this->xmlwriter->begin_tag('options');
}

/**
 * This is executed when the parser reaches the closing </OPTIONS> element
 */
public function on_choice_options_end() {
    $this->xmlwriter->end_tag('options');
}

As you can see, we use the on-start and on-end event listeners here to generate the wrapping <options></options> tag pair. And the next step is just to make sure that the inner <option>...</option> data are written into the file, too.

Writing a complete tree-ish structure into the XML file in one step

As the parser continues parsing the moodle.xml file, our handler's method process_choice_option() is called whenever the data for a single <OPTION>...</OPTION> element are available (because we registered it as a path we are interested in). In this case, the situation is much simpler. No transformations are needed (apart from converting the tag names to lower case, which is done automatically). Therefore we can write the whole array at once, including the closing </option> tag. For such cases, a nice helper method is available:

/**
 * This is executed every time we have one /MOODLE_BACKUP/COURSE/MODULES/MOD/CHOICE/OPTIONS/OPTION
 * data available
 */
public function process_choice_option($data) {
    $this->write_xml('option', $data, array('/option/id'));
}

This write_xml() method writes the given $data tree into the <option>...</option> element, and $data['id'] will be written as the option's attribute instead of a child element (that is what the '/option/id' path in the third parameter declares).
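The idea behind write_xml() can be sketched with XMLWriter as a small recursive function: each array key becomes a child element, nested associative arrays become nested elements, and keys whose full path is listed (such as /option/id) become attributes. This is a hypothetical illustration; the real write_xml() may differ in its details.

```php
<?php
// Hypothetical sketch of dumping a nested array as XML, turning the listed
// paths into attributes instead of child elements.

function sketch_write_tree(XMLWriter $w, $name, array $data, array $attrpaths, $parentpath = '') {
    $path = $parentpath . '/' . $name;
    $w->startElement($name);
    // attributes must be written before any child element
    foreach ($data as $field => $value) {
        if (!is_array($value) && in_array($path . '/' . $field, $attrpaths, true)) {
            $w->writeAttribute($field, (string)$value);
        }
    }
    foreach ($data as $field => $value) {
        if (is_array($value)) {
            sketch_write_tree($w, $field, $value, $attrpaths, $path); // nested element
        } else if (!in_array($path . '/' . $field, $attrpaths, true)) {
            $w->writeElement($field, (string)$value); // leaf element
        }
    }
    $w->endElement();
}

$w = new XMLWriter();
$w->openMemory();
sketch_write_tree($w, 'option', array('id' => 7, 'text' => 'Yes', 'maxanswers' => 0), array('/option/id'));
$out = $w->outputMemory();
echo $out, "\n"; // <option id="7"><text>Yes</text><maxanswers>0</maxanswers></option>
```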

Finishing the job

When all options are written, we need to generate the closing tags, close the file and release our XML writer. For this, we can use the on-path-end event listener method again. This one is executed when the parser reaches the closing </MOD> tag in moodle.xml (or rather, think of it as if it reached the closing </CHOICE> tag in that virtual </CHOICE></MOD> path).

/**
 * This is executed when we reach the closing </MOD> tag of our 'choice' path
 */
public function on_choice_end() {
    // close choice.xml
    $this->xmlwriter->end_tag('choice');
    $this->xmlwriter->end_tag('activity');
    $this->close_xml_writer();
}

Note that this on_xxx_end() method can take a parameter $data that contains the data previously returned by process_xxx(). If the processing method does not return anything, the cooked data (the raw data after applying the recipes) are passed to the on_xxx_end() method.

And we are done! Execute convert.php again and you should find the correctly converted choice.xml file there. That was easy, eh? :-)

Important notes

  • Note that you can't use the on_choice_start() event listener method for the root <MOD> element itself. The reason is described in the phpdoc of the dispatching moodle1 methods, if you are interested. Just accept that process_choice() in our example is the first entry point executed in your class.
  • Note that in our case, there is actually no need to use on_choice_options_start() and on_choice_options_end() methods. We could simply write the opening <options> at the end of process_choice() and then the closing </options> at the top of on_choice_end(). We describe the start/end listeners here to demonstrate the feature mainly. And also, some may find the resulting code cleaner.

Files migration

While studying the upgrade steps in db/upgrade.php, you might notice that your module needs to migrate its files into the new file API framework. Typical cases are, for example, forum post attachments, files submitted in the assignment module etc. Your module converter has to apply similar logic, pretending that the final XML files were created in a virtual 2.0 site with the files already migrated.

Where files can be stored in the 1.9 backup ZIP file

  • All the course files are copied into the course_files folder within the ZIP, see backup_copy_course_files().
  • All the site files are copied into the site_files folder, if the user used this feature; see backup_copy_site_files().
  • Needed user files and group files are copied into user_files and group_files folders, respectively. See backup_copy_user_files() and backup_copy_group_files().
  • Module-specific files (forum post attachments, ...) are copied by the modules themselves in their backuplib.php; see for example backup_workshop_files().

How files are stored in the 2.0 backup MBZ file

The files related information is stored at several places in the MBZ file:

  • The /files folder is a pool for all the needed files. It is content-addressable storage: the file name is the SHA1 hash of the file contents, and files are grouped into directories according to the first two characters of their name.
  • The /files.xml file contains records from the mdl_files table, where each file usage is described. This is similar to the hard-linking mechanism in file systems. Note that the number of files listed in files.xml is greater than the number of physical files in the /files folder for two reasons: (1) the directories themselves are listed in files.xml (see the filename '.') and (2) one physical file in the pool can be referenced from several places in files.xml.
  • Every module has to declare the files it needs in its inforef.xml file (among other references).
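The content-addressable layout of the /files folder, and the reason why files.xml can list more entries than there are physical files, can be illustrated with a small in-memory sketch. The class and its properties are hypothetical; only the SHA1 naming, the two-character grouping and the idea of one record per file usage come from the description above.

```php
<?php
// Hypothetical in-memory sketch of a content-addressable file pool: identical
// contents are stored once, while every usage gets its own record.

class sketch_file_pool {
    public $blobs = array();   // pool path => contents (physical files)
    public $records = array(); // one entry per file usage (like files.xml)

    public function add($contents, $filearea, $filename) {
        $hash = sha1($contents);
        $poolpath = 'files/' . substr($hash, 0, 2) . '/' . $hash;
        $this->blobs[$poolpath] = $contents; // same contents, same key
        $this->records[] = array('contenthash' => $hash,
                                 'filearea' => $filearea, 'filename' => $filename);
        return $poolpath;
    }
}

$pool = new sketch_file_pool();
$pool->add('hello', 'intro', 'a.txt');
$pool->add('hello', 'content', 'copy-of-a.txt'); // same contents, second usage
echo count($pool->records), " records, ", count($pool->blobs), " physical file(s)\n";
```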

So to convert your module files, you must make sure that the files are copied into the pool, are listed in the common files.xml and are correctly referenced in your inforef.xml file. The files.xml file is generated automatically by one of the moodle1_converter core handlers, but your module must write its own inforef.xml. To migrate the files, you should always use a moodle1_file_manager instance.

Files migration manager

The easiest way to migrate your files and get their ids for dumping into inforef.xml is to use one instance of moodle1_file_manager per module instance you convert. Every instance of the file manager keeps track of the file ids it creates. The file manager can be seen as an object with several public properties that represent the values of the records in the files.xml file. These properties can be set either implicitly via the factory method (that passes them to the manager constructor) or explicitly. The file manager has two public methods for the migration itself: migrate_file(), used to migrate one particular file, and migrate_directory(), used to migrate a whole directory of files (including subdirectories).

Let us look at a particular example of how the Workshop module uses the file manager.

Use case 1: Migrating workshop submission attachments

Workshop module stores submission attachments in files like /moddata/workshop/xxx/filename.ext where 'xxx' is the submission id. Multiple files are allowed there. During the migration, these files must be transformed into records with the:

  • contextid set to the workshop course module context
  • component set to 'mod_workshop'
  • filearea set to 'submission_attachment'
  • itemid set to the submission id
  • userid set to the id of the user who submitted the file
  • filename set to the original file name
  • filepath set to '/' (all files are in the root directory in the given filearea/itemid)

It is obvious that all the submission file records in the given workshop instance share the same context, component and filearea. For each submission, the itemid and userid must be set based on the data in <SUBMISSION> element.
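The interplay between properties shared by all records (context, component, filearea) and properties that change per submission (itemid, userid) can be pictured with the in-memory stand-in below. It is hypothetical: it copies no files and its migrate_file() signature is made up, so it must not be mistaken for the moodle1_file_manager API. It only shows how one manager instance accumulates the ids that are later written into inforef.xml.

```php
<?php
// Hypothetical in-memory stand-in for a per-instance file manager. It only
// records what a files.xml entry would contain and remembers its own ids.

class sketch_file_manager {
    public $contextid;
    public $component;
    public $filearea = null;
    public $itemid   = 0;
    public $userid   = null;
    public $records  = array(); // id => record, as it could appear in files.xml

    protected static $nextid = 1; // ids are unique across all instances
    protected $fileids = array();  // ids created by this instance only

    public function __construct($contextid, $component) {
        // properties that do not change for the whole module instance
        $this->contextid = $contextid;
        $this->component = $component;
    }

    public function migrate_file($filepath, $filename) {
        $id = self::$nextid++;
        $this->records[$id] = array(
            'contextid' => $this->contextid, 'component' => $this->component,
            'filearea'  => $this->filearea,  'itemid'    => $this->itemid,
            'userid'    => $this->userid,    'filepath'  => $filepath,
            'filename'  => $filename,
        );
        $this->fileids[] = $id;
        return $id;
    }

    public function get_fileids() {
        return $this->fileids;
    }
}

// one manager per workshop instance: context and component stay fixed
$fileman = new sketch_file_manager(34, 'mod_workshop');
$fileman->filearea = 'submission_attachment';

// per-submission properties change as the parser moves on
$fileman->itemid = 101;
$fileman->userid = 7;
$fileman->migrate_file('/', 'essay.pdf');

$fileman->itemid = 102;
$fileman->userid = 8;
$fileman->migrate_file('/', 'photo.jpg');

echo implode(',', $fileman->get_fileids()), "\n"; // ids to list in inforef.xml
```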

Let us create a new instance of the file manager at the bottom of the process_workshop() method that is attached to the root <MOD> element of one workshop instance (the following code shows just the important parts of the whole class):

class moodle1_mod_workshop_handler extends moodle1_mod_handler {

    /** @var array in-memory cache for the course module information for the current workshop  */
    protected $currentcminfo = null;

    /** @var moodle1_file_manager instance for the current workshop */
    protected $fileman = null;

    public function get_paths() {
        return array(
            new convert_path('workshop', '/MOODLE_BACKUP/COURSE/MODULES/MOD/WORKSHOP'),
            new convert_path('workshop_submissions', '/MOODLE_BACKUP/COURSE/MODULES/MOD/WORKSHOP/SUBMISSIONS'),
            new convert_path('workshop_submission', '/MOODLE_BACKUP/COURSE/MODULES/MOD/WORKSHOP/SUBMISSIONS/SUBMISSION')
        );
    }

    public function process_workshop($data, $raw) {
        $instanceid          = $data['id'];
        $this->currentcminfo = $this->get_cminfo($instanceid);
        $moduleid            = $this->currentcminfo['id'];
        $contextid           = $this->converter->get_contextid(CONTEXT_MODULE, $moduleid);

        // prepare the file manager for this instance
        $this->fileman = $this->converter->get_file_manager($contextid, 'mod_workshop');
    }

    public function on_workshop_submissions_start() {
        $this->fileman->filearea = 'submission_attachment';
    }

    public function process_workshop_submission($data) {
        // migrate submission attachments
        $this->fileman->itemid = $data['id'];
        $this->fileman->userid = $data['userid'];
        $this->fileman->migrate_directory('moddata/workshop/'.$data['id']);
    }
}

The main job happens in process_workshop_submission() method. When it is executed, $this->fileman is correctly configured to migrate files into the correct context, component and filearea. If the directory moddata/workshop/xxx/ exists within the backup for the current submission, we set the relevant properties of the fileman and ask it to migrate all files in that directory.

Note that because we migrate files to the 'submission_attachment' filearea only, we could actually set it via get_file_manager() and get rid of the listener of <SUBMISSIONS> completely. The code above should illustrate how you can change the manager's properties to desired values as the parser goes down the moodle.xml file. Usually you will set at least contextid and the component name in the factory method get_file_manager() because those will not change. Other properties are set to correct values as soon as they are available.

The last step is to generate inforef.xml where we annotate all migrated files as being referenced by this workshop instance.

    public function on_workshop_end() {
        // close workshop.xml
        $this->xmlwriter->end_tag('workshop');
        $this->xmlwriter->end_tag('activity');
        $this->close_xml_writer();

        // write inforef.xml
        $moduleid = $this->currentcminfo['id'];
        $this->open_xml_writer("activities/workshop_{$moduleid}/inforef.xml");
        $this->xmlwriter->begin_tag('inforef');

        $this->xmlwriter->begin_tag('fileref');
        foreach ($this->fileman->get_fileids() as $fileid) {
            $this->write_xml('file', array('id' => $fileid));
        }
        $this->xmlwriter->end_tag('fileref');

        $this->xmlwriter->end_tag('inforef');
        $this->close_xml_writer();

        // get ready for the next instance
        $this->currentworkshop = null;
        $this->currentcminfo   = null;
        $this->newelementids   = array();
    }

Here we use the file manager's method get_fileids() that returns the list of all files it has ever converted. And because we create new file manager instance for every single workshop instance, the list is exactly what must be put into inforef.xml.

Note that inforef.xml will contain more data later: the list of annotated users, roles, scales and other items used by the given instance is stored there, too.