Backup 2.0 - Provide upwards compatibility of Moodle 1.9.x backups

Revision as of 12:21, 10 November 2013 by Petr Škoda (škoďák) (talk | contribs)

Warning: This page is no longer in use. The information contained on the page should NOT be seen as relevant or reliable.


While the Backup 2.0 multiple formats document describes how different formats will be supported by the restore process in Moodle 2.0, it's going to be a highly complex task to perform the transformation of Moodle 1.9.x backups into the new, improved, Moodle 2.x backup format.

Too many things have changed between the two versions for this to be achieved easily. It is certainly computable (if upgrade is able to apply the correct logic, restore will be too), but it is going to be handled as a separate development. That will allow us to 1) stay focused on the 2.0 backup/restore and 2) be free (both in mind and implementation) from any dependency on the 1.9.x XML format.

Goals and basis

  • The main goal is to create one conversion tool, able to take any Moodle 1.9 backup file and convert it to one Moodle 2.0 backup file. The tool will not perform any modification in the Moodle instance where it is being executed (i.e. the tool won't import anything). Once the conversion has finished, the standard Moodle 2.0 restore functionality will perform its job.
  • The tool will get one already-uncompressed directory containing one Moodle 1.9 backup file and will create all the needed stuff in the same directory.
  • The tool will have a single entry point to perform the conversion and can be invoked from within Moodle 2.0 (restore, performing the conversion on the fly) or standalone (from a CLI interface or any sort of batch processing).
  • The tool will be built using as much infrastructure from Moodle 2.0 as possible (XML processors, plugins, core stuff, logging, debugging...) and will be fully OOP (and PHP, of course).
  • Ideally the conversion will happen in a single pass of XML parsing.
  • The tool will be licensed under GPL3 or later and bundled with future releases of Moodle.
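
The goals above (one entry point, work confined to the backup directory, callable from restore or from the command line) can be pictured with a small sketch. It is written in Python purely for illustration; the real tool would be PHP inside Moodle. The names convert_backup, write_marker and main are hypothetical, and the only detail taken from this page is that a 1.9 backup carries a monolithic moodle.xml file.

```python
import os


class ConversionError(Exception):
    """Raised when the directory does not look like a Moodle 1.9 backup."""


def convert_backup(directory, steps):
    """Hypothetical single entry point: convert the backup in place.

    `steps` is a list of callables, each receiving the directory path.
    Nothing outside `directory` is read or written.
    """
    if not os.path.isfile(os.path.join(directory, "moodle.xml")):
        raise ConversionError("no moodle.xml found in " + directory)
    for step in steps:
        step(directory)
    return directory


def write_marker(directory):
    """Placeholder step standing in for the real conversion work."""
    with open(os.path.join(directory, "converted.txt"), "w") as handle:
        handle.write("done\n")


def main(argv):
    """Standalone (CLI-style) wrapper around the same entry point."""
    try:
        convert_backup(argv[1], [write_marker])
    except (IndexError, ConversionError) as error:
        print("conversion failed:", error)
        return 1
    return 0
```

A restore process would call convert_backup() directly, while a batch script would go through main() with the directory as its first argument. Both paths end up in the same function, which is the point of the single entry point.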

Introduction

Before any further analysis, it is highly recommended to:

  • Know as much as possible about the Moodle 2.0 backup infrastructure: read the general architecture page, look at everything available under the backup/moodle2 directories (each plugin has its own) and understand how information flows and is stored.
  • Pay special attention and understand these parts of the Moodle 2.0 backup infrastructure:
    • XML parser: In charge of reading the whole XML file and dispatching pieces to different processors (backup/util/xml/parser).
    • XML writer: In charge of writing XML files (util/xml/xml_writer.class.php and util/xml/output)
    • Chained loggers: To output any information about the process (backup/util/loggers).
    • Temporary tables, to store transient information if needed.
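
To make the "parser dispatching pieces to processors" idea concrete, here is a minimal sketch in Python using only the standard library. It is not the Moodle parser under backup/util/xml/parser; the function name parse_and_dispatch, the shape of the processors mapping and the sample XML are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def parse_and_dispatch(source, processors):
    """Stream `source` once and hand each registered path to its processor.

    `processors` maps a path such as "/MOODLE_BACKUP/COURSE/SECTIONS/SECTION"
    to a callable receiving a dict of the element's direct children
    (tag -> text). Paths that nobody registered are simply skipped.
    """
    path = []
    for event, element in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(element.tag)
            continue
        current = "/" + "/".join(path)
        path.pop()
        if current in processors:
            data = {child.tag: (child.text or "").strip() for child in element}
            processors[current](data)
            element.clear()  # drop the children of chunks already handled


# Simplified, made-up fragment in the uppercase style of a 1.9 backup.
SAMPLE = (
    b"<MOODLE_BACKUP><COURSE><SECTIONS>"
    b"<SECTION><ID>1</ID><NUMBER>0</NUMBER></SECTION>"
    b"<SECTION><ID>2</ID><NUMBER>1</NUMBER></SECTION>"
    b"</SECTIONS></COURSE></MOODLE_BACKUP>"
)

sections = []
parse_and_dispatch(
    io.BytesIO(SAMPLE),
    {"/MOODLE_BACKUP/COURSE/SECTIONS/SECTION": sections.append},
)
```

After the call, sections holds one dict per SECTION element, collected in a single pass over the file. A conversion tool would register one processor per path it cares about and let each of them write its own output.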

Physical differences

After reading the general architecture page, it's clear that there are huge differences between the XML used in Moodle 1.9 backups and their Moodle 2.0 counterparts. Let's dissect them briefly:

  • We have changed from one monolithic moodle.xml file to a layout of multiple XML files (and directories).
  • The course, each section and each activity have their own directory.
  • There are some top-level XML files (groups, users, files, questions, roles, scales...) acting as "containers" of information referenced from any course/section/activity (by their inforef.xml files).
  • We have moved from uppercase tags to lowercase everywhere.
  • We are using XML attributes for some pieces of information (mainly ids).
  • When possible (mainly in modules) the old XML tree structure has been preserved.
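
Two of the differences listed above are purely mechanical: tags become lowercase and ids move into XML attributes. The following Python sketch shows only those two changes on a made-up fragment. The function name modernise and the sample element are hypothetical, and the real 2.0 files differ from 1.9 in many more ways than this.

```python
import xml.etree.ElementTree as ET


def modernise(old, id_tag="ID"):
    """Return a copy of `old` with lowercase tags and the ID child as an attribute."""
    new = ET.Element(old.tag.lower())
    if old.text and old.text.strip():
        new.text = old.text.strip()
    for child in old:
        if child.tag == id_tag and len(child) == 0:
            # A leaf <ID> child becomes the id="..." attribute of its parent.
            new.set("id", (child.text or "").strip())
        else:
            new.append(modernise(child, id_tag))
    return new


old_section = ET.fromstring(
    "<SECTION><ID>7</ID><NUMBER>0</NUMBER><SUMMARY>Intro</SUMMARY></SECTION>"
)
new_section = modernise(old_section)
print(ET.tostring(new_section, encoding="unicode"))
```

The tree shape is kept as it was, which matches the note above that the old structure survives where possible.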

Logical differences

A lot of things have changed from Moodle 1.9 to Moodle 2.0, causing many modifications in the DB schema. As backup is highly dependent on that schema (it's basically one representation of the DB), the information present in the physical XML files above has changed noticeably too.

One good exercise is to create a more or less complete test course, back it up in Moodle 1.9, upgrade the site to Moodle 2.0 and back it up again. That way you'll see where each bit of information has ended up in the new format. To understand the changes, it is also worth reading all the db/upgrade.php scripts present in Moodle 2.0. They show all the modifications performed over time.

At this point it's important not to get stressed. Yes, there are many differences and tons of modifications, some of them really complex. But the good news is that, if the upgrade.php scripts have been able to perform a change, then it is computable and the conversion tool will be able to do the same.

Back to the logical differences, some of them need special attention from day 0 (now):

  • files: Under Moodle 1.9, files were stored both in the so-called "course files" directory (one physical directory) and under the "moddata" directory (files belonging to modules, like forum attachments, assignment submissions...). Under Moodle 2.0, all the files are physically stored in one file pool (a really cryptic hash-based storage solution) with one logical store in the DB, with contexts, components, file areas, hashes and items. The conversion tool will need some sort of "file conversion tool", used everywhere, able to handle any transformation from the old to the new approach.
  • roles and enrolments: Moodle 1.9 enrolments were simply role assignments performed at course level. This has been changed to a more powerful (dual) structure of both role assignments and proper enrolment plugins. The conversion tool will need to make some "assumptions" in order to convert this information. It's also important to note that, while Moodle 1.9 backups sent the complete roles to XML (with all their capabilities and friends), this doesn't happen anymore in Moodle 2.0 (where we only store basic information about them).
  • contexts: Under Moodle 1.9, the use of contexts (the course, each activity and each user is one unique context) was really limited (basically only used for roles and capabilities). Under Moodle 2.0, there is far more information based on contexts (files, comments, ratings, filters, blocks...). Hence we'll need to assign one context to each component in the backup to be able to associate information with it during the conversion.
  • categories and questions: While the internal representation is pretty similar, in Moodle 2.0 they are now handled via standard plugin backup & restore (see question/type/xxxx/backup/moodle2), so some extra information (extra XML levels) is required. The tool should also handle the conversion in a pluggable way, allowing conversions for third-party question types to be implemented.
  • blocks: 90% of blocks have no special DB representation and need no special processing, but the rest (see, for example, the html block or the rss_client one) need custom conversion. So the tool must also allow conversion of blocks in a pluggable way.
  • activities: This is the most important part, where all the information (from an educational POV) resides. Of course, it must work in a pluggable way. Some activities are really easy to convert, because the only noticeable change has been their move to the new "files" (discussed above). For example, the choice, glossary, forum... modules are good candidates to start with. But there are also some activities that have been completely rewritten or modified since Moodle 1.9, and they will undoubtedly be the most difficult ones for the conversion tool to handle. Let's see some of them:
    • resource: Under Moodle 1.9 there were several types of resources (page, directory...), all of them being types of the (unique) resource module. Under Moodle 2.0, the module has been split into different modules, each one with its own DB representation and unique features (file, folder, page and url). This split will need to be detected and handled specially by the conversion tool.
    • wiki: Completely rewritten from scratch. Will need important changes in the XML representation (100% different DB structure) and special processing of all the contents (to translate from the old ewiki dialect to the new html implementation). Not to mention binary contents, which will also need special handling.
    • workshop: Also completely rewritten from scratch. Will need important changes in the XML representation, use of subplugins and special processing of some contents.
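
As an illustration of the "file conversion tool" mentioned in the files item above, here is a rough Python sketch of a hash-based pool plus the logical record that goes with each file. The SHA-1 content hash, the two-level directory layout and the helper names (pool_path, add_to_pool, describe_file) are assumptions made for this example, not a description of the real storage code. The record fields simply mirror the concepts named above (contexts, components, file areas, hashes and items).

```python
import hashlib
import os
import shutil


def pool_path(content_hash):
    """Relative location of a file inside the pool (assumed two-level layout)."""
    return os.path.join(content_hash[0:2], content_hash[2:4], content_hash)


def add_to_pool(source_path, pool_dir):
    """Copy a file into the pool under the hash of its content; return the hash."""
    digest = hashlib.sha1()
    with open(source_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    content_hash = digest.hexdigest()
    target = os.path.join(pool_dir, pool_path(content_hash))
    if not os.path.exists(target):  # identical content is stored only once
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copyfile(source_path, target)
    return content_hash


def describe_file(source_path, pool_dir, contextid, component, filearea, itemid):
    """Build the logical record that points at the pooled content."""
    return {
        "contenthash": add_to_pool(source_path, pool_dir),
        "contextid": contextid,
        "component": component,
        "filearea": filearea,
        "itemid": itemid,
        "filename": os.path.basename(source_path),
    }
```

A converter for, say, forum attachments would walk the old moddata directory and call describe_file() for each file it finds, passing the context it assigned to that activity. Because the pool is keyed by content, the same file attached in two places ends up stored once and referenced twice.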

Overall process

Here is an ordered list that could serve as a technical/development roadmap for the analysis and implementation of the Moodle 1.9 conversion tool. Surely some (practically all) of the points below can be split into smaller tasks to allow work to advance in parallel. Ideally all of them should be represented as subtasks of MDL-22414. Each task has an alphanumeric code to be used as a reference in the future:

  • A: Preliminary Tasks
    • A1: Initial report (this document)
    • A2: Organization / roles / actors involved
    • A3: Initial timeline specification
    • A4: Documentation / research / learn
    • A5: Testing use cases, QA specifications
  • B: Infrastructure Tasks
    • B1: XML Parser
    • B2: Conversion dispatcher
    • B3: XML writer
    • B4: Support for plugins (questions, blocks, modules...)
    • B5: Support for subplugins (workshop, assignments...)
    • B6: Integration with Moodle 2.0 restore, logging, output...
    • B7: Progress review / discuss / adjust / timeline
  • C: Conversion Development Tasks
    • C1: Globals (files / users / groups / scales / outcomes / roles / gradebook ...)
    • C2: Course level information
    • C3: Section level information
    • C4: Activity level information
    • C5: Commons (comments / completion / filters / grades / roles / logs ...)

Followups

See also

  • MDL-22414 - the main tracker issue for this task