Difference between revisions of "Backup 2.0"

Revision as of 15:08, 8 December 2009

Note: This page is a work-in-progress. Feedback and suggested improvements are welcome. Please join the discussion on moodle.org or use the page comments.

Moodle 2.0


Drop in ideas

  • Enable Backup and restore of a Course and its related meta courses IN ONE HIT to save on administration time.
  • Support incremental backups & restore.
  • Support 1-activity backup.
  • Backup/restore one topic only (not just one activity) --Samuli Karevaara 21:29, 5 March 2009 (CST)
  • Support anonymisation of personal data on backup.
  • Separate process and progress. (Perhaps these classes I wrote can help)--Tim Hunt 19:41, 1 March 2009 (CST)
  • XML format doesn't need to be radically different (IMO).
  • Support restore of old (1.9 only?) backups.
  • Hook in backup & restore to intercept & support other formats (BB, IMS-CC...)
  • One code base both for manual and scheduled backup
  • Scheduled restore
  • Fix restore so that one can also select to restore course settings (they are currently backed up, but not restored in any restore function.)
  • Fileless export/import (aka fix import function so that one can choose to import blocks, course settings, etc. in addition to resources/activities)
  • Export/import over mnet
  • OOP.
  • Secure (don't handle non-allowed data, user matching on restore, salted passwords..)
  • Safe (some sort of "rollback" on failure - requires to have everything annotated somewhere - backup_ids or so).
  • file-less backups - if you have separate backups, import/export etc.
  • Allow to backup one category of courses (at the same time, one zip per course). MDL-17187
  • Allow to mark courses to be excluded from scheduled backup manually.
  • Avoid anti-timeout output when not running from browser. MDL-17282
  • Review related caps, cleaning them and adding missing bits, improving sec. consistency.
  • Log all backups (both manual and scheduled). Improve logging in general.
  • Separate each module's portion of XML into its own namespace.
  • Use the namespaces to allow each module the option of validating its XML content before processing.
  • Roll dates: on restore or import, allow instructor to input a start date, and roll all assignment, quiz, etc. dates forward based on the new start date.
  • Backup would need to be called when publishing a course on the community hub (course template)
  • Metadata... to be included in backup/restore... something like: Metadata
  • Allow backup of roles with permissions, assigns and overrides. MDL-17081

General prerequisites

Improve XML parsing

One of the major bottlenecks in backup and, especially, in restore reliability under Moodle 1.x has been the large amount of memory needed to handle those operations (see bugs like MDL-14302, MDL-15489, MDL-9838... and many others). While the whole XML file (moodle.xml) is parsed in a SAX way (hence "streamed", requiring small amounts of memory), we usually group some XML contents into "parts" in order to delegate the operations over those "parts" to different plugins (modules, blocks...). Problems arise when some of those "parts" are big enough that processing them with the xml_parse_into_struct() function and the xmlize library uses excessive amounts of memory (for example, processing a 12.5MB file requires 311MB of memory, crazy!). So we need to switch to an alternative method that parses those "parts" using much less memory and builds the corresponding in-memory object with acceptable throughput (speed).
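
The general technique of handing over one "part" at a time and then releasing it can be sketched as follows. This is an illustration in Python using the standard library's `xml.etree.ElementTree.iterparse`, not the planned Moodle parser; the helper `iter_parts` and the tag names `MODS`, `MOD`, `ID` and `NAME` are assumptions made up for the example:

```python
import io
import xml.etree.ElementTree as ET


def iter_parts(stream, part_tag):
    """Yield each <part_tag> element as a plain dict, then release it."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == part_tag:
            # The element is complete here, so its children can be read.
            yield {child.tag: child.text for child in elem}
            # Free the children and text of the processed part; only an
            # empty shell stays attached to the parent element.
            elem.clear()


# Hypothetical miniature backup file with two module "parts".
sample = io.BytesIO(
    b"<MODS>"
    b"<MOD><ID>1</ID><NAME>forum</NAME></MOD>"
    b"<MOD><ID>2</ID><NAME>quiz</NAME></MOD>"
    b"</MODS>"
)

mods = list(iter_parts(sample, "MOD"))
```

Because each part is converted and cleared before the next one is read, peak memory depends on the largest single part rather than on the size of the whole file.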

Summary & Objectives

The backup & restore functionality has been present since Moodle 1.1 (29 August 2003, more than six years ago!) and, although it has been continuously improved and renewed, its core has remained basically the same over time.

Many things have evolved since then both in Moodle and PHP-land so this project is an attempt to rework the backup & restore functionality in order to achieve these goals:

  • Modernise the code base, using current PHP/DB techniques: OOP, exceptions, temporary DB tables, transactions... aimed at reducing the current "spaghetti code" in various places.
  • Unify current developments, so all the backup alternatives (manual, export, silent, scheduled...) will be executed by the same code base (also for restore/import, of course).
  • Improve the process, making it more reliable, being able to pre-detect wrong situations, rollback (if possible) to previous states, and applying any possible benefit both for speed (some interim in-memory caches) and memory (better parsing techniques).
  • Security, so all the Moodle permissions structure will be 100% respected and any disclosure of privacy information will be avoided, of course, giving the admins the flexibility to change default behaviours by configuring them in a standard way.
  • New features requested by the Community will be added. Things like fine-grained control of various restore parts, the ability to back up not only course structures but also sections or individual activities, anonymised backups and so on will debut with Moodle 2.0. See the requirements section for more info.

Requirements

This section summarises the Backup 2.0 requirements document, in order to get a labelled list of requirements whose completion will fulfil all the original drop in ideas detected/suggested over the last months. Each requirement, if necessary, will have its own explanation/page to have it properly defined.

Overview

- Note about 1.9 pending tasks: Creation of users & creation and assignment of roles/caps. Must be in 1.9.8.

About 2.0 backup/restore:

  • Change the format dramatically, splitting the current monolithic moodle.xml into smaller pieces (1xplugin, 1xusers, 1xroles, 1x...). This has been working experimentally since 1.9 and has proved to be great for speed (saves ~20 repeated parsings of the whole file). No dependencies on previous stuff at all.
  • Change the current "xmlize" approach when restoring files to one new progressive_parser instead. Each part of restore will decide how it wants to receive the information and who is in charge of processing it. Great for memory.
  • Allow 1-activity, 1-section and 1-course backup. Use different shells/envelopes for any type of backup, keeping the "content" the same.
  • Unify all the multiple backup variations (course backup, course export, silent backup, scheduled backup) into one unique codebase for all of them. Same for the multiple restore variations.
  • The backup and restore execution function (only ONE function, the executor!!) will be "blind": it won't make decisions based on (changing) roles/capabilities/configuration settings any more. One master "backup_director" object will route it once the object has performed all validations and ensured the process is executable. That director will handle/decide/instruct the executor about output/logging and everything else.
  • Store complete information about any backup/restore performed in the site. That will include all the information for the director (array-serialised) and the results of the executor. And obviously all the logged info cached when running.
  • Important UI changes:
    • The UI won't do any processing! (like pre-calculating, or post-checks and friends). Separate it completely from the "director" and the "executor".
    • Complete site/user defaults
    • View the backup/restore form in "course" organisation (inline? nah for 2.0 IMO) instead of grouped by activity type. Surely over multiple configuration "steps".
    • Bunch of new detail-options (restore section descriptions, overwrite course settings..., usually requested).
  • Reuse code like crazy. Some things like the "id-remapping" on restore, the logs output, the "flush" output and so on will be central and any point of restore will be able to request those "services" or "utilities" when desired in an easy way (mini PHP-API to define things in a declarative way VS one/multiple helpful backup/restore "utilities" classes).
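
The director/executor split described in the list above can be sketched like this. It is an illustration in Python, not the planned PHP design: the class names `Plan` and `Director`, the function `execute`, and the capability strings `backup` and `backup_userinfo` are all hypothetical and do not correspond to real Moodle identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Fully resolved instructions; the executor looks at nothing else."""
    course_id: int
    include_users: bool
    log: list = field(default_factory=list)


class Director:
    """Performs every permission/setting check up front, then builds a plan."""

    def __init__(self, capabilities):
        self.capabilities = set(capabilities)

    def build_plan(self, course_id, want_users):
        if "backup" not in self.capabilities:
            raise PermissionError("user may not back up this course")
        # Resolve the request here so the executor never has to decide.
        include_users = want_users and "backup_userinfo" in self.capabilities
        return Plan(course_id=course_id, include_users=include_users)


def execute(plan):
    """The single "blind" executor: follows the plan, checks no permissions."""
    steps = ["course"]
    if plan.include_users:
        steps.append("users")
    for step in steps:
        plan.log.append(f"wrote {step} for course {plan.course_id}")
    return steps


# A user who may back up the course but not its user data.
plan = Director({"backup"}).build_plan(course_id=5, want_users=True)
result = execute(plan)
```

Here the request for user data is dropped by the director before execution starts, so the executor contains no capability checks, and the plan together with its log is a self-contained record of what was run.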

General architecture

Known problems

  • Improve XML parsing: One of the major bottlenecks in backup and, especially, in restore reliability under Moodle 1.x has been the large amount of memory needed to handle those operations (see bugs like MDL-14302, MDL-15489, MDL-9838... and many others). While the whole XML file (moodle.xml) is parsed in a SAX way (hence "streamed", requiring small amounts of memory), we usually group some XML contents into "parts" in order to delegate the operations over those "parts" to different plugins (modules, blocks...). Problems arise when some of those "parts" are big enough that processing them with the xml_parse_into_struct() function and the xmlize library uses excessive amounts of memory (for example, processing a 12.5MB file requires 311MB of memory, crazy!). So we need to switch to an alternative method that parses those "parts" using much less memory and builds the corresponding in-memory object with acceptable throughput (speed).
  • Upwards compatibility of Moodle 1.9.x backups: While the Backup 2.0 multiple formats document describes how different formats will be supported by the restore process in Moodle 2.0, it's going to be a highly complex task to transform Moodle 1.9.x backups into the new, improved, Moodle 2.x backup format. Too many things have changed between the two versions to achieve this easily. Surely it's doable (since upgrade is able to apply the correct logic, restore can too) but it is going to be handled as a separate development, which will allow us to 1) focus on the 2.0 backup/restore and 2) be free (both in mind and implementation) from any 1.9.x XML format dependency.

Implementation plan

See also

To recollect

  • All the existing/required caps: define behaviour for each one.
  • All the zones and splitter.
  • Files backup/restore.
  • 1.9 => 2.0 transformations.
  • Shells, separate structure and activities/blocks.
  • Backup by "context". Each "context" (or inner items) will have:
    • users
    • files
    • logs
    • blocks
    • role assignments
    • role overrides
    • completion info
    • conditional info
    • comments
    • tags
    • questions and categories
    • filters
    • metadata

(alternatively, all these parts can be found at the end of the backup file, globally for all contexts)

Some points

  • Need stamps for users (perhaps not because of MDL-16658) and roles (like question/categories ones) to guarantee uniqueness.
  • Need to decide about backup file names (moving from .zip to .xxx). Multiple extensions or prefixed info to "tell more" about the backup type.
  • 1.9 => 2.0 restore:
    • Really complex, needs to mimic all current logic in upgrade.php scripts:
      • xml transformations --> relatively easy
      • files migration (course and moddata)
      • wiki, resources, workshop transformations (complex)
      • incoming question bank changes.
      • surely other stuff (comments, messages, blogs)
    • Possible solutions:
      • How:
        • Reuse as much as possible code from upgrade scripts. Not everything is re-usable (bulk changes).
        • Re-implement all the logic in restore.
        • Forgetting about it
      • When:
        • In one step, so Moodle 2.0 will natively-on-the-fly be able to process 1.9 files. Impossible!
        • Keep it within Moodle, so conversion will be executed transparently immediately before restore (in the original plan at Backup 2.0 multiple formats).
        • Completely separate utility, so files need to be transformed before being accepted by restore.
        • Never
      • Where:
        • Restore everything into some duplicate tables in order to do the conversion there.
        • Do that into memory/some specialised tables not duplicating anything.
        • Nowhere
  • Scheduled backups (it's just a "timed" backup iterator). Do we want it to continue being part of Moodle or become a separate utility? Using Moodle cron or its own one?
  • Incremental backup/restore. It is also an external thing. We provide constant "content" by separating envelopes and content, but all the "diff", "patch" stuff is not related to Moodle at all.
  • Structures above course level. They are a pain. They can lead to non-restorable courses or courses that behave differently.
  • ... more ideas/things..