Development:Backup 2.0 - Improve XML parsing
Note: This page is a work-in-progress. Feedback and suggested improvements are welcome. Please join the discussion on moodle.org or use the page comments.
Original: https://docs.moodle.org/en/Development:Backup 2.0 - Improve XML parsing
Summary
The current (Moodle 1.x) restore uses too much memory when parsing some "parts" of the XML information. We need to replace the current approach with one that provides optimal memory usage and acceptable throughput. This page presents the alternatives researched, with their strengths and weaknesses, and will conclude with one solution to be implemented in the new Moodle 2.0 backup & restore. If possible, some changes will also be made in Moodle 1.9 to achieve better results and fix bugs such as MDL-14302, MDL-15489, MDL-9838...
Research
This section describes the methods tested for parsing one 12.5MB XML file, taken from a real quiz module with only 788 attempts and 20115 states (answers). This file proved to be the problematic "part" on a production server running Moodle 1.9.x (for privacy reasons, the file is not available here).
To be considered valid, each method must fulfil these basic objectives:
- Parse the XML file.
- Provide one in-memory object with all the needed info in order to be processed later by the corresponding restore plugin.
For each method, some common information bits are provided (to allow comparisons later).
- name: mnemonic to easily reference each method later in comments.
- file size: the size of the original XML file being parsed.
- memory: max memory used by PHP in the execution (provided by memory_get_peak_usage()).
- time: time required to perform the parsing to memory (measured in seconds).
- data size: size of the in-memory generated object (final result of the execution).
- data format: specifications of the in-memory generated object (xmlize compatible, custom...).
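The measurements listed above could be collected with a small harness along these lines. This is only a sketch, not the code used for the published figures: `measure_parse()` and its result keys are hypothetical names, and the "data size" value is a rough approximation taken from the difference in `memory_get_usage()` before and after the parse.

```php
<?php
// Hypothetical helper (not part of Moodle): runs one parsing callback and
// collects the "time", "memory" and "data size" figures described above.
function measure_parse(callable $parser) {
    $before = memory_get_usage();
    $start  = microtime(true);

    $data = $parser(); // the in-memory object produced by the method under test

    $time = microtime(true) - $start;

    return array(
        'time'      => $time,                        // seconds
        'memory'    => memory_get_peak_usage(),      // peak bytes used by PHP so far
        'data_size' => memory_get_usage() - $before, // approximate bytes held by $data
        'data'      => $data,
    );
}
```

Because `memory_get_peak_usage()` reports the peak for the whole PHP process, each method would need to run in its own process for the memory figures to be comparable.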
Table of Results
This table summarises the raw results obtained running each method, providing links to details about each of them.
Method | File size | Memory | Time | Data size | Format |
---|---|---|---|---|---|
Method 0: Current behaviour (xmlize) | 12.5MB | 311.3MB | 4.8 seconds | 14MB | xmlize |
Method 1: SimpleXML parsing + conversion to xmlize format | 12.5MB | 165.8MB | 5.2 seconds | 14MB | xmlize |
Method 2: SimpleXML parsing, no conversion | 12.5MB | 36.5MB | 0.5 seconds | 8.5MB | simplexml |
Method 3: Custom SAX parser + conversion to xmlize format | 12.5MB | 158.3MB | 47.5 seconds | 14MB | xmlize |
Method 4: Custom SAX parser + conversion to xmlize-reduced format | 12.5MB | 72MB | 51.9 seconds | 7.7MB | xmlize-reduced |
Method 5: Custom SAX parser + conversion to custom (simple) format | 12.5MB | 64.5MB | 15.1 seconds | 5.4MB | custom-simple |
The alternatives analysed
Here each of the alternatives is analysed and compared.
Method 0: Current behaviour (xmlize)
Summary: file size: 12.5MB, memory: 311.3MB, time: 4.8 seconds, data size: 14MB, data format: xmlize format
Method 1: SimpleXML parsing + conversion to xmlize format
Summary: file size: 12.5MB, memory: 165.8MB, time: 5.2 seconds, data size: 14MB, data format: xmlize format
Method 2: SimpleXML parsing, no conversion (use simplexml as final object)
Summary: file size: 12.5MB, memory: 36.5MB, time: 0.5 seconds, data size: 8.5MB, data format: simplexml format
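As an illustration of what Method 2 involves, the sketch below loads a file with PHP's SimpleXML extension and returns the resulting object unchanged. The function name `parse_with_simplexml()` is hypothetical and this is not necessarily the code that produced the figures above.

```php
<?php
// Sketch of Method 2: parse with SimpleXML and keep the SimpleXMLElement
// as the final in-memory object (no conversion to xmlize format).
// parse_with_simplexml() is a hypothetical name used for illustration.
function parse_with_simplexml($path) {
    $data = simplexml_load_file($path);
    if ($data === false) {
        throw new RuntimeException("Could not parse XML file: $path");
    }
    return $data;
}
```

The restore code would then walk the object with property access, for example `foreach ($data->ATTEMPT as $attempt)`, where `ATTEMPT` is only an assumed element name.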
Method 3: Custom SAX parser + conversion to xmlize format
Summary: file size: 12.5MB, memory: 158.3MB, time: 47.5 seconds, data size: 14MB, data format: xmlize format
Method 4: Custom SAX parser + conversion to xmlize-reduced format
Summary: file size: 12.5MB, memory: 72MB, time: 51.9 seconds, data size: 7.7MB, data format: xmlize-reduced format
Method 5: Custom SAX parser + conversion to custom (simple) format
Summary: file size: 12.5MB, memory: 64.5MB, time: 15.1 seconds, data size: 5.4MB, data format: custom simple format
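Methods 3 to 5 all rely on a custom SAX parser. A minimal sketch of that idea, using PHP's `xml_parser_*` functions and reading the file in chunks, might look like the following. The function name `parse_with_sax()` and the resulting array layout (`name`, `text`, `children`) are assumptions made for illustration; they are not the xmlize, xmlize-reduced or custom-simple formats measured above.

```php
<?php
// Sketch of a SAX-style parser that builds a nested array from events.
// The output layout is hypothetical, chosen only to keep the example short.
function parse_with_sax($path) {
    // Stack of open elements; index 0 is a sentinel that receives the root.
    $stack = array(array('name' => null, 'text' => '', 'children' => array()));

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Start tag: push a new, empty node.
        function ($p, $name, $attrs) use (&$stack) {
            $stack[] = array('name' => $name, 'text' => '', 'children' => array());
        },
        // End tag: pop the finished node and attach it to its parent.
        function ($p, $name) use (&$stack) {
            $node = array_pop($stack);
            $node['text'] = trim($node['text']);
            $stack[count($stack) - 1]['children'][] = $node;
        }
    );
    // Character data: append to the element currently open.
    xml_set_character_data_handler($parser, function ($p, $text) use (&$stack) {
        $stack[count($stack) - 1]['text'] .= $text;
    });

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Could not open XML file: $path");
    }
    // Feed the file to the parser in chunks instead of loading it whole.
    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        if (!xml_parse($parser, $chunk, feof($handle))) {
            $error = xml_error_string(xml_get_error_code($parser));
            fclose($handle);
            throw new RuntimeException("XML error: $error");
        }
    }
    fclose($handle);

    return $stack[0]['children'][0]; // the root element
}
```

With this approach the raw XML text is never held in memory all at once, so memory usage depends mainly on the structure the handlers build.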
Formats
In this section there are some explanations about the different object formats generated by the different methods:
XMLize format
SimpleXML format
XMLize-reduced format
Custom-simple format
Code
Here is the code used for the different parsing methods discussed above:
Method 0: Current behaviour
$contents = file_get_contents('moodle.xml');
$data = xmlize($contents);