Talk:Backup 2.0 - Improve XML parsing


I moved this talk to the talk page.

Eloy, is that second one really necessary? Can't we find a way to process the data as we parse it? Mind you, a quick back-of-the-envelope calculation shows that 12.5MB of XML is extreme for one quiz, so if Method 2 is that good, perhaps we can be lazy and load each activity into memory one at a time. --Tim Hunt 19:05, 1 March 2009 (CST)

Hi Tim, I'm analysing more than just the memory/speed considerations of the current parsing. I started with that because current bugs are preventing people from restoring 1.9 courses, and I wanted to investigate that ASAP. For 1.9 there is no chance to change the architecture, but perhaps we could use Method 2 or Method 5 selectively when restoring quizzes. About 2.0, the more I think about it, the more inclined I am to split the current monolithic moodle.xml into smaller pieces. That will bring another immediate memory reduction and speedup (it's an order of magnitude more efficient to process 20*1MB files than 1*20MB file). Anyway, I'm not sure we can make the whole process of parsing and restoring a pure-SAX process because, for example, the attempt must be created BEFORE the states and, strictly speaking, if we follow the SAX approach, the attempt tag hasn't been closed yet when the states arrive, hence we haven't created it. So perhaps we'll need to continue loading 1 module into memory. Luckily, Method 2 looks really nice (lightspeed, compressed enough and cheap in memory usage). Let's see how the thing evolves, thanks! --Eloy Lafuente (stronk7) 08:22, 2 March 2009 (CST)
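To illustrate the "one module in memory at a time" compromise described above, here is a minimal Python sketch (not Moodle code, and not any of the numbered Methods). It uses the standard library's `xml.etree.ElementTree.iterparse` to stream the file, hands over each activity only once its closing tag has been seen, and then frees it. Because the whole activity is available at that point, the attempt can be created before its states. The tag names `MOD`, `MODTYPE`, `ATTEMPT` and `STATE`, and the functions `iter_activities` and `restore`, are assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a backup file; the tag names are hypothetical.
SAMPLE = b"""<MOODLE_BACKUP>
  <MOD><ID>1</ID><MODTYPE>quiz</MODTYPE>
    <ATTEMPT><ID>10</ID><STATE>a</STATE><STATE>b</STATE></ATTEMPT>
  </MOD>
  <MOD><ID>2</ID><MODTYPE>forum</MODTYPE></MOD>
</MOODLE_BACKUP>"""


def iter_activities(stream, tag="MOD"):
    """Yield one fully parsed activity element at a time, then free it."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            yield elem
            # Drop the activity's children once the caller is done with it,
            # so only one activity's data is held at any moment.
            elem.clear()


def restore(stream):
    """Record the order in which things would be created during a restore."""
    log = []
    for mod in iter_activities(stream):
        log.append(("module", mod.findtext("MODTYPE"), mod.findtext("ID")))
        for attempt in mod.iter("ATTEMPT"):
            # The whole activity is in memory, so the parent attempt can be
            # created before any of its child states are processed.
            log.append(("attempt", attempt.findtext("ID")))
            for state in attempt.findall("STATE"):
                log.append(("state", state.text))
    return log


if __name__ == "__main__":
    for entry in restore(io.BytesIO(SAMPLE)):
        print(entry)
```

The trade-off is the one discussed above: memory is bounded by the largest single activity rather than by the whole file, which still hurts if one quiz alone is very large.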
One option might be to make it possible for modules to choose a smaller unit into which they could be divided. So, perhaps forum could somehow say it wants to work one thread at a time, and quiz might say it wants to process one attempt at a time (after restoring the core info as the first chunk). And perhaps for a big group wiki, you might want to restore one group at a time.
I guess what I am saying is: when creating the API, perhaps modules should only be able to access the data coming from the XML through an API that looks like get_recordset, rather than get_records. Then, even if at first we implement what goes on behind the API as just loading everything into memory, if this issue becomes a problem again, we will be able to do something about it in future. --Tim Hunt 08:55, 2 March 2009 (CST)
Yes, yes. Agree. Changing the "atom" loaded by each module (attempt for quiz, discussion for forum...) can be a good overall solution. And having one standard get_backup_parts(module, part, whatever) + iterator is a good idea. 100% agree. In any case, independent of the "atom" used and how we load them (together, separated...), the change to Method 2 and SimpleXML adoption is, per se, one big improvement not available in previous Moodle versions. That's the bit we are trying to define in these early stages. But, as said, I agree 100% with your previous comment (different atoms + API to handle them, hiding the internals). --Eloy Lafuente (stronk7) 11:46, 2 March 2009 (CST)
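A rough sketch of what a recordset-style accessor could look like, written in Python for brevity rather than as a proposal for the real PHP API. The name `get_backup_parts` comes from the comment above, but its signature, the `BackupSource` class, the `handed_out` counter and the sample data are all assumptions for illustration. Modules only ever see an iterator of "atoms", so the code behind it can start out loading everything into memory and later be replaced by a streaming parser without touching the callers.

```python
from typing import Dict, Iterator, List

# Hypothetical pre-parsed backup: module name -> part name -> list of atoms.
Backup = Dict[str, Dict[str, List[dict]]]


class BackupSource:
    """Hides how backup data is stored; callers only get an iterator."""

    def __init__(self, data: Backup) -> None:
        self._data = data
        self.handed_out = 0  # number of atoms given to callers so far

    def get_backup_parts(self, module: str, part: str) -> Iterator[dict]:
        """Yield the atoms of one part lazily, one at a time."""
        # First implementation: everything is already in memory. A later
        # one could read from a stream here; callers would not change.
        for atom in self._data.get(module, {}).get(part, []):
            self.handed_out += 1
            yield atom


if __name__ == "__main__":
    source = BackupSource({
        "quiz": {"attempt": [{"id": 10}, {"id": 11}]},
        "forum": {"discussion": [{"id": 7}]},
    })
    # The quiz module processes one attempt at a time.
    for attempt in source.get_backup_parts("quiz", "attempt"):
        print("restoring attempt", attempt["id"])
```

Nothing is fetched until the caller asks for the next atom, which is the property that distinguishes get_recordset from get_records.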