Datalib notes

Revision as of 05:15, 17 May 2006 by Martin Langhoff (talk | contribs) (General Datalib usage Notes)

Jump to: navigation, search

General Datalib usage Notes

TODO!

Advanced Usage: get_recordset()

There's a new series of function calls in datalib that were introduced shortly after 1.5: get_recordset() and friends. It takes a bit of work to make good use of them, they are definitely useful.

The function calls themselves take similar parameters to the get_records() family, and return an AdoDB recordset. In most situations, the right thing is to use get_records(), making sure that you are selecting a limited number of records (use LIMIT of course wink ), and that you are clearly fetching only the fields you need. That is enough for 99% of the cases. Stop reading now, use get_records(), and done.

Easy!

Now, there are some cases where we need to perform huge selects. And the downside of the get_records() machinery is that it allocates it all in RAM -- and if indeed records > than RAM, we are in trouble. This mostly affects batch processing in admin/cron.php (backups for instance) and the silly sync scripts we have in some auth and enrolment plugins.

In those cases, where you have no option but to SELECT all the users, or all the courses, get_recordset() really shines. As you get an AdoDB recordset, you can iterate through it very fast, and with very low memory usage. I have prepared and profiled some samples and found that to keep that advantage you have to be very careful how you use it.

I started testing with a large mdl_user table with 46K users, stored in PostgreSQL locally (socket connection). Here is a rundown of the code and rough timings...

The traditional get_records() is fast (~14s), but it consumes 296MB of memory. It looks like:

 $users = get_records('user');

So the next thing was to check that just triggering the SELECT was fast, and that fetching and re-shaping all that data into memory was an expensive step. get_recordset() in itself never fetches the data into PHP-space, so it takes only 4.8MB and 1.4s:

 $users = get_recordset('user');

Next I tried using get_recordset() with FetchNextObj() which looked quite convenient, because it will yield each record in an object, similar to the objects that get_records() returns inside the array. The memory usage stayed the same (4.8M), but this is very slow (~65s):

 $users = get_recordset('user');
 while($v = $users->FetchNextObj()){
     // do something 
 }

Internally, get_records() is using GetAssoc(), so I looked in there to se faster ways of getting at the data. Apparently, the trick is in reading the $rs->fields array and using MoveNext(). So this loop still takes 4.8MB and walks the 46K user records in 4s:

 $users = get_recordset('user');
 while(!$users->EOF) {
     $user = $users->fields;
     // do something
     $users->MoveNext();
 }

Naturally, as soon as you start doing useful things, this will take longer to exec. The core advantage, however, is that it can give you comparable performance to get_records() but with very low memory usage.

There is an interesting discussion about get_recordset() in the General Developer Forum