Fast portable SQL

Revision as of 05:10, 14 January 2008 by Martin Langhoff (talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

This is a work in progress! We keep notes here of bits of SQL that we know that

  • Work on all supported DBs (this is tricky!)
  • Are scalable and fast

A key strategy for scalability is to ensure we do as much work as possible in the DB with minimal back-and-forth with PHP. Modern DB engines have very smart optimizers and if we play to their strenghts, can process millions of records in milliseconds. To hit those strenghts we want to reduce chattiness with the DB -- one "smart" SQL query (even if complex) is thousand of times better than a foreach loop in PHP that issues a thousand SQL queries. At least it will be more than a thousand times faster ;-)


UPDATE with JOINs

A good scalable way to update a lot of data is to combine the UPDATE with a SUBSELECT, to UPDATE the data directly from the DB itself. The rows never need to go back and forth to PHP, so this is extremely fast. The problem is that it is very hard to get it portable across DBs.

Problems

  • Pg and MySQL disagree on the extended syntax of UPDATE with JOIN
  • MySQL does not support to have the table being updated in a subselect (but it supports it as JOINed table with a different alias - Pg does not support this)

Portable examples

Fix a broken relation - this is roughly equivalent to a subselect with a LEFT OUTER JOIN:

   UPDATE {$CFG->prefix}course 
   SET    category =  $defaultcategory
   WHERE
   NOT EXISTS (SELECT id FROM {$CFG->prefix}course_categories WHERE id={$CFG->prefix}course.category)
   AND mdl_course.id != $SITEID