Fast portable SQL

Jump to: navigation, search
Warning: This page is no longer in use. The information contained on the page should NOT be seen as relevant or reliable.

This is a work in progress! We keep notes here of bits of SQL that we know that

  • Work on all supported DBs (this is tricky!)
  • Are scalable and fast

A key strategy for scalability is to ensure we do as much work as possible in the DB with minimal back-and-forth with PHP. Modern DB engines have very smart optimizers and if we play to their strengths, can process millions of records in milliseconds. To hit those strengths we want to reduce chattiness with the DB -- one "smart" SQL query (even if complex) is thousand of times better than a foreach loop in PHP that issues a thousand SQL queries. At least it will be more than a thousand times faster ;-)


A good scalable way to update a lot of data is to combine the UPDATE with a SUBSELECT, to UPDATE the data directly from the DB itself. The rows never need to go back and forth to PHP, so this is extremely fast. The problem is that it is very hard to get it portable across DBs.


  • Pg and MySQL disagree on the extended syntax of UPDATE with JOIN
  • MySQL does not support having the table being updated in a subselect (but it supports it as JOINed table with a different alias - Pg does not support this)

Portable examples

Fix a broken relation - this is roughly equivalent to a subselect with a LEFT OUTER JOIN:

   UPDATE {$CFG->prefix}course 
   SET    category =  $defaultcategory
   NOT EXISTS (SELECT id FROM {$CFG->prefix}course_categories WHERE id={$CFG->prefix}course.category)
   AND {$CFG->prefix} != $SITEID

Delete chat_messages older than the (optional) keepdays setting in the related table chat. chat.keepdays the setting can be NULL or 0, both of which mean 'keep forever', so we have to be careful. This is somewhat clunky, but replaces hundreds of queries with 1. Notes...

  • If the subselect returns NULL or 0 it won't satisfy '>0' - so we handle both cases in one check
  • We substract $keepdays - 24 * 3600 ... if we add it we can sometimes exceed the boundaries of an int in PostgreSQL
  • Could be improved to run the subselect only once
    • Is there any DB that won't optimize this?
    • How can we do this portably?
  • The subselect works on MySQL because it does not refer to the table we are deleting from
   $subselect = "SELECT c.keepdays
                 FROM {$CFG->prefix}chat c
                 WHERE = {$CFG->prefix}chat_messages.chatid";
   $sql = "DELETE
           FROM {$CFG->prefix}chat_messages
           WHERE ($subselect) > 0 AND timestamp < ( ".time()." -($subselect) * 24 * 3600)";