XML database schema > XMLDB Problems
Every section on this page will show important problems present in the adoption of the new database layer planned for 1.7. Each section will define the problem, its impact and, when available, its solution. Every section will include one line containing its status ('Open', 'Decided, 'Work in progress', 'Finished', 'Tested' and 'Closed').
Of course, fell free to use the Bug Tracker to follow the daily evolution of each one of these problems by adding one link inside each section pointing to the proper bug.
Prior to change the status from 'Decided' to 'Work in progress' the corresponding page of the roadmap must be modified, including all the actions/steps needed to solve the problem. Obviously, at the end, the expected status for each section is 'Closed'.
- 1 Regular expressions
- 2 SQL Limit performance
- 3 Naming conventions
- 4 Naming conventions II
- 5 NOT NULL fields using a DEFAULT '' clause
- 6 Indexes on TEXT columns
- 7 ORDER BY text columns
- 8 MSSQL, PHP, UTF-8 and UCS-2
- 9 Reserved Words in Moodle DB
- 10 ADOdb fetch mode (ASSOC, NUM and BOTH)
- 11 Oracle, PHP 5.1.x and LOBs
- 12 Case-insensitive searches
- 13 Table and column aliases - the AS keyword
- 14 See also
(Status: Open - Severity: Critical - Bug: not defined)
Problems with Oracle (not available until 10g) and MSSQL (not available). Oracle 10g implements them using directly and one package existed since ages (Oracle 8i?) to handle them (owa_pattern). MSSQL can execute them by installing some stored procedures. Oh, oh, problems with some (a few) queries...
SQL Limit performance
(Status: Open - Severity: Not critical - Bug: not defined)
Analyse the impact of such SelectLimit() calls under SQL*Server, Oracle... because it's emulated by ADOdb on those DBs, because their lack of support for the LIMIT clause. As the offset parameter grows, ADOdb must iterate over more records to get the desired window and it could be a problem under long sets of records!
(Status: Open - Severity: Critical - Bug: not defined)
What to do with all the previously created DB objects (indexes, unique indexes, sequences...) if their naming schema doesn't fit with the implemented by ADOdb. Drop/recreate everything? Leave it unmodified? Can we use different naming schemes for each DB. PostgreSQL one seems more formal, is it the way for Oracle and SQL*Server. Also I would propose to add some option to the Health Center in order to be able to drop all the indexes and regenerate them following the exact naming rules specified in the XMLDB naming page.
Naming conventions II
(Status: Closed - Severity: Critical - Bug: 6230)
Oracle imposes one limit of 30cc. for naming DB objects (table, field, index...), and currently we have some objects over such limit: mdl_glossary_entries_categories and mdl_question_dataset_definitions. They must be renamed to anything else ASAP. No alternate solution exists. Also, once renamed we should restrict the maximum length of $CFG->prefix to avoid undesired situations, at least for new installations to, say, 5cc, limiting table names to 25cc (without the prefix). PostgreSQL before 7.4 limit seems quite near too (32 cc.).
Solution: Limit table/field/index/key/sequence names to 30cc and deny the use of prefixes for Oracle installations. See key and index naming to get info about how XMLDB constructors will be naming all the remaining objects in DB.
NOT NULL fields using a DEFAULT '' clause
(Status: Open - Severity: Critical - Bug: 6218 )
Under Moodle 1.6 there are tons of char/varchar/text/blob columns that are defined as NOT NULL and contains one DEFAULT clause with the value '' (empty string).
There are two important problems with this type of definitions:
- They haven't to much sense from a design perspective. If one field is defined as NOT NULL it shouldn't contain empty values (yep, agree that NULL != everything, included empty, but in a logical world I cannot find any reason to fill fields with empty values).
- Oracle considers empty strings as NULL values the DEFAULT application will crash in that RDBMS. (http://www.techonthenet.com/oracle/questions/empty_null.php)
The proper way to solve this situation should be to annihilate all those inconsistent/wrong combinations from the whole DB schema (NOT NULL + DEFAULT ''), but it can be really hard and risky to do it now (Moodle 1.6-1.7). So here it's the proposed solution:
- Don't modify 1.6 at all. Everything should continue working properly.
- In the XMLDB files, for 1.7, use the correct approach, i.e. all those fields won't have a default value in the XML structure, although they'll continue as NOT NULL there.
- In the XMLDB generator (create table), for MySQL, PostgreSQL and MSSQL, add the DEFAULT clause for all the char/text columns that haven't a DEFAULT defined in the XML structure.
- Add some check in the installation script to look for that MySQL modes being problematic.
With this approach we'll have MySQL and PostgreSQL servers running exactly without modification in 1.7, with MSSQL working in the same manner, and Oracle version would be more correct, from the DB perspective, although sure it won't work in a lot of places. Also, if the Oracle isn't usable at all, we can trick it a bit, both declaring all those fields directly as NULL (+1 for it) or adding one dummy default value like one whitespace. They aren't perfect solutions but should improve a bit the Oracle experience, although, under 1.7, it's only going to be ALPHA-BETA-quality (too much changes are required across all the code to solve the NULL/NOT NULL/DEFAULT problem, and time for 1.7 is really short).
After 1.7, we should start to modify MySQL, PostgreSQL and MSSQL implementation to use the same (correct) field definition (nullable columns) with more time to change/test/fix everything, progressively moving NOT NULL stuff to NULL if it's really nullable, dropping all those empty defaults.
Updated: For 1.7 we have started the migration of TEXT fields that, being logic, can contain no data, from NOT NULL to NULL, while we'll maintain VARCHAR fields as explained above. This will give us a real idea of the impact of such change across Moodle code while will produce one better (real) DB Schema. After 1.7, all the rest of VARCHAR fields should follow the same process, being transformed from NOT NULL to NULL if the application logic says that they can have no contents. Then, all those empty () defaults will be out. Definitively!
Here you'll find a list of TEXT fields transformed from NOT NULL to NULL:
Indexes on TEXT columns
(Status: Closed - Severity: Not Critical - Bug: not defined)
Each RDBMS implementation has completely different mechanisms in order to create FULLTEXT indexes against TEXT columns. Also, the SQL commands to perform searches against such columns are pretty different (incompatible). This forces us to avoid TEXT column indexing for now. Here there are the list of fields we have changed in the transition to the XMLDB schema to avoid such indexing:
Also, it's important to differentiate between FULLTEXT indexes (where we are going to index and search freely any text content) and FUNCTION indexes where we just index the results of the execution of one function. This last indexes aren't a suitable replacement for TEXT indexes, and aren't used at all if the EXACT" function call used to define the index isn't used by the SQL query.
In the long term, the XMLDB schema should be extended to support this type of indexes to perform better fulltext searches. At the time of writing this, some efforts are being performed by the Global search project, part of the Student projects.
Finally, note that a lot of RDBMS flavours aren't able to keep all those FULLTEXT indexes updated online all they need to be synchronised/optimised continuously by "external" agents (stored procedures, command line executables, cron tasks...).
Updated: all the indexes currently existing under Moodle were present in the HotPot module. After some interesting emails with the author (Gordon Bateson), he is going to implement a workarround in the module in order to save the problem. It includes creating some new columns (of type VARCHAR, index-able) with some md5() hash from the original text. With this approach the module will be able to do comparisons of hashes using indexed fields (the hashes themselves). Note that this solution isn't suitable to perform searches in an optimised way, for them, FULLTEXT indexes are needed as explained above.
Some interesting links about FULLTEXT indexing/search/maintain:
- http://mysql.com/doc/refman/5.0/en/fulltext-search.html: Fulltext for MySQL
- http://www.devx.com/opensource/Article/21674/0/page/1 : Tsearch for PostgreSQL
- http://www.oracle.com/technology/products/text/htdocs/ctxcat_primer.html : Oracle CTXCAT indexes (simpler) and http://www.oracle-base.com/articles/9i/FullTextIndexingUsingOracleText9i.php : Oracle CONTEXT indexes (the complex alternative).
- http://www.databasejournal.com/features/mssql/article.php/1438211: Fulltext indexes for MSSQL.
ORDER BY text columns
(Status: Closed - Severity: Not Critical - Bug: not defined)
Not all RDBMS support SELECT sentences using the ORDER BY clause with any TEXT fields. To solve this problem this strategy will be used:
- Avoid ordering your queries by TEXT columns.
- If the previous one isn't possible and you need to order by any text column, use the sql_order_by_text($columname). They will create the needed SQL code to be able to run your queries and to get ordered results.
MSSQL, PHP, UTF-8 and UCS-2
(Status: Finished - Severity: Critical - Bug: not defined)
Practically all the modern RDBMS support more than one encoding to be used, both to store information in such encodings or to handle data between clients and servers with automatic conversions. More properly, all them support natively the UTF-8 encoding that is currently used by practically all the web applications, mobile widgets and all sort of devices.
But there is one notable exception, SQL*Server. It exclusively supports the UCS-2 encoding (another Unicode flavour) to be used against their special nchar columns. Worse is the fact that Microsoft doesn't offer any reliable solution for 3rd part clients inside their plethora of connectivity solutions (ODBC, OLE, COM...) to transparently perform the conversion from/to UCS-2 and the desired encoding (UTF-8 for Moodle).
You can see the solutions suggested by Microsoft in: http://support.microsoft.com/kb/232580/EN-US/ :-P
Solutions provided in that article, although could help to some people, aren't ok for Moodle. We need to use PHP and be able to send and receive UTF-8 data from/to the server, storing it properly in the server side (in UCS-2) to be able to handle it properly.
After a lot of time googling the net it seems that the basis of the problem is the called DBLIB library (the dll file has one unpronounceable name). This library is used by practically all the DB clients to build their interfaces. PHP and others rely on this library to build their own modules of access to SQL*Server databases. And this base library lacks UTF-8 support (or at least it doesn't offer conversion from/to UTF-8).
With this on mind, a research of alternative solutions began. Any solution should fulfil these characteristics:
- Convert transparently from UTF-8 (PHP and Moodle) to UCS-2 (SQL*Server) so everything was properly stored on each side.
- Obviously, be supported by PHP 4.3.x upwards (min req. for Moodle).
- Be supported by ADOdb, our Database Abstraction Library for PHP.
- Be available both for Unix and Windows environments.
- If possible, be able to replace the standard php_mssql extension that doesn't support the automatic conversion. This would be great because mssql support from ADOdb is pretty good (better that other alternatives like COM, OLE...).
And finally, they exist! We've found two products (OpenSource, of course) that fulfil all the requirements detailed above. Both perform the automatic conversion between UTF-8 and UCS-2, replace the standard php_mssql library so we'll continue using the well supported mssql ADOdb driver and are cross-platform. Anything more, nah, here they are:
- FreeDTS: This a 100% client product that it's based in the [:wiki:Tabular_Data_Stream Tabular Data Protocol] supported both SQL*Server and Sybase. We have compiled and used it both under Linux and MacOS X without problems and with one minimal configuration it replaces the standard php_mssql extension smoothly. The documentation says it works also under Win32 systems but we haven't tried it there. Tons of reports in Internet talk about it as a reliable solution.
- ODBTP: This is another protocol, Open Database Transport Protocol, that allows to connect virtually to any Win32 DB server. It requires both one client installation and one service to be installed in one Win32 machine (to act as bridge between the client and the DB, via ODBC). The client part is available for Linux and MacOS X (not tested) and also implements one replacement for the standard php_mssql extension. The service part is obviously, available only for Win32. After a minimal configuration it works perfectly and it's announced as reliable for critical missions.
(perhaps some HOWTO about installation could help here, plus some direct download links, although it isn't really difficult to configure/compile/install them)
- Some more explanations: http://forums.devshed.com/ms-sql-development-95/odbtp-good-solution-for-connecting-to-mssql-from-linux-unix-127004.html
Reserved Words in Moodle DB
(Status: Work in Progress - Severity: Critical - Bug: [http://tracker.moodle.org/browse/MDL-6307 MDL-6307])
Some of the DB names used in Moodle 1.6 are reserverd words (words that are forbidden to be used as identifiers for DB objects). The list of currently detected Reserved Words is:
- assignment_submissions->comment (oracle)
- forum->open (mssql) Done! (this field is out in 1.7)
- glossary_comments->comment (oracle) Done! (glossary_comments->entrycomment)
- journal_entries->comment (oracle)
- question_dataset_items->number (oracle)
- question_sessions->comment (oracle) Done! (question_sessions->manualcomment)
- resource (oci8po)
- timezone->rule (mssql) Done! (timezone->tzrule)
- user (mssql, oracle, postgres)
(the format of the list is table->field (reserved on DB)
Note that reserved words aren't dangerous if we are using prefixes for tables (so we should force prefix usage under Moodle 1.7 for NEW installations, 1-2cc for oracle and 1-10cc for the rest).
- See XMLDB reserved words for more details and info.
- See [http://tracker.moodle.org/browse/MDL-6307 MDL-6307] about developers discussion and implementation details.
ADOdb fetch mode (ASSOC, NUM and BOTH)
(Status: Work in Progress - Severity: Critical - Bug: not defined)
By default we use ADOdb under FETCH_BOTH mode. This returns mixed recordsets with both numeric keys (0, 1, 2...) and literal keys (the name of the fields). And then, we get an associative array from such recordset.(for all the get_records_XXX functions). Let's see one example:
If we have one table, call it "people" with three columns (id, name, address) and with this data:
666, 'Bill', 'Medina 777, 'Steve', 'Palo Alto'
So, one simple query like this:
SELECT id, name, address FROM people;
Will return one recordset with this structure in PHP:
array (  => stdClass Object (  = 666 [id] = 666  = Bill [name] = Bill  = Medina [address] = Medina}  => stdClass Object (  = 777 [id] = 777  = Steve [name] = Steve  = Palo Alto [address] = Palo Alto) )
And, when we get the associative array it will be transformed to:
array (  => stdClass Object ( [id] = 666  = Bill [name] = Bill  = Medina [address] = Medina}  => stdClass Object ( [id] = 777  = Steve [name] = Steve  = Palo Alto [address] = Palo Alto) )
i.e. the first attribute of the original recordset is used to build the key of the final array, disappearing from the array itself. In our example above, the first column in the recordset was the index .Their values (666, 777) have been used as key in the associative array and it has disappeared from it.
We have been using this mechanism to work with associative arrays inside Moodle, where both the key and the [id] element were pointing to the same value, being possible to handle both fields. But this trick has two major drawbacks:
- All the data is transmitted and stored twice. As you see in the original recordset, the data of all the fields is duplicated, while we aren't using the , ... indexes at all.
- Although the order until now, both for MySQL and PostgreSQL always retrieve the numeric indexes first, and then the named indexes, some drivers invert this order, returning the named indexes first and then the numerical ones. And this have highly negative consequences in the the final structures that are returned by the get_records_XXX() functions, losing completely the [id] key, that is used across all Moodle.
So, this solution is proposed:
- Done!: In order to solve (2) ASAP, we'll duplicate the first field of each record in the recordset before calling to GetAssoc(). With this we guarantee that nothing is lost and the field becoming key in the associative array will continue in the record.
- Done!: Also, the ODBTP driver has been "fixed" to return fields in the correct order. Thanks to Bob for his support.
- After 1.7, we should change our approach from FETCH_BOTH to FETCH_ASSOC to solve (1), retrieving/handling exactly half the data. All the places using the numerical keys will be updated (although they should be practically non-existent). The hack in the previous point will allow us to work with only named fields while our dear first field (id) well be maintained after building the associative array.
- UPDATE: This has been implemented (but not committed) by Eloy and Luke and is working well. Additionally, the patch is small and only touches dmlib.php. We will implement the changes in CVS HEAD (for 1.7) and it will be tested thoroughly -- as part of the whole set of changes. If it proves to be a problem, it is trivial to revert.
Oracle, PHP 5.1.x and LOBs
(Status: Closed - Severity: Critical - Bug: not defined)
Running under Oracle, it seems that it's impossible to handle fields having CLOB/BLOB data. While it's possible to send data to LOB columns, it cannot be retrieved from them and the PHP script ends with a timeout.
Process: After spending a lot of time tracing the problem under ADOdb internals, it seemed to be produced by the OCI function: ocifetchinto() when used with the OCI_RETURN_LOBS setting in order to retrieve contents of LOB columns in a single pass. We've tried some alternatives like ociloadlob(), OCI-Lob->read() and OCI-Lob->load() but all them were crashing. Finally we found one bug documented in PHP. So the solution seems to avoid PHP <= 5.1.4 completely if Oracle is going to be used. Just trying it now.
Bad news, 5.1.6 doesn't seems to solve the problem. :-( Testing with 4.3.11 now. Also we've filled this PHP Bug trying to get a solution for PHP 5.1.x).
More news: Oracle LOBs seem to be working back with PHP 5.2.x (release candidate at the time of writing this). I hope they'll back-port it to 5.1.x series too. Stay tuned on http://bugs.php.net/bug.php?id=38612
Solution: To use PHP 4.3.x, 4.4.x or PHP 5.2 standard distributions or to build PHP 5.1.x with oci8-1.2.2 or later (download it from http://pecl.php.net/package/oci8 or install it dynamically with PEAR/PECL).
(Status: Open - Severity: NotCritical - Bug: not defined)
There are important differences about how each RDBMS performs searches, both using the "=" and the "LIKE" operators. Both MySQL and MSSQL performs case-insensitive searches, while PostgreSQL and Oracle perform case-sensitive searches.
For PostgreSQL, the "ILIKE" operator can be used and it's being used since ages in Moodle. For Oracle, the main problem is that a reliable solution isn't available before Oracle 10gR2. If we stabilish such version as minimum requirement for Moodle, we could use both the regexp_like() function with the i (case-insensitive) switch or the NLS_COMP=LINGUISTIC setting that produces makes searches case-insensitive.
So, the solution, is to raise min reqs. for Oracle up to 10gR2 and use one of the solutions above (2nd one being simpler and more transparent).
Table and column aliases - the AS keyword
(Status: Open - Severity: Critical - Bug: not defined)
Every DB use it's own approach about using table and columns aliases using the AS keyword. This is the summary:
- MySQL: supports AS in tables, supports AS in fields.
- PostgreSQL: supports AS in tables, requires AS in fields.
- Oracle: forbids AS in tables, supports AS in fields.
- MSSQL: supports AS in tables, supports AS in fields.
So, with this information, the only cross-db strategy is:
- Don't use the AS keyword for all the table aliases.
- Use' the AS keyword for all the column aliases.
Other solutions could be to use some constants to apply the keyword as necessary, but it would make a lot of SQL statements to look awful. Until we don't get another RDBMS breaking the previous strategy, it will be applied.
Also, a big search and destroy in required in order to fine-applying the above premises across Moodle code. They shouldn't be really too many wrong uses.
Eloy, should you be editing the Coding page as you find things like this. Maybe add a new section for rules for writing custom SQL where necessary, and change General rule number 8.
I would suggest that make a new policy: when a module needs custom SQL statements, they should be in funcitons in a datalib.php file within that module folder.Tim Hunt 11:17, 11 September 2006 (CDT)