UTF-8 PostgreSQL

Jump to: navigation, search

Note: You are currently viewing documentation for Moodle 2.0. Up-to-date documentation for the latest stable version is available here: UTF-8 PostgreSQL.

UTF-8 migration > Database conversion > PostgreSQL notes

This page is under construction!!

PostgreSQL UTF8 Migration How-to

This is a guide on how to migrate your Moodle installation PostgeSQL database to UTF8.

All the data in the existing database must be translated to valid UTF8 characters. As PostgreSQL does not allow you to change the database encoding of an existing database, you need to create a new database with encoding UNICODE (PostgreSQL 7.x) or UTF8 (PostgreSQL 8.x). If you are already using a database with UNICODE/UTF8 encoding, the migration will process the data and save back to your existing database.

This process requires a *nix operating system, like any linux distribution or MacOSX. It will not work on a Windows version of PostgreSQL as Windows does not have the command line tools "grep" and "iconv".

Migration Procedure

  1. Create a new database with encoding UNICODE or UTF8 if you are not using one.
  2. Go to the admin panel and click on the migration link.
  3. The script will check for the settings required to commence the migration.
  4. If your current database is not in UNICODE/UTF8, the script will require you to provide the connection parameters of the UTF8 database.
    NOTE: You might notice that there is an extra option called "PostgreSQL Cluster". This is intended to pass the "--cluster" parameter to the PostgreSQL command line tools while copying the database. If you do not have more than one PostgreSQL server installed on the system or you have no idea about it, just leave it empty.
  5. Start the process following the instructions.
    NOTE: If you happen to get any errors in the data copying process, you might want to do that manually. To do it manually please follow the instructions below this section.
  6. After the process finishes, modify your config.php to point to the new UTF8 database.

Manually Copy your current database to the new UNICODE database

  1. Make a plain text dump of your current db.
    Run command:
    PGCLIENTENCODING='UNICODE' PGDATABASE={dbname} pg_dump -Fp -O -x -U {dbuser} > tempfile1
    Replace {dbname} and {dbuser} with your own values for your old non-unicode database. Supply --cluster, -h, -p to pg_dump if necessary.
  2. Strip out "COMMENT ON SCHEMA" statement, which would cause an error while importing with a non-root user.
    Run command:
    grep -v "COMMENT ON SCHEMA" < tempfile1 > tempfile2
  3. Strip out any non-unicode characters with iconv
    Run command:
    iconv -f UTF-8 -t UTF-8 -c < tempfile2 > tempfile3
  4. Import the data to your new unicode database
    Run command:
    PGDATABASE={dbname} psql -q -U {dbuser} -v ON_ERROR_STOP=1 < tempfile3
    Replace {dbname} and {dbuser} with your own values for your new unicode database. Supply --cluster, -h, -p to psql if necessary. The ON_ERROR_STOP=1 parameter will make psql stop immediately upon any errors.

Now you have your unicode database with all your old data. Run the migration script again to finish the migration. NOTE: Please do not think the UTF8 migration is done at this stage. The migration script in Moodle should be executed to finish the process. Also please do not change your config.php to the new database until Moodle tells you to. The script is clever enough not to copy the data again if you have already done so.

Extra notes about errors that could happen during the copy process

If you get errors during the import process above about your current user not having permission to do something, this section is here to help.

It is reported that if you have installed some procedure language on your postgres server, the import process will fail if you are not using the superuser account. This is because pg_dump dumps everything in your previous database including any language definitions. If you have enabled language support, your tempfile1 file will contain this info, so the database you created will probably contain it, too. To restore language definitions you need the superuser account.

To avoid this error, open the tempfile3 file in step 3 above using any text editor (note the file could be a few to several hundred MBs). From the top of the file, remove SQL statements like CREATE PROCEDURAL LANGUAGE, CREATE FUNCTION, ALTER FUNCTION, COMMENT ON SCHEMA. Do not remove any SET statements unless they become blockers. Search through the entire file for them. They usually exist at the top of the file before any CREATE TABLE statements.

After cleaning up the file, continue on to step 4 as described above. Should any errors occur again, hunt for them in tempfile3.