Note: You are currently viewing documentation for Moodle 2.4. Up-to-date documentation for the latest stable version of Moodle may be available here: Converting your MySQL database to UTF8.

Converting your MySQL database to UTF8: Difference between revisions

From MoodleDocs
(Steps for Windows)
 
(12 intermediate revisions by 4 users not shown)
Line 1: Line 1:
{{Installing Moodle}}
This document describes how to convert your MySQL database from the latin1 charset to UTF8. Moodle requires that your Database is now UTF8 and will not upgrade if your database is not.
This document describes how to convert your MySQL database from the latin1 charset to UTF8. Moodle requires that your Database is now UTF8 and will not upgrade if your database is not.


Line 20: Line 21:


To make mysql default to utf8 you can edit /etc/my.cnf as follows.
To make mysql default to utf8 you can edit /etc/my.cnf as follows.


(This was good for ubuntu server lucid 10.04  2.6.32-24-server Jan 2011)
(This was good for ubuntu server lucid 10.04  2.6.32-24-server Jan 2011)
Line 41: Line 43:


default-character-set=utf8
default-character-set=utf8
default-collation=utf8_unicode_ci


character-set-server=utf8
character-set-server=utf8
Line 51: Line 55:
Having made your default character set utf, a mysqldump restore of your database with the  
Having made your default character set utf, a mysqldump restore of your database with the  
--skip-character-set parameter, will restore the database with your new default character set of utf8.
--skip-character-set parameter, will restore the database with your new default character set of utf8.
utf8_unicode and utf8_general are not the same but similar. The difference is described in the unicode documentation page. (Link below)


==Converting an empty database==
==Converting an empty database==
Line 81: Line 87:


====Explained====
====Explained====
The following steps will guide you in creating a database dumb, editing the database dump so that the correct charset and collation are used and then restoring the new database.
The following steps will guide you in creating a database dump, editing the database dump so that the correct charset and collation are used and then restoring the new database.


To start please open a new terminal and move to a temp directory.
To start please open a new terminal and move to a temp directory.
Line 103: Line 109:


When you run this command a database dump will be generated into '''dump.sql'''
When you run this command a database dump will be generated into '''dump.sql'''
Next step is to copy dump.sql to dump-fixed.sql achieved by


<code bash>
<code bash>
cp dump.sql dump-fixed.sql
cp dump.sql dump-fixed.sql
</code>
</code>
Next step is to copy dump.sql to dump-fixed.sql.
 


We will make the desired changes within dump-fixed.sql and we will keep dump.sql as it is as a backup just in case.
We will make the desired changes within dump-fixed.sql and we will keep dump.sql as it is as a backup just in case.
Line 133: Line 141:


===Windows===
===Windows===
Could someone who is familiar with Windows please convert the above into something that will work on Windows?
The following steps will let you create a database dump, edit it so that the correct charset (utf8) and collation (utf8_unicode_ci) are used, and then restore the new database. For this, you'll first have to download [http://sed.sourceforge.net/grabbag/ssed/sed-3.62.zip Super Sed (Win32 executable, zipped)].
There are likely several additional arguments for mysqldump you will need to specify including setting the file for output using -r to avoid newline issues.
 
Start by opening a command window and move to a temporary folder.
 
[your mysql dir]\bin\mysqldump -u [username] -p[password] -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B [dbname] -r dbdump.sql
ssed -e "s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci/" dbdump.sql | ssed -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/" >dbdump_w.sql
[your mysql dir]\bin\mysql -u [username] -p[password] [dbname] < dbdump_w.sql
 
====mysqldump parameters====
 
; username : Username used to access your database.
; password : Password for the above user.
; -c (--complete-insert) : Creates INSERT statements that include column names.
; -e (--extended-insert) : Uses a multiple-row INSERT syntax that includes several VALUES lists; this results in a smaller dump file and speeds up inserts when the file is reloaded.
; --default-character-set=utf8 : Use and set '''utf8''' as the default character set.
; --single-transaction : This option is useful only when using transactional tables such as InnoDB, as it helps minimize the locking time duration.
; --skip-set-charset : Suppress the SET NAMES statement (not needed because it is going to be changed).
; --add-drop-database : Adds a DROP DATABASE statement before each CREATE DATABASE statement (to restore over the top of our existing database).
; --add-drop-table : Adds a DROP TABLE statement before each CREATE TABLE statement.
; -B (--databases) : Defines the name of the database to convert.
; -r (--result-file) : This option should be used on Windows to prevent newline “\n” characters from being converted to “\r\n” carriage return/newline sequences. The result file is created and its previous contents overwritten, even if an error occurs while generating the dump.
 
===Special chars won't import correctly===
Under certain circumstances, when restoring UTF8 encoded mysql dump, international special chars (you are able to correctly see using, say, vim editor), does not appears to be imported correctly. In such cases you may want to try the following under a *nix system: 1.login into mysql, 2.create a db with utf8 encoding and 3. import your dump using 'source':
<code bash>
# cd /folder_where_your_dump_is/
# mysql -u your_user -p
> create database yourdb charset=utf8;
> use yourdb;
> SET NAMES 'utf8';
> source db_dump.sql
> quit;
</code>
 
That should do the trick.


==More information==
==More information==

Latest revision as of 09:53, 2 March 2012

This document describes how to convert your MySQL database from the latin1 charset to UTF8. Moodle requires that your Database is now UTF8 and will not upgrade if your database is not.

For more information about UTF8 have a look at the doc on unicode.

Why?

You may see the following error when upgrading your Moodle.

It is required that you store all your data in Unicode format (UTF-8). New installations must be performed into databases that have their default character set as Unicode. If you are upgrading, you should perform the UTF-8 migration process (see the Admin page).

Default Mysql character set

Moodle requires UTF8 in order to provide better multilingual support and has done since Moodle 1.8. However the UTF8 check during install and upgrade was only been implemented in Moodle 2.0 and you may find you are unable to upgrade because your database was not set up originally as utf8 when you first installed Mysql or because you have been running Moodle since before 1.8 and haven't previously converted your database. It is perhaps worth noting that Mysql is nothing to do specifically with Moodle. It is a database engine that is very widely used in open source projects and it contains details of all the stuff in your Moodle such as usernames etc and pointers to all the files that have been uploaded to it.

You need to do two things. 1) Change your mysql to have utf8 as its character set and 2) Change your database to utf8.

The descriptions elsewhere in this section cover making the utf8 database versions using mysqldump.

To make mysql default to utf8 you can edit /etc/my.cnf as follows.


(This was good for ubuntu server lucid 10.04 2.6.32-24-server Jan 2011)

In the client section of my.cnf

[client] ... .... ...

default-character-set=utf8 ....


and further down in my.cnf

[mysqld] ... ...

default-character-set=utf8

default-collation=utf8_unicode_ci

character-set-server=utf8

collation-server=utf8_unicode_ci

... ...

Having made your default character set utf, a mysqldump restore of your database with the --skip-character-set parameter, will restore the database with your new default character set of utf8.

utf8_unicode and utf8_general are not the same but similar. The difference is described in the unicode documentation page. (Link below)

Converting an empty database

If you have created your database schema and are receiving the error during your initial installation your Moodle database will still be empty. You can simply run the below query in your database to resolve the issue. alter database mydatabasename charset=utf8;

Converting a database containing tables

If you have previously installed Moodle and are now getting the error the following process will allow you to convert your database.

Linux & Mac

mysqldump -uusername -ppassword -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B dbname > dump.sql cp dump.sql dump-fixed.sql vim dump-fixed.sql

%s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci/
%s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/
wq

mysql -uusername -ppassword < dump-fixed.sql

or alternatively using sed:

  1. $1-dbusername $2-password $3-dbname

mysqldump -u$1 -p$2 -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B $3 > dump.sql sed 's/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci/' <dump.sql | sed 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/' >dump-fixed.sql mysql -u$1 -p$2 < dump-fixed.sql

Explained

The following steps will guide you in creating a database dump, editing the database dump so that the correct charset and collation are used and then restoring the new database.

To start please open a new terminal and move to a temp directory.

mysqldump -uusername -ppassword -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B dbname > dump.sql

The first step is of course to dump out the database and of course we will use mysqldump for this. We do however need to set several arguments in order to clean up the charsets and provide a dump that is not going to cause you any problems if you are moving this database to a different database server or find yourself having to restore on a reverted system.

username
The username to access your database.
password
The password for the above user.
-c
Complete inserts for better compatibility.
-e
Extended inserts for better performance.
--default-character-set=utf8
To set the default character set.
--single-transaction
To reduce our workload if anything goes wrong.
--skip-set-charset
Obviously not wanted or needed as we are changing it anyway.
--add-drop-database
Required so we can restore over the top of our existing database.
-B
We use this option so that our dump will contain drop table and create table syntax (which we will change the syntax for).
dbname
The name of the database to convert.

When you run this command a database dump will be generated into dump.sql

Next step is to copy dump.sql to dump-fixed.sql achieved by

cp dump.sql dump-fixed.sql


We will make the desired changes within dump-fixed.sql and we will keep dump.sql as it is as a backup just in case.

vim dump-fixed.sql

%s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci/
%s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/
wq

Now we need to edit the dump and correct the incorrect charsets that have been used. I have chosen to do this with VIM however you can use any search+replace editor or program. ( I choose VIM for this only because every linux user is/should be familiar with it).

First we open the file using VIM, and then run the three commands.

The first command replaces all instances of DEFAULT CHARACTER SET latin1 with DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci. This is used to fix up the database's default charset and collation.

The second command replaces all instances of DEFAULT CHARSET=latin1 with DEFAULT CHARSET=utf8. This converts all tables from using latin1 to using UTF8.

The third command simply saves it and exits.

mysql -uusername -ppassword < dump-fixed.sql Now that we've made the required changes we simply need to restore the database over top of the existing database. We can do this by running the above command.

Windows

The following steps will let you create a database dump, edit it so that the correct charset (utf8) and collation (utf8_unicode_ci) are used, and then restore the new database. For this, you'll first have to download Super Sed (Win32 executable, zipped).

Start by opening a command window and move to a temporary folder.

[your mysql dir]\bin\mysqldump -u [username] -p[password] -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B [dbname] -r dbdump.sql
ssed -e "s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci/" dbdump.sql | ssed -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/" >dbdump_w.sql
[your mysql dir]\bin\mysql -u [username] -p[password] [dbname] < dbdump_w.sql

mysqldump parameters

username
Username used to access your database.
password
Password for the above user.
-c (--complete-insert)
Creates INSERT statements that include column names.
-e (--extended-insert)
Uses a multiple-row INSERT syntax that includes several VALUES lists; this results in a smaller dump file and speeds up inserts when the file is reloaded.
--default-character-set=utf8
Use and set utf8 as the default character set.
--single-transaction
This option is useful only when using transactional tables such as InnoDB, as it helps minimize the locking time duration.
--skip-set-charset
Suppress the SET NAMES statement (not needed because it is going to be changed).
--add-drop-database
Adds a DROP DATABASE statement before each CREATE DATABASE statement (to restore over the top of our existing database).
--add-drop-table
Adds a DROP TABLE statement before each CREATE TABLE statement.
-B (--databases)
Defines the name of the database to convert.
-r (--result-file)
This option should be used on Windows to prevent newline “\n” characters from being converted to “\r\n” carriage return/newline sequences. The result file is created and its previous contents overwritten, even if an error occurs while generating the dump.

Special chars won't import correctly

Under certain circumstances, when restoring UTF8 encoded mysql dump, international special chars (you are able to correctly see using, say, vim editor), does not appears to be imported correctly. In such cases you may want to try the following under a *nix system: 1.login into mysql, 2.create a db with utf8 encoding and 3. import your dump using 'source':

  1. cd /folder_where_your_dump_is/
  2. mysql -u your_user -p

> create database yourdb charset=utf8; > use yourdb; > SET NAMES 'utf8'; > source db_dump.sql > quit;

That should do the trick.

More information