Note: You are currently viewing documentation for Moodle 3.8. Up-to-date documentation for the latest stable version of Moodle may be available here: MySQL full unicode support.

MySQL full unicode support: Difference between revisions

From MoodleDocs
m (Indicate version required for mysql_collation.php to support utf8mb4 collation options.)
No edit summary
 
(18 intermediate revisions by 10 users not shown)
Line 1: Line 1:
==MySQL full unicode support==
{{Environment}}
==UTF-8==


==UTF-8==
UTF-8 is a character encoding that most websites use. It encodes each of the 1,112,064 valid code points. To store all of this information, four bytes is required. The most popular values are in the three byte region. MySQL by default only uses a three byte encoding and so values in the four byte range (eg. Asian characters and Emojis) can not be stored. Any attempt to enter a text that contains four byte characters will result in a Moodle database error.
 
MySQL does provide full four byte UTF-8 support, but it requires certain database settings to be configured. From version 3.3 on Moodle uses full UTF-8 for both MySQL and MariaDB by default. Existing databases will still run with partial support, but it is recommended to move over to full support.


UTF-8 is a character encoding that most websites use. It encodes each of the 1,112,064 valid code points. To store all of this information, four bytes is required. The most popular values are in the three byte region. MySQL by default only uses a three byte encoding and so values in the four byte range can not be stored. Any record that contains a four byte character will not be saved.
Moodle comes with a Command Line Interface (CLI) script for converting to full UTF-8 for MySQL (and MariaDB). Before Moodle versions 3.1.5 and 3.2.2 this conversion tool would only change the Collation to some variant of 'utf8_bin'. 'utf8_unicode_ci' was the recommended Collation. We now recommend using 'utf8mb4_unicode_ci' which supports four byte characters (utf8_unicode_ci only supports three).


MySQL does support full UTF-8 support. It requires certain settings to be configured. From Moodle 3.3 the default will be to use full UTF-8 for MySQL and MariaDB. Existing databases will still run with partial support, but it is recommended to move over to full support.
This script will attempt to change the database Collation, Character set, default table settings and column definitions.


Moodle does come with a Command Line Interface (CLI) script for converting to full UTF-8 for MySQL (and MariaDB). Before Moodle 3.3 this conversion tool would only change the collation to some variant of 'utf8_bin'. 'utf8_unicode_ci' was the recommended collation. We now recommend 'utf8mb4_unicode_ci'. 'utf8mb4_unicode_ci' supports 4 byte characters (utf8_unicode_ci only supports 3 byte characters) so four byte characters such as Asian characters and emoji should now be fully supported.
To summarise:


This script will attempt to change the database collation, character set, and default table settings.
* Fresh installs of Moodle 3.1.5 and 3.2.2 onwards will use utf8mb4 by default, if the database server is configured appropriately (see below).
* Sites upgrading to Moodle 3.1.5 or 3.2.2 can use the script to update to utf8mb4. In Moodle 3.3, 3.4 and 3.5 a warning will show that the database isn't using full UTF-8 support and suggest moving to 'utf8mb4_unicode_ci', but you may choose to keep using 'utf8_*'.


===File format===
===File format===


To allow for large indexes on columns that are a varchar, a combination of settings needs to be set. The file format for the system needs to be using "Barracuda". This allows for the row format to be set to "Compressed" or "Dynamic". To enable this setting see the upgrade steps listed below.
To allow for large indexes on columns that are a varchar, a combination of settings needs to be set. The file format for the system needs to be using "Barracuda". This allows for the row format to be set to "Compressed" or "Dynamic". To enable this setting see the upgrade steps listed below. Moodle will not install if you have large format enabled without the Barracuda file format.


===File per table===
===File per table===
Line 26: Line 30:
==Steps to upgrade==
==Steps to upgrade==


* Most important. Please backup your database before making any changes or running the CLI script.
Most important: Please backup your database before making any changes or running the CLI script.
 
* Change configuration settings for MySQL (exactly the same for MariaDB). This step is optional. You can run the script and it will try and make these changes itself. If errors occur then try manually changing these settings as listed below.
* Change configuration settings for MySQL (exactly the same for MariaDB). This step is optional. You can run the script and it will try and make these changes itself. If errors occur then try manually changing these settings as listed below.
** On Linux based systems you will want to alter my.cnf. This may be located in '/etc/mysql/'.
** On Linux based systems you will want to alter my.cnf. This may be located in '/etc/mysql/'.
** Make the following alterations to my.cnf:
** Make the following alterations to my.cnf:
Important Change; InnoDB: The following InnoDB file format configuration options were deprecated in MySQL 5.7.7 and are now removed:
    innodb_file_format
    innodb_file_format_check
    innodb_file_format_max
    innodb_large_prefix
<code>
<code>
[client]
[client]
Line 39: Line 56:
innodb_large_prefix
innodb_large_prefix


character-set-client-handshake = FALSE
character-set-server = utf8mb4
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
collation-server = utf8mb4_unicode_ci
skip-character-set-client-handshake


[mysql]
[mysql]
default-character-set = utf8mb4
default-character-set = utf8mb4
</code>
</code>
* Restart your mysql server.
* Restart your MySQL server.
* Run the CLI script to convert to the new character set and collation (requires Moodle 3.1.5, 3.2.2 or newer): '''php admin/cli/mysql_collation.php --collation=utf8mb4_unicode_ci'''
* Run the CLI script to convert to the new Character set and Collation (requires Moodle 3.1.5, 3.2.2 or newer):  
* The upgrade is now complete.
<pre>
$ php admin/cli/mysql_collation.php --collation=utf8mb4_unicode_ci
</pre>
* Make sure to repair and optimize all databases and tables.
<pre>
mysqlcheck -u root -p --auto-repair --optimize --all-databases
</pre>
NOTE: On very large sites this may take a long time to run. You should probably establish how long on a test install before taking your live site offline. In some cases you might consider dumping and re-importing your data.
 
* Adjust the $CFG->dboptions Array in your '''config.php''' to make sure that Moodle uses the right Collation when connecting to the MySQL Server:
<pre>
$CFG->dboptions = array(
  &hellip;
  'dbcollation' => 'utf8mb4_unicode_ci',
  &hellip;
);
</pre>
 
If you only have access to the database command line (or something like phpmyadmin) you can try the following sql commands:
<code>
SET GLOBAL innodb_file_format = barracuda
 
SET GLOBAL innodb_file_per_table = 1
 
SET GLOBAL innodb_large_prefix = 'on'
</code>
 
* Try adding some Emojis (e.g. 😂💩) to your Moodle site to verify that the upgrade was successful.
 
[[Category:Environment|UTF-8]]
[[Category:UTF-8]]
 
 
[[es:MySQL soporte unicode completo]]
[[fr:Support unicode complet pour MySQL]]
[[de:MySQL Unicode Unterstützung]]

Latest revision as of 21:31, 3 May 2020

UTF-8

UTF-8 is a character encoding that most websites use. It encodes each of the 1,112,064 valid code points. To store all of this information, four bytes is required. The most popular values are in the three byte region. MySQL by default only uses a three byte encoding and so values in the four byte range (eg. Asian characters and Emojis) can not be stored. Any attempt to enter a text that contains four byte characters will result in a Moodle database error.

MySQL does provide full four byte UTF-8 support, but it requires certain database settings to be configured. From version 3.3 on Moodle uses full UTF-8 for both MySQL and MariaDB by default. Existing databases will still run with partial support, but it is recommended to move over to full support.

Moodle comes with a Command Line Interface (CLI) script for converting to full UTF-8 for MySQL (and MariaDB). Before Moodle versions 3.1.5 and 3.2.2 this conversion tool would only change the Collation to some variant of 'utf8_bin'. 'utf8_unicode_ci' was the recommended Collation. We now recommend using 'utf8mb4_unicode_ci' which supports four byte characters (utf8_unicode_ci only supports three).

This script will attempt to change the database Collation, Character set, default table settings and column definitions.

To summarise:

  • Fresh installs of Moodle 3.1.5 and 3.2.2 onwards will use utf8mb4 by default, if the database server is configured appropriately (see below).
  • Sites upgrading to Moodle 3.1.5 or 3.2.2 can use the script to update to utf8mb4. In Moodle 3.3, 3.4 and 3.5 a warning will show that the database isn't using full UTF-8 support and suggest moving to 'utf8mb4_unicode_ci', but you may choose to keep using 'utf8_*'.

File format

To allow for large indexes on columns that are a varchar, a combination of settings needs to be set. The file format for the system needs to be using "Barracuda". This allows for the row format to be set to "Compressed" or "Dynamic". To enable this setting see the upgrade steps listed below. Moodle will not install if you have large format enabled without the Barracuda file format.

File per table

To enable this setting see the upgrade steps listed below.

Large prefix

This in conjunction with the row format being either "Compressed" or "Dynamic" allows for large varchar indexes above 191 characters. To enable this setting see the upgrade steps listed below.

Steps to upgrade

Most important: Please backup your database before making any changes or running the CLI script.

  • Change configuration settings for MySQL (exactly the same for MariaDB). This step is optional. You can run the script and it will try and make these changes itself. If errors occur then try manually changing these settings as listed below.
    • On Linux based systems you will want to alter my.cnf. This may be located in '/etc/mysql/'.
    • Make the following alterations to my.cnf:

Important Change; InnoDB: The following InnoDB file format configuration options were deprecated in MySQL 5.7.7 and are now removed:

   innodb_file_format
   innodb_file_format_check
   innodb_file_format_max
   innodb_large_prefix


[client] default-character-set = utf8mb4

[mysqld] innodb_file_format = Barracuda innodb_file_per_table = 1 innodb_large_prefix

character-set-server = utf8mb4 collation-server = utf8mb4_unicode_ci skip-character-set-client-handshake

[mysql] default-character-set = utf8mb4

  • Restart your MySQL server.
  • Run the CLI script to convert to the new Character set and Collation (requires Moodle 3.1.5, 3.2.2 or newer):
$ php admin/cli/mysql_collation.php --collation=utf8mb4_unicode_ci
  • Make sure to repair and optimize all databases and tables.
mysqlcheck -u root -p --auto-repair --optimize --all-databases

NOTE: On very large sites this may take a long time to run. You should probably establish how long on a test install before taking your live site offline. In some cases you might consider dumping and re-importing your data.

  • Adjust the $CFG->dboptions Array in your config.php to make sure that Moodle uses the right Collation when connecting to the MySQL Server:
$CFG->dboptions = array(
  …
  'dbcollation' => 'utf8mb4_unicode_ci',
  …
);

If you only have access to the database command line (or something like phpmyadmin) you can try the following sql commands: SET GLOBAL innodb_file_format = barracuda

SET GLOBAL innodb_file_per_table = 1

SET GLOBAL innodb_large_prefix = 'on'

  • Try adding some Emojis (e.g. 😂💩) to your Moodle site to verify that the upgrade was successful.