Note:

If you want to create a new page for developers, you should create it on the Moodle Developer Resource site.

Reference: Difference between revisions

From MoodleDocs
(Note about intent to not migrate this page to moodledev.io)
 
(10 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{Template:WillNotMigrate}}
==This is the start of a few pages of largely version independent reference==
==This is the start of a few pages of largely version independent reference==


Line 15: Line 16:
=== Byte, character, character set, character encoding and collation ===
=== Byte, character, character set, character encoding and collation ===


A ''byte'' (or ''octet'') is the smallest unit of storage that can be allocated and, in almost all modern computers, it consists of 8 bits, where a bit can only have two values: 0 or 1.
A <span style="color:#800020;">byte</span> (or <span style="color:#800020;">octet</span>) is the smallest unit of storage that can be allocated and, in almost all modern computers, it consists of 8 bits, where a bit can only have two values: 0 or 1.


A ''character'' is a symbol used in a writing system, such as a letter of the alphabet.
A <span style="color:#800020;">character</span> is a symbol used in a writing system, such as a letter of the alphabet.


A ''character set'', or a ''coded character set'', refers to the set of characters and numbers, or ''code points'', used to represent them. A code point can be understood as the position the character occupies in the set.
A <span style="color:#800020;">character set</span>, or a <span style="color:#800020;">coded character set</span>, refers to the set of characters and numbers, or <span style="color:#800020;">code points</span>, used to represent them. A code point can be understood as the position the character occupies in the set.


A ''character encoding'', or a ''character encoding form'', refers to how the code points are converted (encoded) into ''code values'' to be saved or stored in a computer.
A <span style="color:#800020;">character encoding</span>, or a <span style="color:#800020;">character encoding form</span>, refers to how the code points are converted (encoded) into <span style="color:#800020;">code values</span> to be saved or stored in a computer.


''Unicode'' is a character set developed to consistently encode, represent and handle text in most of the world's writing systems in use today. Unicode can be implemented by different character encodings, where the two most commonly used are UTF-8 and UTF-16:
<span style="color:#800020;">Unicode</span> is a character set developed to consistently encode, represent and handle text in most of the world's writing systems in use today. Unicode can be implemented by different character encodings, where the two most commonly used are UTF-8 and UTF-16:


* '''UTF-8''' is a variable-width character encoding that can represent every character in the Unicode character set. UTF-8 encodes each of the 1,112,064 code points in the Unicode character set using one to four bytes.
* '''UTF-8''' is a variable-width character encoding that can represent every character in the Unicode character set. UTF-8 encodes each of the 1,112,064 code points in the Unicode character set using one to four bytes.
:'''Note'''. MySQL utf8 Unicode character set uses a maximum of three bytes per multi-byte character (http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8.html).
:* '''Note'''. MySQL utf8 Unicode character set uses a maximum of three bytes per multi-byte character (http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8.html).
* '''UTF-16''' is a variable-length character encoding for Unicode that uses one or two 16-bit code units, where each unit takes two 8-bit bytes, and the order of the bytes may depend on the byte order (''endianness'') of the computer architecture. To assist in recognizing the byte order of code units, UTF-16 allows a ''Byte Order Mark'' (BOM), a code unit with the hexadecimal value U+FEFF, to precede the first actual coded value.
* '''UTF-16''' is a variable-length character encoding for Unicode that uses one or two 16-bit code units, where each unit takes two 8-bit bytes, and the order of the bytes may depend on the byte order (<span style="color:#800020;">endianness</span>) of the computer architecture. To assist in recognizing the byte order of code units, UTF-16 allows a <span style="color:#800020;">Byte Order Mark</span> (BOM), a code unit with the hexadecimal value U+FEFF, to precede the first actual coded value.


Eventhough UTF-8 is the preferred character encoding nowadays, many others have been used, for example:
Eventhough UTF-8 is the preferred character encoding nowadays, many others have been used, for example:


* '''ISO-8859-1''' (or ''Latin-1''). A character encoding for the Latin script, where each character is encoded as a single 8-bit code value. This character-encoding scheme consists of 191 characters and it is used throughout The Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East-Asian languages.
* '''ISO-8859-1''' (or <span style="color:#800020;">Latin-1</span>). A character encoding for the Latin script, where each character is encoded as a single 8-bit code value. This character-encoding scheme consists of 191 characters and it is used throughout The Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East-Asian languages.
* '''ISO-8859-15''' (or ''Latin-9''). A character encoding similar to Latin-1, except that it substituted some rarely used characters with the euro sign and a few other letters.
* '''ISO-8859-15''' (or <span style="color:#800020;">Latin-9</span>). A character encoding similar to Latin-1, except that it substituted some rarely used characters with the euro sign and a few other letters.
* '''Shift JIS'''. A character encoding for the Japanese language.
* '''Shift JIS'''. A character encoding for the Japanese language.


To illustrate the concepts given above, the € (euro) character has a code point equal to 8364 (or U+20AC, in hexadecimal notation) in the Unicode character set. This code point can be stored in different ways, depending on the character encoding used:
To illustrate the concepts given above, the € (euro) character has a code point equal to 8364 (or U+20AC, in hexadecimal notation) in the Unicode character set. This code point can be stored in different ways, depending on the character encoding used:


{| class="nicetable"
{| class="wikitable"
! Character encoding
! Character encoding
! Code value
! Code value (hex)
|-
|-
| '''ISO-8859-15''' (hexadecimal)
| '''ISO-8859-15'''
| A4
| A4
|-
|-
| '''UTF-8''' (hex)
| '''UTF-8'''
| E2 82 AC
| E2 82 AC
|-
|-
| '''UTF-16''' (hex)
| '''UTF-16'''
| 20AC
| 20AC
|}
|}


Finally, a ''Collation'' determines how data is sorted and how strings are compared to each other.
Finally, a <span style="color:#800020;">Collation</span> determines how data is sorted and how strings are compared to each other.


=== Encoding and decoding ===
=== Encoding and decoding ===


One of the most important things one must keep in mind when working with databases (or files, for that matter) is to be aware of how characters (letters, numbers and non-alphanumeric characters) are actually being saved or encoded and how bytes (represented with hexadecimal values) are actually being interpreted or decoded.
One of the most important things one must keep in mind when working with databases (or files, for that matter) is to be aware of how characters (letters, numbers and non-alphanumeric characters) are actually being saved or encoded and how bytes (represented with hexadecimal values) are actually being interpreted or decoded.
'''Note'''. In a database, character encoding only applies to text type columns (e.g. CHAR, VARCHAR, TEXT).


For example, it is not enough to say that character “é” (small letter e with acute) has been saved in a text file, one also has to be aware of how was it encoded. Why? Because if it was encoded as Latin-1, the file will have one byte of hex value E9, but if it was encoded as UTF-8, the file would then have two bytes of hex value C3 and A9:
For example, it is not enough to say that character “é” (small letter e with acute) has been saved in a text file, one also has to be aware of how was it encoded. Why? Because if it was encoded as Latin-1, the file will have one byte of hex value E9, but if it was encoded as UTF-8, the file would then have two bytes of hex value C3 and A9:


{| class="nicetable"
{| class="wikitable"
! Character
! Character
! Latin-1 hex value
! Latin-1 hex value
Line 71: Line 74:
In the same way, it is not enough to say that an UTF-8 encoded character “é” is going to be read from a text file, because eventhough we might know that two bytes of hex value “C3 A9” will be read, if they are decoded (interpreted) as Latin-1, we will get characters é, but if they are decoded as UTF-8, we would then get character “é”:
In the same way, it is not enough to say that an UTF-8 encoded character “é” is going to be read from a text file, because eventhough we might know that two bytes of hex value “C3 A9” will be read, if they are decoded (interpreted) as Latin-1, we will get characters é, but if they are decoded as UTF-8, we would then get character “é”:


{| class="nicetable"
{| class="wikitable"
! Hex value
! Hex value
! Latin-1 characters
! Latin-1 characters
Line 81: Line 84:
|}
|}


Being aware of these two concepts will become relevant when trying to understand how data is encoded and decoded through the many stages (source, client, server) in order to be saved or restored.
Being aware of these two concepts will become relevant when trying to understand how data is encoded and decoded through the many database stages (source, client, server) in order to be saved or restored.


== MySQL ==
== MySQL ==
Line 89: Line 92:
Until version 4.1, MySQL tables used the latin1 character set by default. From version 4.1, not only the concepts of "character set" and "collation" were introduced, but the character set by default (at the configuration file) was changed to utf8.
Until version 4.1, MySQL tables used the latin1 character set by default. From version 4.1, not only the concepts of "character set" and "collation" were introduced, but the character set by default (at the configuration file) was changed to utf8.


In MySQL, a ''character set'' is a set of symbols and encodings, and a ''collation'' is a set of rules for comparing characters in a character set.
In MySQL, a <span style="color:#800020;">character set</span> is a set of symbols and encodings, and a <span style="color:#800020;">collation</span> is a set of rules for comparing characters in a character set.


'''Note'''. Eventhough in MySQL the term "character set" is used, actually it should be "character encoding" ("charset").
'''Note'''. Eventhough in MySQL the term <span style="color:#800020;">character set</span> is used, it should actually be <span style="color:#800020;">character encoding</span> (<span style="color:#800020;">charset</span>).


The MySQL server supports multiple character sets, like ascii, koi8r, latin1, sjis and utf8. Each character set has at least one collation, however, as it may have many more, a default one is always defined for each character set.
The MySQL server supports multiple character sets, like ascii, koi8r, latin1, sjis and utf8. Each character set has at least one collation, however, as it may have many more, a default one is always defined for each character set.


{| class="nicetable"
{| class="wikitable"
! Character set
! Character set
! Default collation
! Default collation
Line 121: Line 124:
|}
|}


Collation names follow a convention: they start with the name of the character set with which they are associated, they usually include a language name, and they end with '''_ci''' (case insensitive), '''_cs''' (case sensitive), or '''_bin''' (binary). From this, is should be evident that two different character sets cannot have the same collation.
Collation names follow a convention: they start with the name of the character set with which they are associated, they usually include a language name, and they end with <tt style="color:#0000b0;">_ci</tt> (case insensitive), <tt style="color:#0000b0;">_cs</tt> (case sensitive), or <tt style="color:#0000b0;">_bin</tt> (binary). From this, is should be evident that two different character sets cannot have the same collation.


=== Character set variables ===
=== Character set variables ===
Line 131: Line 134:
The following three determine the character encoding used to transform data as it flows from the client to the server and viceversa.
The following three determine the character encoding used to transform data as it flows from the client to the server and viceversa.


* <tt>character_set_client</tt>. This variable defines the character set used by the client and in which it sends statements to the server.
* <tt style="color:#0000b0;">character_set_client</tt>. This variable defines the character set used by the client and in which it sends statements to the server.
* <tt>character_set_connection</tt>. This variable defines the character set in which the server translates statements received from the client.
* <tt style="color:#0000b0;">character_set_connection</tt>. This variable defines the character set in which the server translates statements received from the client.
* <tt>character_set_results</tt>. This variable defines the character set in which the server returns query results to the client.
* <tt style="color:#0000b0;">character_set_results</tt>. This variable defines the character set in which the server returns query results to the client.


'''Note'''. For the <tt>character_set_connection</tt> variable there is also a related collation type variable: <tt>collation_connection</tt>.
'''Note'''. For the <tt style="color:#0000b0;">character_set_connection</tt> variable there is also a related collation type variable: <tt style="color:#0000b0;">collation_connection</tt>.


==== As default values ====
==== As default values ====
Line 141: Line 144:
The following two are used to determine the character encoding to be used when a database or a table is created, respectively.
The following two are used to determine the character encoding to be used when a database or a table is created, respectively.


* <tt>character_set_server</tt>. This variable defines the character set of a new schema when the database charset is not specified in the CREATE DATABASE statement.
* <tt style="color:#0000b0;">character_set_server</tt>. This variable defines the character set of a new schema when the database charset is not specified in the CREATE DATABASE statement.
* <tt>character_set_database</tt>. This variable defines the character set of a new table when the table charset is not specified in the CREATE TABLE statement.
* <tt style="color:#0000b0;">character_set_database</tt>. This variable defines the character set of a new table when the table charset is not specified in the CREATE TABLE statement.


'''Note'''. For these two variables there are also two related collation type variables: <tt>collation_server</tt> and <tt>collation_database</tt>, respectively.
'''Note'''. For these two variables there are also two related collation type variables: <tt style="color:#0000b0;">collation_server</tt> and <tt style="color:#0000b0;">collation_database</tt>, respectively.


==== Other uses ====
==== Other uses ====
Line 150: Line 153:
The following two have other uses.
The following two have other uses.


* <tt>character_set_filesystem</tt>. This variable defines the character set used by the file system (names of directories or names of files).
* <tt style="color:#0000b0;">character_set_filesystem</tt>. This variable defines the character set used by the file system (names of directories or names of files).
* <tt>character_set_system</tt>. This variable defines the character set used to save metadata (data about data, for example, names of tables).
* <tt style="color:#0000b0;">character_set_system</tt>. This variable defines the character set used to save metadata (data about data, for example, names of tables).


=== Default settings ===
=== Default settings ===
Line 158: Line 161:


; Server
; Server
: The server charset and collation are used as default values for schema definitions when the database charset and collation are not specified in <code>CREATE DATABASE</code> statements.
: The server charset and collation are used as default values for schema definitions when the database charset and collation are not specified in CREATE DATABASE statements.
: The server character set and collation can be determined from the values of the <tt>character_set_server</tt> and <tt>collation_server</tt> system variables.
: The server character set and collation can be determined from the values of the <tt style="color:#0000b0;">character_set_server</tt> and <tt style="color:#0000b0;">collation_server</tt> system variables.


; Database
; Database
: The database charset and collation are used as default values for table definitions when the table charset and collation are not specified in <code>CREATE TABLE</code> statements.
: The database charset and collation are used as default values for table definitions when the table charset and collation are not specified in CREATE TABLE statements.
: The character set and collation for the default database can be determined from the values of the <tt>character_set_database</tt> and <tt>collation_database</tt> system variables.
: The character set and collation for the default database can be determined from the values of the <tt style="color:#0000b0;">character_set_database</tt> and <tt style="color:#0000b0;">collation_database</tt> system variables.


; Table
; Table
: The table charset and collation are used as default values for column definitions when the column charset and collation are not specified in individual column definitions.
: The table charset and collation are used as default values for column definitions when the column charset and collation are not specified in individual column definitions.
: The table character set and collation are MySQL extensions; there do not exist in standard SQL.
: The table character set and collation are MySQL extensions; therefore they do not exist in standard SQL.


; Column
; Column
: A character set and a collation may be specified for every "character" column (that is, columns of type CHAR, VARCHAR, or TEXT).
: A character set and a collation may be specified for every character type column (that is, columns of type CHAR, VARCHAR, or TEXT).
: The column character set and collate clauses are standard SQL.
: The column character set and collate clauses are standard SQL.


Line 177: Line 180:
==== MySQL settings ====
==== MySQL settings ====


All MySQL settings, both for the server and the client components, are defined under one configuration or option file: <tt>my.ini</tt> (Windows) or <tt>my.cnf</tt> (Linux), although many copies of this file might be read at startup.
All MySQL settings, both for the server and the client components, are defined under one configuration or option file: <tt style="color:#0000b0;">my.ini</tt> (Windows) or <tt style="color:#0000b0;">my.cnf</tt> (Linux), although many copies of this file might be read at startup.


Settings in the configuration file are divided in option groups, where each one contains the settings for: the server, all clients and each particular client:
Settings in the configuration file are divided in option groups, where each one contains the settings for: the server, all clients and each particular client:


{| class="nicetable"
{| class="wikitable"
! Settings for
! Settings for
! Option group
! Option group
Line 197: Line 200:
==== Database ====
==== Database ====


For each database created, its character set and collation values are saved in a text file named <tt>db.opt</tt>, located in the corresponding database folder (''dbname'') under the ''datadir'' directory.
For each database created, its character set and collation values are saved in a text file named <tt style="color:#0000b0;">db.opt</tt>, located in the corresponding database folder (''db_name'') under the ''datadir'' directory.


  more C:\MySqlServer5.5\data\''dbname''\db.opt
  more C:\MySqlServer5.5\data\''db_name''\db.opt


Outputs something like:
Outputs something like:
Line 206: Line 209:
  default-collation=latin1_swedish_ci
  default-collation=latin1_swedish_ci


When a database is first created, if the character set and collation options are not defined in the <code>CREATE DATABASE</code> statement, then they are taken from the server charset and collation values. These two values can be modified via the <code>ALTER DATABASE</code> statement.
When a database is first created, if the character set and collation options are not defined in the CREATE DATABASE statement, then they are taken from the server charset and collation values. These two values can be modified via the ALTER DATABASE statement.


=== Gathering information (1) ===
=== Gathering information (1) ===
Line 216: Line 219:
==== Server and client ====
==== Server and client ====


Running a particular program with the parameters <tt>--verbose --help</tt>:
Running a particular program with the parameters <tt style="color:#0000b0;">--verbose --help</tt>:


  mysqld --verbose --help
  mysqld --verbose --help
Line 233: Line 236:
Default options are read from the following files in the given order:
Default options are read from the following files in the given order:


{| class="nicetable"
{| class="wikitable"
! OS
! OS
! The following groups are read
! The following groups are read
Line 250: Line 253:
The option groups (in the configuration file) read by each program; for example:
The option groups (in the configuration file) read by each program; for example:


{| class="nicetable"
{| class="wikitable"
! Program
! Program
! The following groups are read
! The following groups are read
Line 266: Line 269:
===== Allowed options (variables) =====
===== Allowed options (variables) =====


The variables accepted (in the command line) by the program, with their abreviations and descriptions; for example, the following are some of those listed for the <tt>mysql</tt> client program:
The variables accepted (in the command line) by the program, with their abreviations and descriptions; for example, the following are some of those listed for the <tt style="color:#0000b0;">mysql</tt> client program:


{| class="nicetable"
{| class="wikitable"
! Variable (option)
! Variable (option)
! Description
! Description
Line 281: Line 284:
===== Variables and their values =====
===== Variables and their values =====


The default values of variables that can be set from the command line for each program; for example, the following are some of those listed by the <tt>mysqld</tt> server program:
The default values of variables that can be set from the command line for each program; for example, the following are some of those listed by the <tt style="color:#0000b0;">mysqld</tt> server program:


{| class="nicetable"
{| class="wikitable"
! Variable
! Variable
! Default value
! Default value
Line 306: Line 309:
|}
|}


 
'''* Important note'''. The <tt style="color:#0000b0;">mysqld</tt> program lists the values of the variables that the server (based on its compiled-in defaults) will use; however, some of those values might change depending on the command line parameters given. For example, if the <tt style="color:#0000b0;">my.ini</tt> (or <tt style="color:#0000b0;">my.cnf</tt>) configuration file has the following lines:
'''* Important note'''. The <tt>mysqld</tt> program lists the values of the variables that the server (based on its compiled-in defaults) will use; however, some of those values might change depending on the command line parameters given. For example, if the <tt>my.ini</tt> configuration file has the following lines:


  [mysqld]
  [mysqld]
  character-set-server=utf8
  character-set-server=utf8


Then we will get a different value for variable <tt>character-set-server</tt> if we use the <tt>no-defaults</tt> parameter:
Then we will get a different value for variable <tt style="color:#0000b0;">character-set-server</tt> if we use the <tt style="color:#0000b0;">no-defaults</tt> parameter:


{| class="nicetable"
{| class="wikitable"
! Command
! Command
! Action
! Action
! Value of ''character-set-server''
! Value of ''character-set-server''
|-
|-
| <tt>mysqld --no-defaults --verbose --help</tt>
| <tt style="color:#0000b0;">mysqld --no-defaults --verbose --help</tt>
| Ignore the settings in any option files.
| Ignore the settings in any option files.
| <tt>latin1</tt>
| <tt style="color:#0000b0;">latin1</tt>
|-
|-
| <tt>mysqld --verbose --help</tt>
| <tt style="color:#0000b0;">mysqld --verbose --help</tt>
| Also include the settings of any option files that it reads.
| Include the settings of any option files that it reads.
| <tt>utf8</tt>
| <tt style="color:#0000b0;">utf8</tt>
|}
|}


==== System values used by a running server ====
==== System values used by a running server ====


While the <tt>--help --verbose</tt> options of the MySQL server program (<tt>mysqld</tt>) show command line options and their default values, there are also other variables that can only be seen after the server has been started and that provide information about its operation.
While the <tt style="color:#0000b0;">--help --verbose</tt> options of the MySQL server program (<tt style="color:#0000b0;">mysqld</tt>) show command line options and their default values, there are also other variables that can only be seen after the server has been started and that provide information about its operation.


To see what system values is using a running server, the <tt>mysqladmin</tt> program can be used with the <tt>variables</tt> option:
To see what system values is using a running server, the <tt style="color:#0000b0;">mysqladmin</tt> program can be used with the <tt style="color:#0000b0;">variables</tt> option:


  mysqladmin [-u root -p''your_password''] variables
  mysqladmin [-u root -p''your_password''] variables
Line 338: Line 340:
The following are some of the variables and values listed:
The following are some of the variables and values listed:


{| class="nicetable"
{| class="wikitable"
! Variable_name
! Variable_name
! Value
! Value
Line 357: Line 359:
==== Server status information ====
==== Server status information ====


The server status information provided by the mysqladmin <tt>extended-status</tt> option is rather technical and so it will only be useful for DBA’s, however it is worth knowing how to display it:
The server status information provided by the mysqladmin <tt style="color:#0000b0;">extended-status</tt> option is rather technical and so it will only be useful for DBAs (database administrators), however it is worth knowing how to display it:


  mysqladmin [-u root -p''your_password''] extended-status
  mysqladmin [-u root -p''your_password''] extended-status
Line 375: Line 377:
  mysql>
  mysql>


* The phpMyAdmin tool, through the “Run SQL query/queries on database dbname” window.
* The phpMyAdmin tool, through the “Run SQL query/queries on database ''db_name''” window.


==== Saving the output of a query into a file ====
==== Saving the output of a query into a file ====
Line 381: Line 383:
As results from many commands might be a bit long to be analyzed from a window, it is useful to know how to send them into a file.
As results from many commands might be a bit long to be analyzed from a window, it is useful to know how to send them into a file.


The following commands can only be run from a mysql command window.
===== Windows =====


===== Windows =====
Use command <tt style="color:#0000b0;">tee</tt> to save the output of a query into a file, and command <tt style="color:#0000b0;">notee</tt> to stop writing into a file. These commands can only be run from a mysql command window.


Use command <tt>tee</tt> to save the output of a query into a file, and command <tt>notee</tt> to stop writing into a file. For example:
For example:


  mysql> tee e:/tmp/vars.out;
  mysql> tee e:/tmp/vars.out;
Line 392: Line 394:
  Outfile disabled.
  Outfile disabled.


To determine into which file output is being written, just type <tt>tee</tt> by itself:
To determine into which file output is being written, just type <tt style="color:#0000b0;">tee</tt> by itself:


  mysql> tee;
  mysql> tee;
Line 399: Line 401:
===== Unix =====
===== Unix =====


Use command <tt>pager</tt> to save the output of a query into a file, and command <tt>nopager</tt> to stop writing into a file. For example:
Use command <tt style="color:#0000b0;">pager</tt> to save the output of a query into a file, and command <tt style="color:#0000b0;">nopager</tt> to stop writing into a file. These commands can only be run from a mysql command window.
 
For example:


  mysql> pager cat > /tmp/vars.out
  mysql> pager cat > /tmp/vars.out
Line 407: Line 411:
===== phpMyAdmin =====
===== phpMyAdmin =====


Once a query is run and the results are displayed, the <tt>Print view</tt> or the <tt>Print view (with full texts)</tt> links from the <tt>Query results operations</tt> section can be used to save them.
Once a query is run and the results are displayed, the <tt style="color:#0000b0;">Print view</tt> or the <tt style="color:#0000b0;">Print view (with full texts)</tt> links from the <tt style="color:#0000b0;">Query results operations</tt> section can be used to save them.


==== Basic server status information ====
==== Basic server status information ====
Line 434: Line 438:
  Uptime:                18 hours 25 min 38 sec
  Uptime:                18 hours 25 min 38 sec


If a database hasn’t been set active, the Db characterset variable will contain the global default, otherwise its value will reflect that of the selected database.
'''Note'''. If a database hasn’t been set active, the <tt style="color:#0000b0;">Db characterset</tt> variable will contain the global default, otherwise its value will reflect that of the selected database.


==== Server status information ====
==== Server status information ====


The following will list the same variables as the <tt>mysqladmin extended-status</tt> command:
The following will list the same variables as the <tt style="color:#0000b0;">mysqladmin extended-status</tt> command:


  SHOW STATUS;
  SHOW STATUS;
Line 446: Line 450:
==== System variables with their values ====
==== System variables with their values ====


The following will list the same system variables as the <tt>mysqladmin variables</tt> command, plus a few more:
The following will list the same system variables as the <tt style="color:#0000b0;">mysqladmin variables</tt> command, plus a few more:


  SHOW VARIABLES;
  SHOW VARIABLES;
Line 452: Line 456:
As the list of variables is quite long, the LIKE clause can be used to list only those that match.
As the list of variables is quite long, the LIKE clause can be used to list only those that match.


To list all the character_set variables:
To list all the <tt style="color:#0000b0;">character_set</tt> variables:


  SHOW VARIABLES LIKE 'character_set\_%';
  SHOW VARIABLES LIKE 'character_set\_%';
Line 470: Line 474:
  +--------------------------+-----------------------------------+
  +--------------------------+-----------------------------------+


If a database hasn’t been set active, the character_set_database variable will contain the global default, otherwise its value will reflect that of the selected database.
'''Note'''. If a database hasn’t been set active, the <tt style="color:#0000b0;">character_set_database</tt> variable will contain the global default, otherwise its value will reflect that of the selected database.


To list all the collations defined:
To list all the collations defined:
Line 496: Line 500:
Outputs something like:
Outputs something like:


  +----------+-------------------------------------------------------------------+
  CREATE DATABASE `test02` /*!40100 DEFAULT CHARACTER SET latin1 */
| Database | Create Database                                                  |
+----------+-------------------------------------------------------------------+
| test02  | CREATE DATABASE `test02` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+-------------------------------------------------------------------+


If a database was created with a specific character set and/or collation (i.e. it isn’t using the server’s default), the result will include the defined collation, for example:
If a database was created with a specific character set and/or collation (i.e. it isn’t using the server’s default), the result will include the defined collation, for example:


  +----------+---------------------------------------------------------------------------------------------+
  CREATE DATABASE `temp01` /*!40100 DEFAULT CHARACTER SET latin1 <span style="color:#CC0000;">COLLATE latin1_spanish_ci</span> */
| Database | Create Database                                                                            |
+----------+---------------------------------------------------------------------------------------------+
| temp01  | CREATE DATABASE `temp01` /*!40100 DEFAULT CHARACTER SET latin1 COLLATE latin1_spanish_ci */ |
+----------+---------------------------------------------------------------------------------------------+


As mentioned before, when a database is created or when its character set and/or collation are modified, these two values are saved in the corresponding <tt>db.opt</tt> file.
As mentioned before, when a database is created or when its character set and/or collation are modified, these two values are saved in the corresponding <tt style="color:#0000b0;">db.opt</tt> file.


===== Character set and collation =====
===== Character set and collation =====


The following SQL statement gives both, the character set and collation of a schema.
The following SQL statement gives both, the character set and collation of a schema (database).


  SELECT schema_name, default_character_set_name "character_set", default_collation_name "collation"
  SELECT schema_name, default_character_set_name "character_set", default_collation_name "collation"
Line 532: Line 528:
==== Tables and columns ====
==== Tables and columns ====


A database must first be selected before trying to use these queries.
A database must first be selected before trying to use the SHOW and DESCRIBE commands.


* From a mysql command window:
* From a mysql command window:
Line 557: Line 553:
  CREATE TABLE `t3_latin` (
  CREATE TABLE `t3_latin` (
   `userid` smallint(5) unsigned DEFAULT NULL,
   `userid` smallint(5) unsigned DEFAULT NULL,
   `coll_swd` char(16) COLLATE latin1_spanish_ci NOT NULL,
   `coll_swd` char(16) <span style="color:#CC0000;">COLLATE latin1_spanish_ci</span> NOT NULL,
   `coll_spa` text COLLATE latin1_spanish_ci,
   `coll_spa` text <span style="color:#CC0000;">COLLATE latin1_spanish_ci</span>,
   PRIMARY KEY (`coll_swd`)
   PRIMARY KEY (`coll_swd`)
  ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_spanish_ci
  ) ENGINE=InnoDB DEFAULT CHARSET=latin1 <span style="color:#CC0000;">COLLATE=latin1_spanish_ci</span>


Furthermore, if a particular column was created with a specific character set and/or collation (i.e. it isn’t using the table’s default), the result will include the character set and collation definition in the respective column, for example:
Furthermore, if a particular column was created with a specific character set and/or collation (i.e. it isn’t using the table’s default), the result will include the character set and collation definition in the respective column, for example:
Line 566: Line 562:
  CREATE TABLE `t2_latin` (
  CREATE TABLE `t2_latin` (
   `userid` smallint(5) unsigned DEFAULT NULL,
   `userid` smallint(5) unsigned DEFAULT NULL,
   `coll_gr1` char(16) CHARACTER SET latin1 COLLATE latin1_german1_ci NOT NULL,
   `coll_gr1` char(16) <span style="color:#CC0000;">CHARACTER SET latin1 COLLATE latin1_german1_ci</span> NOT NULL,
   PRIMARY KEY (`coll_gr1`)
   PRIMARY KEY (`coll_gr1`)
  ) ENGINE=InnoDB DEFAULT CHARSET=latin1
  ) ENGINE=InnoDB DEFAULT CHARSET=latin1
Line 649: Line 645:
  | test02      | t2_latin  | latin1              | latin1_swedish_ci |
  | test02      | t2_latin  | latin1              | latin1_swedish_ci |
  +--------------+------------+---------------------+-------------------+
  +--------------+------------+---------------------+-------------------+
===== Tables, with different character set and collation =====
The following SQL query lists all tables in a particular schema that differ from a specific collation (e.g. <span style="color:#CC0000;">latin1_swedish_ci</span>).
SELECT t.table_schema, t.table_name, ccsa.character_set_name "table_character_set", t.table_collation
FROM information_schema.`tables` t,
information_schema.`collation_character_set_applicability` ccsa
WHERE ccsa.collation_name = t.table_collation
AND t.table_schema = '<em>db_name</em>'
AND table_collation NOT LIKE '<span style="color:#CC0000;">latin1_swedish_ci</span>';
Outputs something like:
+--------------+------------+---------------------+-----------------+
| table_schema | table_name | table_character_set | table_collation |
+--------------+------------+---------------------+-----------------+
| test03      | t1_default | utf8                | utf8_unicode_ci |
+--------------+------------+---------------------+-----------------+
===== Columns, displaying character data as hexadecimal values =====
Sometimes it is necessary to confirm how text type data (e.g. CHAR, VARCHAR, TEXT) is actually saved (encoded).
To do this, a SELECT query can be created with an <tt style="color:#0000b0;">HEX()</tt> function around the column to be checked.
For example, if we have the string “áéíóú” stored in column ''col_name'', the following query:
SELECT ''col_name'', HEX(''col_name'') FROM ''tbl_name'';
Would give the following output if the string was encoded in latin1:
+----------+---------------+
| ''col_name'' | HEX(''col_name'') |
+----------+---------------+
| áéíóú    | E1E9EDF3FA    |
+----------+---------------+
And it would give the following output if the string was encoded in utf8:
+----------+----------------------+
| ''col_name'' | HEX(''col_name'')        |
+----------+----------------------+
| áéíóú    | C3A1C3A9C3ADC3B3C3BA |
+----------+----------------------+
'''Note'''. In this example it is assumed that in each case the column is defined with the correct character set (latin1 and utf8, respectively).


===== Columns, character set and collation =====
===== Columns, character set and collation =====


The following SQL query displays the character set and collation of all text type (e.g. <tt>varchar</tt>, <tt>text</tt>) columns in a particular table.
The following SQL query displays the character set and collation of all text type (e.g. CHAR, VARCHAR, TEXT) columns in a particular table.


  SELECT table_schema, table_name,
  SELECT table_schema, table_name,
Line 690: Line 733:
===== Columns, with different character set and collation =====
===== Columns, with different character set and collation =====


The following SQL query displays all text type (e.g. varchar, text) columns in all tables of a particular database, that differ from a specific collation (e.g. ''latin1_swedish_ci'').
The following SQL query displays all text type (e.g. CHAR, VARCHAR, TEXT) columns in all tables of a particular database, that differ from a specific collation (e.g. <span style="color:#CC0000;">latin1_swedish_ci</span>).


  SELECT table_schema, table_name,
  SELECT table_schema, table_name,
Line 698: Line 741:
  WHERE table_schema = '<em>db_name</em>'
  WHERE table_schema = '<em>db_name</em>'
  AND data_type IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext', 'enum', 'set')
  AND data_type IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext', 'enum', 'set')
  AND collation_name NOT LIKE '<em>latin1_swedish_ci</em>';
  AND collation_name NOT LIKE '<span style="color:#CC0000;">latin1_swedish_ci</span>';


Outputs something like:
Outputs something like:
Line 717: Line 760:
  AND table_name = '<em>tbl_name</em>'
  AND table_name = '<em>tbl_name</em>'
  AND data_type IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext', 'enum', 'set')
  AND data_type IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext', 'enum', 'set')
  AND collation_name NOT LIKE '<em>latin1_swedish_ci</em>';
  AND collation_name NOT LIKE '<span style="color:#CC0000;">latin1_swedish_ci</span>';


=== Dumping a database ===
=== Dumping a database ===


To backup or transfer databases to another SQL server, MySQL includes a client program called <tt>mysqldump</tt>. This program generates a text file with a series of SQL statements to recreate the original database, its tables and the corresponding data.
To backup or transfer databases to another SQL server, MySQL includes a client program called <tt style="color:#0000b0;">mysqldump</tt>. This program generates a text file with a series of SQL statements to recreate the original database, its tables and the corresponding data.


Basically, the <tt>mysqldump</tt> program can be run in one of the following ways:
Basically, the <tt style="color:#0000b0;">mysqldump</tt> program can be run in one of the following ways:


{| class="nicetable"
{| class="wikitable"
! Command
! Command
! Action
! Action
Line 745: Line 788:
===== --default-character-set =====
===== --default-character-set =====


The <tt>--default-character-set</tt> option sets ''charset_name'' as the default character set. It specifies the file character encoding to be used or how the file is going to be encoded, that is, column values are converted to the character set specified by this option.
The <tt style="color:#0000b0;">--default-character-set</tt> option defines ''charset_name'' as the default character set.
 
This is one of the most critical options, as it specifies:
 
# The file character encoding to be used or how the file is going to be encoded: as column values from the database are read, they are converted to the character set specified by this option before being saved in the dump file (see, Encoding and decoding).
# The ''default_character_set'' for the SET NAMES statement when the <tt style="color:#0000b0;">--skip-set-charset</tt> or the <tt style="color:#0000b0;">--no-set-names</tt> options are not used.


For example, the following command:
For example, the following command:


  mysqldump --default-character-set=''latin1'' test04 -r test04.sql
  mysqldump --default-character-set=<span style="color:#CC0000;">latin1</span> ''db_name'' -r backup.sql
 
would generate a <span style="color:#CC0000;">latin1</span> encoded dump file; whereas either of the following:
 
mysqldump --default-character-set=<span style="color:#CC0000;">utf8</span> ''db_name'' -r backup.sql
mysqldump ''db_name'' -r backup.sql


would generate a latin1 encoded dump file (see, Encoding and decoding).
would generate a <span style="color:#CC0000;">utf8</span> encoded dump file.


'''Note'''. If a character set is not specified, mysqldump will use utf8 by default (versions lower than 4.1.2 use latin1 by default).
'''Note'''. If a character set is not specified (as shown in the last example), <tt style="color:#0000b0;">mysqldump</tt> will use utf8 by default (versions lower than 4.1.2 use latin1 by default).


===== --set-charset =====
===== --set-charset =====


The <tt>--set-charset</tt> option adds a SET NAMES ''default_character_set'' statement to the output. This option is enabled by default.
The <tt style="color:#0000b0;">--set-charset</tt> option adds a SET NAMES ''default_character_set'' statement to the output. This option is enabled by default.
 
The SET NAMES statement specifies the character set used by the client to send SQL statements to the server, and the character set that the server should use for sending results back to the client (see, For data encoding).


A SET NAMES ''character_set'' statement is equivalent to the following three statements:
As the SET NAMES ''character_set'' statement is equivalent to the following three statements:


  SET character_set_client = ''character_set'';
  SET character_set_client = ''character_set'';
Line 767: Line 818:
  SET character_set_connection = ''character_set'';
  SET character_set_connection = ''character_set'';


As this option is enabled by default, the following command:
the SET NAMES statement specifies the character set to be used (during the restore of the dump file) by the client to send SQL statements to the server, and the character set that the server should use for sending results back to the client (see, For data encoding).
 
Since this option is enabled by default, the following command:


  mysqldump --default-character-set=''latin1'' test04 -r test04.sql
  mysqldump --default-character-set=<span style="color:#CC0000;">latin1</span> ''db_name'' -r backup.sql


would add the following conditional comments at the beginning of the dump file:
would add the following conditional comments at the beginning of the dump file:
Line 776: Line 829:
  /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
  /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
  /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
  /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
  /*!40101 SET NAMES ''latin1'' */;
  /*!40101 SET NAMES <span style="color:#CC0000;">latin1</span> */;


and the following at the end:
and the following at the end:
Line 784: Line 837:
  /*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
  /*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;


This would tell the <tt>mysql</tt> program that it should use, send and receive data encoded in latin1.
where the "SET NAMES <span style="color:#CC0000;">latin1</span>" command will instruct the <tt style="color:#0000b0;">mysql</tt> program to use, send and receive data encoded in <span style="color:#CC0000;">latin1</span>.


===== --skip-set-charset =====
===== --skip-set-charset =====


The <tt>--skip-set-charset</tt> option is used to disable the <tt>--set-charset</tt> option, which is enabled by default.
The <tt style="color:#0000b0;">--skip-set-charset</tt> option is used to disable the <tt style="color:#0000b0;">--set-charset</tt> option, which is enabled by default.


===== --no-set-names (-N) =====
===== --no-set-names (-N) =====


The <tt>--no-set-names</tt> is equivalent to <tt>--skip-set-charset</tt>, and so this option disables the <tt>--set-charset</tt> option: it doesn’t add a SET NAMES ''default_character_set'' to the output.
The <tt style="color:#0000b0;">--no-set-names</tt> is equivalent to <tt style="color:#0000b0;">--skip-set-charset</tt>, and so this option disables the <tt style="color:#0000b0;">--set-charset</tt> option: it indicates that a SET NAMES ''default_character_set'' should not be added to the output.


===== --complete-insert (-c) =====
===== --complete-insert (-c) =====


The <tt>--complete-insert</tt> option indicates to use complete INSERT statements that include column names.
The <tt style="color:#0000b0;">--complete-insert</tt> option indicates to use complete INSERT statements that include column names.


For example, the following command:
For example, the following command:


  mysqldump '''-c''' test04 -r test04.sql
  mysqldump <span style="color:#CC0000;">-c</span> ''db_name'' -r backup.sql


would create an INSERT statement that includes the names of the columns:
would create an INSERT statement that includes the names of the columns:


  INSERT INTO `t1_latin` '''(`id`, `string`)''' VALUES (1,'text1'),(2,' text2');
  INSERT INTO `t1_latin` <span style="color:#CC0000;">(`id`, `string`)</span> VALUES (1,'text1'),(2,' text2');


But the following command:
But the following command:


  mysqldump test04 -r test04.sql
  mysqldump ''db_name'' -r backup.sql


would create an INSERT statement without the names of the columns:
would create an INSERT statement without the names of the columns:
Line 816: Line 869:
===== --extended-insert (-e) =====
===== --extended-insert (-e) =====


The <tt>--extended-insert</tt> indicates to use a multiple-row INSERT syntax that include several VALUES lists. This results in smaller dump files and faster inserts when the file is reloaded. This option is enabled by default.
The <tt style="color:#0000b0;">--extended-insert</tt> indicates to use a multiple-row INSERT syntax that include several VALUES lists. This results in smaller dump files and faster inserts when the file is reloaded. This option is enabled by default.


As this option is enabled by default, the following command:
Since this option is enabled by default, the following command:


  mysqldump test04 -r test04.sql
  mysqldump ''db_name'' -r backup.sql


would create an INSERT statement that includes all the records to be added:
would create an INSERT statement that includes all the records to be added:


  INSERT INTO `t1_latin` VALUES (1,'text1'),(2,'text2');
  INSERT INTO `t1_latin` VALUES <span style="color:#CC0000;">(1,'text1'),(2,'text2')</span>;


===== --skip-extended-insert =====
===== --skip-extended-insert =====


The <tt>--skip-extended-insert</tt> disables the <tt>--extended-insert</tt> option, which is enabled by default.
The <tt style="color:#0000b0;">--skip-extended-insert</tt> disables the <tt style="color:#0000b0;">--extended-insert</tt> option, which is enabled by default.


For example, the following command:
For example, the following command:


  mysqldump --skip-extended-insert test04 -r test04.sql
  mysqldump --skip-extended-insert ''db_name'' -r backup.sql


would create multiple INSERT statements, one for each record added:
would create multiple INSERT statements, one for each record added:
Line 841: Line 894:
===== --result-file (-r) =====
===== --result-file (-r) =====


The <tt>--result-file</tt> option directs output to a given file.
The <tt style="color:#0000b0;">--result-file</tt> option directs output to a given file.


As this option prevents newline characters from being converted to carriage return/newline sequences, in a Windows environment it should be used:
Since this option prevents newline characters from being converted to carriage return/newline sequences, it should be used when working in a Windows environment:


  mysqldump test04 '''-r''' test04.sql
  mysqldump ''db_name'' <span style="color:#CC0000;">-r</span> backup.sql


instead of using ">" to redirect output to file:
instead of using ">" to redirect output to a file:


  mysqldump test04 > test04.sql
  mysqldump ''db_name'' > backup.sql


===== --single-transaction =====
===== --single-transaction =====


The <tt>--single-transaction</tt> option issues a BEGIN SQL statement before dumping data from the server, and it should be used with transactional tables (like InnoDB).
The <tt style="color:#0000b0;">--single-transaction</tt> option issues a BEGIN SQL statement before dumping data from the server, and it should be used with transactional tables (like InnoDB).


As this option and the <tt>--lock-tables</tt> option are mutually exclusive, the <tt>--skip-lock-tables</tt> option should also be used prior to using the <tt>--single-transaction</tt> option:
Since this option and the <tt style="color:#0000b0;">--lock-tables</tt> option are mutually exclusive, the <tt style="color:#0000b0;">--skip-lock-tables</tt> option should also be used prior to using the <tt style="color:#0000b0;">--single-transaction</tt> option:


  mysqldump --skip-lock-tables --single-transaction test04 -r test04.sql
  mysqldump <span style="color:#CC0000;">--skip-lock-tables --single-transaction</span> ''db_name'' -r backup.sql


===== --opt =====
===== --opt =====


The <tt>--opt</tt> option is shorthand for the following:
The <tt style="color:#0000b0;">--opt</tt> option is shorthand for the following:


{| class="nicetable"
{| class="wikitable"
! Command
! Command
! Action
! Action
Line 892: Line 945:
|}
|}


As all of these options are enabled by default, the <tt>--opt</tt> option is also enabled by default.
Since all of these options are enabled by default, the <tt style="color:#0000b0;">--opt</tt> option is also enabled by default.


===== --skip-opt =====
===== --skip-opt =====


The <tt>--skip-opt</tt> disables the <tt>--opt</tt> option, which is enabled by default.
The <tt style="color:#0000b0;">--skip-opt</tt> disables the <tt style="color:#0000b0;">--opt</tt> option, which is enabled by default.


===== --add-drop-database =====
===== --add-drop-database =====


The <tt>--add-drop-database</tt> option adds a DROP DATABASE statement before each CREATE DATABASE statement.
The <tt style="color:#0000b0;">--add-drop-database</tt> option adds a DROP DATABASE statement before each CREATE DATABASE statement.


===== --add-drop-table =====
===== --add-drop-table =====


The <tt>--add-drop-table</tt> option adds a DROP TABLE statement before each CREATE TABLE statement.
The <tt style="color:#0000b0;">--add-drop-table</tt> option adds a DROP TABLE statement before each CREATE TABLE statement.


==== Internal structure of a dump file ====
==== Internal structure of a dump file ====
Line 912: Line 965:
===== Comments =====
===== Comments =====


Comments are completely ignored by the <tt>mysql</tt> client program and only serve to describe what follows. Comments have the following structure:
Comments are completely ignored by the <tt style="color:#0000b0;">mysql</tt> client program and only serve to describe what follows. Comments have the following structure:
 
-- ''comment''
 
For example:


  -- ''comment'' (e.g. -- MySQL dump 10.13  Distrib 5.5.8, for Win32 (x86)
  -- MySQL dump 10.13  Distrib 5.5.8, for Win32 (x86)


===== Conditional comments =====
===== Conditional comments =====


Conditional comments are used to hide SET commands or SQL statements from older versions of <tt>mysql</tt> (used to restore a dump file). Conditional comments have the following structure:
Conditional comments are used to hide SET commands or SQL statements from older versions of <tt style="color:#0000b0;">mysql</tt> (used to restore a dump file). Conditional comments have the following structure:


  /*!''five digit version number'' (e.g. /*!40101 or /*!40103)
  /*!''five digit version number'' SET|SQL statement */
SET ''statement'' (e.g. SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT or SET NAMES utf8)
*/


For example, the following line, which contains two conditional comments:
For example, the following line, which contains two conditional comments:
Line 928: Line 983:
  CREATE DATABASE /*!32312 IF NOT EXISTS*/ `test04` /*!40100 DEFAULT CHARACTER SET latin1 */;
  CREATE DATABASE /*!32312 IF NOT EXISTS*/ `test04` /*!40100 DEFAULT CHARACTER SET latin1 */;


would be interpreted differently by each <tt>mysql</tt> version:
would be interpreted differently by each <tt style="color:#0000b0;">mysql</tt> version:


{| class="nicetable"
{| class="wikitable"
! mysql versions
! mysql versions
! would read it as
! would read it as
Line 948: Line 1,003:
About the variable syntax of the SET command:
About the variable syntax of the SET command:


* The string <tt>@@''variable_name''</tt> can refer to a session value or a global value:
* The string <tt style="color:#0000b0;">@@''variable_name''</tt> can refer to a session value or a global value:
** When being refered, MySQL returns the session value if it exists and the global value otherwise.
** When being refered, MySQL returns the session value if it exists and the global value otherwise.
** When being set (e.g. <tt>SET @@variable_name=value</tt>), it always refers to the session value.
** When being set (e.g. <tt style="color:#0000b0;">SET @@variable_name=value</tt>), it always refers to the session value.
* The string <tt>@variable_name</tt> refers to a user variable.
* The string <tt style="color:#0000b0;">@variable_name</tt> refers to a user variable.
* When a variable doesn't have a modifier, it is a session variable.
* When a variable doesn't have a modifier, it is a session variable.


Line 958: Line 1,013:
The actual SQL statements, for example:
The actual SQL statements, for example:


  USE `test04`;
  USE `''db_name''`;
  DROP TABLE IF EXISTS `t1_latin`;
  DROP TABLE IF EXISTS `''tbl_name''`;


== Useful tools ==
== Useful tools ==
Line 981: Line 1,036:
==== Super Sed ====
==== Super Sed ====


[http://sed.sourceforge.net/grabbag/ssed/ Super Sed] is a heavily enhanced version of <tt>sed</tt> (a stream editor used to perform basic text transformations on an input stream).
[http://sed.sourceforge.net/grabbag/ssed/ Super Sed] is a heavily enhanced version of <tt style="color:#0000b0;">sed</tt> (a stream editor used to perform basic text transformations on an input stream).


Download: [http://sed.sourceforge.net/grabbag/ssed/sed-3.62.zip Super Sed].
Download: [http://sed.sourceforge.net/grabbag/ssed/sed-3.62.zip Super Sed].

Latest revision as of 04:55, 1 May 2024


Warning: This page is no longer in use. The information contained on the page should NOT be seen as relevant or reliable.


This is the start of a few pages of largely version independent reference

In the spirit of wiki and in response to the post here: http://moodle.org/mod/forum/discuss.php?d=205117

Go for it Guillermo.

Maybe start here, and later on add a category:Reference and see what happens. I've set a watch on, and I may do a page on JSON errors.

Database

I'll start writing here about database and utf8 issues, but I guess that the idea would be for this to be the main reference table of contents page so it then links to each main topic page.

Basic concepts

Byte, character, character set, character encoding and collation

A byte (or octet) is the smallest unit of storage that can be allocated and, in almost all modern computers, it consists of 8 bits, where a bit can only have two values: 0 or 1.

A character is a symbol used in a writing system, such as a letter of the alphabet.

A character set, or a coded character set, refers to the set of characters and numbers, or code points, used to represent them. A code point can be understood as the position the character occupies in the set.

A character encoding, or a character encoding form, refers to how the code points are converted (encoded) into code values to be saved or stored in a computer.

Unicode is a character set developed to consistently encode, represent and handle text in most of the world's writing systems in use today. Unicode can be implemented by different character encodings, where the two most commonly used are UTF-8 and UTF-16:

  • UTF-8 is a variable-width character encoding that can represent every character in the Unicode character set. UTF-8 encodes each of the 1,112,064 code points in the Unicode character set using one to four bytes.
  • UTF-16 is a variable-length character encoding for Unicode that uses one or two 16-bit code units, where each unit takes two 8-bit bytes, and the order of the bytes may depend on the byte order (endianness) of the computer architecture. To assist in recognizing the byte order of code units, UTF-16 allows a Byte Order Mark (BOM), a code unit with the hexadecimal value U+FEFF, to precede the first actual coded value.

Eventhough UTF-8 is the preferred character encoding nowadays, many others have been used, for example:

  • ISO-8859-1 (or Latin-1). A character encoding for the Latin script, where each character is encoded as a single 8-bit code value. This character-encoding scheme consists of 191 characters and it is used throughout The Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East-Asian languages.
  • ISO-8859-15 (or Latin-9). A character encoding similar to Latin-1, except that it substituted some rarely used characters with the euro sign and a few other letters.
  • Shift JIS. A character encoding for the Japanese language.

To illustrate the concepts given above, the € (euro) character has a code point equal to 8364 (or U+20AC, in hexadecimal notation) in the Unicode character set. This code point can be stored in different ways, depending on the character encoding used:

Character encoding Code value (hex)
ISO-8859-15 A4
UTF-8 E2 82 AC
UTF-16 20AC

Finally, a Collation determines how data is sorted and how strings are compared to each other.

Encoding and decoding

One of the most important things one must keep in mind when working with databases (or files, for that matter) is to be aware of how characters (letters, numbers and non-alphanumeric characters) are actually being saved or encoded and how bytes (represented with hexadecimal values) are actually being interpreted or decoded.

Note. In a database, character encoding only applies to text type columns (e.g. CHAR, VARCHAR, TEXT).

For example, it is not enough to say that character “é” (small letter e with acute) has been saved in a text file, one also has to be aware of how was it encoded. Why? Because if it was encoded as Latin-1, the file will have one byte of hex value E9, but if it was encoded as UTF-8, the file would then have two bytes of hex value C3 and A9:

Character Latin-1 hex value UTF-8 hex values
é E9 C3 A9

In the same way, it is not enough to say that an UTF-8 encoded character “é” is going to be read from a text file, because eventhough we might know that two bytes of hex value “C3 A9” will be read, if they are decoded (interpreted) as Latin-1, we will get characters é, but if they are decoded as UTF-8, we would then get character “é”:

Hex value Latin-1 characters UTF-8 character
C3 A9 é é

Being aware of these two concepts will become relevant when trying to understand how data is encoded and decoded through the many database stages (source, client, server) in order to be saved or restored.

MySQL

Character sets and collations

Until version 4.1, MySQL tables used the latin1 character set by default. From version 4.1, not only the concepts of "character set" and "collation" were introduced, but the character set by default (at the configuration file) was changed to utf8.

In MySQL, a character set is a set of symbols and encodings, and a collation is a set of rules for comparing characters in a character set.

Note. Eventhough in MySQL the term character set is used, it should actually be character encoding (charset).

The MySQL server supports multiple character sets, like ascii, koi8r, latin1, sjis and utf8. Each character set has at least one collation, however, as it may have many more, a default one is always defined for each character set.

Character set Default collation Other collations
ascii ascii_general_ci ascii_bin
koi8r koi8r_general_ci koi8r_bin
latin1 latin1_swedish_ci latin1_bin, latin1_general_ci, latin1_spanish_ci
sjis sjis_japanese_ci sjis_bin
utf8 utf8_general_ci utf8_bin, utf8_unicode_ci, utf8_turkish_ci

Collation names follow a convention: they start with the name of the character set with which they are associated, they usually include a language name, and they end with _ci (case insensitive), _cs (case sensitive), or _bin (binary). From this, is should be evident that two different character sets cannot have the same collation.

Character set variables

MySQL uses several “character set” type system variables that could be classified in three groups according to their use.

For data encoding

The following three determine the character encoding used to transform data as it flows from the client to the server and viceversa.

  • character_set_client. This variable defines the character set used by the client and in which it sends statements to the server.
  • character_set_connection. This variable defines the character set in which the server translates statements received from the client.
  • character_set_results. This variable defines the character set in which the server returns query results to the client.

Note. For the character_set_connection variable there is also a related collation type variable: collation_connection.

As default values

The following two are used to determine the character encoding to be used when a database or a table is created, respectively.

  • character_set_server. This variable defines the character set of a new schema when the database charset is not specified in the CREATE DATABASE statement.
  • character_set_database. This variable defines the character set of a new table when the table charset is not specified in the CREATE TABLE statement.

Note. For these two variables there are also two related collation type variables: collation_server and collation_database, respectively.

Other uses

The following two have other uses.

  • character_set_filesystem. This variable defines the character set used by the file system (names of directories or names of files).
  • character_set_system. This variable defines the character set used to save metadata (data about data, for example, names of tables).

Default settings

The MySQL database server uses a very flexible scheme where default settings for character sets and collations can be set at four levels: Server, Database, Table and Column.

Server
The server charset and collation are used as default values for schema definitions when the database charset and collation are not specified in CREATE DATABASE statements.
The server character set and collation can be determined from the values of the character_set_server and collation_server system variables.
Database
The database charset and collation are used as default values for table definitions when the table charset and collation are not specified in CREATE TABLE statements.
The character set and collation for the default database can be determined from the values of the character_set_database and collation_database system variables.
Table
The table charset and collation are used as default values for column definitions when the column charset and collation are not specified in individual column definitions.
The table character set and collation are MySQL extensions; therefore they do not exist in standard SQL.
Column
A character set and a collation may be specified for every character type column (that is, columns of type CHAR, VARCHAR, or TEXT).
The column character set and collate clauses are standard SQL.

Configuration files

MySQL settings

All MySQL settings, both for the server and the client components, are defined under one configuration or option file: my.ini (Windows) or my.cnf (Linux), although many copies of this file might be read at startup.

Settings in the configuration file are divided in option groups, where each one contains the settings for: the server, all clients and each particular client:

Settings for Option group
The server [mysqld], [server], etc.
All clients [client]
Each particular client [mysql], [mysqldump], [myisamchk], [mysqlhotcopy], etc.

Database

For each database created, its character set and collation values are saved in a text file named db.opt, located in the corresponding database folder (db_name) under the datadir directory.

more C:\MySqlServer5.5\data\db_name\db.opt

Outputs something like:

default-character-set=latin1
default-collation=latin1_swedish_ci

When a database is first created, if the character set and collation options are not defined in the CREATE DATABASE statement, then they are taken from the server charset and collation values. These two values can be modified via the ALTER DATABASE statement.

Gathering information (1)

To run the commands given in this section, one has to have direct access to the MySQL server.

From here on it will be assumed that the server is already running. Also, for Windows users, commands must be entered from a Windows command (system) window.

Server and client

Running a particular program with the parameters --verbose --help:

mysqld --verbose --help
mysql --verbose --help
mysqldump --verbose --help

will list four pieces of information:

  1. The location of the configuration files.
  2. The option groups read by the program.
  3. The allowed options or variables (parameters) with their abreviations and descriptions.
  4. The default values of variables that can be set from the command line.
Location of the configuration files

Default options are read from the following files in the given order:

OS The following groups are read
Windows WINDIR\my.ini WINDIR\my.cnf C:\my.ini C:\my.cnf INSTALLDIR\my.ini INSTALLDIR\my.cnf
Linux /etc/my.cnf /etc/mysql/my.cnf SYSCONFDIR/my.cnf $MYSQL_HOME/my.cnf

Note. WINDIR is the Windows directory (e.g. C:\Windows) and INSTALLDIR is the MySQL installation directory (e.g. C:\MySqlServer5.5).

The option groups read

The option groups (in the configuration file) read by each program; for example:

Program The following groups are read
mysqld mysqld server mysqld-5.5
mysql mysql client
mysqldump mysqldump client
Allowed options (variables)

The variables accepted (in the command line) by the program, with their abreviations and descriptions; for example, the following are some of those listed for the mysql client program:

Variable (option) Description
-D, --database=name Database to use.
--default-character-set=name Set the default character set.
Variables and their values

The default values of variables that can be set from the command line for each program; for example, the following are some of those listed by the mysqld server program:

Variable Default value
basedir C:/MySqlServer5.5/
character-set-server latin1
datadir C:\ MySqlServer5.5\Data\
date-format %Y-%m-%d
innodb ON
tmpdir C:\Tmp

* Important note. The mysqld program lists the values of the variables that the server (based on its compiled-in defaults) will use; however, some of those values might change depending on the command line parameters given. For example, if the my.ini (or my.cnf) configuration file has the following lines:

[mysqld]
character-set-server=utf8

Then we will get a different value for variable character-set-server if we use the no-defaults parameter:

Command Action Value of character-set-server
mysqld --no-defaults --verbose --help Ignore the settings in any option files. latin1
mysqld --verbose --help Include the settings of any option files that it reads. utf8

System values used by a running server

While the --help --verbose options of the MySQL server program (mysqld) show command line options and their default values, there are also other variables that can only be seen after the server has been started and that provide information about its operation.

To see what system values is using a running server, the mysqladmin program can be used with the variables option:

mysqladmin [-u root -pyour_password] variables

The following are some of the variables and values listed:

Variable_name Value
character_set_client latin1
character_set_connection latin1
collation_connection latin1_swedish_ci
connect_timeout 10

Server status information

The server status information provided by the mysqladmin extended-status option is rather technical and so it will only be useful for DBAs (database administrators), however it is worth knowing how to display it:

mysqladmin [-u root -pyour_password] extended-status

Gathering information (2)

The first step before trying to fix a character set or a collation problem in a database, table or column is to determine how are they actually defined. The aim of this section then, is to show how to get specific pieces of information about these components.

To run all the SQL queries shown in this section, one can work from:

  • A mysql command window, in which case the mysql client program must be running:
mysql -u root -pyour_password

Starts a mysql session:

mysql>
  • The phpMyAdmin tool, through the “Run SQL query/queries on database db_name” window.

Saving the output of a query into a file

As results from many commands might be a bit long to be analyzed from a window, it is useful to know how to send them into a file.

Windows

Use command tee to save the output of a query into a file, and command notee to stop writing into a file. These commands can only be run from a mysql command window.

For example:

mysql> tee e:/tmp/vars.out;
mysql> show variables;
mysql> notee;
Outfile disabled.

To determine into which file output is being written, just type tee by itself:

mysql> tee;
Currently logging to file 'e:/tmp/vars.out'
Unix

Use command pager to save the output of a query into a file, and command nopager to stop writing into a file. These commands can only be run from a mysql command window.

For example:

mysql> pager cat > /tmp/vars.out
mysql> show variables;
mysql> nopager;
phpMyAdmin

Once a query is run and the results are displayed, the Print view or the Print view (with full texts) links from the Query results operations section can be used to save them.

Basic server status information

The STATUS command will get basic status information from the server, like the current database in use, the server version, the TCP port or the characterset defined for the server, the database, the client and the connection. This command can only be run from a mysql command window.

mysql> STATUS;

Outputs something like:

Current database:
mysql  Ver 14.14 Distrib 5.5.8, for Win32 (x86)
Connection id:          232
Current database:
Current user:           root@localhost
SSL:                    Not in use
Using delimiter:        ;
Server version:         5.5.8 MySQL Community Server (GPL)
Protocol version:       10
Connection:             localhost via TCP/IP
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    latin1
Conn.  characterset:    latin1
TCP port:               3306
Uptime:                 18 hours 25 min 38 sec

Note. If a database hasn’t been set active, the Db characterset variable will contain the global default, otherwise its value will reflect that of the selected database.

Server status information

The following will list the same variables as the mysqladmin extended-status command:

SHOW STATUS;

As the list of variables is quite long, the LIKE clause can be used to list only those that match.

System variables with their values

The following will list the same system variables as the mysqladmin variables command, plus a few more:

SHOW VARIABLES;

As the list of variables is quite long, the LIKE clause can be used to list only those that match.

To list all the character_set variables:

SHOW VARIABLES LIKE 'character_set\_%';

Outputs something like:

+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | latin1                            |
| character_set_connection | latin1                            |
| character_set_database   | latin1                            |
| character_set_filesystem | binary                            |
| character_set_results    | latin1                            |
| character_set_server     | latin1                            |
| character_set_system     | utf8                              |
+--------------------------+-----------------------------------+

Note. If a database hasn’t been set active, the character_set_database variable will contain the global default, otherwise its value will reflect that of the selected database.

To list all the collations defined:

SHOW VARIABLES LIKE 'coll%';

Outputs something like:

+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+

Database

How was it created

The following command shows how a database (schema) was created (i.e., the complete CREATE DATABASE statement used to create it).

show create database db_name;

Outputs something like:

CREATE DATABASE `test02` /*!40100 DEFAULT CHARACTER SET latin1 */

If a database was created with a specific character set and/or collation (i.e. it isn’t using the server’s default), the result will include the defined collation, for example:

CREATE DATABASE `temp01` /*!40100 DEFAULT CHARACTER SET latin1 COLLATE latin1_spanish_ci */

As mentioned before, when a database is created or when its character set and/or collation are modified, these two values are saved in the corresponding db.opt file.

Character set and collation

The following SQL statement gives both, the character set and collation of a schema (database).

SELECT schema_name, default_character_set_name "character_set", default_collation_name "collation"
FROM information_schema.schemata
WHERE schema_name = 'db_name';

Outputs something like:

+-------------+---------------+-------------------+
| schema_name | character_set | collation         |
+-------------+---------------+-------------------+
| test02      | latin1        | latin1_swedish_ci |
+-------------+---------------+-------------------+

It is important to remember that these values only act as defaults to define the character set and the collation of newly created tables.

Tables and columns

A database must first be selected before trying to use the SHOW and DESCRIBE commands.

  • From a mysql command window:
use db_name;
  • From phpMyAdmin, just select the database to be used.
Table, how was it created

The following command shows how a table was created.

SHOW CREATE TABLE tbl_name;

Outputs something like:

CREATE TABLE `t1_latin` (
  `userid` smallint(5) unsigned DEFAULT NULL,
  `coll_swd` char(16) NOT NULL,
  PRIMARY KEY (`coll_swd`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

If a particular table was created with a specific character set and/or collation (i.e. it isn’t using the database’s default), the result will include the defined collation both in the table definition and in each of the text type column definitions, for example:

CREATE TABLE `t3_latin` (
  `userid` smallint(5) unsigned DEFAULT NULL,
  `coll_swd` char(16) COLLATE latin1_spanish_ci NOT NULL,
  `coll_spa` text COLLATE latin1_spanish_ci,
  PRIMARY KEY (`coll_swd`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_spanish_ci

Furthermore, if a particular column was created with a specific character set and/or collation (i.e. it isn’t using the table’s default), the result will include the character set and collation definition in the respective column, for example:

CREATE TABLE `t2_latin` (
  `userid` smallint(5) unsigned DEFAULT NULL,
  `coll_gr1` char(16) CHARACTER SET latin1 COLLATE latin1_german1_ci NOT NULL,
  PRIMARY KEY (`coll_gr1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Table, describe its structure

Either of the two following commands display a simple view of the structure of table.

DESCRIBE tbl_name;
SHOW COLUMNS FROM tbl_name;

Outputs something like:

+----------+----------------------+------+-----+---------+-------+
| Field    | Type                 | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+-------+
| userid   | smallint(5) unsigned | YES  |     | NULL    |       |
| coll_swd | char(16)             | NO   | PRI | NULL    |       |
+----------+----------------------+------+-----+---------+-------+

The following command displays a more complete view of the structure of table.

SHOW FULL COLUMNS FROM tbl_name;

Outputs something like:

+----------+----------------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| Field    | Type                 | Collation         | Null | Key | Default | Extra | Privileges                      | Comment |
+----------+----------------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| userid   | smallint(5) unsigned | NULL              | YES  |     | NULL    |       | select,insert,update,references |         |
| coll_swd | char(16)             | latin1_swedish_ci | NO   | PRI | NULL    |       | select,insert,update,references |         |
+----------+----------------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
Table, index information

The following command lists the indexes (if there are any) on a table.

SHOW INDEX FROM tbl_name;

Outputs something like:

+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t1_latin |          0 | PRIMARY  |            1 | coll_swd    | A         |           4 |     NULL | NULL   |      | BTREE      |         |               |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Table, character set and collation

The following SQL query displays the character set and collation of a table in a particular schema.

SELECT t.table_schema, t.table_name, ccsa.character_set_name "table_character_set", t.table_collation
FROM information_schema.`tables` t,
     information_schema.`collation_character_set_applicability` ccsa
WHERE ccsa.collation_name = t.table_collation
AND t.table_schema = 'db_name'
AND t.table_name = 'tbl_name';

Outputs something like:

+--------------+------------+---------------------+-------------------+
| table_schema | table_name | table_character_set | table_collation   |
+--------------+------------+---------------------+-------------------+
| test02       | t1_latin   | latin1              | latin1_swedish_ci |
+--------------+------------+---------------------+-------------------+

To list the same information for all the tables in a particular schema, one only needs to remove the last AND clause from the previous query.

SELECT t.table_schema, t.table_name, ccsa.character_set_name "table_character_set", t.table_collation
FROM information_schema.`tables` t,
     information_schema.`collation_character_set_applicability` ccsa
WHERE ccsa.collation_name = t.table_collation
AND t.table_schema = 'db_name';

Outputs something like:

+--------------+------------+---------------------+-------------------+
| table_schema | table_name | table_character_set | table_collation   |
+--------------+------------+---------------------+-------------------+
| test02       | t1_latin   | latin1              | latin1_swedish_ci |
| test02       | t2_latin   | latin1              | latin1_swedish_ci |
+--------------+------------+---------------------+-------------------+
Tables, with different character set and collation

The following SQL query lists all tables in a particular schema that differ from a specific collation (e.g. latin1_swedish_ci).

SELECT t.table_schema, t.table_name, ccsa.character_set_name "table_character_set", t.table_collation
FROM information_schema.`tables` t,
information_schema.`collation_character_set_applicability` ccsa
WHERE ccsa.collation_name = t.table_collation
AND t.table_schema = 'db_name'
AND table_collation NOT LIKE 'latin1_swedish_ci';

Outputs something like:

+--------------+------------+---------------------+-----------------+
| table_schema | table_name | table_character_set | table_collation |
+--------------+------------+---------------------+-----------------+
| test03       | t1_default | utf8                | utf8_unicode_ci |
+--------------+------------+---------------------+-----------------+
Columns, displaying character data as hexadecimal values

Sometimes it is necessary to confirm how text type data (e.g. CHAR, VARCHAR, TEXT) is actually saved (encoded).

To do this, a SELECT query can be created with an HEX() function around the column to be checked.

For example, if we have the string “áéíóú” stored in column col_name, the following query:

SELECT col_name, HEX(col_name) FROM tbl_name;

Would give the following output if the string was encoded in latin1:

+----------+---------------+
| col_name | HEX(col_name) |
+----------+---------------+
| áéíóú    | E1E9EDF3FA    |
+----------+---------------+

And it would give the following output if the string was encoded in utf8:

+----------+----------------------+
| col_name | HEX(col_name)        |
+----------+----------------------+
| áéíóú    | C3A1C3A9C3ADC3B3C3BA |
+----------+----------------------+

Note. In this example it is assumed that in each case the column is defined with the correct character set (latin1 and utf8, respectively).

Columns, character set and collation

The following SQL query displays the character set and collation of all text type (e.g. CHAR, VARCHAR, TEXT) columns in a particular table.

SELECT table_schema, table_name,
  column_name, character_set_name "column_character_set", collation_name "column_collation",
  data_type, column_type
FROM information_schema.columns
WHERE table_schema = 'db_name'
AND table_name = 'tbl_name'
AND data_type IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext', 'enum', 'set');

Outputs something like:

+--------------+------------+-------------+----------------------+-------------------+-----------+-------------+
| table_schema | table_name | column_name | column_character_set | column_collation  | data_type | column_type |
+--------------+------------+-------------+----------------------+-------------------+-----------+-------------+
| test02       | t1_latin   | coll_swd    | latin1               | latin1_swedish_ci | char      | char(16)    |
+--------------+------------+-------------+----------------------+-------------------+-----------+-------------+

To list the same information for all the tables in a particular schema, one only needs to remove the first AND clause from the previous query.

SELECT table_schema, table_name,
  column_name, character_set_name "column_character_set", collation_name "column_collation",
  data_type, column_type
FROM information_schema.columns
WHERE table_schema = 'db_name'
AND data_type IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext', 'enum', 'set');

Outputs something like:

+--------------+------------+-------------+----------------------+-------------------+-----------+-------------+
| table_schema | table_name | column_name | column_character_set | column_collation  | data_type | column_type |
+--------------+------------+-------------+----------------------+-------------------+-----------+-------------+
| test02       | t1_latin   | coll_swd    | latin1               | latin1_swedish_ci | char      | char(16)    |
| test02       | t2_latin   | coll_gr1    | latin1               | latin1_german_ci  | char      | char(16)    |
+--------------+------------+-------------+----------------------+-------------------+-----------+-------------+
Columns, with different character set and collation

The following SQL query displays all text type (e.g. CHAR, VARCHAR, TEXT) columns in all tables of a particular database, that differ from a specific collation (e.g. latin1_swedish_ci).

SELECT table_schema, table_name,
  column_name, character_set_name "column_character_set", collation_name "column_collation",
  data_type, column_type
FROM information_schema.columns
WHERE table_schema = 'db_name'
AND data_type IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext', 'enum', 'set')
AND collation_name NOT LIKE 'latin1_swedish_ci';

Outputs something like:

+--------------+------------+-------------+----------------------+-------------------+-----------+-------------+
| table_schema | table_name | column_name | column_character_set | column_collation  | data_type | column_type |
+--------------+------------+-------------+----------------------+-------------------+-----------+-------------+
| test02       | t2_latin   | coll_gr1    | latin1               | latin1_german1_ci | char      | char(16)    |
+--------------+------------+-------------+----------------------+-------------------+-----------+-------------+

To display the same information, but only for a specific table, an extra AND clause can be added to the SQL query:

SELECT table_schema, table_name,
  column_name, character_set_name "column_character_set", collation_name "column_collation",
  data_type, column_type
FROM information_schema.columns
WHERE table_schema = 'db_name'
AND table_name = 'tbl_name'
AND data_type IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext', 'enum', 'set')
AND collation_name NOT LIKE 'latin1_swedish_ci';

Dumping a database

To backup or transfer databases to another SQL server, MySQL includes a client program called mysqldump. This program generates a text file with a series of SQL statements to recreate the original database, its tables and the corresponding data.

Basically, the mysqldump program can be run in one of the following ways:

Command Action
mysqldump [options] db_name [tbl_name ...] dumps database, a table or a series of tables
mysqldump [options] --databases db_name ... dumps a series of databases
mysqldump [options] --all-databases dumps all databases

This program is simple to use, but it is important to be familiar with a few of its options, as using a wrong setting could lead to corrupted data.

Options

--default-character-set

The --default-character-set option defines charset_name as the default character set.

This is one of the most critical options, as it specifies:

  1. The file character encoding to be used or how the file is going to be encoded: as column values from the database are read, they are converted to the character set specified by this option before being saved in the dump file (see, Encoding and decoding).
  2. The default_character_set for the SET NAMES statement when the --skip-set-charset or the --no-set-names options are not used.

For example, the following command:

mysqldump --default-character-set=latin1 db_name -r backup.sql

would generate a latin1 encoded dump file; whereas either of the following:

mysqldump --default-character-set=utf8 db_name -r backup.sql
mysqldump db_name -r backup.sql

would generate a utf8 encoded dump file.

Note. If a character set is not specified (as shown in the last example), mysqldump will use utf8 by default (versions lower than 4.1.2 use latin1 by default).

--set-charset

The --set-charset option adds a SET NAMES default_character_set statement to the output. This option is enabled by default.

As the SET NAMES character_set statement is equivalent to the following three statements:

SET character_set_client = character_set;
SET character_set_results = character_set;
SET character_set_connection = character_set;

the SET NAMES statement specifies the character set to be used (during the restore of the dump file) by the client to send SQL statements to the server, and the character set that the server should use for sending results back to the client (see, For data encoding).

Since this option is enabled by default, the following command:

mysqldump --default-character-set=latin1 db_name -r backup.sql

would add the following conditional comments at the beginning of the dump file:

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES latin1 */;

and the following at the end:

/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;

where the "SET NAMES latin1" command will instruct the mysql program to use, send and receive data encoded in latin1.

--skip-set-charset

The --skip-set-charset option is used to disable the --set-charset option, which is enabled by default.

--no-set-names (-N)

The --no-set-names is equivalent to --skip-set-charset, and so this option disables the --set-charset option: it indicates that a SET NAMES default_character_set should not be added to the output.

--complete-insert (-c)

The --complete-insert option indicates to use complete INSERT statements that include column names.

For example, the following command:

mysqldump -c db_name -r backup.sql

would create an INSERT statement that includes the names of the columns:

INSERT INTO `t1_latin` (`id`, `string`) VALUES (1,'text1'),(2,' text2');

But the following command:

mysqldump db_name -r backup.sql

would create an INSERT statement without the names of the columns:

INSERT INTO `t1_latin` VALUES (1,'text1'),(2,'text2');
--extended-insert (-e)

The --extended-insert indicates to use a multiple-row INSERT syntax that include several VALUES lists. This results in smaller dump files and faster inserts when the file is reloaded. This option is enabled by default.

Since this option is enabled by default, the following command:

mysqldump db_name -r backup.sql

would create an INSERT statement that includes all the records to be added:

INSERT INTO `t1_latin` VALUES (1,'text1'),(2,'text2');
--skip-extended-insert

The --skip-extended-insert disables the --extended-insert option, which is enabled by default.

For example, the following command:

mysqldump --skip-extended-insert db_name -r backup.sql

would create multiple INSERT statements, one for each record added:

INSERT INTO `t1_latin` (`id`, `string`) VALUES (1,'text1');
INSERT INTO `t1_latin` (`id`, `string`) VALUES (2,'text2');
--result-file (-r)

The --result-file option directs output to a given file.

Since this option prevents newline characters from being converted to carriage return/newline sequences, it should be used when working in a Windows environment:

mysqldump db_name -r backup.sql

instead of using ">" to redirect output to a file:

mysqldump db_name > backup.sql
--single-transaction

The --single-transaction option issues a BEGIN SQL statement before dumping data from the server, and it should be used with transactional tables (like InnoDB).

Since this option and the --lock-tables option are mutually exclusive, the --skip-lock-tables option should also be used prior to using the --single-transaction option:

mysqldump --skip-lock-tables --single-transaction db_name -r backup.sql
--opt

The --opt option is shorthand for the following:

Command Action
--add-drop-table Add a DROP TABLE statement before each CREATE TABLE statement.
--add-locks Surround each table dump with LOCK TABLES and UNLOCK TABLES statements.
--create-options Include all MySQL-specific table options in CREATE TABLE statements.
--disable-keys For each table, surround the INSERT statements with statements to disable and enable keys.
--extended-insert Use multiple-row INSERT syntax that include several VALUES lists.
--lock-tables Lock all tables before dumping them.
--quick Retrieve rows for a table from the server a row at a time.
--set-charset Add SET NAMES default_character_set to the output.

Since all of these options are enabled by default, the --opt option is also enabled by default.

--skip-opt

The --skip-opt disables the --opt option, which is enabled by default.

--add-drop-database

The --add-drop-database option adds a DROP DATABASE statement before each CREATE DATABASE statement.

--add-drop-table

The --add-drop-table option adds a DROP TABLE statement before each CREATE TABLE statement.

Internal structure of a dump file

A dump file usually consists of three different elements: comments, conditional comments and SQL statements.

Comments

Comments are completely ignored by the mysql client program and only serve to describe what follows. Comments have the following structure:

-- comment

For example:

-- MySQL dump 10.13  Distrib 5.5.8, for Win32 (x86)
Conditional comments

Conditional comments are used to hide SET commands or SQL statements from older versions of mysql (used to restore a dump file). Conditional comments have the following structure:

/*!five digit version number SET|SQL statement */

For example, the following line, which contains two conditional comments:

CREATE DATABASE /*!32312 IF NOT EXISTS*/ `test04` /*!40100 DEFAULT CHARACTER SET latin1 */;

would be interpreted differently by each mysql version:

mysql versions would read it as
< 3.23.12 CREATE DATABASE `test04`;
>= 3.23.12 and < 4.1.0 CREATE DATABASE IF NOT EXISTS `test04`;
>=4.1.0 (4.01.00) CREATE DATABASE IF NOT EXISTS `test04` DEFAULT CHARACTER SET latin1;

Conditional comments usually set some variables at the beginning of a block and reset them at the end.

About the variable syntax of the SET command:

  • The string @@variable_name can refer to a session value or a global value:
    • When being refered, MySQL returns the session value if it exists and the global value otherwise.
    • When being set (e.g. SET @@variable_name=value), it always refers to the session value.
  • The string @variable_name refers to a user variable.
  • When a variable doesn't have a modifier, it is a session variable.
SQL statements

The actual SQL statements, for example:

USE `db_name`;
DROP TABLE IF EXISTS `tbl_name`;

Useful tools

The tools listed here are useful when working with database dumps and character encodings.

Windows

SuperEdi

SuperEdi is a text editor that supports Unicode UTF-8, UTF-16 and many locale-specific encodings.

Download: SuperEdi.

HxD hexeditor

HxD hexeditor is a fast free hex editor that can open files of any size.

Download: HxD hexeditor.

Super Sed

Super Sed is a heavily enhanced version of sed (a stream editor used to perform basic text transformations on an input stream).

Download: Super Sed.

LibIconv

GNU LibIconv is an encoding conversion library.

Download: LibIconv complete package.

Download zip files (only the bin folder has to be extracted from both files):

Charco

Charco is a character set conversion tool used to recode character sets, much like iconv on Linux does, but in a more portable way and with a GUI for easy-of-use. Charco supports the most common character set encodings found on Windows, OS X and Linux, including a full complement of UTF-encodings. Charco also has a batch processing mode.

Download: Charco.