UTF-8 and BOM
Latest revision as of 18:20, 18 November 2013
Database Activity
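With the Database Activity there still seems to be a problem importing UTF-8 files with a BOM (http://en.wikipedia.org/wiki/Byte-order_mark).
- See this forum discussion for an example in Hebrew: http://moodle.org/mod/forum/discuss.php?d=62251#p559428
- See also these tracker issues: http://tracker.moodle.org/secure/IssueNavigator.jspa?reset=true&&query=bom&summary=true&description=true&body=true&sorter/field=updated&sorter/order=DESC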
--Frank Ralf 10:36, 13 July 2009 (UTC)
What does BOM mean?
- BOM stands for "Byte Order Mark".
- Byte Order Mark (BOM) FAQ by the Unicode Consortium.
- "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" from Joel on Software (2003)
What is it good for?
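The BOM is the character U+FEFF placed at the very start of a text. In UTF-16 and UTF-32, where each character takes several bytes, it marks the order in which those bytes appear: for UTF-16, FE FF means big-endian and FF FE means little-endian. UTF-8 has only one possible byte order, so there the BOM (the bytes EF BB BF) only serves as a signature saying "this file is UTF-8".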
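Since the BOM's bytes differ between UTF-8, UTF-16 and UTF-32, the first few bytes of a file reveal which encoding it uses and, for UTF-16/UTF-32, which byte order. The following is a minimal sketch for a POSIX shell, assuming `od` and `tr` are available; `bom_kind` is a hypothetical helper name, not part of Moodle or any standard tool.

```shell
# bom_kind FILE: print which Unicode BOM (if any) FILE starts with.
# Hypothetical helper; it only inspects the first 4 bytes of the file.
bom_kind() {
  # First 4 bytes as lowercase hex without separators, e.g. "efbbbf68".
  prefix=$(od -An -tx1 -N4 "$1" | tr -d ' \n' | tr 'ABCDEF' 'abcdef')
  case $prefix in
    0000feff*) echo "UTF-32BE" ;;
    # Checked before UTF-16LE, which shares the prefix fffe. A UTF-16LE file
    # whose first character after the BOM is U+0000 is indistinguishable.
    fffe0000*) echo "UTF-32LE" ;;
    efbbbf*)   echo "UTF-8" ;;
    feff*)     echo "UTF-16BE" ;;
    fffe*)     echo "UTF-16LE" ;;
    *)         echo "none" ;;
  esac
}
```

For a UTF-8 file saved by an editor that adds a BOM, `bom_kind file.txt` would print `UTF-8`; for a file without any BOM it prints `none`.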
What's the problem with the BOM?
- See Display problems caused by the UTF-8 BOM
- Some text editors add a BOM by default, for example Windows' Notepad.
- When exporting from OpenOffice Calc the BOM sneaks in even after the first delimiter!
How can I detect a BOM?
You will need a text editor that can display special (invisible) Unicode characters. A good Unicode text editor for Windows is SC UniPad.
The picture shows an exported CSV file from OpenOffice Calc where the BOM (#FEFF) sneaks in even after the first delimiter!
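The same check can also be done from the command line. The sketch below, assuming a POSIX shell with `od`, `tr` and `awk`, prints the 0-based byte offset of every UTF-8 BOM in a file, so a BOM that slipped in after the first delimiter (as in the OpenOffice Calc export in the picture) shows up as a non-zero offset. `bom_offsets` is a hypothetical helper name.

```shell
# bom_offsets FILE: print the 0-based byte offset of every UTF-8 BOM (EF BB BF) in FILE.
# Hypothetical helper; it keeps all bytes in memory, so it is meant for CSV-sized files.
bom_offsets() {
  # -v stops od from collapsing repeated lines into "*".
  od -An -v -tx1 "$1" |
    tr -s ' ' '\n' |
    awk '
      NF { b[n++] = tolower($1) }   # one hex byte per non-empty line
      END {
        for (i = 0; i + 2 < n; i++)
          if (b[i] == "ef" && b[i+1] == "bb" && b[i+2] == "bf")
            print i
      }'
}
```

An offset of 0 is an ordinary leading BOM; any other offset points to a BOM inside the data.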
Scanning Moodle folder for BOM files
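If Moodle is installed on a Linux server, you can run one of these command lines from the Moodle folder:
# List files that start with a UTF-8 BOM
find . -type f -print0 | xargs -0r awk '/^\xEF\xBB\xBF/ {print FILENAME}{nextfile}'
# List files that contain the BOM bytes anywhere (including after the first delimiter)
grep -rlF "$(printf '\357\273\277')" .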
find & remove!
# Strip a BOM from the start of each file (GNU sed; only line 1 is edited)
find . -type f -exec sed -i '1s/^\xEF\xBB\xBF//' {} +
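If GNU sed is not available, or you only want to rewrite files that actually begin with a BOM, the three leading bytes can be dropped with `tail` instead. A minimal sketch, assuming a POSIX shell plus `mktemp` (not in POSIX, but shipped with common Linux distributions); `strip_bom` is a hypothetical helper name. It only looks at the first three bytes, so a BOM further inside a file, such as after the first CSV delimiter, is left alone.

```shell
# strip_bom FILE: remove a leading UTF-8 BOM (EF BB BF) from FILE in place.
# Hypothetical helper; files that do not start with a BOM are not rewritten at all.
strip_bom() {
  first=$(od -An -tx1 -N3 "$1" | tr -d ' \n' | tr 'ABCDEF' 'abcdef')
  [ "$first" = "efbbbf" ] || return 0
  tmp=$(mktemp) || return 1
  # tail -c +4 copies everything from the 4th byte on, i.e. drops the 3 BOM bytes.
  if tail -c +4 "$1" > "$tmp"; then
    cat "$tmp" > "$1"   # writing through cat keeps the original file's permissions
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

To apply it to a whole tree you could feed it names from find, e.g. `find . -type f | while IFS= read -r f; do strip_bom "$f"; done` (this loop does not handle file names containing newlines).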
How can I get rid of the BOM?
Any of the Unicode-capable text editors mentioned above can remove a BOM; some even do so automatically when opening or saving a file.
Other text editors, e.g. Notepad++, can save files without a BOM.