UTF-8 scripts

This page is under construction!!

Only some preliminary ideas have been defined.

Recoding PHP scripts

Build a "check for 1.6 upgrade" utility that runs under 1.5! It should check which software is present and which lang packs are in use.

datalib.php to support collations under MySQL

textlib.class to handle all the UTF-8-compliant string functions

XML import/export (scorm, ims, backup/restore, glossary, quizzes...)

Excel export

htmlentities() ---> s() migration in 1.5 and 1.6

Review uses of substr, strlen, strpos, regexp (both POSIX and Perl), htmlspecialchars and htmlentities.

potentially filters...

Modify documentation to let users know how they MUST create their DB before installing Moodle.

Modify creation scripts to use the UTF-8 encoding? Perhaps not necessary if DB has it defined as default? Test it.
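As an illustration of why the uses of substr, strlen and htmlentities listed above need review, here is a minimal sketch in plain PHP. It assumes the mbstring extension is loaded and that the file itself is saved as UTF-8. It does not call Moodle's s(); plain htmlspecialchars() with an explicit character set stands in for it, and all variable names are made up for the example.

```php
<?php
// Sketch only: byte-oriented functions versus their mbstring equivalents.

$text = 'こんにちは'; // five characters, fifteen bytes in UTF-8

$byteLength = strlen($text);             // counts bytes
$charLength = mb_strlen($text, 'UTF-8'); // counts characters

// substr() cuts on byte offsets and can split a character in half;
// mb_substr() cuts on character offsets.
$byteCut = substr($text, 0, 4);
$charCut = mb_substr($text, 0, 2, 'UTF-8');

// mb_check_encoding() reports whether each result is still valid UTF-8.
$byteCutValid = mb_check_encoding($byteCut, 'UTF-8');
$charCutValid = mb_check_encoding($charCut, 'UTF-8');

// Escape for HTML output with the character set stated explicitly,
// instead of relying on the function's default.
$escaped = htmlspecialchars('<b>' . $text . '</b>', ENT_QUOTES, 'UTF-8');
```

The byte-based cut leaves a broken sequence that browsers show as garbage, which is the kind of garbling this page is trying to track down.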

1) The wiki module: the problem here is due to the use of htmlentities() without specifying the character set. DFWiki apparently does not have any known problems. Solved!

1.1) Pressing the edit button leads to a blank page, perhaps because double-byte characters do not work in the URL (see the source of the page below): http://www.contiento.com/moodle_utf8/mod/wiki/view.php?id=12&page=edit/このWikiは文字化けしますね。

2) Japanese can't be used in paths/folder names. This is not a Moodle problem but simply a limitation of the server, I presume.

3) I have not tested GD with Japanese fonts, but I guess that this may have problems as well.

4) I have found what I think is a small bug. When I backed up and restored the course (choosing to add data to this course, and restoring only the quiz), I found that the remains of some HTML tags had been added at the beginning and end of the quiz description.

5) Also, the news feed of the Asahi Newspaper (asahi.com, perhaps Japan's most famous newspaper) is garbled. The reason is the use of break_up_long_words(). The current fix http://moodle.org/bugs/bug.php?op=show&bugid=2105&pos=1 will not work, since all languages are now UTF-8 languages. Over at the Japanese forums there are two suggestions: http://moodle.org/mod/forum/discuss.php?d=6831&parent=32225 One is to carefully calculate the position of the inserted space so that it falls between multibyte characters rather than inside one. The final suggestion from Prof Nakayama is at http://moodle.org/file.php/14/moddata/forum/164/32822/break_up_long_words.txt However, this relies on the mbstring PHP extension. Multibyte languages (those using Kanji at least, but perhaps not Korean) can be wrapped onto the next line at any point, so if the mbstring extension is loaded then, as Prof Kariya suggests, it is usually safe simply NOT to add any spaces to the string at all. He therefore recommends if(extension_loaded('mbstring')) return $string; SOLVED!

6) The language list drop-down menu does not display properly. I am not sure if it should, and I don't really care (since I normally limit my site to a few languages), but it looks like this: http://ds21.cc.yamaguchi-u.ac.jp/~econo/temp/languagemenu.jpg

7) The Assignment module says that 0 words have been submitted after a Japanese-language submission, because there are no spaces in Japanese.
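For item 7, one possible approach is to report a character count alongside the word count, since a count based on spaces says little about Japanese text. The sketch below is only an illustration under that assumption; count_text_units() is a hypothetical helper, not the Assignment module's code, and it needs the mbstring extension.

```php
<?php
// Hypothetical helper: returns both a whitespace-based word count and a
// character count, so the caller can pick whichever suits the language.
function count_text_units($text) {
    // Separators: ordinary whitespace plus the ideographic space U+3000.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $separator = '/[\s\x{3000}]+/u';

    $words = preg_split($separator, $text, -1, PREG_SPLIT_NO_EMPTY);

    // Characters are counted after removing the separators.
    $chars = mb_strlen(preg_replace($separator, '', $text), 'UTF-8');

    return array('words' => count($words), 'chars' => $chars);
}

$english  = count_text_units('Hello world');
$japanese = count_text_units('これは日本語の文章です');
```

For the Japanese sentence the whole text counts as a single "word", so the character count is the more useful figure to show to a teacher.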

De-UTF8-ing for client side applications

One of the biggest problems that remains after the move to UTF-8 is that, while UTF-8 is great on the web, a lot of client-side software in Japan and China is *NOT* UTF-8 compatible.

The Japanese community's solution to this (provided originally by Mr. Kashiwagi, below) is:

1) At the interface between Moodle and client-side software (particularly email clients and spreadsheet programs), Moodle checks whether there is a /lib folder in the current language.

2) If a lang/xyz/lib folder is present, then the routines in that folder are used to convert the encoding to formats compatible with client-side software. (E.g. even Outlook Express is not compatible with UTF-8 in the subject line, and Excel cannot deal with UTF-8 either, so the files in lang/xyz/lib contain code to convert the UTF-8 to a client-readable format.) Contact with client-side software occurs at the following points:

2.1) Email sent to email clients

2.2) Grade files exported to Excel

2.3) Quizzes and lessons imported from text editors

3) If the lang/xyz/lib folder is not present, then UTF8 encoding is used to talk to the client software as normal.
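Steps 1) to 3) can be sketched roughly as follows. This is not the code from the patches: the function names, the lookup table and the target encodings (ISO-2022-JP for email, SJIS for Excel) are assumptions made for illustration, and the real lang/xyz/lib folder holds its own conversion routines rather than a table. The mbstring extension is required.

```php
<?php
// Hypothetical sketch of the "convert only if lang/xyz/lib exists" rule.

// Assumed target encodings per contact point; not taken from the patches.
$targets = array('email' => 'ISO-2022-JP', 'excel' => 'SJIS');

// Decide which encoding to use when talking to client-side software.
function client_encoding_for($langRoot, $lang, array $targets, $purpose) {
    // Step 3: no lib folder for this language, so stay with UTF-8.
    if (!is_dir($langRoot . '/' . $lang . '/lib')) {
        return 'UTF-8';
    }
    // Step 2: the language opts in to conversion for this contact point.
    return isset($targets[$purpose]) ? $targets[$purpose] : 'UTF-8';
}

// Convert UTF-8 text to the chosen client encoding.
function encode_for_client($text, $encoding) {
    if ($encoding === 'UTF-8') {
        return $text;
    }
    return mb_convert_encoding($text, $encoding, 'UTF-8');
}
```

A caller exporting grades would ask client_encoding_for() with the 'excel' purpose and pass each cell through encode_for_client(); languages without a lib folder are left untouched.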

Patches

Patches that allow Japanese-language Moodles to function without garbling have been prepared and are available at the following sites.

These patches are explained here http://moodle.org/mod/forum/discuss.php?d=13558

Mr. Kashiwagi's "Supertak" Patch (described in the thread above) http://www.supertak.com/down/sample.htm

Prof Kita's patches and rpms (based in part on Mr. Kashiwagi's) http://t-kita.net/rpm/FC/moodle/

A readme file in English describing the rpm http://t-kita.net/rpm/moodle/README-rpm-en.txt

A patch describing all the things that need to be done http://t-kita.net/rpm/FC/moodle/patches/moodle-t-kita.patch

Many, or even most, end users (including myself = Tim) are not sufficiently confident to apply extensive patches, so our Moodles have been garbling in important areas (email/Excel).

Status

1) Solved - tests in progress.

5) Has been solved by the use of a new multi-byte character compatible break_up_long_words().
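For readers who have not seen the fix, here is one way a multibyte-safe version could look. It is a sketch of the "insert the space between characters" idea only, not the function that was actually adopted; the name break_up_long_words_utf8() and its parameters are hypothetical.

```php
<?php
// Hypothetical sketch: break long runs of non-space characters without
// ever cutting inside a multibyte UTF-8 character.
function break_up_long_words_utf8($string, $maxsize = 20, $cutchar = ' ') {
    // With the /u modifier, \S{n} matches n whole characters, not n bytes.
    // The lookahead (?=\S) adds a break only when more text follows,
    // so nothing is appended at the end of a run.
    $pattern = '/(\S{' . (int) $maxsize . '})(?=\S)/u';

    // Note: preg_replace() returns null if $string is not valid UTF-8.
    return preg_replace($pattern, '${1}' . $cutchar, $string);
}

$latin    = break_up_long_words_utf8('abcdefghij', 4);
$japanese = break_up_long_words_utf8('あいうえおか', 2);
```

Prof Kariya's simpler alternative described in item 5 avoids the regular expression altogether by returning the string unchanged when mbstring is loaded.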