Languages:Tim's crazy proposal based on maketext

Revision as of 13:01, 10 November 2013 by Petr Škoda (škoďák) (talk | contribs)

Warning: This page is no longer in use. The information contained on the page should NOT be seen as relevant or reliable.

They say that the border between madness and genius is very narrow. Here goes.

The best article I know about the problems of localising software is the maketext article. I particularly like the narrative in the first half. Also, I must warn you that I have not read much about localisation, so my endorsement may not mean much.

The article mentioned is also available from the authors' homepage: "Localizing Your Perl Programs" --Frank Ralf 00:34, 21 July 2011 (WST)

OK, so the key point it makes is that a language string like "There have been $a quiz attempts" is really a function (in the mathematical sense of a mapping, not necessarily as a programming language construct). Depending on $a, we want it to output:

  • There have been no quiz attempts
  • There has been one quiz attempt
  • There have been 42 quiz attempts

So the question is: why not make it a function in the programming-language sense as well?
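
As a minimal sketch of that idea, the quiz-attempts string could be written as an ordinary PHP function of $a. The function name quiz_attempts_string is hypothetical and is not part of Moodle.

```php
<?php
// Hypothetical illustration: a language string expressed as a function of $a.
function quiz_attempts_string($a) {
    if ($a == 0) {
        return 'There have been no quiz attempts';
    }
    if ($a == 1) {
        return 'There has been one quiz attempt';
    }
    return "There have been $a quiz attempts";
}

// Print the three cases listed above.
foreach (array(0, 1, 42) as $count) {
    echo quiz_attempts_string($count), "\n";
}
```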

Two representations

Up to Moodle 1.9, the files Moodle used at runtime were exactly the same as the files that translators edited. That was convenient, but limited us to a human-readable and editable format. Also, it meant that Moodle had to do a lot of searching at runtime.

In Moodle 2.0 we are already proposing to split the representations, which lets us optimise the runtime format to be pretty much whatever we like, without making the format that translators edit impossible to work with.

Proposed runtime lang file syntax

In moodledata/lang/ there are subfolders like en/ (note we lose the legacy _utf8) that contain files like mod_quiz.php or core_moodle.php, that is, using the new component naming convention.

Suppose we have a hypothetical plugin admin/report/dylan with its current lang file admin/report/dylan/lang/en_utf8/report_dylan.php that contains:

$string['howmanyroads'] = 'How many roads must a man walk down?';
$string['roadsx'] = 'Roads: $a';
$string['xroadsfromytowns'] = '$a->numroads roads from at least $a->numcities different cities.';
// Can't really handle pluralisation in that last one in Moodle :-(

In my new proposal, moodledata/lang/en/report_dylan.php will contain:

class strings_en_report_capability extends strings_base {
    // PHP cannot initialise a static property with a method call, so wrap the singleton lookup.
    protected static function helper() { return lang_helper_en::get(); }
    public function howmanyroads($a) { return 'How many roads must a man walk down?'; }
    public function roadsx($a) { return self::helper()->quant($a, 'road'); }
    public function xroadsfromytowns($a) {
        return self::helper()->quant($a->numroads, 'road') . ' from at least ' .
                self::helper()->quant($a->numcities, 'different city', 'different cities') . '.';
    }
}

(I manually line-wrapped that last example to make it readable. In reality, remember that this file is being automatically compiled from some more human-readable source.)

Also note that moodledata/lang/fr/report_dylan.php will look like:

include_once($CFG->langdir . '/en/report_dylan.php');
class strings_fr_report_capability extends strings_en_report_capability {
    // PHP cannot initialise a static property with a method call, so wrap the singleton lookup.
    protected static function helper() { return lang_helper_fr::get(); }
    public function howmanyroads($a) { return 'Combien de rue ...'; }
    // etc.
}

And moodledata/lang/fr_ca/report_dylan.php is:

include_once($CFG->langdir . '/fr/report_dylan.php');
class strings_fr_ca_report_capability extends strings_fr_report_capability {
    // etc.
}
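
Because each language class extends a parent language's class, a string that a language does not override falls through to the parent via ordinary method inheritance. A minimal sketch of a three-level chain follows; the demo_chain_* class names and the placeholder strings are hypothetical and stand in for the compiled files.

```php
<?php
// Hypothetical stand-ins for compiled en, fr and fr_ca string classes.
class demo_chain_en {
    public function howmanyroads($a = null) { return 'en: howmanyroads'; }
    public function roadsx($a = null) { return 'en: roadsx'; }
    public function towns($a = null) { return 'en: towns'; }
}
class demo_chain_fr extends demo_chain_en {
    // fr translates two of the three strings.
    public function howmanyroads($a = null) { return 'fr: howmanyroads'; }
    public function roadsx($a = null) { return 'fr: roadsx'; }
}
class demo_chain_fr_ca extends demo_chain_fr {
    // fr_ca only overrides what differs from fr.
    public function roadsx($a = null) { return 'fr_ca: roadsx'; }
}

$strings = new demo_chain_fr_ca();
echo $strings->roadsx(), "\n";        // Defined in fr_ca.
echo $strings->howmanyroads(), "\n";  // Inherited from fr.
echo $strings->towns(), "\n";         // Inherited from en.
```

No lookup code is needed for the fallback: PHP's method resolution does the searching.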

Using that runtime format

Then, string_manager() becomes:

class string_manager {
    protected $stringclasses = array();
    public function get_string($identifier, $component = '', $a = null) {
        $component = $this->fix_legacy_component_names($component);
        return $this->get_string_class(current_language(), $component)->$identifier($a);
    }
    protected function get_string_class($lang, $component) {
        // Cache one instance per language and component.
        if (!isset($this->stringclasses[$lang][$component])) {
            $this->stringclasses[$lang][$component] =
                    $this->load_string_class($lang, $component);
        }
        return $this->stringclasses[$lang][$component];
    }
    protected function load_string_class($lang, $component) {
        global $CFG;
        $file = "$CFG->langdir/$lang/$component.php";
        $class = "strings_{$lang}_{$component}";
        if ($CFG->langediting && !$this->is_up_to_date($file)) {
            compile_lang_strings($lang, $component);
        }
        if (!is_readable($file)) {
            return new strings_base(); // See below.
        }
        include_once($file); // Load the compiled class before checking for it.
        if (!class_exists($class)) {
            throw new coding_exception($file . ' did not define the ' . $class .
                    ' class. There must be a bug in compile_lang_strings.');
        }
        return new $class();
    }
    // A few other methods omitted.
}

Note that if you have a developer/translator flag on ($CFG->langediting) then is_up_to_date checks various file timestamps, so that lang files can automatically be recompiled as needed for those people, without hurting runtime performance for production sites.
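
The timestamp check is not spelled out here, so the following is only a sketch under assumptions: the function name lang_file_is_up_to_date and the idea of passing an explicit list of source files are hypothetical. It treats a compiled file as current when it exists and is at least as new as every source file.

```php
<?php
// Hypothetical sketch: a compiled file is up to date if it is readable and
// no source file has a later modification time.
function lang_file_is_up_to_date($compiledfile, array $sourcefiles) {
    if (!is_readable($compiledfile)) {
        return false;
    }
    $compiledtime = filemtime($compiledfile);
    foreach ($sourcefiles as $source) {
        if (is_readable($source) && filemtime($source) > $compiledtime) {
            return false;
        }
    }
    return true;
}

// Demonstration with temporary files and explicit timestamps.
$source = tempnam(sys_get_temp_dir(), 'src');
$compiled = tempnam(sys_get_temp_dir(), 'out');
touch($source, 1000);
touch($compiled, 2000);
clearstatcache();
var_dump(lang_file_is_up_to_date($compiled, array($source))); // Compiled file is newer.
touch($source, 3000);
clearstatcache();
var_dump(lang_file_is_up_to_date($compiled, array($source))); // Source was edited later.
unlink($source);
unlink($compiled);
```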

As a final bit of magic, we have

class strings_base {
    public function __call($name, $arguments) {
        return "[[$name]]";
    }
}

Remember that strings_en_report_capability inherits from this. This gives us our classic missing-string fallback. Also:
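
To illustrate the fallback, here is a self-contained sketch with hypothetical demo_fallback_* class names. Calling a string identifier that no class in the chain defines ends up in __call, which returns the bracketed identifier.

```php
<?php
// Hypothetical, self-contained copy of the fallback idea.
class demo_fallback_base {
    public function __call($name, $arguments) {
        // Reached only when no class in the chain defines the method.
        return "[[$name]]";
    }
}
class demo_fallback_en extends demo_fallback_base {
    public function howmanyroads($a = null) {
        return 'How many roads must a man walk down?';
    }
}

$strings = new demo_fallback_en();
$identifier = 'nosuchstring';
echo $strings->howmanyroads(), "\n";     // A defined string.
echo $strings->$identifier(null), "\n";  // Falls through to __call.
```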

class lang_helper_en {
    private static $inst = null;
    public static function get() {
        if (!self::$inst) {
            self::$inst = new lang_helper_en();
        }
        return self::$inst;
    }
    public function quant($number, $singular, $plural = '') {
        // Compare, do not assign: only exactly one takes the singular.
        if ($number == 1) {
            return "$number $singular";
        } else if ($plural) {
            return "$number $plural";
        } else {
            return "$number {$singular}s";
        }
    }
    // Other helper functions.
}
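
Other languages would supply their own helper. The fr file above refers to lang_helper_fr but its body is not given, so the following is an assumption rather than the proposal's code: a hypothetical demo_lang_helper_fr whose quant uses the singular for quantities below 2 (including 0), as French does.

```php
<?php
// Hypothetical French helper: the class name and body are assumptions.
class demo_lang_helper_fr {
    public function quant($number, $singular, $plural = '') {
        // French uses the singular for quantities below 2, including 0.
        if (abs($number) < 2) {
            return "$number $singular";
        } else if ($plural) {
            return "$number $plural";
        } else {
            return "$number {$singular}s";
        }
    }
}

$helper = new demo_lang_helper_fr();
echo $helper->quant(0, 'route'), "\n";
echo $helper->quant(1, 'route'), "\n";
echo $helper->quant(3, 'cheval', 'chevaux'), "\n";
```

Keeping the plural rule in a per-language helper means the compiled string classes never need to know how a given language counts.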


I think this gives us the fastest possible runtime performance (particularly when combined with a PHP accelerator). Of course, it leaves open the following problems:

  • what string format do translators edit? (I suggest we copy the maketext format.)
  • can we write the compile_lang_strings function?
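
As a rough feasibility sketch for the second question, the following turns a string written in a small subset of maketext bracket notation into the source of a PHP method. Everything here is an assumption made for illustration: the function name demo_compile_string, the mapping of _1, _2, ... to $a[0], $a[1], ..., the self::helper() call used to reach the language helper, and the fact that only quant is supported. Escapes and other maketext features are ignored.

```php
<?php
// Hypothetical sketch: compile "[quant,_1,road] ahead." into PHP method source.
// Only plain text and [quant,...] calls are handled.
function demo_compile_string($identifier, $source) {
    // Split into literal text and bracketed calls, keeping the brackets.
    $parts = preg_split('/(\[[^\]]*\])/', $source, -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $pieces = array();
    foreach ($parts as $part) {
        if ($part[0] !== '[') {
            // Literal text becomes a quoted PHP string.
            $pieces[] = var_export($part, true);
            continue;
        }
        $args = explode(',', substr($part, 1, -1));
        $function = array_shift($args);
        if ($function !== 'quant') {
            throw new Exception("Unsupported function: $function");
        }
        $compiled = array();
        foreach ($args as $arg) {
            if (preg_match('/^_(\d+)$/', $arg, $m)) {
                // _1, _2, ... become $a[0], $a[1], ... (an assumption).
                $compiled[] = '$a[' . ($m[1] - 1) . ']';
            } else {
                $compiled[] = var_export($arg, true);
            }
        }
        $pieces[] = 'self::helper()->quant(' . implode(', ', $compiled) . ')';
    }
    return "public function $identifier(\$a) { return " .
            implode(' . ', $pieces) . '; }';
}

echo demo_compile_string('roadsx', '[quant,_1,road] ahead.'), "\n";
```

A real compile_lang_strings would also need to write the class wrapper, handle named placeholders such as $a->numroads, and validate translator input before emitting code.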

See also