Difference between revisions of "Languages subsystem improvements 2.0"

Revision as of 14:39, 21 November 2009

Languages subsystem improvements
Project state Research and planning
Tracker issue n/a
Discussion n/a
Assignee David Mudrak

Moodle 2.0


This is an initial proposal of changes to the language strings processing in Moodle.

Current issues

String files are not branched 
We must keep all strings from all branches in place for backwards compatibility, so we are unable to easily clean up language packs. Some say that branching and merging would be too big a burden for our translators.
Plural forms, gender forms and other grammar 
We are unable to handle plurals at all. Handling plural forms the gettext way, for example, is a traditional, well-tested and robust approach. MDL-12433 by Sam Marshal shows an alternative approach based on logical expressions.
Strings can't be modified 
It is difficult to notify translators that a string was modified (expanded, fixed, changed). The current workaround is the policy of adding another string with the same name plus a suffix (like 'license2'). It would be nice if such strings were tagged or highlighted in the translation UI.
We do not use standard formats 
Translators can't use specialized tools for translation (PO/gettext editors, community translation portals). Also, I am not aware of any benchmarking showing the performance difference between our native $string[] format and, for example, the standard .po format.
More syntax checks are required 
So that translators do not break Moodle functionality (see MDL-12433).
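
To make the gettext-style plural handling mentioned above more concrete, here is a minimal sketch, written in Python rather than Moodle's PHP purely for brevity. The two rule functions mirror the plural expressions commonly used in gettext catalogues for English and Russian. The names `PLURAL_RULES`, `FORMS` and `ngettext_sketch`, as well as the sample strings, are hypothetical and not part of Moodle or gettext.

```python
# Hypothetical sketch of gettext-style plural selection.
# Each rule maps a count n to the index of the plural form to use.

def english_rule(n: int) -> int:
    # Two forms: singular for exactly 1, plural otherwise.
    return 0 if n == 1 else 1

def russian_rule(n: int) -> int:
    # Three forms, chosen from n mod 10 and n mod 100.
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not (10 <= n % 100 < 20):
        return 1
    return 2

PLURAL_RULES = {"en": english_rule, "ru": russian_rule}

# Assumed sample translations; one entry per plural form index.
FORMS = {
    "en": ["{n} file", "{n} files"],
    "ru": ["{n} файл", "{n} файла", "{n} файлов"],
}

def ngettext_sketch(lang: str, n: int) -> str:
    """Pick the plural form for n in the given language and fill in n."""
    index = PLURAL_RULES[lang](n)
    return FORMS[lang][index].format(n=n)
```

The point of the design is that the translator supplies both the forms and the rule that selects among them, so the calling code only passes the count.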

Goals

  1. Do not reinvent the wheel. Keep to the "do one thing and do it well" principle. Keep it simple.
  2. Make simple things easy and hard things possible
  3. ...

Key design questions

What is the data structure for storing the master copies of the lang packs that translators work on 
At the moment it is a plain PHP array, editable via the translation UI or directly. Petr proposes keeping these strings in the database instead, syncable with some central repository. Whatever the format is, we must be able to store some metadata: the timestamp of the last modification, the author name, proposed alternatives, comments etc. (see the Rosetta translation tool at Launchpad for an example of possible metadata).
What is the UI for translators, what are the processes of contributing and how the translations are redistributed to Moodle sites 
Our translators should not be forced to use one single tool. We should consider switching to a standardized common format (like PO or XLIFF) that is supported by a variety of advanced tools (equipped with translation memory, connected with dictionaries, i18n portals etc.).
What is the data structure Moodle uses at runtime 
This is just a performance optimization (an implementation detail). It should be independent of the native format that humans work with, so that it can be modified at any time in the future. For example, see the system proposed by Tim based on calling class methods (inspired by Perl's Maketext).
What is the format of a lang string, and how are placeholders substituted 
This is the most important issue we have at the moment but as it is strongly tied together with the runtime format, it can be changed any time. On the other hand, both the UI and storage format must support it.
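
The separation between the master copy (with metadata) and the runtime structure can be sketched as follows. This is only an illustration in Python, not Moodle's PHP and not a proposed implementation. The `MasterString` class, its field names, `compile_runtime` and `get_string_sketch` are all hypothetical, and the `$name` placeholder syntax is that of Python's `string.Template`, used here as a stand-in for whatever placeholder format is finally chosen.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from string import Template

@dataclass
class MasterString:
    """Hypothetical master-copy record that translators work on."""
    identifier: str
    text: str
    author: str
    modified: datetime
    alternatives: list = field(default_factory=list)
    comments: list = field(default_factory=list)

def compile_runtime(master):
    """Drop the metadata: at runtime only identifier -> text is needed."""
    return {s.identifier: s.text for s in master}

def get_string_sketch(runtime, identifier, **values):
    """Look up a string and substitute its placeholders."""
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(runtime[identifier]).safe_substitute(values)

master = [
    MasterString(
        identifier="welcome",
        text="Welcome, $name!",
        author="translator1",
        modified=datetime(2009, 11, 21, tzinfo=timezone.utc),
        comments=["Shown on the front page"],
    )
]
runtime = compile_runtime(master)
greeting = get_string_sketch(runtime, "welcome", name="Ann")
```

Because the runtime dictionary is derived from the master records, the master format could change (database, PO, XLIFF) without touching the lookup code.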

Use cases

  1. Developers add new strings to the core
  2. Translators translate untranslated core strings and publish their work
  3. Admins want to locally modify the language pack
  4. Contributors add new strings to the contributed code
  5. Translators translate untranslated contrib strings and publish their work
  6. ...

Research

This is the list of projects, resources and tools being explored

  • Great CPAN article about software localization. Plain string based lexicon is not enough. Strings can be translated by functions only. "A phrase is a function; a phrasebook is a bunch of functions."
  • XLIFF - XML Localization Interchange File Format
  • Virtaal - promising, we could have XLIFF <-> .php conversion
  • Launchpad - translation portal used by Ubuntu and many other projects. Would require BSD licensing, therefore IMO not suitable as we could not import our current GPL'ed translations. It also seems to be pretty slow to work with.
  • Plural forms in gettext
  • Zend_Translate reference guide
  • MDL-12433 - Sam Marshal's proposal
  • MediaWiki approach: Grammar forms and plurals:
    {{plural:1|is|are}} {{plural:2|is|are}}
    (Example of how MediaWiki outputs the correct plural form depending on the count. For languages like Russian, the plural transformation is based on "count mod 10".)
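
The MediaWiki-style plural markup quoted above could be expanded roughly as in the following Python sketch. It is not MediaWiki's actual parser: it only handles the two-form case shown in the example and uses the English rule (singular for a count of exactly 1). `PLURAL_PATTERN` and `expand_plurals` are hypothetical names.

```python
import re

# Matches {{plural:COUNT|singular|plural}} with a literal integer count.
PLURAL_PATTERN = re.compile(r"\{\{plural:(\d+)\|([^|{}]*)\|([^|{}]*)\}\}")

def expand_plurals(text: str) -> str:
    """Replace each two-form plural tag with the form matching its count."""
    def pick(match):
        count = int(match.group(1))
        singular, plural = match.group(2), match.group(3)
        # English rule only; a language like Russian would need more forms.
        return singular if count == 1 else plural
    return PLURAL_PATTERN.sub(pick, text)

result = expand_plurals("{{plural:1|is|are}} {{plural:2|is|are}}")
```

With the input from the example, `result` is `is are`.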

Proposals

See also