Server clustering improvements proposal
Note: This page is a work-in-progress. Feedback and suggested improvements are welcome. Please join the discussion on moodle.org or use the page comments.
This page is a basis for discussion of potential server clustering improvements in Moodle 2.6.
config.php settings
Each node in the cluster may use a local set of PHP files, including config.php. These may be synchronised with git or rsync, for example.
$CFG->wwwroot
It must be the same on all nodes and it must be the public-facing URL. It cannot be dynamic.
$CFG->sslproxy = true
Enable this if wwwroot uses https:// but SSL is not handled by Apache.
$CFG->reverseproxy = true
Enable this if your nodes are accessed via a different URL. Please note that it is not compatible with $CFG->loginhttps.
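The three settings above could be combined roughly as follows. This is only a sketch of a config.php fragment: the host name is a placeholder, and whether sslproxy or reverseproxy applies depends on how requests reach the nodes.

```php
<?php  // Fragment of config.php. The host name below is a placeholder.
unset($CFG);
global $CFG;
$CFG = new stdClass();

// Public-facing URL, identical on every node. Never build it from the request.
$CFG->wwwroot = 'https://moodle.example.org';

// wwwroot is https:// but SSL is not handled by Apache on the node.
$CFG->sslproxy = true;

// Only when the nodes are accessed via a URL different from wwwroot.
// Not compatible with $CFG->loginhttps.
// $CFG->reverseproxy = true;
```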
$CFG->dirroot
It is strongly recommended that $CFG->dirroot (which is set automatically via realpath(config.php)) contains the same path on all nodes. It does not need to point to the same shared directory, though. The reason is that some low-level code may use the dirroot value for cache invalidation.
The simplest solution is to have the same directory structure on each cluster node and synchronise these during each upgrade.
The dirroot should always be read-only for the Apache process, because otherwise the built-in add-on installation and plugin uninstallation would get the nodes out of sync.
$CFG->dataroot
This MUST be a shared directory that each cluster node accesses directly. It must be very reliable, and the data stored there must not be manipulated directly.
Locking support is not required. If any code tries to use file locks in dataroot outside of cachedir or tempdir, it is a bug.
$CFG->tempdir
It is recommended to use a separate RAM disk on each node. Scripts may use this directory during one request only. The contents of this directory may be deleted if there is no pending HTTP request; otherwise delete only files with older timestamps.
Always purge this directory when starting a cluster node.
If any script tries to use files that were not created during the current request, it is a bug that needs to be fixed.
$CFG->cachedir
This MUST be a shared directory. The existing caching code is not designed to deal with local node cache directories, and the code is not going to be changed to hack around this restriction.
Why does it have to be shared? Because the developers who wrote the current caching code in all stable 2.x branches expected that there is only one cachedir and that no changes in that directory are lost when another HTTP request is processed on a different node.
Shared filesystems are usually slow, so ideally it should be possible to use MUC caches instead of $CFG->cachedir. It is also possible to create new MUC backends that are clustering aware.
You can safely purge cachedir when restarting the whole cluster.
File locking is required for now.
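As an illustration of the dataroot, tempdir and cachedir requirements above, a node's config.php might contain something like the following. All paths are hypothetical and only show which directories are shared and which are local to the node.

```php
<?php  // Fragment of config.php. All paths are hypothetical examples.
$CFG = new stdClass();  // Already present in a standard config.php.

// Shared by all nodes, accessed directly, no locking needed.
$CFG->dataroot = '/mnt/shared/moodledata';

// Local to the node, ideally a RAM disk. Purge it when the node starts.
$CFG->tempdir = '/mnt/ramdisk/moodle/temp';

// Shared by all nodes. File locking is required here for now.
$CFG->cachedir = '/mnt/shared/moodledata/cache';
```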
$CFG->localcachedir proposal
Local node caches (not shared) require revision numbers (such as $CFG->themerev, $CFG->jsrev). We found out that the revision number should be time-based and incrementing, because that prevents fatal cache invalidation problems on restored sites.
The revision numbers must be the same on all nodes. The simplest solution is to store them in the database and bump them up after any change in the cached data.
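One way to get a revision number that is both time-based and always incrementing is sketched below. The function name next_revision() is hypothetical and this is not Moodle's actual implementation. It assumes the previous value is read from the database and the result is written back there, so that every node sees the same number.

```php
<?php
/**
 * Hypothetical helper: returns the next revision number.
 *
 * @param int $current revision currently stored in the database
 * @param int $now     current timestamp, normally time()
 * @return int a value that is always greater than $current
 */
function next_revision($current, $now) {
    $current = (int)$current;
    $now = (int)$now;
    if ($now > $current) {
        // Normal case: the current time is the new revision.
        return $now;
    }
    // The stored revision is not in the past (restored site, clock skew,
    // several bumps within one second): keep incrementing anyway.
    return $current + 1;
}

echo next_revision(100, 200), "\n";  // 200
echo next_revision(300, 200), "\n";  // 301
```

Because the result never decreases, a local cache keyed by an older revision is never mistaken for current data, even if the server clock moves backwards.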
Component cache
The core_component class uses a $CFG->cachedir/core_component.php cache that contains a complete list of all plugins and all classes present in $CFG->dirroot. The implementation must be as fast as possible and the results must be extremely reliable.
The cache is automatically invalidated on the admin/index.php page, during installation and during every upgrade. It is also cleared during purge_all_caches(), but that is only a side effect of storing it in cachedir and it is not required.
The core_component class and its cache cannot depend on the database, MUC or any core libraries. That is why there cannot be any revision flags: there is nowhere to store them. The sha1() of the file itself is the revision.
See MDL-40475 for a proposed workaround that allows you to use an alternative component cache file.
Typically $CFG->alternative_component_cache = '/local/cache/dir/core_component.php' would point to a local node cache directory. Before an upgrade the administrator would have to execute the following manually:
$ php admin/cli/alternative_component_cache.php --rebuild
MUC and clustering
The requirement is to make MUC stores aware of revision numbers. If we do that, we can store the data in multiple backends without the caches becoming stale. Another benefit is that we would not have to purge all existing data, which would help on shared servers.
Default MUC file stores could decide to use either $CFG->cachedir or $CFG->localcachedir depending on the availability of the revision number. The current workaround is to include the revision number in the cache key, but that does not solve the problem with $CFG->cachedir, which must be shared by all nodes.
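The cache key workaround mentioned above can be sketched as follows. The helper revisioned_cache_key() is hypothetical and not part of MUC. It only shows that bumping the revision changes the key, so entries written under the old revision are never read again.

```php
<?php
/**
 * Hypothetical helper: prefixes a cache key with the revision number.
 *
 * @param string $name     logical name of the cached item
 * @param int    $revision current revision, the same on all nodes
 * @return string key to use when reading from or writing to the cache
 */
function revisioned_cache_key($name, $revision) {
    return 'rev' . (int)$revision . '_' . $name;
}

echo revisioned_cache_key('strings', 5), "\n";  // rev5_strings
echo revisioned_cache_key('strings', 6), "\n";  // rev6_strings
```

The stale entries stay in the store until they are purged or expire, which is one reason this is only a workaround.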
Another problem is that the MUC cache configuration is stored in a shared directory, which makes tweaking of individual node cache configurations problematic. It would probably be better to allow an alternative MUC cache configuration file location in config.php.
Theme caching
Javascript caching
Language installation and customisations
Opcode caches
The standard opcache extension is strongly recommended from Moodle 2.6 onwards. It is the only solution officially supported by the PHP developers.
Potential problems:
- file timestamps must be verified in Moodle 2.5 and below
See MDL-40415 for more details.
Browser sessions
File pool
At present there is a filedir in dataroot where we store the contents of all files in Moodle.
Theoretically the code could be abstracted to allow storage of file contents elsewhere, so that multiple different servers could share the same file pool.
Installation procedure
- Install Moodle first without any clustering.
- ...
TODO: describe possible setups for distribution of HTTP requests, failovers, etc.