File Storage Plugintype
Introduction
This page details a proposal to introduce a new plugin type for File Storage.
Purpose
Moodle currently supports only one method of storing files: a local filesystem. To scale Moodle across multiple machines, a shared filesystem must be used to keep the files synchronised between all nodes in the cluster. This dependence on a shared filesystem becomes a bottleneck when scaling Moodle to large clusters.
In Moodle 2.3, the File Storage API introduced a class (file_storage) that was responsible for all reading and writing to the "file pool". Any code that wants to read or write files does so via the file_storage class. This separates code from the particular implementation of how files are stored and retrieved.
By extending the file_storage class and overriding its methods, it is possible to change Moodle's file storage method to a non-filesystem-based solution (such as Amazon S3). Whilst this approach currently works, it requires patching core; a new plugin type would allow the same thing to be done without any core changes.
Design
- A new plugin type should go into a new directory called "storage", e.g. storage/filesystem/, storage/s3/.
- New abstract base class for file_storage and stored_file. The methods should match the public and protected methods currently in file_storage and stored_file.
- The current file_storage and stored_file classes should move into a plugin, storage/filesystem/ and be made to extend the base class.
- A new admin configuration setting, $CFG->storagemethod, should be added. It is set to one of the plugin names ('filesystem' by default).
- The get_file_storage() function in moodlelib should be modified to check $CFG->storagemethod and return an instance of the enabled storage plugin. The other code currently in get_file_storage(), which does filesystem-specific checks, should be moved into the constructor of the storage/filesystem/ plugin.
- A new section under Site Administration -> Plugins -> Storage should be added for configuring file storage plugins.
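As a rough illustration of the selection step described above, the following standalone PHP sketch picks a storage class from $CFG->storagemethod. The names file_storage_base, filesystem_file_storage and get_file_storage_sketch() are hypothetical (s3_file_storage follows the naming used later on this page), and the stub classes only hash a file rather than storing it.

```php
<?php
// Hypothetical sketch only: none of these classes exist in Moodle core as written.

/** Abstract base class that every storage plugin would extend. */
abstract class file_storage_base {
    /** Store the file at $pathname in the pool and return its content hash. */
    abstract public function add_file_to_pool(string $pathname): string;
}

/** Stand-in for the class that storage/filesystem/lib.php would define. */
class filesystem_file_storage extends file_storage_base {
    public function add_file_to_pool(string $pathname): string {
        return sha1_file($pathname); // A real plugin would also copy the file into its pool.
    }
}

/** Stand-in for the class that storage/s3/lib.php would define. */
class s3_file_storage extends file_storage_base {
    public function add_file_to_pool(string $pathname): string {
        return sha1_file($pathname); // A real plugin would also upload the file.
    }
}

/** Return an instance of the storage plugin named by $CFG->storagemethod. */
function get_file_storage_sketch(stdClass $CFG): file_storage_base {
    $method = $CFG->storagemethod ?? 'filesystem'; // 'filesystem' is the default.
    if (!is_string($method) || !preg_match('/^[a-z][a-z0-9_]*$/', $method)) {
        throw new InvalidArgumentException('Invalid storage method name');
    }
    // In Moodle, storage/<plugin>/lib.php would be included at this point.
    $class = $method . '_file_storage';
    if (!class_exists($class) || !is_subclass_of($class, 'file_storage_base')) {
        throw new RuntimeException("Unknown storage plugin: $method");
    }
    return new $class();
}
```

Validating the plugin name before building a class name (and, in real code, an include path) keeps a bad configuration value from loading arbitrary code.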
Code layout for storage plugins
- storage/<plugin>/lib.php: should contain a class which extends the file_storage base class.
- Plugins should also include a class which extends stored_file. This can go in lib.php or in a separate file. It doesn't matter which, since the plugin's stored_file class only needs to be included and instantiated by its file_storage class.
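A minimal sketch of what such a plugin file might contain is shown below. It assumes a hypothetical plugin called "example" that keeps its pool in memory, and the base classes and method signatures are simplified stand-ins rather than the real file_storage and stored_file interfaces.

```php
<?php
// Hypothetical contents of storage/example/lib.php; all names are illustrative.

/** Simplified stand-in for the proposed stored_file base class. */
abstract class stored_file_base {
    protected $contenthash;

    public function __construct(string $contenthash) {
        $this->contenthash = $contenthash;
    }

    abstract public function get_content(): string;
}

/** Simplified stand-in for the proposed file_storage base class. */
abstract class file_storage_base {
    abstract public function add_string_to_pool(string $content): string;
    abstract public function get_file_instance(string $contenthash): stored_file_base;
}

/** The plugin's stored_file class: only ever created by example_file_storage. */
class example_stored_file extends stored_file_base {
    private $fs;

    public function __construct(example_file_storage $fs, string $contenthash) {
        parent::__construct($contenthash);
        $this->fs = $fs;
    }

    public function get_content(): string {
        return $this->fs->read_pool_content($this->contenthash);
    }
}

/** The plugin's file_storage class: owns the pool and hands out stored files. */
class example_file_storage extends file_storage_base {
    private $pool = [];

    public function add_string_to_pool(string $content): string {
        $hash = sha1($content);
        $this->pool[$hash] = $content;
        return $hash;
    }

    public function get_file_instance(string $contenthash): stored_file_base {
        if (!isset($this->pool[$contenthash])) {
            throw new RuntimeException("Content not in pool: $contenthash");
        }
        return new example_stored_file($this, $contenthash);
    }

    public function read_pool_content(string $contenthash): string {
        return $this->pool[$contenthash];
    }
}
```

Because callers only ever see the base types, the plugin is free to place its stored_file subclass wherever it likes.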
Audit of all $CFG->dataroot usage
An audit needs to be done of all usage of $CFG->dataroot in core. No code should write to dataroot if the data must be readable on subsequent page requests. It is OK to write to $CFG->cachedir for caching purposes (i.e. the data can be cleared at any point without causing problems) or to $CFG->tempdir for temporary usage (temp files whose lifetime only spans a single script execution). Any usage of $CFG->dataroot for persistence should be replaced by a call to the storage API.
Affected areas identified by the audit are:
- Custom language packs / language overrides: Currently these are stored in $CFG->dataroot and need to be shared between hosts. These need to move elsewhere (e.g. database or storage API).
- Database module (mod/data/) presets: These are stored in $CFG->dataroot/data/preset/ and should be moved to a filearea/component in the storage API.
- This list is hugely incomplete...
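One possible starting point for extending this list is a simple text search. The sketch below uses a hypothetical helper, find_dataroot_usage(), to walk a source tree and report every line in a .php file that mentions $CFG->dataroot. It is only a first pass: it will flag legitimate uses and will miss code that reaches dataroot indirectly, so each hit still needs manual review.

```php
<?php
/**
 * Return sorted "path:line" locations where $CFG->dataroot appears
 * in .php files under $root. Hypothetical helper, not part of Moodle.
 */
function find_dataroot_usage(string $root): array {
    $hits = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if (!$file->isFile() || $file->getExtension() !== 'php') {
            continue;
        }
        $lines = file($file->getPathname(), FILE_IGNORE_NEW_LINES);
        if ($lines === false) {
            continue; // Unreadable file: skip it rather than abort the audit.
        }
        foreach ($lines as $index => $line) {
            // Single quotes keep the search string literal (no interpolation).
            if (strpos($line, '$CFG->dataroot') !== false) {
                $hits[] = $file->getPathname() . ':' . ($index + 1);
            }
        }
    }
    sort($hits);
    return $hits;
}
```

The same check could be run against a third party plugin's directory before it is reviewed.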
Third party plugins
The CONTRIB plugin review process should include an additional step to check for direct usage of $CFG->dataroot. Plugin developers should be encouraged to use the storage API so that users who are not using the filesystem storage method can use their plugin without bugs.
Migration between storage methods
A CLI script for migrating between storage methods should be added into admin/cli/. There will be no need for storage plugins to implement their own migration routines as this can be implemented in the base class as follows:
- Create an instance of the current file storage plugin ($fs = get_file_storage()).
- Create an instance of the new plugin (e.g. $newfs = new s3_file_storage()).
- Call the migrate() method: $fs->migrate($newfs);
- The migrate() method is implemented in the base class (it does not need to be implemented in plugins). It does the following:
  - Runs the SQL query: select distinct contenthash from mdl_files;
  - For each result:
    - Calls $fs to retrieve the contents of the contenthash and store them in a temp file. This requires a new protected method to be added to the file_storage base class and implemented in each plugin, e.g. $fs->copy_contenthash_to($contenthash, $temppath);
    - Calls $newfs->add_file_to_pool($temppath);
Note: this method of migration could be very slow for large sites, so advanced users may wish to implement their own strategies for migration.
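The migration steps above can be sketched roughly as follows. This is a simplified, standalone model rather than Moodle code: file_storage_base and memory_file_storage are hypothetical names, the list of content hashes is passed in as an array instead of being read from mdl_files, and the pool lives in memory so the example can run without a database.

```php
<?php
// Simplified, hypothetical model of the proposed base class; not Moodle core code.
abstract class file_storage_base {
    /** Proposed new method: copy the pool content for $contenthash to $temppath. */
    abstract protected function copy_contenthash_to(string $contenthash, string $temppath): void;

    /** Add the file at $pathname to this pool and return its content hash. */
    abstract public function add_file_to_pool(string $pathname): string;

    /**
     * Copy every listed content hash into $newfs and return the number copied.
     * In Moodle the list would come from:
     *   select distinct contenthash from mdl_files
     */
    public function migrate(file_storage_base $newfs, array $contenthashes): int {
        $migrated = 0;
        foreach ($contenthashes as $contenthash) {
            $temppath = tempnam(sys_get_temp_dir(), 'mig');
            if ($temppath === false) {
                throw new RuntimeException('Could not create a temp file');
            }
            try {
                $this->copy_contenthash_to($contenthash, $temppath);
                $newhash = $newfs->add_file_to_pool($temppath);
                if ($newhash !== $contenthash) {
                    throw new RuntimeException("Hash mismatch for $contenthash");
                }
                $migrated++;
            } finally {
                unlink($temppath); // Temp files only live for one iteration.
            }
        }
        return $migrated;
    }
}

/** In-memory pool used only to make the sketch runnable. */
class memory_file_storage extends file_storage_base {
    private $pool = [];

    protected function copy_contenthash_to(string $contenthash, string $temppath): void {
        if (!isset($this->pool[$contenthash])) {
            throw new RuntimeException("Content not in pool: $contenthash");
        }
        file_put_contents($temppath, $this->pool[$contenthash]);
    }

    public function add_file_to_pool(string $pathname): string {
        $content = file_get_contents($pathname);
        $hash = sha1($content); // Content hashes are modelled as SHA-1 of the content.
        $this->pool[$hash] = $content;
        return $hash;
    }

    public function has(string $contenthash): bool {
        return isset($this->pool[$contenthash]);
    }
}
```

Comparing the hash returned by the new pool with the original one is an extra safety check that is not part of the proposal; it would catch content that was corrupted in transit.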
Enhancements to the file_storage interface
- An option should be added to stored_file::readfile() for serving a range request. This will prevent file.php from having to call $fs->get_content_file_handle() which would force the storage plugin to download the entire file.
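To make the range idea concrete, the sketch below shows one way the byte offsets for a single-range request could be worked out and then read without loading the whole file. Both resolve_byte_range() and read_range() are hypothetical helpers, not existing Moodle functions, and multi-range headers are deliberately treated as unsupported.

```php
<?php
/**
 * Resolve a single-range HTTP Range header against a file size.
 * Returns [start, end] as inclusive byte offsets, or null if the header is
 * absent, malformed, multi-range or cannot be satisfied.
 */
function resolve_byte_range(?string $header, int $filesize): ?array {
    if ($header === null || !preg_match('/^bytes=(\d*)-(\d*)$/', trim($header), $m)) {
        return null;
    }
    [, $first, $last] = $m;
    if ($first === '' && $last === '') {
        return null;
    }
    if ($first === '') {
        // Suffix form "bytes=-N": the final N bytes of the file.
        $length = (int)$last;
        if ($length === 0) {
            return null;
        }
        $start = max(0, $filesize - $length);
        $end = $filesize - 1;
    } else {
        $start = (int)$first;
        // An open or oversized end is clamped to the last byte of the file.
        $end = ($last === '') ? $filesize - 1 : min((int)$last, $filesize - 1);
    }
    return ($start > $end) ? null : [$start, $end];
}

/** Read only bytes $start..$end (inclusive) from a local file. */
function read_range(string $path, int $start, int $end): string {
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($handle, $start);
    $data = stream_get_contents($handle, $end - $start + 1);
    fclose($handle);
    return $data;
}
```

A plugin backed by a remote store could pass the resolved offsets on as a ranged request to that store instead of reading from a local file.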
Note re latency
File storage plugins that use network connections to send/retrieve file contents are likely to introduce more latency into file operations than the default filesystem method. This could affect pages that do many file operations (e.g. restoring a backup, extracting SCORM packages). If backups are made asynchronous, this would be less of a problem. If SCORM package extraction is also a problem, it could be queued as well.