The Moodle Universal Cache (MUC)
The Moodle Universal Cache (MUC) | |
---|---|
Project state | Stage 1 completed, stage 2 in progress |
Tracker issue | MDL-25290 and MDL-34224 |
Discussion | |
Assignee | Sam Hemelryk |
Moodle 2.4
Foreword
Over the past couple of weeks I have used my available time to look into a caching solution for Moodle.
The concept is not a new one, and over the past few years people have repeatedly come back to the need for a proper caching solution, particularly one that allows for a shared cache.
The tracker issue MDL-25290 has been set up to follow progress on finding a solution, and in fact Tony Levi and others have written a community proposal and made initial headway in code (see Caching_system_(proposed)).
I have now read through their proposal and solution, looked into the different caching systems that are available, looked into what other web solutions are doing, and researched the needs that we have in Moodle, and I have planned the solution that makes up this proposal.
The existing proposal
Having read through the current proposal and looked at the code created by Tony, I can tell you right now that they are off to a very good start. My research suggests that we need to take it further again; however, if you look at the existing solution you will notice that this proposal takes a lot of what they already have and expands upon it.
Requirements
The following are the requirements that this new solution hopes to satisfy.
- Add a shared (application) cache.
- Add a static cache to replace static variables presently in use.
- Suitable to replace existing cache solutions with equal or better performance (the lang string cache will be the primary test case).
- Integrate with and support existing cache products.
- Allow for supporting future caching products.
- Have a relatively simple, self explanatory nature to aid developers in making the most of it.
- The cache product integration should also be relatively simple and self explanatory to ensure new cache product integrations are easy to write.
- Have suitable failover systems so that things continue to work even when the caches are unavailable.
- Able to be initialized early in system initialization (before config, for example, so that we can cache that).
- Support a time to live on data where applicable, so developers can determine how long information can live in the cache before it is deemed out of date and needs to be generated again.
Outline
This section contains information on the key ideas and structures that will form the basis for the cache solution.
Cache types
The first thing to understand is that there are three cache types that have been defined and will be addressed by this solution. Each cache type represents a separate way in which information can be cached, and although all of them are operated identically, each functions completely differently.
These cache types are one of the few things that developers will need to understand about the cache solution. Every cache must have a cache type selected for it when it is defined. That cache type determines who the cached information is available to and how long the cache instance will live.
For each type there will be a representative class that the developer can instantiate for a cache and work with to retrieve and set data. This is as far as the developer will usually need to be familiar with the Cache API; everything they require to work with a cache will be made available through the cache type objects.
The three cache types are detailed below:
Request cache
You can think of the request cache type like a static property. It is alive for the duration of the request and cleared at the end of every request. It is also important to point out that this cache is user specific, so information stored in the cache will only be accessible to the user who put it there in the first place.
This cache type will be used to replace static caches currently used throughout Moodle (think static variables, static properties, and global variables).
If no cache instance has been assigned to the request cache type, or if the data cannot be stored in the assigned cache instance, a static memory based cache will be used by default.
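The fallback rule above can be sketched as follows. This is a minimal sketch assuming PHP 8 and assuming a configured instance exposes set($key, $value) returning true on success and get($key) returning false on a miss; the class names static_memory_store and request_cache_sketch are hypothetical and not part of the proposed API.

```php
<?php
// Hypothetical sketch: a request cache that prefers its configured instance and
// falls back to a static, in-memory store. Names are illustrative only.

/** In-memory store that lives only for the current request, like a static variable. */
class static_memory_store {
    private array $data = [];

    public function set(string $key, $value): bool {
        $this->data[$key] = $value;
        return true;
    }

    public function get(string $key) {
        // false signals a miss, so a cached value of false cannot be told apart.
        return $this->data[$key] ?? false;
    }
}

class request_cache_sketch {
    private ?object $instance;
    private static_memory_store $fallback;

    public function __construct(?object $instance = null) {
        $this->instance = $instance;
        $this->fallback = new static_memory_store();
    }

    public function set(string $key, $value): bool {
        if ($this->instance !== null && $this->instance->set($key, $value)) {
            return true;
        }
        // No instance assigned, or it could not store this data: keep it in memory.
        return $this->fallback->set($key, $value);
    }

    public function get(string $key) {
        if ($this->instance !== null) {
            $value = $this->instance->get($key);
            if ($value !== false) {
                return $value;
            }
        }
        return $this->fallback->get($key);
    }
}
```

Because the fallback is consulted on every miss, data that the configured instance refused is still found later in the same request.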
Session cache
The session cache is, as the name suggests, similar to the session made available through PHP. It is alive for the duration of the user's session and is user specific. At the end of the user's session its information will be purged (by means of cron or similar).
This is probably the most limited cache type; the idea behind it is to separate out information that would otherwise blow out the current session size, and to allow an installation to specify an instance dedicated to its storage.
Application
This is the most prominent and beneficial cache type, as there is nothing like it presently within Moodle. The application cache type is a shared cache that all users can access, making common data available to all without every user needing to generate it on every request, or to generate it once and store it in a user specific cache.
The lifetime of information within the cache will be determined by a time to live specified by the developer; when none is specified it will default to a configuration setting.
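For a plugin that cannot expire data itself (one whose supports_native_ttl() would return false), the time to live has to be tracked alongside the data. The sketch below assumes PHP 8; DEFAULT_TTL stands in for the configuration setting mentioned above, and the class name ttl_wrapped_store is hypothetical. The clock is injectable only so the expiry logic can be exercised without waiting.

```php
<?php
// Hypothetical sketch: managing a time to live internally by storing an expiry
// timestamp with each value. DEFAULT_TTL stands in for a configuration setting.

const DEFAULT_TTL = 3600; // seconds

class ttl_wrapped_store {
    private array $data = [];
    private $clock;

    /** @param callable|null $clock returns the current Unix time; defaults to time() */
    public function __construct(?callable $clock = null) {
        $this->clock = $clock ?? 'time';
    }

    private function now(): int {
        return ($this->clock)();
    }

    public function set(string $key, $value, ?int $ttl = null): void {
        $ttl = $ttl ?? DEFAULT_TTL; // Developer did not specify a TTL: use the default.
        $this->data[$key] = ['value' => $value, 'expires' => $this->now() + $ttl];
    }

    public function get(string $key) {
        if (!isset($this->data[$key])) {
            return false;
        }
        if ($this->data[$key]['expires'] <= $this->now()) {
            unset($this->data[$key]); // Stale: drop it so the caller regenerates it.
            return false;
        }
        return $this->data[$key]['value'];
    }
}
```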
Cache plugins
The second most important thing to understand is that a new plugin type will be introduced into Moodle: the cache plugin. The base directory for cache plugins will be $CFG->dirroot/cache, and each plugin will have its own subdirectory there. Initially we are planning to produce plugins for APC, Memcache, XCache, Database, Static, and File, each of which will have its own subdirectory.
Each plugin will define its own cache plugin class that will be required to extend an abstract class or interface that will map out all of the required methods of the plugin.
Also be aware that there can be multiple instances of a plugin. For example, you could have two instances of a cache product set up: a limited one on the web server, and another running on remote machines. You could then map each instance to a particular cache type, or in the future perhaps even to a specific cache.
This is important for a couple of reasons: first, it will allow load balancing and/or prioritization to occur, and second, it will allow us to extend what we can do with the cache solution in the future (after initial integration).
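To make the plugin contract and the idea of multiple instances concrete, here is a minimal sketch assuming PHP 8. The method names follow the cache_plugin table later in this proposal, but the signatures, the constructor arguments and the cache_plugin_static class are assumptions for illustration only.

```php
<?php
// Hypothetical sketch of the abstract plugin class and one trivial plugin.

abstract class cache_plugin {
    public function __construct(protected string $instancename, protected array $config = []) {}

    public function get_instance_name(): string {
        return $this->instancename;
    }

    abstract public static function is_supported(): bool;
    abstract public function get(string $key);
    abstract public function set(string $key, $data): bool;
    abstract public function has(string $key): bool;
    abstract public function delete(string $key): bool;
    abstract public function purge(): bool;
}

/** A trivial plugin keeping data in memory for the life of the object. */
class cache_plugin_static extends cache_plugin {
    private array $store = [];

    public static function is_supported(): bool {
        return true; // Needs no PHP extension.
    }

    public function get(string $key) {
        return $this->store[$key] ?? false;
    }

    public function set(string $key, $data): bool {
        $this->store[$key] = $data;
        return true;
    }

    public function has(string $key): bool {
        return array_key_exists($key, $this->store);
    }

    public function delete(string $key): bool {
        unset($this->store[$key]);
        return true;
    }

    public function purge(): bool {
        $this->store = [];
        return true;
    }
}

// Two instances of the same plugin, each mapped to a different cache type.
$mapping = [
    'request' => new cache_plugin_static('local'),
    'application' => new cache_plugin_static('shared'),
];
```

Each instance keeps its own data, which is what allows different cache types to be directed at different back ends.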
The cache manager
The cache manager will tie everything together. It will be responsible for initializing cache instances when required, making general information available, and publishing static methods to interact with the cache solution as a whole.
This will include providing methods for the following:
- Available cache plugins
- Available configured cache instances
- Maximum data and/or cache size
- Setting and creating the configuration file
- Retrieving and processing the configuration file
Although the cache_manager will be tying everything together, a developer wishing to use the cache API will not need to know anything about it. To keep the cache API straightforward, the most common functionality will be wrapped for convenience into several controlling classes, one for each cache type, that make storing and retrieving data from a cache a very simple task.
Consequently the cache manager will mostly be used internally by the cache API itself, and by core code that needs the advanced parts of the API, such as configuring instances or purging caches on cron or through the existing admin interface.
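The manager's core job, handing out one shared manager and resolving a cache type to a configured instance or a default, can be sketched as below. This assumes PHP 8; the method names come from the cache_manager table further down, while the array-based storage, the $reset parameter and the ArrayObject stand-ins for the default instances are assumptions.

```php
<?php
// Hypothetical sketch of cache_manager::instance() and type-to-instance resolution.

class cache_manager {
    const TYPE_REQUEST = 'request';
    const TYPE_SESSION = 'session';
    const TYPE_APPLICATION = 'application';

    private array $instances = []; // instance name => plugin instance object
    private array $mappings = [];  // cache type => instance name
    private array $defaults = [];  // cache type => default instance object

    private function __construct() {
        foreach ([self::TYPE_REQUEST, self::TYPE_SESSION, self::TYPE_APPLICATION] as $type) {
            // Stand-in for the special "static"/"dataroot" default instances.
            $this->defaults[$type] = new ArrayObject();
        }
    }

    /** Returns the shared manager; $reset forces re-initialisation (e.g. after config changes). */
    public static function instance(bool $reset = false): cache_manager {
        static $manager = null;
        if ($manager === null || $reset) {
            $manager = new cache_manager();
        }
        return $manager;
    }

    public function add_plugin_instance(string $name, object $instance): void {
        $this->instances[$name] = $instance;
    }

    public function add_instance_mapping(string $name, string $type): void {
        if (!isset($this->instances[$name])) {
            throw new InvalidArgumentException("Unknown cache instance: $name");
        }
        $this->mappings[$type] = $name;
    }

    public function get_cache_type_default(string $type): object {
        return $this->defaults[$type];
    }

    /** Returns the mapped instance for $type, or the default if none is configured or ready. */
    public function get_cache_type_instance(string $type): object {
        if (isset($this->mappings[$type])) {
            $instance = $this->instances[$this->mappings[$type]];
            if (!method_exists($instance, 'is_ready') || $instance->is_ready()) {
                return $instance;
            }
        }
        return $this->get_cache_type_default($type);
    }
}
```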
Logic separation
The logic for the cache API will be separated into three very clear, separate sections: API use, core logic, and plugin development. These three highlight the very important idea that when using or working with this API, the developer need only know about the section they are working with.
- API use
- When using the API the developer will not know, or need to know anything at all about cache plugins that have been installed, what instances are available for each of those plugins, or even where their information is going to be stored. The developer will work with a cache type class (more on those later) that gets initialized with the details of the cache and can then be interacted with simply. All the developer needs to know is which cache type they should be making use of. Everything else will be taken care of internally by the core logic, and actual caching and retrieving by the cache plugin (that the developer knows nothing about).
- Core logic
- The core logic refers to whatever is controlling the cache (the cache_manager), which will be aware of what plugins are available and what instances exist for each plugin. It is responsible for managing those plugins, the plugin instances, and any cache events (such as purging all caches). This may sound like a very significant part of the cache solution, and that is because it is. However, it is very important to recognise that these management and general features will rarely be used by developers. In fact I would expect very few people ever to need the cache manager and this section of the API. It is more for internal and edge case core use than anything else.
- Plugin development
- This is a very essential part of both the solution and the API: the development of cache plugins. In developing a plugin the developer will not need to pay heed to the cache types or the cache_manager controller. In fact the plugin they develop will only need to publish methods for the essential interactions with the cache: get, set, delete, etc. It will be provided with only the information required for those operations, already parsed and ready, and will not need to do anything but interact with the cache. Everything else will be taken care of internally by the core logic.
Controlling interaction
As mentioned above, for convenience and ease of use, access to the cache API will be controlled by several classes, one for each cache type. All of these classes will operate in the same way (extending the same parent) and will make any allowances required for the cache type internally.
The cache type classes are designed to be initialized for each bit of data the developer wishes to cache. Once initialized, simple methods are available on the object to get, set, check whether the cache has the information, and delete information from the cache.
As this is more easily understood through a code example, take a peek at the following snippet, which illustrates how I see interaction occurring.
$courseid = required_param('courseid', PARAM_INT);
// The cache object is initialised for one bit of data: the modinfo of this course.
$cache = new cache_application('core_course', 'modinfo', array('id' => $courseid));
if (!$modinfo = $cache->get()) {
    // Cache miss: build the data once and store it for all users.
    $course = $DB->get_record('course', array('id' => $courseid), '*', MUST_EXIST);
    $modinfo = get_fast_modinfo($course);
    $cache->set($modinfo);
}
As you can see, once the object has been initialized for the cache it's as simple as calling get() to retrieve the information, or set($data) to set it.
As there are three cache types proposed, the following classes will be created to control interaction:
- cache_request
- For working with a request cache.
- cache_session
- For working with a session cache.
- cache_application
- For working with an application wide cache.
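The three controlling classes can share a single parent, as described above. The sketch below assumes PHP 8 and assumes the constructor takes a component, an area and an array of identifiers (as in the modinfo example) which are hashed into one key; the key format and the shared static array used in place of a real plugin instance are assumptions, not part of the proposal.

```php
<?php
// Hypothetical sketch of cache_type and its three subclasses.

abstract class cache_type {
    // Stand-in for the plugin instances the cache_manager would supply, per cache type.
    private static array $stores = [];
    protected string $key;

    public function __construct(string $component, string $area, array $identifiers = []) {
        ksort($identifiers); // Same identifiers in any order give the same key.
        $this->key = sha1($component . '/' . $area . '/' . json_encode($identifiers));
    }

    abstract public function get_type(): string;
    abstract public function is_user_specific(): bool;

    public function get() {
        return self::$stores[$this->get_type()][$this->key] ?? false;
    }

    public function set($data): bool {
        self::$stores[$this->get_type()][$this->key] = $data;
        return true;
    }

    public function has(): bool {
        // Note: isset() treats a cached null as absent.
        return isset(self::$stores[$this->get_type()][$this->key]);
    }

    public function delete(): bool {
        unset(self::$stores[$this->get_type()][$this->key]);
        return true;
    }
}

class cache_request extends cache_type {
    public function get_type(): string { return 'request'; }
    public function is_user_specific(): bool { return true; }
}

class cache_session extends cache_type {
    public function get_type(): string { return 'session'; }
    public function is_user_specific(): bool { return true; }
}

class cache_application extends cache_type {
    public function get_type(): string { return 'application'; }
    public function is_user_specific(): bool { return false; }
}
```

Two objects created with the same component, area and identifiers resolve to the same key, so data set through one is visible through the other.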
Configuration
In discussing this proposal with Eloy, this was one area we highlighted as a small but critical part of the cache solution. Initially I had planned to make use of the main database, introducing new tables to record information on plugin instances and their mappings to cache types.
However, in discussing this we decided that there were three problems with this approach. First, it would introduce overhead, as we would need to execute a couple of queries to build the required information. Second, the information being stored would itself be a candidate for caching. Third, it would require Moodle's setup code to be initialized past the point of loading configuration before the cache could be used, preventing us from caching the configuration, which is one of the candidates for caching.
Our solution to this: the configuration for the caches will be stored in the dataroot as a PHP file, similar to what we do with the current lang cache.
From there it will be included by the cache API and parsed when the API is first used in a page.
This negates the requirement for additional database tables and interaction, and will allow us to utilize the cache solution much earlier during setup, early enough to allow us to cache configuration.
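Writing the configuration as a PHP file means reading it back costs only an include, with no database access. The sketch below assumes PHP 8; the file name muc_config.php, the array layout and both function names are hypothetical, since the proposal only states that a PHP file is kept in the dataroot.

```php
<?php
// Hypothetical sketch: persisting the cache configuration as a PHP file that returns an array.

function cache_config_write(string $dataroot, array $config): string {
    $file = $dataroot . '/muc_config.php';
    $content = "<?php\n// Generated cache configuration. Do not edit by hand.\n"
        . 'return ' . var_export($config, true) . ";\n";
    // Write to a temporary file and rename it, so a concurrent request never
    // includes a half-written file.
    $tmp = $file . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, $content, LOCK_EX);
    rename($tmp, $file);
    if (function_exists('opcache_invalidate')) {
        opcache_invalidate($file, true); // Avoid serving a stale compiled copy.
    }
    return $file;
}

function cache_config_read(string $dataroot): array {
    $file = $dataroot . '/muc_config.php';
    if (!is_readable($file)) {
        return []; // No configuration yet: callers fall back to the defaults.
    }
    return include $file;
}
```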
The cacheable interface
There will also be a new cacheable interface, much like the renderable interface, which can be implemented by a class to mark it as having special cache methods.
The interface will have two required methods. The first will be called when the data is being prepared before being cached, and must return a simple type (stdClass or array) containing the information needed to reinitialize the object when it is reawakened. The second will be a static method that is given the cached data and is expected to reinitialize the object.
A couple of things about this approach:
- This has been done so that objects can take control of what gets cached when they get cached. This allows them to ensure nothing dynamic will be cached and that they don't clutter the cache if they are large structures.
- The interface and specific methods will be used rather than serialization because each cache plugin should have its own say on how to process data right before storing it, and there will be no requirement to serialize data. Some cache programs perform this simplification of data automatically, so we could never rely on the magic serialization methods (__sleep() and __wakeup()) being called.
- It also has the advantage of clearly marking the object as cacheable when other developers are looking at the code.
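The two-method contract described above might look like the following. This is a sketch assuming PHP 8; the method names prepare_to_cache() and wake_from_cache(), the example class course_summary, and the two helper functions (standing in for what cache_manager::prepare_data_for_caching() and parse_cached_data() might do) are all hypothetical.

```php
<?php
// Hypothetical sketch of the cacheable interface and how cached data could be
// prepared and restored around it.

interface cacheable {
    /** Returns simple data (array or stdClass of scalars) describing this object. */
    public function prepare_to_cache();

    /** Rebuilds an object from the data returned by prepare_to_cache(). */
    public static function wake_from_cache($data);
}

class course_summary implements cacheable {
    public function __construct(public int $id, public string $fullname) {}

    public function prepare_to_cache() {
        // Only the data needed to rebuild the object; anything dynamic is left out.
        return ['id' => $this->id, 'fullname' => $this->fullname];
    }

    public static function wake_from_cache($data) {
        return new self($data['id'], $data['fullname']);
    }
}

function prepare_for_store($data) {
    if ($data instanceof cacheable) {
        return ['__class' => get_class($data), '__data' => $data->prepare_to_cache()];
    }
    return $data;
}

function restore_from_store($stored) {
    if (is_array($stored) && isset($stored['__class']) && is_subclass_of($stored['__class'], 'cacheable')) {
        $class = $stored['__class'];
        return $class::wake_from_cache($stored['__data']);
    }
    return $stored;
}
```

Because only plain arrays reach the plugin, each plugin remains free to store the data however suits its cache product.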
Code specific information
Basic details on my early thoughts about the code.
First, please note that from the very start I intended to construct this design purely in an object-oriented fashion. There will be no global functions introduced; everything will be made available through class static methods or through object instance methods.
Directories and files
Unless otherwise stated, the following paths are relative to $CFG->dirroot. Obvious requirements such as a plugin's version and lang files have not been included in this list.
- /cache
- The cache plugin directory, and the home of the files that are essential to the cache solution.
- /cache/lib.php
- Will contain all of the cache API core classes, and will be included in the list of requires presently happening within /lib/setup.php.
- /cache/settings.php
- All of the general settings (and inclusion of external pages) for the cache solution. It will also be responsible for including plugin settings pages if they exist.
- /cache/pluginname
- A directory used to house a cache plugin; pluginname will of course be the name of the plugin. Its component will be cache_pluginname.
- /cache/pluginname/lib.php
- Will contain the plugin class and be included upon initialisation of cache_manager (its first use).
Essential classes and public methods
cache_manager
Modifier | Method | Description |
---|---|---|
public static | instance | Will return an instance of the cache_manager. This instance will be statically stored within the method, as only one cache_manager will ever be required under normal circumstances. However, an override may be required to force re-initialisation, such as after configuration changes. |
public static | prepare_data_for_caching | Helper method to parse data, removing references and interacting with cacheable objects. |
public static | parse_cached_data | When data is brought out of the cache it should be run through this. |
public static | early_get_cache_plugins | Gets all of the cache plugins that are available without using any core functions, so that it can be called early before setup is finished. |
public | add_plugin_instance | For internal use to add a plugin instance to the configuration. |
public | edit_plugin_instance | For internal use to edit a plugin instance already in the configuration. |
public | delete_plugin_instance | For internal use to delete a plugin instance from the configuration. |
public | add_instance_mapping | For internal use to map a plugin instance to a cache type. |
public | delete_instance_mapping | For internal use to remove an instance => cache type mapping. |
public | get_cache_type_default | Returns a cache instance that is the default for a given type. |
public | get_cache_type_instance | Returns the cache instance to use for a given type. Returns the default if none is configured or ready. |
abstract cache_type
Modifier | Method | Description |
---|---|---|
public | get_type | Returns the type of this cache. |
public | is_user_specific | Returns true if the cache type is user specific (request or session). |
public | get | Retrieves data from the cache for the instance. |
public | set | Sends data to the cache for the instance. |
public | has | Checks if data exists in the cache for the instance. |
public | delete | Deletes the data in the cache for the instance. |
public | purge | Purges all information from the entire cache (not just the data specific to an instance). |
public | uses_simple_types_only | If called, a flag is set guaranteeing simple types only (scalars, and arrays or stdClass instances containing scalars). This can be called to avoid the overhead of data parsing, which is not required for simple types under most circumstances. |
abstract cache_plugin
Modifier | Method | Description |
---|---|---|
public static | is_supported | Returns true if the plugin is supported (required things installed). |
public static | is_supported_cache_type | Returns true if the plugin can be used for the given cache type. |
public static | supports_native_ttl | Returns true if the plugin natively supports ttl. If it doesn't, ttl will have to be managed internally. |
public | initialise | Initialises the plugin instance upon its first use. |
public | is_ready | Returns true if this plugin is properly configured and ready to be used. |
public | supports_multikey | Returns true if the plugin supports multiple keys, or if it wants to perform its own key handling. |
public | get | Retrieves the data for the given key from the cache. |
public | set | Adds the given key => value pair to the cache program. |
public | has | Checks if there is data for the given key in the cache. |
public | delete | Deletes the data associated with the given key. |
public | purge | Purges all information from the cache. |
public | close | Gets called when the cache variable is destroyed (e.g. at the end of the request) and allows the plugin to perform any closing tasks and cleanup (closing connections, saving files etc). |
What's to be done
The following is what needs to be done before this can be put up for integration into core, hopefully in time for the release of 2.3.
- Implement the Cache API
- Implement two special cache instances, "static" and "dataroot", to serve as defaults and failover systems when nothing has been configured or the data being cached is unsuitable for the selected solution.
- Implement cache plugins for the following products
- APC
- Memcache
- XCache
- Database
- Static
- File
- Conversion of the following three areas in order of priority
- Lang string cache. This will be the first point of testing for the new system, as the string requirements are predictable and should be significant enough to easily measure and quantify.
- Database meta information, particularly focused around get_table, and get_column.
- Config data, this will be a test of the system to ensure it can be initialized early on in overall system initialization.
- Create an admin tool that can create/edit/delete plugin instances as well as map them to cache types.
- Document the whole system and create tutorials in the docs wiki.
What's to be done after the initial integration
The following are things that can and/or need to be done after integration:
- Convert existing cache and static variables/properties to make use of MUC instead [where applicable]
- Implement support for xxx/db/caches.php files that detail where caches are used and what for (see the next item for why).
- Expand the admin tool so that cache instances can be mapped to caches specified in xxx/db/caches.php files.
Proof of concept
I've created a proof of concept that implements the proposal as at the time of writing.
This can be found on my github account: https://github.com/samhemelryk/moodle/compare/ba3e7df265...poc_cache_1
While the proof of concept doesn't have the admin interfaces for configuring plugin instances, it is largely functional and there are already a couple of hundred PHPUnit tests for it.
Cache terminology
- Cache
- A cache is a collection of data that is kept on hand and made readily available in order to avoid costly fetching and molding upon every request. In this document when I refer to a cache I am referring to an area in code where such information is being dealt with.
- Cache solution
- The solution refers to the MUC implementation and everything it embodies: the API, all of the plugins, the milestones etc.
- Cache API
- The cache API consists of the cache manager, its supporting classes, and methods both public and private (where I mention the public API or outwards API I am referring to the public methods that developers will be able to make use of).
- Cache product
- A 3rd party product that handles the actual caching of information, such as APC or memcache.
- Cache type
- There are three cache types defined in this proposal, each is representative of one way of storing information.
- Cache plugin
- A cache plugin is code that is responsible for bridging the cache API and a particular cache product. For instance, we expect to have an APC cache plugin.
- Cache instance
- A cache instance is a single instance of a cache plugin that has been configured for a specific system. You can have multiple instances of a cache plugin; this is done so that you can direct different cache types to different memcache systems, for instance.