The Moodle Universal Cache (MUC)

From MoodleDocs
Revision as of 01:14, 18 June 2014 by Victor Martinez (talk | contribs) (fix some typos)
Project state Stage 1 completed, stage 2 in progress
Tracker issue MDL-25290 and MDL-34224
Discussion
Assignee Sam Hemelryk

Moodle 2.4


Foreword

Over the past couple of weeks I have used my available time to look into a caching solution for Moodle.
The concept is not a new one and for the past years people have constantly come back to the need for a proper caching solution, particularly one that allows for a shared cache.
The tracker issue MDL-25290 has been set up to follow progress on finding a solution, and in fact Tony Levi plus others have written a community proposal and made initial headway in code (see Caching_system_(proposed)).

I've now read through their proposal and solution, looked into the different caching systems that are available, looked into what other web solutions are doing, and researched the needs that we have in Moodle. From that I have planned the solution that makes up this proposal.

The existing proposal

Having read through the current proposal and looked at the code created by Tony, I can tell you right now that they are off to a very good start. My research suggests that we need to take it further again; however, if you look at the existing solution you will notice that this proposal takes a lot of what they already have and expands upon it.

Requirements

The following are the requirements that this new solution hopes to satisfy.

  • Add a shared (application) cache.
  • Add a static cache to replace static variables presently in use.
  • Suitable to replace existing cache solutions with equal or better performance (the lang string cache will be the primary test case).
  • Integrate with and support existing cache products.
  • Keep future caching products in mind so that they can be supported later.
  • Have a relatively simple, self-explanatory nature to aid developers in making the most of it.
  • The cache product integration should also be relatively simple and self-explanatory to ensure new cache product integrations are easy to write.
  • Have suitable fail over systems so that things continue to work even when the caches are unavailable.
  • Able to be initialized early on in system initialization (before config, for example, so that we can cache that).
  • Support a time to live on data where applicable, so that developers can determine how long information can live in the cache before it is deemed out of date and needs to be generated again.

Outline

This section contains information on the key ideas and structures that will form the basis for the cache solution.

Cache types

The first thing to understand is that there are three cache types that have been defined and will be addressed by this solution. Each cache type represents a separate way in which information can be cached, and while each is operated identically, each will function completely differently.

These cache types are one of the few things that developers will need to understand about the cache solution. When defining every cache a cache type needs to be selected for it. That cache type will determine who the cached information is available to and for how long the cache instance will live.

For each type there will be a representative class that the developer can instantiate for a cache and work with to retrieve and set data. This is as far as the developer will usually need to be familiar with the Cache API; everything they require to work with a cache will be made available through the cache type objects.

The three cache types are detailed below:

Request cache

You can think of the request cache type like a static property. It is alive for the duration of the request and cleared at the end of every request. It is also important to point out that this cache is of course user specific, so information stored in the cache will only be accessible to the user who put it there in the first place.

This cache type will be used to replace static caches currently used throughout Moodle (think static variables, static properties, and global variables).

If either no cache instance has been assigned to the request cache type, or the data cannot be stored in the defined cache instance, then a static memory based cache will be used by default.
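The fallback behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names sketch_static_store, sketch_refusing_store and sketch_request_cache are hypothetical and are not part of the proposal, and a store is assumed to signal failure by returning false from set().

```php
<?php
// Hypothetical illustration only; none of these names are part of the proposal.

/** In-memory store that lives only for the current PHP request. */
class sketch_static_store {
    private $data = array();

    public function set($key, $value) {
        $this->data[$key] = $value;
        return true;
    }

    public function get($key) {
        // Return false on a miss.
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }
}

/** Stand-in for a configured instance that cannot store the data it is given. */
class sketch_refusing_store {
    public function set($key, $value) {
        return false;
    }

    public function get($key) {
        return false;
    }
}

/** Request cache that falls back to static memory when needed. */
class sketch_request_cache {
    private $primary;
    private $fallback;

    public function __construct($primary = null) {
        // $primary stands in for a configured instance; null means none was assigned.
        $this->primary = $primary;
        $this->fallback = new sketch_static_store();
    }

    public function set($key, $value) {
        if ($this->primary !== null && $this->primary->set($key, $value)) {
            return true;
        }
        // No instance assigned, or the instance could not store the data.
        return $this->fallback->set($key, $value);
    }

    public function get($key) {
        if ($this->primary !== null) {
            $value = $this->primary->get($key);
            if ($value !== false) {
                return $value;
            }
        }
        return $this->fallback->get($key);
    }
}

$unconfigured = new sketch_request_cache();
$unconfigured->set('userprefs', array('lang' => 'en'));

$refused = new sketch_request_cache(new sketch_refusing_store());
$refused->set('userprefs', array('lang' => 'fr'));
```

In both cases the data ends up in the static store, so the calling code never needs to know whether a real instance was available.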

Session cache

The session cache is, as the name suggests, similar to the session made available through PHP. It is alive for the duration of the user's session and is of course user specific. At the end of the user's session the information will be purged (by means of cron or similar).

This is probably the most limited cache type. However, the idea behind it is to separate out information that would otherwise be blowing out the current session size, and to allow an installation to specify an instance dedicated to its storage.

Application

This is the most prominent and beneficial cache type as there is nothing like it presently within Moodle. The application cache type is a shared cache that all users can access, making common data accessible to all without the need for every user to generate it on every request, or to generate it once and store it in a user specific cache.

The life time of information within the cache will be determined by a time to live that will be specified by the developer, and will default when not specified to a configuration setting.
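A minimal sketch of how a time to live with a configurable default might behave is shown below. The class sketch_ttl_cache, its constructor arguments and the injectable clock are assumptions made for illustration; the proposal does not specify how the time to live is passed or stored.

```php
<?php
// Hypothetical illustration only; the proposal does not define this class.

class sketch_ttl_cache {
    private $entries = array();
    private $defaultttl;
    private $clock;

    /**
     * @param int $defaultttl Seconds to keep data when no ttl is given (stands in for the configuration setting).
     * @param callable $clock Returns the current timestamp; injectable so the example is deterministic.
     */
    public function __construct($defaultttl, $clock = 'time') {
        $this->defaultttl = $defaultttl;
        $this->clock = $clock;
    }

    public function set($key, $value, $ttl = null) {
        // Fall back to the configured default when the developer gives no ttl.
        $ttl = ($ttl === null) ? $this->defaultttl : $ttl;
        $this->entries[$key] = array(
            'value' => $value,
            'expires' => call_user_func($this->clock) + $ttl,
        );
    }

    public function get($key) {
        if (!isset($this->entries[$key])) {
            return false;
        }
        if (call_user_func($this->clock) >= $this->entries[$key]['expires']) {
            // Out of date: drop the entry so the caller regenerates it.
            unset($this->entries[$key]);
            return false;
        }
        return $this->entries[$key]['value'];
    }
}

$now = 1000;
$clock = function() use (&$now) {
    return $now;
};

$cache = new sketch_ttl_cache(60, $clock);
$cache->set('strings', 'A');     // Uses the default ttl of 60 seconds.
$cache->set('config', 'B', 10);  // Developer-specified ttl of 10 seconds.

$now = 1030;                     // 30 seconds later.
$config = $cache->get('config');
$strings = $cache->get('strings');
```

After 30 seconds the entry with the 10 second ttl has expired while the entry using the default is still served.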

Cache plugins

The second most important thing to understand is that a new plugin type will be introduced into Moodle: the cache plugin. The cache plugins' base directory will be $CFG->dirroot/cache and each plugin will have its own subdirectory there. Initially we are planning to produce plugins for APC, Memcache, XCache, Database, Static, and File, each of which will have its own subdirectory.

Each plugin will define its own cache plugin class that will be required to extend an abstract class or interface that will map out all of the required methods of the plugin.

Also be aware that there can be multiple instances of a plugin. For example, you could have two instances of a cache product set up, one a limited setup on the web server and another running on remote machines. You could then map each instance to a particular cache type, or in the future perhaps even to a specific cache.

This is important for a couple of reasons, first that it will allow load balancing and/or prioritization to occur, and second that it will allow us to further what we can do with the cache solution in the future (after initial integration).
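To make the idea of several instances of one plugin mapped to different cache types more concrete, here is a rough sketch. The array layout, the instance names, the server addresses and the function sketch_instance_for_type() are all hypothetical; the proposal does not define the shape of this data.

```php
<?php
// Hypothetical illustration only; the proposal does not define this data layout.

// Two instances of the same plugin: one local to the web server, one on remote machines.
$instances = array(
    'local_memcache' => array(
        'plugin' => 'memcache',
        'servers' => array('127.0.0.1:11211'),
    ),
    'remote_memcache' => array(
        'plugin' => 'memcache',
        'servers' => array('10.0.0.5:11211', '10.0.0.6:11211'),
    ),
);

// Each cache type is mapped to one configured instance.
$mappings = array(
    'session' => 'local_memcache',
    'application' => 'remote_memcache',
);

/**
 * Returns the name of the instance to use for a cache type,
 * or the given default when nothing usable has been mapped.
 */
function sketch_instance_for_type($type, array $mappings, array $instances, $default = 'static') {
    if (isset($mappings[$type]) && isset($instances[$mappings[$type]])) {
        return $mappings[$type];
    }
    return $default;
}

$forapplication = sketch_instance_for_type('application', $mappings, $instances);
$forrequest = sketch_instance_for_type('request', $mappings, $instances);
```

Here the application cache goes to the remote instance, while the request cache, which has no mapping, falls back to the default.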

The cache manager

The cache manager will tie everything together. It will be responsible for initializing cache instances when required, making general information available, and publishing static methods to interact with the cache solution as a whole.
This will include publishing events for the following:

  • Available cache plugins
  • Available configured cache instances
  • Maximum data and/or cache size
  • Setting and creating the configuration file
  • Retrieving and processing the configuration file

Although the cache_manager will be tying everything together, a developer wishing to use the cache API will not in fact need to know anything about it. To keep the cache API straightforward and simple, the most common functionality will be wrapped for convenience into several controlling classes, one for each cache type, that will make storing and retrieving data from a cache a very simple task.

In this way the uses of the cache manager within core code will likely be internal use by the cache API itself, and core code where the advanced parts of the API will be required such as configuring instances, or purging caches on cron or through the existing admin interface.

Logic separation

The logic for the cache API will be separated into three very clear, separate sections. These sections are API use, core logic, and plugin development. These three highlight the very important idea that when using or working with this API, the developer need only know about the section they are working with.

API use
When using the API the developer will not know, or need to know anything at all about cache plugins that have been installed, what instances are available for each of those plugins, or even where their information is going to be stored. The developer will work with a cache type class (more on those later) that gets initialized with the details of the cache and can then be interacted with simply. All the developer needs to know is which cache type they should be making use of. Everything else will be taken care of internally by the core logic, and actual caching and retrieving by the cache plugin (that the developer knows nothing about).
Core logic
The core logic refers to whatever is controlling the cache (the cache_manager), which will be aware of what plugins are available and what instances exist for each plugin. It is responsible for managing those plugins, the plugin instances, and any cache events (such as purging all caches). This may sound like a very significant part of the cache solution, and that is because it is. However, it's very important to recognise that all of those management and general features are going to be rarely used by the developer. In fact I would expect very few people to ever have to utilise the cache manager and this section of the API. It is more for internal and edge case core use than anything else.
Plugin development
The development of cache plugins is an essential part of both the solution and the API. In developing a plugin the developer will not need to pay heed to the cache types or the cache_manager controller. In fact the plugin they develop will only need to publish methods for the essential interaction with the cache: get, set, delete etc. It will be provided with only the already parsed, ready-to-use information required for those operations, and will not need to do anything but interact with the cache. Everything else will be taken care of internally by the core logic.
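As a rough sketch of how small a plugin could be, the following shows a file-backed plugin that only implements the essential operations. The names sketch_cache_plugin and sketch_cache_plugin_file, the use of serialize(), and the temporary directory are assumptions for illustration and are not taken from the proposal.

```php
<?php
// Hypothetical illustration only; class names and storage layout are assumptions.

abstract class sketch_cache_plugin {
    abstract public function get($key);
    abstract public function set($key, $value);
    abstract public function has($key);
    abstract public function delete($key);
    abstract public function purge();
}

/** Stores each key as one file inside a directory. */
class sketch_cache_plugin_file extends sketch_cache_plugin {
    private $dir;

    public function __construct($dir) {
        $this->dir = rtrim($dir, '/');
        if (!is_dir($this->dir)) {
            mkdir($this->dir, 0777, true);
        }
    }

    private function path($key) {
        // Hash the key so any string is safe to use as a file name.
        return $this->dir . '/' . sha1($key) . '.cache';
    }

    public function get($key) {
        $path = $this->path($key);
        if (!is_readable($path)) {
            return false;
        }
        return unserialize(file_get_contents($path));
    }

    public function set($key, $value) {
        return file_put_contents($this->path($key), serialize($value), LOCK_EX) !== false;
    }

    public function has($key) {
        return is_readable($this->path($key));
    }

    public function delete($key) {
        $path = $this->path($key);
        return is_file($path) ? unlink($path) : false;
    }

    public function purge() {
        $files = glob($this->dir . '/*.cache');
        foreach ($files ? $files : array() as $file) {
            unlink($file);
        }
        return true;
    }
}

$plugin = new sketch_cache_plugin_file(sys_get_temp_dir() . '/muc_sketch_' . getmypid());
$plugin->set('course_5', array('fullname' => 'Physics 101'));
$fetched = $plugin->get('course_5');
```

Note that the plugin knows nothing about cache types or the cache manager; it only receives a key and a value.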

Controlling interaction

As mentioned above, for convenience and ease of use, access to the cache API will be controlled by several classes, one for each cache type. All of these classes will operate in the same way (extending the same parent) and will internally make any allowances required for the cache type.

The cache type classes will be designed to be initialized for each bit of data the developer wishes to be cached. Once initialized simple methods are available on the object to get, set, check whether the cache has the information, and delete information from the cache.
As this is more easily understood through a code example, take a peek at the following snippet, which illustrates how I see interaction occurring.

   $courseid = required_param('courseid', PARAM_INT);
   $cache = new cache_application('core_course', 'modinfo', array('id' => $courseid));
   if (!$modinfo = $cache->get($courseid)) {
       // Cache miss: build the data and store it for next time.
       $course = $DB->get_record('course', array('id' => $courseid), '*', MUST_EXIST);
       $modinfo = get_fast_modinfo($course);
       $cache->set($courseid, $modinfo);
   }

As you can see, once the object has been initialized for the cache it's as simple as calling get() to retrieve the information, or set() to store it.

As there are three cache types proposed, the following classes will be created to control interaction:

cache_request
For working with a request cache.
cache_session
For working with a session cache.
cache_application
For working with an application wide cache.
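The relationship between the three classes and their shared parent might look something like the sketch below. All names are prefixed with sketch_ to make clear they are hypothetical, and the shared in-memory array merely stands in for whatever cache instance the core logic would select.

```php
<?php
// Hypothetical illustration only; a shared array stands in for real cache instances.

abstract class sketch_cache_type {
    /** Stand-in storage shared by every cache object in this sketch. */
    protected static $store = array();
    private $prefix;

    public function __construct($component, $area, array $identifiers = array()) {
        // Sort identifiers so the same set always produces the same prefix.
        ksort($identifiers);
        $this->prefix = $this->get_type() . '/' . $component . '/' . $area . '/' . http_build_query($identifiers);
    }

    abstract public function get_type();

    public function is_user_specific() {
        // Request and session caches belong to a single user.
        return $this->get_type() !== 'application';
    }

    public function set($key, $data) {
        self::$store[$this->prefix . '/' . $key] = $data;
        return true;
    }

    public function get($key) {
        $fullkey = $this->prefix . '/' . $key;
        return array_key_exists($fullkey, self::$store) ? self::$store[$fullkey] : false;
    }

    public function has($key) {
        return array_key_exists($this->prefix . '/' . $key, self::$store);
    }

    public function delete($key) {
        unset(self::$store[$this->prefix . '/' . $key]);
        return true;
    }
}

class sketch_cache_request extends sketch_cache_type {
    public function get_type() {
        return 'request';
    }
}

class sketch_cache_session extends sketch_cache_type {
    public function get_type() {
        return 'session';
    }
}

class sketch_cache_application extends sketch_cache_type {
    public function get_type() {
        return 'application';
    }
}

$appcache = new sketch_cache_application('core_course', 'modinfo', array('id' => 5));
$appcache->set('5', 'modinfo for course 5');
$requestcache = new sketch_cache_request('core_course', 'modinfo', array('id' => 5));
```

Because all of the behaviour lives in the parent, the three subclasses differ only in the type they report, which is what lets a developer treat them identically.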

Configuration

In discussing this proposal with Eloy, this was one area we highlighted as a small but critical area of the cache solution. Initially I had planned to make use of the main database, introducing new tables to record information on plugin instances and their mappings to cache types.
However, in discussing this we decided that there were three problems with it. First, it would introduce overhead, as we would need to execute a couple of queries to build the required information. Second, the information being stored would itself be a candidate for caching. Third, it would require the setup code for Moodle to be initialized past the point of loading configuration before the cache code would be usable, preventing us from caching the configuration, which is one of the candidates for caching.

Our solution to this: the configuration for the caches will be stored in the dataroot as a PHP file, similar to what we do with the current lang cache.
From there it will be included by the cache API and parsed when the API is first used in a page.

This negates the requirement for additional database tables and interaction, and will allow us to utilize the cache solution much earlier during setup, early enough to allow us to cache configuration.
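A minimal sketch of storing configuration as an includable PHP file is given below. The function names, the file name and the 'instances'/'mappings' keys are assumptions made for illustration; a temporary directory stands in for the dataroot.

```php
<?php
// Hypothetical illustration only; function names and array keys are assumptions.

/** Writes the configuration as a PHP file that returns an array. */
function sketch_write_cache_config($file, array $config) {
    $content = "<?php\n// Generated file, do not edit by hand.\nreturn " . var_export($config, true) . ";\n";
    return file_put_contents($file, $content, LOCK_EX) !== false;
}

/** Reads the configuration without touching the database. */
function sketch_read_cache_config($file) {
    if (!is_readable($file)) {
        // Nothing configured yet: return an empty configuration.
        return array('instances' => array(), 'mappings' => array());
    }
    return include($file);
}

// A temporary directory stands in for the dataroot here.
$file = sys_get_temp_dir() . '/muc_sketch_config_' . getmypid() . '.php';
$config = array(
    'instances' => array('default_file' => array('plugin' => 'file')),
    'mappings' => array('application' => 'default_file'),
);
sketch_write_cache_config($file, $config);
$loaded = sketch_read_cache_config($file);
```

Since reading is a plain include, no database connection or other setup is required, which is what allows the cache to be used before the main configuration is loaded.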

The cacheable interface

There will also be a new cacheable interface, much like the renderable interface, which can be given to a class to mark it as having special cache methods.
The interface will have two required methods. The first will be called when the data is being parsed before being cached, and must return a simple type (stdClass or array) containing the information that will be used to reinitialize the object when it is reawakened. The second will be a static method that is given the cached data and is expected to reinitialize the object.

A couple of things about this approach:

  1. This has been done so that objects can take control of what gets cached when they get cached. This allows them to ensure nothing dynamic will be cached and that they don't clutter things if they are large structures.
  2. The interface and specific methods will be used rather than utilizing serialization etc because each cache plugin should have its own say on how to process data right before storing it and there will be no requirement to serialize data. This simplification of data happens automatically by some cache programs so there is no way we could ever rely on the magic serialize methods.
  3. It also has the advantage of clearly marking the object as cacheable when other developers are looking at the code.
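The two-method interface described above could be sketched as follows. The interface name sketch_cacheable and the method names prepare_to_cache() and wake_from_cache() are assumptions chosen for illustration, as the proposal does not name them, and sketch_course_summary is an invented example class.

```php
<?php
// Hypothetical illustration only; interface and method names are assumptions.

interface sketch_cacheable {
    /** Returns a simple type (array or stdClass) describing the object. */
    public function prepare_to_cache();

    /** Rebuilds the object from the data returned by prepare_to_cache(). */
    public static function wake_from_cache($data);
}

class sketch_course_summary implements sketch_cacheable {
    private $id;
    private $fullname;
    /** Dynamic, per-request state that must never reach the cache. */
    private $rendered = null;

    public function __construct($id, $fullname) {
        $this->id = $id;
        $this->fullname = $fullname;
    }

    public function get_fullname() {
        return $this->fullname;
    }

    public function prepare_to_cache() {
        // Only the stable data is handed to the cache.
        return array('id' => $this->id, 'fullname' => $this->fullname);
    }

    public static function wake_from_cache($data) {
        return new self($data['id'], $data['fullname']);
    }
}

$original = new sketch_course_summary(5, 'Physics 101');
$stored = $original->prepare_to_cache();
$restored = sketch_course_summary::wake_from_cache($stored);
```

The object decides exactly what is stored, and the cache plugin only ever sees a plain array.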

Code specific information

Basic details on my early thoughts about the code.

First up, please note that from the very start I have intended to construct this design purely in an object-oriented fashion. There will be no global functions introduced; everything will be made available through class static methods or through object instance methods.

Directories and files

The following, unless otherwise stated, make use of $CFG->dirroot. Obvious requirements such as a plugin's version and lang files have not been included in this list.

/cache
The cache plugin directory, and the home of the files that are essential to the cache solution.
/cache/lib.php
Will contain all of the cache API core classes, and will be included in the list of requires presently happening within /lib/setup.php.
/cache/settings.php
All of the general settings (and inclusion of external pages) for the cache solution. It will also be responsible for including plugin settings pages if they exist.
/cache/pluginname
A directory used to house a cache plugin; pluginname will of course be the name of the plugin. Its component will be cache_pluginname.
/cache/pluginname/lib.php
Will contain the plugin class and be included upon initialisation of cache_manager (its first use).

Essential classes and public methods

cache_manager
static instance: Will return an instance of the cache_manager. This instance will be statically stored within the method, as only one cache_manager will ever be required under normal circumstances. However, an override may be required to force a re-initialisation, such as for configuration changes.
static prepare_data_for_caching: Helper method to parse data, removing references and interacting with cacheable objects.
static parse_cached_data: When data is brought out of the cache it should be run through this.
static early_get_cache_plugins: Gets all of the cache plugins that are available without using any core functions, so that it can be called early before setup is finished.
add_plugin_instance: For internal use to add a plugin instance to the configuration.
edit_plugin_instance: For internal use to edit a plugin instance already in the configuration.
delete_plugin_instance: For internal use to delete a plugin instance from the configuration.
add_instance_mapping: For internal use to map a plugin instance to a cache type.
delete_instance_mapping: For internal use to remove an instance => cache type mapping.
get_cache_type_default: Returns a cache instance that is the default for a given type.
get_cache_type_instance: Returns the cache instance to use for a given type. Returns the default if none is configured or ready.
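The "statically stored within the method" behaviour of the instance method, together with an override to force re-initialisation, might be sketched like this. The class name sketch_cache_manager and the $forcereload parameter are hypothetical.

```php
<?php
// Hypothetical illustration only; the $forcereload parameter is an assumption.

class sketch_cache_manager {
    /**
     * Returns the shared manager, creating it on first use.
     *
     * @param bool $forcereload When true, discards the stored manager and builds a new one,
     *                          for example after the configuration has changed.
     */
    public static function instance($forcereload = false) {
        // A static variable inside the method keeps its value between calls.
        static $instance = null;
        if ($instance === null || $forcereload) {
            $instance = new sketch_cache_manager();
        }
        return $instance;
    }
}

$first = sketch_cache_manager::instance();
$second = sketch_cache_manager::instance();
$reloaded = sketch_cache_manager::instance(true);
```

Here $first and $second are the same object, while $reloaded is a fresh one that subsequent calls will then return.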


abstract cache_type
get_type: Returns the type of this cache type.
is_user_specific: Returns true if the cache type is user specific (request or session).
get: Retrieves data from the cache for the instance.
set: Sends data to the cache for the instance.
has: Checks if data exists in the cache for the instance.
delete: Deletes the data in the cache for the instance.
purge: Purges all information from the entire cache (not just the data specific to an instance).
uses_simple_types_only: If called, a flag is set to guarantee simple types only (scalars, and array or stdClass instances containing scalars). This can be called to avoid the overhead of data parsing, which is not required for simple types under most circumstances.
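As a rough illustration of what "simple types only" could mean in practice, the helper below checks a value against that description. The function name is hypothetical, and two choices are assumptions not stated in the proposal: null is treated as simple, and arrays or stdClass instances may be nested.

```php
<?php
// Hypothetical illustration only; treating null and nested structures as simple is an assumption.

/**
 * Returns true if $data consists only of scalars (or null),
 * arrays, and stdClass instances that themselves contain simple data.
 */
function sketch_is_simple_type($data) {
    if ($data === null || is_scalar($data)) {
        return true;
    }
    if ($data instanceof stdClass) {
        // Inspect the public properties of the plain object.
        $data = get_object_vars($data);
    }
    if (is_array($data)) {
        foreach ($data as $value) {
            if (!sketch_is_simple_type($value)) {
                return false;
            }
        }
        return true;
    }
    // Any other object or a resource would need parsing before caching.
    return false;
}

$record = new stdClass();
$record->id = 5;
$record->fullname = 'Physics 101';

$simple = sketch_is_simple_type(array('course' => $record, 'visible' => true));
$notsimple = sketch_is_simple_type(array('created' => new DateTime()));
```

Data that passes such a check could be handed to a plugin as-is, which is where the saving in parsing overhead would come from.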


abstract cache_plugin
static is_supported: Returns true if the plugin is supported (required things installed).
static is_supported_cache_type: Returns true if the plugin can be used for the given cache type.
static supports_native_ttl: Returns true if the plugin natively supports ttl. If it doesn't then ttl will have to be managed internally.
initialise: Initialises the plugin instance upon its first use.
is_ready: Returns true if this plugin is properly configured and ready to be used.
supports_multikey: Returns true if the plugin supports multiple keys, or if it wants to perform its own key handling.
get: Retrieves the data for the given key from the cache.
set: Adds the given key => value pair to the cache program.
has: Checks if there is data for the given key in the cache.
delete: Deletes the data associated with the given key.
purge: Purges all information from the cache.
close: Gets called when the cache variable is destroyed (e.g. at the end of the request) and allows the plugin to perform any closing tasks and cleanup (closing connections, saving files etc).

What's to be done

The following is what needs to be done before this can be put up for integration into core, hopefully in time for the release of 2.3:

  • Implement the Cache API
  • Implement two special cache instances, "static" and "dataroot", to serve as defaults and fail over systems when nothing has been configured or the data being cached is unsuitable for the selected solution.
  • Implement cache plugins for the following products
    • APC
    • Memcache
    • XCache
    • Database
    • Static
    • File
  • Conversion of the following three areas in order of priority
    1. Lang string cache. This will be the first point of testing for the new system, as the string requirements are predictable and should be significant enough to easily measure and quantify.
    2. Database meta information, particularly focused around get_table, and get_column.
    3. Config data, this will be a test of the system to ensure it can be initialized early on in overall system initialization.
  • Create an admin tool that can create/edit/delete plugin instances as well as map them to cache types.
  • Document the whole system and create tutorials in the docs wiki.

What's to be done after the initial integration

The following are things that can and/or need to be done after integration:

  • Convert existing cache and static variables/properties to make use of MUC instead [where applicable]
  • Implement support for xxx/db/caches.php files that detail where caches are used and what for (read the next to do to see why)
  • Expand the admin tool so that cache instances can be mapped to caches specified in xxx/db/caches.php files.

Proof of concept

I've created a proof of concept that implements the proposal as at the time of writing.
This can be found on my github account: https://github.com/samhemelryk/moodle/compare/ba3e7df265...poc_cache_1

While the proof of concept doesn't have the admin interfaces for configuring plugin instances, it is largely functional and there are a couple of hundred PHPUnit tests for it already.

Cache terminology

Cache
A cache is a collection of data that is kept on hand and made readily available in order to avoid costly fetching and molding upon every request. In this document when I refer to a cache I am referring to an area in code where such information is being dealt with.
Cache solution
The solution refers to the MUC implementation and everything it embodies: the API, all of the plugins, the milestones etc.
Cache API
The cache API consists of the cache manager, its supporting classes, and methods both public and private (Where I mention public API or outwards API I am referring to the public methods that developers will be able to make use of).
Cache product
A 3rd party product that handles the actual caching of information, such as APC or memcache.
Cache type
There are three cache types defined in this proposal, each is representative of one way of storing information.
Cache plugin
A cache plugin is code that is responsible for bridging the cache API and a particular cache product. For instance we expect to have an APC cache plugin.
Cache instance
A cache instance is a single instance of a cache plugin that has been configured for a specific system. You can have multiple instances of a cache plugin; this is done so that you can, for instance, direct different cache types to different memcache systems.