Status API

NOTE: This is a work in progress and is not yet in Moodle core; see https://tracker.moodle.org/browse/MDL-47271

Introduction

The Status API is a simple way for any plugin to declare its own set of health checks. Core aggregates these checks and exposes them in a variety of ways, making it easy to add robust monitoring to a Moodle instance. A variety of core health metrics are defined which monitor: TBA

Why do we need this API in core

There have been at least 3 'local' or 'admin tool' plugins in the wild which specifically do Nagios monitoring of various parts of Moodle, with a particular focus on cron health. The requirement for monitoring of some form is very clear. The main drawback of these approaches is that they are 3rd party plugins, so a component's health check code is maintained completely separately from the component's main code. Each component should be responsible for providing its own definition of health, maintained in parallel with the code it checks. Additionally, there are many monitoring services besides Nagios, and a new core API can be more flexible in how it exposes the overall health status.

Declaring the metrics for a plugin

Each plugin can declare as many metrics as it wants. They are dynamically registered with the Status API via a callback in lib.php:

/admin/tool/task/lib.php:

function tool_task_define_metrics() {
   return array(
       '\tool_task\metric\cronlastrun',
       '\tool_task\metric\scheduledtask',
   );
}
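The aggregation step is not specified yet, so the following is only a standalone sketch of how the declared class names might be collected. The callback naming pattern comes from the example above; the helper collect_metric_classes() and the hard-coded component list are assumptions made for illustration, and real core code would more likely rely on its existing plugin callback discovery.

```php
<?php
// Hypothetical sketch: gather metric class names from plugin callbacks.
// Only the {component}_define_metrics naming pattern comes from this page;
// collect_metric_classes() and the component list are assumptions.

// Stand-ins for the callbacks two plugins would define in their lib.php.
function tool_task_define_metrics() {
    return array(
        '\tool_task\metric\cronlastrun',
        '\tool_task\metric\scheduledtask',
    );
}

function auth_ldap_define_metrics() {
    return array('\auth_ldap\metric\bindlatency');
}

/**
 * Calls each component's define_metrics callback, when it exists,
 * and merges the declared class names into a single list.
 */
function collect_metric_classes(array $components) {
    $classes = array();
    foreach ($components as $component) {
        $callback = $component . '_define_metrics';
        if (function_exists($callback)) {
            $classes = array_merge($classes, $callback());
        }
    }
    // Drop duplicates in case two plugins declare the same class.
    return array_values(array_unique($classes));
}

// mod_forum declares no callback here, so it contributes nothing.
$metrics = collect_metric_classes(array('tool_task', 'auth_ldap', 'mod_forum'));
print_r($metrics);
```

Because the list is built by calling functions at run time, a plugin can return a different set of metrics on each request.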

Because this is a function and not a static file in the db directory, as other approaches have used in the past, each plugin can declare different metrics depending on how Moodle is configured. For instance, auth_ldap could declare a metric that tests bind connection latency or failure, but if LDAP auth is disabled we don't want to know about the metric at all, or show it in any reports or checks.

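/auth/ldap/lib.php:

function auth_ldap_define_metrics() {
   // Only declare the metric while the LDAP auth plugin is enabled.
   // Returning an empty array to mean "no metrics" is an assumption.
   if (!is_enabled_auth('ldap')) {
       return array();
   }
   return array(
       '\auth_ldap\metric\bindlatency',
   );
}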

Implementing a metric

Each metric is an autoloaded class which extends \tool_status\metric_base. A simple metric needs to return the actual value and provide some metadata describing how to interpret that value, including default thresholds.

Conceptually, the 'warning' and 'error' thresholds could be considered to sit outside the metric. For instance, a 3rd party plugin could expose the metrics via a particular protocol (eg MetricBeat and Prometheus have been suggested in the tracker) and apply its own thresholds. However, there is a lot of value in passing the thresholds to the metric while it performs the test; in particular, the metric can use them as an upper limit for timeouts. A metric that can quickly return a value regardless of the thresholds can simply ignore them.

admin/tool/task/classes/metric/cronlastrun.php:

namespace tool_task\metric;

class cronlastrun extends \tool_status\metric_base {

   /**
    * Returns the number of minutes since any scheduled task last ran,
    * or null if no task has ever run.
    */
   public function get_value($warn, $error) {
       global $DB;
       $lastcron = $DB->get_field_sql('SELECT MAX(lastruntime) FROM {task_scheduled}');
       if (!$lastcron) {
           // No scheduled task has run yet, so the value cannot be determined.
           $this->set_state(self::STATE_UNKNOWN);
           return null;
       }
       return (int) floor((time() - $lastcron) / 60);
   }

   /**
    * Returns the default thresholds, in minutes.
    */
   public function get_thresholds() {
       return array(
           'warn' => 10,
           'error' => 60,
       );
   }

}

Metric states

TBA

Most metrics will return an integer or float ordinal value. However, some metrics are boolean. In both cases there may be instances where the value cannot be determined, in which case it is automatically treated as an error.
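Since the states themselves are still TBA, the following is only a sketch of how a value might be mapped onto a state. Apart from STATE_UNKNOWN, which appears in the cronlastrun example, every name here (the class state_sketch, the other constants, evaluate() and is_failure()) is an assumption, as is the convention that higher numeric values are worse.

```php
<?php
// Hypothetical sketch of state evaluation; not the actual metric_base API.
final class state_sketch {
    const STATE_OK = 'ok';
    const STATE_WARN = 'warn';
    const STATE_ERROR = 'error';
    const STATE_UNKNOWN = 'unknown';

    /**
     * Maps a raw metric value onto a state.
     * Assumes that for numeric metrics a higher value is worse.
     */
    public static function evaluate($value, array $thresholds = array()) {
        if ($value === null) {
            // The value could not be determined.
            return self::STATE_UNKNOWN;
        }
        if (is_bool($value)) {
            // Boolean metrics have no thresholds: true is healthy.
            return $value ? self::STATE_OK : self::STATE_ERROR;
        }
        if ($value >= $thresholds['error']) {
            return self::STATE_ERROR;
        }
        if ($value >= $thresholds['warn']) {
            return self::STATE_WARN;
        }
        return self::STATE_OK;
    }

    /**
     * An undetermined value counts as an error for overall health.
     */
    public static function is_failure($state) {
        return $state === self::STATE_ERROR || $state === self::STATE_UNKNOWN;
    }
}

// Thresholds taken from the cronlastrun example: warn at 10, error at 60 minutes.
$thresholds = array('warn' => 10, 'error' => 60);
foreach (array(5, 15, 90, null, true) as $value) {
    echo var_export($value, true), ' => ', state_sketch::evaluate($value, $thresholds), "\n";
}
```

Keeping 'unknown' as a distinct state while still counting it as a failure lets a report explain why a check failed without weakening the rule that an undetermined value is an error.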


Metric dependencies

Because the system is aggregating metrics, and some functionality may be dependent on other systems or plugins, we need a way to declare dependencies between metrics so the system can properly report on the root cause of issues and give succinct actionable messages.
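How dependencies will be declared is not defined yet. As a rough sketch of the idea, assuming each metric can name the metrics it depends on, root causes could be found by ignoring any failing metric that has a failing dependency. The function find_root_causes(), the metric names and the state strings below are all hypothetical.

```php
<?php
// Hypothetical sketch: report only the failing metrics that are not
// explained by a failing dependency. All names here are assumptions.

/**
 * @param array $states    metric name => state string ('ok' means healthy)
 * @param array $dependson metric name => list of metric names it depends on
 * @return array names of failing metrics with no failing direct dependency
 */
function find_root_causes(array $states, array $dependson) {
    $roots = array();
    foreach ($states as $name => $state) {
        if ($state === 'ok') {
            continue;
        }
        $explained = false;
        foreach ($dependson[$name] ?? array() as $dependency) {
            if (($states[$dependency] ?? 'ok') !== 'ok') {
                // A dependency is already failing, so this is a symptom.
                $explained = true;
                break;
            }
        }
        if (!$explained) {
            $roots[] = $name;
        }
    }
    return $roots;
}

$states = array(
    'ldap_bind' => 'error',
    'ldap_usersync' => 'error',
    'cron_lastrun' => 'ok',
);
$dependson = array(
    'ldap_usersync' => array('ldap_bind'),
);

// Only ldap_bind is reported; ldap_usersync fails because of it.
print_r(find_root_causes($states, $dependson));
```

This only inspects direct dependencies, which is enough when every metric along a failing chain is itself reported as failing.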

Monitoring overall status

There are several use cases for monitoring and this API aims to be flexible enough to work with any approach:

1) A simple standalone Moodle with few or no dependencies that wants its own public 'Status' page

2) A Moodle with an external monitoring tool such as Nagios / Icinga, or much simpler tools like Monastic

3) A full-blown 3rd party 'status' page which is user-facing and hosted separately from the main app (eg statuspage.io or cachethq.io)

To accommodate these various architectures the Status API can be used via:

1) An optional simple user-facing HTML page which can be set to public or not.

2) A very lightweight unauthenticated web API which returns 200 or 503 and a summary message

3) A CLI script, which happens to be NRPE compliant but can be used with almost any command line monitoring tool

4) A Moodle web service for querying the overall status, or the full list of metrics and status details

5) Lastly, for any more complex needs, the internal PHP API can be queried from any plugin; for instance, a hypothetical tool_prometheus could translate the internal metric API into the Prometheus protocol.
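To illustrate how one aggregated result could feed both the web API and the CLI script, here is a minimal sketch. The Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard convention; the state strings, the severity ordering, the function names, and the choice to return 200 for a warning are assumptions.

```php
<?php
// Hypothetical sketch: derive an HTTP status and a Nagios-style exit code
// from per-metric states. State names and ordering are assumptions.

// Higher number means worse; 'unknown' ranks near 'error' because an
// undetermined value is treated as an error.
const SEVERITY = array('ok' => 0, 'warn' => 1, 'unknown' => 2, 'error' => 3);

/**
 * Returns the worst state found in a list of metric states.
 */
function overall_state(array $states) {
    $worst = 'ok';
    foreach ($states as $state) {
        if (SEVERITY[$state] > SEVERITY[$worst]) {
            $worst = $state;
        }
    }
    return $worst;
}

/**
 * Lightweight web API: 200 while the site is usable, otherwise 503.
 * Treating a warning as 200 is an assumption.
 */
function http_status_for($state) {
    return ($state === 'ok' || $state === 'warn') ? 200 : 503;
}

/**
 * CLI script: standard Nagios plugin exit codes.
 */
function nagios_exit_code_for($state) {
    $codes = array('ok' => 0, 'warn' => 1, 'error' => 2, 'unknown' => 3);
    return $codes[$state];
}

$states = array('cronlastrun' => 'warn', 'bindlatency' => 'ok');
$overall = overall_state($states);
echo strtoupper($overall), ': HTTP ', http_status_for($overall),
    ', exit code ', nagios_exit_code_for($overall), "\n";
```

A real CLI script would finish by passing the exit code to exit(), and the web endpoint would send the status with http_response_code().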