Machine learning backends

Latest revision as of 05:34, 31 May 2022

Important:

The content of this page has been updated and migrated to the new Moodle Developer Resources. The information on this page should no longer be considered up to date.

Introduction

Machine learning backends process the datasets generated from the indicators and targets calculated by the Analytics API. They are used for machine learning training, prediction and model evaluation. It is worth reading the Analytics API page as well: it defines the relevant concepts, explains how they are implemented in Moodle and shows how machine learning backend plugins fit into the Analytics API.

Machine learning backends and Moodle communicate through files, because the code that processes the dataset can be written in PHP, Python or other languages, or can even use cloud services. This needs to scale, so backends are expected to handle large files and, if necessary, to train algorithms by reading input files in batches.
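
Reading an input file in batches, as mentioned above, can be sketched in Python as follows. This is only an illustration: it assumes a plain CSV file with a single header row, which may not match the real dataset layout, and read_batches is a hypothetical helper rather than part of any Moodle package.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO


def read_batches(handle: TextIO, batch_size: int) -> Iterator[List[List[str]]]:
    """Yield lists of at most batch_size data rows from a CSV file handle."""
    reader = csv.reader(handle)
    next(reader, None)  # Assumption: the file starts with one header row.
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# An in-memory file stands in for a large dataset stored on disk.
sample = io.StringIO("f1,f2,target\n0.1,0.5,1\n0.9,0.2,0\n0.4,0.4,1\n")
batch_sizes = [len(batch) for batch in read_batches(sample, batch_size=2)]
print(batch_sizes)
```

Because the generator only holds one batch in memory at a time, the same loop works for files that are much larger than the available memory.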

Machine learning backends are a new Moodle plugin type. They are stored in lib/mlbackend, where you can add your own plugins.

Backends included in Moodle core

The PHP backend is the default predictions processor because it is written in PHP and has no external dependencies. It uses logistic regression.
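
To illustrate the technique, here is a minimal logistic regression trained with stochastic gradient descent, written in Python. It is a sketch of the general algorithm, not the PHP backend's actual implementation; the function names, learning rate, epoch count and sample data are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples: Sequence[Sequence[float]], labels: Sequence[int],
                   lr: float = 0.5, epochs: int = 1000) -> Tuple[List[float], float]:
    """Fit weights and a bias by per-sample gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            predicted = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = predicted - y  # Gradient of the log loss w.r.t. the linear output.
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return class 1 when the estimated probability is at least 0.5."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical one-indicator dataset: low values are class 0, high values class 1.
samples = [[0.1], [0.2], [0.8], [0.9]]
labels = [0, 0, 1, 1]
weights, bias = train_logistic(samples, labels)
print([predict(weights, bias, x) for x in samples])
```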

The Python backend requires a python binary (either Python 2 or Python 3) and the moodlemlbackend Python package, which is maintained by Moodle HQ. The Python version and the versions of its libraries are very important; we recommend Python 3.7 for mlbackend 3.x. The package is based on Google's TensorFlow library and uses a feed-forward neural network with a single hidden layer. moodlemlbackend stores model performance information that can be visualised using TensorBoard. Information generated during model evaluation is available through the models management page, under each model's Actions > Log menu. The moodlemlbackend source code is available at https://github.com/moodlehq/moodle-mlbackend-python.
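
The forward pass of a feed-forward network with a single hidden layer can be sketched in plain Python as below. This does not reproduce the moodlemlbackend code or use TensorFlow: the ReLU and sigmoid activations, the layer sizes and the hand-picked weights are assumptions chosen only to show the structure.

```python
import math
from typing import List, Sequence


def dense(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def forward(features, hidden_w, hidden_b, out_w, out_b) -> float:
    """Input -> hidden layer (ReLU) -> single output unit (sigmoid)."""
    hidden = relu(dense(features, hidden_w, hidden_b))
    return sigmoid(dense(hidden, [out_w], [out_b])[0])


# Hypothetical weights for two input features and two hidden units.
hidden_w = [[1.0, -1.0], [-1.0, 1.0]]
hidden_b = [0.0, 0.0]
out_w = [2.0, -2.0]
out_b = 0.0
probability = forward([0.8, 0.2], hidden_w, hidden_b, out_w, out_b)
print(round(probability, 3))
```

In a real backend the weights would be learned during training rather than written by hand.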

The Python backend is recommended over the PHP backend because it predicts more accurately and is faster.

Interfaces

A summary of the purpose of these interfaces:

Evaluate a provided prediction model
Train machine learning algorithms with the existing site data
Predict targets based on previously trained algorithms

Predictor

This is the basic interface to be implemented by machine learning backends. There are two main types of predictor: classifiers and regressors. Both are supervised algorithms, and each type includes methods to train, predict and evaluate datasets. We provide the Regressor interface, but it is not currently implemented by the core machine learning backends.

You can use is_ready to check that the backend is available.

   /**
    * Is it ready to predict?
    *
    * @return bool
    */
   public function is_ready();

The purpose of clear_model and delete_output_dir is to clean up the data created by the machine learning backend.

   /**
    * Delete all stored information of the current model id.
    *
    * This method is called when there are important changes to a model,
    * all previous training algorithms using that version of the model
    * should be deleted.
    *
    * @param string $uniqueid The site model unique id string
    * @param string $modelversionoutputdir The output dir of this model version
    * @return null
    */
   public function clear_model($uniqueid, $modelversionoutputdir);

   /**
    * Delete the output directory.
    *
    * This method is called when a model is completely deleted.
    *
    * @param string $modeloutputdir The model directory id (parent of all model versions subdirectories).
    * @param string $uniqueid The site model unique id string
    * @return null
    */
   public function delete_output_dir($modeloutputdir, $uniqueid);

Classifier

A classifier sorts input into two or more categories, based on analysis of the indicators. This is frequently used in binary predictions, e.g. course completion vs. dropout. This machine learning algorithm is "supervised": It requires a training data set of elements whose classification is known (e.g. courses in the past with a clear definition of whether the student has dropped out or not). This is an interface to be implemented by machine learning backends that support classification. It extends the Predictor interface.

Implementations must provide these methods as well as the Predictor methods.

   /**
    * Train this processor classification model using the provided supervised learning dataset.
    *
    * @param string $uniqueid
    * @param \stored_file $dataset
    * @param string $outputdir
    * @return \stdClass
    */
   public function train_classification($uniqueid, \stored_file $dataset, $outputdir);

   /**
    * Classifies the provided dataset samples.
    *
    * @param string $uniqueid
    * @param \stored_file $dataset
    * @param string $outputdir
    * @return \stdClass
    */
   public function classify($uniqueid, \stored_file $dataset, $outputdir);

   /**
    * Evaluates this processor classification model using the provided supervised learning dataset.
    *
    * @param string $uniqueid
    * @param float $maxdeviation
    * @param int $niterations
    * @param \stored_file $dataset
    * @param string $outputdir
    * @param string $trainedmodeldir
    * @return \stdClass
    */
   public function evaluate_classification($uniqueid, $maxdeviation, $niterations, \stored_file $dataset, $outputdir, $trainedmodeldir);
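
One plausible reading of $niterations and $maxdeviation is that the model is evaluated several times on different splits of the dataset, and the result is only trusted if the scores do not vary too much. The Python sketch below illustrates that idea under this assumption; the 80/20 split, the accuracy metric, the majority-class stand-in model and all names are hypothetical and are not taken from the Moodle code.

```python
import random
import statistics
from collections import Counter
from typing import Callable, Dict, List, Sequence


def evaluate(labels: Sequence[int],
             train_and_score: Callable[[List[int], List[int]], float],
             n_iterations: int, max_deviation: float, seed: int = 0) -> Dict:
    """Score a model on n_iterations random splits and check score stability."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    scores = []
    for _ in range(n_iterations):
        rng.shuffle(indices)
        split = int(len(indices) * 0.8)  # Assumption: 80% train, 20% test.
        scores.append(train_and_score(indices[:split], indices[split:]))
    deviation = statistics.pstdev(scores)
    return {"scores": scores, "score": statistics.mean(scores),
            "deviation": deviation, "stable": deviation <= max_deviation}


# Hypothetical labels and a trivial stand-in model that predicts the majority class.
labels = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]


def majority_accuracy(train_idx: List[int], test_idx: List[int]) -> float:
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


result = evaluate(labels, majority_accuracy, n_iterations=5, max_deviation=0.5)
print(result["score"], result["deviation"], result["stable"])
```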

Regressor

A regressor predicts the value of an outcome (or dependent) variable based on analysis of the indicators. This value is continuous, such as a final grade in a course or the likelihood that a student will pass a course. This machine learning algorithm is "supervised": it requires a training data set of elements whose outcome value is known (e.g. past courses in which each student's final grade is known). This is an interface to be implemented by machine learning backends that support regression. It extends the Predictor interface.
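
As a minimal illustration of regression, the Python sketch below fits a straight line to known outcomes using ordinary least squares and then predicts a numeric value for a new sample. The single indicator, the grade data and the fit_line helper are hypothetical; a real regressor backend would work with many indicators and is not specified by this page.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least squares line through the points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Hypothetical training data: one indicator value per student and the final grade.
indicator = [0.2, 0.4, 0.6, 0.8]
grades = [40.0, 55.0, 70.0, 85.0]

slope, intercept = fit_line(indicator, grades)
predicted_grade = slope * 0.5 + intercept  # Prediction for an unseen student.
print(round(predicted_grade, 2))
```

Unlike a classifier, the output here is a number on a continuous scale rather than one of a fixed set of categories.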