Note:

If you want to create a new page for developers, you should create it on the Moodle Developer Resource site.

Machine learning backends

From MoodleDocs
Migrated to the new Moodle Developer Resources: /docs/apis/plugintypes/mlbackend/

Latest revision as of 05:34, 31 May 2022

Important:

The content of this page has been updated and migrated to the new Moodle Developer Resources. The information on this page should no longer be considered up to date.

Why not view this page on the new site and help us to migrate more content to the new site!

Introduction

Machine learning backends process the datasets generated from the indicators and targets calculated by the Analytics API. They are used for machine learning training, prediction and model evaluation. You may also want to read the Analytics API documentation for definitions of the concepts involved, how these concepts are implemented in Moodle, and how machine learning backend plugins fit into the Analytics API.

Machine learning backends and Moodle communicate through files, because the code that processes the dataset can be written in PHP, Python or other languages, or can even use cloud services. This approach needs to scale, so backends are expected to handle large files and, if necessary, to train algorithms by reading input files in batches.
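As a rough illustration of batch processing, the sketch below streams a dataset file in fixed-size chunks with a PHP generator. Memory use therefore stays bounded regardless of file size. It assumes a plain CSV file with a single header row; the dataset files Moodle actually generates may contain additional metadata rows. read_dataset_in_batches() is a hypothetical helper, not part of any Moodle API.

```php
<?php
// Hypothetical helper (not a Moodle API): stream a CSV dataset file in
// fixed-size batches. Assumes one header row, then one sample per line.

/**
 * Yields the data rows of a CSV file in batches of at most $batchsize rows.
 *
 * @param string $path Path to the dataset file.
 * @param int $batchsize Maximum number of rows per batch.
 * @return Generator
 */
function read_dataset_in_batches(string $path, int $batchsize): Generator {
    if ($batchsize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open dataset file: $path");
    }
    try {
        fgetcsv($handle, 0, ',', '"', '\\'); // Skip the header row.
        $batch = [];
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() returns [null] for blank lines.
            }
            $batch[] = $row;
            if (count($batch) === $batchsize) {
                yield $batch; // Hand a full batch to the caller.
                $batch = [];
            }
        }
        if ($batch !== []) {
            yield $batch; // Remaining rows that did not fill a whole batch.
        }
    } finally {
        fclose($handle);
    }
}
```

A backend could then train on one batch at a time, for example with `foreach (read_dataset_in_batches($path, 1000) as $batch) { ... }`. This avoids ever holding the whole file in memory.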

Machine learning backends are a new Moodle plugin type. They are stored in lib/mlbackend, where you can add your own plugins.
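For orientation only, consider a hypothetical backend called example. It would live in lib/mlbackend/example and, like other Moodle plugins, would declare itself in a version.php file. The plugin name and version number below are placeholders.

```php
<?php
// lib/mlbackend/example/version.php for a hypothetical plugin called "example".
defined('MOODLE_INTERNAL') || die();

$plugin->component = 'mlbackend_example'; // Plugin type "mlbackend" + "_" + plugin directory name.
$plugin->version   = 2024010100;          // Placeholder version number in YYYYMMDDXX form.
```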

Backends included in Moodle core

The PHP backend is the default predictions processor because it is written in PHP and has no external dependencies. It uses logistic regression.
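To make "logistic regression" concrete, here is a minimal sketch of the technique trained with batch gradient descent. It is illustrative only and is not the code the PHP backend uses. The function names, the learning rate and the number of epochs are assumptions chosen for this example.

```php
<?php
// Illustrative logistic regression trained with batch gradient descent.
// Not the PHP backend's implementation; names and defaults are hypothetical.

function sigmoid(float $z): float {
    return 1.0 / (1.0 + exp(-$z));
}

/** Probability of the positive class; $w[0] is the bias term. */
function predict_probability(array $w, array $sample): float {
    $z = $w[0];
    foreach ($sample as $j => $value) {
        $z += $w[$j + 1] * $value;
    }
    return sigmoid($z);
}

/**
 * @param array $x List of samples, each a list of numeric features.
 * @param array $y List of labels, 0 or 1.
 * @return array Weights, with the bias at index 0.
 */
function train_logistic(array $x, array $y, float $rate = 0.5, int $epochs = 2000): array {
    $nfeatures = count($x[0]);
    $n = count($x);
    $w = array_fill(0, $nfeatures + 1, 0.0);
    for ($epoch = 0; $epoch < $epochs; $epoch++) {
        // Gradient of the average log loss over the whole training set.
        $grad = array_fill(0, $nfeatures + 1, 0.0);
        foreach ($x as $i => $sample) {
            $error = predict_probability($w, $sample) - $y[$i];
            $grad[0] += $error;
            foreach ($sample as $j => $value) {
                $grad[$j + 1] += $error * $value;
            }
        }
        foreach ($grad as $j => $g) {
            $w[$j] -= $rate * $g / $n; // Step against the averaged gradient.
        }
    }
    return $w;
}
```

Samples with a predicted probability above 0.5 would be classified as the positive class, for example "dropout" in a binary prediction.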

The Python backend requires a python binary (either Python 2 or Python 3) and the moodlemlbackend Python package, which is maintained by Moodle HQ. The versions of Python and of its libraries are very important: we recommend Python 3.7 for mlbackend version 3.x. The backend is based on Google's TensorFlow library and uses a feed-forward neural network with a single hidden layer. The moodlemlbackend package stores model performance information that can be visualised using TensorBoard. Information generated during model evaluation is available on the models management page, under each model's Actions > Log menu. The moodlemlbackend source code is available at https://github.com/moodlehq/moodle-mlbackend-python.
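The following sketch shows the forward pass of a feed-forward network with a single hidden layer. It is not taken from moodlemlbackend. The tanh hidden activation, the softmax output layer and the hand-picked weights are assumptions for illustration, and training is not shown.

```php
<?php
// Forward pass through one hidden layer. Assumptions (not from moodlemlbackend):
// tanh hidden activation, softmax output with one unit per class.

/** Fully connected layer: one row of $weights per output unit. */
function dense(array $weights, array $bias, array $input): array {
    $out = [];
    foreach ($weights as $k => $row) {
        $sum = $bias[$k];
        foreach ($row as $j => $w) {
            $sum += $w * $input[$j];
        }
        $out[] = $sum;
    }
    return $out;
}

function softmax(array $z): array {
    $max = max($z); // Subtract the maximum for numerical stability.
    $exp = array_map(fn($v) => exp($v - $max), $z);
    $total = array_sum($exp);
    return array_map(fn($v) => $v / $total, $exp);
}

/** Returns one probability per class. */
function forward(array $input, array $w1, array $b1, array $w2, array $b2): array {
    $hidden = array_map('tanh', dense($w1, $b1, $input)); // The single hidden layer.
    return softmax(dense($w2, $b2, $hidden));
}
```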

The Python backend is recommended over the PHP backend because it predicts more accurately and is faster.

Interfaces

In summary, these interfaces are used to:

  • Evaluate a provided prediction model
  • Train machine learning algorithms with the existing site data
  • Predict targets based on previously trained algorithms

Predictor

This is the basic interface to be implemented by machine learning backends. There are two main types of predictor: classifiers and regressors. Both are supervised algorithms, and each type includes methods to train, predict and evaluate datasets. A Regressor interface is provided, but it is not currently implemented by the core machine learning backends.

You can use is_ready to check that the backend is available.

   /**
    * Is it ready to predict?
    *
    * @return bool
    */
   public function is_ready();
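As an example, a backend that depends on an external program could implement is_ready() by checking that the configured binary exists and is executable. The class below is hypothetical. It does not implement the real Predictor interface, which needs Moodle's core classes; it only illustrates the kind of check involved.

```php
<?php
// Hypothetical backend class (not a Moodle class). In a real plugin this
// method would belong to a class implementing the Predictor interface.
class example_external_backend {
    /** @var string Path to the external binary the backend runs. */
    private $binarypath;

    public function __construct(string $binarypath) {
        $this->binarypath = $binarypath;
    }

    /**
     * Is it ready to predict?
     *
     * @return bool
     */
    public function is_ready() {
        // Ready only if the binary exists and can be executed.
        return is_file($this->binarypath) && is_executable($this->binarypath);
    }
}
```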

clear_model and delete_output_dir clean up the data and files created by the machine learning backend.

   /**
    * Delete all stored information of the current model id.
    *
    * This method is called when there are important changes to a model,
    * all previous training algorithms using that version of the model
    * should be deleted.
    *
    * @param string $uniqueid The site model unique id string
    * @param string $modelversionoutputdir The output dir of this model version
    * @return null
    */
   public function clear_model($uniqueid, $modelversionoutputdir);


   /**
    * Delete the output directory.
    *
    * This method is called when a model is completely deleted.
    *
    * @param string $modeloutputdir The model directory id (parent of all model versions subdirectories).
    * @param string $uniqueid The site model unique id string
    * @return null
    */
   public function delete_output_dir($modeloutputdir, $uniqueid);
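For a backend that keeps its trained models on local disk, both clean-up methods may come down to deleting directories. The sketch below removes a directory tree with PHP's SPL iterators. remove_dir_recursively() is a hypothetical name; inside Moodle, the core remove_dir() function in lib/moodlelib.php would normally be used instead.

```php
<?php
// Hypothetical helper: delete a directory and everything inside it, the kind
// of work clear_model() or delete_output_dir() may need to do.
function remove_dir_recursively(string $dir): void {
    if (!is_dir($dir)) {
        return; // Nothing to clean up.
    }
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST // Contents before their parent directory.
    );
    foreach ($items as $item) {
        if ($item->isDir() && !$item->isLink()) {
            rmdir($item->getPathname());
        } else {
            unlink($item->getPathname()); // Files and symbolic links.
        }
    }
    rmdir($dir);
}
```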

Classifier

A classifier sorts input into two or more categories, based on analysis of the indicators. Classifiers are frequently used for binary predictions, e.g. course completion vs. dropout. Classification is a supervised machine learning technique: it requires a training dataset of elements whose classification is already known (e.g. past courses in which it is clear whether each student dropped out or not). This interface, which extends the Predictor interface, is implemented by machine learning backends that support classification.

A classifier backend must implement both the methods below and the Predictor methods.

   /**
    * Train this processor classification model using the provided supervised learning dataset.
    *
    * @param string $uniqueid
    * @param \stored_file $dataset
    * @param string $outputdir
    * @return \stdClass
    */
   public function train_classification($uniqueid, \stored_file $dataset, $outputdir);


   /**
    * Classifies the provided dataset samples.
    *
    * @param string $uniqueid
    * @param \stored_file $dataset
    * @param string $outputdir
    * @return \stdClass
    */
   public function classify($uniqueid, \stored_file $dataset, $outputdir);


   /**
    * Evaluates this processor classification model using the provided supervised learning dataset.
    *
    * @param string $uniqueid
    * @param float $maxdeviation
    * @param int $niterations
    * @param \stored_file $dataset
    * @param string $outputdir
    * @param  string $trainedmodeldir
    * @return \stdClass
    */
   public function evaluate_classification($uniqueid, $maxdeviation, $niterations, \stored_file $dataset, $outputdir);
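The $maxdeviation and $niterations parameters suggest an evaluation that is repeated several times and then checked for stability. Under that assumption, a backend could summarise the per-iteration scores as shown below. summarise_evaluation() and the fields of the returned object are hypothetical; they are not the structure Moodle expects.

```php
<?php
// Hypothetical helper for evaluate_classification(): combine the scores from
// $niterations evaluation runs and flag results that vary more than
// $maxdeviation. Field names are illustrative, not Moodle's.
function summarise_evaluation(array $scores, float $maxdeviation): stdClass {
    $n = count($scores);
    if ($n === 0) {
        throw new InvalidArgumentException('At least one score is required.');
    }
    $mean = array_sum($scores) / $n;
    $variance = 0.0;
    foreach ($scores as $score) {
        $variance += ($score - $mean) ** 2;
    }
    $deviation = sqrt($variance / $n); // Population standard deviation.

    $result = new stdClass();
    $result->score = $mean;
    $result->deviation = $deviation;
    $result->reliable = $deviation <= $maxdeviation;
    return $result;
}
```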

Regressor

A regressor predicts the value of an outcome (or dependent) variable based on analysis of the indicators. The value is continuous, such as a final grade in a course or the likelihood that a student will pass a course. Regression is also a supervised machine learning technique: it requires a training dataset of elements whose outcome values are already known (e.g. past courses with known final grades). This interface, which extends the Predictor interface, is implemented by machine learning backends that support regression.

A regressor backend must implement both the methods below and the Predictor methods.

   /**
    * Train this processor regression model using the provided supervised learning dataset.
    *
    * @param string $uniqueid
    * @param \stored_file $dataset
    * @param string $outputdir
    * @return \stdClass
    */
   public function train_regression($uniqueid, \stored_file $dataset, $outputdir);


   /**
    * Estimates linear values for the provided dataset samples.
    *
    * @param string $uniqueid
    * @param \stored_file $dataset
    * @param mixed $outputdir
    * @return void
    */
   public function estimate($uniqueid, \stored_file $dataset, $outputdir);


   /**
    * Evaluates this processor regression model using the provided supervised learning dataset.
    *
    * @param string $uniqueid
    * @param float $maxdeviation
    * @param int $niterations
    * @param \stored_file $dataset
    * @param string $outputdir
    * @param  string $trainedmodeldir
    * @return \stdClass
    */
   public function evaluate_regression($uniqueid, $maxdeviation, $niterations, \stored_file $dataset, $outputdir);
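The core backends do not implement Regressor. As a deliberately small illustration of estimating a continuous value, here is a one-feature ordinary least squares fit, for example estimating a final grade from a single indicator. fit_line() and estimate_value() are hypothetical helpers. A real backend would read the dataset file passed to estimate() rather than PHP arrays.

```php
<?php
// Hypothetical helpers: ordinary least squares with a single feature.

/** Returns [slope, intercept] of the best-fitting line y = slope * x + intercept. */
function fit_line(array $x, array $y): array {
    $n = count($x);
    if ($n === 0 || $n !== count($y)) {
        throw new InvalidArgumentException('Need the same, non-zero number of x and y values.');
    }
    $meanx = array_sum($x) / $n;
    $meany = array_sum($y) / $n;
    $covariance = 0.0;
    $variancex = 0.0;
    foreach ($x as $i => $xi) {
        $covariance += ($xi - $meanx) * ($y[$i] - $meany);
        $variancex += ($xi - $meanx) ** 2;
    }
    if ($variancex == 0.0) {
        throw new InvalidArgumentException('All x values are equal; the slope is undefined.');
    }
    $slope = $covariance / $variancex;
    return [$slope, $meany - $slope * $meanx];
}

/** Estimates the outcome for a new indicator value. */
function estimate_value(array $line, float $x): float {
    [$slope, $intercept] = $line;
    return $slope * $x + $intercept;
}
```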