Recommender system specification

From MoodleDocs

Latest revision as of 20:20, 14 July 2021

This is a proposal for a recommender system in Moodle. A recommender system "seeks to predict the 'rating' or 'preference' a user would give to an item"; in other words, it tries to identify items that would interest a user.

Proposal info

This proposal is based on some assumptions:

  1. Recommender systems will be limited to specific contexts in most cases (e.g. a course, an activity).
    1. In cases where they are not limited to a specific context (e.g. recommending a course), the number of items will not reach millions of records. This matters because the training data is loaded into PHP memory as a two-dimensional array of scalar values.
  2. Recommendations can be generated on demand; that is, we don't need to train the recommender systems in background CLI tasks before being able to use them. This is possible because of assumption 1 above.
  3. The training data changes too often to justify spending resources on a complex caching system.
    1. Every time a course gets a new user or a new activity (using the example recommender system described below), the dimensions of the training data change and the recommender system needs to be re-trained.
    2. Every time there is a new user rating (using the example recommender system below), the training data should be refreshed.
  4. We want the filtering to be applied before generating the training data, as it modifies the dimensions of the dataset, which is critical for the recommender system.
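
The effect described in assumption 4 can be illustrated with a minimal standalone sketch. It is not part of the proposed API: the filter_items() helper and the in-memory records are hypothetical stand-ins for the dimension classes and the database, and the field names simply mirror the filters used later on this page.

```php
<?php
// Hypothetical in-memory records standing in for the user and activity dimensions.
$users = [
    111 => ['city' => 'Barcelona'],
    112 => ['city' => 'Perth'],
    113 => ['city' => 'Barcelona'],
];
$activities = [
    1 => ['modulename' => 'page'],
    2 => ['modulename' => 'quiz'],
    3 => ['modulename' => 'page'],
];

/**
 * Keeps only the items whose fields match every key/value pair in $filters.
 * (Hypothetical helper, used here only for illustration.)
 */
function filter_items(array $items, array $filters): array {
    return array_filter($items, function (array $item) use ($filters): bool {
        foreach ($filters as $field => $value) {
            if (($item[$field] ?? null) !== $value) {
                return false;
            }
        }
        return true;
    });
}

// Filter each dimension before any matrix is built.
$yitems = filter_items($users, ['city' => 'Barcelona']);
$xitems = filter_items($activities, ['modulename' => 'page']);

// The matrix would shrink from 3 x 3 to 2 x 2 because the filtering happened first.
printf("%d users x %d activities\n", count($yitems), count($xitems));
```

Filtering after the matrix is built would leave rows and columns in the training data that the recommender should never see, which is why the proposal applies the filters first.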

Classes diagram

This is an overview of the classes involved.

[Figure: Recommender system class diagram]

API specs

Public API

The code snippet below is an example of the proposed public API. It generates two recommended activities of type page for the user with id 111 in the course with id 222, based on the values in a hypothetical user_activity_rates table. The recommender system should only consider users whose city is Barcelona.

// The 'contexts' filter is used to restrict the recommender system to a specific set of contexts.
// It can be used to restrict a recommender system to the activities of a single course or to
// the entries of a single glossary activity.
// The filters in 'dimensions' are applied to each of the dimensions used by the recommender system.
$coursecontext = \context_course::instance(222);
$filters = [
    'contexts' => [$coursecontext->id],
    'dimensions' => [
        'user' => ['city' => 'Barcelona'],
        'activity' => ['modulename' => 'page']
    ]
];

$dataset = new \core_course\analytics\recommender\dataset\activities($filters);

$recommender = new \core_analytics\recommender($dataset);
$recommendations = $recommender->recommend(111, 2);

Training dataset

The training data for a recommender system is usually a grid of values in a two-dimensional matrix, where one of the axes usually represents the users. The classes extending the base recommender_dataset class are responsible for instantiating their dimensions and for filling the two-dimensional matrix.

We could rename x and y to items and users if we cannot find use cases that do not directly involve users.

Example of a recommender_dataset class.

namespace core_course\analytics\recommender\dataset;

class activities implements \core_analytics\recommender_dataset {

    protected $filters;
    protected $x;
    protected $y;

    public function __construct(array $filters) {
        $this->filters = $filters;

        $this->x = new \core_course\analytics\recommender\dimension\activity();
        $this->y = new \core_user\analytics\recommender\dimension\user();
    }

    public function get_training_data() {
        global $DB;

        $courseids = $this->get_course_ids_from_context_filter();
        list($insql, $params) = $DB->get_in_or_equal($courseids);
        $activityrates = $DB->get_records_sql(
            "SELECT * FROM {user_activity_rates} WHERE courseid $insql", $params);

        $xitems = $this->x->get_items($this->filters);
        $yitems = $this->y->get_items($this->filters);

        // Iterate through both $xitems and $yitems, filling the two-dimensional
        // $trainingdata array with the $activityrates values.
        $trainingdata = [];

        return $trainingdata;
    }
}
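
The "iterate and fill" step that get_training_data() leaves as a comment could look roughly like the standalone sketch below. This is only an illustration under assumptions: the build_training_data() function, the userid, activityid and rating column names, and the choice of 0 for unrated cells are all hypothetical and are not defined by this proposal.

```php
<?php
/**
 * Builds a [yid][xid] => rating matrix; unrated cells default to 0 (an assumption).
 *
 * @param int[] $xids Ids on the x dimension (activities).
 * @param int[] $yids Ids on the y dimension (users).
 * @param array $rates Rows with 'userid', 'activityid' and 'rating' keys (hypothetical columns).
 * @return array The two-dimensional training data.
 */
function build_training_data(array $xids, array $yids, array $rates): array {
    // Start with a zero-filled row per user so every row has the same columns.
    $trainingdata = [];
    foreach ($yids as $yid) {
        $trainingdata[$yid] = array_fill_keys($xids, 0);
    }

    // Copy each known rating, skipping rows filtered out of either dimension.
    foreach ($rates as $rate) {
        $yid = $rate['userid'];
        $xid = $rate['activityid'];
        if (isset($trainingdata[$yid]) && array_key_exists($xid, $trainingdata[$yid])) {
            $trainingdata[$yid][$xid] = $rate['rating'];
        }
    }

    return $trainingdata;
}

$rates = [
    ['userid' => 111, 'activityid' => 1, 'rating' => 5],
    ['userid' => 113, 'activityid' => 3, 'rating' => 2],
    ['userid' => 112, 'activityid' => 1, 'rating' => 4], // User 112 was filtered out.
];
$trainingdata = build_training_data([1, 3], [111, 113], $rates);
print_r($trainingdata);
```

Because the rows and columns come from the already filtered dimensions, the rating from user 112 is dropped and the result is a 2 x 2 matrix.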

Dimensions

Classes like user or activity (shown below) that extend the base class recommender_dimension represent each of the dimensions in the two-dimensional matrix. They return the list of records used by the implementation of the recommender_dataset class. They are kept separate from the recommender_dataset class so that they can be reused in different recommender systems.

We can remove this recommender_dataset/recommender_dimension separation if we don't find enough use cases that justify it.

These implementations serve as examples of recommender_dimension classes.

namespace core_course\analytics\recommender\dimension;

class activity extends \core_analytics\recommender_dimension {

    private $acceptedfilters = ['modulename', 'coursecategory'];

    public function get_items(array $filters) {
        global $DB;

        // The context filter is applied at course level: filtering by module
        // contexts would not make sense when what we want is a list of activities.
        list($ctxsql, $params) = $DB->get_in_or_equal($filters['contexts'], SQL_PARAMS_NAMED);
        $params['contextlevel'] = CONTEXT_COURSE;
        $params['modulename'] = $filters['dimensions']['activity']['modulename'];

        // The module name lives in {modules}, so it is joined through cm.module.
        return $DB->get_recordset_sql("SELECT cm.*
                                         FROM {course_modules} cm
                                         JOIN {modules} m ON m.id = cm.module
                                         JOIN {course} c ON cm.course = c.id
                                         JOIN {context} ctx ON ctx.contextlevel = :contextlevel AND ctx.instanceid = c.id
                                        WHERE ctx.id $ctxsql AND m.name = :modulename", $params);
    }
}

namespace core_user\analytics\recommender\dimension;

class user extends \core_analytics\recommender_dimension {

    public function get_items(array $filters) {
        global $DB;

        // The 'contexts' filter is ignored as users depend on the system context.
        return $DB->get_recordset_sql("SELECT * FROM {user}");
    }
}
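
The proposal declares $acceptedfilters on the activity dimension but does not show how it would be used. One possible reading, sketched below as a standalone example, is that a dimension discards any filter it does not declare. The get_accepted_filters() function is hypothetical and is not part of the proposed API.

```php
<?php
/**
 * Returns only the filters that a dimension declares as accepted.
 * (Hypothetical helper, shown only to illustrate one possible use of $acceptedfilters.)
 *
 * @param array $filters Requested filters as field => value pairs.
 * @param string[] $acceptedfilters Field names the dimension knows how to apply.
 * @return array The subset of $filters the dimension can apply.
 */
function get_accepted_filters(array $filters, array $acceptedfilters): array {
    return array_intersect_key($filters, array_flip($acceptedfilters));
}

$accepted = ['modulename', 'coursecategory'];
$requested = ['modulename' => 'page', 'city' => 'Barcelona'];

// 'city' is not accepted by the activity dimension, so it is dropped.
$usable = get_accepted_filters($requested, $accepted);
print_r($usable);
```

Whether unsupported filters should be silently ignored, as here, or reported as an error is a design choice the proposal leaves open.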

Recommender system

The recommender class is the key element of the whole system and can be shared across all recommender systems built using this API. Extra methods to evaluate the accuracy of the recommender system should be added.

Recommender systems can also be used for things like predicting student grades. We would need to rename or add some methods and parameters for this sort of usage to feel natural. For example, a predict($yid, $xid) method would be more appropriate for predicting student grades based on previous grades.

This is the recommender class skeleton.

namespace core_analytics;

class recommender {

    /** @var recommender_dataset The dataset that provides the training data. */
    protected $dataset;

    public function __construct(\core_analytics\recommender_dataset $dataset) {
        $this->dataset = $dataset;
    }

    public function recommend($yid, $nrecommendations = 1) {
        $trainingdata = $this->dataset->get_training_data();

        // Collaborative filtering or any other alternative. This is just an example;
        // get_embeddings() is a placeholder that is not defined in this proposal.
        $model = $this->get_embeddings($trainingdata);

        $y = $trainingdata[$yid];
        return $model->recommend($y, $nrecommendations);
    }
}
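
The skeleton leaves the actual algorithm open. As one possible illustration, the standalone sketch below uses simple user-based collaborative filtering with cosine similarity instead of the embeddings model named in the skeleton. The function names, the sample ratings and the convention that 0 means "not rated" are assumptions made for this example only.

```php
<?php
/** Cosine similarity between two rows that share the same keys. */
function cosine_similarity(array $a, array $b): float {
    $dot = 0.0;
    $norma = 0.0;
    $normb = 0.0;
    foreach ($a as $xid => $value) {
        $dot += $value * $b[$xid];
        $norma += $value ** 2;
        $normb += $b[$xid] ** 2;
    }
    if ($norma == 0.0 || $normb == 0.0) {
        return 0.0;
    }
    return $dot / (sqrt($norma) * sqrt($normb));
}

/**
 * Scores the items that $yid has not rated using the similarity-weighted
 * ratings of the other rows, and returns the ids of the best-scored items.
 */
function recommend(array $trainingdata, int $yid, int $nrecommendations = 1): array {
    $target = $trainingdata[$yid];
    $scores = [];
    foreach ($trainingdata as $otherid => $row) {
        if ($otherid === $yid) {
            continue;
        }
        $similarity = cosine_similarity($target, $row);
        foreach ($row as $xid => $rating) {
            // Only items the target has not rated (0) are candidates.
            if ($target[$xid] == 0 && $rating > 0) {
                $scores[$xid] = ($scores[$xid] ?? 0.0) + $similarity * $rating;
            }
        }
    }
    arsort($scores);
    return array_slice(array_keys($scores), 0, $nrecommendations);
}

// Rows are users, columns are activities, 0 means "not rated".
$trainingdata = [
    111 => [1 => 5, 2 => 0, 3 => 0, 4 => 4],
    112 => [1 => 5, 2 => 4, 3 => 0, 4 => 5],
    113 => [1 => 1, 2 => 0, 3 => 5, 4 => 0],
];
print_r(recommend($trainingdata, 111, 2));
```

With this sample data, user 112 is the most similar to user 111, so activity 2 ranks ahead of activity 3 and the call returns [2, 3]. Everything is computed on demand from the in-memory matrix, in line with assumption 2 of this proposal.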