Recommender system specification

This is a proposal for a recommender system in Moodle. A recommender system "seeks to predict the 'rating' or 'preference' a user would give to an item". In other words, it tries to identify items that a user is likely to find interesting.

Proposal info

This proposal is based on some assumptions:

  1. Recommender systems will be limited to specific contexts in most cases (e.g. a course, an activity).
    1. In cases where they are not limited to a specific context (e.g. recommending a course), the number of items will not reach millions of records. This matters because the training data is loaded into PHP memory as a two-dimensional array of scalar values.
  2. Recommendations can be generated on demand; that is, we don't need to train the recommender systems in background CLI tasks before being able to use them. This is possible because of assumption 1 above.
  3. The training data changes too often to spend resources on a complex caching system.
    1. Every time we have a new user in a course or a new activity (using the example recommender system described below) the dimensions of the training data change and the recommender system needs to be re-trained.
    2. Every time there is a new user rating (using the example recommender system below) the training data should be refreshed.
  4. We want filtering to be applied before the training data is generated, because filtering changes the dimensions of the dataset, which is critical for the recommender system.
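
Assumption 1.1 depends on how much memory the two-dimensional array takes. The snippet below is a rough sketch for measuring it with PHP's memory_get_usage(). The function name measure_matrix_memory() and the sizes used are hypothetical and only serve as an illustration; actual figures depend on the PHP version.

```php
<?php
// Hypothetical helper, not part of the proposed API: builds a $nusers x $nitems
// grid of float values and returns roughly how many bytes PHP allocated for it.
function measure_matrix_memory(int $nusers, int $nitems): int {
    $before = memory_get_usage();
    $matrix = [];
    for ($u = 0; $u < $nusers; $u++) {
        // One row per user, one scalar value per item.
        $matrix[$u] = array_fill(0, $nitems, 0.5);
    }
    $after = memory_get_usage();
    return $after - $before;
}

$bytes = measure_matrix_memory(200, 50);
printf("200 users x 50 activities: about %d KB\n", intdiv($bytes, 1024));
```

Running the same measurement with the number of users and items expected in a real context gives a quick indication of whether the on-demand approach is affordable there.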

Classes diagram

This is an overview of the classes involved.

[Figure: Recommender system class diagram]

API specs

Public API

The code snippet below is an example using the proposed public API. It generates two recommended activities of type page for the user with id 111 in the course with id 222, based on the values in a hypothetical user_activity_rates table. The recommender system should only consider users whose city is Barcelona.

// The 'contexts' filter is used to restrict the recommender system to a specific set of contexts. It can be used to restrict a recommender
// system to the activities of a single course or to restrict a recommender system to the entries of a single glossary activity.
// The filters in 'dimensions' are applied to each of the dimensions used by the recommender system.
$coursecontext = \context_course::instance(222);
$filters = [
    'contexts' => [$coursecontext->id],
    'dimensions' => [
        'user' => ['city' => 'Barcelona'],
        'activity' => ['modulename' => 'page']
    ]
];
 
$dataset = new \core_course\analytics\recommender\dataset\activities($filters);
 
$recommender = new \core_analytics\recommender($dataset);
$recommendations = $recommender->recommend(111, 2);

Training dataset

The training data for a recommender system is usually a grid of values in a two-dimensional matrix, where one of the axes usually represents the user. The classes extending the base recommender_dataset class are responsible for instantiating their dimensions and for filling the two-dimensional matrix.

We could replace x and y with items and users if we cannot find use cases that do not directly involve users.

Example of a recommender_dataset class.

namespace core_course\analytics\recommender\dataset;
class activities extends \core_analytics\recommender_dataset {

    public function __construct(array $filters) {
        $this->filters = $filters;

        $this->x = new \core_course\analytics\recommender\dimension\activity();
        $this->y = new \core_user\analytics\recommender\dimension\user();
    }

    public function get_training_data() {
        global $DB;

        // Restrict the ratings to the courses selected through the 'contexts' filter.
        $courseids = $this->get_course_ids_from_context_filter();
        list($insql, $params) = $DB->get_in_or_equal($courseids);
        $activityrates = $DB->get_records_sql("SELECT * FROM {user_activity_rates} WHERE courseid $insql", $params);

        $xitems = $this->x->get_items($this->filters);
        $yitems = $this->y->get_items($this->filters);

        $trainingdata = [];
        // Iterate through both $xitems and $yitems filling $trainingdata two-dimensional array with $activityrates values.

        return $trainingdata;
    }
}
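
The iteration that get_training_data() leaves as a comment could look roughly like the sketch below. It is written as a standalone function so it can be run outside Moodle. The function name build_training_matrix() and the userid, activityid and rating column names are assumptions, since the user_activity_rates table is hypothetical.

```php
<?php
// Hypothetical helper, not part of the proposed API: fills a users x activities
// grid from rating records. Cells with no rating are left as null.
function build_training_matrix(array $userids, array $activityids, array $rates): array {
    // Start with an empty grid so that every user row has the same columns.
    $matrix = [];
    foreach ($userids as $userid) {
        $matrix[$userid] = array_fill_keys($activityids, null);
    }
    // Copy each known rating into its cell, skipping users or activities
    // that were removed by the filters.
    foreach ($rates as $rate) {
        if (isset($matrix[$rate->userid]) && array_key_exists($rate->activityid, $matrix[$rate->userid])) {
            $matrix[$rate->userid][$rate->activityid] = (float) $rate->rating;
        }
    }
    return $matrix;
}

// Assumed record shape: userid, activityid, rating.
$rates = [
    (object) ['userid' => 111, 'activityid' => 10, 'rating' => 4],
    (object) ['userid' => 112, 'activityid' => 11, 'rating' => 2],
    (object) ['userid' => 999, 'activityid' => 10, 'rating' => 5], // Filtered-out user, ignored.
];
$trainingdata = build_training_matrix([111, 112], [10, 11], $rates);
```

The rows are keyed by user id so that the recommender can read a user's ratings with $trainingdata[$yid].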

Dimensions

Classes like user or activity (shown below) that extend the base class recommender_dimension represent each of the dimensions in the two-dimensional matrix. They return the list of records used by the implementation of the recommender_dataset class. They are kept separate from the recommender_dataset class so that they can be reused in different recommender systems.

We can remove the separation between recommender_dataset and recommender_dimension if we don't find enough use cases to justify it.

These implementations serve as examples of recommender_dimension classes.

namespace core_course\analytics\recommender\dimension;
class activity extends \core_analytics\recommender_dimension {

    private $acceptedfilters = ['modulename', 'coursecategory'];

    public function get_items(array $filters) {
        global $DB;

        // Contexts are matched at course level: filtering on module contexts would not make sense if what we want is a list of activities.
        list($ctxsql, $params) = $DB->get_in_or_equal($filters['contexts'], SQL_PARAMS_NAMED);
        $params['contextlevel'] = CONTEXT_COURSE;
        $params['modulename'] = $filters['dimensions']['activity']['modulename'];

        // Only cm.* is selected because cm.* and c.* both contain an id column.
        return $DB->get_recordset_sql("SELECT cm.* FROM {course_modules} cm
                                         JOIN {modules} m ON m.id = cm.module
                                         JOIN {course} c ON cm.course = c.id
                                         JOIN {context} ctx ON ctx.contextlevel = :contextlevel AND ctx.instanceid = c.id
                                        WHERE ctx.id $ctxsql AND m.name = :modulename", $params);
    }
}

namespace core_user\analytics\recommender\dimension;
class user extends \core_analytics\recommender_dimension {

    public function get_items(array $filters) {
        global $DB;

        // The 'contexts' filter is ignored as users depend on the system context.
        return $DB->get_recordset_sql("SELECT * FROM {user}");
    }
}
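
The activity dimension declares an $acceptedfilters list, but the example does not show how it is used. One possible reading, sketched below as an assumption, is that a dimension silently drops any filter it does not accept. The helper name get_accepted_filters() is hypothetical.

```php
<?php
// Hypothetical helper, not part of the proposed API: keeps only the filters
// whose names appear in the list a dimension declares as accepted.
function get_accepted_filters(array $filters, array $acceptedfilters): array {
    return array_intersect_key($filters, array_flip($acceptedfilters));
}

$requested = ['modulename' => 'page', 'city' => 'Barcelona'];
$applied = get_accepted_filters($requested, ['modulename', 'coursecategory']);
// $applied only contains the 'modulename' filter.
```

An alternative design would be to throw an exception on unknown filters, so that typos in filter names are detected early.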

Recommender system

The recommender class is the key element of the whole system and can be shared across all recommender systems built using this API. Extra methods to evaluate the accuracy of the recommender system should be added.

Recommender systems can also be used for things like predicting student grades. We would need to rename or add some methods and parameters for this sort of usage to feel natural. For example, a predict($yid, $xid) method would be more appropriate for predicting student grades based on previous grades.

This is the recommender class skeleton.

namespace core_analytics;
class recommender {

    public function __construct(\core_analytics\recommender_dataset $dataset) {
        $this->dataset = $dataset;
    }

    public function recommend($yid, $nrecommendations = 1) {
        $trainingdata = $this->dataset->get_training_data();

        // Collaborative filtering or any other alternative. This is just an example.
        $model = $this->get_embeddings($trainingdata);

        $y = $trainingdata[$yid];
        return $model->recommend($y, $nrecommendations);
    }
}
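
The skeleton leaves the algorithm open. As an illustration only, the sketch below swaps the embeddings model for a simple user-based collaborative filter using cosine similarity. It is written as standalone functions that receive the training data as an array of rows keyed by user id, with null for missing ratings. The function names and the sample data are assumptions, not part of the proposal.

```php
<?php
// Cosine similarity between two users, computed over the items both have rated.
function cosine_similarity(array $a, array $b): float {
    $dot = 0.0;
    $norma = 0.0;
    $normb = 0.0;
    foreach ($a as $itemid => $value) {
        if ($value === null || !isset($b[$itemid])) {
            continue;
        }
        $dot += $value * $b[$itemid];
        $norma += $value ** 2;
        $normb += $b[$itemid] ** 2;
    }
    return ($norma > 0 && $normb > 0) ? $dot / sqrt($norma * $normb) : 0.0;
}

// Predicts the rating user $yid would give to item $xid as a similarity-weighted
// average of the ratings other users gave to that item. Returns null when unknown.
function predict(array $trainingdata, $yid, $xid): ?float {
    $weighted = 0.0;
    $total = 0.0;
    foreach ($trainingdata as $otherid => $ratings) {
        if ($otherid == $yid || !isset($ratings[$xid])) {
            continue;
        }
        $similarity = cosine_similarity($trainingdata[$yid], $ratings);
        $weighted += $similarity * $ratings[$xid];
        $total += $similarity;
    }
    return $total > 0 ? $weighted / $total : null;
}

// Returns the ids of the $nrecommendations unrated items with the highest predicted rating.
function recommend(array $trainingdata, $yid, int $nrecommendations = 1): array {
    $scores = [];
    foreach ($trainingdata[$yid] as $xid => $rating) {
        if ($rating !== null) {
            continue; // Already rated, nothing to recommend.
        }
        $score = predict($trainingdata, $yid, $xid);
        if ($score !== null) {
            $scores[$xid] = $score;
        }
    }
    arsort($scores);
    return array_slice(array_keys($scores), 0, $nrecommendations);
}

// Rows are users, columns are activities, null means "not rated yet".
$trainingdata = [
    111 => [10 => 5.0, 11 => 1.0, 12 => null, 13 => null],
    112 => [10 => 5.0, 11 => 1.0, 12 => 5.0, 13 => 1.0],
    113 => [10 => 1.0, 11 => 5.0, 12 => 1.0, 13 => 5.0],
];
$recommendations = recommend($trainingdata, 111, 1);
```

In this sample, user 111 rates activities 10 and 11 exactly like user 112, so the activity that user 112 rated highest (12) is recommended first. The predict() function also shows how the same data could serve the grade prediction use case mentioned above.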