Search API

From MoodleDocs
Revision as of 09:33, 10 March 2016 by David Monllaó

Moodle 3.1


Overview

The search API allows you to index contents in a search engine and query the search engine for results. Any Moodle component (all plugin types and all core subsystems) can define search areas for their contents.

This is different from the internal search feature some components have.

Add a search area

Note that I will be writing plugintype and pluginname to make it easier to understand for 3rd party developers not working in core, but it is also applicable for core subsystems, using core as plugintype and componentname as pluginname. I will be using areaname as the area you are defining.

Everything related to a search area is contained in a single \plugintype_pluginname\search\areaname class, which should extend one of the following classes:

  • \core_search\area\base: Generic base class for a search area
  • \core_search\area\base_mod: Base class for activities search areas (Activity_modules)
  • \core_search\area\base_activity: Base class also for activities, but different from base_mod: it is intended to index basic Moodle activity data such as the activity name and description. To index other, more specific activity data use base_mod. If you are unsure which one to use, think of forum activities and forum posts: forum activities should use base_activity, but forum posts should use base_mod.
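
As a rough illustration of the naming convention above, the following stand-alone sketch builds the expected class name from its three parts. The helper search_area_classname() is hypothetical and not part of Moodle, and the plugin and area names are only examples; it just shows how the pieces fit together.

```php
<?php
// Hypothetical helper (not a Moodle function): builds the fully qualified
// class name a search area is expected to use.
function search_area_classname(string $plugintype, string $pluginname, string $areaname): string {
    return '\\' . $plugintype . '_' . $pluginname . '\\search\\' . $areaname;
}

// An example area called "post" in an activity module called "forum".
echo search_area_classname('mod', 'forum', 'post'), "\n";
// A core subsystem uses "core" as the plugin type.
echo search_area_classname('core', 'course', 'mycourse'), "\n";
```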

Index data

This function should return a recordset with all the records in your component that have been modified since the $modifiedfrom timestamp (integer).

public function get_recordset_by_timestamp($modifiedfrom = 0) {

   global $DB;
   // The idea is to include here most (if not all) of the data you will need to index (see get_document below).
   // {xxxxx} is a placeholder for your component's table, aliased as x.
   $sql = "SELECT x.* FROM {xxxxx} x WHERE x.timemodified >= ? ORDER BY x.timemodified ASC";
   // Note that this is an example, you might have more params.
   return $DB->get_recordset_sql($sql, array($modifiedfrom));

}
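
To make the contract of this function more concrete, here is a stand-alone sketch that mimics the query on an in-memory array instead of a database. The function records_modified_since() and the sample rows are illustrative assumptions; a real search area returns a recordset from $DB as shown above.

```php
<?php
// Illustrative stand-in for the SQL query: keep rows modified at or after
// $modifiedfrom and return them oldest first.
function records_modified_since(array $rows, int $modifiedfrom = 0): array {
    $matching = array_filter($rows, function ($row) use ($modifiedfrom) {
        return $row->timemodified >= $modifiedfrom;
    });
    usort($matching, function ($a, $b) {
        return $a->timemodified <=> $b->timemodified;
    });
    return $matching;
}

$rows = [
    (object) ['id' => 1, 'timemodified' => 300],
    (object) ['id' => 2, 'timemodified' => 100],
    (object) ['id' => 3, 'timemodified' => 200],
];

// Only rows 3 and 1 changed since timestamp 150, in that order.
$result = records_modified_since($rows, 150);
foreach ($result as $row) {
    echo $row->id, ' ', $row->timemodified, "\n";
}
```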

This function receives one of the previous query results and should return a \core_search\document object with all the data to be indexed.

public function get_document($record) {

   // All wrapped in a try & catch as we should not stop the indexing process because of a legacy corrupted database.
   try {
       // context_course::instance() expects a course id.
       $context = \context_course::instance($record->courseid);
   } catch (\dml_missing_record_exception $ex) {
       debugging('Error retrieving ' . $this->areaid . ' ' . $record->id . ' document, not all required data is available: ' .
           $ex->getMessage(), DEBUG_DEVELOPER);
       return false;
   } catch (\dml_exception $ex) {
       debugging('Error retrieving ' . $this->areaid . ' ' . $record->id . ' document: ' . $ex->getMessage(), DEBUG_DEVELOPER);
       return false;
   }
   // Build the document from the record data.
   $doc = \core_search\document_factory::instance($record->id, $this->componentname, $this->areaname);
   // 'name' is a placeholder, use whatever field best describes the result.
   $doc->set('title', $record->name);
   // Content should contain just plain text, but text editors allow users to enter HTML or Markdown as well.
   // If the value for content or a descriptionN field comes from a text editor there will also be a
   // *fieldname*format field; with both the user input and its format we can convert it to plain text.
   $doc->set('content', content_to_text($record->content, $record->contentformat));
   $doc->set('contextid', $context->id);
   $doc->set('type', \core_search\manager::TYPE_TEXT);
   $doc->set('courseid', $record->courseid);
   $doc->set('userid', $record->userid);
   $doc->set('modified', $record->timemodified);
   return $doc;

}
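
The "return false" convention can be pictured with the following stand-alone sketch of an indexing loop. The functions build_document() and index_records() and the array-based documents are simplified assumptions, not Moodle's actual indexer; the point is only that a record which cannot be turned into a document is skipped while the rest are still indexed.

```php
<?php
// Simplified stand-in for get_document(): returns false when required data is missing.
function build_document(stdClass $record) {
    if (empty($record->courseid)) {
        return false;
    }
    return ['itemid' => $record->id, 'title' => $record->name, 'courseid' => $record->courseid];
}

// Hypothetical indexing loop: skips broken records instead of aborting.
function index_records(array $records): array {
    $indexed = [];
    foreach ($records as $record) {
        $doc = build_document($record);
        if ($doc === false) {
            continue;
        }
        $indexed[] = $doc;
    }
    return $indexed;
}

$records = [
    (object) ['id' => 1, 'name' => 'First', 'courseid' => 5],
    (object) ['id' => 2, 'name' => 'Orphan', 'courseid' => null],
    (object) ['id' => 3, 'name' => 'Third', 'courseid' => 5],
];
echo count(index_records($records)), ' of ', count($records), " records indexed\n";
```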

Indexing performance

The indexing process is run incrementally by cron, keeping track of the last indexing time. On big Moodle sites this process can be very heavy, as millions of records can be indexed in the same PHP process, so performance is quite important. Bear in mind that any database query you add to your get_document function to retrieve data that is part of your document will run for every document. If your search area can potentially contain a large number of records, you might be interested in adding static caches.
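
A static cache can be as simple as the following stand-alone sketch. The class course_name_cache and its simulated lookup are hypothetical and only illustrate the pattern: data shared by many documents (here, a course name) is fetched once and then reused for the rest of the PHP process.

```php
<?php
// Stand-alone sketch of a static cache: the expensive lookup runs once per course.
class course_name_cache {
    /** @var array Course names already looked up, keyed by course id. */
    protected static $names = [];
    /** @var int Number of (simulated) database queries performed. */
    public static $queries = 0;

    public static function get_name(int $courseid): string {
        if (!isset(self::$names[$courseid])) {
            self::$names[$courseid] = self::load_name($courseid);
        }
        return self::$names[$courseid];
    }

    // Simulates the per-document query you want to avoid repeating.
    protected static function load_name(int $courseid): string {
        self::$queries++;
        return 'Course ' . $courseid;
    }
}

// 1000 documents spread over 2 courses trigger only 2 lookups.
for ($i = 0; $i < 1000; $i++) {
    course_name_cache::get_name($i % 2 + 1);
}
echo course_name_cache::$queries, "\n";
```

Since the same process may index a very large number of records, the cache itself can grow; depending on your data it may be worth limiting how many entries it keeps.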

Access control

Moodle's global search limits the access to the indexed data in two different ways.

Automatic context-based filtering

This is done automatically by Moodle: the user performing a query will only get results from contexts they have access to. If your search area indexes contents that belong to a course, the user will only see results from courses they can access. If your search area belongs to a course module, the user will only see results from visible activities in courses they can access, provided there are no completion rules preventing access to them.

You need to specify which context (or contexts) your search area contents belong to. Override the static $levels attribute in your class.

protected static $levels = [CONTEXT_COURSE];
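
Conceptually, context-based filtering boils down to keeping only the documents whose context is in the set the user can access. The stand-alone sketch below illustrates that idea on plain arrays; filter_by_contexts() is a hypothetical helper and does not reflect how Moodle or the search engine actually implement the filtering.

```php
<?php
// Conceptual sketch only: keep the documents whose context the user can access.
function filter_by_contexts(array $docs, array $accessiblecontextids): array {
    // Flip the id list so membership checks are array key lookups.
    $allowed = array_flip($accessiblecontextids);
    return array_values(array_filter($docs, function ($doc) use ($allowed) {
        return isset($allowed[$doc['contextid']]);
    }));
}

$docs = [
    ['itemid' => 1, 'contextid' => 10],
    ['itemid' => 2, 'contextid' => 20],
    ['itemid' => 3, 'contextid' => 10],
];
// The user can only access context 10.
$visible = filter_by_contexts($docs, [10]);
echo implode(',', array_column($visible, 'itemid')), "\n";
```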

Final access checking

Your search area is responsible for filtering out the results that the user performing the query should not be able to see.

public function check_access($id) {

   try {
       $myobject = $this->get_xxxxx($id);
   } catch (\dml_missing_record_exception $ex) {
       // If the record does not exist anymore in Moodle we should return \core_search\manager::ACCESS_DELETED.
       return \core_search\manager::ACCESS_DELETED;
   } catch (\dml_exception $ex) {
       // Skip results if there is any unexpected error.
       return \core_search\manager::ACCESS_DENIED;
   }
   // Database fields usually come back as strings, so avoid a strict comparison with false.
   if (!$myobject->visible) {
       return \core_search\manager::ACCESS_DENIED;
   }
   return \core_search\manager::ACCESS_GRANTED;

}

The automatic context-based filtering should get rid of most of the results a user does not have access to. If check_access still has to reject a large proportion of the results, you might need to rethink the context your search area belongs to, or we might need to expand the APIs.
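
One way to get a feeling for that proportion is sketched below as stand-alone code. The constants are local stand-ins for the \core_search\manager ones (their values here are assumptions) and apply_final_check() is a hypothetical helper, not part of the search API; it runs a check_access-style callback over candidate ids and reports the share that was rejected.

```php
<?php
// Local stand-ins for the \core_search\manager access constants (values are assumptions).
define('ACCESS_DENIED', 0);
define('ACCESS_GRANTED', 1);
define('ACCESS_DELETED', 2);

// Hypothetical helper: applies a check_access-style callback to candidate ids
// and reports what share of them had to be rejected.
function apply_final_check(array $ids, callable $checkaccess): array {
    $granted = [];
    foreach ($ids as $id) {
        if ($checkaccess($id) === ACCESS_GRANTED) {
            $granted[] = $id;
        }
    }
    $rejected = count($ids) - count($granted);
    $share = count($ids) > 0 ? $rejected / count($ids) : 0.0;
    return ['granted' => $granted, 'rejectedshare' => $share];
}

// Sample outcomes keyed by record id.
$outcomes = [1 => ACCESS_GRANTED, 2 => ACCESS_DENIED, 3 => ACCESS_DELETED, 4 => ACCESS_GRANTED];
$result = apply_final_check(array_keys($outcomes), function ($id) use ($outcomes) {
    return $outcomes[$id];
});
echo implode(',', $result['granted']), ' rejected share: ', $result['rejectedshare'], "\n";
```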

Other required methods

The link to the result. Should return a \moodle_url object.

public function get_doc_url(\core_search\document $doc) {

   // This is just an example, it can vary a lot depending on what you are indexing.
   // 'itemid' holds the original record id; local URLs should start with a slash.
   return new \moodle_url('/link/to/your/component.php', array('id' => $doc->get('itemid')));

}

The link to the result context. In some cases it might be the same as the doc url. Should return a \moodle_url object.

public function get_context_url(\core_search\document $doc) {

   // This is just an example, it can vary a lot depending on what you are indexing.
   return new \moodle_url('/link/to/your/component/context.php', array('id' => $doc->get('itemid')));

}