Search API

From MoodleDocs

Moodle 3.1


Overview

The search API allows you to index contents in a search engine and query the search engine for results. Any Moodle component (all plugin types and all core subsystems) can define search areas for their contents.

This is different from the internal search feature some components have.

Add a search area

Note that the examples use plugintype and pluginname to make them easier to follow for third-party developers who are not working in core. Everything also applies to core subsystems, using core as plugintype and componentname as pluginname. areaname stands for the area you are defining.

Easy case: Activity information

If you just want to index your activity's basic information, such as its name and description, you can skip all the documentation below. Extend \core_search\base_activity (\core_search\area\base_activity in Moodle 3.1): copy mod/book/classes/search/activity.php and replace 'book' with your activity name. If you use fields other than name and intro (the de facto standard), or you want to index extra fields or files, look at mod/page/classes/search/activity.php (extra fields) or mod/assign/classes/search/activity.php (files).

To index other information your activity tables contain (e.g. glossary or journal entries, assignment submissions...) you need to extend \core_search\base_mod instead. Continue reading below please.

Base class

Everything related to a search area is contained in a single \plugintype_pluginname\search\areaname class, which should extend one of the following classes:

  • \core_search\base (\core_search\area\base in Moodle 3.1): Generic base class for a search area
  • \core_search\base_mod (\core_search\area\base_mod in Moodle 3.1): Base class for activities search areas (Activity_modules)
  • \core_search\base_activity (\core_search\area\base_activity in Moodle 3.1): Base class also for activities, but more specific than base_mod, as it is intended for indexing the basic data of Moodle activities, such as the activity name and description. To index other activity-specific data, use base_mod. If you are unsure which one to use, think of forums: forum activities should use base_activity, but forum posts should use base_mod.
  • \core_search\base_block (Available in Moodle 3.4+): Base class for block search areas (Blocks). Only blocks located on course pages and the site home page are supported for indexing. This is based on the 'pagetypepattern' of the block_instance. Supported page types are as follows:
    • Blocks on the site home page (pagetypepattern = 'site-index')
    • Blocks on a course home page (pagetypepattern = 'course-view-*')
    • Blocks on any course page (pagetypepattern = 'course-*')
    • Blocks on any page (pagetypepattern = '*'), where they are located at a course context
Note: Indexing is not supported for blocks configured for course-module-level display.
\core_search\base_block includes a default implementation of get_document_recordset which is suitable for blocks storing their data in the block_instances table. Such blocks only need to implement the get_document function to return the document. Blocks storing their data in other tables will need to override the get_document_recordset function. See blocks/html/classes/search/content.php for an example of a 'standard' block indexing implementation.

Name

You need to set a visible name for the search area. For plugins it should be defined in the plugin's language strings file, and for core subsystems in lang/en/search.php. The name is shown to Moodle users (in the filters section of the search page and on the admin search pages), and its string identifier has the format search:AREANAME. For example, for forum posts there is $string['search:post'] = 'Forum - posts' in mod/forum/lang/en/forum.php.

Index data

get_document_recordset

This function should return a recordset with all the content in your component that has been modified on or after the $modifiedfrom timestamp (integer) and that is within the specified context $context. (If the context is null or the system context, all content system-wide should be included. The function must support these, as well as all other possible contexts.)

You can optionally return null if you know there are no changes without having to do a database query. For example, if you are implementing a function for the glossary module, and the supplied context relates to an HTML block, then you already know it cannot contain any of your content.

The recordset must return the modified content in order by modified time (oldest first), and this must be the same time set in the document's modified field (see below). If these values are not consistent or the data is not in order then indexing will not always work correctly (for example it may never finish or it may miss updates).

Assuming the table containing the data you are indexing is called xxxxx, the basic principle is to do a query similar to:

SELECT x.*
  FROM {xxxxx} x
 WHERE x.timemodified >= ?
ORDER BY x.timemodified ASC

However you will also need to incorporate code to restrict the results based on the context within Moodle.

Here is a full example from the Glossary module (you can find this code in mod/glossary/classes/search/entry.php):

public function get_document_recordset($modifiedfrom = 0, \context $context = null) {
    global $DB;

    list ($contextjoin, $contextparams) = $this->get_context_restriction_sql($context, 'glossary', 'g');
    if ($contextjoin === null) {
        return null;
    }

    $sql = "SELECT ge.*, g.course FROM {glossary_entries} ge
              JOIN {glossary} g ON g.id = ge.glossaryid
      $contextjoin
             WHERE ge.timemodified >= ? ORDER BY ge.timemodified ASC";
    return $DB->get_recordset_sql($sql, array_merge($contextparams, [$modifiedfrom]));
}

Note the helper function get_context_restriction_sql is available for content within modules (there is a slightly different one for blocks). If you have an unusual case where the data being indexed is not within a module or block, you may need to write the context restriction SQL yourself.
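
Whichever query you end up with, the ordering requirement described above can be checked mechanically, for instance in a unit test for your search area. The following is a minimal sketch in plain PHP and is not part of the search API: the function name records_are_ordered and the default field name timemodified are assumptions made for illustration, and the arrays stand in for a real recordset.

```php
/**
 * Hypothetical helper (not part of the Moodle search API): checks that records
 * come back oldest first, as the indexing process expects.
 *
 * @param iterable $records Records as returned by your recordset.
 * @param string $field Name of the modified-time field (assumed: timemodified).
 * @return bool True if no record is older than the one before it.
 */
function records_are_ordered(iterable $records, string $field = 'timemodified'): bool {
    $previous = null;
    foreach ($records as $record) {
        if ($previous !== null && $record->$field < $previous) {
            return false;
        }
        $previous = $record->$field;
    }
    return true;
}

// Stand-in data; in a real test you would iterate over the recordset instead.
$ordered = [
    (object)['id' => 1, 'timemodified' => 100],
    (object)['id' => 2, 'timemodified' => 100],
    (object)['id' => 3, 'timemodified' => 250],
];
$unordered = [
    (object)['id' => 1, 'timemodified' => 300],
    (object)['id' => 2, 'timemodified' => 200],
];

var_dump(records_are_ordered($ordered));   // bool(true)
var_dump(records_are_ordered($unordered)); // bool(false)
```

Records with equal timestamps are accepted, which matches the >= condition used in the query.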

get_recordset_by_timestamp (Moodle 3.3 and below)

Moodle 3.3 and below use a different function, get_recordset_by_timestamp, which does the same job but doesn't support the context parameter. If you want your code to support Moodle 3.3 as well as Moodle 3.4+, add the following additional function:

public function get_recordset_by_timestamp($modifiedfrom = 0) {
    return $this->get_document_recordset($modifiedfrom);
}

Note: Existing code that only has a get_recordset_by_timestamp function will still work in Moodle 3.4+ just as it did in earlier versions, but certain features added in Moodle 3.4+ (most importantly, indexing content from restored courses or activities) will not work properly.

get_document

This function receives one of the previous query results ($record) and an array of options. It should return a \core_search\document object with all the data to be indexed.

The options this function can receive are:

  • indexfiles: Whether to index files or not. File indexing support also depends on the search engine backend, as not all of them support it. If file indexing is not supported, there is no need to flag the document as new.
  • lastindexedtime: Also related to file indexing; see the example below.

public function get_document($record, $options = array()) {

    // All wrapped in a try & catch as we should not stop the indexing process because of a legacy corrupted database.
    try {
        $context = \context_course::instance($record->courseid);
    } catch (\dml_missing_record_exception $ex) {
        debugging('Error retrieving ' . $this->areaid . ' ' . $record->id . ' document, not all required data is available: ' .
            $ex->getMessage(), DEBUG_DEVELOPER);
        return false;
    } catch (\dml_exception $ex) {
        debugging('Error retrieving ' . $this->areaid . ' ' . $record->id . ' document: ' . $ex->getMessage(), DEBUG_DEVELOPER);
        return false;
    }

    // Prepare associative array with data from DB.
    $doc = \core_search\document_factory::instance($record->id, $this->componentname, $this->areaname);

    // Any content should be converted to plain text.

    // If you just have a text string you need to call content_to_text() with the $contentformat param set to false.
    // Replace 'title' with whichever field of your record best describes the result.
    $doc->set('title', content_to_text($record->title, false));

    // For a property named 'content' where another property 'contentformat' is also present the text should be
    // passed through to content_to_text() when declaring the document. This will ensure that HTML, Markdown, ...
    // formats are converted to plain text. Similar to what we do with format_text.
    $doc->set('content', content_to_text($record->content, $record->contentformat));

    $doc->set('contextid', $context->id);
    $doc->set('type', \core_search\manager::TYPE_TEXT);
    $doc->set('courseid', $record->courseid);
    $doc->set('modified', $record->timemodified);

    // Optional fields.

    // The user that created the record. It is optional: in some cases, like forum posts, it makes sense to have it
    // available, but in other cases, like activities, it does not help much.
    $doc->set('userid', $record->userid);

    // If the indexed document should only be accessible by the user that created it, replace the NO_OWNER_ID
    // constant with the owner's user id.
    $doc->set('owneruserid', \core_search\manager::NO_OWNER_ID);

    // Extra contents associated to the document.
    $doc->set('description1', content_to_text($record->extracontent1, $record->extracontent1format));
    $doc->set('description2', content_to_text($record->extracontent2, $record->extracontent2format));

    // Not compulsory, but speeds things up when the search area includes files (see the "Indexing files" section).
    if (isset($options['lastindexedtime']) && ($options['lastindexedtime'] < $record->created)) {
        // If the document was created after the last index time, it must be new.
        $doc->set_is_new(true);
    }
    return $doc;
}

As noted above, the modified field here must be consistent with the ordering and >= condition in the get_document_recordset function.

Indexing files

First declare that the search area is interested in indexing the files attached to its documents.

public function uses_file_indexing() {
    return true;
}

Then define the attach_files function, which receives a \core_search\document object and attaches stored_file objects to it.

This is a simplified example of one of the cases we can find in core.

public function attach_files($document) {
    $fs = get_file_storage();

    // The document already stores the id of the context it belongs to.
    $contextid = $document->get('contextid');

    $files = $fs->get_area_files($contextid, 'COMPONENTNAME', 'FILEAREA', $document->get('itemid'));
    foreach ($files as $file) {
        $document->add_stored_file($file);
    }
}

Indexing performance

The indexing process is run incrementally by cron, keeping track of the last indexing time. On big Moodle sites this process can be very heavy, as millions of records can be indexed in the same PHP process, so performance is important. Bear in mind that any database query you add to your get_document function to retrieve data for the document will run once for every document. If your search area can potentially contain a large number of records, consider adding static caches.
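
One way to add such a static cache is sketched below. It is plain PHP for illustration only: the class name parent_record_cache and the loader callback are assumptions, and in a real search area the loader would be a $DB call made from get_document.

```php
/**
 * Hypothetical per-process cache for rows that many documents share
 * (for example the parent activity of each indexed entry).
 */
class parent_record_cache {
    /** @var array Cached rows, keyed by id. */
    protected static $cache = [];

    /**
     * Returns the row for $id, calling $loader only on the first request.
     *
     * @param int $id Row id.
     * @param callable $loader Fetches the row; stands in for a database query.
     * @return mixed The cached row.
     */
    public static function get(int $id, callable $loader) {
        if (!array_key_exists($id, self::$cache)) {
            self::$cache[$id] = $loader($id);
        }
        return self::$cache[$id];
    }
}

// Count how often the (simulated) database is hit.
$queries = 0;
$loader = function (int $id) use (&$queries) {
    $queries++;
    return (object)['id' => $id, 'name' => 'Parent ' . $id];
};

// Three documents, two of which share parent 7.
foreach ([7, 7, 9] as $parentid) {
    $parent = parent_record_cache::get($parentid, $loader);
}
echo $queries; // Prints 2: parent 7 was only loaded once.
```

Since the whole indexing run can happen in one PHP process, an unbounded static array can grow large; limiting its size or resetting it between batches may be worth considering.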

Access control

Moodle's global search limits the access to the indexed data in two different ways.

Automatic context-based filtering

This is done automatically by Moodle: the user performing a query only gets results from contexts they have access to. If your search area indexes content that belongs to a course, the user will only see results from courses they can access. If your search area belongs to a course module, the user will only see results from visible activities in courses they can access and where no completion rules prevent access. Similarly, if your search area belongs to a user, only that user and site admins will have access to its contents. This last option is closely related to the document's 'owneruserid' field; the main difference is that setting 'owneruserid' to the user id also makes the documents unavailable to admin users.

You need to specify which context level (or levels) your search area contents belong to. Override the $levels static attribute in your class.

protected static $levels = [CONTEXT_COURSE];

Final access checking

Your search area is responsible for filtering out the results that the current user must not see. This is done in the check_access function:

public function check_access($id) {
    try {
        $myobject = $this->get_xxxxx($id);
    } catch (\dml_missing_record_exception $ex) {
        // If the record does not exist anymore in Moodle we should return \core_search\manager::ACCESS_DELETED.
        return \core_search\manager::ACCESS_DELETED;
    } catch (\dml_exception $ex) {
        // Skip results if there is any unexpected error.
        return \core_search\manager::ACCESS_DENIED;
    }

    // Database fields are not returned as booleans, so avoid a strict comparison with false.
    if (!$myobject->visible) {
        return \core_search\manager::ACCESS_DENIED;
    }

    return \core_search\manager::ACCESS_GRANTED;
}

The automatic context-based filtering should already remove most of the results a user does not have access to. If check_access ends up denying a large share of the results it is given, you might need to reconsider the context your search area belongs to, or the APIs might need to be expanded.

Other required methods

The link to the result. Should return a \moodle_url object.

public function get_doc_url(\core_search\document $doc) {
    // This is just an example, can vary a lot depending on what are you indexing.
    // Local URLs should start with a slash; 'itemid' is the id of the indexed record.
    return new \moodle_url('/link/to/your/component.php', array('id' => $doc->get('itemid')));
}

The link to the result's context; in some cases it might be the same as the doc URL. Should return a \moodle_url object.

public function get_context_url(\core_search\document $doc) {
    // This is just an example, can vary a lot depending on what are you indexing.
    // Local URLs should start with a slash; 'itemid' is the id of the indexed record.
    return new \moodle_url('/link/to/your/component/context.php', array('id' => $doc->get('itemid')));
}

Carry out a search

Usually you do not need to carry out a search within custom code, because users can search using the standard Moodle global search user interface. However, if you want to build a custom search interface, you might need to call search manually.

Carrying out a search involves three steps:

  • Get the search manager object.
  • Set up an object containing options about the query.
  • Do the search.

Simple example

This example code will search the system for 'frogs' and store the results in an array called $results.

require_login();
$search = \core_search\manager::instance();
$data = (object)['q' => 'frogs'];
$results = $search->search($data);

Note: You don't need to call require_login immediately before searching, but it must be called somewhere before searching. The search system requires that there is a logged-in user in order to work. It uses this user to decide which search results they are allowed to see.

Query options

The query object has the following useful options. Except for q, all of the others are optional.

  • q: The actual query text.
  • title: If specified, only results whose title matches this value are shown.
  • areaids: Array of search area id strings (e.g. mod_forum-post). If specified, only results from these areas will be returned.
  • courseids: Array of course ids. If specified, results will be restricted to these courses.
  • contextids: Array of context ids. If specified, results will be restricted to these exact contexts. (Note: The system does not automatically include child contexts. For example, if you only include a course context id here, it will not return results from activities within that course.)
  • groupids: Array of group ids. If specified, only results with these group ids will be returned. (Results with no group will not be returned.) Not all search engines support groups; check by calling $search->get_engine()->supports_group_filtering().
  • userids: Array of user ids. If specified, only results with these user ids will be returned. Not all search engines support user search; check by calling $search->get_engine()->supports_users().
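
As an illustration, several of these options can be combined in one query object. The sketch below is plain PHP: the helper name build_search_query and the course ids are made up, and only the option names come from the list above.

```php
/**
 * Hypothetical helper: builds a query object using the documented option names.
 *
 * @param string $q The query text (required).
 * @param array $filters Optional filters, keyed by option name.
 * @return stdClass Query object with only the known, non-empty filters set.
 */
function build_search_query(string $q, array $filters = []): stdClass {
    $allowed = ['title', 'areaids', 'courseids', 'contextids', 'groupids', 'userids'];
    $data = new stdClass();
    $data->q = $q;
    foreach ($allowed as $name) {
        if (!empty($filters[$name])) {
            $data->$name = $filters[$name];
        }
    }
    return $data;
}

// Search for 'frogs' in two courses (ids 12 and 34 are made up); the unknown 'colour' option is dropped.
$data = build_search_query('frogs', ['courseids' => [12, 34], 'colour' => 'green']);
print_r($data);

// Inside Moodle, after require_login(), the object would then be passed on:
// $results = \core_search\manager::instance()->search($data);
```

Leaving unset filters out of the object, rather than passing empty arrays, keeps the query equivalent to the simple example when no filters are given.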

Display

See /search/index.php for an example of how to display the results. Specifically:

  • Normally you want to display pages of results; in that case you should use the paged_search function instead of the search function shown in the above example.
  • You might be able to reuse the search renderer to display your results.
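
A paged search might look roughly like the sketch below. It assumes that paged_search takes the query object and a zero-based page number and returns an object with results, totalcount and actualpage properties, which is how /search/index.php appears to use it; check this against the Moodle version you target. The function summarise_page and the stand-in manager object are hypothetical and exist only so that the example runs outside Moodle.

```php
/**
 * Hypothetical helper: fetches one page of results and summarises it.
 *
 * @param object $search Search manager (in Moodle: \core_search\manager::instance()).
 * @param stdClass $data Query object, as in the simple example.
 * @param int $page Page number (assumed to be zero-based).
 * @return string One-line summary of the page.
 */
function summarise_page($search, stdClass $data, int $page): string {
    $found = $search->paged_search($data, $page);
    return sprintf('Page %d: showing %d of %d results',
        $found->actualpage, count($found->results), $found->totalcount);
}

// Stand-in for the real search manager, returning the assumed result shape.
$fakesearch = new class {
    public function paged_search(stdClass $data, int $page) {
        return (object)['results' => ['doc1', 'doc2'], 'totalcount' => 12, 'actualpage' => $page];
    }
};

echo summarise_page($fakesearch, (object)['q' => 'frogs'], 0);
// Prints: Page 0: showing 2 of 12 results
```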