Search API

From MoodleDocs
Revision as of 17:56, 14 November 2017

Moodle 3.1


Overview

The search API allows you to index contents in a search engine and query the search engine for results. Any Moodle component (all plugin types and all core subsystems) can define search areas for their contents.

This is different from the internal search feature some components have.

Add a search area

Note that I will be writing plugintype and pluginname to make it easier to understand for 3rd party developers not working in core, but it is also applicable for core subsystems, using core as plugintype and componentname as pluginname. I will be using areaname as the area you are defining.

Easy case: Activity information

If you just want to index your activity's basic information, such as the name and the description, you can skip all the documentation below and extend \core_search\base_activity (\core_search\area\base_activity in Moodle 3.1). Copy mod/book/classes/search/activity.php and replace 'book' with your 'activityname'. If you use fields other than name and intro (the de facto standard), or you want to index extra fields or files, look at mod/page/classes/search/activity.php (extra fields) or mod/assign/classes/search/activity.php (files).

To index other information your activity tables contain (e.g. glossary or journal entries, assignment submissions...) you need to extend \core_search\base_mod instead. Continue reading below please.

Base class

Everything related to a search area is contained in a single \plugintype_pluginname\search\areaname class, which should extend one of the following classes:

  • \core_search\base (\core_search\area\base in Moodle 3.1): Generic base class for a search area
  • \core_search\base_mod (\core_search\area\base_mod in Moodle 3.1): Base class for activities search areas (Activity_modules)
  • \core_search\base_activity (\core_search\area\base_activity in Moodle 3.1): Base class also for activities, but more specific than base_mod, as it is intended to index an activity's basic data such as its name and description. To index any other activity data use base_mod. If you are unsure which one to use, think of forum activities and forum posts: forum activities should use base_activity, but forum posts should use base_mod.
  • \core_search\base_block (Available in Moodle 3.4+): Base class for block search areas (Blocks). Only blocks located on course pages and the site home page are supported for indexing. This is based on the 'pagetypepattern' of the block_instance. Supported page types are as follows:
    • Blocks on the site home page (pagetypepattern = 'site-index')
    • Blocks on a course home page (pagetypepattern = 'course-view-*')
    • Blocks on any course page (pagetypepattern = 'course-*')
    • Blocks on any page (pagetypepattern = '*'), where they are located at a course context
Note: Indexing is not supported for blocks configured for course-module-level display.
\core_search\base_block includes a default implementation of get_document_recordset which is suitable for blocks storing their data in the block_instances table. Blocks of this nature only need to implement the get_document function to return the document. Blocks storing their data in other tables will need to override the get_document_recordset function. See blocks/html/classes/search/content.php for an example of a 'standard' block indexing implementation.

Name

You will need to set a visible name for the search area. For plugins it should be defined in the plugin's language strings file, and for core subsystems in lang/en/search.php. This name is shown to Moodle users in the filters section of the search page and on the admin search pages. The string identifier must follow the format search:AREANAME, so for forum posts, for example, we have $string['search:posts'] = 'Forum - posts' in mod/forum/lang/en/forum.php.
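
As a minimal sketch of the naming rule above, this is what the string could look like for a third-party plugin. The plugin (mod_myplugin), the area name (entries) and the visible text are all hypothetical and only illustrate the search:AREANAME format.

```php
<?php
// Hypothetical language file: mod/myplugin/lang/en/myplugin.php
// The part after "search:" is assumed to match the area class name,
// here \mod_myplugin\search\entries.
$string['search:entries'] = 'My plugin - entries';
```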

Index data

get_recordset_by_timestamp should return a recordset with everything in your component that has been modified since the $modifiedfrom timestamp (integer).

public function get_recordset_by_timestamp($modifiedfrom = 0) {

   global $DB;
   // The idea is to include here most (if not all) of the data you will need to index (see get_document below).
   // {xxxxx} is a placeholder for your table name; it needs the x alias used by the rest of the query.
   $sql = "SELECT x.* FROM {xxxxx} x WHERE x.timemodified >= ? ORDER BY x.timemodified ASC";
   // Note that this is an example, you might have more params.
   return $DB->get_recordset_sql($sql, array($modifiedfrom));

}

get_document receives one of the previous query results ($record) and an array of options. It should return a \core_search\document object with all the data to be indexed.

The options this function can receive are:

  • indexfiles: Whether to index files or not. File indexing also depends on the search engine backend, as not all of them support it. If file indexing is not supported there is no need to flag whether the document is new.
  • lastindexedtime: The time of the previous indexing run. It is also related to file indexing; see the example below.

public function get_document($record, $options = array()) {

   // All wrapped in a try & catch as we should not stop the indexing process because of a legacy corrupted database.
   try {
       // context_course::instance() expects a course id, not a context id.
       $context = \context_course::instance($record->courseid);
   } catch (\dml_missing_record_exception $ex) {
       debugging('Error retrieving ' . $this->areaid . ' ' . $record->id . ' document, not all required data is available: ' .
           $ex->getMessage(), DEBUG_DEVELOPER);
       return false;
   } catch (\dml_exception $ex) {
       debugging('Error retrieving ' . $this->areaid . ' ' . $record->id . ' document: ' . $ex->getMessage(), DEBUG_DEVELOPER);
       return false;
   }
   // Prepare associative array with data from DB.
   $doc = \core_search\document_factory::instance($record->id, $this->componentname, $this->areaname);
   // Any content should be converted to plain text.
   // If you just have a plain text string, call content_to_text() with the $contentformat param set to false.
   // $record->title is a placeholder for whichever field best describes the result.
   $doc->set('title', content_to_text($record->title, false));
   // For a property named 'content' where another property 'contentformat' is also present the text should be
   // passed through to content_to_text() when declaring the document. This will ensure that HTML, Markdown, ...
   // formats are converted to plain text. Similar to what we do with format_text.
   $doc->set('content', content_to_text($record->content, $record->contentformat));
   $doc->set('contextid', $context->id);
   $doc->set('type', \core_search\manager::TYPE_TEXT);
   $doc->set('courseid', $record->courseid);
   $doc->set('modified', $record->timemodified);
   // Optional fields.
   // The user that created the record. In some cases, like forum posts, it makes sense to have it available;
   // in others, like activities, it does not help much.
   $doc->set('userid', $record->userid);
   // If the indexed document should only be accessible to the user that created it,
   // replace the NO_OWNER_ID constant with the id of the owner.
   $doc->set('owneruserid', \core_search\manager::NO_OWNER_ID);
   // Extra contents associated to the document.
   $doc->set('description1', content_to_text($record->extracontent1, $record->extracontent1format));
   $doc->set('description2', content_to_text($record->extracontent2, $record->extracontent2format));
   // Not compulsory, but speeds things up when the search area includes files (see Indexing files).
   if (isset($options['lastindexedtime']) && ($options['lastindexedtime'] < $record->created)) {
       // If the document was created after the last index time, it must be new.
       $doc->set_is_new(true);
   }
   return $doc;

}

Indexing files

First declare that the search area is interested in indexing the files attached to its documents.

public function uses_file_indexing() {

   return true;

}

Define the attach_files function, which receives a \core_search\document object and fills it with stored_file objects.

This is a simplified example of one of the cases we can find in core.

public function attach_files($document) {

   $fs = get_file_storage();
   // The document already stores the id of the context it belongs to.
   $contextid = $document->get('contextid');
   $files = $fs->get_area_files($contextid, 'COMPONENTNAME', 'FILEAREA', $document->get('itemid'));
   foreach ($files as $file) {
       $document->add_stored_file($file);
   }

}

Indexing performance

The indexing process runs incrementally from cron, keeping track of the last indexing time. On big Moodle sites this process can be very heavy, as millions of records may be indexed in the same PHP process, so performance is important. Keep in mind that any database query you add to your get_document function to retrieve data for your document will run once for every document. If your search area can potentially contain a large number of records, consider adding static caches.
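
A static cache can be as simple as an array kept in a static property, so that repeated lookups within the same PHP process are served from memory. The following is only a sketch of that idea: the class name course_name_cache and the $loader callback are hypothetical, and the callback stands in for whatever database query your get_document function would otherwise repeat.

```php
<?php
// Minimal sketch of a per-process static cache; all names are hypothetical.
class course_name_cache {
    /** @var array Course names already loaded, keyed by course id. */
    private static $names = [];

    /**
     * Returns the course name, calling $loader only on the first request for each id.
     *
     * @param int $courseid
     * @param callable $loader Stand-in for the database query that fetches the name.
     * @return string
     */
    public static function get(int $courseid, callable $loader): string {
        if (!array_key_exists($courseid, self::$names)) {
            self::$names[$courseid] = $loader($courseid);
        }
        return self::$names[$courseid];
    }

    /** Empties the cache, for example between unit tests. */
    public static function reset(): void {
        self::$names = [];
    }
}

// Usage: three documents from two courses trigger only two lookups.
$queries = 0;
$loader = function (int $courseid) use (&$queries): string {
    $queries++;
    return 'Course ' . $courseid;
};
foreach ([5, 5, 7] as $courseid) {
    $name = course_name_cache::get($courseid, $loader);
}
```

The trade-off is memory: on a site with a very large number of distinct keys, an unbounded array grows for the whole indexing run, so it may be worth capping its size or resetting it periodically.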

Access control

Moodle's global search limits the access to the indexed data in two different ways.

Automatic context-based filtering

This is done automatically by Moodle: the user performing a query will only get results from contexts they have access to. If your search area indexes contents that belong to a course, users will only see results from courses they can access. If your search area belongs to a course module, users will only see results from visible activities in courses they can access, provided no completion rules prevent access to them. Similarly, if your search area belongs to a user, only that user and site admins will have access to its contents. Note that this last option is closely related to the document's 'owneruserid' field; the main difference is that setting 'owneruserid' to the user id makes the search area documents unavailable to admin users.

You need to specify the context level (or levels) your search area contents belong to. Override the static $levels attribute in your class.

protected static $levels = [CONTEXT_COURSE];

Final access checking

Your search area is responsible for filtering out the results that the user performing the query should not be able to access. It does so in check_access.

public function check_access($id) {

   try {
       $myobject = $this->get_xxxxx($id);
   } catch (\dml_missing_record_exception $ex) {
       // If the record does not exist anymore in Moodle we should return \core_search\manager::ACCESS_DELETED.
       return \core_search\manager::ACCESS_DELETED;
   } catch (\dml_exception $ex) {
       // Skip results if there is any unexpected error.
       return \core_search\manager::ACCESS_DENIED;
   }
   // Database values are returned as strings, so avoid a strict comparison with false.
   if (!$myobject->visible) {
       return \core_search\manager::ACCESS_DENIED;
   }
   return \core_search\manager::ACCESS_GRANTED;

}

The automatic context-based filtering should get rid of most of the results a user does not have access to. If check_access ends up denying a large share of the results it receives, you might need to rethink the context your search area belongs to, or we might need to expand the APIs.

Other required methods

The link to the result. Should return a \moodle_url object.

public function get_doc_url(\core_search\document $doc) {

   // This is just an example, it can vary a lot depending on what you are indexing.
   // 'itemid' holds the id of the indexed record; 'id' is the document's unique id in the index.
   return new \moodle_url('/link/to/your/component.php', array('id' => $doc->get('itemid')));

}

The link to the result context. In some cases it might be the same as the doc url. Should return a \moodle_url object.

public function get_context_url(\core_search\document $doc) {

   // This is just an example, it can vary a lot depending on what you are indexing.
   return new \moodle_url('/link/to/your/component/context.php', array('id' => $doc->get('itemid')));

}