Note:

If you want to create a new page for developers, you should create it on the Moodle Developer Resource site.

Search engines

From MoodleDocs

Revision as of 08:03, 4 May 2016


Introduction

Search engines index large amounts of data in a structured way that allows users to query them and extract relevant data. Many search engines offer good APIs for storing and retrieving data. Moodle's global search is pluggable so that different backends can be used, from a simple database table (fine for small sites but unusable for large ones) to open-source systems like Solr or Elasticsearch (both built on Apache Lucene) or proprietary cloud-based systems.

Terms

Index: in this page, "index" means the data container in your search engine. It can be an instance in your search engine server, or a database table name if you are writing a search engine for MongoDB (just an example).

Document: a "searchable" unit of data, such as a database entry or a forum post. You can think of it as one of the search results you would expect a search engine to return.

Writing your own search engine plugin

To write your own search engine plugin, you need to implement methods that add, retrieve and delete data in your search engine. Add a \search_yourplugin\engine class in search/engine/yourplugin/classes/engine.php that extends \core_search\engine.

Search engine setup

Your search engine needs to be prepared before it can index data. You can provide a script that lets your plugin users easily create the structure the search engine requires; otherwise, add instructions explaining how to do it.

You can get the list of fields Moodle needs, along with other information you might need such as the field types, by calling \core_search\document::get_default_fields_definition().
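A setup script usually walks that field list and translates each Moodle field type into your engine's own type. The sketch below assumes the definition is an array keyed by field name, each entry holding 'type', 'stored' and 'indexed' keys; the sample $fields array, the $typemap values and the example_build_schema() name are all hypothetical, so check the real output on your Moodle version first.

```php
<?php
// Hypothetical sample in the shape we assume get_default_fields_definition() returns.
$fields = [
    'id'        => ['type' => 'string', 'stored' => true, 'indexed' => false],
    'title'     => ['type' => 'text',   'stored' => true, 'indexed' => true],
    'contextid' => ['type' => 'int',    'stored' => true, 'indexed' => true],
    'modified'  => ['type' => 'tdate',  'stored' => true, 'indexed' => true],
];

// Hypothetical mapping from Moodle field types to your engine's field types.
$typemap = ['string' => 'keyword', 'text' => 'text', 'int' => 'integer', 'tdate' => 'date'];

/**
 * Builds a generic list of field definitions that a setup script could send to the engine.
 */
function example_build_schema(array $fields, array $typemap): array {
    $schema = [];
    foreach ($fields as $name => $definition) {
        $type = $definition['type'];
        if (!isset($typemap[$type])) {
            // Fail loudly rather than creating a field with the wrong type.
            throw new InvalidArgumentException("No engine type mapped for Moodle type '$type'.");
        }
        $schema[] = [
            'name' => $name,
            'type' => $typemap[$type],
            'stored' => $definition['stored'] ?? true,
            'indexed' => $definition['indexed'] ?? true,
        ];
    }
    return $schema;
}
```

Failing on an unmapped type is a deliberate choice: a silently mistyped field (for example, a date stored as text) tends to surface much later as broken sorting or filtering.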

Add contents

This method is executed when Moodle contents are being indexed in the search engine. Moodle iterates through all search areas, extracts the contents that should be indexed and assigns each one a unique id based on its search area.

public function add_document(array $doc, $fileindexing = false) {

   // Use curl or any other method or extension to push the document to your search engine.

}

$doc will contain the document data with all required fields (and possibly some optional ones). Its contents will already be validated, so, for example, an integer field will come with an integer value. $fileindexing will be true if files should be indexed. It will be false if your plugin does not support file indexing.
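As a minimal sketch of the push step, assuming an engine with a Solr-like JSON update endpoint that accepts a list of documents, the document array can be serialised to JSON and POSTed with PHP's curl extension. The endpoint URL and the example_* function names are hypothetical.

```php
<?php
// Hypothetical helper: serialises one document as a JSON list, the body we
// assume a Solr-like "update" endpoint accepts.
function example_build_add_payload(array $doc): string {
    // JSON_THROW_ON_ERROR surfaces invalid UTF-8 instead of silently returning false.
    return json_encode([$doc], JSON_THROW_ON_ERROR);
}

// Hypothetical sender: POSTs the payload and reports whether the engine answered with 2xx.
function example_push_document(string $updateurl, array $doc): bool {
    $ch = curl_init($updateurl);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => example_build_add_payload($doc),
        CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 10,
    ]);
    $response = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    // On PHP 8+ the curl handle is released automatically when $ch goes out of scope.
    return $response !== false && $status >= 200 && $status < 300;
}
```

Inside add_document() you would call something like example_push_document() with your configured URL. Whether your engine also needs an explicit commit after writes depends on the backend.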

Retrieve contents

This is the key method, as search engine plugins have a lot of flexibility here.

This function receives the search filters the user specified and the list of contexts the user can access, and it should return an array of \core_search\document objects.

public function execute_query($filters, $usercontexts) {

   // Prepare a query applying all filters.
   // Include $usercontexts as a filter to contextid field.
   // Send a request to the server.
   // Iterate through results (respecting \core_search\manager::MAX_RESULTS).
   // Check user access, read https://docs.moodle.org/dev/Search_engines#Security for more info
   // Convert results to \core_search\document type objects using \core_search\document::set_data_from_engine
   // Return an array of \core_search\document objects.

}
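The step "include $usercontexts as a filter to contextid field" can be sketched as a small query-building helper. This assumes a Lucene-style query syntax (as used, for example, in Solr's fq parameter) and assumes $usercontexts has already been flattened into a list of context ids; example_context_filter() is a hypothetical name.

```php
<?php
// Hypothetical helper: turns accessible context ids into a Lucene-style filter clause.
function example_context_filter(array $contextids): ?string {
    // Cast to int so nothing passed in can inject query syntax, then drop duplicates.
    $ids = array_values(array_unique(array_map('intval', $contextids)));
    if (!$ids) {
        // No accessible contexts: the caller should return no results without querying.
        return null;
    }
    return 'contextid:(' . implode(' OR ', $ids) . ')';
}
```

Returning null for an empty list matters: sending an empty clause, or no clause at all, could widen the query instead of returning nothing.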

Security

It is crucial that this function checks the \core_search\document::check_access results and does not return results the user does not have access to. Moodle already performs some of the required security checks, but search areas always have the last word and their decision must be respected.

Delete contents

public function delete($areaid = false) {

   if ($areaid === false) {
       // Delete all your search engine index contents.
   } else {
       // Delete all your search engine contents where areaid = $areaid.
   }

}
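For a Solr-style backend, both branches can be expressed as a delete-by-query request. The sketch below only builds the JSON body; the payload shape follows Solr's JSON update format, and example_delete_payload() is a hypothetical name. Other engines will need their own equivalent.

```php
<?php
// Hypothetical helper: builds a Solr-style JSON delete-by-query body.
function example_delete_payload($areaid = false): string {
    if ($areaid === false) {
        // Match every document in the index.
        $query = '*:*';
    } else {
        // Quote the area id as a phrase, escaping backslashes and double quotes.
        $query = 'areaid:"' . addcslashes($areaid, '\\"') . '"';
    }
    return json_encode(['delete' => ['query' => $query]], JSON_THROW_ON_ERROR);
}
```

Quoting the area id avoids surprises from characters such as the hyphen, which Lucene query syntax treats specially.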

\core_search\document::check_access returns \core_search\manager::ACCESS_DELETED when a document returned by the search engine no longer exists in Moodle. You can use this to clean up the search engine contents with a method of your own, such as \search_yourplugin\engine::delete_by_id. The execute_query method in search/engine/solr/classes/engine.php shows an example of this.
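The result loop in execute_query, combining the access check, the result limit and the ACCESS_DELETED cleanup, can be sketched as below. The EXAMPLE_ACCESS_* constants are hypothetical stand-ins for the \core_search\manager constants (use the real ones in your plugin), and the $checkaccess callable stands in for the access check on each result.

```php
<?php
// Hypothetical stand-ins for the \core_search\manager access constants.
const EXAMPLE_ACCESS_DENIED = 0;
const EXAMPLE_ACCESS_GRANTED = 1;
const EXAMPLE_ACCESS_DELETED = 2;

/**
 * Keeps only results the user may see, up to $maxresults, and collects the ids
 * of documents that no longer exist in Moodle so they can be deleted afterwards.
 *
 * @return array [list of granted results, list of ids to delete]
 */
function example_filter_results(iterable $results, callable $checkaccess, int $maxresults): array {
    $granted = [];
    $deletedids = [];
    foreach ($results as $result) {
        if (count($granted) >= $maxresults) {
            break;
        }
        $access = $checkaccess($result);
        if ($access === EXAMPLE_ACCESS_DELETED) {
            $deletedids[] = $result['id'];
        } else if ($access === EXAMPLE_ACCESS_GRANTED) {
            $granted[] = $result;
        }
        // Anything else (denied) is silently skipped; never return it to the user.
    }
    return [$granted, $deletedids];
}
```

Collecting the deleted ids and removing them after the loop keeps the read path and the write path separate, so one slow delete does not hold up the search response.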

Other abstract methods you need to override

public function file_indexing_enabled() {

   // Defaults to false; override it if your search engine supports file indexing.
   return false;

}


public function is_server_ready() {

   // Check if your search engine is ready.

}
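A simple readiness check is whether the configured host accepts a TCP connection. This sketch assumes is_server_ready() should return true on success and an error message string otherwise (check the docblock in \core_search\engine for your Moodle version); example_is_server_ready() and its parameters are hypothetical.

```php
<?php
// Hypothetical readiness probe: true if the host accepts a TCP connection, else an error string.
function example_is_server_ready(string $host, int $port, float $timeout = 2.0) {
    if ($host === '') {
        return 'Search engine host is not configured.';
    }
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning fsockopen() emits on failure; $errstr is reported instead.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return "Cannot connect to {$host}:{$port}: {$errstr}";
    }
    fclose($socket);
    return true;
}
```

An open port only proves something is listening. If your engine has a status or ping endpoint, querying it gives a stronger signal.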

Other methods you might want to override

public function is_installed() {

   // Check if the required PHP extensions you need to make the search engine work are installed.

}
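The extension check can be as small as a loop over PHP's extension_loaded(). The list of required extensions and the example_is_installed() name are hypothetical; pass whatever your backend actually needs.

```php
<?php
// Hypothetical check: are all PHP extensions this engine needs loaded?
function example_is_installed(array $requiredextensions): bool {
    foreach ($requiredextensions as $extension) {
        if (!extension_loaded($extension)) {
            return false;
        }
    }
    return true;
}
```

For example, a plugin that talks to its backend over HTTP might call example_is_installed(['curl', 'json']).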


public function optimize() {

   // Optimize or defragment the index contents.

}


These methods are called while the indexing process is running and allow the search engine to hook into it.

public function index_starting($fullindex = false) {

   // Nothing by default.

}


public function index_complete($numdocs = 0, $fullindex = false) {

   // Nothing by default.

}


public function area_index_starting($searcharea, $fullindex = false) {

   // Nothing by default.

}


public function area_index_complete($searcharea, $numdocs = 0, $fullindex = false) {

   return true;

}
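One way to use these hooks is to batch network requests: buffer documents in add_document() and send whatever is left when an area finishes. The class below is a hypothetical sketch, not a \core_search\engine subclass. The injected $sender callable stands in for the request to your engine, and the batch size is arbitrary.

```php
<?php
// Hypothetical engine fragment that buffers documents and sends them in batches.
class example_buffering_engine {
    /** @var array[] Documents waiting to be sent. */
    private $buffer = [];
    /** @var callable Receives a list of documents; stands in for the request to your engine. */
    private $sender;
    /** @var int Number of documents per request. */
    private $batchsize;

    public function __construct(callable $sender, int $batchsize = 100) {
        $this->sender = $sender;
        $this->batchsize = $batchsize;
    }

    public function add_document(array $doc, $fileindexing = false) {
        $this->buffer[] = $doc;
        // Send a full batch as soon as one is ready.
        if (count($this->buffer) >= $this->batchsize) {
            $this->flush();
        }
    }

    public function area_index_complete($searcharea, $numdocs = 0, $fullindex = false) {
        // Send any remaining documents once the area has been processed.
        $this->flush();
        return true;
    }

    private function flush() {
        if ($this->buffer) {
            ($this->sender)($this->buffer);
            $this->buffer = [];
        }
    }
}
```

Flushing in area_index_complete() ensures a partially filled batch is never left behind between areas.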

Adapting document formats to your search engine format

\core_search\document is the class that represents a document. Depending on your search engine backend's limitations, or on how it stores time values, you might want to override this class in \search_yourplugin\document. The main methods you might want to override are:

Format date/time fields

public static function format_time_for_engine($timestamp) {

   // Convert $timestamp to a string using the format used by your search engine.

}

By default, \core_search\document::format_time_for_engine returns a timestamp (integer).
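If your engine stores dates as ISO 8601 UTC strings, as Solr does (for example 2016-05-04T08:03:00Z), the override body can be a single gmdate() call. It is shown here as a standalone function with a hypothetical name so it runs on its own; in your plugin the body would go into format_time_for_engine().

```php
<?php
// Hypothetical override body: render a Unix timestamp as an ISO 8601 UTC string.
function example_format_time_for_engine(int $timestamp): string {
    // gmdate() always formats in UTC, independent of the server timezone.
    return gmdate('Y-m-d\TH:i:s\Z', $timestamp);
}
```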

Import date/time contents from the search engine

public static function import_time_from_engine($time) {

   // Convert the string returned from the search engine as a date/time format to a timestamp (integer).

}

By default, \core_search\document::import_time_from_engine returns a timestamp (integer).
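The reverse conversion for the same assumed ISO 8601 UTC format can use DateTime::createFromFormat() with an explicit UTC timezone. The function name is hypothetical, and this sketch does not accept fractional seconds; extend the format if your engine returns them.

```php
<?php
// Hypothetical override body: parse an ISO 8601 UTC string back to a Unix timestamp.
function example_import_time_from_engine(string $time): int {
    // '!' resets unspecified fields to the Unix epoch; the UTC zone matches the trailing Z.
    $date = DateTime::createFromFormat('!Y-m-d\TH:i:s\Z', $time, new DateTimeZone('UTC'));
    if ($date === false) {
        throw new InvalidArgumentException("Unexpected date format: $time");
    }
    return $date->getTimestamp();
}
```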

Format string fields

public static function format_string_for_engine($string) {

   // Limit the string length, convert the charset with iconv if your search engine only supports a specific charset...
   return $string;

}

By default, \core_search\document::format_string_for_engine returns the string as it is.
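As an example of length limiting, the sketch below collapses whitespace and truncates by characters rather than bytes, so multibyte UTF-8 characters are never cut in half. The function name and the 10000-character default are hypothetical, and charset conversion is left out.

```php
<?php
// Hypothetical override body: collapse whitespace runs and cap the length in characters.
// Assumes UTF-8 input, as Moodle strings are UTF-8.
function example_format_string_for_engine(string $string, int $maxchars = 10000): string {
    // Collapse any run of whitespace (including line breaks) into a single space.
    $collapsed = preg_replace('/\s+/u', ' ', trim($string));
    if ($collapsed === null) {
        throw new InvalidArgumentException('String is not valid UTF-8.');
    }
    // Split into characters, not bytes, before truncating.
    $chars = preg_split('//u', $collapsed, -1, PREG_SPLIT_NO_EMPTY);
    if (count($chars) <= $maxchars) {
        return $collapsed;
    }
    return implode('', array_slice($chars, 0, $maxchars));
}
```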