Search engine adapters

Revision as of 14:27, 18 November 2007

Moodle's global search engine, available as an experimental feature up to the Moodle 1.9 release, lets plugins add new document types that are indexed by the Lucene indexer and can then be searched.

Each module or block should have an adapter that wraps the plugin's internal data model into searchable documents. The current implementation lets a module provide the search engine with a set of virtual documents. The search engine indexes the text content of these documents and records enough data to lead back to the exact context in which that content appears (e.g. access URL, course context, etc.).

Virtual documents are defined as subclasses of the SearchDocument class. Only the subclass constructor needs to be written: it maps an input record onto the internal document definition.

The goals of the adapter are :

  • extract all virtual documents from the module data model and hand them to the indexer through an iterator (first construction of the index).
  • extract a single document for index updates
  • provide sufficient information to delete obsolete content from the index
  • define the access URL that leads back to the resource
  • define the access check algorithm so that users can only access resources they are allowed to see

Adapters For Modules And Blocks

Any adapter for a module or a block must reside in the /search/documents subdirectory and be named <module>_document.php.

Search defines

Each searchable module or block should add at least the following two (typical) defines to /search/lib.php :

define('SEARCH_TYPE_<MODULE>', '<module>');
define('PATH_FOR_SEARCH_TYPE_<MODULE>', 'mod/<module>');

or

define('SEARCH_TYPE_<BLOCK>', '<block>');
define('PATH_FOR_SEARCH_TYPE_<BLOCK>', 'blocks/<block>');
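To see how the defines and the file naming rule fit together, here is a sketch for a hypothetical module called example and a hypothetical block called exampleblock. The search_adapter_file() helper is invented for illustration and is not part of Moodle; it only spells out where the adapter file for a given search type is expected to live, relative to the Moodle root.

```php
<?php
// Typical defines for a hypothetical module "example" and a hypothetical
// block "exampleblock"; a real plugin substitutes its own name.
define('SEARCH_TYPE_EXAMPLE', 'example');
define('PATH_FOR_SEARCH_TYPE_EXAMPLE', 'mod/example');

define('SEARCH_TYPE_EXAMPLEBLOCK', 'exampleblock');
define('PATH_FOR_SEARCH_TYPE_EXAMPLEBLOCK', 'blocks/exampleblock');

// Hypothetical helper (not part of Moodle): the adapter file expected for a
// search type, relative to the Moodle root, following <module>_document.php.
function search_adapter_file($searchtype) {
    return '/search/documents/' . $searchtype . '_document.php';
}

echo search_adapter_file(SEARCH_TYPE_EXAMPLE), "\n";
// prints /search/documents/example_document.php
```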

Constructor

The constructor of the SearchDocument class has the following signature :

public function __construct(&$doc, &$data, $course_id, $group_id, $user_id, $path)

where :

  • &$doc is a reference to a PHP object that should provide the fields :
    • docid : the id of the document, as needed to rebuild the access URL.
    • documenttype : in general, the name of the module itself
    • itemtype : a subclassifier, if the module provides more than one kind of virtual document to the search engine.
    • contextid : the id of the context to consider when checking the accessing user's capabilities
    • title : the title string shown as a caption in search results
    • author : if the author is known, the user id (mdl_user.id) of the author.
    • contents : the text of the document, stripped of any formatting attributes or tags
    • url : the document URL, built by the adapter, that leads back to the resource
    • date : usually the date when the resource was created
  • &$data is a reference to a contextual metadata object that will be serialized along with the record but not used as searchable content
  • $course_id is the id of the course the resource is in
  • $group_id is the id of the group the resource belongs to if the resource is in a group scope (e.g. attachments in a separate-groups wiki), 0 otherwise.
  • $user_id is the id of the user the resource belongs to if the resource is in a user-specific scope (e.g. post or assignment attachments), 0 otherwise.
  • $path is the PATH_FOR_SEARCH_TYPE_<MODULE> (or _<BLOCK>) define above for the plugin.
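Putting the parameter list together, a subclass for a hypothetical example module could look like the sketch below. The SearchDocument class here is a reduced stand-in that only stores what it receives (the real one builds a Lucene document), and $CFG->wwwroot, example_make_link(), the e/entry URL parameters and the entry fields (id, name, text, userid, timecreated) are all assumptions made for illustration. The subclass signature mirrors the call used in the single-document prototype later on this page.

```php
<?php
// Stand-in (hypothetical) for Moodle's global configuration object.
$CFG = new stdClass();
$CFG->wwwroot = 'http://moodle.example';

define('SEARCH_TYPE_EXAMPLE', 'example');
define('PATH_FOR_SEARCH_TYPE_EXAMPLE', 'mod/example');

// Reduced stand-in for the real SearchDocument base class: it only records
// the values it is given, where the real class builds a Lucene document.
class SearchDocument {
    public $fields = array();

    public function __construct(&$doc, &$data, $course_id, $group_id, $user_id, $path) {
        foreach (get_object_vars($doc) as $name => $value) {
            $this->fields[$name] = $value;
        }
        $this->fields['data']      = serialize($data); // stored, not searchable
        $this->fields['course_id'] = $course_id;
        $this->fields['group_id']  = $group_id;
        $this->fields['user_id']   = $user_id;
        $this->fields['path']      = $path;
    }
}

// Hypothetical link callback for the "example" module.
function example_make_link($example_id, $entry_id) {
    global $CFG;
    return $CFG->wwwroot . '/mod/example/view.php?e=' . (int) $example_id . '&entry=' . (int) $entry_id;
}

// Hypothetical subclass: maps one entry record (an associative array with
// id, name, text, userid and timecreated keys) onto the $doc fields.
class ExampleSearchDocument extends SearchDocument {
    public function __construct($entry, $example_id, $course_id, $itemtype, $context_id) {
        $doc = new stdClass();
        $doc->docid        = $entry['id'];
        $doc->documenttype = SEARCH_TYPE_EXAMPLE;
        $doc->itemtype     = $itemtype;
        $doc->contextid    = $context_id;
        $doc->title        = $entry['name'];
        $doc->author       = $entry['userid'];
        $doc->contents     = strip_tags($entry['text']);
        $doc->url          = example_make_link($example_id, $entry['id']);
        $doc->date         = $entry['timecreated'];

        $data = new stdClass();          // contextual metadata only
        $data->example = $example_id;

        // Entries are assumed not to be group- or user-scoped, hence 0 and 0.
        parent::__construct($doc, $data, $course_id, 0, 0, PATH_FOR_SEARCH_TYPE_EXAMPLE);
    }
}

$entry = array('id' => 7, 'name' => 'Hello', 'text' => '<p>Some <b>text</b></p>',
               'userid' => 3, 'timecreated' => 1195392420);
$document = new ExampleSearchDocument($entry, 2, 5, 'entry', 42);
echo $document->fields['contents'], "\n"; // prints Some text
```

The subclass takes the record by value rather than by reference, so that passing the result of get_object_vars() directly does not trigger a "only variables should be passed by reference" notice.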

Providing The indexer Documents From The Module

When the index is first built, the indexer needs to scan all the instances of the plugin.

The adapter API must provide the

<module>_iterator(){ ... }

function, which returns the set of consistent plugin instances. Here is a typical template for this function :

function <module>_iterator() {
    $<module> = get_records('<module>');
    return $<module>;
} //<module>_iterator

On each instance, the function :

function <module>_get_content_for_index(&$plugininstance) { ... }

is called to construct the relevant instances of the SearchDocument subclass. This function MUST return an array of SearchDocuments, or false. A typical synopsis of this function is :

function <module>_get_content_for_index(&$plugininstance){
    $documents = array();

    // invalid plugin
    if (!$plugininstance) return $documents;

    // TODO : get an indexable item set

    foreach($indexableitems as $indexableitem) {

       // TODO : Prepare params with 

       $documents[] = new <Module>SearchDocument(... params ...);
    } 
    return $documents;
}
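Filled in for the hypothetical example module, the iterator and the content callback might look like the sketch below. get_records() is replaced by fake_get_records(), an in-memory stand-in reading the invented $EXAMPLE_DB array (it mimics get_records() returning false when nothing matches), ExampleSearchDocument is reduced to a plain holder, and the loop at the end is only an assumption about how the indexer chains the two callbacks during first construction.

```php
<?php
// Invented in-memory data standing in for the Moodle database.
$EXAMPLE_DB = array(
    'example' => array(
        1 => (object) array('id' => 1, 'course' => 10, 'name' => 'Example A'),
    ),
    'example_entries' => array(
        11 => (object) array('id' => 11, 'exampleid' => 1, 'name' => 'First', 'text' => 'alpha'),
        12 => (object) array('id' => 12, 'exampleid' => 1, 'name' => 'Second', 'text' => 'beta'),
    ),
);

// Stand-in for get_records($table, $field, $value): records keyed by id,
// or false when nothing matches.
function fake_get_records($table, $field = '', $value = '') {
    global $EXAMPLE_DB;
    $rows = array();
    foreach ($EXAMPLE_DB[$table] as $id => $row) {
        if ($field === '' || $row->$field == $value) {
            $rows[$id] = $row;
        }
    }
    return $rows ? $rows : false;
}

// Minimal stand-in for the module's SearchDocument subclass.
class ExampleSearchDocument {
    public $entry;
    public $example_id;
    public $course_id;

    public function __construct($entry, $example_id, $course_id) {
        $this->entry      = $entry;
        $this->example_id = $example_id;
        $this->course_id  = $course_id;
    }
}

function example_iterator() {
    return fake_get_records('example');
}

function example_get_content_for_index(&$example) {
    $documents = array();

    // invalid plugin instance
    if (!$example) return $documents;

    // the indexable items of this instance: its entries
    $entries = fake_get_records('example_entries', 'exampleid', $example->id);
    if (!$entries) return $documents;

    foreach ($entries as $entry) {
        $documents[] = new ExampleSearchDocument(get_object_vars($entry), $example->id, $example->course);
    }
    return $documents;
}

// Assumed shape of the first indexing pass over all instances.
$all = array();
$instances = example_iterator();
if ($instances) {
    foreach ($instances as $instance) {
        $all = array_merge($all, example_get_content_for_index($instance));
    }
}
echo count($all), "\n"; // prints 2
```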

Making The Backaccess Link

The constructor of the SearchDocument subclass must build a backaccess link for the document and set it as the 'url' attribute of the first constructor parameter (&$doc). This is usually done through a callback in the document API. The synopsis is :

function <module>_make_link(...contextual params...) {
    global $CFG;
    
    return $CFG->wwwroot.<moodle path expression that drives back to the content>;
} //<module>_make_link

Contextual params are usually course module ids or ids of internal entities, depending on how the module is built, and possibly mode values.
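As an illustration, a link callback for a hypothetical example module might take the instance id and an entry id as its contextual params. The view.php?e=...&entry=... URL scheme is an assumption made up for this sketch, and $CFG is a minimal stand-in for Moodle's global configuration object.

```php
<?php
// Minimal stand-in for Moodle's global configuration object.
$CFG = new stdClass();
$CFG->wwwroot = 'http://moodle.example';

// Hypothetical link builder for module "example": the URL scheme below is
// assumed for illustration. Ids are cast to int before being put in the URL.
function example_make_link($example_id, $entry_id) {
    global $CFG;
    return $CFG->wwwroot . '/mod/example/view.php?e=' . (int) $example_id . '&entry=' . (int) $entry_id;
}

echo example_make_link(2, 7), "\n";
// prints http://moodle.example/mod/example/view.php?e=2&entry=7
```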

Updating The index Database

Once the search engine has been fed with its first indexing results from scratch, a cron job updates it regularly. The index is updated in diff mode, so only entries modified in the Moodle data model need to be considered.

The update process will :

  • add new items, as long as their type is handled by the search engine
  • update modified items
  • drop indexes on deleted resources and contents
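The three actions can be pictured as a comparison between the database and the index. The sketch below is hypothetical and not taken from Moodle: classify_changes() assumes each record carries id, created and modified values (Unix timestamps), that the ids currently indexed for one item type are known, and that an item created after the previous run is new while one only modified after it needs updating; indexed ids with no matching record are dropped.

```php
<?php
// Hypothetical sketch of the diff-mode decision for one item type.
// $rows: records from the module table, each with id/created/modified keys.
// $indexedids: ids of this item type currently held in the index.
// $lastrun: Unix time of the previous indexing run.
function classify_changes(array $rows, array $indexedids, $lastrun) {
    $changes = array('add' => array(), 'update' => array(), 'delete' => array());
    $seen = array();
    foreach ($rows as $row) {
        $seen[$row['id']] = true;
        if ($row['created'] > $lastrun) {
            $changes['add'][] = $row['id'];       // new since the last run
        } elseif ($row['modified'] > $lastrun) {
            $changes['update'][] = $row['id'];    // existed, changed since
        }
    }
    foreach ($indexedids as $id) {
        if (!isset($seen[$id])) {
            $changes['delete'][] = $id;           // gone from the database
        }
    }
    return $changes;
}

$rows = array(
    array('id' => 1, 'created' => 100, 'modified' => 100),
    array('id' => 2, 'created' => 100, 'modified' => 300),
    array('id' => 4, 'created' => 250, 'modified' => 250),
);
$changes = classify_changes($rows, array(1, 2, 3), 200);
echo implode(',', $changes['add']), ' | ', implode(',', $changes['update']), ' | ',
     implode(',', $changes['delete']), "\n"; // prints 4 | 2 | 3
```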

An API callback tells each of these three actions where to find the items to consider in the Moodle database. This callback SHOULD return an array of arrays of strings, each containing the following fieldset :

  • primary id fieldname,
  • table name,
  • time created field name,
  • time modified field name,
  • itemtype,
  • [additional SELECT clause for filtering rows] // optional

Here is a synopsis of such a callback. It should return one array per document subtype (itemtype) known to the module.

function <module>_db_names() {
    return array(
        array('id', '<module>_<entity>', 'created', 'modified', '<itemtype>', '')
    );
} //<module>_db_names

Note : the 'created' and 'modified' fields are named differently across modules and blocks, and sometimes one or both of them are not available. In that case, consider using the dates of a parent record the item depends on.
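A filled-in version for a hypothetical example module with two item types might look like this. The table names, the 'approved = 1' filter and the new_items_sql() helper are invented for illustration; in particular, treating the sixth element as a bare condition appended with AND, and the mdl_ table prefix, are assumptions about how the cron job consumes these descriptors, not a description of Moodle's actual update script.

```php
<?php
// Hypothetical descriptors for a module "example" with two item types.
// Fields: id field, table, created field, modified field, itemtype, extra filter.
function example_db_names() {
    return array(
        array('id', 'example_entries', 'timecreated', 'timemodified', 'entry', ''),
        array('id', 'example_comments', 'timecreated', 'timemodified', 'comment', 'approved = 1'),
    );
}

// Hypothetical helper: builds the query selecting items created after
// $lastrun for one descriptor. The 'mdl_' prefix is an assumption.
function new_items_sql(array $descriptor, $lastrun, $prefix = 'mdl_') {
    list($idfield, $table, $createdfield, , , $extra) = $descriptor;
    $sql = "SELECT $idfield FROM {$prefix}{$table} WHERE $createdfield > " . (int) $lastrun;
    if ($extra !== '') {
        $sql .= " AND ($extra)";   // optional extra filter from the descriptor
    }
    return $sql;
}

foreach (example_db_names() as $descriptor) {
    echo new_items_sql($descriptor, 1195392420), "\n";
}
```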

Once the engine knows where the items to add, update or delete are, the first two operations process each item individually through a "Single Document Wrapper". This is the purpose of the

function <module>_single_document($id, $itemtype) { ... }

function. Here is a rough prototype for such a callback :

function <module>_single_document($id, $itemtype) {

    switch($itemtype){
        case <type1>:
           ... get content holding record ...
           ... get module_obj from previous ...
        break;
        case <type2>:
           ... get content holding record ...
           ... get module_obj from previous ...
        break;
        ... and so on ...
    }

    // id of the module type in mdl_modules (not a course module id)
    $moduleid = get_field('modules', 'id', 'name', '<module>');
    $cm = get_record('course_modules', 'course', $<module_obj>->course, 'module', $moduleid, 'instance', $<module_obj>->id);
    if ($cm){
        $context = get_context_instance(CONTEXT_MODULE, $cm->id);
        ... prepare some data if needed ...
        return new <ModuleItem>SearchDocument(get_object_vars($<content_obj>), $<module_obj>->id, $<module_obj>->course, $itemtype, $context->id);
    }
    return null;
} // <module>_single_document

Note : get_object_vars() is needed because, historically, the constructors take an associative array (hash) rather than an object.
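For a hypothetical example module with a single 'entry' item type, the prototype could be filled in as below. Every Moodle call is replaced by an invented stand-in: fake_get_record() plays the part of get_record() and get_field() over the in-memory $TABLES array, and fake_module_context_id() stands in for get_context_instance(CONTEXT_MODULE, ...)->id with made-up context ids. Unknown item types and missing records return null, as the prototype does when no course module is found.

```php
<?php
// Invented in-memory tables standing in for the Moodle database.
$TABLES = array(
    'modules'         => array(9 => (object) array('id' => 9, 'name' => 'example')),
    'example'         => array(2 => (object) array('id' => 2, 'course' => 5)),
    'example_entries' => array(7 => (object) array('id' => 7, 'exampleid' => 2, 'name' => 'Hello')),
    'course_modules'  => array(30 => (object) array('id' => 30, 'course' => 5, 'module' => 9, 'instance' => 2)),
);

// Stand-in for get_record(): first row whose fields match all conditions.
function fake_get_record($table, array $conditions) {
    global $TABLES;
    foreach ($TABLES[$table] as $row) {
        $match = true;
        foreach ($conditions as $field => $value) {
            if ($row->$field != $value) {
                $match = false;
                break;
            }
        }
        if ($match) return $row;
    }
    return false;
}

// Stand-in for get_context_instance(CONTEXT_MODULE, $cmid)->id (fake ids).
function fake_module_context_id($cmid) {
    return 1000 + $cmid;
}

// Minimal stand-in for the module's SearchDocument subclass.
class ExampleSearchDocument {
    public $entry, $example_id, $course_id, $itemtype, $context_id;

    public function __construct($entry, $example_id, $course_id, $itemtype, $context_id) {
        $this->entry      = $entry;
        $this->example_id = $example_id;
        $this->course_id  = $course_id;
        $this->itemtype   = $itemtype;
        $this->context_id = $context_id;
    }
}

function example_single_document($id, $itemtype) {
    switch ($itemtype) {
        case 'entry':
            $entry = fake_get_record('example_entries', array('id' => $id));
            if (!$entry) return null;
            $example = fake_get_record('example', array('id' => $entry->exampleid));
            break;
        default:
            return null; // unknown item type
    }
    if (!$example) return null;

    $module = fake_get_record('modules', array('name' => 'example'));
    if (!$module) return null;
    $cm = fake_get_record('course_modules', array('course' => $example->course,
        'module' => $module->id, 'instance' => $example->id));
    if ($cm) {
        $contextid = fake_module_context_id($cm->id);
        return new ExampleSearchDocument(get_object_vars($entry), $example->id, $example->course, $itemtype, $contextid);
    }
    return null;
}

$document = example_single_document(7, 'entry');
echo $document->context_id, "\n"; // prints 1030
```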

Checking Access Back To The Content

Physical Document Adapters