Course search

From MoodleDocs

Note: This article is a work in progress. Please use the page comments or an appropriate moodle.org forum for any recommendations/suggestions for improvement.

Course search
Project state: Community bonding period
Tracker issue: CONTRIB-4335
Discussion: Writing Moodle's Course Search plugin

Course Search plugin Progress

Assignee: Shashikant Vaishnav

GSoC '13


Introduction

The objective of this Google Summer of Code project is to implement an advanced course search that is flexible, case-insensitive and fast, and that can sort results by relevance. Most importantly, it should work consistently across different database engines and content languages. The deliverable is a functional advanced search plugin for Moodle that can be installed and configured to replace the basic core search. The project will integrate a third-party search engine API with the Moodle course schema.

Problem Description

Non-Latin support in search queries: Database engines do not recognize word boundaries in non-English languages and cannot perform case-insensitive search on them.


Indexing: We do not want to reinvent the wheel. Many mature, open-source, enterprise-level search indexers are available that can make course search fast and efficient.


Sorting by relevance: Results need to be sorted by relevance. For example, a match in the course name is more relevant than a match in the course summary. A spelling correction feature ("Did you mean?") and fuzzy search (matching alternate forms of words) also need to be implemented.


Work consistently on different database engines and content languages: Course search should work with as many database engines as possible, and it should handle content in different languages.
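To make the case-insensitivity part of the problem concrete, here is a minimal Python sketch of the normalization a search engine's analysis chain typically applies before comparing terms. It uses only the standard library and is not the plugin's code. Word segmentation for languages written without spaces is a separate tokenization step and is not shown.

```python
import unicodedata


def normalize_term(text: str) -> str:
    """Fold text for case-insensitive, Unicode-aware comparison.

    NFKC normalization unifies compatibility forms (for example full-width
    Latin letters), and str.casefold() applies full Unicode case folding,
    which is stricter than lower() (German "ß" folds to "ss").
    """
    return unicodedata.normalize("NFKC", text).casefold()


def contains_term(text: str, query: str) -> bool:
    """Case-insensitive containment check on the normalized forms."""
    return normalize_term(query) in normalize_term(text)


print(normalize_term("Straße") == normalize_term("STRASSE"))  # True
print(contains_term("ΚΑΛΗΜΕΡΑ Moodle", "καλημερα"))          # True
print(normalize_term("Ｍｏｏｄｌｅ"))                           # moodle
```

A plain `LOWER(...) LIKE` comparison in SQL depends on the database's collation. Doing this folding once, consistently, in the indexer is part of what makes a dedicated search engine attractive here.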



Features

  • Works consistently across different DB engines and content languages (including non-Latin languages)


  • Search results ranked by relevance (score)


  • Case-insensitive search


  • Searchable document formats:


 * HyperText Markup Language (HTML)
 * XML and derived formats
 * Microsoft Office document formats
 * Open Document Format
 * Portable Document Format 
 * Rich Text Format
 * Compression and packaging formats [See Rebuilding Solr Cell]
 * Text formats


  • Autocomplete (auto-suggest, including for non-Latin languages)


  • Much faster than the existing course search implementation


  • Spell checking ("Did you mean?")


  • Keyword matching (searching within a specific field)


   Example: summary:"Getting started with python" searches for the phrase "Getting started with python" in the summary field of courses.


  • Proximity search
   Example: "moodle perth"~4 searches for "moodle" and "perth" within 4 words of each other.


  • Wildcard Search


  • Fuzzy search support (alternate forms of words)


  • Filtering results by startdate (Range queries)


  • Pagination, and sorting of results by relevance, startdate or shortname
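Most of the features above correspond to standard Lucene/Solr query syntax. The following minimal Python sketch builds those query strings and parameters as plain text. It is illustrative only and is not the plugin's query builder. Two assumptions apply. First, startdate is indexed as a Unix timestamp, as in Moodle's course table. Second, sorting by shortname would need an untokenized (string) copy of that field.

```python
def _quote(phrase: str) -> str:
    """Wrap a phrase in double quotes, escaping backslashes and quotes."""
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def field_query(field: str, phrase: str) -> str:
    """Keyword matching: search for a phrase within one field."""
    return f"{field}:{_quote(phrase)}"


def proximity_query(phrase: str, max_distance: int) -> str:
    """Proximity search: the words may be up to max_distance positions apart."""
    return f"{_quote(phrase)}~{max_distance}"


def wildcard_query(prefix: str) -> str:
    """Wildcard search: terms starting with prefix."""
    return f"{prefix}*"


def fuzzy_query(term: str, max_edits: int = 2) -> str:
    """Fuzzy search: terms within max_edits edits (Lucene 4 allows 0-2)."""
    if not 0 <= max_edits <= 2:
        raise ValueError("max_edits must be between 0 and 2")
    return f"{term}~{max_edits}"


def range_query(field: str, low: str = "*", high: str = "*") -> str:
    """Range query; '*' leaves a bound open."""
    return f"{field}:[{low} TO {high}]"


def paging_params(page: int, per_page: int, sort: str = "score desc") -> dict:
    """Solr pagination (start/rows) and sort parameters for a 0-based page."""
    return {"start": page * per_page, "rows": per_page, "sort": sort}


print(field_query("summary", "Getting started with python"))
print(proximity_query("moodle perth", 4))
# Courses starting on or after 2013-01-01 00:00 UTC (assumes a Unix timestamp field).
print(range_query("startdate", "1356998400"))
print(paging_params(page=1, per_page=10, sort="startdate desc"))
```

The quoting matters for keyword matching. Without quotes, summary:Getting started with python would restrict only the first word to the summary field.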

Implementation

Two candidates were considered for this purpose: Apache Solr and Sphinx. Both have extensive documentation, community support and very powerful indexing capabilities. Apache Solr was chosen because faceted search and fuzzy search work out of the box, together with the other key points explained below.

Why Apache Solr

  • Courses contain different types of summary documents, and Solr can index proprietary formats such as Microsoft Word and PDF. Sphinx cannot.
  • Solr supports field collapsing to avoid duplicating similar results. Sphinx does not appear to provide a comparable feature.
  • Non-Latin character support was a concern, and Solr supports Unicode search out of the box.
  • Solr, being an Apache project, is licensed under the Apache License 2.0.
The real challenge, then, is integrating Apache Solr with Moodle's course search functionality. Advanced course search will be installed as a plugin that replaces the basic core search.


Project Roadmap

  • Preparing the course search schema for Apache Solr: Solr reads the definition of what to index from an XML configuration file, schema.xml. The advanced search plugin ships its own schema file, which replaces the corresponding file on the Solr server, usually found in the Solr conf directory (solr/conf). We will design the Moodle course search schema.xml based on which fields we want to index and how. Solr's available field types and filter options make this very flexible. A similar problem has already been solved by the Drupal community, and I am familiar with that work, so it can serve as a reference. The searchable course fields are idnumber, shortname, fullname and summary, and the schema design will start with these fields.
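As a rough illustration of what the schema work produces, the Python sketch below renders `<field>` declarations for the searchable course fields. The field types ("string", "text_general", "long") exist in the Solr 4 example schema. Their assignment here, and the extra id and startdate fields, are assumptions made for this sketch; the plugin's shipped schema.xml may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical field list: the four searchable course fields named above,
# plus a unique "id" and a "startdate" for range filtering and sorting.
COURSE_FIELDS = [
    ("id", "string"),
    ("idnumber", "string"),
    ("shortname", "text_general"),
    ("fullname", "text_general"),
    ("summary", "text_general"),
    ("startdate", "long"),
]


def schema_fields_xml(fields) -> str:
    """Render <field> declarations for the <fields> section of schema.xml."""
    root = ET.Element("fields")
    for name, field_type in fields:
        attrs = {"name": name, "type": field_type, "indexed": "true", "stored": "true"}
        if name == "id":
            attrs["required"] = "true"  # used as the uniqueKey in this sketch
        ET.SubElement(root, "field", attrs)
    return ET.tostring(root, encoding="unicode")


print(schema_fields_xml(COURSE_FIELDS))
```

A "text_general" field is tokenized and lower-cased by its analyzer, so it suits free-text search. A "string" field is matched exactly, which fits identifiers.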


  • Extending the Apache Solr PHP integration library to work with Moodle: The Solr PHP Client library handles communication between Moodle and Solr. It needs to be extended to work with the Moodle course search plugin.


  • Indexing the course data: The entry point for indexing is the cron task. Whenever cron runs, an associated callback triggers the indexing procedure. The plugin looks for courses that are marked for indexing. For each course, an object containing all the field contents to be indexed is converted individually into an XML update message and sent to the Solr server for indexing. We can also fire a custom event to notify other components that an indexing operation has just been performed. The number of courses indexed in one run can be an admin setting. On the admin side there will also be a way to mark a course for reindexing.
  • Reindexing needs to be performed (picked up by cron) when:
         - a new course is created;
         - an existing course is updated or deleted;
         - a course is marked for reindexing by the admin.
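The following Python sketch illustrates the indexing step described above. Each marked course becomes one Solr XML update message (`<add><doc>...</doc></add>`), and a deleted course becomes a `<delete><id>...</id></delete>` message. The `Course` record, `index_marked_courses` and its `batch_size` parameter (standing in for the admin setting) are hypothetical. Sending the messages is not shown: they would be POSTed to Solr's /update handler.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Course:
    """Hypothetical course record holding the columns to index."""
    id: int
    idnumber: str
    shortname: str
    fullname: str
    summary: str
    startdate: int  # Unix timestamp, as stored in Moodle's course table


def add_message(course: Course) -> str:
    """Convert one course into a Solr XML <add> update message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for f in fields(course):
        ET.SubElement(doc, "field", {"name": f.name}).text = str(getattr(course, f.name))
    return ET.tostring(add, encoding="unicode")


def delete_message(course_id: int) -> str:
    """Solr XML message that removes a deleted course from the index."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(course_id)
    return ET.tostring(delete, encoding="unicode")


def index_marked_courses(marked: list, batch_size: int) -> list:
    """One cron run: build messages for at most batch_size marked courses."""
    return [add_message(course) for course in marked[:batch_size]]
```

ElementTree escapes markup in the summary (for example `<p>` becomes `&lt;p&gt;`), so HTML summaries produce well-formed update messages.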


  • Plugin search API functions: The internal search API functions will have these objectives:

1. Get the basic search query from the form submission.

2. Translate this search query into an appropriate Apache Solr query, adding parameters for any extra search options.

3. Trigger the search by sending this query to the locally running Apache Solr instance.

4. Get the search response and render the search results via the appropriate theme function.

Because advanced search is implemented as a plugin, once installed it takes over responsibility for the search system. When the advanced search plugin is active, Moodle calls the plugin's internal search functions, which talk to Apache Solr in the back end. These functions are responsible for generating the required Apache Solr query and communicating with the running Solr instance.
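Steps 1 to 4 above can be sketched in Python as a query-to-URL translation plus a response parser. The core URL http://localhost:8983/solr/collection1 is an assumption: it combines the host, port and path settings from the installation section with the example core name. `build_select_url` and `parse_response` are hypothetical helpers, not the plugin's API. The `/select` handler and the `q`, `wt`, `fl`, `start` and `rows` parameters are standard Solr.

```python
import json
from urllib.parse import urlencode


def build_select_url(core_url: str, user_query: str, page: int = 0, per_page: int = 10) -> str:
    """Steps 1-2: turn the submitted query into a Solr /select request URL."""
    params = {
        "q": user_query,
        "wt": "json",  # ask Solr for a JSON response
        "fl": "id,shortname,fullname,score",  # fields to return, plus the relevance score
        "start": page * per_page,
        "rows": per_page,
    }
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


def parse_response(body: str):
    """Step 4: extract the hit count and documents from Solr's JSON response."""
    response = json.loads(body)["response"]
    return response["numFound"], response["docs"]


# Step 3 would send the request, e.g. with urllib.request.urlopen(url).
url = build_select_url("http://localhost:8983/solr/collection1", "moodle perth", page=2)
print(url)
```

Rendering the documents through a theme renderer (step 4) happens on the PHP side and is not modelled here.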

On caching: when retrieving the list of courses that match a search string, Apache Solr can cache repeated search queries. It has built-in in-memory caches, including one for search results (queryResultCache) and one for documents (documentCache). These are typically limited to several thousand entries each.

Finally, we prepare the list of courses to show the user. Hidden courses the user cannot see are filtered out, and additional information such as category name and course contacts is retrieved.
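A minimal Python sketch of this final step is shown below. The `Hit` record, `prepare_results` and the `can_view_hidden` callback are hypothetical. In Moodle the callback would correspond to a capability check such as moodle/course:viewhiddencourses in the course context.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical search hit, already joined with the course's visibility."""
    course_id: int
    fullname: str
    category_id: int
    visible: bool


def prepare_results(hits, can_view_hidden, category_names):
    """Drop hidden courses the user may not see, and attach category names."""
    results = []
    for hit in hits:
        if not hit.visible and not can_view_hidden(hit.course_id):
            continue  # hidden course the user is not allowed to see
        results.append({
            "id": hit.course_id,
            "fullname": hit.fullname,
            "category": category_names.get(hit.category_id, ""),
        })
    return results
```

One design caveat: filtering after Solr has returned a page can leave that page short. An alternative is to index course visibility and exclude hidden courses with a Solr filter query (`fq`) for users who lack the capability.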


  • Achieving relevancy with Solr's faceted search: Results can be made more relevant through dynamic filtering, for example by start date (by day, month, year, etc.), and through spelling correction ("Did you mean?"). Solr supports both out of the box.

Relevancy is controlled by applying a boost to the fields that should carry more weight. The internal search API functions will add these boosts to the search fields as needed when generating the Apache Solr query. From the Solr docs I found these key points for achieving relevant results.[2]

By default, Apache Solr sorts search results by relevancy, which depends in part on the boost given to particular fields. If a field should be more relevant than others by default, this can be configured in Apache Solr by altering the schema file.
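Query-time boosting can be sketched with Solr's eDisMax query parser. It searches every field listed in `qf` and weights each field's match by its `^boost`. The boost values below are hypothetical and chosen so that a fullname match outranks a summary match, as the problem description requires. `boosted_params` is an illustrative helper, not the plugin's function.

```python
# Hypothetical boost values: a match in fullname outweighs one in
# shortname, which outweighs one in idnumber or summary.
FIELD_BOOSTS = {"fullname": 5.0, "shortname": 3.0, "idnumber": 2.0, "summary": 1.0}


def boosted_params(user_query: str, boosts: dict = FIELD_BOOSTS) -> dict:
    """Build Solr request parameters that apply per-field boosts at query time."""
    qf = " ".join(f"{field}^{boost:g}" for field, boost in boosts.items())
    return {"q": user_query, "defType": "edismax", "qf": qf, "sort": "score desc"}


print(boosted_params("python")["qf"])  # fullname^5 shortname^3 idnumber^2 summary^1
```

Keeping boosts in request parameters lets them be tuned without reindexing. The same defaults could instead live in the request handler configuration on the Solr side.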


  • Solr with alternate forms of words (fuzzy search): In Apache Solr, text fields are typically indexed by breaking the text into words and applying various transformations, such as lower-casing, removing plurals, or stemming, to increase relevancy. The same transformations are normally applied to queries so that they match what is indexed. So a search for "fly" can also match "flies". Irregular forms such as "flew" are not reduced by typical stemmers and would need a synonym list.
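Stemming is one mechanism for alternate word forms. Solr's fuzzy operator (`term~`) is a different one: it matches indexed terms within a small edit distance of the query term. The Python sketch below uses plain Levenshtein distance (insertions, deletions, substitutions) to illustrate the idea. Lucene's own implementation also caps fuzzy queries at 2 edits and can count a transposition as a single edit, so its distances can be smaller than the ones computed here. `edit_distance` and `fuzzy_matches` are illustrative helpers.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(term: str, vocabulary, max_edits: int = 2) -> list:
    """Vocabulary words within max_edits of term, compared case-insensitively."""
    folded = term.casefold()
    return [w for w in vocabulary if edit_distance(folded, w.casefold()) <= max_edits]


print(fuzzy_matches("moodel", ["moodle", "model", "noodle", "python"]))  # ['moodle', 'model']
```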


  • Solr with intra-word delimiters: The WordDelimiterFilter can be used in the analyzer of the queried field to match words containing intra-word delimiters such as dashes or case changes. Apache Solr provides many filters for text field types to configure this, and a properly configured schema.xml is the key.
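To show what splitting on intra-word delimiters means, here is a rough Python approximation of the word and number parts WordDelimiterFilter generates. It is a simplified, ASCII-only regular expression and not Solr's implementation. The real filter has many more options, such as catenating parts and preserving the original token.

```python
import re

# ASCII-only approximation of WordDelimiterFilter's word/number parts:
# an acronym run of capitals (not followed by lowercase), a word with an
# optional leading capital, or a run of digits. Other characters, such as
# dashes, underscores and dots, act as delimiters and are dropped.
_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_intra_word(token: str) -> list:
    return _PART.findall(token)


for token in ["Wi-Fi", "PowerShot", "SD500", "XMLParser"]:
    print(token, split_intra_word(token))
```

Indexing these parts lets a query for "shot" or "500" match a course titled "PowerShot SD500".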

Installation

Prerequisite

Java 6 or higher (required by Solr 4.x), PHP 5.3.3 or higher (required by Moodle 2.5), Moodle 2.5 or higher.

Installing admin tool

1. Download the admin tool from here: https://moodle.org/plugins/view.php?plugin=tool_coursesearch

2. Extract the Course Search folder and put it in the admin/tool directory of your Moodle installation.

3. The folder should be named coursesearch. If you are already logged in, refreshing the browser should make your Moodle site start the 'Plugins check' installation step.

4. If not, navigate to Administration > Notifications.

Installing search_cleantheme OR replacing/copying the renderer.php file

There are two options. You can either replace (or copy) your existing theme's renderer with the one found in search_cleantheme (https://github.com/shashirepo/moodle-theme_cleantheme), or you can install cleantheme itself.

Replacing/copying the renderer file to your theme

1. Copy the renderer.php file from search_cleantheme and use it to replace your theme's renderer file.

2. The standard Moodle theme does not have a renderer file, so you can simply copy this one into your theme directory.

3. Rename the renderer classes according to your theme name.

For example, if you are using the 'clean' theme, rename the classes to 'theme_clean_core_renderer' and 'theme_clean_core_course_renderer'.

Alternatively, use search_cleantheme, which is based on the Bootstrap Clean theme.

1. Download cleantheme from here: https://github.com/shashirepo/moodle-theme_cleantheme

2. Extract the theme folder and put it in the theme directory of your Moodle installation.

3. If you are already logged in, refreshing the browser should make your Moodle site start the 'Plugins check' installation step.

4. If not, navigate to Administration > Notifications.

Installing Solr & placing the plugin Schema

Download the Solr 4.4.0 release from http://lucene.apache.org/solr/

Unpack the tarball somewhere not visible to the web (not in your apache docroot and not inside of your moodle directory).

The Solr download comes with an example application that you can use for testing, development, and even for smaller production sites. This application is found at apache-solr-4.4.0/example.

Rename apache-solr-4.4.0/example/solr/collection1/conf/schema.xml to something like schema.bak. Then move the schema.xml that comes with the Moodle course search admin tool plugin into its place.

Similarly, rename apache-solr-4.4.0/example/solr/collection1/conf/solrconfig.xml to something like solrconfig.bak. Then move the solrconfig.xml that comes with the Moodle course search admin tool plugin into its place.

Finally, rename apache-solr-4.4.0/example/solr/collection1/conf/protwords.txt to something like protwords.bak. Then move the protwords.txt that comes with the Moodle course search admin tool plugin into its place.

Make sure that the conf directory includes the following files (the Solr core may not load unless at least an empty file is present): solrconfig.xml, schema.xml, elevate.xml, mapping-ISOLatin1Accent.txt, protwords.txt, stopwords.txt, synonyms.txt.

Now start Solr by opening a shell, changing to the apache-solr-4.4.0/example directory, and running the command java -jar start.jar

Test that your Solr server is now available by visiting http://localhost:8983/solr/admin/
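The configuration-file steps above can be scripted. The Python sketch below renames each file Solr ships to a .bak backup, copies in the plugin's version, and reports which required files are still missing. It can optionally create them as empty placeholders. The location of the plugin's Solr files (`plugin_conf`) is a hypothetical placeholder, because it is not given here.

```python
import shutil
from pathlib import Path

# Files replaced by the plugin's copies, per the steps above.
REPLACED = ["schema.xml", "solrconfig.xml", "protwords.txt"]
# Files that must exist (even if empty) for the core to load.
REQUIRED = ["solrconfig.xml", "schema.xml", "elevate.xml", "mapping-ISOLatin1Accent.txt",
            "protwords.txt", "stopwords.txt", "synonyms.txt"]


def backup_path(path: Path) -> Path:
    """schema.xml -> schema.bak, matching the naming used above."""
    return path.with_suffix(".bak")


def missing_conf_files(present_names) -> list:
    """Return the required file names not present in the conf directory."""
    present = set(present_names)
    return [name for name in REQUIRED if name not in present]


def install_plugin_conf(solr_conf: Path, plugin_conf: Path, create_empty: bool = False) -> list:
    """Back up Solr's files, copy in the plugin's versions, and report gaps."""
    for name in REPLACED:
        target = solr_conf / name
        if target.exists():
            target.replace(backup_path(target))  # overwrites an older .bak
        shutil.copy2(plugin_conf / name, target)
    missing = missing_conf_files(p.name for p in solr_conf.iterdir())
    if create_empty:
        for name in missing:
            (solr_conf / name).touch()  # empty placeholder so the core can load
        missing = []
    return missing

# Example (plugin path is hypothetical):
# install_plugin_conf(Path("apache-solr-4.4.0/example/solr/collection1/conf"),
#                     Path("/path/to/coursesearch/solr-conf"))
```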


Testing with a ping to Solr

1. The advanced course search settings can be found under:

Administration > Course > Course search settings (URL: http://127.0.0.1/MoodleInstallationURL/admin/tool/coursesearch)

2. Enter the Solr configuration options:

Solr host: localhost or 127.0.0.1
Solr port: 8983 (the default Solr port)
Solr path: /solr (the URL path under which Solr is served)

3. Click "Check Solr instance Setting". If it reports a successful ping (with a success image), click "Save changes".

4. Click "Index courses" to index all the courses. After successful indexing, a success image is shown.

5. Click "Optimize" to optimize the existing indexes and improve Solr performance.
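For troubleshooting, the Python sketch below shows requests comparable to the "Check Solr instance Setting" and "Optimize" buttons. It combines the host, port and path settings with the example core name collection1. The plugin's actual requests are not documented here, so treat these URLs as an assumption. `/admin/ping` is defined in the Solr 4 example solrconfig.xml, and `/update?optimize=true` is a standard Solr update request. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def solr_core_url(host: str, port: int, path: str, core: str = "collection1") -> str:
    """Combine the admin settings (host, port, path) with the example core name."""
    return f"http://{host}:{port}/{path.strip('/')}/{core}"


def ping_url(core_url: str) -> str:
    return f"{core_url}/admin/ping?{urlencode({'wt': 'json'})}"


def optimize_url(core_url: str) -> str:
    return f"{core_url}/update?{urlencode({'optimize': 'true'})}"


def ping_ok(body: str) -> bool:
    """A healthy core answers the ping handler with "status": "OK"."""
    return json.loads(body).get("status") == "OK"


core = solr_core_url("localhost", 8983, "/solr")
print(ping_url(core))      # http://localhost:8983/solr/collection1/admin/ping?wt=json
print(optimize_url(core))  # http://localhost:8983/solr/collection1/update?optimize=true
```

Opening the ping URL in a browser is a quick way to tell whether a failed check comes from Solr itself or from the Moodle settings.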


You can now use the search at http://127.0.0.1/MoodleInstallationURL/course/search.php

To add navigation to the course search page, go to Site administration > Front page > Front page settings.

Design

Course search plugin screenshots:

* Course search admin tool (image: Course search Adminui.png)

* Course search results (image: Course Search IntraWordQuery.png)

* Advanced course search auto-suggestions (image: autocomplete.png)

Schedule

May 28 - June 17 (Community Bonding Period)


➢ Discuss further ideas with the mentor.

➢ Finalize the list of tasks to be implemented under this project.

➢ Read and study the Moodle documentation and its database schema.

➢ Study plugin development in Moodle.

➢ Study the Apache Solr documentation and how Solr works.

➢ Set up the development and testing environment.


June 18 - July 10 (Interim Period)


➢ Start coding!

➢ Prepare the Moodle course search schema.xml configuration.

➢ Create Moodle plugins (admin tool and theme) that allow core search to be replaced with an alternative solution. At first they will duplicate the existing functionality, but they will be ready for modification without changing core.

➢ Create a function for manually reindexing all courses, and add a button on the plugin page for admins to trigger it.

➢ Implement search in the plugin to make sure it works with simple result output (no filtering, no pagination, no retrieval of additional information, etc.).

➢ Document the work.


July 11 - August 2 (Interim Period)


➢ Display courses properly, filtered, with all additional information.

➢ Add caching of search results so that results are not retrieved again when the user requests the second page.

➢ Prepare the search plugin for distribution: create install/upgrade/uninstall scripts and write a README on how to install and configure Solr.


Mid Term Evaluation

Submit code of completed tasks along with documentation.


August 3 - August 25 (Interim Period)


➢ Listen to events and mark courses for reindexing.

➢ Implement the cron function that performs reindexing.

➢ Add configurable settings to the plugin (when/whether to run automatic reindexing).

➢ Document the work.

➢ Debug the code and reduce code complexity.


August 26 - Sep 16 (Interim Period)


➢ Filter results by dates and tags.

➢ Spell correction capability.

➢ Fuzzy search support.

➢ Implement extra functionality (if any other feature required and time permits).


Sep 17 - Sep 23 (Pencils Down)


➢ Testing, documentation & debugging.

➢ Final Release

Requirements

Requires Moodle 2.5 and Apache Solr 4.x.

Credits

Tracker

See also
