Global search (GSoC2013)

Note: This article is a work in progress. Please use the page comments or an appropriate moodle.org forum for any recommendations/suggestions for improvement.

Global search
Project state: Coding period
Tracker issue: MDL-31989
Discussion: Writing Moodle's Global Search
Assignee: Prateek Sachan

GSOC '13

Introduction

Global Search will allow keyword searches across the entire Moodle site and all its modules, with full-text advanced search capabilities, while keeping security intact.

  • It will display results based on relevance weightage.
  • Security will be preserved throughout the search.
  • Search modules will make it easy to integrate the chosen search engine. Admins will have the option of selecting the modules that can be made "searchable".
  • It will include keywords from other file types (like PDFs, PPTs, HTML content and others).
  • Following are the features that I'm considering implementing in the first version of Global Search:
  1. Grouping of boolean operators: AND, OR, NOT. E.g.: ("query1" AND "query2") OR ("query3" NOT "query4")
  2. Highlighting: All matched keywords will be highlighted. For long content, only a short excerpt will be displayed along with the highlighted keyword.
  3. Searching for phrases. Results that match the whole phrase will have higher priority and hence will be shown higher in the results.
  4. Wildcard (*) (?) feature.
  5. Stemming. E.g.: bag will return results for both bag and bags.
  6. Proximity searches:
  • "mood"~2 returns "moodle" (the match is 2 characters away from the searched term).
  • "moodle australia"~3 returns results containing "moodle hq at perth australia" (the queried terms are within 3 words of each other).
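
To make the two meanings of ~ in the proximity examples concrete, here is a minimal Python sketch. It is not how Solr implements these operators: edit_distance and within_words are hypothetical helper names, and the word-distance check is a simplification that handles only two-term phrases.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: number of single-character insertions,
    deletions or substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def within_words(text: str, first: str, second: str, slop: int) -> bool:
    """True if `first` and `second` occur in `text` with at most
    `slop` other words between them (simplified two-term check)."""
    words = text.lower().split()
    first_pos = [i for i, w in enumerate(words) if w == first]
    second_pos = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) - 1 <= slop for i in first_pos for j in second_pos)
```

With these helpers, edit_distance("mood", "moodle") is 2, and within_words("moodle hq at perth australia", "moodle", "australia", 3) holds, while the same call with a slop of 2 does not.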

Requirements

  • Moodle 2.5 and above.
  • PHP Solr extension.

Design

Adding a Global Search Block

Add Global Search Block.png

Search through Global Search Block

Global Search Block.png

Global Search Results UI

Global Search Results UI.png

Global Search Admin Solr settings

Global Search Admin Solr Settings.png

Global Search Admin Modules Activation

Global Search Admin Module Activation.png

Global Search Admin Indexing Statistics

Global Search Admin Indexing Statistics.png

Admin Controls

These controls appear under Site Administration > Global Search (see the screenshots above).

Search Engine

This will enable you to choose your preferred Search Engine. Currently, Global Search only supports Apache Solr.

Solr Settings

This section lists the connection settings that you'll need to configure for your Solr server.

Activated Modules

This section lists the modules whose content Global Search will index and search. You may activate or deactivate individual modules.

Indexing Statistics

  • This lists the indexing statistics of Global Search.
  • It also lets you delete the index for specific modules or delete the entire index in one go. (Deleting "Entire Index" deletes the entire Solr index irrespective of whether a module is activated or deactivated.)
  • After a deletion, the content will have to be re-indexed through cron.

Installing PHP Solr extension

UNIX

  • You may download pecl-php-solr extension version 1.0.3-alpha with git clone https://github.com/lukaszkujawa/php-pecl-solr.git, or an official version <=1.0.2 from http://pecl.php.net/package/solr (extract the contents into a directory).
  • Install the extension dependencies by executing sudo apt-get install libxml2-dev libcurl4-openssl-dev
  • Restart the Apache server: sudo service apache2 restart
  • Compile the extension from the directory you cloned or extracted it into:
  • cd /your-downloaded-or-cloned-php-solr-extension-directory/
  • phpize
    • This is a shell script used to prepare the build environment for a PHP extension to be compiled. If you don't have phpize, you can install it by executing sudo apt-get install php5-dev
  • sudo ./configure
  • sudo make
  • sudo make install

The above procedure compiles the extension and installs it into the directory set by extension_dir in your php.ini file. To enable the installed extension, use either of the following two options:

1. Navigate to the directory /etc/php5/conf.d and create a new solr.ini file with the following line:

extension=solr.so

OR

2. Open your php.ini file and include the following line:

extension=solr.so

After either option, restart your Apache server by executing sudo service apache2 restart

You can now view the Solr extension details by clicking PHP info under Site administration > Server in the browser, or by running php -m in a terminal (Ctrl+Alt+T).

OSX using macports

This method provides an easy install of the PHP Solr extension without any manual downloads (PHP Solr extension version: <=1.0.2).

- sudo port install apache-solr4
- sudo port install php54-solr

You can choose the relevant available versions at http://www.macports.org/ports.php?by=name&substr=solr

Setting up Global Search for Moodle

After installing the PHP Solr extension, users will have to download the required Apache Solr release (version 4.x for PHP Solr extension 1.0.3-alpha, or 3.x for PHP Solr extension versions <=1.0.2), unzip it and keep it in a directory outside Moodle.

Users will have to replace solrconfig.xml and schema.xml inside the downloaded directory example/solr/collection1/conf/ with the ones that Global Search provides in the /search/solr/conf/ directory.

Once the files have been copied and replaced, users will have to start the Java Jetty server, start.jar, located in the /example/ directory, by executing java -jar start.jar. For a production setup you may prefer to run Solr on Tomcat 6 or 7 on an Ubuntu server.
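
Once Jetty is running, one way to confirm that Solr answers queries is to request its select handler. The Python sketch below assumes the defaults of the Solr example distribution (host localhost, port 8983, core collection1); these values and the helper names solr_select_url and count_documents are assumptions for illustration, and the handler path may differ if the supplied solrconfig.xml changes it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select_url(query, host="localhost", port=8983,
                    core="collection1", rows=0):
    """Build a URL for Solr's select handler (host, port and core are assumed defaults)."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"http://{host}:{port}/solr/{core}/select?{params}"


def count_documents(url):
    """Fetch the URL and return the number of matching documents reported by Solr."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)["response"]["numFound"]
```

Calling count_documents(solr_select_url("*:*")) against a running server should return the number of indexed documents, which will be 0 until cron has indexed some content.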

Admins will then have to enable Global Search in Site Administration > Plugins > Global Search > Manage Global Search.

Searchable content in Global Search

The following modules/resources are covered so far in Global Search. I'll be continuing my work and including other modules shortly.

All the contents of the following modules, including all uploaded rich document media (PDFs, PPTXs, TXTs, etc.), will be indexed and made searchable, with security enforced through Moodle capabilities.

  • Book Resource
  • Forum Module
  • Glossary Module
  • Label Resource
  • Lesson Module
  • Page Resource
  • File Resource
  • URL Resource
  • Wiki Module

The search results will display highlighted matched queries along with context links/direct links to the corresponding record.

Implementation and Milestones

  • Writing cron jobs: (17th June - 21st June)
    • Addition of records.
    • Deletion of records. [Not a focus for now]
    • Update of records.
  • Designing the Solr schema and solrconfig files. (22nd June - 26th June)
    • These files will be kept in a separate directory under the Global Search directory. Users will have to copy these two files to the example directory of the Apache Solr download, which runs the Solr Jetty server. See Setting up Global Search for Moodle for more.
    • schema.xml contains all the properties of the document fields that are being indexed. There may be different fields pertaining to different modules.
    • solrconfig.xml contains the configuration parameters for Solr.
  • Writing the core search API for all searchable modules. (27th June - 3rd July)
    • _SEARCH_ITERATOR($from=0)
    • _SEARCH_DOCUMENTS($id)
  • Adding proper security to the search API. (4th July - 8th July)
    • _SEARCH_ACCESS($id)
      • Use of Access API
        • ACCESS_DENIED
        • ACCESS_GRANTED
        • ACCESS_DELETED: Situations where the records may have been deleted and are hence not viewable.
  • Reviewing all the above once. (9th July - 10th July)
  • Integrating Apache Tika to handle indexing from external files (PDFs, PPTX etc.) (11th July - 13th July)
  • Writing the search functions for querying. (Just a basic search UI to be used at this moment). (14th July - 17th July)
    • Input for query.
    • Input for filter fields for filtering the search results.
    • Input for AND/OR.
    • Phrase searches.
    • Stemming.
    • Support for wildcards.
  • Admin page for search configuration options. (18th July - 21st July) (The UI page here will follow the default style currently in use, for example Site Administration > Advanced features.)
    • Deletion of index.
      • Deletion of entire index in one go.
      • Deletion of index by specific modules only. (For example, only the index of records belonging to the 'forum' module is to be deleted.)
    • Configuration options for cron
      • Time of cron run.
  • Implementing and releasing the first prototype version: 1.0 for developers' feedback. (22nd July - 28th July)
    • See the Prototype section.
  • Preparing for mid-term evaluation. (29th July - 1st August)
  • Mid-term evaluation. (2nd August)
  • Re-designing the search page. (2nd August - 7th August) (Taking ideas from the community and the forum discussion)
  • Improving the prototype after feedback from developers. (8th August - 15th August)
    • Bug-fixing.
    • Fixing security leaks.
    • Improving relevance & speed of search results.
  • Running test cases and performance testing. (16th August - 21st August) (Performance testing is well placed at this point, as the code will already have been optimized to some extent following the developers' feedback above.)
  • Debugging. (22nd August - 29th August)
  • Finalizing the Global Search documentation. (30th August - 6th September)
    • Discussing with my mentors whether everything has been properly covered.
  • Buffer Period. (7th September - 8th September)
    • Making sure everything above has been implemented correctly and efficiently.
  • Submitting my code to Moodle and Google. (9th September - 15th September)
  • Suggested Pencils Down Period. (16th September)
  • Performing edits to the documentation after feedback from the Moodle community. (17th September - 22nd September)
  • Firm Pencils Down and Final Evaluation. (23rd September)

Quick setup for testing

This section covers the features of the Global Search plugin and the procedure to install it. Developers are encouraged to test it and give feedback, which is crucial for improving it. Developers may check out the cases in the Testing section and add their own test cases and results (pass/fail) there, which would be helpful for other developers, or comment on the Global Search discussion in the Developer Forum.

  • Here are some things that I've summarized that may be focused upon:
    • Finding out any security leaks that I may have missed out.
    • Relevance/speed of displaying search results.
    • Taking ideas to optimize the source code wherever possible.
    • Getting feedback.
    • Addition/deletion of any search features that you may feel would be useful.
  • Features included in this prototype:
    • Indexing/Searching of content in all the above mentioned modules/resources.
    • Support for indexing/searching rich documents.
    • Admin configuration options.
    • Support for advanced search queries as stated above.
  • Steps to install and test Global Search plugin:
    • Make sure you've installed your preferred PHP Solr extension (see sections above).
    • You can simply clone my GitHub repository and check out the gs2rebased branch.
git clone https://github.com/prateeksachan/moodle.git
cd moodle
git checkout gs2rebased
    • After installing the PHP Solr extension, follow the section Setting up Global Search for Moodle above.
    • Open ../moodle/admin in your browser, and update the admin Global Search settings.
    • Indexing is through cron. You will have to run the cron script to index content.

Search workflow

Simple version

  1. Moodle sends a query to the standard Solr handler.
  2. Solr returns 1000 results. Results include all fields required to render a search result row. Results are ordered by relevance.
  3. Moodle checks each result in order to establish whether it should be visible to the current user.
  4. Once 100 visible results are found, Moodle stops checking the rest and displays the 100 results.

The best-case scenario is that Moodle needs to check only 100 returned results (the first 100 are all accessible to the current user). The worst-case scenario is that none of the 1000 returned documents is available to the current user. In this case Moodle performs 1000 checks and displays a "no results found" message.
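
The simple workflow can be sketched as a short filtering loop. This is an illustrative Python sketch, not the plugin's code: filter_visible and its parameters are hypothetical names, and the string values given to the access constants (whose names come from the milestones list) are assumptions.

```python
# Access outcomes named in the milestones; the values here are assumed.
ACCESS_GRANTED = "granted"
ACCESS_DENIED = "denied"
ACCESS_DELETED = "deleted"


def filter_visible(results, check_access, limit=100):
    """Walk relevance-ordered results and keep those the user may see.

    Stops as soon as `limit` visible results are collected.
    Returns the visible results and the number of access checks made.
    """
    visible = []
    checks = 0
    for doc in results:
        checks += 1
        if check_access(doc) == ACCESS_GRANTED:
            visible.append(doc)
            if len(visible) == limit:
                break
    return visible, checks
```

With 1000 results that are all accessible, the loop stops after 100 checks; if none is accessible it makes all 1000 checks and returns an empty list, matching the best and worst cases described above.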

Advanced version

A more advanced version will use logic on the Solr side to pre-filter some of the results.

Testing

A list of test cases.

New content

  • index
  • add new forum post
  • re-index
  • make sure new content is searchable

Updated content

  • index
  • edit existing forum post
  • re-index
  • make sure new content is searchable
  • make sure old content is not searchable

Deleted content

  • index
  • delete existing forum post
  • re-index
  • make sure deleted content is not searchable

Backup restored

  • index
  • restore whole course from a backup
  • re-index
  • make sure restored content is searchable
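
The four scenarios above can be expressed as assertions against a toy in-memory stand-in for the Solr index. This Python sketch only illustrates what each test case expects: ToyIndex is a hypothetical class, and its reindex method mirrors the current records in full, whereas the real plugin indexes through cron.

```python
class ToyIndex:
    """Minimal stand-in for a search index keyed by record id."""

    def __init__(self):
        self._docs = {}

    def reindex(self, records):
        # Mirror the current records: new ids are added, edited ones
        # are overwritten, and deleted ones disappear.
        self._docs = {rid: text.lower() for rid, text in records.items()}

    def search(self, keyword):
        kw = keyword.lower()
        return sorted(rid for rid, text in self._docs.items() if kw in text)


index = ToyIndex()
posts = {1: "Welcome to the course"}
index.reindex(posts)

posts[2] = "Exam schedule"            # new content
index.reindex(posts)
assert index.search("exam") == [2]

posts[2] = "Quiz schedule"            # updated content
index.reindex(posts)
assert index.search("quiz") == [2]
assert index.search("exam") == []

del posts[2]                          # deleted content
index.reindex(posts)
assert index.search("quiz") == []

posts[3] = "Restored glossary entry"  # content restored from a backup
index.reindex(posts)
assert index.search("restored") == [3]
```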

Access Tests

Common Access

  • Add courses.
  • Create activity modules/resources that are supported by Global Search (See section Searchable content in Global Search).
  • Insert content or attach files (wherever possible)
  • Index
  • Make sure the results are visible only to those who have access to the respective course's activity/resource.
  • Set Common Module Settings of activities/resources as Hidden.
  • Make sure those activities/resources do not appear in search results.

Forum Activity Module Access

  • Create a course.
  • Create groups in it.
  • Create a Forum Activity.
  • Post in the forum discussions from members of different groups. (+you may also attach files)
  • Index.
  • Make sure members see posts only from those groups which they are a member of.

Lesson Activity Module Access

  • Create a course.
  • Create a Lesson Activity.

1. Type 1

  • Assign Availability
    • Available From only
    • Deadline only
    • Time limit
    • Password Protection
    • Combinations of the above
  • Index.
  • Make sure there are no access leaks when searching as a student.
  • Make sure teachers and managers are able to see the results.

2. Type 2

  • Assign Prerequisite Lesson
    • Set Dependent On Prerequisite feature.
    • Set Time spent Prerequisite feature.
    • Toggle completed Prerequisite feature.
    • Set Grade Prerequisite feature.
    • Combinations of the above.
  • Index
  • Make sure there are no access leaks when searching as a student.
  • Make sure teachers and managers are able to see the results.

Wiki Activity Module Access

  • Create a course.
  • Create Groups.
  • Create a Wiki Activity.
  • Set Group Mode On

1. Type 1

  • Group Mode: Separate Groups
  • Add Wikipages. (+you may also attach files)
  • Index
  • Make sure there are no access leaks: A group shouldn't see results from other groups.

2. Type 2

  • Group Mode: Visible Groups
  • Add Wikipages.(+you may also attach files)
  • Index
  • Make sure there are no access leaks: A group shouldn't see results from other groups.

Results

Credits

See also
