Global search (GSoC2013)

Revision as of 07:27, 8 July 2013

Note: This page is a work-in-progress. Feedback and suggested improvements are welcome. Please join the discussion on moodle.org or use the page comments.

Global search
Project state: Coding period
Tracker issue: MDL-31989
Discussion: Writing Moodle's Global Search
Assignee: Prateek Sachan

GSOC '13

Introduction

Global Search will let users search for keywords across the entire Moodle site, spanning all modules, while keeping security intact.

  • It will display results ranked by relevance.
  • Security will be preserved throughout the search.
  • Search modules will make it easy to integrate the chosen search engine. Admins will be able to select which modules are made "searchable".
  • It will index keywords from other file types (such as PDFs, PPTs, HTML content and others).
  • The following are the features that I'm considering implementing in the first version of Global Search:
    1. Groupings of AND and OR. E.g.: ("query1" AND "query2") OR "query3"
    2. Searching for phrases. Results that match the phrase will have higher priority and hence will be shown higher in the results.
    3. Wildcard (*) (?) feature.
    4. Stemming. E.g.: bag will return results for both bag and bags.
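
The query features listed above correspond to the syntax of Solr's standard (Lucene) query parser. The following is a minimal sketch, written in Python purely for illustration (Global Search itself is PHP), of how such query strings could be assembled. The helper names phrase, all_of and any_of are hypothetical and are not part of any Moodle or Solr API. Stemming is not shown because it is configured in the field analyzers in schema.xml rather than expressed in the query string.

```python
def phrase(text: str) -> str:
    """Quote text so it is matched as a whole phrase, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def all_of(*clauses: str) -> str:
    """Group clauses so that every one of them must match (AND)."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """Group clauses so that at least one of them must match (OR)."""
    return "(" + " OR ".join(clauses) + ")"


# The grouping example from the feature list.
query = any_of(all_of(phrase("query1"), phrase("query2")), phrase("query3"))

# Wildcards are written directly into an unquoted term:
# "*" matches zero or more characters, "?" matches exactly one.
wildcard_terms = ["bag*", "ba?"]

if __name__ == "__main__":
    print(query)
    print(wildcard_terms)
```

Building the groups from small helpers keeps the parentheses balanced however deeply AND and OR are nested.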

Searchable content

This section describes the content that will be indexed and made searchable. Necessary security checks will be implemented where appropriate.

Book Resource

This section outlines the content that will be indexed and made searchable from within the Book Resource.

  • Name of Book.
  • Introductory content of the book.
  • Title of all chapters within the book.
  • Content of all chapters within the book.

The search results will give a link to the chapter.

Forum Module

This section outlines the content that will be indexed and made searchable from within the Forum Module.

  • All forum posts.
  • Names of users who have posted on the forum.
  • Files attached to the posts (PDFs, PPTX, etc.). Necessary security will be checked: users will not be able to see a file's link unless they have the necessary capability.

The search results will give a link to the forum post.

Label Resource

This section outlines the content that will be indexed and made searchable from within the Label Resource.

  • All Label names.
  • Introductory content.

The search results will give a link to the Label's course.


Lesson Module

This section outlines the content that will be indexed and made searchable from within the Lesson Module.

  • All Lesson names.
  • Lesson Media files. (To be discussed with my mentors).
  • Lesson pages within each lesson.
    • Content Page: Page's title and content.
    • Cluster: Cluster's title and content.
    • Question: Question's title and content.

The search results will give a link to the lesson page.

Page Resource

This section outlines the content that will be indexed and made searchable from within the Page Resource.

  • Name of Page.
  • Introductory content of Page.
  • Content of Page.

The search results will give a link to the page.

URL Resource

This section outlines the content that will be indexed and made searchable from within the URL Resource.

  • Name of URL.
  • Introductory content of URL.
  • External link of URL resource.

The search results will give a link to the URL Resource.

Wiki Module

This section outlines the content that will be indexed and made searchable from within the Wiki Module.

  • Name of Wiki.
  • Introductory content of Wiki.
  • Comments. (To be discussed with my mentors.)
  • Files. (To be discussed with my mentors.)
  • All Wiki Sub-Pages.
    • Title of Wiki Sub-Pages.
    • Content of Wiki Sub-Pages.

The search results will give a link to the wiki sub-page.

Implementation and Milestones

  • Writing cron jobs: (17th June - 21st June)
    • Addition of records.
    • Deletion of records. (Not a focus for now.)
    • Update of records.
  • Designing the Solr schema and solrconfig files. (22nd June - 26th June)
    • These files will be kept in a separate directory under the Global Search directory. Users will have to copy these two files into the example directory of the Apache Solr download, which runs the Solr Jetty server. See Installation for more.
    • schema.xml contains all the properties of the document fields that are being indexed. There may be different fields pertaining to different modules.
    • solrconfig.xml contains the configuration parameters for Solr.
  • Writing the core search API for all searchable modules. (27th June - 3rd July)
    • _SEARCH_ITERATOR($from=0)
    • _SEARCH_DOCUMENTS($id)
  • Adding proper security to the search API. (4th July - 8th July)
    • _SEARCH_ACCESS($id)
      • Use of Access API
        • ACCESS_DENIED
        • ACCESS_GRANTED
        • ACCESS_DELETED: Situations where the records may have been deleted and are hence not viewable.
  • Reviewing all of the above once. (9th July - 10th July)
  • Integrating Apache Tika to handle indexing from external files (PDFs, PPTX etc.). (11th July - 13th July)
  • Writing the search functions for querying. (Just a basic search UI to be used at this moment). (14th July - 17th July)
    • Input for query.
    • Input for filter fields for filtering the search results.
    • Input for AND/OR.
    • Phrase searches.
    • Stemming.
    • Support for wildcards.
  • Admin page for search configuration options. (18th July - 27th July) (The UI page here will follow the default style currently in use, for example Site Administration > Advanced features.)
    • Deletion of index.
      • Deletion of entire index in one go.
      • Deletion of index by specific modules only. (For example, only the index of records belonging to the 'forum' module is to be deleted.)
    • Configuration options for cron
      • Time of cron run.
  • Preparing for mid-term evaluation. (28th July - 1st August)
  • Mid-term evaluation. (2nd August)
  • Re-designing the search page. (2nd August - 7th August) (Taking ideas from community+discussion in forum)
  • Implementing the first prototype. (8th August - 15th August)
    • Asking the community developers to test it.
      • Taking their guidance to optimize the code wherever possible.
      • Getting feedback.
  • Running test cases and performance testing. (16th August - 21st August) (Performance testing will be useful at this point, as the code will already have been optimized to some extent following the developers' feedback above.)
  • Debugging. (22nd August - 29th August)
  • Finalizing the Global Search documentation. (30th August - 6th September)
    • Discussing with my mentors whether everything has been properly covered.
  • Buffer Period. (7th September - 8th September)
    • Making sure everything above has been implemented correctly and efficiently.
  • Submitting my code to Moodle and Google. (9th September - 15th September)
  • Suggested Pencils Down Period. (16th September)
  • Performing edits to the documentation after feedback from the Moodle community. (17th September - 22nd September)
  • Firm Pencils Down and Final Evaluation. (23rd September)
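
The three callbacks named in the milestones (_SEARCH_ITERATOR, _SEARCH_DOCUMENTS and _SEARCH_ACCESS) can be pictured with the following minimal sketch. It is written in Python purely for illustration, since the real callbacks will be PHP functions in each module. The class name, the record fields, the numeric values of the access constants and the reading of $from as a "modified since" timestamp are all assumptions made for this example, not the actual design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Iterator


class Access(Enum):
    # Numeric values are placeholders chosen for this sketch.
    DENIED = 0
    GRANTED = 1
    DELETED = 2


@dataclass
class Record:
    id: int
    title: str
    content: str
    modified: int
    deleted: bool = False


class ModuleSearch:
    """Hypothetical stand-in for one module's three search callbacks."""

    def __init__(self, records: Iterable[Record], readable_ids: Iterable[int]):
        self.records: Dict[int, Record] = {r.id: r for r in records}
        self.readable_ids = set(readable_ids)

    def search_iterator(self, since: int = 0) -> Iterator[int]:
        """Yield ids of live records modified at or after `since` (assumed meaning of $from)."""
        for record in sorted(self.records.values(), key=lambda r: r.modified):
            if record.modified >= since and not record.deleted:
                yield record.id

    def search_documents(self, record_id: int) -> Dict[str, object]:
        """Return the fields that would be sent to the index for one record."""
        record = self.records[record_id]
        return {"id": record.id, "title": record.title, "content": record.content}

    def search_access(self, record_id: int) -> Access:
        """Decide whether the current user may see the record."""
        record = self.records.get(record_id)
        if record is None or record.deleted:
            return Access.DELETED
        return Access.GRANTED if record_id in self.readable_ids else Access.DENIED


forum = ModuleSearch(
    [
        Record(1, "Welcome", "Hello everyone", modified=100),
        Record(2, "Exam", "Exam dates", modified=200),
        Record(3, "Old post", "Removed", modified=300, deleted=True),
    ],
    readable_ids={1},
)

if __name__ == "__main__":
    for rid in forum.search_iterator(since=150):
        print(forum.search_documents(rid))
    print(forum.search_access(1), forum.search_access(2), forum.search_access(3))
```

Keeping the access check separate from document retrieval means the index can be built by cron without a user context, while visibility is decided per user at query time.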

Search workflow

Simple version

  1. Moodle sends a query to the standard Solr handler.
  2. Solr returns 1000 results, ordered by relevancy. Each result includes all fields required to render a search result row.
  3. Moodle checks each result in order to establish whether it should be visible to the current user.
  4. Once 100 visible results are found, Moodle stops checking the rest and displays those 100 results.

The best case scenario is that Moodle needs to check only 100 returned results (first 100 are accessible for the current user). The worst case scenario is that out of 1000 returned documents, none of them is available for the current user. In this case Moodle performs 1000 checks and displays "no results found" message.
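
The simple workflow above can be sketched as a loop that stops as soon as enough visible results have been collected. This is an illustrative Python sketch, not the PHP implementation. The function name visible_results and the can_view callback are hypothetical, and plain integers stand in for the Solr documents.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def visible_results(
    results: Iterable[T],
    can_view: Callable[[T], bool],
    page_size: int = 100,
) -> Tuple[List[T], int]:
    """Walk relevance-ordered results, keeping those the user may see.

    Returns the visible results (at most `page_size`) and the number of
    access checks that were performed.
    """
    visible: List[T] = []
    checks = 0
    for doc in results:
        checks += 1
        if can_view(doc):
            visible.append(doc)
            if len(visible) == page_size:
                break  # enough results to display; skip the remaining checks
    return visible, checks


if __name__ == "__main__":
    ranked = range(1000)  # stand-in for 1000 documents returned by Solr

    # Best case: the first 100 results are all accessible.
    shown, checks = visible_results(ranked, lambda doc: True)
    print(len(shown), checks)

    # Worst case: nothing is accessible, so every result is checked.
    shown, checks = visible_results(ranked, lambda doc: False)
    print(len(shown), checks)
```

Because the results arrive ordered by relevancy, stopping early never drops a result that would have ranked above the ones displayed.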

Advanced version

A more advanced version will use logic on the Solr side to pre-filter some of the results.

Installation

To use Global Search, users will have to install the PHP Solr PECL extension on the server.

The following is the procedure for installing the extension on UNIX.

The extension has two dependencies:

  • CURL extension (libcurl 7.15.0 or later is required)
  • LIBXML extension (libxml2 2.6.26 or later is required)
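
When comparing an installed library version against these minimums, the components should be compared as numbers rather than as text (as plain strings, "2.6.9" would wrongly sort after "2.6.26"). The following is a small illustrative Python sketch of that comparison. The function names and the MINIMUMS table are hypothetical helpers written for this example, and the version strings are assumed to be purely numeric and dot-separated.

```python
from typing import Tuple

# Minimum versions required by the Solr PECL extension, as listed above.
MINIMUMS = {"libcurl": "7.15.0", "libxml2": "2.6.26"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '7.19.6' into (7, 19, 6) so components compare numerically."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(library: str, found: str) -> bool:
    """Return True if the found version is at least the required minimum."""
    return parse_version(found) >= parse_version(MINIMUMS[library])


if __name__ == "__main__":
    print(meets_minimum("libcurl", "7.19.6"))
    print(meets_minimum("libxml2", "2.6.9"))
```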

Test whether the required extensions are installed by running the following in a PHP file (remember to delete the file afterwards, as it exposes sensitive information about your system):

<?php phpinfo();

If the system does not have the required versions of the libcurl or libxml2 libraries, follow the steps given below. You will have to download the libraries and compile them from source into a separate install prefix.

  1. For libcurl:
    wget http://curl.haxx.se/download/curl-7.19.6.tar.gz
    tar -zxvf curl-7.19.6.tar.gz
    cd curl-7.19.6
    sudo ./configure --prefix=/root/custom/software
    sudo make
    sudo make install

  2. For libxml:
    wget ftp://xmlsoft.org/libxml2/libxml2-2.7.6.tar.gz
    tar -zxvf libxml2-2.7.6.tar.gz
    cd libxml2-2.7.6
    sudo ./configure --prefix=/root/custom/software
    sudo make
    sudo make install

After installing the above dependencies, restart your Apache server by executing
    sudo service apache2 restart
Next, install the PECL extension for Solr by cloning the following repository for Solr 4.x. (Please note: currently, the official php-pecl-solr is not compatible with Solr 4.x. The following repository provides a small fix to make it compatible with Solr 4.x, and the fix will go into the official release.)
  • git clone https://github.com/lukaszkujawa/php-pecl-solr.git
  • cd php-pecl-solr/
  • phpize
    • This is a shell script used to prepare the build environment for a PHP extension to be compiled. If you don't have phpize, you can install it by executing sudo apt-get install php5-dev
If the libxml2 and libcurl libraries were compiled from source, you will have to pass their install prefix to the configure script for CURL and LIBXML respectively, as shown below:
  • sudo ./configure --enable-solr --with-curl=/root/custom/software --with-libxml-dir=/root/custom/software
  • If you already have the latest versions of the libraries, then executing sudo ./configure is sufficient.
  • sudo make
  • sudo make install
The above procedure will compile the extension and install it in the directory specified by extension_dir in the php.ini file. To enable the installed extension, follow either of the following two steps:

  1. Navigate to the directory /etc/php5/conf.d and create a new solr.ini file with the following line:
    extension=solr.so

OR

  2. Open your php.ini file and include the following line:
    extension=solr.so

After either step, restart your Apache server by executing
    sudo service apache2 restart
You can now confirm that the solr extension is loaded by running phpinfo(); in a PHP file opened in the browser, or php -m in a terminal (Ctrl+Alt+T).

After installing the php-pecl-solr extension, users will have to download Apache Solr, unzip it and keep it in a directory outside Moodle.

Users will have to replace solrconfig.xml and schema.xml inside the downloaded directory /example/solr/collection1/conf with the ones that Global Search will provide. Once the files have been copied and replaced, users will have to start the Java Jetty server start.jar, located in the /example/ directory, by executing java -jar start.jar.

Installation - OSX using macports

- sudo port install apache-solr4
- sudo port install php54-solr

You can choose the relevant available versions at http://www.macports.org/ports.php?by=name&substr=solr

Credits

See also