Note:

If you want to create a new page for developers, you should create it on the Moodle Developer Resource site.

Logging 2: Difference between revisions

From MoodleDocs
No edit summary
Line 457: Line 457:
There are four steps of how event is converted into report:
There are four steps of how event is converted into report:


# Handling of events and filtering what needs to be logged ("What to log")
# Handling of events and filtering what needs to be logged.
# Storing the events data in the log storage (DB, filesystem, etc.) ("How to log")
# Storing the events data in the log storage (DB, filesystem, etc.).
# Retrieving the data from log storage - each plugin implements some methods to query and extract logs back to Moodle
# Retrieving the data from log storage - each plugin implements some methods to query and extract logs back to Moodle.
# Displaying the data in form of the report ("How to display")
# Displaying the data in form of the report.


One plugin can cover one or several steps. It is also possible to create a report that has built-in logging and listens to events (all 4 steps). Also external logging systems do not need to care about steps 3 and 4 at all.
One plugin can cover one or several steps. It is also possible to create a report that has built-in logging and listens to events (all 4 steps). Also external logging systems do not need to care about steps 3 and 4 at all.

Revision as of 07:13, 24 May 2013

Note: This page is a work-in-progress. Feedback and suggested improvements are welcome. Please join the discussion on moodle.org or use the page comments.

Logging stage 2
Project state In very early specification
Tracker issue MDL-37658
Discussion
Assignee moodle.com DEV team


Why is Logging an important topic?

Logging is fundamental to research, reporting and analytics.

  • Not much of new Moodle development is based on measured data. We need to
    • Prove there is a problem
    • Prove that a given change is an improvement.
  • Data comes from logging
  • Better data would allow researchers to study what happens in online teaching.
  • Better data would give admins more feedback on how to run their site (both technically and process)
  • Better data would give teachers better feedback to improve their teaching process.
  • Better data would give students better feedback to improve the learning process.
  • We can't predict all the analysis that might happen in future.


Examples of possible developments that would require better logs

Here are some random ideas, please add more!

  • Timely notifications of critical events (defined by users), eg more than five people posted in the same forum within an hour.
  • "Red circle" notification badge counter icons everywhere in Moodle highlighting things that need attention for each user.
  • Colour heatmaps overlaid on course page showing recent usage patterns on a course
  • Visualisation of live activity across a whole site (for admins)
  • Live engagement analytics that take every little action into account
  • Identification of students at risk based on complete history

Existing Problems

  • Logging is added adhoc by developers, has spotty coverage. No admin logging!
  • Some logged actions do not contain necessary information.
  • Performance and scalability problem.
    • Log tables are joined in many popular queries and are slow.
    • All logs go to database and it's not scalable.
    • It is not possible to configure level of logging
    • Large number of database writes even when carrying out ‘read-only’ requests.
  • No archiving, so have to delete old logs.
  • Deletion is wholesale, no control.
  • No synchronization with server logs to help debugging performance issues
  • No storage of traces when there are problems, so we can debug Moodle issues

Existing log reading analysis

Files that make use of get_logs*

There are core functions (get_logs, get_logs_usercourse and get_logs_userday) but other classes also contain functions with the same name which are not related to the mdl_log table. Will we will be replacing these?

Locations using or declaring the functions get_logs*

backup/util/structure/backup_nested_element.class.php

The backup process contains it's own get_logs function that does not obtain data from the Moodle log table, but it's own backup_logs table. The structure is as follows -

id bigint backupid character varying(32) loglevel smallint message character varying(255) timecreated bigint

A few things to consider. How will this integrate into the new system with the backupid? Which field would that be placed in? It also has a loglevel, is that going to be a universal log level or unique to backup? If so, which fields to store in? Are we just going to leave it as it is?

course/lib.php

This is called for the function build_logs_array which is only ever called when the user has requested to print the logs for the course via the log report (report/log). So, is not called all the time (fortunately).

lib/datalib.php

This is where the core log functions are created, but are not used here. The functions get_logs_usercourse and get_logs_userday are only ever called in the report/log/graphs.php.

mod/workshop/allocation/scheduled/lib.php

This is a stdClass unique to the workshop that is json_encoded before being saved in the database field resultlog in the table 'workshopallocation_scheduled'.

The log stdClass is as follows. $log = new stdClass(); $log->message = $message; $log->type = $type; $log->indent = $indent;

report/log/graph.php

Self explanatory. This is a report based on the log records. Could potentially execute large queries but at least it is only performed when requested by user.

Files that contain usages of {log} in database queries

auth/mnet/auth.php

$mnethostlogssql = " SELECT

   mhostlogs.remoteid, mhostlogs.time, mhostlogs.userid, mhostlogs.ip,
   mhostlogs.course, mhostlogs.module, mhostlogs.cmid, mhostlogs.action,
   mhostlogs.url, mhostlogs.info, mhostlogs.username, c.fullname as coursename,
   c.modinfo

FROM

   (
        SELECT
            l.id as remoteid, l.time, l.userid, l.ip, l.course, l.module, l.cmid,
            l.action, l.url, l.info, u.username
        FROM
             {user} u
             INNER JOIN {log} l on l.userid = u.id
        WHERE
             u.mnethostid = ?
             AND l.id > ?
        ORDER BY remoteid ASC
        LIMIT 500
   ) mhostlogs
   INNER JOIN {course} c on c.id = mhostlogs.course

ORDER by mhostlogs.remoteid ASC";

Function keepalive_client: A heavy SQL query is performed that joins the log table to check whether the user who logged in via mnet is still active.

lib/statslib.php

$sql = "INSERT INTO {stats_user_weekly} (stattype, timeend, courseid, userid, statsreads)

       SELECT 'logins', timeend, courseid, userid, COUNT(statsreads)
          FROM (
                 SELECT $nextstartweek AS timeend, ".SITEID." as courseid, l.userid, l.id AS statsreads
                   FROM {log} l
                 WHERE action = 'login' AND $logtimesql
               ) inline_view
        GROUP BY timeend, courseid, userid
          HAVING COUNT(statsreads) > 0";

$DB->execute($sql);

Function stats_cron_weekly: Used to gather weekly statistics. Retrieves all 'login' actions in the log table between a certain time period and inserts them into the stats_user_weekly table.

$sql = "INSERT INTO {stats_user_monthly} (stattype, timeend, courseid, userid, statsreads)

       SELECT 'logins', timeend, courseid, userid, COUNT(statsreads)
         FROM (
                  SELECT $nextstartmonth AS timeend, ".SITEID." as courseid, l.userid, l.id AS statsreads
                    FROM {log} l
                   WHERE action = 'login' AND $logtimesql
              ) inline_view
     GROUP BY timeend, courseid, userid";

$DB->execute($sql);

Function stats_cron_monthly: Used to gather monthly statistics. Retrieves all 'login' actions in the log table between a certain time period and inserts them into the stats_user_monthly table.

if ($firstlog = $DB->get_field_sql('SELECT MIN(time) FROM {log}')) {

   return $firstlog;

}

Function stats_get_start_from: Returns starting date of the statistics. If there are currently no stats the variable $CFG->statsfirstrun is checked, and if it is equal to 'all', then a simple database query is performed on the log table to retrieve the lowest time recorded.

$sql = 'INSERT INTO {temp_log1} (userid, course, action)

       SELECT userid, course, action FROM {log}
        WHERE time >= ? AND time < ?';

$DB->execute($sql, array($timestart, $timeend));

Function stats_temp_table_fill: Fills the temporary stats tables (temp_log1 and temp_log2) with new data by performing a single select query on the log table to retrieve data from a given time range.

lib/datalib.php

These are where the functions mentioned in the section above are used as they perform the actual query on the log table.

  1. Function get_logs.
  2. Function get_logs_usercourse.
  3. Function get_logs_userday.

lib/deprecatedlib.php

$sql = "SELECT u.id, u.firstname, u.lastname, MAX(l.time)

         FROM {user} u, {role_assignments} ra, {log} l
        WHERE l.time > ?
              AND l.course = ?
              AND l.module = 'course'
              AND l.action = 'enrol'
              AND ".$DB->sql_cast_char2int('l.info')." = u.id
              AND u.id = ra.userid
              AND ra.contextid ".get_related_contexts_string($context)."
     GROUP BY u.id, u.firstname, u.lastname
     ORDER BY MAX(l.time) ASC";

$params = array($timestart, $courseid); return $DB->get_records_sql($sql, $params);

Function get_recent_enrolments: Performs a query with the user, role_assignments and log table to return recent enrolments.

lib/cronlib.php

$sql = "SELECT info, count(*)

         FROM {log}
        WHERE module = 'login' AND action = 'error'
              AND time > ?
     GROUP BY ip
       HAVING count(*) >= ?";

Function notify_login_failures: The above query is performed twice except the group by variable changes.

  1. Gets all the IPs with more than notifyloginthreshold failures since lastnotifyfailure and insert them into the cache_flags temp table.
  2. Get all the INFOs with more than notifyloginthreshold failures since lastnotifyfailure and insert them into the cache_flags temp table.

The following is then performed -

$sql = "SELECT * FROM (

       SELECT l.*, u.firstname, u.lastname
             FROM {log} l
             JOIN {cache_flags} cf ON l.ip = cf.name
        LEFT JOIN {user} u         ON l.userid = u.id
            WHERE l.module = 'login' AND l.action = 'error'
                  AND l.time > ?
                  AND cf.flagtype = 'login_failure_by_ip'
       UNION ALL
           SELECT l.*, u.firstname, u.lastname
             FROM {log} l
             JOIN {cache_flags} cf ON l.info = cf.name
        LEFT JOIN {user} u         ON l.userid = u.id
            WHERE l.module = 'login' AND l.action = 'error'
                  AND l.time > ?
                  AND cf.flagtype = 'login_failure_by_info') t
       ORDER BY t.time DESC";
   $params = array($CFG->lastnotifyfailure, $CFG->lastnotifyfailure);

This selects all the login errors logged belonging to the IPs and infos since lastnotifyfailure that are stored in the cache_flags table.

report/outline/index.php

if (!$logstart = $DB->get_field_sql("SELECT MIN(time) FROM {log}")) {

   print_error('logfilenotavailable');

}

A simple query to return the minimum time in the log table - used to make sure there are logs to report.

$sql = "SELECT cm.id, COUNT('x') AS numviews, MAX(time) AS lasttime

         FROM {course_modules} cm
              JOIN {modules} m ON m.id = cm.module
              JOIN {log} l     ON l.cmid = cm.id
        WHERE cm.course = ? AND l.action LIKE 'view%' AND m.visible = 1
     GROUP BY cm.id";

$views = $DB->get_records_sql($sql, array($course->id));

Returns the number of views a module has had.

report/participation/index.php

$minlog = $DB->get_field_sql('SELECT min(time) FROM {log} WHERE course = ?', array($course->id));

A simple query to return the minimum time in the log table - used in later query.

$sql = "SELECT ra.userid, u.firstname, u.lastname, u.idnumber, l.actioncount AS count

       FROM (SELECT * FROM {role_assignments} WHERE contextid $relatedcontexts AND roleid = :roleid ) ra
       JOIN {user} u ON u.id = ra.userid
       LEFT JOIN (
           SELECT userid, COUNT(action) AS actioncount FROM {log} WHERE cmid = :instanceid AND time > :timefrom AND $actionsql GROUP BY userid
       ) l ON (l.userid = ra.userid)";

A more complex query used to return the number of times a user has either posted or viewed a module.

Files that make use of $DB functions to query the logs table directly

I am ignoring the usages of log_display - based on what Petr said I assume we can just get rid of this completely

admin/handlevirus.php

$log = $DB->get_record("log", array("module"=>"upload", "info"=>$file, "action"=>"upload")

A simple check to see if the file was recorded as being uploaded.

backup/util/helper/backup_cron_helper.class.php

$logexists = $DB->record_exists_select('log', $sqlwhere, $params)

Function run_automated_backup: This query is performed twice in this function. It is used to check the log if there were any modifications to the course content.

blocks/recent_activity/block_recent_activity.php

$logs = $DB->get_records_select('log',

                               "time > ? AND course = ? AND
                                module = 'course' AND
                                (action = 'add mod' OR action = 'update mod' OR action = 'delete mod')",
                                array($timestart, $course->id), "id ASC");

Function get_structural_changes: Returns list of recent changes in the course structure. Used only within this block.

course/lib.php

$params = array('userid'=>$USER->id, 'url'=>"view.php?id=$courseid", 'since'=>$since); $select = "module = 'course' AND action = 'new' AND userid = :userid AND url = :url AND time > :since";

return $DB->record_exists_select('log', $select, $params);

Function can_delete_course: If the user does not have the capability moodle/course:delete but has the capability moodle/course:create. The above checks when the user created the course, and if it was less than a day ago they can delete it.

lib/datalib.php

if (is_siteadmin()) {

   if ($count->attempts = $DB->count_records_select('log', $select, $params)) {
       $count->accounts = $DB->count_records_select('log', $select, $params, 'COUNT(DISTINCT info)');
       return $count;
   }

} else if ($mode = 'everybody') {

   if ($count->attempts = $DB->count_records_select('log', "$select AND info = :username", $params)) {
       return $count;
   }

}

Function count_login_failures: Is used by renderers (both lib/outputrenderers.php and mymobile/renderers.php).

lib/deprecatedlib.php

$logs = $DB->get_records_select('log', "time > ? AND course = ? AND

                                       module = 'course' AND
                                       (action = 'add mod' OR action = 'update mod' OR action = 'delete mod')",
                                       array($timestart, $course->id), "id ASC");

Function print_recent_activity - self explanatory.

lib/uploadlib.php

if (!$record = $DB->get_record('log', array('info'=>$oldpath, 'module'=>'upload'))) {

   return false;

}

Function clam_change_log: This function does not seem to be used anywhere within Moodle.

mod/book/lib.php

$logs = $DB->get_records('log', array('userid'=>$user->id, 'module'=>'book',

                                     'action'=>'view', 'info'=>$folder->id), 'time ASC')

  1. Used in the function book_user_outline.

mod/folder/lib.php

$logs = $DB->get_records('log', array('userid'=>$user->id, 'module'=>'folder',

                                     'action'=>'view', 'info'=>$folder->id), 'time ASC')

  1. Used in the function folder_user_outline.
  2. Used in the function folder_user_complete.

mod/imscp/lib.php

$logs = $DB->get_records('log', array('userid'=>$user->id, 'module'=>'imscp',

                                     'action'=>'view', 'info'=>$folder->id), 'time ASC')

  1. Used in the function imscp_user_outline.
  2. Used in the function imscp_user_complete.

mod/page/lib.php

$logs = $DB->get_records('log', array('userid'=>$user->id, 'module'=>'page',

                                     'action'=>'view', 'info'=>$folder->id), 'time ASC')

  1. Used in the function page_user_outline.
  2. Used in the function page_user_complete.

mod/resource/lib.php

$logs = $DB->get_records('log', array('userid'=>$user->id, 'module'=>'resource',

                                     'action'=>'view', 'info'=>$folder->id), 'time ASC')

  1. Used in the function resource_user_outline.
  2. Used in the function resource_user_complete.

mod/url/lib.php

$logs = $DB->get_records('log', array('userid'=>$user->id, 'module'=>'url',

                                     'action'=>'view', 'info'=>$folder->id), 'time ASC')

  1. Used in the function url_user_outline.
  2. Used in the function url_user_complete.

Places where we need more logs

  • Errors and warnings
    • People report errors, but cannot duplicate it.
    • Logging history with stack trace would be very useful
    • Also dump session variables and all other data that can be useful for debugging
  • Admin activities
  • Micro activities in a page (more than one per load)
  • AJAX calls (eg on course page)
  • Should be careful about logging confidential information (e.g. do not log any POST data)
  • Shouldn’t delete logs when course is deleted
  • All events should all be logged
    • Add more events (they are cheap and very useful)
    • Need more standardisation of event data for all triggered events
    • Add a catch-all handler that pushes events to the logging system
  • --> Action logging improvements META - MDL-28443

Logging Proposal

This section is deliberately not called "Logging API" because Moodle does not provide any logging API. All plugins that perform logging and analyse logs to display reports can do it as they wish. Events API defines how to subscribe to events and use them.

  • In places where we previously added information to 'the' log we must generate the new event.
  • The standard Moodle distribution will include simple DB logging plugin storing the data in new format. There will also be an section (optional) legacy logging plugin that will store data in the old format in {log} table. All standard reports will be upgraded to use the new logging system that will use a logging retrieval plugin to access log information. Custom report plugins will be able to query the {log} table, but should ideally shift to the new logging system over time, so that organisations are not forced to continue double logging.

From event to report

There are four steps of how event is converted into report:

  1. Handling of events and filtering what needs to be logged.
  2. Storing the events data in the log storage (DB, filesystem, etc.).
  3. Retrieving the data from log storage - each plugin implements some methods to query and extract logs back to Moodle.
  4. Displaying the data in form of the report.

One plugin can cover one or several steps. It is also possible to create a report that has built-in logging and listens to events (all 4 steps). Also external logging systems do not need to care about steps 3 and 4 at all.

Moodle standard distribution provides a suggestion on logging-report chain that can be followed by 3rd party plugins and may be not.

Logging plugins relation

Standard logging plugins

Standard Moodle 2.6 distribution will include 4 plugins:

Legacy logging plugin (tool_loglegacy)

Covers steps 1-3 from above. Listens to all events, for events that have non-empty 'legacylog' property adds the data to {log} table using the 'legacylog' information.

Core functions get_logs(), get_logs_usercourse(), get_logs_userday() will be checking if this plugin is installed and redirect to its appropriate functions.

All report plugins that use table {log} should use this plugin.

Event logging handling plugin (tool_eventobserver)

Responsible for step 1 from above. Has functions pluginname_register_log_instance() and pluginname_unregister_log_instance() that allow to add/remove log storage instance and also allows admin to configure what kind of events to store for each log instance.

Event logging DB storage plugin (tool_logdbstorage)

Responsible for step 2 from above. Has function pluginname_get_log_instances(). Allows admin to create storage instances and assign event handling plugin to fill each of them

Event logging driver plugin (tool_logdbdriver)

Responsible for step 3 from above. This plugin can only work with tool_logdbstorage. It provides functions to access data in log.

Summary

We can't actually predict all the use cases so we must think generically and plan for worst cases.

  • Things that don't need to use logs should not use logs. (Recent activity, etc)
  • Make logging calls as cheap as possible
  • Use Events API for the originating calls to create lots of hooks for other things
  • Use MUC-like plug-ins to determine where to put logs permanently (NoSQL, SAS, file...). Default will be to the current log table.
  • Log everything we possibly can think of.
  • Define log level settings (on each logging call) so different sites can choose what they want logged and so control size, speed etc
  • add_to_log retained for backward compatibility, make it a wrapper for new logger function.
  • provide hooks to expose logs everywhere. For example, each page should have a link that shows logs and stats for that page.

See also

Example logging systems:

Examples of data-based learning technology development: