Logging 2
Note: This page is a work-in-progress. Feedback and suggested improvements are welcome. Please join the discussion on moodle.org or use the page comments.
Logging stage 2 | |
---|---|
Project state | In very early specification |
Tracker issue | MDL-37658 |
Discussion | |
Assignee | moodle.com DEV team |
Why is Logging an important topic?
Logging is fundamental to research, reporting and analytics.
- Not much of new Moodle development is based on measured data. We need to
- Prove there is a problem
- Prove that a given change is an improvement.
- Data comes from logging
- Better data would allow researchers to study what happens in online teaching.
- Better data would give admins more feedback on how to run their site (both technically and process)
- Better data would give teachers better feedback to improve their teaching process.
- Better data would give students better feedback to improve the learning process.
- We can't predict all the analysis that might happen in future.
Examples of possible developments that would require better logs
Here are some random ideas, please add more!
- Timely notifications of critical events (defined by users), eg more than five people posted in the same forum within an hour.
- "Red circle" notification badge counter icons everywhere in Moodle highlighting things that need attention for each user.
- Colour heatmaps overlaid on course page showing recent usage patterns on a course
- Visualisation of live activity across a whole site (for admins)
- Live engagement analytics that take every little action into account
- Identification of students at risk based on complete history
Problems
- Logging is added adhoc by developers, has spotty coverage. No admin logging!
- Log tables are joined in many popular queries and are slow.
- No archiving, so have to delete old logs
- Deletion is wholesale, no control.
- No synchronization with server logs to help debugging performance issues
- No storage of traces when there are problems, so we can debug Moodle issues
Users of log table at the moment
A lot of the time-critical uses of the log table can be moved to new tables specifically for this purpose. It will speed things up a lot.
- recent activity (block, myMoodle) EVENTS -> NEW TABLE
- conditional access EVENTS -> NEW TABLE
- failed logins EVENTS -> NEW TABLE
- files via clamav EVENTS -> NEW TABLE
Other things can can continue to query the main logs:
- course activity report, participation report, user outline report
- stats
- cron
- automated backup locking/heartbeats (probably)
- auth/mnet/auth.php
- backup/restore (target logs and own logging)
- grade history (currently separate log)
And some future things that will need the full log:
- Engagement reporting
Places where we need more logs
- Errors and warnings
- People report errors, but cannot duplicate it.
- Logging history with stack trace would be very useful
- Also dump session variables and all other data that can be useful for debugging
- Admin activities
- Micro activities in a page (more than one per load)
- AJAX calls (eg on course page)
- Should be careful about logging confidential information (e.g. do not log any POST data)
- Shouldn’t delete logs when course is deleted
- All events should all be logged
- Add more events (they are cheap and very useful)
- Need more standardisation of event data for all triggered events
- Add a catch-all handler that pushes events to the logging system
- --> Action logging improvements META - MDL-28443
Existing log reading analysis
Files that make use of get_logs*
There are core functions (get_logs, get_logs_usercourse and get_logs_userday) but other classes also contain functions with the same name which are not related to the mdl_log table. Will we will be replacing these?
Locations using or declaring the functions get_logs*
backup/util/structure/backup_nested_element.class.php
The backup process contains it's own get_logs function that does not obtain data from the Moodle log table, but it's own backup_logs table. The structure is as follows -
id bigint
backupid character varying(32)
loglevel smallint
message character varying(255)
timecreated bigint
A few things to consider. How will this integrate into the new system with the backupid? Which field would that be placed in? It also has a loglevel, is that going to be a universal log level or unique to backup? If so, which fields to store in? Are we just going to leave it as it is?
course/lib.php
This is called for the function build_logs_array which is only ever called when the user has requested to print the logs for the course via the log report (report/log). So, is not called all the time (fortunately).
lib/datalib.php
This is where the core log functions are created, but are not used here. The functions get_logs_usercourse and get_logs_userday are only ever called in the report/log/graphs.php.
mod/workshop/allocation/scheduled/lib.php
This is a stdClass unique to the workshop that is json_encoded before being saved in the database field resultlog in the table 'workshopallocation_scheduled'.
The log stdClass is as follows.
$log = new stdClass();
$log->message = $message;
$log->type = $type;
$log->indent = $indent;
report/log/graph.php
Self explanatory. This is a report based on the log records. Could potentially execute large queries but at least it is only performed when requested by user.
Files that contain usages of {log} in database queries
auth/mnet/auth.php
$mnethostlogssql = "
SELECT
mhostlogs.remoteid, mhostlogs.time, mhostlogs.userid, mhostlogs.ip,
mhostlogs.course, mhostlogs.module, mhostlogs.cmid, mhostlogs.action,
mhostlogs.url, mhostlogs.info, mhostlogs.username, c.fullname as coursename,
c.modinfo
FROM
(
SELECT
l.id as remoteid, l.time, l.userid, l.ip, l.course, l.module, l.cmid,
l.action, l.url, l.info, u.username
FROM
{user} u
INNER JOIN {log} l on l.userid = u.id
WHERE
u.mnethostid = ?
AND l.id > ?
ORDER BY remoteid ASC
LIMIT 500
) mhostlogs
INNER JOIN {course} c on c.id = mhostlogs.course
ORDER by mhostlogs.remoteid ASC";
Function keepalive_client: A heavy SQL query is performed that joins the log table to check whether the user who logged in via mnet is still active.
lib/statslib.php
$sql = "INSERT INTO {stats_user_weekly} (stattype, timeend, courseid, userid, statsreads)
SELECT 'logins', timeend, courseid, userid, COUNT(statsreads)
FROM (
SELECT $nextstartweek AS timeend, ".SITEID." as courseid, l.userid, l.id AS statsreads
FROM {log} l
WHERE action = 'login' AND $logtimesql
) inline_view
GROUP BY timeend, courseid, userid
HAVING COUNT(statsreads) > 0";
$DB->execute($sql);
Function stats_cron_weekly: Used to gather weekly statistics. Retrieves all 'login' actions in the log table between a certain time period and inserts them into the stats_user_weekly table.
$sql = "INSERT INTO {stats_user_monthly} (stattype, timeend, courseid, userid, statsreads)
SELECT 'logins', timeend, courseid, userid, COUNT(statsreads)
FROM (
SELECT $nextstartmonth AS timeend, ".SITEID." as courseid, l.userid, l.id AS statsreads
FROM {log} l
WHERE action = 'login' AND $logtimesql
) inline_view
GROUP BY timeend, courseid, userid";
$DB->execute($sql);
Function stats_cron_monthly: Used to gather monthly statistics. Retrieves all 'login' actions in the log table between a certain time period and inserts them into the stats_user_monthly table.
if ($firstlog = $DB->get_field_sql('SELECT MIN(time) FROM {log}')) {
return $firstlog;
}
Function stats_get_start_from: Returns starting date of the statistics. If there are currently no stats the variable $CFG->statsfirstrun is checked, and if it is equal to 'all', then a simple database query is performed on the log table to retrieve the lowest time recorded.
$sql = 'INSERT INTO {temp_log1} (userid, course, action)
SELECT userid, course, action FROM {log}
WHERE time >= ? AND time < ?';
$DB->execute($sql, array($timestart, $timeend));
Function stats_temp_table_fill: Fills the temporary stats tables (temp_log1 and temp_log2) with new data by performing a single select query on the log table to retrieve data from a given time range.
lib/datalib.php
These are where the functions mentioned in the section above are used as they perform the actual query on the log table.
- Function get_logs.
- Function get_logs_usercourse.
- Function get_logs_userday.
lib/deprecatedlib.php
$sql = "SELECT u.id, u.firstname, u.lastname, MAX(l.time)
FROM {user} u, {role_assignments} ra, {log} l
WHERE l.time > ?
AND l.course = ?
AND l.module = 'course'
AND l.action = 'enrol'
AND ".$DB->sql_cast_char2int('l.info')." = u.id
AND u.id = ra.userid
AND ra.contextid ".get_related_contexts_string($context)."
GROUP BY u.id, u.firstname, u.lastname
ORDER BY MAX(l.time) ASC";
$params = array($timestart, $courseid);
return $DB->get_records_sql($sql, $params);
Function get_recent_enrolments: Performs a query with the user, role_assignments and log table to return recent enrolments.
lib/cronlib.php
$sql = "SELECT info, count(*)
FROM {log}
WHERE module = 'login' AND action = 'error'
AND time > ?
GROUP BY ip
HAVING count(*) >= ?";
Function notify_login_failures: The above query is performed twice except the group by variable changes.
- Gets all the IPs with more than notifyloginthreshold failures since lastnotifyfailure and insert them into the cache_flags temp table.
- Get all the INFOs with more than notifyloginthreshold failures since lastnotifyfailure and insert them into the cache_flags temp table.
The following is then performed -
$sql = "SELECT * FROM (
SELECT l.*, u.firstname, u.lastname
FROM {log} l
JOIN {cache_flags} cf ON l.ip = cf.name
LEFT JOIN {user} u ON l.userid = u.id
WHERE l.module = 'login' AND l.action = 'error'
AND l.time > ?
AND cf.flagtype = 'login_failure_by_ip'
UNION ALL
SELECT l.*, u.firstname, u.lastname
FROM {log} l
JOIN {cache_flags} cf ON l.info = cf.name
LEFT JOIN {user} u ON l.userid = u.id
WHERE l.module = 'login' AND l.action = 'error'
AND l.time > ?
AND cf.flagtype = 'login_failure_by_info') t
ORDER BY t.time DESC";
$params = array($CFG->lastnotifyfailure, $CFG->lastnotifyfailure);
This selects all the login errors logged belonging to the IPs and infos since lastnotifyfailure that are stored in the cache_flags table.
report/outline/index.php
if (!$logstart = $DB->get_field_sql("SELECT MIN(time) FROM {log}")) {
print_error('logfilenotavailable');
}
A simple query to return the minimum time in the log table - used to make sure there are logs to report.
$sql = "SELECT cm.id, COUNT('x') AS numviews, MAX(time) AS lasttime
FROM {course_modules} cm
JOIN {modules} m ON m.id = cm.module
JOIN {log} l ON l.cmid = cm.id
WHERE cm.course = ? AND l.action LIKE 'view%' AND m.visible = 1
GROUP BY cm.id";
$views = $DB->get_records_sql($sql, array($course->id));
Returns the number of views a module has had.
report/participation/index.php
$minlog = $DB->get_field_sql('SELECT min(time) FROM {log} WHERE course = ?', array($course->id));
A simple query to return the minimum time in the log table - used in later query.
$sql = "SELECT ra.userid, u.firstname, u.lastname, u.idnumber, l.actioncount AS count
FROM (SELECT * FROM {role_assignments} WHERE contextid $relatedcontexts AND roleid = :roleid ) ra
JOIN {user} u ON u.id = ra.userid
LEFT JOIN (
SELECT userid, COUNT(action) AS actioncount FROM {log} WHERE cmid = :instanceid AND time > :timefrom AND $actionsql GROUP BY userid
) l ON (l.userid = ra.userid)";
A more complex query used to return the number of times a user has either posted or viewed a module.
Files that make use of $DB functions to query the logs table directly
I am ignoring the usages of log_display - based on what Petr said I assume we can just get rid of this completely
admin/handlevirus.php
$log = $DB->get_record("log", array("module"=>"upload", "info"=>$file, "action"=>"upload")
A simple check to see if the file was recorded as being uploaded.
backup/util/helper/backup_cron_helper.class.php
$logexists = $DB->record_exists_select('log', $sqlwhere, $params)
Function run_automated_backup: This query is performed twice in this function. It is used to check the log if there were any modifications to the course content.
blocks/recent_activity/block_recent_activity.php
$logs = $DB->get_records_select('log',
"time > ? AND course = ? AND
module = 'course' AND
(action = 'add mod' OR action = 'update mod' OR action = 'delete mod')",
array($timestart, $course->id), "id ASC");
Function get_structural_changes: Returns list of recent changes in the course structure. Used only within this block.
course/lib.php
$params = array('userid'=>$USER->id, 'url'=>"view.php?id=$courseid", 'since'=>$since);
$select = "module = 'course' AND action = 'new' AND userid = :userid AND url = :url AND time > :since";
return $DB->record_exists_select('log', $select, $params);
Function can_delete_course: If the user does not have the capability moodle/course:delete but has the capability moodle/course:create. The above checks when the user created the course, and if it was less than a day ago they can delete it.
lib/datalib.php
if (is_siteadmin()) {
if ($count->attempts = $DB->count_records_select('log', $select, $params)) {
$count->accounts = $DB->count_records_select('log', $select, $params, 'COUNT(DISTINCT info)');
return $count;
}
} else if ($mode == 'everybody') {
if ($count->attempts = $DB->count_records_select('log', "$select AND info = :username", $params)) {
return $count;
}
}
Function count_login_failures: Is used by renderers (both lib/outputrenderers.php and mymobile/renderers.php).
lib/deprecatedlib.php
$logs = $DB->get_records_select('log', "time > ? AND course = ? AND
module = 'course' AND
(action = 'add mod' OR action = 'update mod' OR action = 'delete mod')",
array($timestart, $course->id), "id ASC");
Function print_recent_activity - self explanatory.
lib/uploadlib.php
if (!$record = $DB->get_record('log', array('info'=>$oldpath, 'module'=>'upload'))) {
return false;
}
Function clam_change_log: This function does not seem to be used anywhere within Moodle.
mod/book/lib.php
$logs = $DB->get_records('log', array('userid'=>$user->id, 'module'=>'book',
'action'=>'view', 'info'=>$folder->id), 'time ASC')
- Used in the function book_user_outline.
mod/folder/lib.php
$logs = $DB->get_records('log', array('userid'=>$user->id, 'module'=>'folder',
'action'=>'view', 'info'=>$folder->id), 'time ASC')
- Used in the function folder_user_outline.
- Used in the function folder_user_complete.
mod/imscp/lib.php
$logs = $DB->get_records('log', array('userid'=>$user->id, 'module'=>'imscp',
'action'=>'view', 'info'=>$folder->id), 'time ASC')
- Used in the function imscp_user_outline.
- Used in the function imscp_user_complete.
mod/page/lib.php
$logs = $DB->get_records('log', array('userid'=>$user->id, 'module'=>'page',
'action'=>'view', 'info'=>$folder->id), 'time ASC')
- Used in the function page_user_outline.
- Used in the function page_user_complete.
mod/resource/lib.php
$logs = $DB->get_records('log', array('userid'=>$user->id, 'module'=>'resource',
'action'=>'view', 'info'=>$folder->id), 'time ASC')
- Used in the function resource_user_outline.
- Used in the function resource_user_complete.
mod/url/lib.php
$logs = $DB->get_records('log', array('userid'=>$user->id, 'module'=>'url',
'action'=>'view', 'info'=>$folder->id), 'time ASC')
- Used in the function url_user_outline.
- Used in the function url_user_complete.
Solution summary
We can't actually predict all the use cases so we must think generically and plan for worst cases.
- Things that don't need to use logs should not use logs. (Recent activity, etc)
- Make logging calls as cheap as possible
- Use Events API for the originating calls to create lots of hooks for other things
- Use MUC-like plug-ins to determine where to put logs permanently (NoSQL, SAS, file...). Default will be to the current log table.
- Log everything we possibly can think of.
- Define log level settings (on each logging call) so different sites can choose what they want logged and so control size, speed etc
- add_to_log retained for backward compatibility, make it a wrapper for new logger function.
- provide hooks to expose logs everywhere. For example, each page should have a link that shows logs and stats for that page.
Moodle -> Logs
- MUC-like plugins to interface between our logging calls and log backends
- File
- Database table
- http://logstash.net/ ?
- http://fluentd.org/ ?
- http://logging.apache.org/log4php/ ?
Logs -> Moodle
- Each plugin implements some methods to query and extract logs back to Moodle
- Do we auto-sync anything back to Moodle tables for convenience?
See also
Examples of data-based learning technology development: