Talk:Logging 2

Basically what's in here looks fine to me. I wrote a proposal Logging API proposal which includes some of these changes as a single development chunk, and should not cause problems for others implementing the rest of it later. The main focus of my proposal is providing the option to move logs out of the database (because the number of database writes is probably Moodle's single worst performance characteristic) but it should help enable the other enhancements from this proposal as well, in future.

Sam marshall 19:40, 12 February 2013 (WST)

Summary: Events/Logging API 2.6

Logging is implemented in the form of event listening by plugins. Plugins listening to the events may be:

  • a generic logging plugin implementing an interface that allows other plugins to query information
  • a reporting plugin that aggregates only the information it needs and stores it optimised for its queries

Our tasks:

  1. The Events API must ensure that everything that may potentially interest the plugins is included in the event data (but does not query extra data). Getting the list of listeners to notify should be as fast as possible: Moodle -> Logs
  2. The Logging API is the way plugins communicate with the generic logging systems. It might not be especially efficient, but it stores, and allows retrieval of, everything that happens: Logs -> Moodle
  3. We make sure that everything that is logged/evented now continues to be, and we log much more
  4. Create at least one plugin for generic logging

Marina Glancy 13:05, 15 May 2013 (WST)

New logging proposal

Current problems

  1. many actions are not logged
  2. some logged actions do not contain necessary information
  3. log storage does not scale
  4. display actions are hardcoded, but only plugins know how to interpret the data
  5. it is not possible to configure the level of logging
  6. performance

Related problems

Events

Events are triggered in only a few places, mostly for write actions. The event data structure is inconsistent. Cron events are abused, stale and slow. Error states and transaction support are problematic.

Statistics

Statistics processing is relatively slow. The results are not reliable because the algorithm does not fully understand the logged data.

Possible solution

We could either improve the current system or implement a completely new logging framework. Improvements to the current log table and API cannot resolve all existing problems. A possible long-term solution is to split logging into several self-contained parts:

  • triggering events - equivalent to current calls of the add_to_log() function
  • storage of information - equivalent to the current hardcoded storage in the log database table
  • fetching of information - equivalent to reading from the log table
  • processing of log information - such as processing of statistics
  • reporting - log and participation reports

We could solve the logging and event problems at the same time: technically, all log actions should be accompanied by event triggers, and the logging and event data are similar too.

Log and event data structures

Necessary event information:

  1. component - the frankenstyle component that understands the action and what changed
  2. action - similar to areas elsewhere
  3. user - who is responsible, who did it; usually the current user
  4. context - where the event happened
  5. data - a simple stdClass object that describes the action completely
  6. type - create/read/update/delete/other (CRUD)
  7. level - logging level 1...
  8. time - when the event was triggered
  9. optional event data - 2.5 style event data for backwards compatibility

Optional information:

  1. studentid - affected student for fast filtering
  2. courseid - affected course
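
To make the shape concrete, here is a minimal sketch of one triggered event as a flat PHP object using the fields above; all names and values are illustrative, not a final API:

// Minimal sketch (not a final API): one triggered event as a flat object,
// using the fields listed above; values are illustrative.
$eventdata = new stdClass();
$eventdata->component = 'mod_forum';        // frankenstyle component
$eventdata->action    = 'post_created';     // what happened
$eventdata->userid    = $USER->id;          // who is responsible
$eventdata->contextid = $context->id;       // where it happened
$eventdata->data      = (object)array('postid' => 42); // complete description
$eventdata->crud      = 'c';                // create/read/update/delete/other
$eventdata->level     = 1;                  // logging level
$eventdata->timecreated = time();           // when it was triggered
// Optional fields for fast filtering:
$eventdata->studentid = $student->id;       // affected student
$eventdata->courseid  = $course->id;        // affected course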

Backwards compatibility

add_to_log() calls would be ignored completely, with a debugging message in the future. The old log table would be kept unchanged, but no new data would be added to it.

Old event triggers would generate legacy events.

Event triggers would be gradually updated to trigger new events with more metadata; an event may optionally contain the legacy event data and legacy event name. The legacy events would be triggered automatically after handling the new event.

The event handler definition would be fully backwards compatible; new events would be named component/action. All new events would be instant.
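
A rough sketch of the automatic legacy trigger described above; dispatch_event() and the property names are placeholders, not an existing API:

// Illustrative only: after the new event has been handled, fire the 2.5
// style event if legacy information was attached to it.
dispatch_event($event->name, $event);
if (!empty($event->legacyname)) {
    events_trigger($event->legacyname, $event->legacydata); // old 2.5 API
}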

Cron events

The current cron event design is overcomplicated and often results in bad coding style. Cron events should be deprecated and forbidden for new events. Cron events also create performance problems.

"Cron events are abused, stale and slow." - also the above. Please give specific examples of what you mean by this. Quiz uses cron events for a good reason (although you are welcome to suggest better ways to achieve the same results). This has never caused performance problems, and I don't recall any performance problem caused by cron events.--Tim Hunt 15:10, 17 May 2013 (WST)

Database transactions

At present there is a hack that delays events via cron if a transaction is in progress. Instead we could create a buffer for events in memory and wait until we are notified by the DML layer that the transaction was committed or rolled back.

It is important here to minimise the event data size.
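
A hypothetical sketch of such a buffer; the class and the callbacks from the DML layer are assumptions, not an existing API:

// Hypothetical: hold events in memory while a transaction is open and
// dispatch them only once the DML layer reports the outcome.
class event_buffer {
    protected static $buffer = array();

    public static function queue_or_dispatch($name, $data) {
        global $DB;
        if ($DB->is_transaction_started()) {
            self::$buffer[] = array($name, $data); // wait for the outcome
        } else {
            dispatch_event($name, $data);
        }
    }

    // Called by the DML layer after a successful commit.
    public static function transaction_committed() {
        foreach (self::$buffer as $event) {
            dispatch_event($event[0], $event[1]);
        }
        self::$buffer = array();
    }

    // Called by the DML layer after a rollback.
    public static function transaction_rolledback() {
        self::$buffer = array(); // the changes never happened
    }
}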

Auxiliary event data

At present we sometimes send a lot more event data because we may need things like full course records when events reference some course. This prevents repeated fetching of data from the database.

This is bad for several reasons:

  • we cannot store this information in logs - it is too big and not necessary
  • the data may get stale - especially in cron, but other handlers may also modify the data (event from event); DB transactions increase the chances of stale data too

The auxiliary data should be replaced by some new cache or an extension of the DML level - a record cache. It could be prefilled automatically or manually when triggering events.

Infinite recursion trouble

One event may modify data, which results in a new event, which triggers the same event again. We must detect this somehow and stop event processing.
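
One possible detection mechanism, purely illustrative, is a dispatch stack that refuses to re-enter an event name already being processed:

// Illustrative: refuse to dispatch an event name that is already being
// processed further up the call stack.
function dispatch_event_guarded($name, $data) {
    static $stack = array();
    if (in_array($name, $stack)) {
        debugging("Recursive trigger of event '$name' detected, skipped.");
        return;
    }
    $stack[] = $name;
    dispatch_event($name, $data); // observers may trigger nested events
    array_pop($stack);
}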

Catch all event handler

We need a '*' handler definition; this would allow log handlers to process any subset of events without core modifications.

Handler definitions

At present handlers are stored in a database table during installation and upgrade. That is not necessary any more and we could use MUC instead. The list of all handlers will always be relatively small and should fit into RAM.

Handler definitions should contain priority.
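
For example, a handler definition carrying a priority could look roughly like this (structure hypothetical, loosely modelled on the current db/events.php format):

// Hypothetical handler definitions with priorities; higher priority
// handlers would be notified first.
$handlers = array(
    'mod_forum/post:created' => array(
        'callback' => 'forum_email_handler',
        'priority' => 10,
    ),
    '*' => array( // catch-all, e.g. a generic log writer
        'callback' => 'tool_logmanager_store_event',
        'priority' => 100,
    ),
);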

Implementation steps

  1. redesign events internals while keeping backwards compatibility
  2. new logging API
  3. convert log actions and event triggers
  4. implement log plugins

Petr Škoda (škoďák)

Event logger or (event and logging)

I agree with Petr's proposal to enhance the event system, but I think events and logging should be kept separate. Mixing them will force developers to pollute the event system, which might lead to race-around problems.

With the current proposal we are not considering logging event-less situations like sending email after a forum post, debug/exception output, or cron status. They could all be achieved by triggering an event, but should we pollute the event system for use-cases like:

  1. Logging memory usage while executing an API call (create_user, view create user page)
  2. Logging mail contents (for forum etc.)
  3. Logging debug information for replicating issues (helping admins and Moodle developers to fix issues)

Also, we should consider how logforphp[1] and Monolog[2] are implemented. They use the concepts of channelling log information, putting it into categories and writing to different logging stores. This would help us create a flexible system which can collect rich data for analysis, research and debugging. Also, we should consider logging general information in a stream/file and important information in the DB to avoid performance issues.

We should also consider following standards like PSR-3 or RFC 5424.

Rajesh Taneja (rajeshtaneja)

Logging plugins diagram

[Diagram: Logging plugins relation]

Configuration example:

Imagine an admin created two instances each of "DB log storage", "Filesystem log storage" and "Log storage with built-in driver" and called them dbstorage1, dbstorage2, filestorage1, filestorage2, universalstorage1 and universalstorage2.

DB log storage::log_storage_instances() will return:

  • dbstorage1
  • dbstorage2

Simple DB log driver::log_storage_instances() will return:

  • dbstorage1 (Simple DB log driver)
  • dbstorage2 (Simple DB log driver)

When Report1 is installed it will query all plugins that implement both functions, log_storage_instances() and get_log_records(), then invoke log_storage_instances() on each of them and return the list for the admin to choose the data source from (a rough sketch of this discovery loop follows the list):

  • filestorage1 (Records DB log driver)
  • filestorage2 (Records DB log driver)
  • dbstorage1 (Records DB log driver)
  • dbstorage2 (Records DB log driver)
  • filestorage1 (FS to DB log driver)
  • filestorage2 (FS to DB log driver)
  • universalstorage1
  • universalstorage2
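
A rough sketch of that discovery loop, with hypothetical helper names:

// Hypothetical sketch of how Report1 could enumerate its data sources.
$sources = array();
foreach (get_log_driver_plugins() as $plugin) { // hypothetical helper
    if (!method_exists($plugin, 'log_storage_instances') ||
            !method_exists($plugin, 'get_log_records')) {
        continue; // Report1 requires both functions.
    }
    foreach ($plugin->log_storage_instances() as $instance) {
        $sources[] = $instance; // e.g. 'dbstorage1 (Simple DB log driver)'
    }
}
// $sources is then shown to the admin as the data source selection.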

Marina Glancy 11:01, 16 May 2013 (WST)

Minor comments

The log and event data structure that Petr proposed initially looked good to me (though it does seem to be missing date/time, which I am guessing is just an oversight). But after talking to Rajesh about querying the data, I think that I might be more in favour of some sort of flat system that doesn't require serialising the data. Adrian Greeve 09:32, 17 May 2013 (WST)

Fred's part

Logging

Data the logger should receive

This is the information that should be passed on to the logging system. For performance, it's probably better that this information arrives complete, so that no more queries need to be performed to retrieve, say, the course name. The more information we pass, the more we can store. Once this is defined, we will know what an event has to send when it's triggered.

Mandatory fields *

  • Event specification version number*
  • Event name*
  • Type*
    • Error
    • User action
      • Procedural action
      • Manual action
    • System log
  • Datetime* (milliseconds?)
  • Context ID*
  • Category ID
  • Category name
  • Course ID*
  • Course name
  • Course module ID*
  • Course module name
  • Component*
    • core
    • course
    • mod_assign
    • mod_workshop
  • Subject* (Defines the object on which the action is performed)
    • user
    • section
    • assignment
    • submission_phase
  • Subject ID*
  • Subject name (Human readable identifier of the object: Mark Johnson, Course ABC, ...)
  • Subject URL (URL to notice the changes)
  • Action* (The exact action that is performed on the object)
    • created
    • moved (for a course, a section, a module, a user between groups)
    • submitted (for an assignment, or a message)
    • ended (for a submission phase for instance)
  • Actor* (user, cli, cron, ...)
  • Actor ID/User ID* (ID associated to the actor, typically the user id)
  • Real actor/user ID* (When logged in as, store the real user ID)
  • Actor IP address*
  • Associated object (Object associated to the subject. Ie: category of origin when a course is moved. User to whom a message is sent.)
    • section
    • category
    • user (to whom you sent a message)
  • Associated object ID
  • Transaction type* (CRUD)
    • create
    • read
    • update
    • delete
  • Level* (Not convinced that we should use that, because it's very subjective, except if we have guidelines and only a few levels (2, max 3))
    • major
    • normal
    • minor
  • Message (human readable log)
  • Data (serialized data for more information)
  • Current URL

We have to decide how extensively we want to provide information to external systems. For example, when an assignment is submitted, we could provide the URL to the assignment. But that means that this event specifically has to receive the data, or the module has to provide a callback to generate the information based on the event data. Also, external systems cannot call the assignment callback.

In any case, not everyone is going to be happy. The more processing to get the data, the slower it gets. The less data we provide, the less information can be worked on...

Also, if we provide slots for information but most of the events do not use them, the data becomes unusable, as it is inconsistent except for the few events that complete the fields.

Fields policy

We could define different levels of fields to be set depending on CRUD. For example, if an entry is deleted, we might want to log more than just the entry ID: also its content, its name, etc. Surely the data field could contain most of it, but we still need to define policies.

Events

Important factors

  1. Processing has to be very quick
  2. Developers are lazy, so the function call should be as easy as possible
  3. Validating the data is very costly; we have to avoid that
  4. An event should not be defined, but triggered, for performance reasons

An event is not defined; it is fired, and that is the only moment the system knows about it. Two different components should not define the same action, for example enrol/user:created and core/user:created. The user:created event should be triggered in the core method creating a user, once and only once.

Triggering an event

A very basic and quick example of how we could now trigger events, filling in the required data and inferring the rest, to allow developers to quickly trigger one without having to provide extensive information.

event_trigger('mod_assign/assignment:submitted', $assignmentid, $extradata);

/**

* Trigger an event
*
* An event name should have the following form:
* $component/$subject:$action
*
* The component is the frankenstyle name of the plugin triggering the event.
* The subject is the object on which the action is performed. If the name could be
*     bound to the corresponding model/table name, that would be great.
* The action is what is happening, typically created/read/updated/deleted, but
*     it could also be 'moved', 'submitted', 'loggedin', ...
*
* The parameter $subjectid is ideally the ID of the subject in its own table.
*
* @param string $name of the event
* @param int $subjectid ID of the subject of the event
* @param array|stdClass $data data to pass on
* @param int $level of the event
* @return void
*/

function event_trigger($name, $subjectid, $data = null, $level = LEVEL_NORMAL) {

   global $USER, $PAGE;
   $data = (object) $data;
   // For the specification 2, here are some hardcoded values.
   $data->eventname = $name;
   $data->version = 2;
   $data->type = 'event';
   // Get the component, subject and action from the event name.
   // $name = $component/$subject:$action
   // $event = $subject:$action
   list($component, $event) = explode('/', $name, 2);
   list($subject, $action) = explode(':', $event, 2);
   $data->component = $component;
   $data->subject = $subject;
   $data->action = $action;
   $data->subjectid = $subjectid;
   $data->level = $level;
   // Defaults.
   if (!isset($data->time)) {
       $data->time = microtime(true);
   }
   if (!isset($data->contextid)) {
       $data->contextid = $PAGE->get_contextid();
   }
   if (!isset($data->courseid)) {
       $data->courseid = $PAGE->get_courseid();
   }
   if (!isset($data->moduleid)) {
       $data->moduleid = $PAGE->get_coursemoduleid();
   }
   if (!isset($data->actor) && !isset($data->actorid)) {
       $data->actor = 'user';
       $data->actorid = $USER->id;
   } else if (!isset($data->actor) || !isset($data->actorid)) {
       throw new Exception('Actor and Actor ID must both be specified');
   }
   if (session_is_loggedinas()) {
       $data->realuserid = session_get_realuser()->id;
   }
   // It would be preferable that this data is not computed on the fly,
   // but I have added that for now so that the dev doesn't have to set
   // one extra parameter.
   if (!isset($data->crud)) {
       if ($data->action == 'created' || $data->action == 'added') {
           $data->crud = 'c';
       } else if ($data->action == 'read' || $data->action == 'viewed') {
           $data->crud = 'r';
       } else if ($data->action == 'updated' || $data->action == 'edited') {
           $data->crud = 'u';
       } else if ($data->action == 'deleted' || $data->action == 'removed') {
           $data->crud = 'd';
       } else {
           throw new Exception('Crud is needed!');
       }
   }
   // Not mandatory, but possible to be guessed.
   if (!isset($data->currenturl)) {
       $data->currenturl = $PAGE->get_url();
   }
   dispatch_event($name, $data);

}

Using such a function would:

  • Force the developer to send the minimal required information via parameters
  • Guess the fields such as component, action or subject from the event name (this processing is really small and easy versus the overhead of asking the dev to enter those)
  • Allow for any information that can be guessed by the system to be defined in the $data parameter
  • Allow any extra information to be defined in the $data parameter

An alternative way would be to send an event object to this method. An event object would be an instance of a class, e.g. event_create(), in which all the required variables would be set. I didn't go in that direction because I think it takes up a bit more memory, but also it could be more work for developers just to trigger an event.

Both ways definitely have their pros and cons.

Validation of event data

I don't think we should validate the information passed, as this processing is relatively expensive. But we should be strict about the fact that extra data should be set in $data->data. A solution to prevent developers from abusing the stdClass properties is to recreate the object based on the allowed keys; that's probably cheap. dispatch_event() could be the right place to do that.
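
A cheap version of that recreation might look like this (function name and usage illustrative):

// Illustrative: rebuild the event data keeping only whitelisted keys;
// everything else belongs in $data->data.
function clean_event_data(stdClass $data, array $allowedkeys) {
    $clean = new stdClass();
    foreach ($allowedkeys as $key) {
        if (property_exists($data, $key)) {
            $clean->$key = $data->$key;
        }
    }
    return $clean;
}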

Deprecate Cron events

An event is supposed to happen when it has been triggered, not to be delayed. I think we should let the plugins handle the events the way they want, and schedule a cron job if they want to delay the processing. Of course, we should still support cron events for a while.

Backwards compatibility

If the event naming convention changes, this could be a bit trickier to achieve, especially ensuring that 3rd party plugins trigger and catch events defined by other 3rd party plugins.

The old method events_trigger() should:

  • Be deprecated;
  • Trigger the old event;
  • Not trigger the new events as it's tricky to remap the data to form the new $data parameter.

The new method to trigger events should:

  • Trigger the new event;
  • Trigger the old-style event for non-updated observers;
  • Keep the old $eventdata for old-style events.

Core events triggered

git gr events_trigger\( | sed -E 's/.*events_trigger\(([^,]+),.*/\1/' | sort | uniq

Frédéric Massart 11:07, 20 May 2013 (WST)

Logging bulk actions.

Just an off thought that I had. Some (if not all) of the functions that are called from events_trigger only handle single actions (e.g. user_updated). To save a lot of processing, can we make sure that multiple updates/deletions/creations are done as one? Adrian Greeve 09:32, 17 May 2013 (WST)

Why observer priorites?

As discussed yesterday, I propose we must have priorities associated with event observers/handlers. This opens the possibility of events being used extensively by plugins and core in future. A simple idea could be to re-write the completion system to use the new event infrastructure instead of making its own calls to view pages etc. Here is a simple use case that shows why priorities are needed:

Consider a student called "Derp" enrolled in a course called "Fruits" which has two activities, "Bananas" and "Kiwi fruit".
Derp completes the activity "Bananas", which triggers an event (say 'activity_complete') with two observers (say A and B).
Observer A unlocks the other activity "Kiwi fruit" if activity "Bananas" is completed and the user is enrolled in course "Veggies".
Observer B enrols the user in course "Veggies" if activity "Bananas" is completed.
Without observer priorities, observer B can be notified of the change before observer A, or the other way around, leading to conflicting and undesirable results.

Ankit Agarwal 10:48, 17 May 2013 (WST)

Can we support the Tin Can (Experience) API via a retrieval plugin/reporting tool?

Here is an example of a Tin Can API call:

Derp attempted 'Example Activity'
{
    "id": "26e45efa-f243-419e-a603-0c69783df121",
    "actor": {
        "name": "Derp",
        "mbox": "mailto:depr@derp.com",
        "objectType": "Agent"
    },
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/attempted",
        "display": {
            "en-US": "attempted"
        }
    },
    "context": {
        "contextActivities": {
            "category": [
                {
                    "id": "http://corestandards.org/ELA-Literacy/CCRA/R/2",
                    "objectType": "Activity"
                }
            ]
        }
    },
    "timestamp": "2013-05-17T04:02:18.298Z",
    "stored": "2013-05-17T04:02:18.298Z",
    "authority": {
        "account": {
            "homePage": "http://cloud.scorm.com/",
            "name": "anonymous"
        },
        "objectType": "Agent"
    },
    "version": "1.0.0",
    "object": {
        "id": "http://www.example.com/tincan/activities/aoivGYMz",
        "definition": {
            "name": {
                "en-US": "Example Activity"
            },
            "description": {
                "en-US": "Example activity definition"
            }
        },
        "objectType": "Activity"
    }
}

From the Tin Can specs, here are the properties of a statement:

  • id UUID assigned by the LRS if not set by the Activity Provider.
  • actor* Who the Statement is about, as an Agent or Group Object. Represents the "I" in "I Did This".
  • verb* Action of the Learner or Team Object. Represents the "Did" in "I Did This".
  • object* Activity, Agent, or another Statement that is the Object of the Statement. Represents the "This" in "I Did This". Note that Objects which are provided as a value for this field should include an "objectType" field. If not specified, the Object is assumed to be an Activity.
  • result Result Object, further details representing a measured outcome relevant to the specified Verb.
  • context Context that gives the Statement more meaning. Examples: a team the Actor is working with, altitude at which a scenario was attempted in a flight simulator.
  • timestamp Timestamp (formatted according to ISO 8601) of when the events described within this Statement occurred. If not provided, the LRS should set this to the value of the "stored" time.
  • stored Timestamp (formatted according to ISO 8601) of when this Statement was recorded. Set by the LRS.
  • authority Agent who is asserting this Statement is true. Verified by the LRS based on authentication, and set by the LRS if left blank.
  • version The Statement's associated xAPI version, formatted according to Semantic Versioning 1.0.0.
  • attachments Array of headers for attachments to the Statement.

Among these, actor, verb and object are mandatory fields. We are recording all of those in our logs. In addition, the LRS (retrieval plugin?) can populate the rest of the fields if it wants to. The only information that will not be stored by the default storage plugin is the "stored" field.


Ankit Agarwal 12:20, 17 May 2013 (WST)

Performance

I would like to see this spec propose some performance criteria at the start, that can be tested during development. For example:

  • How many writes/events per second should a log back-end be able to handle? (I know, it depends on the hardware.)
  • On a typical Moodle page load, how much of the load time should be taken up by the add_to_log call? Can we promise that it will be less than in Moodle 2.5?

I guess that those are the two main ones, if we can find a way to quantify them. Note that performance of add_to_log on a developer machine is not an issue. The issue really comes with high load, where there might be lock contention that slows down log writes.--Tim Hunt 15:16, 17 May 2013 (WST)

Existing log writing analysis

add_to_log

  • course id
  • module name (forum, journal, resource, course, user etc)
  • action (view, update, add, delete, possibly followed by another word to clarify)
  • url (file and parameters used to see the results of the action)
  • info (additional description information)
  • cm (course module ID)
  • user (if the user is different to $USER)

If the user is logged in as someone else then no logs are created.

If $CFG->logguests is set and false then no logs are created.

Multiple checks are done on the different fields to ensure that the data is ok for inserting.

The record insert is wrapped in a try/catch (insert_record_raw - no checking, insert into the database as quickly as possible).

If there is an error, a debugging message is sent. If $CFG->supportemail is set and $CFG->noemailever is not set, then an email is sent, with a limit of one email a day.

Info section - some logs have:

  • written information
  • just a number
    • user ID
    • course ID
  • question type

Action contains:

  • an action string using get_string
  • straight English
  • nothing at all

Some sections don't log anything, e.g. bulk upload of users via CSV.

See the function add_to_log in lib/datalib.php for further details.

user_accesstime_log

  • course id

No log is created if the user isn't logged in, is logged in as someone else, or is a guest user.

Updates the user table (update_record_raw - see above) and changes $USER->lastaccess at the same time, storing:

  • user ID
  • last IP address
  • last access time

Checks are made on the user_lastaccess table and inserts/updates are made as appropriate.

See the function user_accesstime_log in lib/datalib.php for further details.


Other logs

Only one insert into the config_log table (lib/adminlib.php) - lib function config_write (30 calls in 4 files). The log contains:

  • user ID
  • time modified
  • plugin
  • name
  • value
  • old value

log_transfer is called three times.

Two direct inserts into the portfolio log (lib/portfolio/exporter.php) - lib function log_transfer (3 calls in 3 files). The log contains:

  • user ID
  • portfolio
  • caller file
  • caller component
  • caller sha1
  • caller class
  • continue url
  • return url
  • temp data ID
  • time

upgrade_log (12 calls in 1 file)

log contains:

  • type
  • plugin
  • version
  • target version
  • info
  • details
  • backtrace
  • userid
  • timemodified

Adrian Greeve 09:47, 20 May 2013 (WST)

Object-oriented model of events

This is the proposal for object-oriented events:

  • All event descriptions are objects extending event_base (which is defined in core)
  • The object class name is the unique identifier of the event
  • The class name is pluginfullname_event_xxx. Core events will have the prefix 'moodle'
  • Plugins store each event object in plugindir/classes/event/xxx.php (Petr is considering autoloading and/or namespaces)
  • event_base also implements cacheable_object

Event object has:

  • the protected properties that Fred is working on, plus public getter methods for each of them
  • properties 'legacyname' and 'legacydata'
  • Two constructors (or static methods creating an instance):
    • to create an event that is about to be triggered, autofilling constant and/or default data for this particular event
    • to restore an event object after it has been stored in the log. No default attributes here; it may even be declared final
  • Function to trigger the event. Also a static method that creates an instance and triggers the event - for lazy developers.
  • Static function returning a human readable event name, preferably using get_string (lang_string)
  • Static function returning a human readable event description
  • Function returning human readable event contents (to be used on events restored from the log)
  • Function checking if the current user is able to see this event (to be used on events restored from the log)

Yes, this is slightly slower from a performance point of view than passing an array/stdClass with data. But the difference is not so big, and such a structure is easier for developers to understand and implement, and at the same time solves a lot of problems when displaying logs. Only the constructor and trigger functions are used when an event is actually triggered. Everything else might or might not be used when displaying the logs.

Backward compatibility:

  • Function events_trigger() will create and trigger an instance of the event_legacy class
  • For events that already exist in Moodle 2.5, the additional legacy information should be added to the event data (in properties 'legacyname' and 'legacydata')

Event handlers:

  • Event handlers can be described as is done now in plugindir/db/events.php; this file is parsed during install/upgrade of the plugin and all handlers are removed on uninstall
  • It is possible to subscribe to all events (*)
  • Event handlers can also have an attribute 'sortorder' (positive or negative, default 0). At the same time the admin can overwrite the handlers sequence
  • If an event handler refers to the old (2.5) name of an event, it will be used only for events that contain the legacy name and data. If it refers to a 2.6 event class name, it will be used with full data
  • Core contains the class events_manager and a function get_events_manager() that returns a single instance of it. This class is responsible for retrieving/updating information about event handlers, processing event triggers, managing the events queue, and processing events in cron.
  • events_manager has methods to dynamically register/unregister handlers, but they are not recommended (same as with dynamic cache definitions). Ideally they should only be used in unit and/or behat tests

How to store and retrieve the list of handlers is the most performance-sensitive part. AFAIK Petr already has ideas about it.

And I want to emphasise again (Tim also raised this point) that the performance of storing and retrieving events data in log tables is the responsibility of plugins and not core. We will include in core a plugin that repeats the 2.5 functionality and stores events with legacy data in table {log}. We will also develop three plugins:

  • an events dispatcher that allows configuring a filter for which events to store in the log
  • a DB log storage with auto-archiving of old data
  • a driver for DB log storage that, given the time range and other filters, returns the list of events from DB log storage.

At the same time we will rewrite all reports that use table {log} to use the new engine, and move the functions that use the {log} table to the legacy log plugin.

abstract class event_base {
  // ... constants and all properties as protected variables
  protected function __construct(array $args = null) {
    // ... fill properties from $args, autofilling defaults
  }
  public static final function create(array $args) {
    return new static($args); // late static binding instantiates the subclass
  }
  public static final function restore($object) {
    $event = new static();
    // ... restore each property from $object to $event
    return $event;
  }
  public static final function create_and_trigger(array $args) {
    $event = static::create($args);
    $event->trigger();
    return $event;
  }
  public final function trigger() {
    // ...
  }
  abstract public static function event_name();
  abstract public static function event_description();
  abstract public function can_view($user = null);
  abstract public function event_data();
}

Assuming we have class mod_myplugin_event_something_happened extends event_base, to trigger the event we may use:

mod_myplugin_event_something_happened::create_and_trigger(array('courseid' => XXX, 'userid' => YYY, 'data' => ZZZ));

or

$event = mod_myplugin_event_something_happened::create(array('courseid' => XXX, 'userid' => YYY, 'data' => ZZZ));
// ... more code that may delete or modify the entities used when creating an event object
$event->trigger();

Marina Glancy 11:08, 20 May 2013 (WST)

Public standards compliance

According to PHP-FIG (the PHP Framework Interop Group), we should try to implement LoggerInterface. It is not essential for us to implement this, but as they follow RFC standards and are supported by open source projects like Symfony, it would be nice to consider.

To implement standard logging protocols like RFC 5424 or RFC 3164, we should have the following information in an event:

  • $type: The category to which this message belongs.
  • $message: The message to store in the log.
  • $variables: Array of variables to replace in the message on display, or NULL if the message is already translated or cannot be translated.
  • $severity: The severity of the message; one of the following values as defined in RFC 3164:
    • EMERGENCY: Emergency, system is unusable.
    • ALERT: Alert, action must be taken immediately.
    • CRITICAL: Critical conditions.
    • ERROR: Error conditions.
    • WARNING: Warning conditions.
    • NOTICE: (default) Normal but significant conditions.
    • INFO: Informational messages.
    • DEBUG: Debug-level messages.
  • $timestamp: Time when the event happened.
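
For reference, the PSR-3 LoggerInterface boils down to the following shape (abridged from the PHP-FIG specification):

// PSR-3 LoggerInterface (abridged): one method per RFC 5424 severity,
// plus a generic log() method.
interface LoggerInterface {
    public function emergency($message, array $context = array());
    public function alert($message, array $context = array());
    public function critical($message, array $context = array());
    public function error($message, array $context = array());
    public function warning($message, array $context = array());
    public function notice($message, array $context = array());
    public function info($message, array $context = array());
    public function debug($message, array $context = array());
    public function log($level, $message, array $context = array());
}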

--Rajesh Taneja 11:13, 20 May 2013 (WST)

skodak's mini spec

Logging and reporting overview

There are two independent parts: the first is the sending of triggered events into some storage; the second is working with the stored events in reports or statistics.

The general process of storing events and the related data formats are already defined by the event definition and event dispatching algorithm in Event 2.

Access to the logged data is a significantly more complex problem; this mini-spec tries to describe one potential solution. Other proposals should consider the use cases and performance issues described here.

Writing events to log storages

The algorithm is:

  1. Some code triggers an event.
  2. The event is received by a high priority log event observer.
  3. The log event observer instructs each active log storage plugin to store the event.
  4. The log observer may optionally use some filtering criteria and skip some log storage plugins for some events (usually for performance reasons).
  5. The log storage plugin is free to store the event in any way (as fast as possible).

Public API:

  • The log observer must be able to get a list of all active log storage plugins
  • Each log storage plugin must have a store() method that accepts an event instance parameter

Sample plugin, class and method names:

  • tool_logmanager::event_triggered(core_event_base $event) - event observer for "*" event class names; calls the store($event) method of all active storage plugins (sketched below)
  • tool_logmanager::get_log_storages($activeonly) - if activeonly is true it uses a setting to return the list of active log storage plugins; if not, it does class_exists() for each admin tool plugin and returns a list of all plugins with a class tool_xxxxx_log_storage, where xxxxx is the plugin name
  • tool_logdb_log_storage::store(core_event_base $event) - writes events to the tool_logdb_data table in the standard Moodle database; the table columns match standard event properties
  • tool_lognosql_log_storage::store($event) - sample plugin that stores event data in some NoSQL database
  • tool_logcsvfile_log_storage::store($event) - sample plugin that appends a line to a CSV log file
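
A minimal sketch of the dispatch step using the names proposed above:

// Sketch only: the catch-all observer hands every event to all active
// storage plugins; each plugin is free to store it any way it likes.
class tool_logmanager {
    public static function event_triggered(core_event_base $event) {
        foreach (self::get_log_storages(true) as $storage) {
            // Optional filtering criteria could skip some storages here.
            $storage->store($event);
        }
    }
    // get_log_storages() as described above.
}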

Administration UI:

  • Enabling/disabling of individual log storages - tool_logmanager
  • Optional filtering of events - may be both in tool_logmanager and individual log storage plugins
  • Each log storage plugin may have more settings - NoSQL server connection data in tool_lognosql, filepath in tool_logcsvfile, log rotation, etc.

Admin tools may add any number of new folders, settings or external pages in the admin settings tree; there would be a new section Plugins/Logging.

By default tool_logdb would be automatically enabled in new installs from tool/logmanager/db/install.php, if present.

Reading data stored in log storages

Multiple other plugins will be interested in reading or processing event data stored in log storages.

Public APIs

There may be different levels of APIs, each of them having strong and weak points:

  1. tool_xxxxx_log_storage::get_logged_events($select, $params, $order, $paging, &$totalcount) - loggers returning event instances as an array, with basic filtering parameters
  2. tool_xxxxx_log_storage::get_log_table() - loggers allowing direct read access to a database table (or view) containing event records with standard columns defined by core_event_base
  3. tool_logmanager::get_logged_events($select, $params, $order, $paging, &$totalcount) - returns an aggregated array of event instances fetched from one (or multiple) log storage plugins, using get_logged_events() or reading the plugin log table directly.

tool_xxxx_log_storage::get_logged_events($filter, $order, $paging)

This is the most basic API for log storage plugins that allows reading of individual logged events. All general report plugins that deal with individual events may use this API.

It is equivalent to the legacy get_logs() function:

get_logs($select, array $params=null, $order='l.time DESC', $limitfrom='', $limitnum='', &$totalcount)

Pros:

  • Very simple for SQL based storages
  • Data is returned as arrays of PHP event class instances
  • Easy to use in simple reports

Cons:

  • We must either define our own meta syntax for filters and ordering, or use SQL syntax and emulate it in non-SQL log backends that support reading.
  • All event data must go through PHP in-memory structures; it is therefore not possible to operate on large data sets such as a day or week of all logs.
  • Reports need to query tool_logmanager() for active storages and select one that implements this API.
  • Filtering/reading performance is limited by the physical storage of the data; fast log storage does not mean reports will be fast too - it is usually the opposite.

Necessary admin settings:

  • Reports may define a setting for selecting a log storage plugin from the list of available storage plugins with the get_logged_events() capability (such as tool_logmanager::get_log_storages(true) filtered to those with a get_logged_events() method), or use the first one available.

tool_xxxxx_log_storage::get_log_table()

Storage plugins that store data in Moodle database table may publish the table name (or view) for direct use in SQL queries (reading only).

Pros:

  • Extremely simple to implement.
  • get_logged_events() can always be implemented on top of this; tool_logmanager may contain a shared implementation for it.
  • Suitable for various statistics reports.
  • Current syntax for filtering and sorting - SQL.
  • Data can be easily replicated to external systems.

Cons:

  • Suitable for data stored in SQL tables only.
  • Writing needs to be optimised heavily.
  • Read/write concurrency needs some hacking on large installs.
  • External databases need to be somehow mapped to be accessible as normal DB tables/views.
  • External databases might need some new support in DML layer.

tool_logmanager::get_logged_events()

The event manager may select a suitable log storage for reading, or it can be controlled via settings. Both types of APIs above may be used for getting the data from log storages.

Existing reports

Live logs report

This report lists events that are being stored in log storages, in real time.

It is compatible with any log storage that implements get_logged_events() with "created within the last hour (now − 60*60)" filtering, ordering by timecreated, and paging.

Logs report

Simple listing of events with very basic filtering by course, user, date (in current timezone) and actions.

It is compatible with all storages that implement get_logged_events() with basic "equals" and date filtering, ordering by timecreated, and paging.

Activity report (outline)

This report shows the total number of educational actions for each activity in course.

The data cannot be obtained using get_logged_events()! The events need to be counted in the database, which is why only SQL based log storages may be used for this. It would be absurdly slow to iterate over get_logged_events() results one by one in PHP memory.

Alternative for external databases could be some count_logged_events() with appropriate parameters.

This report needs to select one active log storage with a get_log_table() method. It is very important to know the dictionary of interesting action words, or some educational level for each action. This might also be improved by new plugin callbacks.
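
As an illustration, the counting could then be a single SQL aggregation over the published table; the column names here are assumptions:

// Illustrative aggregation for the outline report; assumes the published
// table has courseid, contextinstanceid and crud columns.
global $DB;
$table = $storage->get_log_table();
$sql = "SELECT contextinstanceid, COUNT(*) AS actions
          FROM {" . $table . "}
         WHERE courseid = :courseid AND crud = 'r'
      GROUP BY contextinstanceid";
$counts = $DB->get_records_sql($sql, array('courseid' => $course->id));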

Nice to have features:

  • specify manual date range instead of all data in logs
  • views per each week/day
  • limit to enrolled users only
  • limit to guests only
  • more action types for each activity

Course participation report

This report shows counts of actions of selected users in selected course activities. It is similar to the Activity report, which shows totals for all users in the course. There is also an option to send bulk messages to selected users.

It requires the same get_log_table() method; it might theoretically use count_logged_events() for external database tables.

Nice to have features:

  • show actions for more activities on one page
  • manually specify date ranges
  • more action options (post/view)

Statistics report

This report calculates monthly, weekly and daily action counts for view and post actions. The data are aggregated per course and per user.

Again, it is not possible to loop through a whole day of logged events using PHP iteration over get_logged_events(). It is not even possible to use count_logged_events(), because it would have to be done for each user and course in the system (tens of thousands of complex queries on a large site, once per day).

Theoretically this could be implemented via a new observer that would be doing the bean counting each day, but most probably the performance would not be good enough on large sites.

The only sensible approach seems to be to use get_log_table(). It might be interesting to add some hooks that allow processing of stats in a replicated database.

New user timeline report

The current user log report lists actions carried out by a particular user. Sometimes we want a report that also includes other events related to that user, such as received grades, creation of the account by an administrator, the user being enrolled into a course, the user being kicked out of a course, course completion in cron, received messages, etc.

This could be implemented using the new ‘affecteduser’ field defined in new events.

It could be very useful for parent reports - all information related to the child displayed chronologically on one report with optional filtering.

This report would use the basic get_logged_events() method.
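
A sketch of that call, assuming an SQL-like filter syntax and an (offset, count) paging format, neither of which is decided:

// Illustrative: fetch the 50 most recent events affecting one user.
$events = $storage->get_logged_events(
    'affecteduser = :userid',
    array('userid' => $user->id),
    'timecreated DESC',
    array(0, 50),   // paging: offset and count (format not decided)
    $totalcount
);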

Performance tricks

In Moodle 2.5 each add_to_log() does one database insert. If we increase the number of logged actions, we might have to buffer the inserts and implement a new bulk insert method in the database drivers. The only technical problem is that we must flush the buffer before the end of PHP execution; a workaround might be to flush the buffers in the page footer and continue inserting each entry individually afterwards. Cron or CLI scripts would need other workarounds.
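
A sketch of such buffering; insert_records() stands in for the hypothetical bulk insert method mentioned above:

// Hypothetical write buffer flushed in the page footer (and before the
// end of execution in cron/CLI scripts).
class log_write_buffer {
    protected static $rows = array();

    public static function add(stdClass $row) {
        self::$rows[] = $row;
        if (count(self::$rows) >= 100) {
            self::flush(); // bound memory use on long-running scripts
        }
    }

    public static function flush() {
        global $DB;
        if (self::$rows) {
            $DB->insert_records('tool_logdb_data', self::$rows); // bulk insert
            self::$rows = array();
        }
    }
}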

Statistics processing performance was recently improved by copying examined data (one day) into separate table before starting the SQL aggregations. This could be used in other complex statistics too.

Upgrade strategies

  1. Create new separate report plugins that work with the new data and keep the old plugins unchanged for historical data.
  2. Somehow migrate the existing data from the log table into the new table and remove the old reports.
  3. Alter the current reports to work with the new and old db tables at the same time.

The current log table can be kept for a few more releases; we can keep the original loglifetime setting that controls add_to_log(). It would be strongly recommended to disable the old logging unless there are some custom legacy reports that expect the log table (it might be better to disable adding to the old log table by default during upgrade).

Advanced customisations

Some partners or institutions might want to replace all or some of the reports with custom plugins. The recommended strategy would be to write new log storage plugins with standard or custom APIs. External storages that implement get_logged_events() would be usable for basic reporting; the standard statistics reports would probably not work with external systems.

It would also be possible to replace (or supplement) all standard logging and reporting plugins and implement a custom logging subsystem.

Petr Škoda (škoďák) 04:58, 31 May 2013 (WST)