Note:

If you want to create a new page for developers, you should create it on the Moodle Developer Resource site.

Talk:Logging 2: Difference between revisions

From MoodleDocs
Line 206: Line 206:
* '''Actor ID/User ID*''' (ID assiacted to the actor, typically the user id)
* '''Actor ID/User ID*''' (ID assiacted to the actor, typically the user id)
* '''Real actor/user ID*''' (When logged in as, store the real user ID)
* '''Real actor/user ID*''' (When logged in as, store the real user ID)
* '''Actor IP address*'''
* Associated object (Object associated to the subject. Ie: category of origin when a course is moved. User to whom a message is sent.)
* Associated object (Object associated to the subject. Ie: category of origin when a course is moved. User to whom a message is sent.)
** section
** section
Line 386: Line 387:
</code>
</code>


[[User:Frédéric Massart|Frédéric Massart]] 10:57, 20 May 2013 (WST)
[[User:Frédéric Massart|Frédéric Massart]] 11:07, 20 May 2013 (WST)


== Logging bulk actions. ==
== Logging bulk actions. ==

Revision as of 03:07, 20 May 2013

Basically what's in here looks fine to me. I wrote a proposal Logging API proposal which includes some of these changes as a single development chunk, and should not cause problems for others implementing the rest of it later. The main focus of my proposal is providing the option to move logs out of the database (because the number of database writes is probably Moodle's single worst performance characteristic) but it should help enable the other enhancements from this proposal as well, in future.

Sam marshall 19:40, 12 February 2013 (WST)

Summary: Events/Logging API 2.6

Logging is implemented in form of event-listening by plugins. Plugins listening to the events may be:

  • generic logging plugin implementing interface allowing other plugins to query information
  • reporting plugin that aggregates only information that it needs and stores it optimised for its queries

Our tasks:

  1. Events API must ensure that everything that potentially may be interesting to the plugins is included in event data (but does not query extra data). There should be as fast as possible to get the list of listeners to notify: Moodle -> Logs
  2. Logging API is the form of plugins communication to the generic logging systems. It might not be extra effective but it stores and allows to retrieve everything that happens: Logs -> Moodle
  3. We make sure that everything that is logged/evented now continues to do so + log much more
  4. Create at least one plugin for generic logging

Marina Glancy 13:05, 15 May 2013 (WST)

New logging proposal

Current problems

  1. many actions are not logged
  2. some logged actions do not contain necessary information
  3. log storage does not scale
  4. hardcoded display actions, but only plugins know how to interpret data
  5. it is not possible to configure level of logging
  6. performance

Related problems

Events

Events are triggered in few places mostly for write actions only. The event data structure is inconsistent. Cron events are abused, stale and slow. The error states and transaction support is problematic.

Statisticts

Statistics processing is relatively slow. The results are not reliable becuase the algorithm does not fully understand the logged data.

Possible solution

We could either improve current system or implement a completely new logging framework. Improvements of the current log table and api cannot resolve all existing problems. Possible long term solution may be to split the logging into several self-contained parts:

  • triggering events - equivalent to current calls of add_to_log() function
  • storage of information - equivalent to current hardcoded storage in log database table
  • fetching of information - equivalent to reading from log table
  • processing of log information - such as processing of statistics
  • reporting - log and participation reports

We could solve the logging and event problems at the same time. Technically all log actions should be accompanied by event triggers. The logging and event data are similar too.

Log and event data structures

Necessary event information:

  1. component - the frankenstyle component that understands the action, what changed
  2. action - similar to areas elsewhere
  3. user - who is responsible, who did it, current user usually
  4. context - where the event happened
  5. data - simple stdClass object that describes the action completely
  6. type - create/read/update/delete/other (CRUD)
  7. level - logging level 1...
  8. optional event data - 2.5 style event data for backwards compatibility

Optional information:

  1. studentid - affected student for fast filtering
  2. courseid - affected course

Backwards compatibility

add_to_log() calls would be ignored completely with debuggin message in the future. Old log table would be kept unchnaged, but no new data would be added.

Old event triggers would generate legacy events.

Event triggers would be gradually updated to trigger new events with more meta data, event may optionally contain legacy event data and legacy even name. The legacy events would be triggered automatically after handling new event.

The event handle definition would be fully backwards compatible, new events would be named component/action. All new events would be instant.

Cron events

The current cron even design is overcomplicated and often results in bad coding style. Cron events should be deprecated and forbidden for new events. Cron events also create performance problems.

"Cron events are abused, stale and slow." - also the above. Please give specific examples of what you mean by this. Quiz uses cron events for a good reason (although you are welcome to suggest better ways to achieve the same results). This has never caused performance problems, and I don't recall any performance problem caused by cron events.--Tim Hunt 15:10, 17 May 2013 (WST)

Database transactions

At present there is a hack that delays events via cron if transaction in progress. Instead we could create buffer for events in memory and wait until we are notified from DML layer that transaction was committed or rolled back.

It is important here to minimise the event data size.

Auxiliary event data

At present we are sometimes sending a lot more even data because we may need things like full course records when even reference some course. This prevents repeated fetching of data dromdatabase.

This is bad for several reasons:

  • we can not store this onformation in logs - it is too big and not necessary
  • the data may get stale - especially in cron, but also other handlers may modify the data (event from event), the db transactions increase the chances for atle data too

The auxiliary data should be replaced by some new cache or extension of dml level - record cache. It could be prefilled automatically or manually whentriggering events.

Infinite recursion trouble

One event may modify data which results in new even which triggers the same event. We detect this somehow and stop event processing.

Catch all event handler

We need a '*' handler definition, this would allow log handlers to processes any subset of events without core modifications.

Handler definitions

At present handlers are stored in database table during installation and upgrade. That is not necessary any more and we could use MUC instead. The list of all handlers will be always relatively small and should fit into RAM.

Handler definitions should contain priority.

Implementation steps

  1. redesign events internals while keeping bc
  2. new logging API
  3. convert log actions and event triggers
  4. inplement log plugins

Petr Škoda (škoďák)

Event logger or (event and logging)

I agree with Petr's proposal to enhance event system, but I think event and logging should be kept separate. Mixing them will force developer to pollute event system which might lead to race-around problem.

With current proposal we are not considering logging event-less situations like sending email after forum is posted, or debug/exception or cron status. They all can be achieved by triggering an event but should be pollute event system for some use-cases like:

  1. Logging memory usage while executing an api (create_user, view create user page)
  2. Logging mail contents (for forum etc.)
  3. Logging debug information for replicating issue (Help admin and moodle developer to fix issues)

Also, we should consider how logforphp[1] and Monolog[2] are implemented. They use concepts of channelling log information, putting them in category and write to different logging stores. This will help us to create flexible system with can collect rich data for analysis, research and debugging. Also, we should consider logging general information in stream/file and important information in db to avoid performance issues.

We should also consider following standards like PSR-3 or RFC-5424

Rajesh Taneja (rajeshtaneja)

Logging plugins diagram

Logging plugins relation

Configuration example:

Imagine admin created 2 instances of each "DB log storage", "Filesystem log storage" and "Log storage with built-in driver" and called them dbstorage1, dbstorage2, filestorage1, filestorage2, universalstorage1 and universalstorage2

DB log storage::log_storage_instances() will return:

  • dbstorage1
  • dbstorage2

Simple DB log driver::log_storage_instances() will return:

  • dbstorage1 (Simple DB log driver)
  • dbstorage2 (Simple DB log driver)

When Report1 is installed it will query all plugins that implement both functions: log_storage_instances() and get_log_records(), and then invoke log_storage_isntances() from each of them and return the list for admin to choose the data source from:

  • filestorage1 (Records DB log driver)
  • filestorage2 (Records DB log driver)
  • dbstorage1 (Records DB log driver)
  • dbstorage2 (Records DB log driver)
  • filestorage1 (FS to DB log driver)
  • filestorage2 (FS to DB log driver)
  • universalstorage1
  • universalstorage2

Marina Glancy 11:01, 16 May 2013 (WST)

Minor comments

The log and event data structure that Petr proposed to me initially looked good (though it does seem to be missing date / time which I am guessing is just an oversight). But after talking to Rajesh about querying the data I think that I might be more in favour of Some sort of flat system that doesn't require serialising the data. Adrian Greeve 09:32, 17 May 2013 (WST)

Fred's part

Logging

Data the logger should receive

Those are the information that should be passed on to the logging system. For performances, it's probably better that this information arrives full and that no more queries should be performed to retrieve, say, the course name. The more information we're passing, the more we can store. Once this is defined, then we will know what an event has to send when it's triggered.

Mandatory fields *

  • Event specification version number*
  • Event name*
  • Type*
    • Error
    • User action
      • Procedural action
      • Manual action
    • System log
  • Datetime* (milliseconds?)
  • Context ID*
  • Category ID
  • Category name
  • Course ID*
  • Course name
  • Course module ID*
  • Course module name
  • Component*
    • core
    • course
    • mod_assign
    • mod_workshop
  • Subject* (Defines the object on which the action is performed)
    • user
    • section
    • assignment
    • submission_phase
  • Subject ID*
  • Subject name (Human readable identifier of the object: Mark Johnson, Course ABC, ...)
  • Subject URL (URL to notice the changes)
  • Action* (The exact action that is performed on the object)
    • created
    • moved (for a course, a section, a module, a user between groups)
    • submitted (for an assignment, or a message)
    • ended (for a submission phase for instance)
  • Actor* (user, cli, cron, ...)
  • Actor ID/User ID* (ID assiacted to the actor, typically the user id)
  • Real actor/user ID* (When logged in as, store the real user ID)
  • Actor IP address*
  • Associated object (Object associated to the subject. Ie: category of origin when a course is moved. User to whom a message is sent.)
    • section
    • category
    • user (to whom you sent a message)
  • Associated object ID
  • Transaction type* (CRUD)
    • create
    • read
    • update
    • delete
  • Level* (Not convinced that we should use that, because it's very subjective, except if we have guidelines and only a few levels (2, max 3))
    • major
    • normal
    • minor
  • Message (human readable log)
  • Data (serialized data for more information)
  • Current URL

We have to decide how extensively we want to provide information to external support. For example, when an assignment is submitted, we could provide the URL to the assignment. But that means that this event specifically has to receive the data, or the module has to provide a callback to generate the information based on the event data. Also, external systems cannot call the assignment callback.

In any case, not everyone is going to be happy. The more processing to get the data, the slower it gets. The less data we provide, the less information can be worked on...

Also, if we provide slots for information, but most of the events do not use them, the data becomes unusable as it is inconsistent except for the one event completing the fields.

Fields policy

We could define different levels of fields to be set depending on CRUD. For example, if an entry is deleted, we might want to log more than just the entry ID, but also its content, its name etc... surely the data field could contain most of it, but we still need to define policies.

Events

Important factors

  1. Processing has to be very quick
  2. Developers are lazy and so the function call should be as easy as possible
  3. Validating the data is very costly, we have to avoid that
  4. An event should not be defined, but triggered, for performance reason

An event is not defined, it is fired and that's the only moment the system knows about it. Two different components should not define the same action, for example: enrol/user:created and core/user:created. The user:created event should be triggered in the core method creating a user, once and only.

Triggering an event

Very basic and quick example of how we could now trigger the events, filling up the required data and parsing the rest to allow for the developers to quickly trigger one without having to provide extensive information.

event_trigger('mod_assign/assignment:submitted', $assignmentid, $extradata);

/**

* Trigger an event
*
* An event name should have the following form:
* $component/$subject:$action
*
* The component is the frankenstyle name of the plugin triggering the event.
* The subject is the object on which the action is performed. If the name could be
*     bind the the corresponding model/tablename that would be great.
* The action is what is happening, typically created/read/updated/deleted, but
*     it could also be 'moved', 'submitted', 'loggedin', ...
*
* The parameter $subjectid is ideally the ID of the subject in its own table.
*
* @param string $name of the event
* @param string $subjectid ID of the subject of the event
* @param array|stdClass of the data to pass on
* @param int $level of the event
* @return void
*/

function event_trigger($name, $subjectid, $data = null, $level = LEVEL_NORMAL) {

   global $USER;
   $data = (object) $data;
   // For the specification 2, here are some hardcoded values.
   $data->eventname = $name;
   $data->version = 2;
   $data->type = 'event';
   // Get the component, event, action and subject.
   // $name = $component/$action:$subject
   // $event = $action:$subject
   list($component, $event) = explode('/', $value, 2);
   list($subject, $action) = explode(':', $event, 2);
   $data->component = $component;
   $data->subject = $subject;
   $data->action = $action;
   $data->subjectid = $subjectid;
   $data->level = $level;
   // Defaults.
   if (!isset($data->time)) {
       $data->time = microtime(true);
   }
   if (!isset($data->contextid)) {
       $data->contextid = $PAGE->get_contextid();
   }
   if (!isset($data->courseid)) {
       $data->courseid = $PAGE->get_courseid();
   }
   if (!isset($data->moduleid)) {
       $data->courseid = $PAGE->get_coursemoduleid();
   }
   if (!isset($data->actor) && !isset($data->actorid)) {
       $data->actor = 'user';
       $data->actorid = $USER->id;
   } else {
       throw Exception('Actor and Actor ID must both be specified');
   }
   if (session_is_loggedinas()) {
       $data->realuserid = session_get_realuser()->id;
   }
   // It would be preferrable that this data is not computed on the fly,
   // but I have added that for now so that the dev doesn't have to set
   // one extra parameter.
   if (!isset($data->crud)) {
       if ($data->action == 'created' || $data->action == 'added') {
           $data->crud = 'c';
       } else if ($data->action == 'read' || $data->action == 'viewed') {
           $data->crud = 'r';
       } else if ($data->action == 'updated' || $data->action == 'edited') {
           $data->crud = 'u';
       } else if ($data->action == 'deleted' || $data->action == 'removed') {
           $data->crud = 'd';
       } else {
           throw Exception('Crud is needed!');
       }
   }
   // Not mandatory, but possible to be guessed.
   if (!isset($data->currenturl)) {
       $data->currenturl = $PAGE->get_url();
   }
   dispatch_event($name, $data);

}

Using such a function would:

  • Force the developer to send the minimal required information via parameters
  • Guess the fields such as component, action or subject from the event name (this processing is really small and easy versus the overhead of asking the dev to enter those)
  • Allow for any information that can be guessed by the system to be defined in the $data parameter
  • Allow any extra information to be defined in the $data parameter

An alternative way would be send an event object to this method. An event object would be a class event_create() for example, in which should be set all the required variables. I didn't get in that direction because I think it's taking up a bit more memory, but also this could be more work for developers just to trigger an event.

Both ways have definitely their pros and cons.

Validation of event data

I don't think we should validate the information passed as this processing is relatively expensive. But we should be strict on the fact that extra data should be set in $data->data. A solution to prevent developers from abusing the stdClass properties, is to recreate the object based on the allowed keys, that's probably cheap. dispatch_event() could be the right place to do that.

Deprecate Cron events

An event is supposed to happen when it has been triggered, not being delayed. I think we should leave the plugins handle the events the way they want it, and schedule a cronjob is they want to delay the processing. Of course, we should still support them for a while.

Backwards compatibility

If the event naming convention changes, this could be a bit trickier to achieve. Especially to ensure that 3rd party plugins trigger and catch events defined by other 3rd party plugins.

The old method events_trigger() should:

  • Be deprecated;
  • Trigger the old event;
  • Not trigger the new events as it's tricky to remap the data to form the new $data parameter.

The new method to trigger events should

  • Trigger the new event;
  • Trigger the old-style event for non-updated observers;
  • Keep the old $eventdata for old-style events.

Core events triggered

git gr events_trigger\( | sed -E 's/.*events_trigger\(([^,]+),.*/\1/' | sort | uniq

Frédéric Massart 11:07, 20 May 2013 (WST)

Logging bulk actions.

Just an off thought that I had. Some (if not all) of the functions that are called from the events_trigger only handle single actions (e.g. user_updated). To save a lot of processing, can we make sure that multiple updates / deletions / creations are done as one? Adrian Greeve 09:32, 17 May 2013 (WST)

Why observer priorites?

As discussed yesterday, I propose we must have priorities associated with event observers/handlers. This leads to possiblity of events being used by plugins and core extensively in future. A simple idea could be to re-write the completion system to use the new event infrastucture instead of making its on calls to view pages etc. Here is a simple use case that shows why priorities are needed:-

Consider a student called "Derp" enrolled in a course called "Fruits" which has two activites "Bananas" and "kiwi fruit"
Derp completed the activity “Bananas” which triggers a event (say 'activity_complete') with two observers (say A and B). 
Observer A unlocks the other activity “Kiwi fruit” if activity “Bananas” is completed and if user is enrolled in course “Veggies”
Observer B enrols user in course “veggies” if activity “Bananas” is completed.
Without observer priorities observer B can be notified of the change before observer A or the other way around leading to conflicting and undesirable results.

Ankit Agarwal 10:48, 17 May 2013 (WST)

Can we support Tincan Experience Api via a retrival plugin/reporting tool

This section is a work in progress. Here is an example of tincan API call:-

Derp attempted 'Example Activity'
{
    "id": "26e45efa-f243-419e-a603-0c69783df121",
    "actor": {
        "name": "Derp",
        "mbox": "mailto:depr@derp.com",
        "objectType": "Agent"
    },
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/attempted",
        "display": {
            "en-US": "attempted"
        }
    },
    "context": {
        "contextActivities": {
            "category": [
                {
                    "id": "http://corestandards.org/ELA-Literacy/CCRA/R/2",
                    "objectType": "Activity"
                }
            ]
        }
    },
    "timestamp": "2013-05-17T04:02:18.298Z",
    "stored": "2013-05-17T04:02:18.298Z",
    "authority": {
        "account": {
            "homePage": "http://cloud.scorm.com/",
            "name": "anonymous"
        },
        "objectType": "Agent"
    },
    "version": "1.0.0",
    "object": {
        "id": "http://www.example.com/tincan/activities/aoivGYMz",
        "definition": {
            "name": {
                "en-US": "Example Activity"
            },
            "description": {
                "en-US": "Example activity definition"
            }
        },
        "objectType": "Activity"
    }
}

Ankit Agarwal 12:20, 17 May 2013 (WST)

Performance

I would like to see this spec propose some performance criteria at that start, that can be testing during development. For example:

  • How many writes / events per second should a log back-end be able to handle. (I know, it depends on the hardware.)
  • On a typical Moodle page load, how much of the load time should be taken up by the add_to_log call? Can we promise that it will be less than in Moodle 2.5?

I guess that those are the two main ones, if we can find a way to quantify them. Note that performance of add_to_log on a developer machine is not an issue. The issue really comes with high load, where there might be lock contention that slows down log writes.--Tim Hunt 15:16, 17 May 2013 (WST)

Existing log writing analysis

add_to_log

  • course id
  • module name (forum, journal, resource, course, user etc)
  • action (view, update, add, delete, possibly followed by another word to clarify)
  • url (file and parameters used to see the results of the action)
  • info (additional description information)
  • cm (course module ID)
  • user (if the user is different to $USER)


If the user is logged in as someone else than no logs are created.

If $CFG->logguests is set and false then no logs are created.

Multiple checks are done on the different fields to ensure that the data is ok for inserting.

try catch for inserting the record (insert_record_raw - for no checking, insert into the database as quickly as possible)

if there is an error a debugging message is sent. if $CFG->supportemail is true and $CFG->noemailever is not set then an email is sent with a limit of one email a day.

See the function add_to_log for further details. lib/datalib.php

user_accesstime_log

  • course id

no log created if the user isn't logged in, is logged in as someone else, or is a guest user.

updates the user table with (update_record_raw - see above) and changes $USER->lastaccess at the same time.

user ID Last ip address last access time

Checks made on user_lastaccess table and inserts / updates are made as appropriate.

See the function user_accesstime_log for further details. lib/datalib.php

Adrian Greeve 09:47, 20 May 2013 (WST)

Object-oriented model of events

This is the proposal for object-oriented events:

  • All events descriptions are objects extending event_base (which is defined in core)
  • Object class name is unique identifier of the event
  • Class name is pluginfullname_event_xxx. Core events will have prefix 'moodle'
  • Plugins store each event object in plugindir/classes/event/xxx.php (Petr is considering autoloading and/or namespaces)
  • event_base also implements cacheable_object

Event object has:

  • the protected properties that Fred is working on, also public getter methods for each of them
  • properties 'legacyname' and 'legacydata'
  • Two constructors (or static methods creating an instance):
    • to create event that is about to be triggered with autofilling constant and/or default data for this particular event
    • to restore event object after it has been stored in log. No default attributes here, it may even be declared final
  • Function to trigger event. Also static method that creates an instance and triggers event - for lazy developers.
  • Static function returning human readable event name, preferably using get_sting (lang_string)
  • Static function returning human readable event description
  • Function returning human readable event contents (to be used on events restored from log)
  • Function checking if current user is able to see this event (to be used on events restored from log)

Yes, this is slightly slower from performance point of view than passing an array/stdClass with data. But the difference is not so big and such structure is easier to understand and implement for developers and at the same time solves a lot of problems when displaying logs. Only constructor and trigger functions are used when event is actually triggered. Everything else might or might not be used when displaying the logs.

Backward compartibility:

  • Function events_trigger() will create and trigger an instance of event_legacy class
  • For events that already exist in Moodle 2.5 the additional legacy information should be added to the event data (in properties 'legacyname' and 'legacydata')

Event handlers:

  • Event handlers can be described as it is done now in plugindir/db/events.php, this file is parsed during install/upgrade of plugin and all handlers are removed on uninstall
  • It is possible to subscribe to all events (*)
  • Event handlers can also have an attribute 'sortorder' (positive or negative, default 0). At the same time admin can overwrite the handlers sequence
  • If event handler refers to the old (2.5) name of event, it will be used only for events that contain legacy name and data. If it refers to 2.6 event class name it will be used with full data
  • Core contains class events_manager and function get_events_manager() that returns a single instance of it. This class is responsible for retrieving/updating information about event handlers, processing event triggers, managing the events queue, processing events in cron.
  • events_manager has methods to dynamically register/unregister handlers but they are not recommended (same as it is with dynamic caches definitions). Ideally they should only be used in unit and/or behat tests

How to store and retrieve the list of handlers is the most vulnerable part for performance. AFAIK Petr already has ideas about it.

And I want to emphasize again (also Tim raised this point again), that performance of storing and retrieving of the events data in log tables is the responsibility of plugins and not core. We will include in core a plugin that repeats the 2.5 functionality and stores events with legacy data in table {log}. Also we will develop three plugins:

  • events dispatcher that allows to configure the filter which events to store in log
  • DB log storage with auto-archiving of the old data
  • driver for DB log storage that given the time range and other filters returns the list of events from DB log storage.

At the same time we will rewrite all reports that use table {log} to the new engine and move the functions that use {log} table to the legacy log plugin.