Difference between revisions of "Talk:Logging 2"

Jump to: navigation, search
(Why observer priorites?)
Line 212: Line 212:
 
== Why observer priorites? ==
 
== Why observer priorites? ==
 
As discussed yesterday, I propose we must have priorities associated with event observers/handlers. This leads to possiblity of events being used by plugins and core extensively in future. A simple idea could be to re-write the completion system to use the new event infrastucture instead of making its on calls to view pages etc. Here is a simple use case that shows why priorities are needed:-
 
As discussed yesterday, I propose we must have priorities associated with event observers/handlers. This leads to possiblity of events being used by plugins and core extensively in future. A simple idea could be to re-write the completion system to use the new event infrastucture instead of making its on calls to view pages etc. Here is a simple use case that shows why priorities are needed:-
 
+
<pre>
 
Consider a student called "Derp" enrolled in a course called "Fruits" which has two activites "Bananas" and "kiwi fruit"
 
Consider a student called "Derp" enrolled in a course called "Fruits" which has two activites "Bananas" and "kiwi fruit"
 
Derp completed the activity “Bananas” which triggers a event (say 'activity_complete') with two observers (say A and B).  
 
Derp completed the activity “Bananas” which triggers a event (say 'activity_complete') with two observers (say A and B).  
Line 218: Line 218:
 
Observer B enrols user in course “veggies” if activity “Bananas” is completed.
 
Observer B enrols user in course “veggies” if activity “Bananas” is completed.
 
Without observer priorities observer B can be notified of the change before observer A or the other way around leading to conflicting and undesirable results.
 
Without observer priorities observer B can be notified of the change before observer A or the other way around leading to conflicting and undesirable results.
 +
</pre>
 +
[[User:Ankit Agarwal|Ankit Agarwal]] 10:48, 17 May 2013 (WST)

Revision as of 02:48, 17 May 2013

Basically what's in here looks fine to me. I wrote a proposal Logging API proposal which includes some of these changes as a single development chunk, and should not cause problems for others implementing the rest of it later. The main focus of my proposal is providing the option to move logs out of the database (because the number of database writes is probably Moodle's single worst performance characteristic) but it should help enable the other enhancements from this proposal as well, in future.

sam marshall 19:40, 12 February 2013 (WST)

Summary: Events/Logging API 2.6

Logging is implemented in form of event-listening by plugins. Plugins listening to the events may be:

  • generic logging plugin implementing interface allowing other plugins to query information
  • reporting plugin that aggregates only information that it needs and stores it optimised for its queries

Our tasks:

  1. Events API must ensure that everything that potentially may be interesting to the plugins is included in event data (but does not query extra data). There should be as fast as possible to get the list of listeners to notify: Moodle -> Logs
  2. Logging API is the form of plugins communication to the generic logging systems. It might not be extra effective but it stores and allows to retrieve everything that happens: Logs -> Moodle
  3. We make sure that everything that is logged/evented now continues to do so + log much more
  4. Create at least one plugin for generic logging

Marina Glancy 13:05, 15 May 2013 (WST)

New logging proposal

Current problems

  1. many actions are not logged
  2. some logged actions do not contain necessary information
  3. log storage does not scale
  4. hardcoded display actions, but only plugins know how to interpret data
  5. it is not possible to configure level of logging
  6. performance

Related problems

Events

Events are triggered in few places mostly for write actions only. The event data structure is inconsistent. Cron events are abused, stale and slow. The error states and transaction support is problematic.

Statisticts

Statistics processing is relatively slow. The results are not reliable becuase the algorithm does not fully understand the logged data.

Possible solution

We could either improve current system or implement a completely new logging framework. Improvements of the current log table and api cannot resolve all existing problems. Possible long term solution may be to split the logging into several self-contained parts:

  • triggering events - equivalent to current calls of add_to_log() function
  • storage of information - equivalent to current hardcoded storage in log database table
  • fetching of information - equivalent to reading from log table
  • processing of log information - such as processing of statistics
  • reporting - log and participation reports

We could solve the logging and event problems at the same time. Technically all log actions should be accompanied by event triggers. The logging and event data are similar too.

Log and event data structures

Necessary event information:

  1. component - the frankenstyle component that understands the action, what changed
  2. action - similar to areas elsewhere
  3. user - who is responsible, who did it, current user usually
  4. context - where the event happened
  5. data - simple stdClass object that describes the action completely
  6. type - create/read/update/delete/other (CRUD)
  7. level - logging level 1...
  8. optional event data - 2.5 style event data for backwards compatibility

Optional information:

  1. studentid - affected student for fast filtering
  2. courseid - affected course

Backwards compatibility

add_to_log() calls would be ignored completely with debuggin message in the future. Old log table would be kept unchnaged, but no new data would be added.

Old event triggers would generate legacy events.

Event triggers would be gradually updated to trigger new events with more meta data, event may optionally contain legacy event data and legacy even name. The legacy events would be triggered automatically after handling new event.

The event handle definition would be fully backwards compatible, new events would be named component/action. All new events would be instant.

Cron events

The current cron even design is overcomplicated and often results in bad coding style. Cron events should be deprecated and forbidden for new events. Cron events also create performance problems.

Database transactions

At present there is a hack that delays events via cron if transaction in progress. Instead we could create buffer for events in memory and wait until we are notified from DML layer that transaction was committed or rolled back.

It is important here to minimise the event data size.

Auxiliary event data

At present we are sometimes sending a lot more even data because we may need things like full course records when even reference some course. This prevents repeated fetching of data dromdatabase.

This is bad for several reasons:

  • we can not store this onformation in logs - it is too big and not necessary
  • the data may get stale - especially in cron, but also other handlers may modify the data (event from event), the db transactions increase the chances for atle data too

The auxiliary data should be replaced by some new cache or extension of dml level - record cache. It could be prefilled automatically or manually whentriggering events.

Infinite recursion trouble

One event may modify data which results in new even which triggers the same event. We detect this somehow and stop event processing.

Catch all event handler

We need a '*' handler definition, this would allow log handlers to processes any subset of events without core modifications.

Handler definitions

At present handlers are stored in database table during installation and upgrade. That is not necessary any more and we could use MUC instead. The list of all handlers will be always relatively small and should fit into RAM.

Handler definitions should contain priority.

Implementation steps

  1. redesign events internals while keeping bc
  2. new logging API
  3. convert log actions and event triggers
  4. inplement log plugins

Petr Škoda (škoďák)

Event logger or (event and logging)

I agree with Petr's proposal to enhance event system, but I think event and logging should be kept separate. Mixing them will force developer to pollute event system which might lead to race-around problem.

With current proposal we are not considering logging event-less situations like sending email after forum is posted, or debug/exception or cron status. They all can be achieved by triggering an event but should be pollute event system for some use-cases like:

  1. Logging memory usage while executing an api (create_user, view create user page)
  2. Logging mail contents (for forum etc.)
  3. Logging debug information for replicating issue (Help admin and moodle developer to fix issues)

Also, we should consider how logforphp[1] and Monolog[2] are implemented. They use concepts of channelling log information, putting them in category and write to different logging stores. This will help us to create flexible system with can collect rich data for analysis, research and debugging. Also, we should consider logging general information in stream/file and important information in db to avoid performance issues.

We should also consider following standards like PSR-3 or RFC-5424

Rajesh Taneja (rajeshtaneja)

Logging plugins diagram

Logging plugins relation

Configuration example:

Imagine admin created 2 instances of each "DB log storage", "Filesystem log storage" and "Log storage with built-in driver" and called them dbstorage1, dbstorage2, filestorage1, filestorage2, universalstorage1 and universalstorage2

DB log storage::log_storage_instances() will return:

  • dbstorage1
  • dbstorage2

Simple DB log driver::log_storage_instances() will return:

  • dbstorage1 (Simple DB log driver)
  • dbstorage2 (Simple DB log driver)

When Report1 is installed it will query all plugins that implement both functions: log_storage_instances() and get_log_records(), and then invoke log_storage_isntances() from each of them and return the list for admin to choose the data source from:

  • filestorage1 (Records DB log driver)
  • filestorage2 (Records DB log driver)
  • dbstorage1 (Records DB log driver)
  • dbstorage2 (Records DB log driver)
  • filestorage1 (FS to DB log driver)
  • filestorage2 (FS to DB log driver)
  • universalstorage1
  • universalstorage2

Marina Glancy 11:01, 16 May 2013 (WST)

Minor comments

The log and event data structure that Petr proposed to me initially looked good (though it does seem to be missing date / time which I am guessing is just an oversight). But after talking to Rajesh about querying the data I think that I might be more in favour of Some sort of flat system that doesn't require serialising the data. Adrian Greeve 09:32, 17 May 2013 (WST)

What should a log be able to store?

To me (Fred), those are the information that should be passed on to the logging system. For performances, it's probably better that this information arrives full and that no more queries should be performed to retrieve, say, the context ID. The more information we're passing, the more we can store. Once this is defined, then we will know what an event has to send when it's triggered.

  • Type
    • Error
    • User action
    • System log
  • Datetime (milliseconds?)
  • Context ID
  • Course ID
  • Course module ID
  • Component
    • core
    • course
    • mod_assign
    • mod_workshop
  • Subject (Defines the object on which the action is performed)
    • User
    • Section
    • Assignment
    • Submission phase
  • Subject ID
  • Action (The exact action that is performed on the object)
    • Created
    • Moved (for a course, a section, a module, a user between groups)
    • Submitted (for an assignment, or a message)
    • Ended (for a submission phase for instance)
  • Actor (User performing the operation, or CLI, or Cron, or whatever)
  • Associated object (Object associated to the subject. Ie: category of origin when a course is moved. User to whom a message is sent.)
    • Section
    • Category
    • Other user
  • Associated object ID
  • Type (CRUD)
    • Create
    • Update
    • Read
    • Delete
  • Level (Not convinced that we should use that, because it's very subjective, except if we have guidelines and only a few levels (2, max 3))
    • Major
    • Normal
    • Minor
  • Message (human readable log)
  • Data (serialized data for more information)

Frédéric Massart 12:10, 16 May 2013 (WST)

Logging bulk actions.

Just an off thought that I had. Some (if not all) of the functions that are called from the events_trigger only handle single actions (e.g. user_updated). To save a lot of processing, can we make sure that multiple updates / deletions / creations are done as one? Adrian Greeve 09:32, 17 May 2013 (WST)

Why observer priorites?

As discussed yesterday, I propose we must have priorities associated with event observers/handlers. This leads to possiblity of events being used by plugins and core extensively in future. A simple idea could be to re-write the completion system to use the new event infrastucture instead of making its on calls to view pages etc. Here is a simple use case that shows why priorities are needed:-

Consider a student called "Derp" enrolled in a course called "Fruits" which has two activites "Bananas" and "kiwi fruit"
Derp completed the activity “Bananas” which triggers a event (say 'activity_complete') with two observers (say A and B). 
Observer A unlocks the other activity “Kiwi fruit” if activity “Bananas” is completed and if user is enrolled in course “Veggies”
Observer B enrols user in course “veggies” if activity “Bananas” is completed.
Without observer priorities observer B can be notified of the change before observer A or the other way around leading to conflicting and undesirable results.

Ankit Agarwal 10:48, 17 May 2013 (WST)