Note:

If you want to create a new page for developers, you should create it on the Moodle Developer Resource site.

Logging API proposal

From MoodleDocs
Revision as of 14:40, 11 February 2013 by sam marshall (talk | contribs) (New proposal, first version)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

I (sam marshall, Open University) would like to propose a change to the logging API for Moodle 2.5 (or 2.6 depending on timeframe), which we would code. This might be dependent on our developer availability.

Introduction

This proposal address the problem that Moodle currently stores all log information in a database table.

Problem

The current Moodle logging API stores all log information in a database table (mdl_log). As a result, there are a large number of database writes even when carrying out ‘read-only’ requests. This is seen by system administration staff as a key factor that puts heavier load on the database and prevents Moodle scaling to large instances (either due to cost limitations, or even to limitations on the database server capacity that is technically possible within certain platforms). Databases store information extremely robustly (in terms of protection against loss or corruption), which is not necessary for log information.

To give an example of the scale of this problem, I looked at the number of rows in the mdl_log table for one 24-hour period in the current live OU system: it is about 1,600,000. If spread evenly through the day, this would be about 20 rows per second; the actual peak that day was 69 rows in one second. By comparison other tables in the system that receive heavy write access look positively miniscule (for the same time period, our ForumNG posts table only had about 18,000 new entries). And some deployments of Moodle – or cases where people are considering Moodle for deployment – will expect significantly higher usage than the OU.

There are other problems with the logging API, such as:

  • Information has to be stored in a limited number of restricted-size fields with little structure. It isn’t possible to add more specific fields for certain purposes.
  • There’s no way to programmatically describe the contents of these fields other than db/log.php, which is very limited.
  • Some people would like to increase the quantity of logged data (to provide more detail about a student’s actions), and conversely to have the ability to filter logged data to remove unwanted data.

My proposal does not directly address any of these other problems, although it has been designed to consider these likely improvements in future. I suggest these should be addressed later in a separate development. Although important, these other problems are not critical; they do not place a serious barrier against scaling Moodle for large usage.

Solutions

Different solutions are appropriate to different cases:

  1. Logging to a database table (current situation): May be convenient for small sites because you already need to set up a database, so you don’t have to set up logging separately.
  2. Logging to a file: Will improve performance and reduce database load.
  3. Logging to a separate database: Useful if you need full database power on log entries, but want to separate it from your normal Moodle database for ease of backup/wiping/etc.
  4. Logging to a file (multi-server): When using multiple front-end servers it may be desirable to log information to files then combine them, or something complicated.
  5. Logging to a separate log server or system log facility: It might be desirable to send log information to some other system component. This could be a standard operating system component or something that is part of the infrastructure at a specific institution.

In the case of 3, 4, and 5 these are only relevant for large installations and may depend heavily on local infrastructure.

Proposal

My proposal is:

  • Implement a plugin system for logging.
  • As part of core Moodle, implement plugins for logging to a database table and to a file (1 and 2 above).
  • The default behaviour should remain as at present (logging to database table with same name); this will simplify upgrades and avoid any problem with existing integrations.
  • Third party Moodle users can implement more complex plugins, such as those in 3-5 above, if required.

Implementation

API: basic developer perspective

  • New function get_log_system() requires the necessary library files and returns the log system class.
  • The log system contains functions to log data and access logged data.
  • Example: get_log_system()->log($courseid, $component, $action, $url, $info, $cm);
  • add_to_log will be implemented using this function.

Reduced functionality

Logging systems are generally designed to write data efficiently and not to read it. But there are parts of Moodle code which rely on high-speed access to the log table, including SQL queries. (An example is the information about recent activities on a course.) These cannot be implemented when logging uses a different method.

There should be no reduction in functionality when database logging is used, but it might be necessary to reduce functionality in other cases.

  • Log system will include two methods to check whether this type of access is available.
  • Possible method names: ->can_read_data($date), ->can_query_data($date).
  • The intention is that can_read_data will be true for the standard database and file implementation, while can_query_data returns true only for the database implementation.
  • All core code which reads mdl_log will be modified to add calls to these functions and behave appropriately, reducing functionality if necessary.
  • Example: if can_query_data returns false then the recent activities block will automatically hide itself from student view, and in editing mode will display information such as ‘Recent activity information is not available unless database logging is enabled’.
  • Example: If can_query_data returns false, but can_read_data returns true, then the log viewing screens may contain reduced search options.
  • Example: If can_read_data returns false, the log viewing screens will give an error similar to the one above.
  • Date information is present so that plugins can control the data retention period and this can be reported suitably to the user.

Because there is no requirement for plugins to implement any of the ‘read’ features, it will be very easy to write custom plugins (for an institution that does not intend to use the standard Moodle log viewing features).

Backward compatibility (code)

The current logging API is documented at Logging_API.

The short version is that if database logging is used, these functions should all continue to work without change. It might be desirable to deprecate them and recommend alternative functions inside the log_system class, but this is entirely optional.

  • While it suffers some similar problems, the user_accesstime_log function is actually a different issue (it doesn’t use the mdl_log table) and is out of scope of this change.
  • The add_to_log function will be implemented using the new API and will continue to work.
  • The get functions will continue to work if database logging is used. If another form of logging is used, some of the functions might throw exceptions. Anyone using these functions will be expected to check the ‘can_...’ functions mentioned above first.

Backward compatibility (whole system)

The database logging plugin is essentially a special case and should be handled to ensure complete backward compatibility. This is beneficial because there are likely to be many cases where Moodle users have done custom integration on the mdl_log table (outside Moodle code), such as transferring data to remote systems, deleting old data at specific times, or applying custom database configurations.

  • There will be no change at all to the mdl_log table (even though logically it should really be named something like logger_db).
  • The db logger will be the default for all existing Moodle installs. It will probably also be the default for new Moodle installs (although this could be determined).
  • It is probably not important to maintain the exact location of settings related to the db logger; the values of these settings will be retained during upgrade, but the settings will be moved to within the db logger plugin.

API: log entry

There will be a new class called log_entry (or similar) which is used when adding something to the log. (There will be shortcut methods so you don’t have to manually construct one of these if you just want add_to_log.)

This will contain most of the existing data from log entries (component, cm, time, etc.), some of which can normally be set automatically by the constructor.

Full list of fields with example values:

  • time

1360119273

  • userid

151375

  • ip

255.255.99.99 (or IPv6?)

  • courseid (in mdl_log: course)

200516

  • component (in mdl_log: module)

forumng

  • instanceid (in mdl_log: cmid)

75508

  • action

view

  • data

array(‘url’ => ‘view.php?id=75508’, ‘info’ => ‘15065’)

In some cases I changed the field names from those in mdl_log, but the intention is that the actual content of data, if it is logged using the same add_to_log command, will remain identical to mdl_log to ensure backward compatibility. (To recap, there will be no changes to the field names within the mdl_log table itself.) The differences are:

  • courseid – field is actually called ‘course’ in mdl_log, renaming to current Moodle convention.
  • component – field is called ‘module’ in mdl_log; should contain the raw module name (e.g. forumng) for modules, and the full Frankenstyle name (e.g. block_whatever or core_whatever) for other parts of the sysetm.
  • instanceid – field is called ‘cmid’ in mdl_log; name is chosen to be consistent with context table. The intention is that this goes with the component; for any module component it will be the cmid, but e.g. blocks may use it for block instance id in future.
  • data – contains the ‘url’ and ‘info’ fields from table; see below.

The ‘data’ field is intended for arbitrary extra data that is not necessarily relevant for each log entry as key-value string pairs. In this proposal I intend to use this for the ‘url’ and ‘info’ fields only (and the database logger will only use those fields to create its output; other fields will be ignored) but you can see that this could be used to help implement future features if more detailed information is logged. It could also be used by third-party developers if they are creating both a custom plugin, and a custom logger, and want to communicate between them. (Note: We don’t plan to do this at the OU at present, but I can imagine it being a useful option.)

API: plugin perspective

The plugin structure is as follows:

  • There are a number of logger plugins (plugin type ‘logger’, such as logger_db and logger_file).
  • These are stored in moodle/logger or moodle/admin/logger.
  • Each plugin can be enabled or disabled and the sort order can be configured, similar to auth plugins. The result is a comma-separated list of enabled plugins in $CFG->logger.
  • Each plugin has its own standard Moodle settings which you can configure regardless of whether it’s enabled or not (so you can do the settings then enable it).
  • Plugins also can have standard lang files, db files, cron, etc. as necessary.

Processing of a log entry works like this:

  • The log system sends the entry to first plugin in the list.
  • If it returns true, entry is also sent to next plugin, and so on. (This technically allows log filters to be implemented but I’m not necessarily suggesting this is the best place for them.)
  • Throwing an exception counts as returning true, except that an additional log entry will be raised to report the exception. Exceptions that are thrown while reporting this log system exception will be ignored.

The reason for allowing multiple loggers is that you might want to do something like, enable database logging but have it delete all entries older than 7 days, and use file logging for longer-term logs. (If you were concerned about storage size of the db table, but not about write performance, and you wanted to enable the ‘db features’.)

The ‘get’ methods will be implemented like this:

  • Find the first plugin for which can_read or can_query (as appropriate) returns true, if any. (If there are none, throw exception.]
  • Call method on that plugin.

File logger options

I propose that a basic file logger should be implemented as part of Moodle core.

  • Log files will be stored in a customisable location on disk (not within Moodle filesystem). The default may be dataroot/logs.
  • Within this folder, files will be stored in a structure that supports reading and avoids having too many files in a folder; e.g. there may be a folder per course id and then one dated log file per day inside that folder.
  • The logger will support reading (so that Moodle logs can be displayed inside the system) but not querying.
  • An option will allow the reading support to be disabled in case of multi-server use where the logs on different servers would be different so reading them doesn’t make sense.
  • Log data will be written one line per entry as a UTF-8 format text file (with suitable escaping of data if required). The log text will be minimally readable, e.g. we will format time (using server time zone) instead of just writing the number.
  • An option will support automatic deleting of log files after a certain date.

This is a simple implementation intended for single servers, or for multiple servers if you’re prepared to deal with having separate log files for each one. Specifically, the logger will not guarantee correct operation on shared storage. This will be documented in the settings page.

Summary of proposed changes

  • New plugin type.
  • New log system API (probably lib/logsystemlib.php) and new function in moodlelib.
  • ‘logger_db’ plugin incorporates code from existing add_to_log, and some existing system settings and cron behaviour related to mdl_log.
  • ‘logger_file’ plugin (new code).
  • Implement existing log API functions in terms of new API. (Optionally, move to deprecated area if it is decided to deprecate them.)
  • Find all existing uses of existing log API functions and mdl_log table in core code. Add checks for the ‘can_...’ functions in log_system and handle suitably. (The largest part of this work will be the log viewing screens, but smaller areas throughout Moodle will also be affected.) Change core code to use new best practice when logging data.

Conclusion

  • By making these changes we can give sites the option to remove a critical performance bottleneck by dramatically reducing the number of database writes from a Moodle installation.
    • This may improve performance on smaller sites, and will make very large installations more feasible.
  • Because the default is to continue with database logging, and there is no change to the mdl_log table, it is likely that the change will cause few problems for existing Moodle sites even if they have special handling in this performance-sensitive area.
  • With default options there will be no reduction in existing Moodle functionality for staff.
  • The scope of this change is kept to a minimum in order to make implementation feasible, but it lays the groundwork for potential future enhancements such as detailed/specific log information and log filtering.