Logging API proposal

Jump to: navigation, search

Note: This page is a work-in-progress. Feedback and suggested improvements are welcome. Please join the discussion on moodle.org or use the page comments.


I (sam marshall, Open University) would like to propose a change to the logging API for Moodle 2.5 (or 2.6 depending on timeframe), which we would code. This might be dependent on our developer availability.

Tracker issue: MDL-38135

Forum discussion

This change relates to the Logging 2 proposals.

Introduction

This proposal address the problem that Moodle currently stores all log information in a database table.

Problem

The current Moodle logging API stores all log information in a database table (mdl_log). As a result, there are a large number of database writes even when carrying out ‘read-only’ requests. This is seen by system administration staff as a key factor that puts heavier load on the database and prevents Moodle scaling to large instances (either due to cost limitations, or even to limitations on the database server capacity that is technically possible within certain platforms).

Databases store information extremely robustly (in terms of protection against loss or corruption), which is not necessary for log information. It would be more efficient to store this information somewhere else.

To give an example of the scale of this problem, I looked at the number of rows in the mdl_log table for one 24-hour period in the current live OU system: it is about 1,600,000. If spread evenly through the day, this would be about 20 rows per second; the actual peak that day was 69 rows in one second. By comparison other tables in the system that receive heavy write access look positively miniscule (for the same time period, our ForumNG posts table only had about 18,000 new entries). And some deployments of Moodle – or cases where people are considering Moodle for deployment – will expect significantly higher usage than the OU.

There are other problems with the logging API, such as:

  • Information has to be stored in a limited number of restricted-size fields with little structure. It isn’t possible to add more specific fields for certain purposes.
  • There’s no way to programmatically describe the contents of these fields other than db/log.php, which is very limited.
  • Some people would like to increase the quantity of logged data (to provide more detail about a student’s actions), and conversely to have the ability to filter logged data to remove unwanted data.

My proposal does not directly address any of these other problems, although it has been designed to consider these likely improvements in future. I suggest these should be addressed later in a separate development. Although important, these other problems are not critical; they do not place a serious barrier against scaling Moodle for large usage.

Solutions

Different solutions are appropriate to different cases:

  1. Logging to a database table (current situation): May be convenient for small sites because you already need to set up a database, so you don’t have to set up logging separately.
  2. Logging to a file (local): May improve performance (especially where the database is on a separate machine), and will reduce database load and size.
  3. Logging to a separate database: Useful if you need full database power on log entries, but want to separate it from your normal Moodle database for ease of backup/wiping/etc.
  4. Logging to a file (multi-server): When using multiple front-end servers it may be desirable to log information to files then combine them, or something complicated.
  5. Logging to a separate log server or system log facility: It might be desirable to send log information to some other system component. This could be a standard operating system component or something that is part of the infrastructure at a specific institution.

In the case of 3, 4, and 5 these are only relevant for large installations and may depend heavily on local infrastructure.

Proposal

My proposal is:

  • Implement a plugin system for logging.
  • As part of core Moodle, implement plugins for logging to a database table and to a file (1 and 2 above).
  • The default behaviour should remain as at present (logging to database table with same name); this will simplify upgrades and avoid any problem with existing integrations.
  • Third party Moodle users can implement more complex plugins, such as those in 3-5 above, if required.

Implementation

API: basic developer perspective

  • New function get_log_system() requires the necessary library files and returns the log system class.
  • The log system contains functions to log data and access logged data.
  • Example: get_log_system()->log($courseid, $component, $action, $url, $info, $cm);
  • add_to_log will be implemented using this function.

Reduced functionality

Logging systems are generally designed to write data efficiently and not to read it. But there are parts of Moodle code which rely on high-speed access to the log table, including SQL queries. (An example is the information about recent activities on a course.) These cannot be implemented when logging uses a different method.

There should be no reduction in functionality when database logging is used, but it might be necessary to reduce functionality in other cases.

  • Log system will include two methods to check whether this type of access is available.
  • Possible method names: ->can_read_data($date), ->can_query_data($date).
  • The intention is that can_read_data will be true for the standard database and file implementation, while can_query_data returns true only for the database implementation.
  • All core code which reads mdl_log will be modified to add calls to these functions and behave appropriately, reducing functionality if necessary.
  • Example: if can_query_data returns false then the recent activities block will automatically hide itself from student view, and in editing mode will display information such as ‘Recent activity information is not available unless database logging is enabled’.
  • Example: If can_query_data returns false, but can_read_data returns true, then the log viewing screens may contain reduced search options.
  • Example: If can_read_data returns false, the log viewing screens will give an error similar to the one above.
  • Date information is present so that plugins can control the data retention period and this can be reported suitably to the user.

Because there is no requirement for plugins to implement any of the ‘read’ features, it will be very easy to write custom plugins (for an institution that does not intend to use the standard Moodle log viewing features).

Backward compatibility (code)

The current logging API is documented at Logging_API.

The short version is that if database logging is used, these functions should all continue to work without change. It might be desirable to deprecate them and recommend alternative functions inside the log_system class, but this is entirely optional.

  • While it suffers some similar problems, the user_accesstime_log function is actually a different issue (it doesn’t use the mdl_log table) and is out of scope of this change.
  • The add_to_log function will be implemented using the new API and will continue to work.
  • The get functions will continue to work if database logging is used. If another form of logging is used, some of the functions might throw exceptions. Anyone using these functions will be expected to check the ‘can_...’ functions mentioned above first.

Backward compatibility (whole system)

The database logging plugin is essentially a special case and should be handled to ensure complete backward compatibility. This is beneficial because there are likely to be many cases where Moodle users have done custom integration on the mdl_log table (outside Moodle code), such as transferring data to remote systems, deleting old data at specific times, or applying custom database configurations.

  • There will be no change at all to the mdl_log table (even though logically it should really be taken out of the main install.xml, moved within the plugin, and renamed something like logger_db).
    • In addition to retaining compatibility, this means that no upgrade step will be necessary - an important concern when dealing with potentially large tables.
  • The db logger will be the default for all existing Moodle installs. It will probably also be the default for new Moodle installs (although this could be determined).
  • It is probably not important to maintain the exact location of settings related to the db logger; the values of these settings will be retained during upgrade, but the settings will be moved to within the db logger plugin.

Joins

There may be cases where Moodle code needs to join the log table and other tables, and in some cases it might be desirable to have these continue to work even when using the file-based logger.

  • Specific cases that are needed by Moodle core should be implemented as 'get' methods in the log system, with suitable parameters (more restricted than just an SQL query).
  • When using the database logger, this should result in the same joins as before.
  • When using the file logger this should be handled in a different way (or reported as not supported).

Difficult areas

Some areas of existing Moodle code present a challenge for this system because they require logs to be read in a relatively complex way, but are relatively important so that it might not be appropriate to say they aren't supported when using file logging.

  • Statistics: The log system API should include a suitable API function that allows the statistics system to access data. This should be implemented so that it can still work in the file-based logger.
    • For large systems, the Moodle statistics system is not appropriate (nor are other analysis systems that work on the Moodle servers and database); calculating statistics on the live system can take a significant proportion of capacity and cause problems for people using the system. People running large systems would probably choose to produce log files of some sort and then process them on an external system. However, the statistics system ought to continue to work for people using smaller systems.
  • View user logs: It's possible to view logs by user, which may be an important feature. This will not work efficiently with the file-based logger because it is effectively indexed only by course. However it should still be possible to implement it by scanning the logs for all courses (provided the time period is reasonable and suitable paging is used); it will just be significantly slower.
  • Backup/restore: There is currently an option to include logs in backup. One possibility is to say this is only available when using database logging. Another possibility is to have some kind of callback in the logger so that the backup system can include logs from the database table, or else include files, or whatever else depending on the plugin.

Log view itself will probably be fairly difficult.

API: log entry

There will be a new class called log_entry (or similar) which is used when adding something to the log. (There will be shortcut methods so you don’t have to manually construct one of these if you just want add_to_log.)

This will contain most of the existing data from log entries (component, cm, time, etc.), some of which can normally be set automatically by the constructor.

Full list of fields with example values:

  • time

1360119273

  • userid

151375

  • ip

255.255.99.99 (or IPv6?)

  • courseid

200516

  • component

forumng

  • instanceid

75508

  • action

view

  • data

array(‘url’ => ‘view.php?id=75508’, ‘info’ => ‘15065’)

In some cases I changed the field names from those in mdl_log, but the intention is that the actual content of data, if it is logged using the same add_to_log command, will remain identical to mdl_log to ensure backward compatibility. (To recap, there will be no changes to the field names within the mdl_log table itself.) The differences are:

  • courseid – field is actually called ‘course’ in mdl_log, renaming to current Moodle convention.
  • component – field is called ‘module’ in mdl_log; should contain the raw module name (e.g. forumng) for modules, and the full Frankenstyle name (e.g. block_whatever or core_whatever) for other parts of the sysetm.
  • instanceid – field is called ‘cmid’ in mdl_log; name is chosen to be consistent with context table. The intention is that this goes with the component; for any module component it will be the cmid, but e.g. blocks may use it for block instance id in future.
  • data – contains the ‘url’ and ‘info’ fields from table; see below.

The ‘data’ field is intended for arbitrary extra data (that is not necessarily relevant for every log entry) as key-value string pairs. In this proposal I intend to use this for the ‘url’ and ‘info’ fields only (and the database logger will only use those fields to create its output; other fields will be ignored) but you can see that this could be used to help implement future features if more detailed information is logged. It could also be used by third-party developers if they are creating both a custom plugin and a custom logger, and want to communicate between them. (Note: We don’t plan to do this at the OU at present, but I can imagine it being a useful option.)

API: plugin perspective

The plugin structure is as follows:

  • There are a number of logger plugins (plugin type ‘logger’, such as logger_db and logger_file).
  • These are stored in moodle/logger or moodle/admin/logger.
  • Each plugin can be enabled or disabled and the sort order can be configured, similar to auth plugins. The result is a comma-separated list of enabled plugins in $CFG->logger.
  • Each plugin has its own standard Moodle settings which you can configure regardless of whether it’s enabled or not (so you can do the settings then enable it).
  • Plugins also can have standard lang files, db files, cron, etc. as necessary.

Processing of a log entry works like this:

  • The log system sends the entry to each enabled plugin in order.
  • If any plugin throws an exception, processing continues and an additional log entry will be raised to report the exception. Exceptions that are thrown while reporting a log system exception will be ignored.

The reason for allowing multiple loggers is that you might want to do something like, enable database logging but have it delete all entries older than 7 days, and use file logging for longer-term logs. (If you were concerned about storage size of the db table, but not about write performance, and you wanted to enable the ‘db features’.)

Ordering of the plugins is currently not significant in this area of code (it's significant for the get methods below), but a future enhancement might be to add a 'filter' method to the logger plugin so that you could make a plugin which chooses to prevent further processing.

The ‘get’ methods will be implemented like this:

  • From among the enabled plugins, in sort order, find the first plugin for which can_read or can_query (as appropriate) returns true, if any. (If there are none, throw exception.]
  • Call method on that plugin.

File logger options

I propose that a basic file logger should be implemented as part of Moodle core.

  • Log files will be stored in a customisable location on disk (not within Moodle filesystem). The default may be dataroot/logs.
  • Within this folder, files will be stored in a structure that supports reading and avoids having too many files in a folder; e.g. there may be a folder per course id and then one dated log file per day inside that folder.
  • The logger will support reading (so that Moodle logs can be displayed inside the system) but not querying.
  • An option will allow the reading support to be disabled in case of multi-server use where the logs on different servers would be different so reading them doesn’t make sense.
  • An option will support automatic deleting of log files after a certain date.
  • Log data will be written one line per entry.
    • This will be a UTF-8 format text file.
    • Data will be escaped where required (e.g. to remove line feeds).
    • We will format time (using server time zone) instead of just writing the number, at the start of the line so that it can be alphabetically sorted if required.
    • Other information will be written in a way that is easy to search for (using tools like grep, or within Moodle when reading back the files). An example might be:
2012-11-10 10:37:23 +0000 userid=1234 ip=123.456.789.012 courseid=1234 component=block_frog instanceid=1234 action=hop data={'url':'view.php?id=124','info':16}

This is a simple implementation intended for single servers, or for multiple servers if you’re prepared to deal with having separate log files for each one. Specifically, the logger will not guarantee correct operation on shared storage. This will be documented in the settings page.

The logger might provide an administrative interface. In particular, when manually investigating log issues, it might be useful to download logs from the entire server (all courses) for a period of time. In this case the server could concatenate files itself and send a combined file to the user. (From this point, it is possible to grep it for required data, just like a less efficient database. You could also import it into a real temporary database for analysis.)

Summary of proposed development tasks

  • Create new plugin type including management interface (hide/show, order)
  • Implement new log system API (probably lib/logsystemlib.php) and new function in moodlelib to get singleton instance.
  • Write ‘logger_db’ plugin based on code from existing add_to_log, and some existing system settings and cron behaviour related to mdl_log.
  • Write ‘logger_file’ plugin (new code).
  • Implement existing log API functions in terms of new API. (Optionally, move to deprecated area if it is decided to deprecate them.)
  • Find all existing uses of existing log API functions and mdl_log table in core code. Add checks for the ‘can_...’ functions in log_system and handle suitably. (The largest part of this work will be the log viewing screens, but smaller areas throughout Moodle will also be affected.) Change core code to use new best practice when logging data.

Conclusion

  • By making these changes we can give sites the option to remove a critical performance bottleneck by dramatically reducing the number of database writes from a Moodle installation.
    • This may improve performance on smaller sites, and will make very large installations more feasible.
  • Because the default is to continue with database logging, and there is no change to the mdl_log table, it is likely that the change will cause few problems for existing Moodle sites even if they have special handling in this performance-sensitive area.
  • With default options there will be no reduction in existing Moodle functionality for staff.
    • If a site uses file logging (or some other plugin) instead of database logging, then some features might become unavailable and others might become slower.
  • The scope of this change is kept to a minimum in order to make implementation feasible, but it lays the groundwork for potential future enhancements such as detailed/specific log information and log filtering.

Risk

The main risk of this development is that it may prove harder than I anticipated to re-engineer parts of the code that read the log file (e.g. if they use complex joins or access data in a pattern that would be very inefficient with my proposed file logging). Some of this code may be in areas which are important for administrative tasks or which provide useful information to teachers.