Note:

If you want to create a new page for developers, you should create it on the Moodle Developer Resource site.

Data formats: Difference between revisions

From MoodleDocs
(Created page with "Dataformat's are plugins that define how a table of data can be exported for download. Moodle comes with plugins for CSV, Excel, ODS, JSON, and HTML. =Components= '''datafo...")
 
m (Text replacement - "<code php>" to "<syntaxhighlight lang="php">")
 
(15 intermediate revisions by 4 users not shown)
Line 1: Line 1:
Dataformat's are plugins that define how a table of data can be exported for download. Moodle comes with plugins for CSV, Excel, ODS, JSON, and HTML.
{{Moodle 3.1}}


Data formats are [[Plugin types|plugins]] that define how a table of data can be exported for download. Moodle comes with plugins for CSV, Excel, ODS, JSON, HTML and PDf.


=Components=
Previously if you wanted to export some table of data you either hand rolled UI of the export yourself using one of the various csvlib or excellib classes, or some of the more modern Moodle code uses table_sql which is not widely used, especially in 3rd party plugins, and highly underrated. This meant that if you want to support csv and excel, or ODS, then adding each new format was more code. This new Dataformat API provides a simple API to handle all of the downloading UI for little effort and removes all format specific code from your plugin code.


'''dataformat/FORMATNAME/lang/en/format_FORMATNAME.php'''


Contains the English language strings used in the format. You can also define strings for other languages.
= Using the Dataformat API =


'''dataformat/FORMATNAME/classes/writer.php
Using the new API is very simple and consists of two steps.


This file is the real meat of the form, it contains class which extends \core\dataformat\base defining how the data stream should be encoded.
== Step 1: Rendering a download selector ==


'''dataformat/FORMATNAME/version.php'''
Use the core renderer to output the download selector UI:


Version definitions, see [[version.php]]. It is highly recommended always to have it and it is required if you have any files in db folder
<syntaxhighlight lang="php">echo $OUTPUT->download_dataformat_selector(get_string('userbulkdownload', 'admin'), 'download.php');</syntaxhighlight>
 
You can pass extra params to override the query param used to specify the format, and append extra query params as needed:
 
<syntaxhighlight lang="php">echo $OUTPUT->download_dataformat_selector(get_string('userbulkdownload', 'admin'), 'download.php', 'myqueryparam', array('foo' => 'bar'));</syntaxhighlight>
 
 
== Step 2: Serving the data for download ==
 
And then in the page you just linked to, you create your data record set, define some metadata for the columns, and then pass these to the Dataformat API which then streams the results to the browser in the users selected format:
 
<syntaxhighlight lang="php">
$dataformat = optional_param('dataformat', '', PARAM_ALPHA);
$columns = array(
    'idnumber' => get_string('idnumber'),
    …
);
$rs = $DB->get_recordset_sql(...);
download_as_dataformat('myfilename', $dataformat, $columns, $rs);
$rs->close();
</syntaxhighlight>
 
The Dataformat API can handle any Iterator, so it can be used for data which isn't actually from the database, and has been generated by some other means.
 
If you are dealing with a relatively straight forward table of sql data, and you also want a standard table, it is strongly recommended you use the table_sql API which itself uses the Dataformat API and you'll get all this for free.
 
There are cases where the data returned from the DB needs processing in some way, this is an option:
 
<syntaxhighlight lang="php">
download_as_dataformat('myfilename', $dataformat, $columns, $rs, function($record) {
    // Process the data in some way.
    // You can add and remove columns as needed
    // as long as the resulting data matches the $column metadata.
});
</syntaxhighlight>
 
== Column metadata considerations ==
 
The $columns metadata is an array of unique key names to human readable names. It is intended that each Dataformat has both available to use as needed. For instance a human facing document format such as Excel uses the lang string as the column headers, but the json format uses the key name instead. If the data you are exporting is primarily for import into other systems, you may decide that it is better for the columns to always be fixed and not localized, like this:
 
<syntaxhighlight lang="php">
$columns = array(
    'idnumber' => 'idnumber',
    'course' => 'course',
    ...
);
</syntaxhighlight>
 
= Creating a new dataformat =
 
Moodle already ships with 5 formats and it is unlikely that a large number of new dataformat plugins will emerge over time. But creating new ones is very easy, apart from the lang and version boiler plate it consists of a single autoloaded class.
 
A practical approach would be to clone and rename one of the existing text based formats such as /dataformat/json or /dataformat/html.
 
== Components ==
 
; dataformat/FORMATNAME/classes/writer.php : This file is the real meat of the form, it contains class which extends <tt>\core\dataformat\base</tt> defining how the data stream should be encoded.
 
; dataformat/FORMATNAME/lang/en/dataformat_FORMATNAME.php : Contains the English language strings used in the format.
 
; dataformat/FORMATNAME/version.php : Version definitions, see [[version.php]]. It is highly recommended always to have it and it is required if you have any files in db folder
 
Generally no other files are needed, but [[Plugin files|all the other files available to other plugins]] could also be used here.
 
== Performance ==
 
The design intent of the data formats is to have low and ideally fixed memory overhead, and to stream data to the browser record by record. This enables very large datasets to be exported easily and safely without running into a whole range of issues including php memory thresholds, php execution timeouts, and other timeouts in every layer of the stack at the apache / nginx layer, a caching layer such as varnish, or load balancers, any intermediate proxies, and finally any browser time-outs.
 
So please avoid building up a complete data structure in memory, or ideally not even on disk either. Instead just encode the data record by record as the data stream is processed and echo it to the browser. The HTML, JSON, and CSV data formats are good examples of this. The Excel and ODS formats are far more performant than in earlier versions of moodle, but still suffer from network inactivity timeouts at very large scale because they buffer the file to disc before finally sending when complete.

Latest revision as of 13:30, 14 July 2021

Moodle 3.1


Data formats are plugins that define how a table of data can be exported for download. Moodle comes with plugins for CSV, Excel, ODS, JSON, HTML and PDf.

Previously if you wanted to export some table of data you either hand rolled UI of the export yourself using one of the various csvlib or excellib classes, or some of the more modern Moodle code uses table_sql which is not widely used, especially in 3rd party plugins, and highly underrated. This meant that if you want to support csv and excel, or ODS, then adding each new format was more code. This new Dataformat API provides a simple API to handle all of the downloading UI for little effort and removes all format specific code from your plugin code.


Using the Dataformat API

Using the new API is very simple and consists of two steps.

Step 1: Rendering a download selector

Use the core renderer to output the download selector UI:

echo $OUTPUT->download_dataformat_selector(get_string('userbulkdownload', 'admin'), 'download.php');

You can pass extra params to override the query param used to specify the format, and append extra query params as needed:

echo $OUTPUT->download_dataformat_selector(get_string('userbulkdownload', 'admin'), 'download.php', 'myqueryparam', array('foo' => 'bar'));


Step 2: Serving the data for download

And then in the page you just linked to, you create your data record set, define some metadata for the columns, and then pass these to the Dataformat API which then streams the results to the browser in the users selected format:

$dataformat = optional_param('dataformat', '', PARAM_ALPHA);
$columns = array(
    'idnumber' => get_string('idnumber'),
    
);
$rs = $DB->get_recordset_sql(...);
download_as_dataformat('myfilename', $dataformat, $columns, $rs);
$rs->close();

The Dataformat API can handle any Iterator, so it can be used for data which isn't actually from the database, and has been generated by some other means.

If you are dealing with a relatively straight forward table of sql data, and you also want a standard table, it is strongly recommended you use the table_sql API which itself uses the Dataformat API and you'll get all this for free.

There are cases where the data returned from the DB needs processing in some way, this is an option:

download_as_dataformat('myfilename', $dataformat, $columns, $rs, function($record) {
    // Process the data in some way.
    // You can add and remove columns as needed
    // as long as the resulting data matches the $column metadata.
});

Column metadata considerations

The $columns metadata is an array of unique key names to human readable names. It is intended that each Dataformat has both available to use as needed. For instance a human facing document format such as Excel uses the lang string as the column headers, but the json format uses the key name instead. If the data you are exporting is primarily for import into other systems, you may decide that it is better for the columns to always be fixed and not localized, like this:

$columns = array(
    'idnumber' => 'idnumber',
    'course' => 'course',
    ...
);

Creating a new dataformat

Moodle already ships with 5 formats and it is unlikely that a large number of new dataformat plugins will emerge over time. But creating new ones is very easy, apart from the lang and version boiler plate it consists of a single autoloaded class.

A practical approach would be to clone and rename one of the existing text based formats such as /dataformat/json or /dataformat/html.

Components

dataformat/FORMATNAME/classes/writer.php
This file is the real meat of the form, it contains class which extends \core\dataformat\base defining how the data stream should be encoded.
dataformat/FORMATNAME/lang/en/dataformat_FORMATNAME.php
Contains the English language strings used in the format.
dataformat/FORMATNAME/version.php
Version definitions, see version.php. It is highly recommended always to have it and it is required if you have any files in db folder

Generally no other files are needed, but all the other files available to other plugins could also be used here.

Performance

The design intent of the data formats is to have low and ideally fixed memory overhead, and to stream data to the browser record by record. This enables very large datasets to be exported easily and safely without running into a whole range of issues including php memory thresholds, php execution timeouts, and other timeouts in every layer of the stack at the apache / nginx layer, a caching layer such as varnish, or load balancers, any intermediate proxies, and finally any browser time-outs.

So please avoid building up a complete data structure in memory, or ideally not even on disk either. Instead just encode the data record by record as the data stream is processed and echo it to the browser. The HTML, JSON, and CSV data formats are good examples of this. The Excel and ODS formats are far more performant than in earlier versions of moodle, but still suffer from network inactivity timeouts at very large scale because they buffer the file to disc before finally sending when complete.