Note: You are currently viewing documentation for Moodle 3.7. Up-to-date documentation for the latest stable version of Moodle may be available here: File API.

Development:File API: Difference between revisions

From MoodleDocs
This page outlines the current thinking about implementing file storage and access in Moodle 2.0.  It's a SPECIFICATION UNDER CONSTRUCTION!
{{Work in progress}}
{{Infobox Project
|name = File API
|state = Implemented
|tracker = MDL-14589
|discussion = n/a
|assignee = [[User:Petr Škoda (škoďák)|Petr Škoda (škoďák)]]
}}
{{Moodle 2.0}}

A lot of this has been brought over from the discussion about the [[Development:Repository API]] which is intimately connected.
The goals of the new File API are:
* allow files to be stored within Moodle, as part of the content (as we do now).
* use a consistent and flexible approach for all file handling throughout Moodle.
* give modules control over which users can access a file, using capabilities and other local rules.
* make it easy to determine which parts of Moodle use which files, to simplify operations like backup and restore.
* track where files originally came from.
* avoid redundant storage, when the same file is used twice.
* fully support Unicode file names, irrespective of the capabilities of the underlying file system.
The File API is a set of core interfaces to allow the rest of Moodle to store, serve and manage files. It applies only to files that are part of the Moodle site's content. It is not used for internal files, such as those in the following subdirectories of dataroot: temp, lang, cache, environment, filter, search, sessions, upgradelogs, ...
The API can be subdivided into the following parts:
; File storage
: Low level file storage without access control information. Stores the content of files on disc, with metadata in associated database tables.
; File serving
: Lets users accessing a Moodle site get the files (file.php, draftfile.php, pluginfile.php, userfile.php, etc.)
:* Serve the files on request
:* with appropriate security checks
; File related user interfaces
: Provides the interface for (lib/form/file.php, filemanager.php, filepicker.php and files/index.php, draftfiles.php)
:* Form elements allowing users to select a file using the Repository API, and have it stored within Moodle.
:* UI for users to manage their files, replacing the old course files UI
; File browsing API
: Allows code to browse and optionally manipulate the file areas
:* find information about available files in each area.
:* print links to files.
:* optionally move/rename/copy/delete/etc.
== File API internals ==
=== File storage on disk ===
Files are stored in $CFG->dataroot (also known as moodledata) in the filedir subfolder.
Files are stored according to the SHA1 hash of their content. This means each file with particular contents is stored once, irrespective of how many times it is included in different places, even if it is referred to by different names. (This idea comes from the git version control system.) To relate a file on disc to a user-comprehensible path or filename, you need to use the ''files'' database table. See the next section.
Suppose a file has SHA1 hash 081371cb102fa559e81993fddc230c79205232ce. Then it will be stored on disc as moodledata/filedir/08/13/71/081371cb102fa559e81993fddc230c79205232ce.
This means Moodle cannot store two different files with the same SHA1 hash. Luckily, it is extremely unlikely that this would ever happen. Technically it is also possible to implement reliable collision tests (with some performance cost); for now we just test file lengths in addition to the SHA1 hash.
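The mapping from content hash to pool path can be sketched as follows. This is an illustrative Python sketch, not Moodle's PHP code; the function names are hypothetical, and the three two-character directory levels are taken from the example path above.

```python
import hashlib
from pathlib import PurePosixPath


def content_hash(content: bytes) -> str:
    """Return the SHA1 hex digest used to name a content file."""
    return hashlib.sha1(content).hexdigest()


def content_path(dataroot: str, contenthash: str) -> PurePosixPath:
    """Build the pool path for a 40-character SHA1 hex digest."""
    if len(contenthash) != 40:
        raise ValueError("expected a 40-character SHA1 hex digest")
    # Three two-character directory levels, as in the example above.
    levels = [contenthash[0:2], contenthash[2:4], contenthash[4:6]]
    return PurePosixPath(dataroot, "filedir", *levels, contenthash)
```

Because the path depends only on the content, two uploads with identical bytes resolve to the same location regardless of their file names.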
=== Files table ===
This table contains one entry for each usage of a file. Enough information is kept here so that the file can be fully identified and retrieved again if necessary. A hash of the full path (pathnamehash) is stored as well, because some databases have a hard limit on index size.
If, for example, the same image is used in a user's profile and in a forum post, then there will be two rows in this table, one for each use of the file, and Moodle will treat the two as separate files, even though the file is only stored once on disc.
{| class="nicetable"
|-
! Field
! Type
! Default
! Info
|-
| '''id'''
| int(10)
| auto-incrementing
| The unique ID for this file.
|-
| '''contenthash'''
| varchar(40)
|
| The SHA1 hash of the content.
|-
| '''pathnamehash'''
| varchar(40)
|
| The SHA1 hash of "/contextid/component/filearea/itemid/filepath/filename.ext" - prevents file duplicates and allows fast lookup. It is necessary because some databases have a hard limit on index size.
|-
| '''contextid'''
| int(10)
|
| The context id defined in the context table - identifies the instance of the plugin owning the file.
|-
| '''component'''
| varchar(50)
|
| Like "mod_forum", "course", "mod_assignment", "backup"
|-
| '''filearea'''
| varchar(50)
|
| Like "submissions", "intro" and "content" (images and swf linked from summaries), etc.; "blogs" and "userfiles" are special cases that live at the system context.
|-
| '''itemid'''
| int(10)
|
| Some plugin-specific item id (e.g. forum post, blog entry, assignment submission, or user id for user files)
|-
| filepath
| text
|
| Relative path to the file from the module content root, useful in Scorm and Resource mod - most of the mods do not need this
|-
| filename
| varchar(255)
|
| The full Unicode name of this file (case sensitive)
|-
| '''userid'''
| int(10)
| NULL
| Optional - general user id field - meaning depends on the plugin
|-
| filesize
| int(10)
|
| Size of the file in bytes
|-
| mimetype
| varchar(100)
| NULL
| Type of the file
|-
| status
| int(10)
|
| General file status flag - will be used for lost or infected files
|-
| source
| text
|
| File source - usually a URL
|-
| author
| varchar(255)
|
| Original author of the file, used when importing from other systems
|-
| license
| varchar(255)
|
| License type, empty means site default
|-
| timecreated
| int(10)
|
| The time this file was created
|-
| timemodified
| int(10)
|
| The last time the file was modified
|}
* non-unique index on (contextid, component, filearea, itemid)
* non-unique index on (contenthash)
* unique index on (pathnamehash).
The plugin type does not need to be specified because it can be derived from the context. Items like blog that do not have their own context will use their own file area inside a suitable context; for blogs, this is the user context.
Entries with filename = '.' represent directories. Directory entries like this are created automatically when a file is added within them.
Note: the plural 'files' is used even though that goes against the [[Development:Database|coding guidelines]], because 'file' is a reserved word in some SQL dialects.
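As a rough illustration of the pathnamehash column, the hash can be computed from the full virtual path of a file. This is a Python sketch rather than Moodle's PHP; the function name is hypothetical, and it assumes that filepath starts and ends with "/" (with "/" meaning the root of the file area).

```python
import hashlib


def pathname_hash(contextid, component, filearea, itemid, filepath, filename):
    """SHA1 of "/contextid/component/filearea/itemid/filepath/filename"."""
    # Assumption: filepath starts and ends with "/", so it joins itemid and
    # filename without any extra separators.
    pathname = f"/{contextid}/{component}/{filearea}/{itemid}{filepath}{filename}"
    return hashlib.sha1(pathname.encode("utf-8")).hexdigest()
```

Since the hash is always 40 characters long, a unique index on it stays small even when filepath is a long text value, and a single-file lookup needs only one indexed column.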
===Implementation of basic operations===
'''Each plugin may directly access only files in its own context and areas!'''
The low level access API is defined in the ''file_storage'' class, an instance of which is obtained from <code>get_file_storage()</code>.
====Storing a file====
# Calculate the SHA1 hash of the file contents.
# Check if a file with this SHA1 hash already exists on disc in file directory or file trash. If not, store the file there.
# Add the record for this file to the files table using the low level address (contextid, component, filearea, itemid, filepath, filename).
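The three storing steps can be sketched with in-memory stand-ins. This is illustrative Python, not the file_storage implementation: the FilePool class and its attribute names are hypothetical, and dictionaries replace the file directory, the file trash and the files table.

```python
import hashlib


class FilePool:
    """Toy content-addressed pool; dictionaries stand in for disc and DB."""

    def __init__(self):
        self.filedir = {}      # contenthash -> bytes (file directory)
        self.trash = {}        # contenthash -> bytes (file trash)
        self.files_table = []  # one row per usage of a file

    def store(self, address: dict, content: bytes) -> dict:
        # Step 1: calculate the SHA1 hash of the contents.
        contenthash = hashlib.sha1(content).hexdigest()
        # Step 2: reuse content already in the directory or the trash.
        if contenthash in self.trash:
            self.filedir[contenthash] = self.trash.pop(contenthash)
        elif contenthash not in self.filedir:
            self.filedir[contenthash] = content
        # Step 3: add a row for this usage of the file.
        row = {**address, "contenthash": contenthash, "filesize": len(content)}
        self.files_table.append(row)
        return row
```

Storing the same bytes under two different names yields two rows in the table but only one stored copy of the content.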
====Reading a file====
# Fetch the record (which includes the SHA1 hash) for the file you want from the files table. You can fetch either all area files or quickly get one file with a specific pathnamehash.
# Retrieve the contents using the SHA1 hash from the file directory.
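A minimal Python sketch of the two reading steps, assuming the files table is keyed on pathnamehash (the column with the unique index) and that a dictionary stands in for the file directory. The function name and the sample key are hypothetical.

```python
import hashlib


def read_file(files_table: dict, filedir: dict, pathnamehash: str) -> bytes:
    """Look up one file record, then fetch its content by content hash."""
    record = files_table[pathnamehash]     # step 1: fetch the record
    return filedir[record["contenthash"]]  # step 2: fetch the content


# Tiny demonstration with one stored file.
content = b"hello"
contenthash = hashlib.sha1(content).hexdigest()
filedir = {contenthash: content}
files_table = {"example-pathnamehash": {"filename": "hello.txt",
                                        "contenthash": contenthash}}
```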
====Deleting a file====
# Delete the record from the files table.
# Check whether some other file still needs the content. If not, move the content file into the file trash.
# Later, admin/cron.php deletes content files from the trash directory.
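The deletion steps amount to reference counting on the content hash. The following Python sketch uses hypothetical names and in-memory dictionaries in place of the database and the disc; purge_trash() plays the role of the later cron clean-up.

```python
class FilePool:
    """Toy pool showing deletion with a trash area."""

    def __init__(self):
        self.filedir = {}      # contenthash -> bytes
        self.trash = {}        # contenthash -> bytes
        self.files_table = {}  # file id -> contenthash

    def delete(self, file_id: int) -> None:
        # Step 1: delete the record from the files table.
        contenthash = self.files_table.pop(file_id)
        # Step 2: if no other record needs the content, move it to trash.
        if contenthash not in self.files_table.values():
            self.trash[contenthash] = self.filedir.pop(contenthash)

    def purge_trash(self) -> None:
        # Step 3: a later clean-up run removes trashed content for good.
        self.trash.clear()
```

Keeping unreferenced content in the trash for a while means that re-adding the same file shortly after deletion does not require writing the content again.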
== File serving ==
Deals with serving of files - the browser requests a file, Moodle sends it back. We have three main files. It is important to set up slasharguments on the server properly (file.php/some/thing/xxx.jpg); any content that relies on relative links cannot work without it (scorm, uploaded html pages, etc.).
=== legacy file.php ===
Serves legacy course files. The file name and parameter structure is critical for backwards compatibility of existing course content.

The page is open for everyone so everyone can help correct mistakes and help with the evolution of this document. However, if you have questions, problems to report or major changes to suggest, please add them to the [[Development_talk:File_API|page comments]], or start a discussion in the Repositories forum. We'll endeavour to merge all such suggestions into the main spec before we start development.

Internally the files are stored in <code>array('contextid'=>$coursecontextid, 'component'=>'course', 'filearea'=>'legacy', 'itemid'=>0)</code>

The legacy course files are completely disabled in all new courses created in 2.0. The major problem here is how to educate our users that they cannot make huge piles of files in each course any more.
=== pluginfile.php ===
All plugins should use this script to serve all files.
* plugins decide about access control
* optional XSS protection - student-submitted files must not be served with normal headers, so we have to force download instead; ideally there should be a second wwwroot for serving untrusted files
* links to these files are constructed on the fly from the relative links stored in the database, which means that a plugin may link only to its own files
Absolute file links need to be rewritten if HTML editing is allowed in the plugin. The links are stored internally as relative links. Before editing or display, the internal link representation is converted to absolute links using a simple str_replace(), for example @@thipluginlink/summary@@/image.jpg --> /pluginfile.php/assignmentcontextid/intro/image.jpg; it is converted back to internal links before saving.
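The link rewriting described above is a plain string substitution in both directions. A small Python sketch, with hypothetical function names; the placeholder token is copied from the example in the text, and the context id 42 in the usage note is made up.

```python
def to_absolute(html: str, placeholder: str, base_url: str) -> str:
    """Expand the internal placeholder before editing or display."""
    return html.replace(placeholder, base_url)


def to_internal(html: str, placeholder: str, base_url: str) -> str:
    """Collapse absolute links back to the placeholder before saving."""
    return html.replace(base_url, placeholder)
```

For example, with placeholder "@@thipluginlink/summary@@" and base URL "/pluginfile.php/42/intro", the stored text "@@thipluginlink/summary@@/image.jpg" is displayed as "/pluginfile.php/42/intro/image.jpg" and converted back unchanged when the form is saved.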
Script parameters are virtual file names, in most cases the parameters match the low level file storage, but they do not have to:
pluginfile.php detects the type of plugin from context table, fetches basic info (like $course or $cm if appropriate) and calls plugin function (or later method) which does the access control and finally sends the file to user. ''areaname'' separates files by type and divides the context into several subtrees - for example ''summary'' files (images used in module intros), post attachments, etc.
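The dispatch described above can be sketched as: parse the slash arguments, find the plugin that owns the context, and hand the request to that plugin's access-control callback. This is a hedged Python sketch, not pluginfile.php itself. The /contextid/areaname/path layout is assumed from the single example URL in the text, and all names (FileRequest, parse_slasharguments, serve, the lookup dictionaries) are hypothetical.

```python
from typing import Callable, Dict, NamedTuple


class FileRequest(NamedTuple):
    contextid: int
    areaname: str
    filepath: str  # always starts and ends with "/"
    filename: str


def parse_slasharguments(relative_path: str) -> FileRequest:
    """Split "/contextid/areaname/dir/.../file" into its parts."""
    parts = relative_path.strip("/").split("/")
    if len(parts) < 3:
        raise ValueError("expected /contextid/areaname/.../filename")
    contextid, areaname, *dirs, filename = parts
    filepath = "/" + "".join(d + "/" for d in dirs)
    return FileRequest(int(contextid), areaname, filepath, filename)


def serve(relative_path: str,
          context_plugins: Dict[int, str],
          handlers: Dict[str, Callable[[FileRequest], bool]]) -> bool:
    """Route a request to the callback of the plugin owning the context."""
    request = parse_slasharguments(relative_path)
    plugin = context_plugins[request.contextid]  # derived from the context
    return handlers[plugin](request)             # plugin decides on access
```

The point of the design is that the central script knows nothing about access rules; each plugin's callback decides whether the file may be sent.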
==== Assignment example ====
The last line is an example of a virtual file that should be created on the fly; it is not implemented yet.
====scorm example====
The revision counter is incremented when any file changes in order to prevent caching problems.
====quiz example====
====questions example====
This section was out of date. See [[Development:File_storage_conversion_Quiz_and_Questions]] for the latest thinking.
====blog example====
Blog entries or notes in general do not have context id (because they live in system context, SYSCONTEXTID below is the id of system context).
The note attachments are always served with XSS protection on, ideally we should use separate wwwroot for this. Access control can be hardcoded.

# Allow files to be added directly into Moodle (as we do now)
Internally stored in <code>array('contextid'=>SYSCONTEXTID, 'component'=>'blog', 'filearea'=>'attachment', 'itemid'=>$blogentryid)</code>
# Remember where files came from
# Give modules control over the access to files
# Allow content to be used in multiple Moodle contexts securely and simply via capabilities
# Consistent and simple approach for ALL file handling throughout Moodle


Internally stored in <code>array('contextid'=>SYSCONTEXTID, 'component'=>'blog', 'filearea'=>'post', 'itemid'=>$blogentryid)</code>

The File API is a core set of interfaces that all Moodle code will use to:
# copy files into Moodle
# store files within Moodle
# display files to Moodle users

==Use cases==

Coming soon
=== Temporary files ===
Temporary files are usually used during the lifetime of one script only.
* exports
* imports
* processing by executable files (latex, mimetex)

These files should never use utf-8 file names.

==General Architecture==
=== Legacy file storage and serving ===
Going to use good-old separate directories in $CFG->dataroot.

All file-handling areas in Moodle (eg adding a new resource, adding attachments to a forum post, uploading assignments) will be rewritten to talk to the standard API class methods in a standard way.
file serving and storage:
# user avatars - user/pix.php
# group avatars - user/pixgroup.php
# tex, algebra - filter/tex/* and filter/algebra/*
# rss cache (?full rss rewrite soon?) - backwards compatibility only rss/file.php

When adding a file, the interface will allow a choice from the (active) [[Development:Repository_API]] repository plugins, each of which is a subclass of the core Repository class.  The default plugin is "local" which shows local files already in Moodle (and allows files to be added from desktop there, much like current filemanager which it replaces).
only storage:

As is usual in Moodle, there will be admin settings to disable/enable certain repository plugins as standard, as well as user settings so that users can add their own personal repositories to the standard list (e.g. Yahoo Briefcase or Google Docs) and select their default repository.
== File browsing API ==

Once a file has been selected the file will almost always be copied into Moodle there and then. However there will also be options to:
This is what other parts of Moodle use to access files that they do not own.
* only return the URL to the file if it's desired to keep it external (but this does present security and integrity risks), or
* refresh the local file copy regularly and automatically
* refresh the file manually from the File manager interface

All files in Moodle will be listed in a table (see below) allowing us to store various metadata about each file.  The file contents will not be in the database (though we could easily offer that option if we want to), they will be on disk with a name related to the id rather than the "human" name (this avoids a lot of OS Unicode problems).

The module that is responsible for initiating the file will be remembered as the "owner" of that file, and will be responsible for access to that file afterwards, either by publishing the permissions required to access the file or by providing a callback function that can be used.  For example, the assignment plugin may, after allowing a student to select a file to be submitted, add permissions so that people who have grade permissions in that assignment can read it.  Or it may choose to provide a function that can do a more detailed check based on dates and so on.
=== Class: file_browser ===

All files will be served via a single control script in Moodle, located at $CFG->fileroot.  This could be the same as $CFG->wwwroot by default, but will be recommended (for security and avoiding XSS) that Moodle admins set up a second DNS name pointing to this script eg the main site could be at but files would be served via  (We'll have to set session cookies on both domains and keep them in sync somehow).
=== Class: file_info and subclasses ===

The file.php will serve files using slasharguments almost as now.  We just need to replace the courseid with a fileid:  file.php/fileid/dir/dir/file.jpg  (where dir/dir/file.jpg is the virtual path in Moodle).
== File related user interfaces ==

==Local Files==
All files are obtained from the file repositories.

In general, all external files will be copied locally and stored in Moodle.  This section describes the storage of the files and how we define ACLs (access control lists) for them.  All existing files in the Moodle dataroot course areas will be moved into this new system during the upgrade.
=== Formslib fields ===
* file picker
* file manager
* file upload (obsolete, do not use)

The files will not be stored as they have been in the past.  The new file system is "flat" with each file stored as an id.  The name and path are stored in tables.  To avoid running out of nodes we'll use a hash-like structure like the users directory does, ie:
=== Integration with the HTML editor ===

Each instance of the HTML editor can be told to store related files in a particular file area.

==File serving==
During editing, files are stored in a draft files area. Then when the form is submitted they are moved into the real file area.

There will be at least two different scripts serving files.
Files are selected using the repository file picker.

=== Legacy file manager ===

This script is for general course files and backward compatibility for old modules.
Available only for legacy reasons. It is not supposed to be used.

# File gets a URL like:  file.php/courseid/dir/dir/file.jpg
All the contexts, file areas and files now form a single huge tree structure, although each user only has access to certain parts of that tree. The file manager (files/index.php) allows users to browse this tree and manage files within it, according to the level of permissions they have.
# File uses path and courseid to get the record from the '''file''' table for all information about this file.
# File uses fileid to get the current ACL from the '''file_access''' table.
# Each line of the ACL is checked, if any of them are true then access is given.
## Use has_capability with context and capability to check permissions
# If access is allowed then serve the file.

Single pane file manager is hard to implement without drag & drop which is notoriously problematic in web based applications. I propose to implement a two pane commander-style file manager. Two pane manager allows you to easily copy/move files between two different contexts (ex: courses).

This script is a new one giving modules "ownership" over files and complete control (if required) over their access.  Modules will generally provide a callback function to determine access to a file.
The file manager must not interact directly with the filesystem API; instead, each module should return a traversable tree of files and directories with both real and localised names (localised names are needed for dirs like backupdata).

# File gets a URL like:  modfile.php/contextid/dir/dir/file.jpg
== Backwards compatibility ==
# File uses path and contextid to get the record from the '''file''' table for all information about this file.
# File uses fileid to get the current ACL from the '''file_access''' table.
# Each line of the ACL is checked, if any of them are true then access is given.
## If the capability is "function/accessfunction" then file.php looks for a function called '''accessfunction''' in the module's lib.php to return true/false.
## Otherwise use has_capability with context and capability to check permissions
# If access is allowed then serve the file.

=== Content backwards compatibility ===

=== file ===
This should be preserved as much as possible. This will involve rewriting links in content during the upgrade to 2.0.

This table contains one entry for every file.  Enough information is kept here so that the file can be fully identified and retrieved again if necessary.
Some new features (like resource sharing - if implemented) may not work with existing data that still uses files from course files area.

There might be a breakage of links due to special characters stripping in uploaded files which will not match the links in uploaded html files any more. This should not be very common I hope.

===Code backwards compatibility===

Other Moodle code (for example plugins) will have to be converted to the new APIs. See [[Development:Using_the_file_API]] for guidance.

It is not possible to provide backwards-compatibility here. For example, the old $CFG->dataroot/$courseid/ will no longer exist, and there is no way to emulate that, so we won't try.


== Upgrade and migration ==

When a site is upgraded to Moodle 2.0, all the files in moodledata will have to be migrated. This is going to be a pain, like DML/DDL was :-(

The upgrade process should be interruptible (like the Unicode upgrade was) so it can be stopped/restarted any time.

=== Migration of content ===

* resources - move files to new resource content file area; can be done automatically for pdf, image resources; definitely not accurate for uploaded web pages
* questions - image file moved to new area, image tag appended to questions
* moddata files - the easiest part, just move to new storage
* coursefiles - there might be many outdated files :-( :-(
* rss feeds links in readers - will be broken, the new security related code would break it anyway

=== Moving files to files table and file pool ===

The migration process must be interruptible because it might take a very long time. Because files are moved out of their old location as they are migrated, restarting would be straightforward.

Proposed stages:
#migration of all course files except moddata - finish marked by some $CFG->files_migrated=true; - this step breaks the old file manager and html editor integration
#migration of blog attachments
#migration of question files
#migration of moddata files - each module is responsible to copy data from converted coursefiles or directly from moddata which is not converted automatically

Some people use symbolic links in coursefiles - we must make sure that those will be copied to new storage in both places, though they can not be linked any more - anybody wanting to have content synced will need to move the files to some repository and set up the sync again.

::Talked about a double task here, when migrating course files to module areas:
::# Parse html files to detect all the dependencies and move them together.
::# Fallback in pluginfile.php so, if something isn't found in module filearea, search for it in course filearea, copying it and finally, serving it.

:: Also we talked about the possibility of add a new setting to resource in order to define if it should work against old coursefiles or new autocontained file areas. Migrated resources will point to old coursefiles while new ones will enforce autocontained file areas.

:: it seems that only resource files will be really complex (because allow arbitrary HTML inclusion). The rest (labels, intros... doesn't) and should be easier to parse.

::[[User:Eloy Lafuente (stronk7)|Eloy Lafuente (stronk7)]] 19:00, 29 June 2008 (CDT)

=== file_access ===

This table describes the ACL for each file, so that checks can easily be made on whether someone can see this file or not.  Note there can be multiple entries per file.  Users can ALWAYS see their own files, so there are no entries here for that.


== Other issues ==

=== Unicode support in zip format ===

Zip format is an old standard for compressing files. It was created long before Unicode existed, and Unicode support was only recently added. There are several ways used for encoding of non-ASCII characters in path names, but unfortunately it is not very standardised. Most Windows packers use DOS encoding.

Client software:
* Windows built-in compression - bundled with Windows, non-standard DOS encoding only
* WinZip - shareware, Unicode option (since v11.2)
* TotalCommander - shareware, single byte(DOS) encoding only
* 7-Zip - free, Unicode or DOS encoding depending on characters used in file name (since v4.58beta)
* Info-ZIP - free, uses some weird character set conversions

==Class methods==
PHP extraction:
* Info-ZIP binary execution - no Unicode support at all, mangles character sets in file names (depends on OS, see docs), files must be copied to temp directory before compression and after extraction
* PclZip PHP library - reads single byte encoded names only, suffers from random problems and higher memory usage.
* Zip PHP extension - kind of works in latest PHP versions

===File class===
Large file support:
PHP running under 32bit operating systems does not support files >2GB (do not expect fix before PHP 6). This might be a potential problem for larger backups.

This class implements the display and management of files from local storage, with full access checking.  Some of the functions are for single files, while some are optimised for bulk display and searching (eg in the personal files interface).
Tar Alternative:
* tar with gzip compression - easy to implement in PHP + zlib extension (PclTar, Tar from PEAR or custom code)
* no problem with unicode in *nix, Windows again expects DOS encoding :-(
* seems suitable for backup/restore - yay!

# added zip processing class that fully hides the underlying library
# using single byte encoding "garbage in/garbage out" approach for encoding of files in zip archives; add new 'zipencoding' string into lang packs (ex: cp852 DOS charset for Czech locale) and use it during extraction (we might support true unicode later when PHP Zip extension does that)
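The "garbage in/garbage out" approach means the raw bytes of an archived file name are decoded with whatever single byte encoding the language pack declares. A small Python sketch of that idea; decode_zip_name is a hypothetical helper, and the zipencoding value would come from the lang pack string mentioned above.

```python
def decode_zip_name(raw_name: bytes, zipencoding: str) -> str:
    """Decode a raw archive entry name using the locale's DOS charset."""
    # Undecodable bytes are replaced rather than raising, in the spirit
    # of "garbage in/garbage out".
    return raw_name.decode(zipencoding, errors="replace")
```

A name packed on a Czech Windows system decodes correctly with cp852, while the same bytes read with a different DOS code page such as cp437 produce a different, wrong name.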

==Areas in Moodle that need re-writing==
== Not implemented yet ==
* antivirus scanning - this needs a different api because the upload of files is now handled via repository plugins

== See also ==

* [[Development:Using the file API]]
* [[Development:Repository API]]
* [[Development:Portfolio API]]
* [[Development:Resource module file API migration]]
* MDL-14589 - File API Meta issue

Latest revision as of 12:09, 2 December 2010

Note: This page is a work-in-progress. Feedback and suggested improvements are welcome. Please join the discussion on or use the page comments.

Template:Infobox Project Template:Moodle 2.0


The goals of the new File API are:

  • allow files to be stored within Moodle, as part of the content (as we do now).
  • use a consistent and flexible approach for all file handling throughout Moodle.
  • give modules control over which users can access a file, using capabilities and other local rules.
  • make it easy to determine which parts of Moodle use which files, to simplify operations like backup and restore.
  • track where files originally came from.
  • avoid redundant storage, when the same file is used twice.
  • fully support Unicode file names, irrespective of the capabilities of the underlying file system.


The File API is a set of core interfaces to allow the rest of Moodle to store, serve and manage files. It applies only to files that are part of the Moodle site's content. It is not used for internal files, such as those in the following subdirectories of dataroot: temp, lang, cache, environment, filter, search, sessions, upgradelogs, ...

The API can be subdivided into the following parts:

File storage
Low level file storage without access control information. Stores the content of files on disc, with metadata in associated database tables.
File serving
Lets users accessing a Moodle site get the files (file.php, draftfile.php, pluginfile.php, userfile.php, etc.)
  • Serve the files on request
  • with appropriate security checks
File related user interfaces
Provides the interface for (lib/form/file.php, filemanager.php, filepicker.php and files/index.php, draftfiles.php)
  • Form elements allowing users to select a file using the Repository API, and have it stored within Moodle.
  • UI for users to manage their files, replacing the old course files UI
File browsing API
Allows code to browse and optionally manipulate the file areas
  • find information about available files in each area.
  • print links to files.
  • optionally move/rename/copy/delete/etc.

File API internals

File storage on disk

Files are stored in $CFG->dataroot (also known as moodledata) in the filedir subfolder.

Files are stored according to the SHA1 hash of their content. This means each file with particular contents is stored once, irrespective of how many times it is included in different places, even if it is referred to by different names. (This idea comes from the git version control system.) To relate a file on disc to a user-comprehensible path or filename, you need to use the files database table. See the next section.

Suppose a file has SHA1 hash 081371cb102fa559e81993fddc230c79205232ce. Then it will be stored in on disc as moodledata/filedir/08/13/71/081371cb102fa559e81993fddc230c79205232ce.

This means Moodle can not store two files with the same SHA1 hash, luckily it is extremely unlikely that this would ever happen. Technically it is also possible to implement reliable collision tests (with some performance cost), for now we just test file lengths in addition to SHA1 hash.

Files table

This table contains one entry for each usage of a file. Enough information is kept here so that the file can be fully identified and retrieved again if necessary. It is necessary because some databases have hard limit on index size.

If, for example, the same image is used in a user's profile, and a forum post, then there will be two rows in this table, one for each use of the file, and Moodle will treat the two as separate files, even though the file is only stored once on disc.

Field Type Default Info
id int(10) auto-incrementing The unique ID for this file.
contenthash varchar(40) The sha1 hash of content.
pathnamehash varchar(40) The sha1 hash of "/contextid/component/filearea/itemid/filepath/filename.ext" - prevents file duplicates and allows fast lookup. It is necessary because some databases have hard limit on index size.
contextid int(10) The context id defined in context table - identifies the instance of plugin owning the file.
component varchar(50) Like "mod_forum", "course", "mod_assignment", "backup"
filearea varchar(50) Like "submissions", "intro" and "content" (images and swf linked from summaries), etc.; "blogs" and "userfiles" are special case that live at the system context.
itemid int(10) Some plugin specific item id (eg. forum post, blog entry or assignment submission or user id for user files)
filepath text relative path to file from module content root, useful in Scorm and Resource mod - most of the mods do not need this
filename varchar(255) The full Unicode name of this file (case sensitive)
userid int(10) NULL Optional - general user id field - meaning depending on plugin
filesize int(10) size of file - bytes
mimetype varchar(100) NULL type of file
status int(10) general file status flag - will be used for lost or infected files
source text file source - usually url
author varchar(255) original author of file, used when importing from other systems
license varchar(255) license type, empty means site default
timecreated int(10) The time this file was created
timemodified int(10) The last time the file was last modified


  • non-unique index on (contextid, component, filearea, itemid)
  • non-unique index on (contenthash)
  • unique index on (pathnamehash).
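
As an illustration of how a pathnamehash could be derived, here is a minimal sketch in Python (used here only for illustration, it is not Moodle code). It assumes that filepath starts and ends with "/" so that it can be concatenated directly; the function name, context ids and area names are hypothetical.

```python
import hashlib


def pathname_hash(contextid, component, filearea, itemid, filepath, filename):
    """Return the SHA1 hex digest of the file's full virtual path.

    Assumption: filepath starts and ends with "/" ("/" for the area root),
    so it can be placed directly between itemid and filename.
    """
    full_path = f"/{contextid}/{component}/{filearea}/{itemid}{filepath}{filename}"
    return hashlib.sha1(full_path.encode("utf-8")).hexdigest()


# Two usages of the same image get different pathname hashes,
# even though their contenthash would be identical.
profile = pathname_hash(5, "user", "icon", 0, "/", "me.jpg")
post = pathname_hash(42, "mod_forum", "attachment", 7, "/", "me.jpg")
```

Because the digest always has 40 hexadecimal characters, it fits the varchar(40) column and can carry a unique index regardless of how long the underlying path is.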

The plugin type does not need to be specified because it can be derived from the context. Items like blog that do not have their own context will use their own file area inside a suitable context. In this case, the user context.

Entries with filename = '.' represent directories. Directory entries like this are created automatically when a file is added within them.

Note: 'files' plural is used even though that goes against the coding guidelines because 'file' is a reserved word in some SQL dialects.

Implementation of basic operations

Each plugin may directly access only files in its own context and areas!

The low level access API is defined in the file_storage class, an instance of which is obtained from get_file_storage().

Storing a file

  1. Calculate the SHA1 hash of the file contents.
  2. Check whether a file with this SHA1 hash already exists on disc in the file directory or the file trash. If not, store the file there.
  3. Add the record for this file to the files table using the low level address (contextid, component, filearea, itemid, filepath, filename).
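
The storing steps can be sketched roughly as follows. This is a simplified Python illustration, not the file_storage implementation: the files table is a plain dict, content files sit directly in one directory, the trash check is left out, and all names are hypothetical.

```python
import hashlib
import os
import tempfile


def store_content(filedir, data):
    """Write data into the pool unless identical content is already there."""
    contenthash = hashlib.sha1(data).hexdigest()
    path = os.path.join(filedir, contenthash)
    if not os.path.exists(path):
        with open(path, "wb") as handle:
            handle.write(data)
    return contenthash


def add_file(filedir, files_table, address, data):
    """Store the content, then record one usage under the low level address."""
    contenthash = store_content(filedir, data)
    files_table[address] = {"contenthash": contenthash, "filesize": len(data)}
    return contenthash


filedir = tempfile.mkdtemp()
files_table = {}
image = b"fake image bytes"
add_file(filedir, files_table, (5, "user", "icon", 0, "/", "me.jpg"), image)
add_file(filedir, files_table, (42, "mod_forum", "attachment", 7, "/", "me.jpg"), image)
```

After both calls the table holds two records, while the directory holds a single content file.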

Reading a file

  1. Fetch the record (which includes the SHA1 hash of the content) for the file you want from the files table. You can fetch either all area files or quickly get one file with a specific pathnamehash.
  2. Retrieve the contents using the SHA1 hash from the file directory.
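
A minimal sketch of the two reading steps, written in Python with an in-memory SQLite table standing in for the files table. The reduced table layout, the flat content directory and the function name are assumptions made for the example; the record is looked up by its unique pathnamehash.

```python
import hashlib
import os
import sqlite3
import tempfile

# Set up one stored file: content in the pool, one record in the table.
filedir = tempfile.mkdtemp()
content = b"hello"
contenthash = hashlib.sha1(content).hexdigest()
with open(os.path.join(filedir, contenthash), "wb") as handle:
    handle.write(content)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (pathnamehash TEXT UNIQUE, contenthash TEXT, filename TEXT)")
pathnamehash = hashlib.sha1(b"/42/mod_forum/attachment/7/hello.txt").hexdigest()
db.execute("INSERT INTO files VALUES (?, ?, ?)", (pathnamehash, contenthash, "hello.txt"))


def read_file(db, filedir, pathnamehash):
    """Step 1: fetch the record; step 2: load the content by its SHA1 hash."""
    row = db.execute(
        "SELECT contenthash FROM files WHERE pathnamehash = ?", (pathnamehash,)
    ).fetchone()
    if row is None:
        return None
    with open(os.path.join(filedir, row[0]), "rb") as handle:
        return handle.read()
```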

Deleting a file

  1. Delete the record from the files table.
  2. Check whether any other file record still needs the content. If not, move the content file into the file trash.
  3. Later, admin/cron.php deletes content files from the trash directory.
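
The deletion steps could look roughly like this. It is a Python sketch under simplifying assumptions (a dict as the files table, flat directories for the pool and the trash); the function names are hypothetical.

```python
import os
import shutil
import tempfile


def delete_file(files_table, filedir, trashdir, address):
    """Remove one usage; trash the content if no other record needs it."""
    contenthash = files_table.pop(address)["contenthash"]
    still_used = any(r["contenthash"] == contenthash for r in files_table.values())
    if not still_used:
        shutil.move(os.path.join(filedir, contenthash), os.path.join(trashdir, contenthash))
    return still_used


def empty_trash(trashdir):
    """What a periodic cron job might do: purge trashed content files."""
    for name in os.listdir(trashdir):
        os.remove(os.path.join(trashdir, name))


filedir, trashdir = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(filedir, "abc123"), "wb") as handle:
    handle.write(b"shared content")
files_table = {
    "profile": {"contenthash": "abc123"},
    "forum": {"contenthash": "abc123"},
}
first = delete_file(files_table, filedir, trashdir, "profile")  # content still needed
second = delete_file(files_table, filedir, trashdir, "forum")   # content goes to trash
```

Keeping the content in a trash directory until cron runs means the check for existing content when storing a file can also find recently deleted content.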

File serving

Deals with serving of files: the browser requests a file and Moodle sends it back. We have three main files. It is important to set up slasharguments on the server properly (file.php/some/thing/xxx.jpg); any content that relies on relative links cannot work without it (scorm, uploaded html pages, etc.).

legacy file.php

Serves legacy course files, the file name and parameter structure is critical for backwards compatibility of existing course content.


Internally the files are stored in array('contextid'=>$coursecontextid, 'component'=>'course', 'filearea'=>'legacy', 'itemid'=>0)

The legacy course files are completely disabled in all new courses created in 2.0. The major problem here is how to educate our users that they cannot make huge piles of files in each course any more.


All plugins should use pluginfile.php to serve all files.

  • plugins decide about access control
  • optional XSS protection - student submitted files must not be served with normal headers, we have to force download instead; ideally there should be a second wwwroot for serving of untrusted files
  • links to these files are constructed on the fly from the relative links stored in the database, which means that a plugin may link only to its own files

Absolute file links need to be rewritten if HTML editing is allowed in the plugin. The links are stored internally as relative links. Before editing or display, the internal link representation is converted to absolute links using a simple str_replace(), for example @@thispluginlink/summary@@/image.jpg --> /pluginfile.php/assignmentcontextid/intro/image.jpg; it is converted back to internal links before saving.
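
The round trip between internal and absolute links can be sketched with plain string replacement. The example below is Python rather than PHP, and the placeholder token, the context id 123 and the function names are hypothetical.

```python
PLACEHOLDER = "@@PLUGINLINK@@"  # hypothetical token, not necessarily Moodle's


def to_absolute(html, base_url):
    """Expand internal links before display or editing."""
    return html.replace(PLACEHOLDER, base_url)


def to_internal(html, base_url):
    """Collapse absolute links back to the internal form before saving."""
    return html.replace(base_url, PLACEHOLDER)


stored = '<img src="@@PLUGINLINK@@/image.jpg">'
base = "/pluginfile.php/123/intro"
shown = to_absolute(stored, base)
```

Since only the internal form is saved, the stored text stays valid if the context id or the site address changes, for example after a restore.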

Script parameters are virtual file names, in most cases the parameters match the low level file storage, but they do not have to:


pluginfile.php detects the type of plugin from the context table, fetches basic info (like $course or $cm if appropriate) and calls a plugin function (or later a method) which does the access control and finally sends the file to the user. areaname separates files by type and divides the context into several subtrees - for example summary files (images used in module intros), post attachments, etc.
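
The dispatch described above might be sketched as follows. This is an illustrative Python model, not the real script: the context lookup is a dict, the handler signature is invented, and the access rule is a made-up example.

```python
def serve(virtual_path, context_owner, handlers, user):
    """Split /contextid/areaname/path, then let the owning plugin decide."""
    contextid, areaname, *rest = virtual_path.strip("/").split("/")
    plugin = context_owner[int(contextid)]            # plugin type comes from the context
    allowed = handlers[plugin](user, areaname, rest)  # the plugin does access control
    return ("send", "/".join(rest)) if allowed else ("deny", None)


def assignment_handler(user, areaname, path_parts):
    """Hypothetical rule: intro files are public, everything else is teacher-only."""
    return areaname == "intro" or user == "teacher"


context_owner = {123: "mod_assignment"}
handlers = {"mod_assignment": assignment_handler}
```

The central script only parses the path and finds the owner; the decision about who may see a file stays inside the plugin.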

Assignment example


The last line is an example of a virtual file that should be created on the fly; it is not implemented yet.

scorm example


The revision counter is incremented when any file changes in order to prevent caching problems.

quiz example


questions example

This section was out of date. See Development:File_storage_conversion_Quiz_and_Questions for the latest thinking.

blog example

Blog entries or notes in general do not have context id (because they live in system context, SYSCONTEXTID below is the id of system context). The note attachments are always served with XSS protection on, ideally we should use separate wwwroot for this. Access control can be hardcoded.


Internally stored in array('contextid'=>SYSCONTEXTID, 'component'=>'blog', 'filearea'=>'attachment', 'itemid'=>$blogentryid)


Internally stored in array('contextid'=>SYSCONTEXTID, 'component'=>'blog', 'filearea'=>'post', 'itemid'=>$blogentryid)

Temporary files

Temporary files are usually used during the lifetime of one script only. Uses:

  • exports
  • imports
  • processing by executable files (latex, mimetex)

These files should never use utf-8 file names.

Legacy file storage and serving

These will continue to use the good old separate directories in $CFG->dataroot.

file serving and storage:

  1. user avatars - user/pix.php
  2. group avatars - user/pixgroup.php
  3. tex, algebra - filter/tex/* and filter/algebra/*
  4. rss cache (?full rss rewrite soon?) - backwards compatibility only rss/file.php

only storage:

  1. sessions

File browsing API

This is what other parts of Moodle use to access files that they do not own.

Class: file_browser

Class: file_info and subclasses

File related user interfaces

All files are obtained through the file repositories.

Formslib fields

  • file picker
  • file manager
  • file upload (obsolete, do not use)

Integration with the HTML editor

Each instance of the HTML editor can be told to store related files in a particular file area.

During editing, files are stored in a draft files area. Then when the form is submitted they are moved into the real file area.

Files are selected using the repository file picker.

Legacy file manager

Available only for legacy reasons. It is not supposed to be used.

All the contexts, file areas and files now form a single huge tree structure, although each user only has access to certain parts of that tree. The file manager (files/index.php) allows users to browse this tree, and manage files within it, according to the level of permissions they have.

Single pane file manager is hard to implement without drag & drop which is notoriously problematic in web based applications. I propose to implement a two pane commander-style file manager. Two pane manager allows you to easily copy/move files between two different contexts (ex: courses).

File manager must not interact directly with filesystem API, instead each module should return traversable tree of files and directories with both real and localised names (localised names are needed for dirs like backupdata).

Backwards compatibility

Content backwards compatibility

This should be preserved as much as possible. This will involve rewriting links in content during the upgrade to 2.0.

Some new features (like resource sharing - if implemented) may not work with existing data that still uses files from course files area.

There might be a breakage of links due to special characters stripping in uploaded files which will not match the links in uploaded html files any more. This should not be very common I hope.

Code backwards compatibility

Other Moodle code (for example plugins) will have to be converted to the new APIs. See Development:Using_the_file_API for guidance.

It is not possible to provide backwards-compatibility here. For example, the old $CFG->dataroot/$courseid/ will no longer exist, and there is no way to emulate that, so we won't try.

Upgrade and migration

When a site is upgraded to Moodle 2.0, all the files in moodledata will have to be migrated. This is going to be a pain, like DML/DDL was :-(

The upgrade process should be interruptible (like the Unicode upgrade was) so it can be stopped/restarted any time.

Migration of content

  • resources - move files to new resource content file area; can be done automatically for pdf, image resources; definitely not accurate for uploaded web pages
  • questions - image file moved to new area, image tag appended to questions
  • moddata files - the easiest part, just move to new storage
  • coursefiles - there might be many outdated files :-( :-(
  • rss feeds links in readers - will be broken, the new security related code would break it anyway

Moving files to files table and file pool

The migration process must be interruptible because it might take a very long time. Because files are moved out of the old location as they are migrated, restarting would be straightforward: only the files still left in the old location need processing.
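
A small Python sketch of why moving files makes the process restartable. The limit parameter only simulates an interruption, and the function and directory names are hypothetical.

```python
import os
import shutil
import tempfile


def migrate(old_dir, new_dir, limit=None):
    """Move up to `limit` files; anything left stays in old_dir for the next run."""
    moved = 0
    for name in sorted(os.listdir(old_dir)):
        if limit is not None and moved >= limit:
            break  # simulates an interruption
        shutil.move(os.path.join(old_dir, name), os.path.join(new_dir, name))
        moved += 1
    return moved


old_dir, new_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.txt"):
    with open(os.path.join(old_dir, name), "w") as handle:
        handle.write(name)

first_run = migrate(old_dir, new_dir, limit=2)  # interrupted after two files
second_run = migrate(old_dir, new_dir)          # the restart picks up the rest
```

No separate progress record is needed in this model, because the contents of the old location are the list of remaining work.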

Proposed stages:

  1. migration of all course files except moddata - finish marked by some $CFG->files_migrated=true; - this step breaks the old file manager and html editor integration
  2. migration of blog attachments
  3. migration of question files
  4. migration of moddata files - each module is responsible to copy data from converted coursefiles or directly from moddata which is not converted automatically

Some people use symbolic links in coursefiles - we must make sure that those will be copied to new storage in both places, though they can not be linked any more - anybody wanting to have content synced will need to move the files to some repository and set up the sync again.

We talked about a double task here, when migrating course files to module areas:
  1. Parse html files to detect all the dependencies and move them together.
  2. Add a fallback in pluginfile.php so that, if something isn't found in the module filearea, it is searched for in the course filearea, copied and finally served.
We also talked about the possibility of adding a new setting to resource in order to define whether it should work against old coursefiles or new autocontained file areas. Migrated resources will point to old coursefiles while new ones will enforce autocontained file areas.
It seems that only resource files will be really complex (because they allow arbitrary HTML inclusion). The rest (labels, intros, ...) don't, and should be easier to parse.
Eloy Lafuente (stronk7) 19:00, 29 June 2008 (CDT)
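
The fallback proposed in the comment above could be modelled like this. It is a Python sketch with dicts standing in for the module and course file areas; the function name is hypothetical and no such fallback is confirmed to exist in pluginfile.php.

```python
def fetch_with_fallback(module_area, course_area, name):
    """Serve from the module area, copying from course files on a miss."""
    if name not in module_area and name in course_area:
        module_area[name] = course_area[name]  # copy, so later requests hit directly
    return module_area.get(name)


course_area = {"logo.png": b"png bytes"}
module_area = {}
```

With this approach dependencies that the HTML parser missed are migrated lazily, the first time somebody requests them.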

Other issues

Unicode support in zip format

Zip format is an old standard for compressing files. It was created before Unicode was in common use, and Unicode support was only recently added. Several ways of encoding non-ASCII characters in path names are in use, but unfortunately none of them is well standardised. Most Windows packers use DOS encoding.

Client software:

  • Windows built-in compression - bundled with Windows, non-standard DOS encoding only
  • WinZip - shareware, Unicode option (since v11.2)
  • TotalCommander - shareware, single byte(DOS) encoding only
  • 7-Zip - free, Unicode or DOS encoding depending on characters used in file name (since v4.58beta)
  • Info-ZIP - free, uses some weird character set conversions

PHP extraction:

  • Info-ZIP binary execution - no Unicode support at all, mangles character sets in file names (depends on OS, see docs), files must be copied to temp directory before compression and after extraction
  • PclZip PHP library - reads single byte encoded names only, suffers from random problems and higher memory usage.
  • Zip PHP extension - kind of works in latest PHP versions

Large file support: PHP running under 32bit operating systems does not support files >2GB (do not expect fix before PHP 6). This might be a potential problem for larger backups.

Tar Alternative:

  • tar with gzip compression - easy to implement in PHP + zlib extension (PclTar, Tar from PEAR or custom code)
  • no problem with unicode in *nix, Windows again expects DOS encoding :-(
  • seems suitable for backup/restore - yay!


  1. added zip processing class that fully hides the underlying library
  2. using single byte encoding "garbage in/garbage out" approach for encoding of files in zip archives; add new 'zipencoding' string into lang packs (ex: cp852 DOS charset for Czech locale) and use it during extraction (we might support true unicode later when PHP Zip extension does that)
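
To illustrate the 'zipencoding' idea, the Python sketch below decodes the raw bytes of an entry name with the charset configured for the locale. The function name and the sample file name are hypothetical; cp852 is the Czech example given above, and cp437 stands in for a wrongly configured charset.

```python
def decode_entry_name(raw_name, zipencoding):
    """Interpret the raw bytes of a zip entry name using the locale's charset."""
    return raw_name.decode(zipencoding)


original = "žluťoučký kůň.txt"
raw = original.encode("cp852")          # what a DOS-encoding packer might write
good = decode_entry_name(raw, "cp852")  # correct 'zipencoding' for Czech
bad = decode_entry_name(raw, "cp437")   # wrong charset: garbage in, garbage out
```

The archive itself does not say which single byte charset was used, which is why the setting has to come from the language pack rather than from the file.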

Not implemented yet

  • antivirus scanning - this needs a different api because the upload of files is now handled via repository plugins

See also
