
File API (old)

From MoodleDocs
Revision as of 13:04, 10 November 2013 by Petr Škoda (škoďák) (talk | contribs)
Warning: This page is no longer in use. The information contained on the page should NOT be seen as relevant or reliable.

This page outlines the current thinking about implementing file storage and access in Moodle 2.0. It's a SPECIFICATION UNDER CONSTRUCTION!

This page has been replaced by File API and contains outdated information!

A lot of this has been brought over from the discussion about the Repository API which is intimately connected.

The page is open for everyone so everyone can help correct mistakes and help with the evolution of this document. However, if you have questions, problems to report or major changes to suggest please add them to the page comments, or start a discussion in the Repositories forum. We'll endeavour to merge all such suggestions into the main spec before we start development.


Objectives

  1. Allow files to be added directly into Moodle (as we do now)
  2. Remember where files came from
  3. Give modules control over the access to files
  4. Allow content to be used in multiple Moodle contexts securely and simply via capabilities
  5. Consistent and simple approach for ALL file handling throughout Moodle


Overview

The File API is a core set of interfaces that all Moodle code will use to:

  1. copy files into Moodle
  2. store files within Moodle
  3. display files to Moodle users

It applies only to "user" files. It will NOT apply to local files and caches created by Moodle, such as these directories in dataroot: temp, lang, cache, environment, filter, rss, search, sessions, upgradelogs, etc.

Use cases

Coming soon


General Architecture

All file-handling areas in Moodle (eg adding a new resource, adding attachments to a forum post, uploading assignments) will be rewritten to talk to the standard API class methods in a standard way.

When adding a file, the interface will allow a choice from the (active) Repository_API repository plugins, each of which is a subclass of the core Repository class. The default plugin is "local", which shows local files already in Moodle (and allows files to be added there from the desktop, much like the current file manager, which it replaces).

As is usual in Moodle, there will be admin settings to disable/enable certain repository plugins as standard, as well as user settings so that users can add their own personal repositories to the standard list (eg Yahoo Briefcase or Google Docs) and to select their default repository.

Once a file has been selected the file will almost always be copied into Moodle there and then. However there will also be options to:

  • only return the URL to the file if it's desired to keep it external (but this does present security and integrity risks), or
  • refresh the local file copy regularly and automatically
  • refresh the file manually from the File manager interface

All files in Moodle will be listed in a table (see below), allowing us to store various metadata about each file. The file contents will not be in the database (though we could easily offer that option if we want to); they will be on disk with a name related to the id rather than the "human" name (this avoids a lot of OS Unicode problems).

The module that is responsible for initiating the file will be remembered as the "owner" of that file, and will be responsible for access to that file afterwards, either by publishing the permissions required to access the file or by providing a callback function that can be used. For example, the assignment plugin may, after allowing a student to select a file to be submitted, add permissions so that people who have grade permissions in that assignment can read it. Or it may choose to provide a function that can do a more detailed check based on dates and so on.

All files will be served via a single control script in Moodle, located at $CFG->fileroot. This could be the same as $CFG->wwwroot by default, but it will be recommended (for security and to avoid XSS) that Moodle admins set up a second DNS name pointing to this script, eg the main site could be at http://moodle.domain.edu but files would be served via http://moodlefiles.domain.edu/file.php. (We'll have to set session cookies on both domains and keep them in sync somehow).

The file.php will serve files using slasharguments almost as now. See the section below on serving files for the details of this.

Local Files

In general, all external files will be copied locally and stored in Moodle. All existing files in the Moodle dataroot course areas will be moved into this new system during the upgrade.

The files will not be stored as they have been in the past. The new file system is "flat", with each file stored under a name calculated from the content (see below). The full name, path and other metadata will now be stored in tables.

The name of the file will be the SHA1 hash calculated from the content of the file using the PHP function sha1_file(). This results in names like: 231e2dc421be4fcd0172e5afceea3970e2f3d940.jpg
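
The naming rule can be sketched as follows. This is an illustration in Python rather than Moodle's PHP; the function names content_name and sha1_of_file are hypothetical, and passing the extension in explicitly is an assumption made to keep the example small.

```python
import hashlib


def content_name(data: bytes, extension: str = "") -> str:
    """Return the storage name: hex SHA-1 of the content plus an optional extension."""
    return hashlib.sha1(data).hexdigest() + extension


def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file on disk in chunks, so large files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Identical content maps to the same name, whatever the "human" filename was.
name_a = content_name(b"abc", ".jpg")
name_b = content_name(b"abc", ".jpg")
```

Because the name depends only on the content, two uploads of the same bytes end up with the same stored name.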

To avoid running out of operating system file nodes we'll use a directory structure with perhaps three levels (a very conservative max of 1000 nodes per directory allows a billion files to be stored):

dataroot 
   /files
      /0
      /1
      /2
        /0
        /1
        /2 
        /3
           /0
           /1
              /231e2dc421be4fcd0172e5afceea3970e2f3d940.jpg
      /3
      /4
      /5 
      /6
      /7
      ...
      /e
      /f

We should probably keep the mime-derived file extensions (eg .jpg) to help people who might be browsing the files directly for some reason.
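
A minimal sketch of how a stored name could be mapped to its directory. The tree above places 231e2dc4... under /2/3/1, so the sketch assumes that each directory level is one leading character of the hash; the spec does not state this rule explicitly, and the function name storage_path is hypothetical. Python is used for illustration only.

```python
from pathlib import PurePosixPath


def storage_path(dataroot: str, stored_name: str, levels: int = 3) -> PurePosixPath:
    """Build dataroot/files/<c1>/<c2>/<c3>/<stored_name> from the leading characters.

    Assumption: one hash character per directory level, as the example tree suggests.
    """
    if len(stored_name) < levels:
        raise ValueError("stored name is shorter than the number of directory levels")
    path = PurePosixPath(dataroot) / "files"
    for character in stored_name[:levels]:
        path = path / character
    return path / stored_name


example = storage_path("dataroot", "231e2dc421be4fcd0172e5afceea3970e2f3d940.jpg")
```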

The big advantage of using this scheme is that if two or more files have the same content, or the same file is used in different contexts, then there will only be one copy of the actual data. A simple clean-up function in cron could find and delete file data that is no longer referenced in the file table (or it could be part of the file API that deletes files).
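
The de-duplication and clean-up idea can be sketched with an in-memory model. Everything here is an assumption for illustration: the class ContentStore, its dictionaries (standing in for the files on disk and the file table) and the method signatures are hypothetical, and the code is Python rather than Moodle's PHP.

```python
import hashlib


class ContentStore:
    """Toy model: one copy of each distinct content, many file-table entries."""

    def __init__(self) -> None:
        self.blobs = {}       # content hash -> file data (stands in for files on disk)
        self.file_table = {}  # file id -> content hash (stands in for the file table)
        self._next_id = 1

    def add_file(self, data: bytes) -> int:
        digest = hashlib.sha1(data).hexdigest()
        self.blobs.setdefault(digest, data)  # store the data only once
        file_id = self._next_id
        self._next_id += 1
        self.file_table[file_id] = digest
        return file_id

    def delete_file(self, file_id: int) -> None:
        # Only the table entry goes away; the data is left for cleanup().
        del self.file_table[file_id]

    def cleanup(self) -> int:
        """Delete data that no table entry references; return how many were removed."""
        referenced = set(self.file_table.values())
        orphans = [digest for digest in self.blobs if digest not in referenced]
        for digest in orphans:
            del self.blobs[digest]
        return len(orphans)
```

Separating delete_file from cleanup mirrors the cron option mentioned above; the alternative of deleting data inside the file API would fold the reference check into delete_file.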

File serving

There will be at least three different scripts serving files (for full security).

Course files: file.php

This script is for general course files and backward compatibility for old modules.

  1. File gets a URL like: file.php/courseid/dir/dir/file.jpg
  2. File uses path and courseid to get the record from the file table for all information about this file.
  3. File uses fileid to get the current ACL from the file_access table.
  4. Each line of the ACL is checked; if any of them is true then access is given.
    1. Use has_capability with context and capability to check permissions
  5. If access is allowed then serve the file.
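
The ACL check in steps 3 to 5 can be sketched like this. The sketch is Python for illustration; AclEntry, can_access and the argument order of the has_capability stand-in are assumptions, not Moodle's actual signatures. The owner shortcut follows the rule in the file_access table description that users can always see their own files.

```python
from typing import Callable, Iterable, NamedTuple


class AclEntry(NamedTuple):
    """One row of the file_access table for a given file."""
    contextid: int
    capability: str


def can_access(
    acl: Iterable[AclEntry],
    owner_id: int,
    user_id: int,
    has_capability: Callable[[str, int, int], bool],
) -> bool:
    """Grant access if the user owns the file or any ACL line is satisfied."""
    if user_id == owner_id:
        return True
    return any(
        has_capability(entry.capability, entry.contextid, user_id) for entry in acl
    )


# Hypothetical stand-in for the real capability check, backed by a fixed set.
GRANTS = {("mod/assignment:grade", 12, 7)}


def fake_has_capability(capability: str, contextid: int, user_id: int) -> bool:
    return (capability, contextid, user_id) in GRANTS
```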

Module files: modfile.php

This script is a new one giving modules "ownership" over files and complete control (if required) over their access. Modules will generally provide a callback function to determine access to a file.

  1. File gets a URL like: modfile.php/contextid/dir/dir/file.jpg
  2. File uses path and contextid to get the record from the file table for all information about this file.
  3. File uses fileid to get the current ACL from the file_access table.
  4. Each line of the ACL is checked; if any of them is true then access is given.
    1. If the capability is "function/accessfunction" then file.php looks for a function called accessfunction in the module's lib.php to return true/false.
    2. Otherwise use has_capability with context and capability to check permissions
  5. If access is allowed then serve the file.
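
A minimal sketch of the dispatch in step 4, in Python for illustration. The callback signature (contextid, user_id), the dictionary standing in for the functions in a module's lib.php, and the choice to deny access when the named function is missing are all assumptions; the spec only says the function returns true/false.

```python
from typing import Callable, Dict

FUNCTION_PREFIX = "function/"


def check_entry(
    capability: str,
    contextid: int,
    user_id: int,
    module_functions: Dict[str, Callable[[int, int], bool]],
    has_capability: Callable[[str, int, int], bool],
) -> bool:
    """Evaluate one ACL line: module callback if prefixed, capability check otherwise."""
    if capability.startswith(FUNCTION_PREFIX):
        function_name = capability[len(FUNCTION_PREFIX):]
        callback = module_functions.get(function_name)
        if callback is None:
            return False  # assumption: a missing callback denies access
        return bool(callback(contextid, user_id))
    return has_capability(capability, contextid, user_id)


# Hypothetical module callback: only user 7 may read files in context 12.
def accessfunction(contextid: int, user_id: int) -> bool:
    return contextid == 12 and user_id == 7


def deny_all(capability: str, contextid: int, user_id: int) -> bool:
    return False
```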

This should work for all kinds of modules (not just activity modules)... we need to make that efficient.

User files: userfile.php

This script is a new one for users to share personal files (eg could be images embedded in HTML). The only security is public/private.

  1. File gets a URL like: userfile.php/userid/dir/dir/file.jpg
  2. File uses path and userid to get the record from the file table for all information about this file.
  3. If the file is "private" then require current user to match userid. Otherwise the file is "public" and no check is performed.
  4. Serve the file
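
The steps above can be sketched as follows, in Python for illustration. The names UserFileRequest, parse_slasharguments and may_serve are hypothetical, as is the assumption that the script receives the part of the URL after userfile.php as a single path string.

```python
from typing import NamedTuple, Optional


class UserFileRequest(NamedTuple):
    userid: int
    path: str


def parse_slasharguments(pathinfo: str) -> UserFileRequest:
    """Split '/userid/dir/dir/file.jpg' into the user id and the virtual path."""
    parts = pathinfo.strip("/").split("/")
    if len(parts) < 2 or not parts[0].isdigit():
        raise ValueError("expected /userid/path/to/file")
    return UserFileRequest(int(parts[0]), "/".join(parts[1:]))


def may_serve(owner_id: int, is_private: bool, current_user_id: Optional[int]) -> bool:
    """Public files need no check; private files require the owner to be logged in."""
    if not is_private:
        return True
    return current_user_id == owner_id
```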

From the interface point of view, this is a user repository of files for casual use, governed by quotas.

  1. User goes to the "Files" area (similar to current Moodle, but for all users)
  2. User can upload/download/rename/move/delete their own set of files.
  3. Each of these files can either be marked PRIVATE or PUBLIC (to everyone).
  4. A second tab on that page allows everyone to browse all the public files from everyone else.
  5. A user's public files can also be listed near their profile.
  6. The listings are smart about showing various media

Tables

file

This table contains one entry for every file. Enough information is kept here so that the file can be fully identified and retrieved again if necessary.

Field            Type          Default  Info
id               int(10)                autoincrementing
filename         varchar                The full Unicode name of this file
mimetype         varchar                Full mimetype of the file (or should we rely on extension?)
userid           int(10)                The user id of the person who created this entry
modulename       varchar(255)           The module that is the "owner" of this file (eg "moodle" or "mod/assignment" or "blocks/html")
moduleinstance   int(10)                The instance of the module that is the "owner" of this file (eg assignment id 5)
modulecallback   boolean                If true then file.php will use a class in modulename/modfile.php to determine access
originalfileid   int(10)                The id of the file which this is a copy of. If this is set, then all changes to the parent will be copied to this entry too.
repositoryid     int(10)                The repository instance this is associated with, see Repository_API
updates          int(10)                Specifies the update schedule (0 = none, 1 = on demand, other = some period in seconds)
cachetime        int(10)                Specifies how long this file can be cached by browsers
moodlepath       text                   The virtual path to the file locally (so we can still have apparent subdirectories etc)
repositorypath   text                   The full path to the original file on the repository
timeimportfirst  int(10)                The first time this file was imported into Moodle
timeimportlast   int(10)                The most recent time that this file was imported into Moodle
timecreated      int(10)                The time this file was created (if known), otherwise same as time imported
timemodified     int(10)                The last time the file was modified
timeaccessed     int(10)                The last time this file was accessed for any reason
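
One possible reading of the updates field, sketched in Python for illustration. The constant names and the function automatic_refresh_due are hypothetical, and measuring the period from timeimportlast is an assumption; the table only defines the three kinds of value.

```python
UPDATE_NONE = 0       # never refresh the local copy
UPDATE_ON_DEMAND = 1  # refresh only when someone asks for it manually


def automatic_refresh_due(updates: int, timeimportlast: int, now: int) -> bool:
    """Return True when a periodic refresh should run.

    Any other value of `updates` is treated as a period in seconds, counted
    from the most recent import (assumption).
    """
    if updates in (UPDATE_NONE, UPDATE_ON_DEMAND):
        return False
    return now - timeimportlast >= updates
```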

file_access

This table describes the ACL for each file, so that checks can easily be made on whether someone can see this file or not. Note there can be multiple entries per file. Users can ALWAYS see their own files, so there are no entries here for that.

Field            Type          Default  Info
id               int(10)                autoincrementing
fileid           int(10)                The file we are defining access for
contextid        int(10)                The context where this file is being published
capability       text                   The capability that is required to see this file.

Class methods

File class

This class implements the display and management of files from local storage, with full access checking. Some of the functions are for single files, while some are optimised for bulk display and searching (eg in the personal files interface).

add_file

delete_file

modify_file

...

Areas in Moodle that need re-writing

See also