Talk:File API internals

Some quick questions to avoid forgetting them:

1) Will these be under the control of the File API (or, as they are now, fixed local storage)?

- dataroot/temp
- dataroot/lang
- dataroot/cache
- dataroot/environment
- dataroot/filter
- dataroot/rss
- dataroot/search
- dataroot/sessions
- dataroot/upgradelogs

Martin Dougiamas 03:20, 28 April 2008 (CDT) : I don't see these as being in the API - I've updated the spec.

2) Assuming we'll have a cool OOP FileAPI...

- a) Will it support different FileAPI classes (to be able to store in other systems)?
- b) Will it support multiple FileAPI classes working together (like the Repo)?

Martin Dougiamas 03:20, 28 April 2008 (CDT) : Hmm, I suppose it makes sense to be able to switch the backend from local file storage to something else (e.g. database storage), but multiple file storage places don't make sense to me; that is what the Repository API and the Portfolio API are for.

3) I've annotated in red some things that have sounded strange in my first look.

Martin Dougiamas 03:20, 28 April 2008 (CDT) : Thanks, all fixed.

4) Are we going to have "directory records" in the implementation, or is that going to be handled exclusively by the "moodlepath" column?

Martin Dougiamas 03:20, 28 April 2008 (CDT) : Good point. I was thinking of moodlepath only but I wonder if directory records might be more efficient. New table, I guess.

5) One general question... are we going to "force" all modules to be "autocontained" ? How are we going to handle resources, for example (with all those css, links, images..). In general how are we going to handle multiple-file packages?

Martin Dougiamas 03:20, 28 April 2008 (CDT) : They'll be a set of files, probably in a "directory" specified by moodlepath or a directory record. What problems do you see? Should we retain better knowledge of the original group of files?


That's all for now, ciao4niao :-) Eloy Lafuente (stronk7) 21:27, 5 April 2008 (CDT)


Making Storage 'content addressable'

One opportunity which this API opens up is the possibility of making the actual storage of files 'content addressable'. That is, if two users upload the same image, for example, the file is stored only once on disk. This reduces the amount of storage and improves caching (especially in the increasingly common situation where the Moodle data store is served from an NFS directory or similar remote storage). To do this we could compute a hash such as SHA-1 of the file and store the file on disk named by its hash (rather than some arbitrary id). Then when someone uploads a file that has already been uploaded, the hash matches and we just point the database record to the same file on disk. This technique is increasingly being used by enterprise-style repositories as well as things like git. I can see major benefits for things like SCORM packages and other such things which have hundreds of small duplicate files stored multiple times per package and course, so sometimes you can have the same image file stored 20 different times across 20 different packages in one course, which is then duplicated for multiple classes, etc. --Dan Poltawski 07:32, 1 June 2008 (CDT)

Excellent idea, Dan, it's in. Martin Dougiamas 01:43, 20 June 2008 (CDT)
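
For illustration, the content-addressable idea above could look roughly like the following. This is only a minimal sketch in Python, not the planned implementation: the `store_file` helper, the flat pool directory and the `records` dictionary standing in for the database are all assumptions made for the example.

```python
import hashlib
import os
import tempfile


def store_file(pool_dir: str, content: bytes) -> str:
    """Store content under its SHA-1 hash and return the hash.

    If a file with the same hash already exists, nothing is written.
    """
    digest = hashlib.sha1(content).hexdigest()
    path = os.path.join(pool_dir, digest)
    if not os.path.exists(path):
        with open(path, "wb") as handle:
            handle.write(content)
    return digest


# Hypothetical stand-in for the database: one record per upload,
# each pointing at a content hash rather than at a private copy.
pool = tempfile.mkdtemp()
records = {
    "course1/logo.png": store_file(pool, b"same image bytes"),
    "course2/banner.png": store_file(pool, b"same image bytes"),
    "course2/other.png": store_file(pool, b"different bytes"),
}
stored = sorted(os.listdir(pool))
```

Three uploads produce three records but only two files on disk, because the two identical uploads share one hash.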

Specific File Attachments?

We have entries for 'moduleinstance', do we need entries to identify files per other attachments? Such as forum posts, wiki attachments, database attachments, assignment submissions? Mike Churchward 16:27, 9 June 2008 (CDT)

I'm not sure if we need to have such links back to the exact forum post 
or glossary entry.  The idea is that the forum posts (say) would reference 
the file->id.  Would it be useful to have back links too ...?   
Martin Dougiamas 22:30, 22  June 2008 (CDT)

Squid

Consider proxy support. --Helen Foster 16:53, 9 June 2008 (CDT)

Expanding on this from the hackfest discussion - since we are hashing for storage anyway, we could serve the sha1 hash as the Etag for every file quite easily. --Dan Poltawski 04:16, 25 June 2008 (CDT)

Agree, although I'd rehash it, to avoid exposing info about internal "filenames" in new storage over the web. Bit paranoid but ... Eloy Lafuente (stronk7) 04:27, 25 June 2008 (CDT)
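
A rough sketch of the rehashing suggestion, in Python. The `etag_for` name and the site-specific salt are assumptions for illustration only; the thread does not say how the rehash would be done.

```python
import hashlib


def etag_for(content_hash: str, secret: str = "site-specific-salt") -> str:
    """Derive a quoted ETag from the internal content hash without exposing it.

    The salt value is a placeholder (assumption); a real site would use its own.
    """
    rehashed = hashlib.sha1((secret + content_hash).encode("utf-8")).hexdigest()
    return '"' + rehashed + '"'


content_hash = hashlib.sha1(b"file contents").hexdigest()
etag = etag_for(content_hash)
```

The same content always yields the same ETag, so caching still works, but the value served over the web is not the name used in storage.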

Batch uploads (zips)

Should we require that the File API (optionally) provide some form of batch file upload or zip/unzip function? Mike Churchward 17:14, 9 June 2008 (CDT)

Yeah, it definitely needs to handle zipped files nicely ... Martin Dougiamas 22:33, 22  June 2008 (CDT)

Skodak's rants

The API should be split into several independent parts:

  1. File serving API
    1. file.php
    2. pluginfile.php
    3. userfile.php
    4. rssfile.php
  2. File storage API
    1. optional access control
    2. optional repo sync
  3. File management API
    1. File browsing
    2. File linking (editor integration)
    3. Upload from repository

File serving API

Deals with the serving of files: the browser requests a file and Moodle sends it back. We have four main files. It is important to set up slasharguments on the server (file.php/some/thing/xxx.jpg); any content that relies on relative links (SCORM, uploaded HTML pages, etc.) cannot work without it.

file.php

Serves course files. It would be nice to have some special hardcoded protection of backup files, preventing backup file downloads/uploads; backups contain a lot of personal info. We could block restoring of backups from other sites too.

Implements basic file access. Ideally only images and files linked from course sections should be there. No XSS protection is required: we expect JavaScript, swf, etc. there, so there is no way to make it "secure". The access control is not critical any more if we move most of the files into modules.

/file.php/courseid/dir/dir/filename.ext

pluginfile.php

(aka modfile.php) Sends module, block, question files.

  • modules decide about access control
  • optional XSS protection - student submitted files must not be served with normal headers, we have to force download instead; ideally there should be a second wwwroot for serving untrusted files
  • only internal links to selected areas are supported - you can link images in summary area, but not the assignment submissions

Absolute file links need to be rewritten if HTML editing is allowed in the module. The links are stored internally as relative links. Before editing or display, the internal link representation is converted to absolute links using a simple str_replace(): @@thipluginlink/summary@@/image.jpg --> /pluginfile.php/assignmentcontextid/summary/image.jpg. It is converted back to internal links before saving.
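
The round trip described above can be sketched as follows. This is an illustrative Python version of the str_replace() idea, not actual Moodle code; the function names are made up, and the placeholder is spelled exactly as in the paragraph above.

```python
PLACEHOLDER = "@@thipluginlink/summary@@"  # spelled as in the text above


def to_absolute(html: str, contextid: int) -> str:
    """Expand the internal placeholder before editing or display."""
    return html.replace(PLACEHOLDER, f"/pluginfile.php/{contextid}/summary")


def to_internal(html: str, contextid: int) -> str:
    """Collapse absolute links back to the placeholder before saving."""
    return html.replace(f"/pluginfile.php/{contextid}/summary", PLACEHOLDER)


stored = '<img src="@@thipluginlink/summary@@/image.jpg">'
shown = to_absolute(stored, 42)
saved_again = to_internal(shown, 42)
```

Because the stored form never contains the context id, the text stays valid if the module ends up under a different context id.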

/pluginfile.php/contextid/areaname/arbitrary/params/or/dirs/filename.ext

pluginfile.php detects the type of plugin from context table, fetches basic info (like $course or $cm if appropriate) and calls plugin function (or later method) which does the access control and finally sends the file to user. areaname separates files by type and divides the context into several subtrees - for example summary files (images used in module intros), post attachments, etc.
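
As a sketch of the dispatch described above (Python, for illustration only): the path is split into context id, area name, extra parameters and file name, the plugin type is looked up from the context, and the plugin's own function decides what happens. `CONTEXT_TABLE`, `HANDLERS` and the handler itself are hypothetical stand-ins, not existing Moodle functions.

```python
from typing import Callable, Dict, List, NamedTuple


class PluginFileRequest(NamedTuple):
    contextid: int
    areaname: str
    args: List[str]  # arbitrary params or directories
    filename: str


def parse_pluginfile_path(path: str) -> PluginFileRequest:
    """Split /pluginfile.php/contextid/areaname/.../filename.ext into its parts."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 4 or parts[0] != "pluginfile.php":
        raise ValueError("not a pluginfile.php path: " + path)
    return PluginFileRequest(int(parts[1]), parts[2], parts[3:-1], parts[-1])


# Hypothetical stand-in for the context table: contextid -> plugin type.
CONTEXT_TABLE: Dict[int, str] = {17: "assignment"}


def assignment_handler(req: PluginFileRequest) -> str:
    # The plugin decides about access control; here only summary files are public.
    if req.areaname == "summary":
        return "send " + req.filename
    return "deny"


HANDLERS: Dict[str, Callable[[PluginFileRequest], str]] = {
    "assignment": assignment_handler,
}


def serve(path: str) -> str:
    req = parse_pluginfile_path(path)
    plugin = CONTEXT_TABLE[req.contextid]
    return HANDLERS[plugin](req)
```

For example, `serve("/pluginfile.php/17/summary/someimage.jpg")` is routed to the assignment handler, which allows it, while a submission path is denied by the same handler.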

blog example

Blog entries, and notes in general, do not have a context id because they live in the system context (SYSCONTEXTID below is the id of the system context). Note attachments are always served with XSS protection on; ideally we should use a separate wwwroot for this. Access control can be hardcoded.

/pluginfile.php/SYSCONTEXTID/blog/blogentryid/attachmentname.ext

assignment example

/pluginfile.php/assignmentcontextid/summary/someimage.jpg
/pluginfile.php/assignmentcontextid/submission/userid/attachmentname.ext
/pluginfile.php/assignmentcontextid/extra/allsubmissionfiles.zip

scorm example

/pluginfile.php/scormcontextid/summary/someimage.jpg
/pluginfile.php/scormcontextid/content/revisionnumber/dir/somescormfile.js

The revision counter is incremented when any file changes in order to prevent caching problems. The lifetime should be adjustable in module settings.
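
A small sketch of why the revision number in the URL avoids stale caches (Python, illustrative only; the `ScormPackage` class is a hypothetical holder, not a Moodle class):

```python
class ScormPackage:
    """Hypothetical holder for a package's context id and revision counter."""

    def __init__(self, contextid: int) -> None:
        self.contextid = contextid
        self.revision = 1

    def file_changed(self) -> None:
        # Any change to any file bumps the revision, which changes every URL.
        self.revision += 1

    def url(self, relpath: str) -> str:
        return f"/pluginfile.php/{self.contextid}/content/{self.revision}/{relpath}"


package = ScormPackage(contextid=31)
before = package.url("dir/somescormfile.js")
package.file_changed()
after = package.url("dir/somescormfile.js")
```

Since the URL itself changes with each revision, browsers and proxies can cache each revision for a long time without ever serving outdated content.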

questions example

/pluginfile.php/SYSCONTEXTID/question/questionid/file.jpg

quiz example

/pluginfile.php/quizcontextid/summary/niceimage.jpg
/pluginfile.php/quizcontextid/report/type/export.ods

userfile.php

Personal file storage, intended as online storage for work in progress, such as assignments before submission.

  • read/write own files only for now
  • option to share with others later
  • personal "websites" will not be supported (security)

/userfile.php/userid/dir/dir/filename.ext

rssfile.php

Replaces rss/file.php, which is kept only for backwards compatibility. RSS files should not require sessions/cookies; URLs should contain some sort of security token/key. Internally the files may be stored in the database or together with other files. Etag support should be implemented to improve performance.

/rssfile.php/contextid/any/parameters/module/wants/rss.xml
/rssfile.php/SYSCONTEXTID/blog/userid/rss.xml

Again modules and plugins decide what gets sent to user.
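
One possible shape for the "security token/key" mentioned above, sketched in Python. Everything here is an assumption for illustration: the text does not say how the token would be built or where in the URL it would go. The sketch uses an HMAC over the user id and feed path with a site secret, so no session or cookie is needed to check it.

```python
import hashlib
import hmac

SITE_SECRET = b"hypothetical-site-secret"  # assumption: some per-site secret


def rss_token(userid: int, path: str) -> str:
    """Token binding a user id to one feed path."""
    message = f"{userid}:{path}".encode("utf-8")
    return hmac.new(SITE_SECRET, message, hashlib.sha256).hexdigest()


def token_is_valid(userid: int, path: str, token: str) -> bool:
    expected = rss_token(userid, path)
    # Constant-time comparison to avoid leaking how much of the token matched.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


feed = "/rssfile.php/1/blog/7/rss.xml"
token = rss_token(7, feed)
```

A token issued for one feed does not validate for another feed or another user.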

Physical file storage

  • must be efficient - we have a lot of duplicate files
  • support utf-8 on all platforms
  • reasonably fast

TODO

Temporary files

TODO

Legacy file serving

Going to use good-old separate directories in moodledata.

  1. user avatars
  2. group avatars
  3. tex, algebra
  4. rss cache (?full rss rewrite soon?)

File storage API

File contents are stored in moodledata/filepool using sha1 hashes instead of file names.

file table

This table contains one entry for every file. Enough information is kept here so that the file can be fully identified and retrieved again if necessary.

| Field | Type | Default | Info |
|---|---|---|---|
| id | int(10) | | autoincrementing |
| sha1hash | varchar(40) | | The sha1 hash of content. |
| contextid | int(10) | | The context id defined in context table - identifies the instance of plugin owning the file. |
| instanceid | int(10) | | Optional - some plugin specific instance id (eg. forum post, blog entry or assignment submission, user id for user files) |
| plugin | varchar(255) | | The module that is the "owner" of this file (eg "moodle", "blog", "mod/assignment" or "blocks/html") |
| filetype | varchar(255) | | Like submissions, filemanager files (images and swf linked from summaries), etc. |
| filename | varchar(255) | | The full Unicode name of this file (case sensitive) |
| filepath | text | NULL | Optional - relative path to file from module content root, useful in Scorm and Resource mod - most of the mods do not need this |
| timecreated | int(10) | | The time this file was created (if known), otherwise same as time imported |
| timemodified | int(10) | | The last time the file was modified |
| filesize | int(10) | | size of file - bytes |
| userid | int(10) | NULL | Optional - general id field - meaning depending on plugin |

index on "contextid, instanceid"
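
To make the table concrete, here is a sketch of the same columns and index in SQLite via Python. It is only an approximation: the type mapping, the NOT NULL choices and the index name are assumptions, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    sha1hash VARCHAR(40) NOT NULL,
    contextid INTEGER NOT NULL,
    instanceid INTEGER,
    plugin VARCHAR(255) NOT NULL,
    filetype VARCHAR(255) NOT NULL,
    filename VARCHAR(255) NOT NULL,
    filepath TEXT NULL,
    timecreated INTEGER NOT NULL,
    timemodified INTEGER NOT NULL,
    filesize INTEGER NOT NULL,
    userid INTEGER NULL
);
CREATE INDEX file_context_instance ON file (contextid, instanceid);
""")

# Two invented submissions with identical content: two rows, one sha1hash.
rows = [
    ("a" * 40, 17, 5, "mod/assignment", "submission", "essay.odt", 100, 100, 2048, 5),
    ("a" * 40, 17, 6, "mod/assignment", "submission", "copy.odt", 200, 200, 2048, 6),
]
conn.executemany(
    "INSERT INTO file (sha1hash, contextid, instanceid, plugin, filetype,"
    " filename, timecreated, timemodified, filesize, userid)"
    " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# The typical lookup, served by the (contextid, instanceid) index.
names = [r[0] for r in conn.execute(
    "SELECT filename FROM file WHERE contextid = ? AND instanceid = ?", (17, 5))]
distinct_contents = conn.execute(
    "SELECT COUNT(DISTINCT sha1hash) FROM file").fetchone()[0]
```

Each row is one logical file; rows that share a sha1hash share one physical file in the pool.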

file_metadata table

This table contains extra metadata about files. Repositories could provide this, or it could be manually edited in the local copy.

| Field | Type | Default | Info |
|---|---|---|---|
| id | int(10) | | autoincrementing |
| fileid | int(10) | | Id of file. |
| name | varchar(255) | | The name of extra metadata |
| value | text | | Value |

file_acl

This table describes an optional ACL for a file. This is not required in the majority of cases: modules usually hardcode the file access logic, and course files should not be used much any more.

| Field | Type | Default | Info |
|---|---|---|---|
| id | int(10) | | autoincrementing |
| fileid | int(10) | | The file we are defining access for |
| contextid | int(10) | | The context where this file is being published |
| capability | text | | The capability that is required to see this file. |

acl notes

  • this is missing some concept similar to user/group/others; for example, in the case of user files a typical user cannot assign permissions or view them, so the ACL becomes useless there
  • it is more important to synchronise the availability of the file link and the file itself - a link pointing to an inaccessible file and a file which is accessible when not wanted are both problems
  • browser/proxy caching works against us here - "secret" files should not be cached

repository_sync table

This table contains information on how to synchronise data with repositories. Data would be synchronised from cron.php or on demand from the file manager. The sync would be one way only (repository-->local file).

| Field | Type | Default | Info |
|---|---|---|---|
| id | int(10) | | autoincrementing |
| fileid | int(10) | | Id of file. |
| repositoryid | int(10) | | The repository instance this is associated with, see Repository_API |
| updates | int(10) | | Specifies the update schedule (0 = none, 1 = on demand, other = some period in seconds) |
| repositorypath | text | | The full path to the original file on the repository |
| timeimportfirst | int(10) | | The first time this file was imported into Moodle |
| timeimportlast | int(10) | | The most recent time that this file was imported into Moodle |
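
The `updates` schedule could be interpreted along these lines. This is a Python sketch under assumptions: the `sync_is_due` name is made up, and it assumes periodic entries are refreshed only once the period has elapsed since `timeimportlast`, while "on demand" entries are refreshed only when the file manager asks.

```python
def sync_is_due(updates: int, timeimportlast: int, now: int,
                on_demand: bool = False) -> bool:
    """Decide whether a file should be re-imported from its repository.

    updates: 0 = none, 1 = on demand, other = period in seconds.
    """
    if updates == 0:
        return False
    if updates == 1:
        return on_demand
    # Periodic: due once the period has elapsed since the last import.
    return now - timeimportlast >= updates
```

A cron run would call this with `on_demand=False` for every row, and the file manager with `on_demand=True` for the file the user asked to refresh.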

File management API

This section describes the following:

  1. interactions with html editor
  2. file manager
  3. interactions with repositories

Major problems

  1. unicode chars in zip files

Some little comments to be considered (to avoid forgetting them)

  • each context will have its own "file manager"
  • separate "file manager context" files (FMF) and "internal context" (ICF) files (current modedit files, submissions, attachments...)
  • /pluginfile.php/SYSCONTEXTID/{blog|question} and so... will have own FMF too? Or only ICF ?
  • rssfile.php: I'd support both Etag (cool) and Last-Modified (more used), when we receive If-None-Match/If-Modified-Since => 304
  • Way to migrate
  • Way to copy between contexts
  • Links = -1 for them
  • Deletion strategy (locks, quarantine status...)
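
For the rssfile.php note in the list above (Etag plus Last-Modified, answering 304 on If-None-Match/If-Modified-Since), a simplified sketch in Python. The `respond` function and its return shape are assumptions for illustration; it ignores weak validators and `*`, and gives If-None-Match precedence when both headers are present.

```python
from email.utils import formatdate, parsedate_to_datetime
from typing import Dict, Tuple


def respond(request_headers: Dict[str, str], etag: str, timemodified: int,
            body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), using 304 when the client copy is current."""
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(timemodified, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        not_modified = etag in candidates
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparsable date: treat as a normal request
        not_modified = since is not None and timemodified <= since
    else:
        not_modified = False

    if not_modified:
        return 304, headers, b""
    return 200, headers, body
```

A first request gets 200 with both validators; a repeat request that echoes either validator back gets an empty 304.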