Talk:File API internals: Difference between revisions

From MoodleDocs
m (moved Talk:File API to Talk:File API internals: We need the URL for something more useful Using_the_File_API)
 
Some quick questions to avoid forgetting them:

1) Will they be under the control of FileAPI (or, as they are now, fixed local storage)?

- dataroot/temp
- dataroot/lang
- dataroot/cache
- dataroot/environment
- dataroot/filter
- dataroot/rss
- dataroot/search
- dataroot/sessions
- dataroot/upgradelogs

[[User:Martin Dougiamas|Martin Dougiamas]] 03:20, 28 April 2008 (CDT) : I don't see these as being in the API - I've updated the spec.

2) Assuming we'll have a cool OOP FileAPI...

- a) Will it support different FileAPI classes (to be able to store in other systems)?
- b) Will it support multiple FileAPI classes working together (like the Repo)?

[[User:Martin Dougiamas|Martin Dougiamas]] 03:20, 28 April 2008 (CDT) : Hmm, I suppose it makes sense to be able to switch the backend from local file storage to something else (eg database storage), but multiple file storage places don't make sense to me; that is the Repository API and the Portfolio API.

3) I've annotated in red some things that sounded strange on my first look.

[[User:Martin Dougiamas|Martin Dougiamas]] 03:20, 28 April 2008 (CDT) : Thanks, all fixed.

4) Are we going to have "directory records" in the implementation, or is that going to be handled exclusively by the "moodlepath" column?

[[User:Martin Dougiamas|Martin Dougiamas]] 03:20, 28 April 2008 (CDT) : Good point. I was thinking of moodlepath only but I wonder if directory records might be more efficient. New table, I guess.

5) One general question... are we going to "force" all modules to be "autocontained"? How are we going to handle resources, for example (with all those css, links, images...)? In general, how are we going to handle multiple-file packages?

[[User:Martin Dougiamas|Martin Dougiamas]] 03:20, 28 April 2008 (CDT) : they'll be a set of files, probably in a "directory" specified by moodlepath of directory record. What problems do you see? Should we retain better knowledge of the original group of files?

That's all for now, ciao4niao :-) [[User:Eloy Lafuente (stronk7)|Eloy Lafuente (stronk7)]] 21:27, 5 April 2008 (CDT)

===Main tasks===
* File Storage API:
** abstract (M1)
** local pool implementation (M1)
** DB schema (M1)
** deletion, acls, metadata (M1)
** problem: empty directories, file overwriting
* File Manager API:
** unique class, able to handle one "file area" (M2)
** security (M2)
** hack old file manager to be able to work with new fileareas (M3)
** js and non js implementations of FileManager (M4)
** integration with editor (M4)
** integration with formslib (M4)
** integration with repos (M4)
** problem: zip support
* File Serving:
** from pool:
*** file.php
*** pluginfile.php
*** draftfile.php
*** userfile.php
** from other moddata places:
*** rssfile.php
*** user/pix.php
*** user/pixgroup.php
* Migration:
** course files (as much as possible, allow fallback) (M2)
** moddata
* Backup & restore:

===Milestones===
M1: File storage API completed (this week)
M2: migration of course files + new filephp + FileManager + hacked old file manager (next monday)
M3: ...
== Making Storage 'content addressable' ==
One opportunity which this API opens up is the possibility of making the actual storage of files 'content addressable'. That is, if two users upload the same image, for example, the file is only stored once on disk. This brings benefits in reducing the amount of storage and improving caching (especially in the increasingly common situation where the Moodle data store is served from an NFS directory or similar remote storage). To do this we could use a hash like SHA-1 on the file and store the file on disk named by its hash (rather than some arbitrary id). Then when someone uploads the same file as has already been uploaded, the hash matches and we just point the database record to the same file on disk. This technique is increasingly being used by enterprise-style repositories as well as things like git. I can see major benefits for things like SCORM packages and other such things which have hundreds of small duplicate files stored multiple times per package and course, so sometimes you can have the same image file stored 20 different times across 20 different packages in one course, which is then duplicated for multiple classes, etc. --[[User:Dan Poltawski|Dan Poltawski]] 07:32, 1 June 2008 (CDT)
Excellent idea, Dan, it's in  [[User:Martin Dougiamas|Martin Dougiamas]] 01:43, 20 June 2008 (CDT)
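For illustration, the idea above could be sketched roughly as follows in Python. This is only a toy, not the Moodle implementation: the <code>FilePool</code> class, its flat directory layout and the <code>demo</code> helper are all hypothetical names made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


class FilePool:
    """Toy content-addressable pool: each blob is stored once, named by its SHA-1."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def add(self, content: bytes) -> str:
        """Store content if it is not already present and return its hash."""
        digest = hashlib.sha1(content).hexdigest()
        path = self.root / digest
        if not path.exists():  # an identical upload reuses the existing blob
            path.write_bytes(content)
        return digest

    def read(self, digest: str) -> bytes:
        """Fetch the content previously stored under the given hash."""
        return (self.root / digest).read_bytes()


def demo() -> Tuple[str, str, int]:
    """Upload the same bytes twice and count how many blobs end up on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        pool = FilePool(Path(tmp) / "filepool")
        first = pool.add(b"same image bytes")
        second = pool.add(b"same image bytes")
        stored = len(list(pool.root.iterdir()))
        return first, second, stored
```

Both uploads in <code>demo()</code> return the same hash and only one blob is written; the database record for each upload would then just point at that hash.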
== Specific File Attachments? ==
We have entries for 'moduleinstance', do we need entries to identify files per other attachments? Such as forum posts, wiki attachments, database attachments, assignment submissions?
[[User:Mike Churchward|Mike Churchward]] 16:27, 9 June 2008 (CDT)
I'm not sure if we need to have such links back to the exact forum post
or glossary entry.  The idea is that the forum posts (say) would reference
the file->id.  Would it be useful to have back links too ...? 
[[User:Martin Dougiamas|Martin Dougiamas]] 22:30, 22  June 2008 (CDT)
==Squid==
Consider proxy support. --[[User:Helen Foster|Helen Foster]] 16:53, 9 June 2008 (CDT)
Expanding on this from the hackfest discussion - since we are hashing for storage anyway, we could
serve the sha1 hash as the Etag for every file quite easily. --[[User:Dan Poltawski|Dan Poltawski]] 04:16, 25 June 2008 (CDT)
:Agree, although I'd rehash it, to avoid exposing info about internal "filenames" in new storage over the web. Bit paranoid but ... [[User:Eloy Lafuente (stronk7)|Eloy Lafuente (stronk7)]] 04:27, 25 June 2008 (CDT)
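A rough Python sketch of the two suggestions above (use the stored hash as the ETag, but re-hash it first so the pool name is not exposed). The function names are hypothetical, and the comparison is deliberately simplified: a real handler would also need to deal with lists of tags and weak validators in <code>If-None-Match</code>.

```python
import hashlib
from typing import Optional


def make_etag(content_hash: str) -> str:
    """Derive a quoted ETag by re-hashing the stored SHA-1 of the content."""
    rehashed = hashlib.sha1(content_hash.encode("ascii")).hexdigest()
    return '"' + rehashed + '"'


def conditional_status(content_hash: str, if_none_match: Optional[str]) -> int:
    """Return 304 when the client's cached tag still matches, otherwise 200."""
    if if_none_match is not None and if_none_match == make_etag(content_hash):
        return 304
    return 200
```

Because the ETag only changes when the content hash changes, a proxy such as Squid can revalidate cheaply without the file being resent.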
== Batch uploads (zips) ==
Should we require that the File API (optionally) provide some form of batch file upload or zip/unzip function?
[[User:Mike Churchward|Mike Churchward]] 17:14, 9 June 2008 (CDT)
Yeah, it definitely needs to handle zipped files nicely ... [[User:Martin Dougiamas|Martin Dougiamas]] 22:33, 22  June 2008 (CDT)
=Skodak's rants=
The API should be split into several independent parts:
# File serving API
## file.php
## pluginfile.php
## userfile.php
## rssfile.php
# File storage API
## optional access control
## optional repo sync
# File management API
## File browsing
## File linking (editor integration)
## Upload from repository
==File serving API==
Deals with serving of files - browser requests file, Moodle sends it back. We have three main files. It is important to set up slasharguments on the server (file.php/some/thing/xxx.jpg); any content that relies on relative links cannot work without it (scorm, uploaded html pages, etc.).
===file.php===
Serves course files.
It would be nice to have some special hardcoded protection of backup files - preventing backup file downloads/uploads; backups contain a lot of personal info, we could block restoring of backups from other sites too.
Implements basic file access. Ideally only images and files linked from course sections should be there, no XSS protection required - we expect javascript, sw, etc. there, no way to make it "secure". The access control is not critical any more if we move most of the files into modules.
/file.php/courseid/dir/dir/filename.ext
===pluginfile.php===
(aka modfile.php)
Sends module, block, question files.
* modules decide about access control
* optional XSS protection - student submitted files must not be served with normal headers, we have to force download instead; ideally there should be second wwwroot for serving of untrusted files
* only internal links to selected areas are supported - you can link images in summary area, but not the assignment submissions
Absolute file links need to be rewritten if HTML editing is allowed in the module. The links are stored internally as relative links. Before editing or display, the internal link representation is converted to absolute links using a simple str_replace(): @@thipluginlink/summary@@/image.jpg --> /pluginfile.php/assignmentcontextid/summary/image.jpg. It is converted back to internal links before saving.
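The round trip described above might look like this as a Python sketch. The placeholder spelling and the context id 42 used in the test values are made up for the example; only the plain string-replace idea comes from the text.

```python
def to_absolute(html: str, placeholder: str, base_url: str) -> str:
    """Expand the internal placeholder into a servable URL before display or editing."""
    return html.replace(placeholder, base_url)


def to_internal(html: str, placeholder: str, base_url: str) -> str:
    """Collapse the absolute URL back into the placeholder before saving."""
    return html.replace(base_url, placeholder)


# Hypothetical values for one assignment's summary area.
PLACEHOLDER = "@@pluginlink/summary@@"
BASE_URL = "/pluginfile.php/42/summary"
```

Storing only the placeholder means the saved HTML does not depend on the context id, which helps when content is copied or restored elsewhere.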
/pluginfile.php/contextid/areaname/arbitrary/params/or/dirs/filename.ext
pluginfile.php detects the type of plugin from context table, fetches basic info (like $course or $cm if appropriate) and calls plugin function (or later method) which does the access control and finally sends the file to user. ''areaname'' separates files by type and divides the context into several subtrees - for example ''summary'' files (images used in module intros), post attachments, etc.
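As a sketch only, splitting such a path into its parts could be done like this in Python. <code>PluginFileRequest</code> and <code>parse_pluginfile_path</code> are hypothetical names; looking up the context and calling the plugin's access control are left out.

```python
from typing import List, NamedTuple


class PluginFileRequest(NamedTuple):
    contextid: int
    areaname: str
    args: List[str]  # arbitrary params or directories chosen by the plugin
    filename: str


def parse_pluginfile_path(path: str) -> PluginFileRequest:
    """Split /pluginfile.php/contextid/areaname/.../filename.ext into its parts."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 4 or parts[0] != "pluginfile.php":
        raise ValueError("not a pluginfile.php path: " + path)
    contextid = int(parts[1])  # raises ValueError if the id is not numeric
    return PluginFileRequest(contextid, parts[2], parts[3:-1], parts[-1])
```

The plugin that owns the context would then receive <code>areaname</code> and <code>args</code> and decide whether the file may be sent.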
====blog example====
Blog entries or notes in general do not have a context id (because they live in the system context; SYSCONTEXTID below is the id of the system context).
The note attachments are always served with XSS protection on; ideally we should use a separate wwwroot for this. Access control can be hardcoded.
/pluginfile.php/SYSCONTEXTID/blog/blogentryid/attachmentname.ext
====assignment example====
/pluginfile.php/assignmentcontextid/summary/someimage.jpg
/pluginfile.php/assignmentcontextid/submission/userid/attachmentname.ext
/pluginfile.php/assignmentcontextid/extra/allsubmissionfiles.zip
====scorm example====
/pluginfile.php/scormcontextid/summary/someimage.jpg
/pluginfile.php/scormcontextid/content/revisionnumber/dir/somescormfile.js
The revision counter is incremented when any file changes in order to prevent caching problems. The lifetime should be adjustable in module settings.
====questions example====
pluginfile.php/SYSCONTEXTID/question/questionid/file.jpg
====quiz example====
pluginfile.php/quizcontextid/summary/niceimage.jpg
pluginfile.php/quizcontextid/report/type/export.ods
===userfile.php===
Personal file storage, intended as an online storage of work in progress like assignments before the submission.
* read/write own files only for now
* option to share with others later
* personal "websites" will not be supported (security)
/userfile.php/userid/dir/dir/filename.ext


===rssfile.php===
Replaces rss/file.php, which is kept only for backwards compatibility.
:It's said that rss/file.php is kept only for backwards compatibility. But what exactly is meant by "backwards compatibility"? Just display feeds with nice error messages and info how to re-subscribe?
RSS files should not require sessions/cookies; URLs should contain some sort of security token/key.
Internally the files may be stored in the database or together with other files.
ETag support should be implemented to improve performance.
 
/rssfile.php/contextid/any/parameters/module/wants/rss.xml
/rssfile.php/SYSCONTEXTID/blog/userid/rss.xml
 
Again modules and plugins decide what gets sent to user.
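The spec does not say how the security token/key in the feed URL would be built. One possible approach, shown here purely as an assumption, is an HMAC over the user and context ids with a site secret, so the feed can be fetched without a session or cookie. All names below are hypothetical.

```python
import hashlib
import hmac


def rss_token(secret: bytes, userid: int, contextid: int) -> str:
    """Derive a per-user, per-context token that can be embedded in the feed URL."""
    message = "{}:{}".format(userid, contextid).encode("ascii")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def token_is_valid(secret: bytes, userid: int, contextid: int, token: str) -> bool:
    """Check a token from the URL using a constant-time comparison."""
    return hmac.compare_digest(rss_token(secret, userid, contextid), token)
```

A token like this cannot be revoked for one user without changing the secret or adding a per-user salt, which is something the real design would need to decide.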
 
===Physical file storage===
* must be effective - we have a lot of duplicate files
* support utf-8 on all platforms
* reasonably fast
 
TODO
 
===Temporary files===
 
TODO
 
===Legacy file serving===
Going to use good-old separate directories in moodledata.
 
# user avatars
# group avatars
# tex, algebra
# rss cache (?full rss rewrite soon?)
 
==File storage API==
 
File contents are stored in moodledata/filepool using sha1 hashes instead of file names.
 
=== file table ===
 
This table contains one entry for every file.  Enough information is kept here so that the file can be fully identified and retrieved again if necessary.
 
{| border="1" cellpadding="2" cellspacing="0"
|'''Field'''
|'''Type'''
|'''Default'''
|'''Info'''
 
|-
|'''id'''
|int(10) 
|
|autoincrementing
 
|-
|sha1hash
|varchar(40)
|
|The sha1 hash of content.
 
|-
|'''contextid'''
|int(10)
|
|The context id defined in context table - identifies the instance of plugin owning the file.
 
|-
|instanceid
|int(10)
|
|Optional - some plugin specific instance id (eg. forum post, blog entry or assignment submission, user id for user files)
 
|-
|plugin
|varchar(255)
|
|The module that is the "owner" of this file (eg "moodle", "blog", "mod/assignment" or "blocks/html")
 
|-
|filetype
|varchar(255)
|
|Like submissions, filemanager files (images and swf linked from summaries), etc.
 
|-
|filename
|varchar(255)
|
|The full Unicode name of this file (case sensitive)
 
|-
|filepath
|text
|NULL
|Optional - relative path to file from module content root, useful in Scorm and Resource mod - most of the mods do not need this
 
|-
|timecreated
|int(10)
|
|The time this file was created (if known), otherwise same as time imported
 
|-
|timemodified
|int(10)
|
|The last time the file was modified
 
|-
|filesize
|int(10)
|
|size of file - bytes
 
|-
|userid
|int(10) 
|NULL
|Optional - general id field - meaning depending on plugin
|}
 
index on "contextid, instanceid"
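To make the table above easier to try out, here is a sketch of it as an in-memory SQLite schema driven from Python. Column names and sizes follow the table; the NOT NULL choices, the index name and the helper functions are assumptions for the example and are not part of the spec.

```python
import sqlite3
from typing import List

# Columns follow the table above; nullability is an assumption.
SCHEMA = """
CREATE TABLE file (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    sha1hash     VARCHAR(40)  NOT NULL,
    contextid    INTEGER      NOT NULL,
    instanceid   INTEGER,
    plugin       VARCHAR(255) NOT NULL,
    filetype     VARCHAR(255) NOT NULL,
    filename     VARCHAR(255) NOT NULL,
    filepath     TEXT         DEFAULT NULL,
    timecreated  INTEGER      NOT NULL,
    timemodified INTEGER      NOT NULL,
    filesize     INTEGER      NOT NULL,
    userid       INTEGER      DEFAULT NULL
);
CREATE INDEX file_context_instance ON file (contextid, instanceid);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database containing the sketched file table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_file(conn: sqlite3.Connection, sha1hash: str, contextid: int,
             instanceid: int, plugin: str, filetype: str, filename: str,
             filesize: int, now: int) -> int:
    """Insert one file record and return its new id."""
    cur = conn.execute(
        "INSERT INTO file (sha1hash, contextid, instanceid, plugin, filetype,"
        " filename, timecreated, timemodified, filesize)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (sha1hash, contextid, instanceid, plugin, filetype, filename,
         now, now, filesize),
    )
    return cur.lastrowid


def filenames_for(conn: sqlite3.Connection, contextid: int,
                  instanceid: int) -> List[str]:
    """List file names for one plugin instance, using the (contextid, instanceid) index."""
    rows = conn.execute(
        "SELECT filename FROM file WHERE contextid = ? AND instanceid = ?"
        " ORDER BY filename",
        (contextid, instanceid),
    )
    return [row[0] for row in rows]
```

Two records may share the same <code>sha1hash</code>, which is how identical uploads end up pointing at one stored blob.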
 
=== file_metadata table ===
 
This table contains extra metadata about files.  Repositories could provide this, or it could be manually edited in the local copy.
 
{| border="1" cellpadding="2" cellspacing="0"
|'''Field'''
|'''Type'''
|'''Default'''
|'''Info'''
 
|-
|'''id'''
|int(10) 
|
|autoincrementing
 
|-
|'''fileid'''
|int(10)
|
|Id of file.
 
|-
|'''name'''
|varchar(255)
|
|The name of extra metadata
 
|-
|value
|text
|
|Value
 
|}
 
=== file_acl ===
 
This table describes an optional ACL for a file. This is not required in the majority of cases: modules usually hardcode the file access logic, and course files should not be used much any more.
 
{| border="1" cellpadding="2" cellspacing="0"
|'''Field'''
|'''Type'''
|'''Default'''
|'''Info'''
 
|-
|'''id'''
|int(10) 
|
|autoincrementing
 
|-
|'''fileid'''
|int(10) 
|
|The file we are defining access for
 
|-
|'''contextid'''
|int(10)
|
|The context where this file is being published
 
|-
|'''capability'''
|text
|
|The capability that is required to see this file.
|}
 
====acl notes====
* this is missing some concept similar to '''user/group/others''', for example in case of user files typical user can not assign permissions or view them - this becomes useless there
* it is more important to synchronise the availability of file link and the file itself - having link pointing to inaccessible file or file which is accessible when not wanted are both problems
* browser/proxy caching works against us here - "secret" files should not be cached
 
=== repository_sync table ===
 
This table contains information on how to synchronise data with repositories. Data would be synchronised from cron.php or on demand from the file manager. The sync would be one way only (repository-->local file).
 
{| border="1" cellpadding="2" cellspacing="0"
|'''Field'''
|'''Type'''
|'''Default'''
|'''Info'''
 
|-
|'''id'''
|int(10) 
|
|autoincrementing
 
|-
|'''fileid'''
|int(10)
|
|Id of file.
 
|-
|'''repositoryid'''
|int(10)
|
|The repository instance this is associated with, see [[Repository_API]]
 
|-
|updates
|int(10)
|
|Specifies the update schedule (0 = none, 1 = on demand, other = some period in seconds)
 
|-
|repositorypath
|text
|
|The full path to the original file on the repository
 
|-
|timeimportfirst
|int(10)
|
|The first time this file was imported into Moodle


|-
|timeimportlast
|int(10)
|
|The most recent time that this file was imported into Moodle
|}

==File management API==

This section describes following:
#interactions with html editor
#file manager
#interactions with repositories

==Major problems==
#unicode chars in zip files

=== Some little comments to be considered (to avoid forgetting them) ===

* each context will have its own "file manager"
* separate "file manager context" files (FMF) and "internal context" (ICF) files (current modedit files, submissions, attachments...)
* /pluginfile.php/SYSCONTEXTID/{blog|question} and so on... will they have their own FMF too? Or only ICF?
* rssfile.php: I'd support both Etag (cool) and Last-Modified (more used), when we receive If-None-Match/If-Modified-Since => 304
* Way to migrate
* Way to copy between contexts
* Links = -1 for them
* Deletion strategy (locks, quarantine status...)

== meta information ==
It seems like it would be nice to include some meta information about certain file types, e.g.:

* images: width, height, alt
* flash files: width, height

Regarding: [[File_API#Files_database_tables]]

::Perhaps we could also hash filepath and filename and index by them, to save some text limitations in the DB side (length limits of indexes, not indexable, complex retrieval...). [[User:Eloy Lafuente (stronk7)|Eloy Lafuente (stronk7)]] 11:54, 29 June 2008 (CDT)

::Also, perhaps we should store finally the plugin type there to save some queries per request, using it to drive to the correct file handling of each plugin. [[User:Eloy Lafuente (stronk7)|Eloy Lafuente (stronk7)]] 18:54, 29 June 2008 (CDT)

Latest revision as of 03:32, 16 January 2012

Main tasks

  • File Storage API:
    • abstract (M1)
    • local pool implementation (M1)
    • DB schema (M1)
    • deletion, acls, metadata (M1)
    • problem: empty directories, file overwriting
  • File Manager API:
    • unique class, able to handle one "file area" (M2)
    • security (M2)
    • hack old file manager to be able to work with new fileareas (M3)
    • js and non js implementations of FileManager (M4)
    • integration with editor (M4)
    • integration with formslib (M4)
    • integration with repos (M4)
    • problem: zip support
  • File Serving:
    • from pool:
      • file.php
      • pluginfile.php
      • draftfile.php
      • userfile.php
    • from other moddata places:
      • rssfile.php
      • user/pix.php
      • user/pixgroup.php
  • Migration:
    • course files (as much as possible, allow fallback) (M2)
    • moddata
  • Backup & restore:

Milestones

M1: File storage API completed (this week)

M2: migration of course files + new filephp + FileManager + hacked old file manager (next monday)

M3: ...


rssfile.php

It's said that rss/file.php is kept only for backwards compatibility. But what exactly is meant by "backwards compatibility"? Just display feeds with nice error messages and info how to re-subscribe?

meta information

It seems like it would be nice to include some meta information about certain file types, e.g.:

  • images: width, height, alt
  • flash files: width, height

Regarding: File_API#Files_database_tables

Perhaps we could also hash filepath and filename and index by them, to save some text limitations in the DB side (length limits of indexes, not indexable, complex retrieval...). Eloy Lafuente (stronk7) 11:54, 29 June 2008 (CDT)
Also, perhaps we should store finally the plugin type there to save some queries per request, using it to drive to the correct file handling of each plugin. Eloy Lafuente (stronk7) 18:54, 29 June 2008 (CDT)