Note: This documentation is for Moodle 2.7. For up-to-date documentation see File API.

Development talk:File API: Difference between revisions

(155 intermediate revisions by 8 users not shown)
===Main tasks===
* File Storage API:
** abstract (M1)
** local pool implementation (M1)
** DB schema (M1)
** deletion, acls, metadata (M1)
** problem: empty directories, file overwriting
* File Manager API:
** unique class, able to handle one "file area" (M2)
** security (M2)
** hack old file manager to be able to work with new fileareas (M3)
** js and non js implementations of FileManager (M4)
** integration with editor (M4)
** integration with formslib (M4)
** integration with repos (M4)
** problem: zip support
* File Serving:
** from pool:
*** file.php
*** pluginfile.php
*** draftfile.php
*** userfile.php
** from other moddata places:
*** rssfile.php
*** user/pix.php
*** user/pixgroup.php
* Migration:
** course files (as much as possible, allow fallback) (M2)
** moddata
* Backup & restore:

===Milestones===
M1: File storage API completed (this week)
M2: migration of course files + new filephp + FileManager + hacked old file manager (next Monday)
M3: ...

Some quick questions to avoid forgetting them:

1) Will the following directories be under the control of FileAPI (or, as they are now, in fixed local storage)?

- dataroot/temp
- dataroot/lang
- dataroot/cache
- dataroot/environment
- dataroot/filter
- dataroot/rss
- dataroot/search
- dataroot/sessions
- dataroot/upgradelogs

[[User:Martin Dougiamas|Martin Dougiamas]] 03:20, 28 April 2008 (CDT) : I don't see these as being in the API - I've updated the spec.
2) Assuming we'll have a cool OOP FileAPI...
- a) Will it support different FileAPI classes (to be able to store in other systems) ?
- b) Will it support multiple FileAPI classes working together (like the Repo) ?
[[User:Martin Dougiamas|Martin Dougiamas]] 03:20, 28 April 2008 (CDT) : Hmm, I suppose it makes sense to switch the backend from local file storage to specify something else (eg database storage) but multiple File storage places doesn't make sense to me, that is the Repository API and the Portfolio API. 
3) I've annotated in red some things that have sounded strange in my first look.
[[User:Martin Dougiamas|Martin Dougiamas]] 03:20, 28 April 2008 (CDT) : Thanks, all fixed. 
4) Are we going to have "directory records" in the implementation, or is that going to be handled exclusively by the "moodlepath" column?
[[User:Martin Dougiamas|Martin Dougiamas]] 03:20, 28 April 2008 (CDT) : Good point.  I was thinking of moodlepath only but I wonder if directory records might be more efficient.  New table, I guess.
5) One general question... are we going to "force" all modules to be "autocontained" ? How are we going to handle resources, for example (with all those css, links, images..). In general how are we going to handle multiple-file packages?
[[User:Martin Dougiamas|Martin Dougiamas]] 03:20, 28 April 2008 (CDT) : they'll be a set of files, probably in a "directory" specified by moodlepath of directory record.  What problems do you see?  Should we retain better knowledge of the original group of files?
That's all for now, ciao4niao :-) [[User:Eloy Lafuente (stronk7)|Eloy Lafuente (stronk7)]] 21:27, 5 April 2008 (CDT)
== Making Storage 'content addressable' ==
One opportunity which this API opens up is the possibility of making the actual storage of files 'content addressable'. That is, if two users upload the same image, for example, store the file only once on disk. This brings benefits in reducing the amount of storage and improving caching (especially in the increasingly common situation where the Moodle data store is served from an NFS directory or similar remote storage). To do this we could use a hash like SHA-1 on the file and store the file on disk named by its hash (rather than some arbitrary id). Then when someone uploads a file that has already been uploaded, the hash matches and we just point the database record to the same file on disk. This technique is increasingly being used by enterprise-style repositories as well as things like git. I can see major benefits for things like SCORM packages, which have hundreds of small duplicate files stored multiple times per package and course, so you can sometimes have the same image file stored 20 different times across 20 different packages in one course, which is then duplicated for multiple classes, etc. --[[User:Dan Poltawski|Dan Poltawski]] 07:32, 1 June 2008 (CDT)
Excellent idea, Dan, it's in  [[User:Martin Dougiamas|Martin Dougiamas]] 01:43, 20 June 2008 (CDT)
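A minimal sketch of the content-addressable idea described above, written in Python purely for illustration (it is not Moodle code). The function name <code>store_content</code>, the flat pool directory and the <code>records</code> dictionary standing in for database rows are all assumptions made for this example.

```python
import hashlib
import os
import tempfile


def store_content(pool_dir: str, data: bytes) -> str:
    """Store data in pool_dir under its SHA-1 hash and return the hash."""
    digest = hashlib.sha1(data).hexdigest()
    path = os.path.join(pool_dir, digest)
    # Identical content always maps to the same path, so it is written once.
    if not os.path.exists(path):
        with open(path, "wb") as handle:
            handle.write(data)
    return digest


# Hypothetical stand-in for database rows: each record only stores a hash.
pool = tempfile.mkdtemp()
records = {
    "course1/logo.png": store_content(pool, b"same image bytes"),
    "course2/logo.png": store_content(pool, b"same image bytes"),
    "course2/other.png": store_content(pool, b"different bytes"),
}
stored_files = sorted(os.listdir(pool))
```

Three records end up pointing at only two files on disk, because the two identical uploads share one hash. A real implementation would also need to decide when a pooled file may be deleted, which this sketch leaves out.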
== Specific File Attachments? ==
We have entries for 'moduleinstance', do we need entries to identify files per other attachments? Such as forum posts, wiki attachments, database attachments, assignment submissions?
[[User:Mike Churchward|Mike Churchward]] 16:27, 9 June 2008 (CDT)
I'm not sure if we need to have such links back to the exact forum post
or glossary entry.  The idea is that the forum posts (say) would reference
the file->id.  Would it be useful to have back links too ...? 
[[User:Martin Dougiamas|Martin Dougiamas]] 22:30, 22  June 2008 (CDT)
==Squid==
Consider proxy support. --[[User:Helen Foster|Helen Foster]] 16:53, 9 June 2008 (CDT)
== Batch uploads (zips) ==
Should we require that the File API (optionally) provide some form of batch file upload or zip/unzip function?
[[User:Mike Churchward|Mike Churchward]] 17:14, 9 June 2008 (CDT)
Yeah, it definitely needs to handle zipped files nicely ... [[User:Martin Dougiamas|Martin Dougiamas]] 22:33, 22  June 2008 (CDT)
=Skodak's rants=
The API should be split into several independent parts:
# File serving API
## file.php
## pluginfile.php
## userfile.php
## rssfile.php
# File storage API
## optional access control
## optional repo sync
# File management API
## File browsing
## File linking (editor integration)
## Upload from repository
==File serving API==
Deals with serving of files - browser requests file, Moodle sends it back. We have four main files.
===file.php===
Serves course files.
It would be nice to have some special hardcoded protection of backup files, preventing backup file downloads/uploads; backups contain a lot of personal info, and we could block restoring of backups from other sites too.
Implements basic file access. Ideally only images and files linked from course sections should be there; no XSS protection is required, since we expect javascript, swf, etc. there and there is no way to make it "secure". Access control is no longer critical if we move most of the files into modules.
/file.php/courseid/dir/dir/filename.ext
===pluginfile.php (aka modfile.php)===
Sends module, block, question files. Absolute file links need to be rewritten if HTML editing is allowed in the module. The links are stored internally as relative links.
* modules decide about access control
* optional XSS protection - student submitted files must not be served with normal headers, we have to force download instead; ideally there should be second wwwroot for serving of untrusted files
/pluginfile.php/contextid/arbitrary/params/or/dirs/filename.ext
pluginfile.php detects the type of plugin from context table, fetches basic info (like $course or $cm if appropriate) and calls plugin function (or later method) which does the access control and finally sends the file to user.
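The dispatch flow described above could be sketched roughly as follows. This is an illustrative Python sketch, not the actual pluginfile.php: the <code>CONTEXT_PLUGINS</code> dictionary stands in for the context table, and the handler functions, their names and their return values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the context table: context id -> plugin type.
CONTEXT_PLUGINS: Dict[int, str] = {12: "assignment", 34: "scorm"}


def assignment_file(args: List[str]) -> str:
    # A real handler would check access here before sending the file.
    return "assignment:" + "/".join(args)


def scorm_file(args: List[str]) -> str:
    return "scorm:" + "/".join(args)


# Hypothetical registry mapping plugin type to its file callback.
HANDLERS: Dict[str, Callable[[List[str]], str]] = {
    "assignment": assignment_file,
    "scorm": scorm_file,
}


def parse_plugin_path(path: str) -> Tuple[int, List[str]]:
    """Split /pluginfile.php/contextid/... into context id and the rest."""
    parts = [part for part in path.split("/") if part]
    if len(parts) < 3 or parts[0] != "pluginfile.php":
        raise ValueError("not a pluginfile.php path")
    return int(parts[1]), parts[2:]


def dispatch(path: str) -> str:
    """Look up the plugin owning the context and hand over the remaining path."""
    contextid, args = parse_plugin_path(path)
    plugin = CONTEXT_PLUGINS[contextid]
    return HANDLERS[plugin](args)


result = dispatch("/pluginfile.php/12/submission/7/essay.odt")
```

The point of the design is that the serving script only knows how to find the owning plugin; everything after the context id is opaque to it and interpreted by the plugin.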
====blog example====
Blog entries or notes in general do not have context id (because they live in system context).
The note attachments are always served with XSS protection on, ideally we should use separate wwwroot for this. Access control can be hardcoded.
/pluginfile.php/SYSCONTEXTID/blog/blogentryid/attachmentname.ext
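One way to read "served with XSS protection on" is that untrusted files are never rendered inline by the browser. The Python sketch below illustrates that reading only; the function names and the exact choice of headers are assumptions, not something specified on this page.

```python
from typing import Dict


def safe_filename(name: str) -> str:
    """Replace characters that could break out of the quoted header value."""
    return "".join("_" if ch in '"\\\r\n' else ch for ch in name)


def response_headers(filename: str, mimetype: str, trusted: bool) -> Dict[str, str]:
    """Build response headers; untrusted files are forced to download."""
    quoted = '"%s"' % safe_filename(filename)
    if trusted:
        return {
            "Content-Type": mimetype,
            "Content-Disposition": "inline; filename=" + quoted,
        }
    return {
        # A generic type plus "attachment" keeps the browser from rendering it.
        "Content-Type": "application/octet-stream",
        "Content-Disposition": "attachment; filename=" + quoted,
        "X-Content-Type-Options": "nosniff",
    }


untrusted = response_headers('a"b.html', "text/html", trusted=False)
```

Serving such files from a second wwwroot, as suggested above, would be a stronger measure than headers alone.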
====assignment example====
/pluginfile.php/assignmentcontextid/submission/userid/attachmentname.ext
/pluginfile.php/assignmentcontextid/downloadall/file.zip
====scorm example====
/pluginfile.php/scormcontextid/revisionnumber/dir/somescormfile.js
The revision counter is incremented when any file changes in order to prevent caching problems. The lifetime should be adjustable in module settings.
===userfile.php===
Personal file storage
* read/write own files only for now
/userfile.php/userid/dir/dir/filename.ext


===rssfile.php===
Replaces rss/file.php, which is kept only for backwards compatibility.

It's said that rss/file.php is kept only for backwards compatibility. But what exactly is meant by "backwards compatibility"? Just display feeds with nice error messages and info how to re-subscribe?

RSS files should not require sessions/cookies; URLs should contain some sort of security token/key.
Internally the files may be stored in database or together with other files.
Etag support should be implemented to improve performance.
 
/rssfile.php/contextid/any/parameters/module/wants/rss.xml
 
Again modules and plugins decide what gets sent to user.
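The ETag support mentioned above could work roughly as in this Python sketch. It assumes the ETag is derived from a SHA-1 of the feed body and only handles a single exact value in the <code>If-None-Match</code> header (no lists, no <code>*</code>); both the function names and these simplifications are assumptions for illustration.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag value from the feed content."""
    return '"%s"' % hashlib.sha1(body).hexdigest()


def serve_feed(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a feed request."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match == etag:
        # The client already has this version: send no body.
        return 304, headers, b""
    return 200, headers, body


first = serve_feed(b"<rss>v1</rss>", None)
repeat = serve_feed(b"<rss>v1</rss>", first[1]["ETag"])
changed = serve_feed(b"<rss>v2</rss>", first[1]["ETag"])
```

A feed reader that polls an unchanged feed then receives only the short 304 response, which is where the performance gain comes from.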
 
==File storage API==
 
File contents are stored in moodledata/filepool using sha1 hashes instead of file names.
 
=== file table ===
 
This table contains one entry for every file.  Enough information is kept here so that the file can be fully identified and retrieved again if necessary.
 
{| border="1" cellpadding="2" cellspacing="0"
|'''Field'''
|'''Type'''
|'''Default'''
|'''Info'''
 
|-
|'''id'''
|int(10) 
|
|autoincrementing
 
|-
|sha1hash
|varchar(40)
|
|The sha1 hash of content.
 
|-
|'''contextid'''
|int(10)
|
|The context id defined in context table - identifies the instance of plugin owning the file.
 
|-
|'''instanceid'''
|int(10)
|
|Optional - some plugin specific instance id (eg. forum post, blog entry or assignment submission, user id for user files)
 
|-
|plugin
|varchar(255)
|
|The module that is the "owner" of this file (eg "moodle", "blog", "mod/assignment" or "blocks/html")
 
|-
|filetype
|varchar(255)
|
|Like submissions, filemanager files (images and swf linked from summaries), etc.
 
|-
|filename
|varchar(255)
|
|The full Unicode name of this file (case sensitive)
 
|-
|filepath
|text
|NULL
|Optional - relative path to file from module content root, useful in Scorm and Resource mod - most of the mods do not need this
 
|-
|timecreated
|int(10)
|
|The time this file was created (if known), otherwise same as time imported
 
|-
|timemodified
|int(10)
|
|The last time the file was modified
 
|-
|timeaccessed
|int(10)
|NULL
|The last time this file was accessed for any reason (not sure about this)
 
|-
|userid
|int(10) 
|NULL
|Optional - general id field - meaning depending on plugin
|}
 
=== repository_sync table ===
 
This table contains information on how to synchronise data with repositories.
 
{| border="1" cellpadding="2" cellspacing="0"
|'''Field'''
|'''Type'''
|'''Default'''
|'''Info'''
 
|-
|'''id'''
|int(10) 
|
|autoincrementing
 
|-
|'''fileid'''
|int(10)
|
|Id of file.
 
|-
|'''repositoryid'''
|int(10)
|
|The repository instance this is associated with, see [[Development:Repository_API]]


|-
|updates
|int(10)
|
|Specifies the update schedule (0 = none, 1 = on demand, other = some period in seconds)

|-
|repositorypath
|text
|
|The full path to the original file on the repository

|-
|timeimportfirst
|int(10)
|
|The first time this file was imported into Moodle

|-
|timeimportlast
|int(10)
|
|The most recent time that this file was imported into Moodle
|}

==File management API==

'''TODO'''

== meta information ==

It seems like it would be nice to include some meta information about certain file types, e.g.:

* images: width, height, alt
* flash files: width, height

Regarding: [[Development:File_API#Files_database_tables]]

::Perhaps we could also hash filepath and filename and index by them, to save some text limitations in the DB side (length limits of indexes, not indexable, complex retrieval...). [[User:Eloy Lafuente (stronk7)|Eloy Lafuente (stronk7)]] 11:54, 29 June 2008 (CDT)

::Also, perhaps we should store finally the plugin type there to save some queries per request, using it to drive to the correct file handling of each plugin. [[User:Eloy Lafuente (stronk7)|Eloy Lafuente (stronk7)]] 18:54, 29 June 2008 (CDT)
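The suggestion to hash filepath and filename could look something like the Python sketch below. It assumes that <code>filepath</code> always ends with a slash and that <code>filename</code> never contains one, so the concatenation is unambiguous; the function name <code>path_hash</code> is made up for this example.

```python
import hashlib


def path_hash(filepath: str, filename: str) -> str:
    """Return a fixed-length key for a (filepath, filename) pair."""
    # Assumption: filepath ends with "/" and filename contains no "/".
    combined = filepath + filename
    return hashlib.sha1(combined.encode("utf-8")).hexdigest()


key_a = path_hash("/images/", "logo.png")
key_b = path_hash("/images/", "logo.png")
key_c = path_hash("/images/icons/", "logo.png")
key_unicode = path_hash("/docs/", "r\u00e9sum\u00e9.pdf")
```

Whatever the length of the path, the key is always 40 hexadecimal characters, so it fits a normal indexed varchar column even where the text column itself cannot be indexed.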

Latest revision as of 06:26, 2 February 2009
