Difference between revisions of "Talk:Repository API"

Revision as of 08:12, 29 April 2008

(Ideas will be deleted from the comments section as they are resolved or merged into the main spec)

Missing concept of trusted files

Not all files are created equal: some can be trusted, and some cannot be trusted at all. Web browsers trust everything received from the server, so files served from it may access cookie information, and scripting technologies may allow them to do anything the user can do. We have to trust our teachers because they are supposed to create the learning content, but we definitely cannot trust all students.

Imagine if students were allowed to upload arbitrary files to the server, such as an HTML file loaded with JavaScript, and the server happily served them to all Moodle users. Our solution is to use HTML cleaning filters for submitted texts and to force downloads of student-uploaded files. To do this we must know whether we trust the files or not. Unfortunately, forcing downloads of student-uploaded files and cleaning HTML texts do not solve all problems, because bugs in browsers and especially in browser plug-ins may sometimes be used to work around our protections.

The best solution would be to use separate web addresses for trusted and untrusted files (two wwwroots in config.php). Not all sites can afford two different addresses, but IMO we should be prepared for this. Petr Škoda (škoďák) 16:42, 28 February 2008 (CST)

Great idea, yes.  In fact couldn't all the files be served via $CFG->fileroot all the time?  
Martin Dougiamas 07:08, 29 February 2008 (CST)

The userid field could be used for this, but a separate flag might be better in order to allow teachers to upload untrusted files (e.g. a teacher uploads an assignment submission for a student).
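
To make the separate-flag idea concrete, here is a rough sketch of how a file-serving script might choose response headers. It is written in Python purely for illustration (not Moodle's PHP), and the `StoredFile` record, its `trusted` field and the `response_headers` helper are all hypothetical names, not part of the proposed schema.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    filename: str
    mimetype: str
    trusted: bool  # hypothetical per-file flag, kept separate from userid


def response_headers(f: StoredFile) -> dict:
    """Choose headers so that untrusted uploads are never rendered inline."""
    if f.trusted:
        # Trusted content (e.g. teacher-authored pages) may open in the browser.
        return {
            "Content-Type": f.mimetype,
            "Content-Disposition": f'inline; filename="{f.filename}"',
        }
    # Untrusted content is forced to download with a generic type.
    return {
        "Content-Type": "application/octet-stream",
        "Content-Disposition": f'attachment; filename="{f.filename}"',
        "X-Content-Type-Options": "nosniff",
    }
```

With a flag like this, a teacher uploading a submission on behalf of a student could simply store it with `trusted=False`. A real implementation would also need to escape the file name before putting it in the header.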

Relative file links

Flash, Java and SCORM require relative links and a directory hierarchy in general, so we must support them. Some SCORM packages load hundreds of files per page, which means file serving must be very fast, with a minimum of DB access.

Reading the proposal above, it seems the API is about serving isolated files referenced by repository IDs. HTML requires relative or absolute locators with file names, so we cannot use repository IDs directly in relative links. In the case of SCORM, we have an absolute path to the base of the SCORM package of a given activity, and SCORM files use relative links inside the package. A solution could be to store relative paths directly in the filename field, e.g. directory1/directory2/filename.ext.

Yes I agree, we definitely need to support (virtual) directories and slasharguments.
We could just match the file argument to a path in the db.  Perhaps add the fileid 
to the argument path as a primary key: fileid/directory1/directory2/filename.ext
Martin Dougiamas 07:08, 29 February 2008 (CST)
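
As a rough illustration of the fileid/directory1/directory2/filename.ext form, the sketch below splits a slash argument into a numeric key and a relative path. It is Python for illustration only; `parse_file_argument` is a hypothetical helper, and the rejection of "." and ".." segments is an added assumption rather than something the proposal specifies.

```python
import re
from typing import Optional, Tuple


def parse_file_argument(argument: str) -> Optional[Tuple[int, str]]:
    """Split '/fileid/dir1/dir2/name.ext' into (fileid, 'dir1/dir2/name.ext').

    Returns None when the argument does not have that shape.
    """
    parts = [p for p in argument.split("/") if p]
    # Need a numeric id followed by at least one path segment.
    if len(parts) < 2 or not re.fullmatch(r"[0-9]+", parts[0]):
        return None
    # Assumption: refuse traversal segments instead of resolving them.
    if any(p in (".", "..") for p in parts[1:]):
        return None
    return int(parts[0]), "/".join(parts[1:])
```

Because the id comes first, relative links inside a package keep working: a page served from 42/directory1/index.html that links to directory2/filename.ext still resolves under the same id.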

Virtual directories and files

Sometimes the content of files is generated on the fly (CSV exports, etc.). There are many special files spread through the codebase doing nearly the same thing; IMHO it should be possible to use the same file API for these.

Another virtual example is assignment submissions and WebDAV. I would like to see an option to browse the assignment submissions as a directory structure: the top directory would be a list of user names, each containing HTML files with online assignments and the uploaded files. This would allow us to implement simple zip&go or WebDAV-based offline grading solutions. The problem here is that the content of this virtual submissions directory needs to be created on the fly based on user references; the proposed repository structure cannot be used for this.

Very interesting idea! Martin Dougiamas 07:08, 29 February 2008 (CST)

Backup/restore relinking

We support relinking inside courses only. Until now it was easy to guess whether an absolute link would work after restore on another server. There are several types of files:

  • course files - relinked during restore, works on any server if from the same course
  • module files - not relinked, URLs are not permanent, links cannot be copy-pasted (assignments, forum attachments, RSS files, etc.)
  • user files - not relinked, links work only on original server (blog attachments, personal files, etc.)
I think all this will be simplified in the proposed system because everything is 
represented using the same system: a file with an ACL (backup can quickly 
determine all the files in one course, probably using one SQL query). 
Martin Dougiamas 07:08, 29 February 2008 (CST)
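
A minimal sketch of what "one SQL query" could look like, using Python with an in-memory SQLite database for illustration. The `file` and `file_instance` table names come from this discussion, but the columns used here (`path`, `fileid`, `courseid`) are assumptions, not the actual proposed schema.

```python
import sqlite3


def files_in_course(db: sqlite3.Connection, courseid: int) -> list:
    """Return each distinct file linked from the given course exactly once."""
    rows = db.execute(
        """
        SELECT DISTINCT f.id, f.path
          FROM file f
          JOIN file_instance fi ON fi.fileid = f.id
         WHERE fi.courseid = ?
         ORDER BY f.id
        """,
        (courseid,),
    )
    return rows.fetchall()


# Hypothetical schema and sample data, for illustration only.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE file (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
    CREATE TABLE file_instance (
        id INTEGER PRIMARY KEY,
        fileid INTEGER NOT NULL REFERENCES file(id),
        courseid INTEGER NOT NULL
    );
    INSERT INTO file VALUES (1, 'syllabus.pdf'), (2, 'intro.html'), (3, 'other.zip');
    INSERT INTO file_instance (fileid, courseid)
        VALUES (1, 10), (1, 10), (2, 10), (3, 20);
    """
)
```

In this toy data, syllabus.pdf is linked twice from course 10 but is listed only once, so a backup built from such a query would not need to pack the same file twice.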

If we are backing up/restoring on the same server, will copies of a course restore without file duplication? I use this a lot so that I can archive one year's course whilst modifying it for next year. Matt Gibson 03:12, 29 April 2008 (CDT)

Access needed for both file and its instances

We need two types of access control - first who can create instances (link files), second who can access the instance (download the file).

For the first don't we already have those capabilities? (like mod/forum:createattachment,
moodle/course:managefiles) but they probably could use rationalising.   We'll need new 
ones per repository, too, of course.  
Martin Dougiamas 07:08, 29 February 2008 (CST)

Cache lifetime

There should be a way to specify the cache lifetime for each instance of a file: for example, 0 for uploaded assignments and 1 day for a resource file. It might be better to allow modules to decide this; at present it is hardcoded in file.php.

Great idea.  Martin Dougiamas 07:08, 29 February 2008 (CST)
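
One possible shape for this, sketched in Python for illustration only: each module supplies a lifetime in seconds and the serving code turns it into HTTP caching headers. The `LIFETIMES` table and the `cache_headers` helper are hypothetical; only the two example values (0 and 1 day) come from the comment above.

```python
# Hypothetical per-module lifetimes in seconds, instead of one hardcoded value.
LIFETIMES = {
    "assignment": 0,            # uploaded assignments: never cache
    "resource": 24 * 60 * 60,   # resource files: cache for one day
}


def cache_headers(lifetime_seconds: int) -> dict:
    """Translate a per-instance cache lifetime into HTTP response headers."""
    if lifetime_seconds <= 0:
        # Lifetime 0 means the browser must ask the server every time.
        return {
            "Cache-Control": "no-cache, must-revalidate",
            "Pragma": "no-cache",
        }
    return {"Cache-Control": f"max-age={lifetime_seconds}"}
```

The lifetime could equally be stored per file instance in the database; the header logic would stay the same.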

Hierarchy in tables ?

It's possible that I don't understand a key piece of the concept, but I wonder why there is no reference to any "parent id" in the file or file_instance table. In other words, how is the hierarchical structure supposed to be "imported" from the repository into Moodle? If the hierarchy is reserved to the course context, and not to the repository context, it seems difficult to allow students to access a complete directory, for example.

On the other hand, maybe it's only a question for the "local" repository type, and not for the "generic" repository API? Allegre Guillaume 16:43, 5 March 2008 (CST)

Martin Dougiamas 12:58, 15 March 2008 (CDT): basically I was thinking we just store a full local path for each file instead of hierarchies. It wasn't in the db schema though: I've just added it. Thanks!

As for allowing access to a whole directory, that's something I've not thought about - thanks! I agree we need to support something like that in the interface. Hmm ..

Editing Repository Files / Version Control ?

How do you imagine handling the "editing file" problem? I can see several solutions:

  1. the simplest way: write access to a file really allows editing (re-uploading) the file, with every instance being modified
  2. the "cheap copy" way: optionally, the modification is applied only to new file_instances (or to those for which the teacher forces it). Here you have to handle two (and then maybe more) revisions of the file.

This raises the file revisions (or version control) question.

A related question concerns versioned files: should versioning depend only on the repository layer (for example, a plugin could implement an SVN "repository")? Or should Moodle be aware of the file "revision number"? Allegre Guillaume 17:00, 5 March 2008 (CST)

Martin Dougiamas 13:08, 15 March 2008 (CDT): I don't think we should start getting into such things. Versioning and editing are the job of the dedicated external repository system, and people should use that interface.

What we do have in Moodle is the file->updates field. This specifies when to get a fresh copy. It would be set when the user has specified they want Moodle to use the "latest version" of the file. And if the user specified a particular version in the repository interface then that is what gets copied (once) and the file->updates field is left as zero.

One scenario would be if someone wants to use version 1 in one course and version 2 in another, but since the version is in the URL to the original file (and thus a different remote path) they would be treated as two separate files in Moodle anyway.

There might be a small problem if they said "latest version, no update" in course 1 and then later said "latest version, no update" in a second course. I guess that would re-download the latest version (which might have changed), and thus course 1 would get an unexpected update. We could solve this by alerting the user and giving them a choice, I suppose.
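
A small sketch of how the file->updates check might work, in Python for illustration. The comment above does not say what the field actually holds, so this assumes it is a refresh interval in seconds, with zero meaning "copied once, never refreshed"; `needs_fresh_copy` and `last_fetched` are hypothetical names.

```python
def needs_fresh_copy(updates: int, last_fetched: int, now: int) -> bool:
    """Decide whether Moodle should fetch the file from the repository again.

    Assumption: `updates` is a refresh interval in seconds, and 0 means the
    user pinned a particular version that was copied once and is never refreshed.
    """
    if updates == 0:
        return False
    return now - last_fetched >= updates
```

Under this reading, a file pinned to version 1 in one course and version 2 in another has two remote paths and therefore two independent records, each with its own updates value.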

Replacing Moodle's File System

The API as specified still uses the standard Moodle file system, which has its limitations while providing a simple file access method. But, shouldn't we also be considering completely (or partially) replacing the Moodle file system with the repository system? To that end, the file interface would be the same to users, but where the files are and how they are accessed would be up to which repository was being used. In this system, the standard Moodle file system would just be one of the available repositories. Then the API could support more robust access controls if available, or very few (as Moodle does now). This could allow for per-directory / per-file privilege granting, common file areas so that courses could access the same copy of a file (rather than copying it into the Moodle file area), etc.

The API could also support multiple repositories and allow choosing a file as a link rather than copying it.

Of course, this would also mean allowing write access to the repository area from Moodle, rather than leaving it as read-only. I know the plan was to do this through the Portfolio API, but that really is limited to a user storage function and not a file management function. I think having an API defined to allow for full file management would be a better solution - even if not supported by all repositories. Mike Churchward 12:54, 6 March 2008 (CST)

Martin Dougiamas 12:54, 15 March 2008 (CDT): Actually, Mike, the standard file system is NOT being retained at all. I've clarified this slightly in the docs. The course-centered structure for files is gone. All files ARE stored on disk, but most of the info regarding their location and access is governed by a table in the database (so the files could just as easily be in a remote database if you wished). Files used in multiple areas within Moodle will never be stored more than once in Moodle.
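
One way to get "never stored more than once" is to key the stored content by a hash and keep the per-area references in a separate table. The Python sketch below illustrates that idea only; content hashing is an assumption and is not stated in the comment above, and `FileStore` with its two in-memory containers is a hypothetical stand-in for the disk and the database table.

```python
import hashlib


class FileStore:
    """Toy store: content is kept once, references are kept per use."""

    def __init__(self):
        self.blobs = {}      # content hash -> bytes (stands in for files on disk)
        self.instances = []  # (area, path, content hash) rows (stands in for the DB table)

    def add(self, area: str, path: str, content: bytes) -> str:
        """Record a use of `content` in `area`, storing the bytes only if new."""
        digest = hashlib.sha1(content).hexdigest()
        self.blobs.setdefault(digest, content)
        self.instances.append((area, path, digest))
        return digest
```

Adding the same bytes from two courses yields two rows in `instances` but a single entry in `blobs`.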

The idea of a read/write interface was how we started, but it's insanely complex once you try to handle backups (ie how do you make a complete backup suitable for giving to someone else?) and access control (how do you decide access depending on Moodle contexts), and we'll never do it as well as the original repository does it. Why should we duplicate the Hive interface (to use an example) in Moodle when Hive has a perfectly good one already, one that handles all the extra stuff like Copyright controls, workflow and so on?

That said, the current Repository plugin idea is very simple, so that if anyone really wanted to embed management into the "file picker" interface they totally could do it in the plugin.