Talk:Repository API

Revision as of 06:34, 22 June 2008 by Dongsheng Cai (talk | contribs) (ws.php)


(Ideas will be deleted from the comments section as they are resolved or merged into the main spec)

Missing concept of trusted files

Not all files are created equal: some can be trusted, and some cannot be trusted at all. Web browsers trust everything received from the server, so files served from it may access cookie information, and scripting technologies may then allow them to do anything the user can do. We have to trust our teachers because they are supposed to create the learning content, but we definitely cannot trust all students.

Imagine if students were allowed to upload arbitrary files to the server, such as an HTML file loaded with JavaScript, and the server happily served them to all Moodle users. Our solution is to use HTML cleaning filters for submitted texts and to force downloads of student-uploaded files. To do this we must know whether we trust the files or not. Unfortunately, forced downloads and HTML cleaning do not solve all problems, because bugs in browsers and especially in browser plug-ins may sometimes be used to work around our protections.

The best solution would be to use separate web addresses for trusted and untrusted files (two wwwroots in config.php). Not all sites can afford two different addresses, but imo we should be prepared for this. Petr Škoda (škoďák) 16:42, 28 February 2008 (CST)

Great idea, yes.  In fact couldn't all the files be served via $CFG->fileroot all the time?  
Martin Dougiamas 07:08, 29 February 2008 (CST)

The userid field could be used for this, but a separate flag might be better, so that teachers can also upload untrusted files (for example, a teacher uploading an assignment submission on behalf of a student).
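
A minimal sketch of the idea above, assuming a per-file trusted flag that is independent of userid and two base addresses. The URLs, the StoredFile record and the serving_policy function are hypothetical names for illustration only; none of them are part of the proposed spec.

```python
from dataclasses import dataclass

# Hypothetical base URLs; the discussion only proposes "two wwwroots in config.php".
TRUSTED_ROOT = "https://moodle.example.org/file.php"
UNTRUSTED_ROOT = "https://files.example.net/file.php"


@dataclass
class StoredFile:
    path: str      # e.g. "12/essay.html"
    userid: int    # who uploaded the file
    trusted: bool  # separate flag, independent of the uploader


def serving_policy(f: StoredFile) -> dict:
    """Return the URL and extra response headers to use when serving a file."""
    if f.trusted:
        return {"url": f"{TRUSTED_ROOT}/{f.path}", "headers": {}}
    # Untrusted content: serve it from the other address and force a download,
    # so the browser does not render it inside the main site's origin.
    filename = f.path.rsplit("/", 1)[-1]
    return {
        "url": f"{UNTRUSTED_ROOT}/{f.path}",
        "headers": {
            "Content-Disposition": f'attachment; filename="{filename}"',
            "X-Content-Type-Options": "nosniff",
        },
    }
```

With this shape, a teacher (userid 2) uploading a student's submission can still mark it untrusted: serving_policy(StoredFile("12/essay.html", 2, False)) yields the untrusted address plus a forced download.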

Relative file links

Flash, Java and SCORM require relative links and a directory hierarchy in general, so we must support them. Some SCORM packages load hundreds of files per page, which means file serving must be very fast, with a minimum of DB access.

Reading the proposal above, it seems the API is about serving isolated files referenced by repository ids. HTML requires relative or absolute locators with file names, so we cannot use repository ids directly in relative links. In the case of SCORM we have an absolute path to the base of the SCORM package for a given activity, and SCORM files use relative links inside the package. A solution could be to store relative paths directly in the filename field, e.g. directory1/directory2/filename.ext.

Yes I agree, we definitely need to support (virtual) directories and slasharguments.
We could just match the file argument to a path in the db.  Perhaps add the fileid 
to the argument path as a primary key: fileid/directory1/directory2/filename.ext
Martin Dougiamas 07:08, 29 February 2008 (CST)
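
A rough sketch of matching a slashargument of the form fileid/directory1/directory2/filename.ext against stored paths. The FILES dictionary is a hypothetical in-memory stand-in for the database table, and resolve is an illustrative name, not an existing function.

```python
import posixpath

# Hypothetical stand-in for the files table: (fileid, path) -> record id.
FILES = {
    (42, "index.html"): 1001,
    (42, "media/intro.swf"): 1002,
}


def resolve(slasharg: str):
    """Map 'fileid/dir1/dir2/name.ext' to a stored record id, or None."""
    head, _, rest = slasharg.strip("/").partition("/")
    if not head.isdigit() or not rest:
        return None
    # Normalise the path and reject attempts to climb out of the package.
    path = posixpath.normpath(rest)
    if path == ".." or path.startswith("../") or path.startswith("/"):
        return None
    return FILES.get((int(head), path))
```

Because the fileid stays at the front of the URL, a relative link such as media/intro.swf inside index.html is resolved by the browser to 42/media/intro.swf, and a single keyed lookup is enough to find it.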

Virtual directories and files

Sometimes the content of files is generated on the fly (CSV exports, etc.). There are many special files spread through the codebase doing nearly the same thing; imho it should be possible to use the same file API for these.

Another virtual example is assignment submissions and WebDAV. I would like to see an option to browse the assignment submissions as a directory structure: the top directory would be a list of user names, each containing HTML files with the online assignment and any uploaded files. This would allow us to implement simple zip&go or WebDAV-based offline grading solutions. The problem here is that the content of this virtual submissions directory needs to be created on the fly based on user references; the proposed repository structure cannot be used for this.

Very interesting idea! Martin Dougiamas 07:08, 29 February 2008 (CST)
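
To illustrate the virtual submissions directory, here is a small sketch that builds the listing on the fly from submission records. The record layout and the generated file name onlinetext.html are assumptions made for the example.

```python
def submissions_tree(submissions):
    """Build {username: [file names]} from submission records.

    Each record is (username, online_text, uploaded_file_names).
    Nothing is stored; the tree is computed on every request.
    """
    tree = {}
    for username, online_text, uploads in submissions:
        entries = []
        if online_text:
            # Hypothetical name for the HTML file generated from online text.
            entries.append("onlinetext.html")
        entries.extend(sorted(uploads))
        tree[username] = entries
    return tree
```

A zip or WebDAV layer could then walk this structure without the files ever existing in that layout on disk.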

Backup/restore relinking

We support relinking inside courses only. Until now it was easy to guess whether an absolute link would work after restore on another server. There are several types of files:

  • course files - relinked during restore, works on any server if from the same course
  • module files - not relinked, URLs are not permanent, links cannot be copy-pasted (assignments, forum attachments, RSS files, etc.)
  • user files - not relinked, links work only on original server (blog attachments, personal files, etc.)
I think all this will be simplified in the proposed system because everything is 
represented using the same system: a file with an ACL (backup can quickly 
determine all the files in one course, probably using one SQL query). 
Martin Dougiamas 07:08, 29 February 2008 (CST)

If we are backing up/restoring on the same server, will copies of a course restore without file duplication? I use this a lot so that I can archive one year's course whilst modifying it for next year. Matt Gibson 03:12, 29 April 2008 (CDT)

Access needed for both file and its instances

We need two types of access control: first, who can create instances (link files); second, who can access an instance (download the file).

For the first don't we already have those capabilities? (like mod/forum:createattachment,
moodle/course:managefiles) but they probably could use rationalising.   We'll need new 
ones per repository, too, of course.  
Martin Dougiamas 07:08, 29 February 2008 (CST)

Cache lifetime

There should be a way to specify the cache lifetime for each instance of a file, for example 0 for uploaded assignments and 1 day for a resource file. It might be better to let modules decide this; at present it is hardcoded in file.php.

Great idea.  Martin Dougiamas 07:08, 29 February 2008 (CST)
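
A minimal sketch of a per-instance cache lifetime turned into HTTP response headers. The function name and the choice of "private" caching are assumptions; the thread only says the lifetime should be configurable per file instance.

```python
def cache_headers(lifetime_seconds: int) -> dict:
    """Translate a per-instance cache lifetime into response headers."""
    if lifetime_seconds <= 0:
        # e.g. uploaded assignments: never reuse a cached copy.
        return {
            "Cache-Control": "no-store, no-cache, must-revalidate",
            "Pragma": "no-cache",
        }
    # e.g. 86400 (1 day) for a resource file; "private" keeps shared
    # proxies from caching content that sits behind a login.
    return {"Cache-Control": f"private, max-age={lifetime_seconds}"}
```

A module could pass its own value here instead of relying on a constant in file.php.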

Hierarchy in tables ?

It's possible that I don't understand a key piece of the concept, but I wonder why there is no reference to any "parent id" in the file or file_instance table. In other words, how is the hierarchical structure supposed to be "imported" from the repository into Moodle? If the hierarchy is reserved to the course context, and not to the repository context, it seems difficult to give students access to a complete directory, for example.

On the other hand, maybe it's only a question for the "local" repository type, and not for the "generic" repository API? Allegre Guillaume 16:43, 5 March 2008 (CST)

Martin Dougiamas 12:58, 15 March 2008 (CDT): basically I was thinking we just store a full local path for each file instead of hierarchies. It wasn't in the db schema though: I've just added it. Thanks!

As for allowing access to a whole directory, that's something I've not thought about - thanks! I agree we need to support something like that in the interface. Hmm ..

Editing Repository Files / Version Control ?

How do you imagine handling the "editing file" problem? I can see several solutions:

  1. the simplest way: write access to a file really allows editing (re-uploading) the file, with every instance being modified
  2. the "cheap copy" way: optionally, the modification is applied only to new file_instances (or those for which the teacher forces it). Here you have to handle two (and later maybe more) revisions of the file.

This raises the question of file revisions (or version control).

A related question is about versioned files: should versioning depend only on the repository layer (for example, a plugin could implement an SVN "repository")? Or should Moodle be aware of the file "revision number"? Allegre Guillaume 17:00, 5 March 2008 (CST)


Martin Dougiamas 13:08, 15 March 2008 (CDT): I don't think we should start getting into such things. Versioning and editing are the job of the dedicated external repository system, and people should use that interface.

What we do have in Moodle is the file->updates field. This specifies when to get a fresh copy. It would be set when the user has specified they want Moodle to use the "latest version" of the file. And if the user specified a particular version in the repository interface then that is what gets copied (once) and the file->updates field is left as zero.

One scenario would be if someone wants to use version 1 in one course and version 2 in another, but since the version is in the URL to the original file (and thus a different remote path) they would be treated as two separate files in Moodle anyway.

Might be a small problem if they said "latest version, no update" in course 1 and then later on said "latest version, no update" in a second course. I guess that would re-download the latest version (which might have changed) and thus course 1 would have an unexpected update. We could solve this by alerting the user and giving them a choice, I suppose.
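
A small sketch of how the file->updates field could drive refreshes. It assumes updates is a refresh interval in seconds and that zero means "copied once, never refreshed"; the thread does not define the units, so treat both the signature and the semantics as guesses.

```python
def needs_refresh(updates: int, last_fetched: int, now: int) -> bool:
    """Decide whether the local copy should be fetched again.

    Assumption: `updates` is an interval in seconds; 0 means the user
    picked a particular version, which is copied once and then left alone.
    """
    if updates == 0:
        return False
    return now - last_fetched >= updates
```

The "latest version, no update" case in two courses would then be two requests with updates set to 0 at different times, which is exactly where the user would need to be asked whether course 1 should pick up the newer copy.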

Replacing Moodle's File System

The API as specified still uses the standard Moodle file system, which has its limitations while providing a simple file access method. But, shouldn't we also be considering completely (or partially) replacing the Moodle file system with the repository system? To that end, the file interface would be the same to users, but where the files are and how they are accessed would be up to which repository was being used. In this system, the standard Moodle file system would just be one of the available repositories. Then the API could support more robust access controls if available, or very few (as Moodle does now). This could allow for per-directory / per-file privilege granting, common file areas so that courses could access the same copy of a file (rather than copying it into the Moodle file area), etc.

The API could also support multiple repositories and allow choosing a file as a link rather than copying it.

Of course, this would also mean allowing write access to the repository area from Moodle, rather than leaving it as read-only. I know the plan was to do this through the Portfolio API, but that really is limited to a user storage function and not a file management function. I think having an API defined to allow for full file management would be a better solution - even if not supported by all repositories. Mike Churchward 12:54, 6 March 2008 (CST)

Martin Dougiamas 12:54, 15 March 2008 (CDT) : Actually, Mike, the standard file system is NOT being retained at all. I've clarified this slightly in the docs. The course-centered structure for files is gone. All files ARE stored on disk but most of the info regarding their location and access is all governed by a table in the database (so the files could just as easily be in a remote database if you wished). Files used in multiple areas within Moodle will never be stored more than once in Moodle.

The idea of a read/write interface was how we started, but it's insanely complex once you try to handle backups (ie how do you make a complete backup suitable for giving to someone else?) and access control (how do you decide access depending on Moodle contexts), and we'll never do it as well as the original repository does it. Why should we duplicate the Hive interface (to use an example) in Moodle when Hive has a perfectly good one already, one that handles all the extra stuff like Copyright controls, workflow and so on?

That said, the current Repository plugin idea is very simple, so that if anyone really wanted to embed management into the "file picker" interface they totally could do it in the plugin.

Windows network drives

I've added this to the end of the list of repository types to be supported because it's possibly the most useful one I can think of. Our school wants to make the transition to Moodle, but getting the files onto it is the biggest barrier: there is a lot of material sitting in our shared and personal folders which is time-consuming to transfer. Integration is a big thing that management want to see, particularly the ability to bulk upload files. This may not be a repository as such, because the best way may be to copy the file from the shared folder into the repository rather than link to it where it is, where it could be moved or deleted accidentally. Either way, easy access is vital to making the upload process workable. It would also make Moodle a killer app, in that it could make our network drives available from home through a secure web interface.

I've linked to Guy Thomas' Windows Share Web Client block as it is a good start, but sadly only works on Linux. Matt Gibson 03:43, 29 April 2008 (CDT)


Fantastic idea, let's call it the SMB plugin  - Martin Dougiamas 21:25, 8 June 2008 (CDT)

Bulk operations

On a similar note, Don Hinkelman's new course format provides a great bulk-upload feature whereby a whole folder can be uploaded and added to the course as resources with the names of the files as titles, with just a couple of clicks. I know this is extending the repository idea a little, but to be honest, I think that one of the biggest weaknesses of Moodle's file handling right now is not being able to add an entire folder of files without manually going through each one and laboriously specifying location and title. To make any repository useful, I think three operations should be possible with one or two clicks:

  • Add a whole folder/group of files to the repository at once
  • The above operation, but additionally, have all of the folder/group added to a course and named automatically, all in one go.
  • Find a group of files which are already in the repository and add them to a course all at once, with automatic naming as above.

Not sure if this is best put here or in the Development: File_API bit, but here seemed slightly better. Matt Gibson 03:43, 29 April 2008 (CDT)

 This is probably not related to the file API at all.  I'd see it as being a new item on the resource menu (Add multiple resources ...) with some sort of shift-click scenario.  Nice idea though.  Can you file it as a tracker item?    Martin Dougiamas 21:26, 8 June 2008 (CDT)

Bulk and a new resource type from repository

Our school has several campuses and a distance learning unit with well organized and big course resource trees. The repository module could be enormously useful for us.

I have looked at Moodle's IMS repository a bit, and like the way it is possible to put multiple resources into one access point. If the new repository could have some of the same possibilities, it would be marvellous. The drawback with the IMS is that it opens a new window with all the multiple resources. Instead, it would be OK to be able to share a whole lot of resources, or a whole "resource tree".

I also like the bulk function, to be able to parse a folder of say, html-files, and add it to the resource tree.

Further notes

"local repository name"

I think users will find the word local confusing. To the developer or administrator, local seems to refer to the Moodle server, but teachers and students will think that local refers to their own local file system. I would suggest just using the domain name for that server (like myMoodleSite.org) and listing it as the first entry to avoid this confusion. Gary Anderson 02:40, 10 June 2008 (CDT)

Credentials to third party systems

You might want to consider something like oauth - http://oauth.net - to manage credentials of third party sites, rather than asking the user for them. I know I sure wouldn't want to input my credentials for one system into another! Nigel McNie 07:08, 17 June 2008 (CDT)

function is_logged()

I created a prototype of the file picker; the repository I tested is box.net. In this case, we need a function to check whether users are authenticated by box.net. If this function returns false, the print_login function will be called, which prints a link that redirects to the box.net login page. After the user has logged in, box.net redirects back to the file picker page, and the authenticated user obtains an auth_token_key to access resources.

Other web services such as Google Docs and Facebook have a similar authentication process.

See: http://enabled.box.net/docs/rest#authentication

- Dongsheng
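
The flow described above could look roughly like this. Only is_logged, print_login and get_listing are names from the discussion; the class, the render_picker helper and the login URL are hypothetical placeholders, and no real box.net call is made.

```python
class BoxRepository:
    """Hypothetical repository plugin used only to illustrate the flow."""

    # Placeholder address, not a real box.net endpoint.
    LOGIN_URL = "https://repository.example/login"

    def __init__(self, auth_token=None):
        self.auth_token = auth_token

    def is_logged(self) -> bool:
        # The user counts as authenticated once a token has been obtained.
        return bool(self.auth_token)

    def print_login(self) -> str:
        return f'<a href="{self.LOGIN_URL}">Log in to the repository</a>'

    def get_listing(self, path: str) -> list:
        # A real plugin would call the remote web service here.
        return [path + "example.txt"]


def render_picker(repo, path="/") -> str:
    """Show the login link until the user is authenticated, then the files."""
    if not repo.is_logged():
        return repo.print_login()
    items = "".join(f"<li>{name}</li>" for name in repo.get_listing(path))
    return f"<ul>{items}</ul>"
```

Keeping the check in one method means other services with a similar redirect-based login can reuse the same picker logic.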

ws.php

Another problem is that users have to wait for the file picker page to load until Moodle has received the response from the remote web service (limited by cURL's synchronous requests). This may take a long time, or the remote web service may not reply at all; users may think something has gone wrong and refresh the file picker.

Maybe we should have another page, repository/ws.php, which is in charge of communicating with web services. It could be embedded in the file picker page as an iframe, or we could use Ajax to invoke it. The main page, repository/picker.php, would display a loading picture or a progress bar while the communication is in progress, which is more user-friendly.

For example, when users try to obtain a file list from box.net, an iframe of ws.php?action=get_list will be inserted in the picker, and $repository_box->get_listing('/') will be invoked in ws.php. A loading picture will be displayed in the picker page; when the remote web service returns, the picker page removes the loading picture and displays the result. Does this sound reasonable?

--Dongsheng Cai 01:34, 22 June 2008 (CDT)
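
A sketch of what the ws.php side might do so the picker never waits indefinitely: run the remote call with a bounded wait and always hand back a status the picker can display. slow_get_listing is a stand-in for the remote web service call, and fetch_listing and the status values are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def slow_get_listing(path, delay):
    """Stand-in for a remote web service call that takes `delay` seconds."""
    time.sleep(delay)
    return [path + "a.txt", path + "b.txt"]


def fetch_listing(path, timeout, delay=0.2):
    """Return a listing, or a 'timeout' status if the service is too slow."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(slow_get_listing, path, delay)
    try:
        return {"status": "ok", "files": future.result(timeout=timeout)}
    except TimeoutError:
        # The picker can replace the loading picture with an error message.
        return {"status": "timeout", "files": []}
    finally:
        # Do not block the response on the still-running request.
        pool.shutdown(wait=False)
```

The picker page would show its loading picture while this runs and switch to either the file list or a "service not responding" message depending on the status.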