Improved support for external File content


Note: This page is a work-in-progress. Feedback and suggested improvements are welcome. Please join the discussion on moodle.org or use the page comments.

File synching
Project state: Planning
Tracker issue: MDL-28666
Discussion: TODO
Assignee: Dongsheng Cai

Moodle 2.3


Problems with Files in 2.0

  • It is not currently easy to use a file in multiple places throughout Moodle and update every copy at once
  • It is not currently easy to create a simple shared "course repository" for teachers to use


Example use cases that will become possible

A teacher wants to upload a file once and use it in multiple courses. When they update the file, it should be updated in all their courses automatically.

  1. The teacher uploads the file to their private file area (or other repository).
  2. In each place they want to add the file, they use the file picker to select the file from their private files area (or other repo) and select "link to latest version".
  3. Replacing the original file automatically updates all the linked copies.
  4. If the teacher picks a file that is already an alias, Moodle copies the alias rather than creating an alias to an alias.

Several teachers want to create a shared repository of files together

  1. One teacher adds a course repository block to the course
  2. Using the block, the teacher creates an instance of a "?filesystem? repository" inside the course (essentially a folder).
  3. The content of the repository can be edited via the "Course repositories" block.
  4. All of the teachers in that course can now see the repository in their file pickers and select files from it
  5. Files can be "linked" as above

A student wants to submit a linked file as an assignment, so they can continue updating it after the assignment due date.

  1. This will not be possible, because the assignment will not allow linking to files.
  2. The student must instead upload a copy of the file, which the assignment module then protects.

Solution summary

The basic idea is to allow the CONTENT of files to be stored outside of the filepool, while all the metadata and access is controlled by Moodle exactly the same as it is now.

  1. Extend the Files API with a new "reference" concept: every file now has a "repositoryid" (the repository instance the file came from) and a "reference" (the address of the file's content in that repository).
  2. Files copied into Moodle have no record in the `files_reference` table; referenced files instead store a UUID (usually a URL with file-specific tokens in it, created by the user who originally placed the file) in the `reference` and `repositoryid` columns.
  3. Improve the filesystem repository plugin to make it easier to create folder repositories on the fly, via a block.
  4. Extend the file picker UI to offer "linking" for more repositories that support it, including the server files and filesystem repositories.
  5. Virtual files are served via pluginfile.php URLs just like normal files:
    1. pluginfile.php uses module callback to determine access (as now), and if it passes then
    2. pluginfile.php calls a file logging subsystem to log the fact that this file is being served (useful for copyright reporting, for example)
    3. If the file is not 'local', pluginfile.php uses the relevant repository callback to get the content of the file and streams it to the user with the appropriate MIME type, etc.
    4. The repositories have a way to cache this content in the normal Moodle filepool, to avoid repeated downloads.
    5. If an external repository is down or not configured, then the repository plugin can choose to just serve the local cached version (useful for restored backups and disaster tolerance)
  6. When a file is linked to another local file, its location is "filepool". These files need special attention during:
    1. Garbage cleanup
    2. Local cache updates
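The serving steps in item 5 can be sketched as follows. This is a minimal, self-contained PHP illustration, not Moodle code: `ExampleRepository`, `ExampleFileServer` and the array-shaped file records are hypothetical stand-ins for a repository plugin, pluginfile.php and stored_file. For brevity it always asks the repository first and uses its cache only as a fallback when the repository is unreachable; the lifetime-based caching of step 4 is left out.

```php
<?php
// Hypothetical sketch of the item 5 serving flow; these names are
// illustrative stand-ins, not Moodle's pluginfile.php or repository API.

interface ExampleRepository
{
    /** Returns the file content, or null if the repository is unreachable. */
    public function fetch(string $reference): ?string;
}

final class ExampleFileServer
{
    /** @var ExampleRepository */
    private $repo;
    /** @var array<int, string> last good content per file id (stand-in for the filepool cache) */
    private $cache = [];
    /** @var array<int, array{userid: int, fileid: int}> stand-in for an access log */
    public $log = [];

    public function __construct(ExampleRepository $repo)
    {
        $this->repo = $repo;
    }

    /**
     * @param array{id: int, content: ?string, reference: ?string} $file
     * @param callable $canaccess module access callback: (array $file, int $userid): bool
     */
    public function serve(array $file, int $userid, callable $canaccess): ?string
    {
        // 5.1 The module callback decides access, exactly as for ordinary files.
        if (!$canaccess($file, $userid)) {
            return null;
        }
        // 5.2 Record that the file is being served (e.g. for copyright reporting).
        $this->log[] = ['userid' => $userid, 'fileid' => $file['id']];
        // Local files are served straight from the filepool.
        if ($file['reference'] === null) {
            return $file['content'];
        }
        // 5.3 External file: ask the repository and remember the result (5.4).
        $content = $this->repo->fetch($file['reference']);
        if ($content !== null) {
            $this->cache[$file['id']] = $content;
            return $content;
        }
        // 5.5 Repository down or unconfigured: serve the cached copy if there is one.
        return $this->cache[$file['id']] ?? null;
    }
}
```

Because access checking and logging happen before the local/external branch, linked files are subject to exactly the same permissions and reporting as ordinary files.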

Note that the original URLs of files in external repositories are never revealed to users.

Details

File picker walk-through

Figure: File picker using repository reference (sequence diagram)

  1. The user clicks the "Insert image" button in TinyMCE, which launches the File Picker in a dialog
  2. The user chooses a repository that supports UUID direct file references (e.g. Equella, Alfresco)
  3. A. The file picker asks the repository plugin for a customised repository UI, if one is supported
  4. B. The file picker uses the repository UI and an <object> element to display the customised UI in its right-hand pane
  5. The user selects a file from the repository. If the repository has a customised UI, the file picker JavaScript API is used to trigger the file selection event (with all file-related information). The repository plugin then offers two choices in the file picker interface (screenshots: file picker options):
    • C. Use the current version: this will copy the file to Moodle (as now). No need to reference external content.
    • C. Use the latest version: this will use a file reference, so that the most recent version is always pulled from the repository
  6. D. If "Use the latest version" is selected, the Repository API creates a file reference. If the selected file is already an alias, the Repository API copies that alias rather than creating an alias to an alias. The repository plugin may optionally cache external files in Moodle's local directory.
  7. E. The Repository API asks the File API to create a file with the file parameters and reference. The File API stores the reference and repository instance id in the `mdl_files_reference` table and returns the file URL, which looks the same as any other Moodle file URL.
  8. File picker gives the file URL to TinyMCE
  9. When TinyMCE displays the resource, the browser requests the file URL, which points to pluginfile.php.
  10. pluginfile.php uses the File API to send the file contents to the browser. If the File API detects that the requested file is not an ordinary Moodle file but is located in an external repository, it asks the Repository API for the file content. The Repository API first looks for a cached copy; if that copy is too old or not found (for example, removed by the cron check), it fetches the resource (and caches it), then returns it to the File API. Alternatively, the Repository API could disable caching and ask for fresh content every time.
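Step D's rule that an alias is copied rather than aliased again can be shown with a small PHP sketch. `example_make_reference()` and the `location` field (the address of a file inside its repository) are hypothetical; file records are plain arrays in which `reference` is null for an ordinary stored file.

```php
<?php
// Hypothetical sketch of step D: build the (repositoryid, reference) pair for a
// new alias. If the picked file is itself an alias, its target is copied, so
// no alias ever points at another alias.

/**
 * @param array{repositoryid: int, reference: ?string, location: string} $picked
 * @param int $pickedfromrepositoryid repository instance the user browsed
 * @return array{repositoryid: int, reference: string}
 */
function example_make_reference(array $picked, int $pickedfromrepositoryid): array
{
    if ($picked['reference'] !== null) {
        // Already an alias: reuse its repository and reference unchanged.
        return ['repositoryid' => $picked['repositoryid'], 'reference' => $picked['reference']];
    }
    // Ordinary file: the alias points at the picked file's own location.
    return ['repositoryid' => $pickedfromrepositoryid, 'reference' => $picked['location']];
}
```

Because the target is resolved once, when the alias is created, serving an alias never has to follow a chain of references.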

File request walk-through

Figure: File request (sequence diagram)

  1. A. The user requests a file
  2. B. The File API detects whether the requested file is a regular Moodle file or is located in an external repository; for an external file, the File API asks the Repository API to fetch it
  3. C. The File API reads the file reference information from the database; it could be stored in PHP-serialised or JSON format
  4. D. The File API passes the raw file reference information to the repository plugin
  5. E. The repository plugin first checks whether the file is available locally
  6. F. If not, the repository plugin uses the file reference information to fetch the file
  7. G. The Repository API returns the file content to the File API, which serves the file
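Steps B to G can be sketched in PHP as below. The interface, the filesystem-style plugin and `example_get_content()` are hypothetical; the reference is stored as JSON here (the text also allows PHP serialisation), and only the plugin interprets it, as step D requires.

```php
<?php
// Hypothetical sketch of the file request walk-through (steps B-G).
// None of these names are Moodle APIs.

interface ExampleRepositoryPlugin
{
    /** Step D: receives the raw reference exactly as stored in the database. */
    public function get_file(string $rawreference): string;
}

/** A filesystem-style plugin whose reference is a JSON object holding a path. */
final class ExampleFilesystemPlugin implements ExampleRepositoryPlugin
{
    /** @var array<string, string> locally available copies, keyed by path */
    public $local = [];
    /** @var callable fetches a path from the external store: (string $path): string */
    private $fetch;

    public function __construct(callable $fetch)
    {
        $this->fetch = $fetch;
    }

    public function get_file(string $rawreference): string
    {
        // Only the plugin knows how to decode its own reference format.
        $path = json_decode($rawreference, true, 512, JSON_THROW_ON_ERROR)['path'];
        // Step E: use a local copy if there is one; step F: otherwise fetch and keep it.
        if (!isset($this->local[$path])) {
            $this->local[$path] = ($this->fetch)($path);
        }
        return $this->local[$path];
    }
}

/**
 * Steps B, C and G on the File API side. $references maps a file id to
 * [repositoryid, raw reference]; files with no entry are regular Moodle files.
 */
function example_get_content(int $fileid, array $localcontent, array $references, array $plugins): string
{
    if (!isset($references[$fileid])) {
        return $localcontent[$fileid];
    }
    [$repositoryid, $rawreference] = $references[$fileid];
    return $plugins[$repositoryid]->get_file($rawreference);
}
```

Keeping the reference opaque to the File API means each plugin can change its own format (see the upgrade note under Repository API changes) without touching the File API.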

Database changes

We can create a separate table to be left-joined with `mdl_files`; this requires the File API to query this table when locating files:

  • New file table `mdl_files_reference`
    • `id` primary key
    • `fileid` foreign key of `mdl_files`.`id`
    • `repositoryid`
    • `reference`: a URL, UUID or other data format; only the repository plugin callbacks know the meaning of this field. The file should be cached when the reference is added to Moodle, so that its contenthash is accurate.
  • New table `mdl_files_log` for the file access log; the File API should have a new function to insert records into this table
    • `id` primary key
    • `userid` (0 for guests)
    • `timeaccess`
    • `fileid`
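The left join and the access-log insert can be illustrated without a database. In this hypothetical PHP sketch, arrays stand in for `mdl_files`, `mdl_files_reference` and `mdl_files_log`, and the helper names are invented for illustration; a real implementation would issue a SQL query like the one in the comment.

```php
<?php
// Hypothetical in-memory sketch of the LEFT JOIN the File API would run:
//   SELECT f.*, r.repositoryid, r.reference
//     FROM mdl_files f
//     LEFT JOIN mdl_files_reference r ON r.fileid = f.id
// Files with no mdl_files_reference row are ordinary local files.

function example_left_join_references(array $files, array $references): array
{
    $byfileid = [];
    foreach ($references as $ref) {
        $byfileid[$ref['fileid']] = $ref;
    }
    $rows = [];
    foreach ($files as $file) {
        $ref = $byfileid[$file['id']] ?? null;
        // Unmatched files get NULL reference columns, as a SQL LEFT JOIN would produce.
        $rows[] = $file + [
            'repositoryid' => $ref['repositoryid'] ?? null,
            'reference' => $ref['reference'] ?? null,
        ];
    }
    return $rows;
}

/** Hypothetical helper for the proposed mdl_files_log table; guests are logged as userid 0. */
function example_log_access(array &$log, int $fileid, ?int $userid, int $now): void
{
    $log[] = ['userid' => $userid ?? 0, 'timeaccess' => $now, 'fileid' => $fileid];
}
```

Using a separate table keeps `mdl_files` unchanged for the common case of local files, at the cost of one join whenever files are located.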

File API changes

  • pluginfile.php and draftfile.php (in practice, send_stored_file()): when a file is not local (`repositoryid` is non-zero), the stored_file instance should ask the repository plugin to send the file contents; the Repository API decides whether to return the cached copy or fresh contents
  • Preserve file reference information when calling file_storage::create_file_from_storedfile
  • file_storage::get_area_files, file_storage::get_file_by_id and file_storage::get_file_by_hash should all retrieve file reference information
  • When calling file_prepare_draft_area(), the files' reference field should be kept. We also need to keep the original file information when creating draft file copies, so that the filemanager element can look up references to the original file. To make this happen, we have to create a temporary table known only to the filemanager, mdl_files_draft_info, with the following fields:
    • id
    • draftfileid
    • name
    • value

When the filemanager needs to decide whether the draft file's original file has references, it first looks up this table by draftfileid and name="originalfileparams". The value field holds the original stored_file parameters, which the filemanager can then use to ask the Files API for all references. This way the Files API does not need to know anything extra; finding the original file is solely the filemanager's duty.
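The lookup described above might look like this hypothetical PHP sketch, with arrays standing in for `mdl_files_draft_info` and `mdl_files_reference`. Encoding `originalfileparams` as JSON, and storing references to local files as the same JSON parameters, are both assumptions made for illustration; the function names are invented.

```php
<?php
// Hypothetical sketch of the filemanager's lookup in mdl_files_draft_info,
// whose rows are (id, draftfileid, name, value) name/value pairs.

/** Returns the original file's stored_file params, or null if none were recorded. */
function example_original_file_params(array $draftinfo, int $draftfileid): ?array
{
    foreach ($draftinfo as $row) {
        if ($row['draftfileid'] === $draftfileid && $row['name'] === 'originalfileparams') {
            // Assumption: the value column holds the params as JSON.
            return json_decode($row['value'], true, 512, JSON_THROW_ON_ERROR);
        }
    }
    return null; // the draft file was not copied from an existing file
}

/** Stand-in for asking the Files API for all references to the original file. */
function example_find_references(array $references, array $originalparams): array
{
    return array_values(array_filter(
        $references,
        function (array $r) use ($originalparams): bool {
            // Assumption: local-file references store the same JSON params.
            return json_decode($r['reference'], true) == $originalparams;
        }
    ));
}
```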

  • file_save_draft_files should preserve file reference information
  • new method: file_storage::create_file_from_reference
public function create_file_from_reference($file_record, $repositoryid, $reference, array $options = NULL)
  • Cron: we probably need a Repository API function to cache/update external files
  • stored_file class
    • set_author()
    • set_license()
    • replace_content_with(stored_file $storedfile)
    • rename($filepath, $filename): rename files
    • delete_references(): delete all reference information
    • get_reference_details($ref): Get human readable reference information
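To show what `create_file_from_reference` is meant to do, here is a hypothetical, self-contained stand-in for file_storage that keeps its rows in arrays. It is not Moodle's implementation: it only shows the ordinary file record going to `mdl_files` while the repository id and reference go to `mdl_files_reference`. The `$options` argument is accepted but ignored.

```php
<?php
// Hypothetical stand-in for file_storage; arrays replace the two database tables.

final class ExampleFileStorage
{
    /** @var array<int, array> rows of mdl_files, keyed by id */
    public $files = [];
    /** @var array<int, array> rows of mdl_files_reference */
    public $references = [];
    /** @var int next mdl_files id */
    private $nextid = 1;

    public function create_file_from_reference(array $file_record, int $repositoryid, string $reference, ?array $options = null): array
    {
        // The mdl_files row keeps its usual fields; nothing reference-specific is added.
        $fileid = $this->nextid++;
        $this->files[$fileid] = ['id' => $fileid] + $file_record;
        // The repository id and reference live only in mdl_files_reference.
        $this->references[] = [
            'id' => count($this->references) + 1,
            'fileid' => $fileid,
            'repositoryid' => $repositoryid,
            'reference' => $reference,
        ];
        // $options is ignored in this sketch.
        return $this->files[$fileid];
    }
}
```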

Repository API changes

  • In addition to FILE_INTERNAL and FILE_EXTERNAL, we need another type, FILE_REFERENCE = 4 (binary 0100); each repository plugin needs to declare which file types it supports
  • When a user asks the file picker to create a reference (instead of a copy), the Repository API tries to cache the file in the filepool (a special file area for the repository). After the file is downloaded, the Repository API asks the Files API to create a virtual file in the `mdl_files` table; the existing fields stay the same, and the extra reference information (a link or another format the repository plugin understands) is stored in the `mdl_files_reference` table. The File API returns the stored_file object, and the Repository API generates a Moodle URL for users, just like other Moodle files, without revealing the internal reference
  • Make repository plugin upgrades and versioning possible: a repository plugin may need to update `reference` in the `mdl_files_reference` table if its reference information changes (we already have db/upgrade.php; make sure all new plugins have one)
  • Cron will use repository plugin callbacks to clean up cache files: repository::cron($repositoryid)
  • public function send_file($storedfile, $lifetime=86400, $filter=0, $forcedownload=false, array $options = null): Serve repository files
  • repository::get_file_reference($str): Create the file reference
  • repository::get_file_by_reference($ref): Get an external file by its reference. $ref is the file's record in the files_reference table, with reference, lastsync and lifetime fields, which the repository plugin can use to decide what to do. The return value is an object carrying the file in one of these forms:
    • $fileinfo->handle: a file handle
    • $fileinfo->contenthash: the content hash of an existing Moodle file
    • $fileinfo->content: the file content
    • $fileinfo->filepath: a file path

Repository API will handle different types of return automatically.
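How the Repository API might handle the different return types automatically is sketched below. `example_content_from_fileinfo()` is hypothetical, the `$filepool` array stands in for Moodle's filepool keyed by content hash, and the sketch assumes one of the four properties is set; the order in which they are checked is an arbitrary choice.

```php
<?php
// Hypothetical sketch: turn the object returned by get_file_by_reference()
// into file content. $filepool stands in for the filepool (contenthash => content).

function example_content_from_fileinfo(object $fileinfo, array $filepool): string
{
    if (isset($fileinfo->content)) {
        return $fileinfo->content;                      // content returned directly
    }
    if (isset($fileinfo->contenthash)) {
        return $filepool[$fileinfo->contenthash];       // an existing Moodle file
    }
    if (isset($fileinfo->filepath)) {
        $data = file_get_contents($fileinfo->filepath); // a path on local disk
    } elseif (isset($fileinfo->handle)) {
        $data = stream_get_contents($fileinfo->handle); // an open file handle
    } else {
        throw new InvalidArgumentException('Repository returned no usable file information');
    }
    if ($data === false) {
        throw new RuntimeException('Could not read file content');
    }
    return $data;
}
```

Letting plugins choose the cheapest form (for example, a handle for large streamed files) keeps each plugin simple while the Repository API does the conversion in one place.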

  • repository::sync_individual_file(stored_file $file): Decide whether or not the stored_file instance should be synced
  • repository::get_reference_details($ref): Convert the reference info to human readable format
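One plausible rule for sync_individual_file(), using the `lastsync` and `lifetime` fields mentioned for get_file_by_reference(), is sketched below. The function name is hypothetical, and treating `lastsync` as a Unix timestamp and `lifetime` as a duration in seconds is an assumption. A lifetime of 0 then means "always fetch fresh content", matching the option of disabling caching.

```php
<?php
// Hypothetical staleness rule for sync_individual_file(). Assumes lastsync is
// a Unix timestamp (empty if never synced) and lifetime is in seconds.

function example_needs_sync(array $ref, int $now): bool
{
    if (empty($ref['lastsync'])) {
        return true; // never synchronised: must fetch
    }
    // Stale once the cached copy is at least `lifetime` seconds old.
    return $now - $ref['lastsync'] >= $ref['lifetime'];
}
```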

Repository plugins changes

  • Server files: store file parameters in `reference` field
  • Private files: same as server files plugin
  • Alfresco: store the UUID in the `reference` field; however, Alfresco changes UUIDs when Tomcat restarts, so other information may be needed to locate files
  • Flickr private: needs the Flickr secret, token and photo id to locate files
  • File system: store the file path in the `reference` field
  • s3: store the file path; the s3 repository uses the secret and token to fetch the file from S3 whether or not the file is public
  • EQUELLA
    1. Admin installs EQUELLA and sets up the parameters for single sign-on
    2. Teacher clicks the EQUELLA instance; EQUELLA returns a URL for the repository UI (plugin code)
    3. Teacher picks a file in EQUELLA; the EQUELLA repository UI invokes the file picker JavaScript API to tell Moodle to download this resource (plugin code)
    4. Repository API stores the file UUID and SSO user id and creates a file reference in the Moodle file pool
    5. EQUELLA plugin implements method to download contents using stored UUID and SSO userid
    6. EQUELLA plugin implements method to update/invalidate cached resources (by cron)

Content caching

  • Create the moodledata/repository/cache directory
  • Generate a hash from the file URL and request parameters (provided by the repository plugin), not from the content hash, because we cannot send a content hash to an external repository to request file information
  • Cached files are stored using hash code as file name

class repository_cache

Returns a stored_file instance or a file path

  • store($url, $string_to_be_hashed)
  • get($string_to_be_hashed)
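A minimal version of repository_cache following the bullets above is sketched below. The class name `ExampleRepositoryCache`, SHA-1 as the hash function and the injected `$fetch` downloader are assumptions; this version returns a file path rather than a stored_file instance.

```php
<?php
// Hypothetical sketch of repository_cache: cached files live in one directory
// (e.g. moodledata/repository/cache) and are named by a hash of the URL and
// request parameters, since an external repository cannot be queried by content hash.

final class ExampleRepositoryCache
{
    /** @var string */
    private $dir;
    /** @var callable downloads the content at a URL: (string $url): string */
    private $fetch;

    public function __construct(string $dir, callable $fetch)
    {
        if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
            throw new RuntimeException("Cannot create cache directory $dir");
        }
        $this->dir = $dir;
        $this->fetch = $fetch;
    }

    /** The hash of the key string is used directly as the cache file name. */
    private function path_for(string $string_to_be_hashed): string
    {
        return $this->dir . '/' . sha1($string_to_be_hashed);
    }

    /** Downloads $url, caches it under the hashed key and returns the file path. */
    public function store(string $url, string $string_to_be_hashed): string
    {
        $path = $this->path_for($string_to_be_hashed);
        if (file_put_contents($path, ($this->fetch)($url)) === false) {
            throw new RuntimeException("Cannot write cache file $path");
        }
        return $path;
    }

    /** Returns the cached file path, or null on a cache miss. */
    public function get(string $string_to_be_hashed): ?string
    {
        $path = $this->path_for($string_to_be_hashed);
        return is_file($path) ? $path : null;
    }
}
```

Callers should build `$string_to_be_hashed` deterministically, for example by sorting request parameters before concatenating them with the URL, so that the same request always maps to the same cache file.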

Filepicker Javascript API for customizing

  • File picker should be able to disable file references via an option
  • Support the <object> tag in the file picker container
  • Provide a JavaScript API to allow plugins to communicate with the file picker
    • Notify file picker to download file
    • Notify file picker to pop up authentication page

File manager to handle virtual files

This should not cause much trouble; we only need to make sure draftfile.php can serve external resources, because all files managed by the file manager are in the draft area.