As of ICA v2.8.0, Bench Workspaces use a FUSE driver to mount project data directly into the workspace file system. The mount supports both reading and writing, with some write limitations imposed by the underlying AWS S3 storage.
You can perform the following actions from Bench (provided your user permissions are sufficient for the workspace permissions) or through the CLI:
- Copy project data
- Delete project data
- Mount project data (CLI only)
- Unmount project data (CLI only)
When you have a running workspace, you will find a file system in Bench under the project folder, alongside the basic and advanced tutorials. When you open that folder, you will see all the data that resides in your project.
WARNING: This is a fully mounted version of the project data. Changes in the workspace to project data cannot be undone.
The FUSE driver allows the user to easily copy data from /data/project to the local workspace and vice versa.
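As an illustration of copying from the mount into the workspace, here is a minimal Python sketch. The function name `copy_file` and the paths in the usage comment are hypothetical; only the /data/project mount point comes from the text above.

```python
import shutil
from pathlib import Path


def copy_file(src: Path, dest_dir: Path) -> Path:
    """Copy one file into dest_dir and return the path of the copy."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # copyfile copies only the file content, reading and writing front to back.
    shutil.copyfile(src, dest)
    return dest


# Hypothetical usage inside a workspace (file and folder names are assumptions):
# copy_file(Path("/data/project/sample.vcf"), Path.home() / "work")
```

The original file stays in the project; only the local copy should be edited.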
The FUSE driver also allows you to delete data from your project. This differs from earlier use of Bench, where you worked on a local copy and the original file remained in your project.
WARNING: Deleting project data from a Bench workspace through the FUSE driver permanently deletes the data in the project. This action cannot be undone.
Using the FUSE driver through the CLI is not supported for Windows users. Linux users can use the CLI without any further action; Mac users need to install the kernel extension from macFUSE.
macOS uses hidden metadata files beginning with ._, which are copied over and exposed in your project data during a CLI copy. These can be safely deleted from your project.
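One way to locate these ._ files is a small Python helper such as the sketch below. The function names are hypothetical, and the helper only lists files unless `dry_run=False` is passed, because deletions on the mounted project are permanent.

```python
from pathlib import Path
from typing import List


def find_dot_underscore_files(root: Path) -> List[Path]:
    """Return every regular file under root whose name starts with '._'."""
    return sorted(p for p in root.rglob("._*") if p.is_file())


def remove_dot_underscore_files(root: Path, dry_run: bool = True) -> List[Path]:
    """List the '._' files under root; delete them only when dry_run is False."""
    found = find_dot_underscore_files(root)
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```

Running it first with the default `dry_run=True` lets you review the list before anything is removed.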
Mounting and unmounting data must be done through the CLI. In Bench, this happens automatically, so no manual step is needed.
❗️ Once a file is written it cannot be changed! You won't be able to update it in the project location because of the restrictions mentioned above.
Some examples of other actions or commands that will not work because of the above-mentioned limitations:
- Saving a Jupyter notebook or R script in the /project location
- Editing a file with vi or another editor
- Adding a file to or removing a file from an existing zip file
- Redirecting with append to an existing file, e.g. echo "This will not work" >> myTextFile.txt
- Renaming a file, because of the existing association between ICA and AWS
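Because an existing file cannot be modified in place, one possible workaround is to change a local copy and write the result to the project under a new name. The Python sketch below illustrates this for the append case; it is an assumed workflow rather than a documented ICA feature, and the helper names and the `_v2` naming scheme are hypothetical.

```python
import shutil
from pathlib import Path


def next_free_name(directory: Path, name: str) -> Path:
    """Return a path in directory that does not exist yet, e.g. notes_v2.txt."""
    candidate = directory / name
    stem, suffix = candidate.stem, candidate.suffix
    version = 2
    # Best effort only: a listing on the mount may lag behind recent writes.
    while candidate.exists():
        candidate = directory / f"{stem}_v{version}{suffix}"
        version += 1
    return candidate


def append_via_local_copy(project_file: Path, text: str, scratch_dir: Path) -> Path:
    """Emulate '>>': append to a local copy, then write it out as a new file."""
    scratch_dir.mkdir(parents=True, exist_ok=True)
    local = scratch_dir / project_file.name
    shutil.copyfile(project_file, local)
    with local.open("a", encoding="utf-8") as handle:
        handle.write(text)
    target = next_free_name(project_file.parent, project_file.name)
    # The new file is written once, from start to end; the original is untouched.
    shutil.copyfile(local, target)
    return target
```

The original file remains in the project, so remove it separately if it is no longer needed, keeping in mind that deletion is permanent.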
A file can only be written sequentially. This restriction comes from the library the FUSE driver uses to store data in AWS. That library supports only sequential writing; random writes are currently not supported. The FUSE driver detects random writes, and the write fails with an IO error return code. Zip will not work because zip writes a table of contents at the end of the file. Please use gzip instead.
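For reference, a minimal Python sketch of compressing with gzip rather than zip is shown below. It assumes that the standard `gzip` module and the stream mode of `tarfile` ("w|gz") write their output front to back; this has not been verified against the FUSE driver, and the function names are hypothetical.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def gzip_file(src: Path, dest: Path) -> Path:
    """Compress a single file to dest using gzip."""
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def tar_gz_directory(src_dir: Path, dest: Path) -> Path:
    """Bundle a directory into a gzip-compressed tar written as a stream."""
    # "w|gz" is tarfile's stream mode, which avoids seeking in the output.
    with tarfile.open(str(dest), "w|gz") as tar:
        tar.add(str(src_dir), arcname=src_dir.name)
    return dest
```

The tar variant is included because gzip alone compresses one file at a time, whereas zip is often used to bundle several files.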
Listing data (ls -l) reads data from the platform. The actual data comes from AWS, and there can be a short delay between writing the data and the listing being up to date. As a result, a file that has just been written may temporarily appear as a zero-length file, and a file that has been deleted may still appear in the file list. This is a tradeoff: the FUSE driver caches some information for a limited time, and during that time the information may seem wrong. Note that, besides the FUSE driver, the library it uses to implement the raw FUSE protocol and the OS kernel itself may also cache information.
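A script that reads a file shortly after writing it may want to wait until the listing has caught up. The Python sketch below polls the file size until it matches the expected value; the function name, timeout, and polling interval are assumptions, since the length of the delay is not specified here.

```python
import time
from pathlib import Path


def wait_until_listed(path: Path, expected_size: int,
                      timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll until path reports expected_size bytes; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if path.stat().st_size == expected_size:
                return True
        except FileNotFoundError:
            # The file is not visible in the listing yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Comparing against the known size guards against the temporary zero-length state described above.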
To use a specific file in a Jupyter notebook, you need to refer to it as '~/project/filename'. Otherwise it won't work.
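In a Python notebook, note that `open()` does not expand `~` by itself, so the path has to be expanded first. A minimal sketch follows; the helper name `project_path` and the file name in the usage comment are hypothetical.

```python
from pathlib import Path


def project_path(filename: str) -> Path:
    """Build the absolute path for '~/project/<filename>'."""
    return Path("~/project").expanduser() / filename


# Hypothetical usage in a notebook cell:
# with project_path("samples.csv").open() as handle:
#     header = handle.readline()
```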
This functionality does not work in older workspaces unless you enable the permissions for that workspace.