
Data

The Data inventory provides access to the files and folders stored in the project or linked to the project. Here, you can perform searches and data management operations such as moving, copying, deleting and (un)archiving.

See also Non-indexed folders, a special form of data storage optimised for fast processing.

File/Folder Naming

ICA supports UTF-8 characters in file and folder names for data. Please follow the guidelines detailed below. (For more information about recommended approaches to file naming that can be applicable across platforms, please refer to the AWS S3 documentation.)

Folders and files cannot be renamed after they have been created. To rename a folder, you will need to create a new folder with the desired name, move the contents from the original folder into the new one, and then delete the original folder. Please see the Moving Data section for more information.

Characters generally considered "safe"
  • Alphanumeric characters

    • 0-9

    • a-z

    • A-Z

  • Special characters

    • Exclamation point !

    • Hyphen -

    • Underscore _

    • Period .

    • Asterisk *

    • Single quote '

    • Open parenthesis (

    • Closed parenthesis )
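
As an illustration, the character list above can be turned into a quick pre-upload check. This is a minimal sketch, not an ICA feature: the function names are hypothetical, and a name that fails the check is not necessarily rejected by ICA (UTF-8 names are supported), it only falls outside the "safe" set.

```python
import re

# Characters listed above as generally "safe": 0-9, a-z, A-Z and ! - _ . * ' ( )
SAFE_NAME = re.compile(r"[0-9A-Za-z!\-_.*'()]+")


def is_safe_name(name: str) -> bool:
    """Return True if every character of the name is in the "safe" set."""
    return SAFE_NAME.fullmatch(name) is not None


def unsafe_characters(name: str) -> list:
    """Return the sorted, de-duplicated characters that fall outside the "safe" set."""
    return sorted({ch for ch in name if not SAFE_NAME.fullmatch(ch)})


if __name__ == "__main__":
    for candidate in ["sample_01.fastq.gz", "my file#1.txt"]:
        print(candidate, is_safe_name(candidate), unsafe_characters(candidate))
```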

Data Formats

See the list of supported Data Formats

Data Privacy

When adding data to ICA, prioritize data privacy. Whether you're using storage configurations like AWS S3 or performing ICA data uploads, careful management of data access needs to be considered. When setting up cloud storage, confirm that configuration settings prevent unauthorized access. Always verify that uploads are free of unintended data to avoid privacy breaches. For more detailed information, refer to the ICA Security and Compliance section.

Data Integrity

See Data Integrity


Viewing Data

On the Projects > your_project > Data page, you can view file information and preview files.

Folder view (structured) vs flat view (list)

You can switch between folder view and flat view with the icons at the left.

  • Folder view shows the navigation structure and only the files and folders in the current folder. Searches in folder view will be performed on the current folder and all subfolders of the current folder.

  • Flat view shows a list of all files and folders within the current project. When you perform searches in flat view, all data of your project will be considered.


You cannot switch between folder and flat view when you are viewing search results. Clear the search with the clear search button, or the x in the search dialog first.

Files

To view file details, click on the filename.

  • Run input tags identify the last 100 pipelines which used this file as input.

  • Run output tags identify the pipeline which created the file.

  • Connector tags show if the file was added via browser upload or connector.

  • Clicking on a folder will open the folder itself. To see the folder details, use the folder details link at the top right of the screen.

To view file contents, select the checkbox at the beginning of the line and then select View from the top menu. Alternatively, you can first click on the filename to see the details and then click the view tab to preview the file.

If your data is the result of an analysis, you can find the analysis which created it at Projects > your_project > Data > your_data > view > Data details tab > Source analysis. Clicking the link here will open the analysis.

Filtering

To add filters, select the funnel/filter symbol at the top right, next to the search field.

Filters are reset when you exit the current screen.

Sorting

To sort data, select the three vertical dots in the header of the column on which you want to sort and choose ascending or descending.

Sorting is retained when you exit the current screen.

Displaying Columns

To change which columns are displayed, select the three columns symbol and select which columns should be shown.

In externally-managed projects, you can see which files are externally controlled and which are ICA-managed by means of the “managed by” column.

In flat view, you can jump to the folder in which files are located with the "Path" column.

The displayed columns are retained when you exit the current screen.

When you share the data view by sharing the link from your browser, filters and sorting are retained in the link, so the recipient will see the same data in the same order.

To see the ongoing actions (copying and moving) on data in the data overview (Projects > your_project > Data), add the ongoing actions column from the column list if it is not present yet. You can also consult the data detail view for ongoing actions by clicking on the data in the overview. When you click on an ongoing action itself, the details of the most recently created data job are shown.

Folders

If you open a folder by clicking it, you can see the folder details link at the top right. This will open the details screen where you can consult the folder size and number of files in that folder, the owning project, ongoing actions and folder id. You can also download the folder and all contents here with the download button.

To help navigate between folders in flat view, you can use the "Path" column in the data view, which will open the folder containing the selected file in folder view. If you want to go further up the folder path, you can use the folder structure above the file view. If the path is not visible, you can add it with the three-columns symbol next to the filter symbol.

Searching for Data

To quickly find data, use the search dialog at the top right. Search is performed with automatic wildcards before and after the search text. You can use * as an additional wildcard. For example, b*n will match bunny because it is interpreted as *b*n*.

You can search on the file name, the path (/folder/subfolder) and tags.
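
The wildcard behaviour described above can be mimicked locally, for example to predict which names a search text would match. This is a minimal sketch under assumptions: the function name is hypothetical, and matching is shown as case-sensitive, which this page does not specify.

```python
import re


def search_matches(search_text: str, value: str) -> bool:
    """Mimic a search with implicit wildcards before and after the search text.

    Each * in the search text matches any run of characters; everything
    else is matched literally.
    """
    literal_parts = [re.escape(part) for part in search_text.split("*")]
    pattern = ".*".join(literal_parts)
    # re.search is unanchored, which plays the role of the automatic
    # leading and trailing wildcards.
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    print(search_matches("b*n", "bunny"))          # interpreted as *b*n*
    print(search_matches("/folder/sub", "/folder/subfolder/file.txt"))
```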

Secondary Data

When Secondary Data is added to a data record, those secondary data records are mounted in the same parent folder path as the primary data file when the primary data file is provided as an input to a pipeline. Secondary data is intended to work with the CWL secondaryFiles feature. This is commonly used with genomic data such as BAM files with companion BAM index files.


Hyperlinking to Data

You can create hyperlinks to data to quickly share it with the following syntax:

The variables can be found at the following locations:

  • ServerURL: See browser address bar.

  • projectID: At YourProject > Details > URN > urn:ilmn:ica:project:ProjectID#MyProject

  • FolderID: At YourProject > Data > folder > folder details > ID

  • AnalysisID: At YourProject > Flow > Analyses > YourAnalysis > ID
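
If you collect project IDs in a script, the URN format shown above (urn:ilmn:ica:project:ProjectID#MyProject) can be split into its ID and name parts. This is a minimal sketch: the function name and the example URN value are hypothetical.

```python
PROJECT_URN_PREFIX = "urn:ilmn:ica:project:"


def parse_project_urn(urn: str) -> tuple:
    """Split a project URN into (project_id, project_name).

    Follows the pattern urn:ilmn:ica:project:ProjectID#MyProject.
    """
    if not urn.startswith(PROJECT_URN_PREFIX):
        raise ValueError(f"not a project URN: {urn!r}")
    project_id, _, project_name = urn[len(PROJECT_URN_PREFIX):].partition("#")
    if not project_id:
        raise ValueError(f"project ID missing in URN: {urn!r}")
    return project_id, project_name


if __name__ == "__main__":
    # Hypothetical URN value, for illustration only.
    print(parse_project_urn("urn:ilmn:ica:project:1234-abcd#MyProject"))
```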

Normal permission checks still apply with these links. If you try to follow a link to data to which you do not have access, you will be returned to the main project screen or login screen, depending on your permissions.


Exporting the Data List

You can export the list of data which you see in the overview as a CSV, JSON, or Excel file.

  1. Select one or more files to export at Projects > your_project > Data.

  2. Select Export at the bottom of the screen.

  3. Choose between the following export options:

    • To export only the list of selected files, select the Selected rows as the Rows to export option. To export the list of all files on the page, select Current page.

    • To export only the columns which are currently shown in your view, select Visible columns as the Columns to export option. Otherwise, choose All columns.

  4. Select the export format. (CSV/JSON/Excel)


Data Management

To prevent cost issues, you cannot perform actions which would write data to the workspace (such as copying and moving data) when the project billing mode is set to tenant and the owning tenant of the folder is not the current user's tenant.

Downloading Data

Single files can be downloaded directly from within the UI.

  • Select the checkbox next to the file which you want to download, followed by Download > Browser Download > Download.

  • You can also download files from their details screen. Click on the file name and select Download at the bottom of the screen. Depending on the size of your file, it may take some time to load the file contents.

Schedule for Download

You can trigger an asynchronous download via service connector using the Schedule for Download button with one or more files selected.

  1. Select a file or files to download.

  2. Select Download > Schedule download (for files or folders). This will display a list of all available connectors.

  3. Select a connector and optionally, enter your email address if you want to be notified of download completion, and then select Download.

If you do not have a connector, create one and install it. You must then return to the file selection in step 1 to use it.

You can view the progress of the download or abort the scheduled download on the Activity page for the project.

Uploading Data

Uploading data to the platform makes it available to analysis workflows and tools.

UI Upload

To upload data manually via the drag-and-drop interface in the platform UI, go to Projects > your_project > Data and either

  • Drag a file from your system into the Choose a file or drag it here box.

  • Select the Choose a file or drag it here box, and then choose a file. Select Open to upload the file.

Your files are added to the Data page with status partial during upload and become available when upload completes.

Do not close the ICA tab in your browser while data uploads.

Uploads via the UI are limited to 5 TB and no more than 100 concurrent files at a time. For practical and performance reasons, it is recommended to use the CLI or Service connector when uploading large amounts of data.

Upload Data via CLI

For instructions on uploading/downloading data via CLI, see CLI Data Transfer.


Copying Data

You can copy data from your project to a different folder within the same project or you can copy data from another project to your current project, provided you have the necessary access rights.

You can copy data from a subfolder to a higher-level folder to move data up one or more levels (folder/destination/source). You cannot copy data from the source folder onto itself or onto a subfolder of the source folder, as this would result in a loop.
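
The loop rule above can be expressed as a simple path check, for example to validate a planned copy before requesting it. This is a minimal sketch with a hypothetical function name and example paths; it only compares folder paths and does not contact ICA.

```python
from pathlib import PurePosixPath


def copy_creates_loop(source_folder: str, destination_folder: str) -> bool:
    """Return True if the destination is the source folder or lies inside it."""
    source = PurePosixPath(source_folder)
    destination = PurePosixPath(destination_folder)
    return destination == source or source in destination.parents


if __name__ == "__main__":
    # Copying a subfolder up to a higher-level folder is allowed.
    print(copy_creates_loop("/folder/destination/source", "/folder/destination"))
    # Copying a folder into one of its own subfolders is not.
    print(copy_creates_loop("/folder/destination", "/folder/destination/source"))
```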

Copying large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.

Required Rights

The person copying the data must have the following rights:

Copy Data Rights

Within a project

  • Source Project: Contributor rights, Upload and Download rights

  • Destination Project: Contributor rights, Upload and Download rights

Between different projects

  • Source Project: Download rights, Viewer rights

  • Destination Project: Upload rights, Contributor rights

Restrictions

The following restrictions apply when copying data:

Copy Data Restrictions

Within a project

  • Source Project: No linked data, No partial data, No archived data

  • Destination Project: No linked data

Between different projects

  • Source Project: Data sharing enabled, No partial data, No archived data, Within the same region

  • Destination Project: No linked data, Within the same region

Copying Data

  1. Go to the destination project for your data copy and proceed to Projects > your_project > Data > Manage > Copy From.

  2. Optionally, use the filters or search with the search box for the desired data.

  3. Select the data (individual files or folders with data) you want to copy.

  4. Select any metadata which you want to keep with the copied data (user tags, technical system tags or instrument information).

  5. Select which action to take if the data already exists (overwrite existing data, don't copy or keep both the original and the new copy by appending a version number to the copied data).

  6. Select Copy to copy the data to your project. You can see the progress in Projects > your_project > Activity > Batch Jobs and if your browser permits it, a pop-up message will be displayed when the copy process completes.

Replace

Overwrites the existing data. When you copy a folder onto an existing folder, files with the same name are replaced, new files are added, and the remaining files in the target folder are left unchanged.

Don't copy

The original files are kept. If you selected a folder, files that do not yet exist in the destination folder are added to it. Files that already exist at the destination are not copied over and the originals are kept.

Keep both

Files have a number appended to them if they already exist. If you copy folders, the folders are merged, with new files added to the destination folder and original files kept. New files with the same name get copied over into the folder with a number appended.

There is a difference in copy type behavior between copying files and folders. The behavior is designed for files and it is best practice to not copy folders if there already is a folder with the same name in the destination location.
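
To make the three options concrete, the sketch below simulates them on plain dictionaries that map file names to contents. It is an illustration only: the function names are hypothetical, and the suffix used for Keep both (for example a_1.txt) is an assumption, since this page does not state the exact naming format.

```python
def versioned_name(name: str, taken) -> str:
    """Return a free name by appending a number (assumed format: stem_N.ext)."""
    stem, dot, extension = name.partition(".")
    number = 1
    while f"{stem}_{number}{dot}{extension}" in taken:
        number += 1
    return f"{stem}_{number}{dot}{extension}"


def simulate_copy(source: dict, destination: dict, mode: str) -> dict:
    """Return the destination contents after copying `source` into it.

    mode is one of "replace", "dont_copy" or "keep_both".
    """
    if mode not in ("replace", "dont_copy", "keep_both"):
        raise ValueError(f"unknown mode: {mode}")
    result = dict(destination)  # files only present at the destination stay unchanged
    for name, content in source.items():
        if name not in result or mode == "replace":
            result[name] = content          # new file, or overwrite existing file
        elif mode == "keep_both":
            result[versioned_name(name, result)] = content
        # mode == "dont_copy": the existing file is kept, nothing is copied
    return result


if __name__ == "__main__":
    src = {"a.txt": "new", "b.txt": "b"}
    dst = {"a.txt": "old", "c.txt": "c"}
    for mode in ("replace", "dont_copy", "keep_both"):
        print(mode, simulate_copy(src, dst, mode))
```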

Copy Status

  • INITIALIZED

  • WAITING_FOR_RESOURCES

  • RUNNING

  • STOPPED - When choosing to stop the batch job.

  • SUCCEEDED - All files and folders are copied.

  • PARTIALLY_SUCCEEDED - Some files and folders could be copied, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the copy process was running.

  • FAILED - None of the files and folders could be copied.
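
When you monitor batch jobs from a script, it can help to separate statuses that are still in progress from those that are final. This is a minimal sketch: the grouping into in-progress and final statuses is an assumption derived from the descriptions above, and the helper names are hypothetical.

```python
from enum import Enum


class BatchJobStatus(Enum):
    INITIALIZED = "INITIALIZED"
    WAITING_FOR_RESOURCES = "WAITING_FOR_RESOURCES"
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"
    SUCCEEDED = "SUCCEEDED"
    PARTIALLY_SUCCEEDED = "PARTIALLY_SUCCEEDED"
    FAILED = "FAILED"


# Assumption: these statuses no longer change once reached.
FINAL_STATUSES = {
    BatchJobStatus.STOPPED,
    BatchJobStatus.SUCCEEDED,
    BatchJobStatus.PARTIALLY_SUCCEEDED,
    BatchJobStatus.FAILED,
}


def is_finished(status: str) -> bool:
    """Return True if the job has reached a final status."""
    return BatchJobStatus(status) in FINAL_STATUSES


def needs_follow_up(status: str) -> bool:
    """Return True if the job finished without copying everything."""
    return is_finished(status) and BatchJobStatus(status) is not BatchJobStatus.SUCCEEDED


if __name__ == "__main__":
    for value in ("RUNNING", "PARTIALLY_SUCCEEDED", "SUCCEEDED"):
        print(value, is_finished(value), needs_follow_up(value))
```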

To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.

Notes on copying data

  • Copying data comes with an additional storage cost as it will create a copy of the data.

  • Copying data from your own S3 storage requires additional configuration. See Connect AWS S3 Bucket and SSE-KMS Encryption.

  • On the command-line interface, the command to copy data is icav2 projectdata copy.

  • Before copy and move operations are executed on your own S3 storage, a test is performed to verify the necessary operational rights. This can result in temporary test files remaining (for example when IAM policy is not correctly set up for a versioned bucket). These files can safely be manually deleted from your S3 console.


Moving Data

You can move data within a project or between different projects to which you have access. If your browser allows notifications, a pop-up will appear when the move is completed.

  • Move From is used when you are in the destination location.

  • Move To is used when you are in the source location.

Before moving the data, pre-checks are performed to verify that the data can be moved and no currently running operations are being performed on the folder. Conflicting jobs and missing permissions will be reported.

Once the move has started, do not perform any other operation on the data being moved, to avoid potential data loss or duplication. Modifying data at the source or destination during a move can result in incomplete transfers, with duplicate folders and files that have different identifiers. You can manually transfer any remaining data and delete duplicate files and folders afterward.

Move Synchronization issues

Changes to the data during a move may cause the destination data to become unsynchronized between the object store (S3) and ICA. To address this, create a folder session on the destination directory's parent folder by using the following API steps: Create Folder Session and Complete Folder Session. Ensure that the move job is aborted before making the create and complete requests for the folder session.

Required Rights

There are a number of rights and restrictions related to data move as this will delete the data in the source location.

Move Data Rights

Within a project

  • Source Project: Contributor rights

  • Destination Project: Contributor rights

Between different projects

  • Source Project: Download rights, Contributor rights

  • Destination Project: Upload rights, Viewer rights

Restrictions

Move Data Restrictions

Within a project

  • Source Project: No linked data, No partial data, No archived data

  • Destination Project: No linked data

Between different projects

  • Source Project: Data sharing enabled, Data owned by user's tenant, No linked data, No partial data, No archived data, No externally managed projects, Within the same region

  • Destination Project: No linked data, Within the same region

Moving Data Constraints

  • 1000 Maximum Items: Up to 1000 items per move. Items include files and folders. Folders with subfolders and subfiles still count as one item.

  • Naming Conflicts: Cannot move to a destination with existing files/folders of the same name.

  • Linked Data Restrictions: Cannot move linked data or move data to linked data.

  • Self Move: Folders cannot be moved to themselves.

  • In-Transit Data: Cannot move data that is being moved.

  • Region Restrictions: No cross-region moves allowed.

  • Project Constraints: No moves from externally-managed projects or externally-managed data.

  • Status Requirement: Data must be in status available.

  • Ownership: Data must be owned by the user's tenant for cross-project moves.

  • Destination Default: If no target folder is selected, data moves to the root folder of the target project.
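
Several of the constraints above can be checked locally before requesting a move. The sketch below is an illustration under assumptions: the MoveItem structure, its field names and the function name are hypothetical, the status value is assumed to be the string "available", and the real pre-checks are performed by ICA itself.

```python
from dataclasses import dataclass

MAX_ITEMS_PER_MOVE = 1000  # files and folders; a folder counts as one item


@dataclass
class MoveItem:
    path: str
    status: str = "available"
    linked: bool = False
    externally_managed: bool = False
    being_moved: bool = False


def precheck_move(items, destination_folder="/", destination_names=frozenset(), same_region=True):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    if len(items) > MAX_ITEMS_PER_MOVE:
        problems.append(f"too many items: {len(items)} > {MAX_ITEMS_PER_MOVE}")
    if not same_region:
        problems.append("cross-region moves are not allowed")
    target = destination_folder.rstrip("/") or "/"  # default: root folder
    for item in items:
        source = item.path.rstrip("/")
        name = source.rsplit("/", 1)[-1]
        if item.status != "available":
            problems.append(f"{item.path}: status is {item.status}, not available")
        if item.linked:
            problems.append(f"{item.path}: linked data cannot be moved")
        if item.externally_managed:
            problems.append(f"{item.path}: externally-managed data cannot be moved")
        if item.being_moved:
            problems.append(f"{item.path}: data is already being moved")
        if name in destination_names:
            problems.append(f"{item.path}: name already exists at the destination")
        if source == target:
            problems.append(f"{item.path}: a folder cannot be moved to itself")
    return problems


if __name__ == "__main__":
    items = [MoveItem("/runs/sample1.bam"), MoveItem("/runs/sample2.bam", status="partial")]
    for problem in precheck_move(items, "/archive", {"sample1.bam"}):
        print(problem)
```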

Move Data From

Move Data From is used when you are in the destination location.

  1. Navigate to Projects > your_project > Data > your_destination_location > Manage > Move From.

  2. Select the files and folders which you want to move.

  3. Select the Move button.

Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.

Move Data To

Move Data To is used when you are in the source location. You will need to select the data you want to move from the current location and the destination to move it to.

  1. Navigate to Projects > your_project > Data > your_source_location.

  2. Select the files and folders which you want to move.

  3. Select Projects > your_project > Data > your_source_location > Manage > Move To.

  4. Select your target project and location. You can create a new folder to move data to by filling in the "New folder name (optional)" field. This does NOT rename an existing folder. To rename a folder, you will need to create a new folder with the desired name, move the contents from the original folder into the new one, and then delete the original folder.

  5. Select the Move button.

Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.

Move Status

  • INITIALIZED

  • WAITING_FOR_RESOURCES

  • RUNNING

  • STOPPED - When choosing to stop the batch job.

  • SUCCEEDED - All files and folders are moved.

  • PARTIALLY_SUCCEEDED - Some files and folders could be moved, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the move process was running.

  • FAILED - None of the files and folders could be moved.

To see the ongoing actions on data in the data overview (Projects > your_project > Data), add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.

If you are only able to select your source project as the target data project, this may indicate that data sharing (Projects > your_project > Project Settings > Details > Data Sharing) is not enabled for your project or that you do not have upload rights in other projects.

Before copy and move operations are executed on your own S3 storage, a test is performed to verify the necessary operational rights. This can result in temporary test files remaining (for example when IAM policy is not correctly set up for a versioned bucket). These files can safely be manually deleted from your S3 console.


Deleting, Archiving and Unarchiving

To manually archive or delete files:

  1. Select the checkbox next to the file or files to delete or archive.

  2. Select Manage, and then select one of the following options:

    • Archive — Move the file or files to long-term storage (event code ICA_DATA_110).

    • Unarchive — Return the file or files from long-term storage. Unarchiving can take up to 48 hours, regardless of file size. Unarchived files can be used in analysis (event code ICA_DATA_114).

    • Delete — Remove the file completely (event code ICA_DATA_106).

When attempting concurrent archiving or unarchiving of the same file, a message will inform you to wait for the currently running (un)archiving to finish first.

To archive or delete files programmatically, you can use ICA's API endpoints:

  1. GET the file's information.

  2. Modify the dates of the file to be deleted/archived.

  3. PUT the updated information back in ICA.

Python Example

The Python snippet below exemplifies the approach: it sets (or updates if set already) the time to be archived for a specific file:
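
A minimal sketch of these three steps could look as follows. It is not the definitive implementation: the base URL, the endpoint path, the X-API-Key header, the media type and the position of willBeArchivedAt inside the JSON body are assumptions that should be checked against the ICA API reference, and PROJECT_ID, DATA_ID and API_KEY are placeholders.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumptions, not confirmed by this page: base URL, path and media type.
BASE_URL = "https://ica.illumina.com/ica/rest/api"
MEDIA_TYPE = "application/vnd.illumina.v3+json"


def data_url(project_id: str, data_id: str) -> str:
    """Build the (assumed) endpoint URL for one data record in a project."""
    return f"{BASE_URL}/projects/{project_id}/data/{data_id}"


def set_date(record: dict, key: str, when: datetime) -> dict:
    """Return a copy of the record with `key` set to a UTC ISO 8601 timestamp.

    Assumption: the date fields sit directly under the "data" object.
    """
    updated = json.loads(json.dumps(record))  # deep copy via JSON round trip
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    updated.setdefault("data", {})[key] = stamp
    return updated


def call(method: str, url: str, api_key: str, body=None):
    """Send one request and return (HTTP status, parsed JSON body)."""
    headers = {"X-API-Key": api_key, "accept": MEDIA_TYPE}
    payload = None
    if body is not None:
        headers["Content-Type"] = MEDIA_TYPE
        payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(url, data=payload, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        text = response.read().decode("utf-8")
        return response.status, (json.loads(text) if text else {})


def schedule_archive(project_id: str, data_id: str, api_key: str, when: datetime) -> int:
    url = data_url(project_id, data_id)
    _, record = call("GET", url, api_key)                  # 1. GET the file's information
    updated = set_date(record, "willBeArchivedAt", when)   # 2. modify the date
    status, _ = call("PUT", url, api_key, updated)         # 3. PUT the updated information
    return status


# Example call with placeholder values (performs real requests when run):
# print(schedule_archive("PROJECT_ID", "DATA_ID", "API_KEY",
#                        datetime(2030, 1, 1, tzinfo=timezone.utc)))
```

Keeping the date change in a separate function (set_date) means the same code can set willBeDeletedAt by passing a different key.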

To delete a file at a specific time, add or change the key 'willBeDeletedAt' using the same API call. If run in the terminal, a successful run finishes with the message '200'. In the ICA UI, you can check the details of the file to see the updated values for 'Time To Be Archived' (willBeArchivedAt) or 'Time To Be Deleted' (willBeDeletedAt).


Linking and Unlinking

Data linking creates a dynamic read-only view to the source data. You can use data linking to get access to data without running the risk of modifying the source material and to share data between projects. Linking ensures changes to the source data are immediately visible and no additional storage is required. You can recognise linked data by the green color and see the owning project as part of the details.

Since this is read-only access, you cannot perform actions such as deleting, adding, moving or (un)archiving on linked data as these actions require write access.

Linking data is only possible from the root folder of your destination project. The action is disabled in project subfolders.

Linking a parent folder after linking a file or subfolder will unlink the file or subfolder and link the parent folder. So root\linked_subfolder will become root\linked_parentfolder\linked_subfolder.

Initial linking can take considerable time when there is a large amount of source data. However, once the initial link is made, updates to the source data will be instantaneous. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.

Migrating snapshot-linked data (linked before ICA release v.2.29)

Before ICA version v.2.29, when data was linked, a snapshot was created of the file and folder structure. These links created a read-only view of the data as it was at the time of linking, but did not propagate changes to the file and folder structure. If you want to use the advantages of the new way of linking with dynamic updates, unlink the data and relink it. Since snapshot linking has been deprecated, all new data linking done in ICA v.2.29 or later has dynamic content updates.

Linking data from another project.

  1. Select Projects > your_project > Data > Manage, and then select Link.

  2. To view data by project, select the funnel symbol, and then select Owning Project. If you know to which project the data is linked, you can choose to filter on linked projects.

  3. Select the checkbox next to the file or files to add.

  4. Select Link.

Your files will be added and visible in the Data page.

Display Owning Project

If you have selected multiple owning projects, you can add the owning project column to see which project owns the data.

  1. At the top of the screen, next to the filter icon, select the three columns.

  2. The Add/remove columns tab will appear.

  3. Choose Owning Project (or Linked Projects)


Linking Folders

If you link a folder instead of individual files, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Projects > your_project > Activity > Batch Jobs screen. By clicking the batch job, you can see more details, such as how many files have already been linked.

Unlinking Project Data

To unlink the data, go to the root level of your project and select the linked folder or, if you have linked individual files separately, then you can select those linked files (limited to 100 at a time) and select Manage > Unlink. The progress can be monitored at Projects > your_project > Activity > Batch Jobs.
