Bundles are curated data sets which combine assets such as pipelines, tools, and Base query templates. This is where you will find packaged assets such as Illumina-provided pipelines and sample data. You can create, share and use bundles in projects of your own tenant as well as projects in other tenants.
There is a combined limit of 30,000 projects and bundles per tenant.
The following ICA assets can be included in bundles:
Data (link / unlink)
Samples (link / unlink)
Reference Data (add / delete)
Pipelines (link/unlink)
Tools and Tool images (link/unlink)
Base tables (read-only) (link/unlink)
The main Bundles screen has two tabs: My Bundles and Entitled Bundles. The My Bundles tab shows all the bundles that you are a member of. This tab is where most of your interactions with bundles occur. The Entitled Bundles tab shows the bundles that have been specially created by Illumina or other organizations and shared with you to use in your projects. See Access and Use an Entitled Bundle.
Some bundles come with additional restrictions such as disabling bench access or internet access when running pipelines to protect the data contained in them. When you link these bundles, the restrictions will be enforced on your project. Unlinking the bundle will not remove the restrictions.
You cannot link bundles which come with additional restrictions to externally managed projects.
As of ICA v.2.29, the content in bundles is linked in such a way that any updates to a bundle are automatically propagated to the projects which have that bundle linked.
If you have created bundle links in ICA versions prior to ICA v2.29 and want to switch them over to links with dynamic updates, you need to unlink and relink them.
From the main navigation page, select Projects > your_project > Project Settings > Details.
Click the Edit button at the top of the Details page.
Click the + button, under Linked bundles.
Click on the desired bundle, then click the +Link Bundles button.
Click Save.
The assets included in the bundle will now be available in the respective pages within the Project (e.g. Data and Pipelines pages). Any updates to the assets will be automatically available in the destination project.
To unlink a bundle from a project,
Select Projects > your_project > Project Settings > Details.
Click the Edit button at the top of the Details page.
Click the (-) button, next to the linked bundle you wish to remove.
Bundles and projects have to be in the same region in order to be linked. Otherwise, the error "The bundle is in a different region than the project so it's not eligible for linking" will be displayed.
You can only link bundles to a project if that project belongs to a tenant that has access to the bundle. Your access to a bundle does not carry over when you are invited to projects of other tenants.
You cannot unlink bundles which were linked by external applications.
To create a new bundle and configure its settings, do as follows.
From the main navigation, select Projects > your_project > Bundles.
Select + Create.
Enter a unique name for the bundle.
From the Region drop-down list, select where the assets for this bundle should be stored.
[Optional] Configure the following settings.
Categories—Select an existing category or enter a new one.
Status—Set the status of the bundle. When the status of a bundle changes, it cannot be reverted to a draft or released state.
Draft—The bundle can be edited.
Released—The bundle is released. Technically, you can still edit bundle information and add assets to the bundle, but should refrain from doing so.
Deprecated—The bundle is no longer intended for use. By default, deprecated bundles are hidden on the main Bundles screen (unless non-deprecated versions of the bundle exist). Select "Show deprecated bundles" to show all deprecated bundles. Bundles can not be recovered from deprecated status.
Short Description—Enter a description for the bundle.
Metadata Model—Select a metadata model to apply to the bundle.
Enter a release version for the bundle and optionally enter a description for the version.
[Optional] Links can be added with a display name (max 100 chars) and URL (max 2048 chars).
Homepage
License
Links
Publications
[Optional] Enter any information you would like to distribute with the bundle in the Documentation section.
Select Save.
There is no option to delete bundles; they must be deprecated instead.
To make changes to a bundle:
From the main navigation, select Bundles.
Select a bundle.
Select Edit.
Modify the bundle information and documentation as needed.
Select Save.
When the changes are saved, they also become available in all projects that have this bundle linked.
To add assets to a bundle:
Select a bundle.
On the left-hand side, select the type of asset under Flow (such as pipeline or tool) you want to add to the bundle.
Depending on the asset type, select add or link to bundle.
Select the assets and confirm.
Assets must meet the following requirements before they can be added to a bundle:
For Samples and Data, the project the asset belongs to must have data sharing enabled.
The region of the project containing the asset must match the region of the bundle.
You must have permission to access the project containing the asset.
Pipelines and tools need to be in released status.
Samples must be available in a Complete state.
When you link folders to a bundle, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Bundles > your_bundle > Activity > Batch Jobs screen. To see more details and the progress, double-click the batch job and then double-click the individual item. This will show how many individual files are already linked.
You can not add the same asset twice to a bundle. Once added, the asset will no longer appear in the selection list.
Which batch jobs are visible as activity depends on the user role.
When creating a new bundle version, you can only add assets to the bundle. You cannot remove existing assets from a bundle when creating a new version. If you need to remove assets from a bundle, it is recommended that you create a new bundle. All users who currently have access to a bundle will automatically have access to the new version as well.
From the main navigation, select Bundles.
Select a bundle.
Select + Create new Version.
Make updates as needed and update the version number.
Select Save.
When you create a new version of a bundle, it will replace the old version in your list. To see the old version, open your new bundle and look at Bundles > your_bundle > Details > Versioning. There you can open the previous version which is contained in your new version.
Assets such as data which were added in a previous version of your bundle will be marked in green, while new content will be black.
To add Terms of Use to a Bundle, do as follows:
From the main navigation, select Bundles > your_bundle > Bundle Settings > Legal.
Select + Create New Version.
Use the WYSIWYG editor to define Terms of Use for the selected bundle.
Click Save.
[Optional] Require acceptance by clicking the checkbox next to Acceptance required.
Acceptance required will prompt a user to accept the Terms of Use before being able to use a bundle or add the bundle to a project.
To edit the Terms of Use, repeat Steps 1-3 and use a unique version name. If you select acceptance required, you can choose to keep the acceptance status as is or require users to reaccept the terms of use. When reacceptance is required, users need to reaccept the terms in order to continue using this bundle in their pipelines. This is indicated when they want to enter projects which use this bundle.
If you want to collaborate with other people on creating a bundle and managing the assets in the bundle, you can add users to your bundle and set their permissions. You use this to create a bundle together, not to use the bundle in your projects.
From the main navigation, select Bundles > your_bundle > Bundle Settings > Team.
To invite a user to collaborate on the bundle, do as follows.
To add a user from your tenant, select Someone of your tenant and select a user from the drop-down list.
To add a user by their email address, select By email and enter their email address.
To add all the users of an entire workgroup, select Add workgroup and select a workgroup from the drop-down list.
Select the Bundle Role drop-down list and choose a role for the user or workgroup. This role defines the ability of the user or workgroup to view or edit bundle settings.
Viewer: view content without editing rights.
Contributor: view bundle content and link/unlink assets.
Administrator: full edit rights of content and configuration.
Repeat as needed to add more users.
Users are not officially added to the bundle until they accept the invitation.
To change the permissions role for a user, select the Bundle Role drop-down list for the user and select a new role.
To revoke bundle permissions from a user, select the trash icon for the user.
Select Save Changes.
Once you have finalized your bundle and added all assets and legal requirements, you can share your bundle with other tenants to use it in their projects.
Your bundle must be in released status to prevent it from being updated while it is shared.
Go to Bundles > your_bundle > Edit > Details > Bundle status and set it to Released.
Save the change.
Once the bundle is released, you can share it. Invitations are sent to an individual email address, however access is granted and extended to all users and all workgroups inside that tenant.
Go to Bundles > your_bundle > Bundle Settings > Share.
Click Invite and enter the email address of the person you want to share the bundle with. They will receive an email from which they can accept or reject the invitation to use the bundle. The invitation will show the bundle name, description and owner. The link in the invite can only be used once.
Do not create duplicate entries. You can only use one user/tenant combination per bundle.
You can follow up on the status of the invitation on the Bundles > your_bundle > Bundle Settings > Share page.
If they reject the bundle, the rejection date will be shown. To re-invite that person again later on, select their email address in the list and choose Remove. You can then create a new invitation. If you do not remove the old entry before sending a new invitation, they will be unable to accept and get an error message stating that the user and bundle combination must be unique. They can also not re-use an invitation once it has been accepted or declined.
If they accept the bundle, the acceptance date will be shown. They will in turn see the bundle under Bundles > Entitled bundles. To remove access, select their email address in the list and choose Remove.
Entitled bundles are bundles created by Illumina or third parties for you to use in your projects. Entitled bundles can already be part of your tenant when it is part of your subscription. You can see your entitled bundles at Bundles > Entitled Bundles.
To use your shared entitled bundle, add the bundle to your project via Project Linking. Content shared via entitled bundles is read-only, so you cannot add or modify the contents of an entitled bundle. If you lose access to an entitled bundle previously shared with you, the bundle is unlinked and you will no longer be able to access its contents.
The and documentation pages match navigation within ICA. We also offer supporting documentation for popular topics like , , and .
For more content on topics like , , , and other resources, view the section.
New users may reference the Illumina Connected Software Registration Guide for detailed guidance on setting up an account and registering a subscription.
The platform requires a provisioned tenant in the Illumina account management system with access to the Illumina Connected Analytics (ICA) application. Once a tenant has been provisioned, a tenant administrator will be assigned. The tenant administrator has permission to manage account access, including adding users, creating workgroups, and adding additional tenant administrators.
Each tenant is assigned a domain name used to log in to the platform. The domain name is used in the login URL to navigate to the appropriate login page in a web browser. The login URL is https://<domain>.login.illumina.com, where <domain> is substituted with the domain name assigned to the tenant.
New user accounts can be created for a tenant by navigating to the domain login URL and following the links on the page to set up a new account with a valid email address. Once the account has been added to the domain, the tenant administrator may assign registered users to workgroups with permission to use the ICA application. Registered users may also be made workgroup administrators by tenant administrators or existing workgroup administrators.
For more details on identity and access management, please see the Illumina Connected Software help site.
For security reasons, it is best practice to not use accounts with administrator level access to generate API keys and instead create a specific CLI user with basic permission. This will minimize the possible impact of compromised keys.
For long-lived credentials to the API, an API Key can be generated from the account console and used with the API and command-line interface. Each user is limited to 10 API Keys. API Keys are managed through the product dashboard after logging in through the domain login URL by navigating to the profile drop down and selecting "Manage API Keys".
Click the button to generate a new API Key. Provide a name for the API Key. Then choose to either include all workgroups or select the workgroups to be included. Selected workgroups will be accessible with the API Key.
Click to generate the API Key. The API Key is then presented (hidden), with a button to reveal the key for copying and a link to download it as a file to be stored securely. Once the window is closed, the key contents will no longer be accessible through the domain login page, so be sure to store the key securely if it is needed for future reference.
After generating an API key, save the key somewhere secure to be referenced when using the command-line interface or APIs.
The web application provides a visual user interface (UI) for navigating resources in the platform, managing projects, and extended features beyond the API. To access the web application, navigate to the Illumina Connected Analytics portal.
On the left, you have the navigation bar which will auto-collapse on smaller screens. When collapsed, use the ≡ symbol to expand it.
The central part of the display is the item on which you are performing your actions.
At the top right, you have icons to refresh the screen for information, status updates, and access to the online help.
The command-line interface offers a developer-oriented experience for interacting with the APIs to manage resources and launch analysis workflows. Find instructions for using the command-line interface including download links for your operating system in the CLI documentation.
The HTTP-based application programming interfaces (APIs) are listed in the API Reference section of the documentation. The reference documentation provides the ability to call APIs from the browser page and shows detailed information about the API schemas. HTTP client tooling such as Postman or cURL can be used to make direct calls to the API outside of the browser.
When accessing the API using the API Reference page or through REST client tools, the Authorization header must be provided with the value set to Bearer <token>, where <token> is replaced with a valid JSON Web Token (JWT). For generating a JWT, see JSON Web Token (JWT).
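For example, a minimal sketch of an authorized request with curl; the base URL and endpoint path shown here are assumptions to verify against the API Reference:

```bash
# Illustrative only: list projects using a JWT in the Authorization header.
# Base URL and endpoint path are assumptions; check the API Reference for exact routes.
curl -H "Authorization: Bearer $ICA_JWT" \
  "https://ica.illumina.com/ica/rest/api/projects"
```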
The object data models for resources that are created in the platform include a unique id field for identifying the resource. These fixed machine-readable IDs are used for accessing and modifying the resource through the API or CLI, even if the resource name changes.
Accessing the platform APIs requires authorizing calls using JSON Web Tokens (JWT). A JWT is a standardized trusted claim containing authentication context. This is a primary security mechanism to protect against unauthorized cross-account data access.
A JWT is generated by providing user credentials (API Key or username/password) to the token creation endpoint. Token creation can be performed using the API directly or the CLI.
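As a sketch of the API route, assuming a token-creation endpoint that accepts an API Key header (the exact path and header name are assumptions to verify in the API Reference):

```bash
# Illustrative only: exchange an API Key for a JWT.
# Endpoint path and header name are assumptions; verify them in the API Reference.
curl -X POST -H "X-API-Key: $ICA_API_KEY" \
  "https://ica.illumina.com/ica/rest/api/tokens"
```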
When looking at the main ICA navigation, you will see the following structure:
Projects are your primary work locations which contain your data and tools to execute your analyses. Projects can be considered as a binder for your work and information. You can have data contained within a project, or you can choose to make it shareable between projects.
Reference Data are reference genome sets which you use to help look for deviations and to compare your data against.
Bundles are packages of assets such as sample data, pipelines, tools and templates which you can use as a curated data set. Bundles can be provided both by Illumina and other providers, and you can even create your own bundles. You will find the Illumina-provided pipelines in bundles.
Audit/Event Logs are used for audit purposes and issue resolving.
System Settings contain general information such as the location of storage space, Docker images, and tool repositories.
Projects are the main dividers in ICA. They provide an access-controlled boundary for organizing and sharing resources created in the platform. The Projects view is used to manage projects within the current tenant.
Note that there is a combined limit of 30,000 projects and bundles per tenant.
To create a new project, click the Projects > + Create Project button.
Required fields include:
Name
1-255 characters
Must begin with a letter
Characters are limited to alphanumerics, hyphens, underscores, and spaces
Analysis Priority (Low/Medium (default)/High): Analyses are balanced per tenant, with high-priority analyses started first and the system progressing to the next lower priority once all higher-priority analyses are running. Balance your priorities so that lower-priority projects do not remain waiting for resources indefinitely.
Project Owner: The owner (and usually contact person) of the project. The project owner has the same rights as a project administrator but cannot be removed from a project without first assigning another project owner. Reassignment can be done by the current project owner, the tenant administrator, or a project administrator of the current project at Projects > your_project > Project Settings > Team > Edit.
Project Location: Select your project location. The options available are based on the entitlement(s) associated with your purchased subscription.
Storage Bundle: Auto-selected based on the chosen Project Location.
Click the Save button to finish creating the project. The project will be visible from the Projects view.
During project creation, select the I want to manage my own storage checkbox to use a Storage Configuration as the data provider for the project.
With a storage configuration set, a project will have a 2-way sync with the external cloud storage provider: any data added directly to the external storage will be sync'ed into the ICA project data, and any data added to the project will be sync'ed into the external cloud storage.
Several tools are available to assist you with keeping an overview of your projects. These filters work in both list and tile view and persist across sessions.
Searching is a case-insensitive wildcard filter: any project which contains the entered characters will be shown. Use * as a wildcard in searches. Be aware that operators without search terms are blocked and will result in the error "Unexpected error occurred when searching for projects". You can use brackets and the AND, OR, and NOT operators, provided that you do not start the search with them (Monkey AND Banana is allowed; AND Aardvark by itself is invalid syntax).
Filter by Workgroup: Projects in ICA can be accessible to different workgroups. This drop-down list allows you to filter projects for specific workgroups. To reset the filter so it displays projects from all your workgroups, use the x on the right, which appears when a workgroup is selected.
Hidden projects: You can hide projects (Projects > your_project > Details > Hide) which you no longer use. Hiding will delete data in Base and Bench and is thus irreversible.
You can still see hidden projects if you select this option and delete the data they contain at Projects > your_project > Data to save on storage costs.
If you are using your own S3 bucket, your S3 storage will be unlinked from the project, but the data will remain in your S3 storage. Your S3 storage can then be used for other projects.
Favorites: By clicking the star next to the project name in the tile view, you set a project as a favorite. You can have multiple favorites and use the Favorites checkbox to show only those favorites. This prevents having too many projects visible.
Tile view shows a grid of projects. This view is best suited if you only have a few projects or have filtered them out by creating favourites. A single click will open the project.
List view shows a list of projects. This view allows you to add additional filters on name, description, location, user role, tenant, size and analyses. A double-click is required to open the project.
Illumina software applications which do their own data management on ICA (such as BSSH) store their resources and data in a project in much the same way as manually created projects in ICA. ICA considers these to be externally-managed projects, and a number of restrictions apply to which actions are allowed on them. For example, you cannot delete or move externally-managed data. This prevents inconsistencies when these applications want to access their own project data.
When you create a folder with a name which already exists as an externally-managed folder, your project will contain that folder twice: once ICA-managed and once externally-managed, as S3 does not require unique folder names.
You can keep track of which files are externally controlled and which are ICA-managed by means of the “managed by” column, visible in the data list view of externally-managed projects at Projects > your_project > Data.
Projects are indicated as externally-managed in the projects overview screen by a project card with a light grey accent and a lock symbol followed by "managed by app".
To access the APIs using the command-line interface (CLI), an API Key may be provided as credentials when logging in. API Keys operate similar to a user name and password and should be kept secure and rotated on a regular basis (preferably yearly). When keys are compromised or no longer in use, they must be revoked. This is done through the domain login URL by navigating to the profile drop down and selecting "Manage API Keys", followed by selecting the key and using the trash icon next to it.
On the project creation screen, add information to create a project. See page for information about each field.
Refer to the documentation for details on creating a storage configuration.
Hiding projects is not possible for projects.
If you are missing projects, especially those created by other users, the workgroup filter might still be active. Clear the filter with the x to the right. You can verify the list of projects to which you have access with the icav2 projects list command.
What you can do is add and data such as to externally managed projects. Separation of data is ensured by only allowing additional files at the root level or in dedicated subfolders which you can create in your projects. Data which you have added can be moved and deleted again.
You can add to externally managed projects, provided those bundles do not come with additional restrictions for the project.
You can start workspaces in externally-managed projects. The resulting data will be stored in the externally-managed project.
Tertiary modules such as are not supported for externally-managed projects.
Externally-managed projects protect their notification subscriptions to ensure no user can delete them. It is possible to add your own subscriptions to externally-managed projects, see for more information.
For a better understanding of how all components of ICA work, try the .
The event log shows an overview of system events with options to search and filter. For every entry, it lists the following:
Event date and time
Category (error, warn or info)
Code
Description
Tenant
Up to 200,000 results will be returned. If your desired records are outside the range of the returned records, please refine the filters or use the search function at the top right.
Export is restricted to the number of entries shown per page. You can use the selector at the bottom to set this to up to 1000 entries per page.
You can use your own S3 bucket with Illumina Connected Analytics (ICA) for data storage. This section describes how to configure your AWS account to allow ICA to connect to an S3 bucket.
These instructions utilize the AWS CLI. Follow the AWS CLI documentation for instructions to download and install.
The AWS S3 bucket must exist in the same AWS region as the ICA project. Refer to the table below for a mapping of ICA project regions to AWS regions:
Australia: ap-southeast-2
Canada: ca-central-1
Germany: eu-central-1
India: ap-south-1
Indonesia: ap-southeast-3
Israel: il-central-1
Japan: ap-northeast-1
Singapore: ap-southeast-1
South Korea*: ap-northeast-2
UK: eu-west-2
United Arab Emirates: me-central-1
United States: us-east-1
(*) BSSH is not currently deployed on the South Korea instance, resulting in limited functionality in this region with regard to sequencer integration.
You can use unversioned, versioned, and suspended buckets as your own S3 storage. If you connect buckets with object versioning, the data in ICA will be automatically synced with the data in the object store. When an object is deleted without specifying a particular version, a delete marker is created in the object store to indicate that the object has been deleted. ICA will reflect the object state by deleting the record from the database. No further action on your side is needed to sync.
You can enable SSE using an Amazon S3-managed key (SSE-S3). Instructions for using KMS-managed (SSE-KMS) keys are found here.
Because of how Amazon S3 handles folders and does not send events for S3 folders, the following restrictions must be taken into account for ICA project data stored in S3.
When creating an empty folder in S3, it will not be visible in ICA.
When moving folders in S3, the original, but empty, folder will remain visible in ICA and must be manually deleted there.
When deleting a folder and its contents in S3, the empty folder will remain visible in ICA and must be manually deleted there.
Projects cannot be created with ./ as prefix since S3 does not allow uploading files with this key prefix.
When configuring a new project in ICA to use a preconfigured S3 bucket, create a folder on your S3 bucket in the AWS console. This folder will be connected to ICA as a prefix.
Failure to create a folder will result in the root folder of your S3 bucket being assigned which will block your S3 bucket from being used for other ICA projects with the error "Conflict while updating file/folder. Please try again later."
For Bring Your Own Storage buckets, all unversioned, versioned, and suspended buckets are supported. If you connect buckets with object versioning, the data in ICA will be automatically synced with the data in the object store.
For Bring Your Own Storage buckets with versioning enabled, when an object is deleted without specifying a particular version, a delete marker is created in the object store to indicate that the object has been deleted. ICA will reflect the object state by deleting the record from the database. No further action on your side is needed to sync.
ICA requires cross-origin resource sharing (CORS) permissions to write to the S3 bucket for uploads via the browser. Refer to the Configuring cross-origin resource sharing (CORS) (expand the "Using the S3 console" section) documentation for instructions on enabling CORS via the AWS Management Console. Use the following configuration during the process:
In the cross-origin resource sharing (CORS) section, enter the following content.
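The exact CORS rules to paste are supplied with this step in the product documentation; the sketch below only illustrates the general shape of an S3 CORS configuration, and the allowed origin shown is an assumption to replace with the origin(s) Illumina specifies:

```json
[
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["GET", "PUT", "POST", "DELETE", "HEAD"],
        "AllowedOrigins": ["https://ica.illumina.com"],
        "ExposeHeaders": ["ETag"]
    }
]
```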
ICA requires specific permissions to access data in an AWS S3 bucket. These permissions are contained in an AWS IAM Policy.
Refer to the Creating policies on the JSON tab documentation for instructions on creating an AWS IAM Policy via the AWS Management Console. Use the following configuration during the process:
On Unversioned buckets, paste the JSON policy document below. Note the example below provides access to all object prefixes in the bucket.
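A representative sketch of such a policy is shown below; the action list is an assumption meant to illustrate the bucket-level and object-level statements, and the authoritative JSON is the policy document Illumina provides for this step:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "IcaBucketLevelAccess",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetBucketNotification",
                "s3:PutBucketNotification",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads"
            ],
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME"
        },
        {
            "Sid": "IcaObjectLevelAccess",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/YOUR_FOLDER_NAME/*"
        }
    ]
}
```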
Replace YOUR_BUCKET_NAME with the name of the S3 bucket you created for ICA. Replace YOUR_FOLDER_NAME with the name of the folder in your S3 bucket.
On Versioned OR Suspended buckets, paste the JSON policy document below. Note the example below provides access to all object prefixes in the bucket.
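Again, the authoritative policy is the document provided for this step; as a sketch, a versioned or suspended bucket would extend the policy above with version-aware actions along these lines:

```json
{
    "Sid": "IcaObjectVersionAccess",
    "Effect": "Allow",
    "Action": [
        "s3:GetObjectVersion",
        "s3:DeleteObjectVersion",
        "s3:ListBucketVersions"
    ],
    "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME",
        "arn:aws:s3:::YOUR_BUCKET_NAME/YOUR_FOLDER_NAME/*"
    ]
}
```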
Replace YOUR_BUCKET_NAME with the name of the S3 bucket you created for ICA. Replace YOUR_FOLDER_NAME with the name of the folder in your S3 bucket.
(Optional) Set policy name to "illumina-ica-admin-policy"
To create the IAM Policy via the AWS CLI, create a local file named illumina-ica-admin-policy.json containing the policy content above and run the following command. Be sure the path to the policy document (--policy-document) leads to the path where you saved the file:
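A minimal sketch of that command, assuming the policy file was saved in the current working directory:

```bash
# Create the IAM policy from the local JSON document; adjust the file path if needed.
aws iam create-policy \
    --policy-name illumina-ica-admin-policy \
    --policy-document file://illumina-ica-admin-policy.json
```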
An AWS IAM User is needed to create an Access Key for ICA to connect to the AWS S3 Bucket. The policy will be attached to the IAM user to grant the user the necessary permissions.
Refer to the Creating IAM users (console) documentation for instructions on creating an AWS IAM User via the AWS Management Console. Use the following configuration during the process:
(optional) Set user name to "illumina_ica_admin"
Select the Programmatic access option for the type of access
Select Attach existing policies directly when setting the permissions, and choose the policy created in Create AWS IAM Policy
(Optional) Retrieve the Access Key ID and Secret Access Key by choosing to Download .csv
To create the IAM user and attach the policy via the AWS CLI, enter the following commands (AWS IAM users are global resources and do not require a region to be specified). This creates an IAM user illumina_ica_admin, retrieves your AWS account number, and then attaches the policy to the user.
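A sketch of that sequence, assuming the policy name illumina-ica-admin-policy used earlier:

```bash
# Create the IAM user, look up the AWS account ID, then attach the policy to the user.
aws iam create-user --user-name illumina_ica_admin
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws iam attach-user-policy \
    --user-name illumina_ica_admin \
    --policy-arn "arn:aws:iam::${ACCOUNT_ID}:policy/illumina-ica-admin-policy"
```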
If the Access Key information was retrieved during the IAM user creation, skip this step.
Refer to the Managing access keys (console) AWS documentation for instructions on creating an AWS Access Key via the AWS Console. See the "To create, modify, or delete another IAM user's access keys (console)" sub-section.
Use the command below to create the Access Key for the illumina_ica_admin IAM user. Note the SecretAccessKey is sensitive and should be stored securely. The access key is only displayed when this command is executed and cannot be recovered. A new access key must be created if it is lost.
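A sketch of that command:

```bash
# Create an access key for the IAM user; the SecretAccessKey in the output must be stored securely.
aws iam create-access-key --user-name illumina_ica_admin
```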
The AccessKeyId and SecretAccessKey values will be provided to ICA in the next step.
Connecting your S3 bucket to ICA does not require any additional bucket policies.
However, if a bucket policy is required for use cases beyond ICA, you need to ensure that the bucket policy supports the essential permissions needed by ICA without inadvertently restricting its functionality.
Here is one such example:
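The policy below is a minimal sketch of the pattern described underneath it: a broad deny with an exception for the ICA IAM user and its STS federated-user sessions. Treat it as a starting point rather than the authoritative policy, and note that in practice you will likely also want to exempt your own administrative principals from the deny:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAllExceptIcaPrincipals",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::YOUR_BUCKET_NAME",
                "arn:aws:s3:::YOUR_BUCKET_NAME/*"
            ],
            "Condition": {
                "ArnNotLike": {
                    "aws:PrincipalArn": [
                        "arn:aws:iam::YOUR_ACCOUNT_ID:user/YOUR_IAM_USER",
                        "arn:aws:sts::YOUR_ACCOUNT_ID:federated-user/*"
                    ]
                }
            }
        }
    ]
}
```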
Be sure to replace the following fields:
YOUR_BUCKET_NAME: Replace this field with the name of the S3 bucket you created for ICA.
YOUR_ACCOUNT_ID: Replace this field with your account ID number.
YOUR_IAM_USER: Replace this field with the name of your IAM user created for ICA.
In this example, a restriction on the bucket policy disallows all access to the bucket, with an exception for the IAM user that ICA uses to connect to the S3 bucket. The exception allows ICA to continue performing the S3 actions necessary for ICA functionality.
Additionally, the exception rule is applied to the STS federated user session principal associated with ICA. Since ICA leverages the AWS STS to provide temporary credentials that allow users to perform actions on the S3 bucket, it is crucial to include these STS federated user session principals in your policy's whitelist. Failing to do so could result in 403 Forbidden errors when users attempt to interact with the bucket's objects using the provided temporary credentials.
To connect your S3 account to ICA, you need to add a storage credential in ICA containing the Access Key ID and Secret Access Key created in the previous step. From the ICA home screen, navigate to System Settings > Credentials and click the Create button to create a new storage credential.
Provide a name for the storage credentials, ensure the type is set to "AWS user" and provide the Access Key ID and Secret Access Key.
With the secret credentials created, a storage configuration can be created using the secret credential. Refer to the instructions to Create a Storage Configuration for details.
ICA uses AssumeRole to copy and move objects from a bucket in an AWS account to another bucket in another AWS account. To allow cross account access to a bucket, the following policy statements must be added in the bucket policy:
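A sketch of such a statement is shown below; the action list is an assumption chosen to cover read, write, and listing, and the authoritative statements are those provided with this step:

```json
{
    "Sid": "IcaCrossAccountAccess",
    "Effect": "Allow",
    "Principal": {
        "AWS": "ASSUME_ROLE_ARN"
    },
    "Action": [
        "s3:GetObject",
        "s3:GetObjectTagging",
        "s3:PutObject",
        "s3:PutObjectTagging",
        "s3:ListBucket",
        "s3:GetBucketLocation"
    ],
    "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME",
        "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    ]
}
```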
Be sure to replace the following fields:
ASSUME_ROLE_ARN: Replace this field with the ARN of the cross account role you want to give permission to. Refer to the table below to determine which region-specific Role ARN should be used.
YOUR_BUCKET_NAME: Replace this field with the name of the S3 bucket you created for ICA.
The ARN of the cross account role you want to give permission to is specified in the Principal. Refer to the table below to determine which region-specific Role ARN should be used.
Australia (AU): arn:aws:iam::079623148045:role/ica_aps2_crossacct
Canada (CA): arn:aws:iam::079623148045:role/ica_cac1_crossacct
Germany (EU): arn:aws:iam::079623148045:role/ica_euc1_crossacct
India (IN): arn:aws:iam::079623148045:role/ica_aps3_crossacct
Indonesia (ID): arn:aws:iam::079623148045:role/ica_aps4_crossacct
Israel (IL): arn:aws:iam::079623148045:role/ica_ilc1_crossacct
Japan (JP): arn:aws:iam::079623148045:role/ica_apn1_crossacct
Singapore (SG): arn:aws:iam::079623148045:role/ica_aps1_crossacct
South Korea (KR): arn:aws:iam::079623148045:role/ica_apn2_crossacct
UK (GB): arn:aws:iam::079623148045:role/ica_euw2_crossacct
United Arab Emirates (AE): arn:aws:iam::079623148045:role/ica_mec1_crossacct
United States (US): arn:aws:iam::079623148045:role/ica_use1_crossacct
The following are common issues encountered when connecting an AWS S3 bucket through a storage configuration:
Access Forbidden: "Access forbidden: {message}". This mostly occurs because of a lack of permissions. Fix: review the IAM policy, bucket policy, and ACLs for the required permissions.
Conflict: "System topic is not in a valid state"
Conflict: "Found conflicting storage container notifications with overlapping prefixes"
Conflict: "Found conflicting storage container notifications for {prefix}{eventTypeMsg}"
Conflict: "Found conflicting storage container notifications with overlapping prefixes{prefixMsg}{eventTypeMsg}"
Customer Container Notification Exists: "Volume Configuration cannot be provisioned: storage container is already set up for customer's own notification"
Invalid Access Key ID: "Failed to update bucket policy: The AWS Access Key Id you provided does not exist in our records." Fix: check the status of the AWS Access Key ID in the console. If it is not active, activate it; if it is missing, create it.
Invalid Parameter: "Missing credentials for storage container"
Invalid Parameter: "Missing bucket name for storage container"
Invalid Parameter: "The storage container name has invalid characters"
Invalid Parameter: "Storage Container '{storageContainer}' does not exist"
Invalid Parameter: "Invalid parameters for volume configuration: {message}"
Invalid Storage Container Location: "Storage container must be located in the {region} region"
Invalid Storage Container Location: "Storage container must be located in one of the following regions: {regions}"
Missing Configuration: "Missing queue name for storage container notification"
Missing Configuration: "Missing system topic name for storage container notification"
Missing Configuration: "Missing lambda ARN for storage container notification"
Missing Configuration: "Missing subscription name for storage container notification"
Missing Storage Account Settings: "The storage account '{storageAccountName}' needs HNS (Hierarchical Namespace) enabled."
Missing Storage Container Settings: "Missing settings for storage container"
This error occurs when an existing bucket notification's event information overlaps with the notifications ICA is trying to add. Amazon S3 event notifications only allow overlapping events with non-overlapping prefixes. Depending on the conflicts in the notifications, the error can be presented in any of the following forms:
Volume Configuration cannot be provisioned: storage container is already set up for customer's own notification
Invalid parameters for volume configuration: found conflicting storage container notifications with overlapping prefixes
Failed to update bucket policy: Configurations overlap. Configurations on the same bucket cannot share a common event type
To fix the issue:
In the Amazon S3 Console, review your current S3 bucket's notification configuration and look for prefixes that overlap with your Storage Configuration's key prefix.
Delete the existing notification that overlaps with your Storage Configuration's key prefix
ICA will perform a series of steps in the background to re-verify the connection to your bucket.
This error can occur when recreating a recently deleted storage configuration. To fix the issue, you have to delete the bucket notifications:
In the Amazon S3 Console select the bucket for which you need to delete the notifications from the list.
Choose Properties.
Navigate to the Event Notifications section, select the check boxes for the event notifications named gds:objectcreated, gds:objectremoved, and gds:objectrestore, and click Delete.
Wait 15 minutes for the storage to become available in ICA.
If you do not want to wait 15 minutes, you can delete the current storage configuration, delete the bucket notifications in the bucket and create a new storage configuration.
Illumina Connected Analytics allows you to create and assign metadata to capture additional information about samples.
Each tenant has one root metadata model that is accessible to all projects in the tenant. This allows an organization to collect the same piece of information for every sample in every project in the tenant, such as an ID number. Within this root model, you can configure multiple metadata submodels, even at different levels.
Illumina recommends that you limit the number of fields or field groups you add to the root model. Any misconfigured items in the root model will carry over into all other metadata models in the tenant. Once a root model is published, the fields and groups that are defined within it cannot be deleted. You should first consider creating submodels before adding anything to the root model. When configuring a project, you have the option to assign one published metadata model to all samples in the project. This metadata model can be the root model, a submodel of the root model, or a submodel of a submodel; it can be any published metadata model in the tenant. When a metadata model is selected for a project, all fields configured for the metadata model, and all fields in any parent models, are applied to the samples in the project.
❗️ Illumina recommends that you limit the number of fields or field groups you add to the root model. You should first consider creating submodels before adding anything to the root model.
The following terminology is used within this page:
Metadata fields = Metadata fields are linked to a sample in the context of a project. They can be of various types and can contain single or multiple values.
Metadata groups = When several fields belong together (for example, they all relate to quality metrics), you can create a group so that users know these fields belong together.
Root model = The model that is linked to the tenant. Every metadata model that you link to a project will also contain the fields and groups specified in this model, as it is the parent model for all other models. This is a subcategory of a project metadata model.
Child/Sub model = Any metadata model that is not the root model. Child models inherit all fields and groups from their parent models. This is a subcategory of a project metadata model.
Pipeline model = A model that is linked to a specific pipeline and not a project.
Metadata in the context of ICA will always give information about a sample. It can be provided by the user, the pipeline and via the API. There are 2 general categories of metadata models: Project Metadata Model and Pipeline Metadata Model. Both models are built from metadata fields and groups. The project metadata model is specific per tenant, while the pipeline metadata model is linked to a pipeline and can be shared across tenants. These models are defined by users.
Each sample can have multiple metadata models. Whenever you link a project metadata model to your project, you will see its groups and fields present on each sample. The root model from that tenant will also be present, as every metadata model inherits the groups and fields specified in its parent metadata model(s). When a pipeline that contains a metadata model is executed with a sample, those groups and fields will also be present for each analysis that comes out of the pipeline execution.
The following field types are used within ICA:
Text: Free text
Keyword: Automatically complete value based on already used values
Numeric: Only numbers
Boolean: True or false, cannot be multiple value
Date: e.g. 23/02/2022
Date time: e.g. 23/02/2022 11:43:53, saved in UTC
Enumeration: select value out of drop-down list
The following properties can be selected for groups & fields:
Required: Pipeline can’t be started with this sample until the required group/field is filled in
Sensitive: Values of this group/field are only visible to project users of the own tenant. When a sample is shared across tenants, these fields won't be visible
Filled by pipeline: Fields that need to be filled by pipeline should be part of the same group. This group will automatically be multiple value and values will be available after pipeline execution. This property is only available for groups
Multiple value: This group/field can consist out of multiple (grouped) values
❗️ Fields cannot be both required and filled by pipeline
The project metadata model has metadata linked to a specific project. Values are known upfront, general information is required for each sample of a specific project, and it may include general mandatory company information.
The pipeline metadata model has metadata linked to a specific pipeline. Values are populated during the pipeline execution, and it requires an output file with the name 'metadata.response.json'.
❗️ Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled
Newly created and updated metadata models are not available for use within the tenant until the metadata model is published. When a metadata model is published, fields and field groups cannot be deleted, but the names and descriptions for fields and field groups can be edited. A model can be published after verifying all parent models are published first.
If a published metadata model is no longer needed, you can retire the model (except the root model).
First, check if the model contains any submodels. A model cannot be retired if it contains any published submodels.
When you are certain you want to retire a model and all submodels are retired, click on the three dots in the top right of the model window, and then select Retire Metadata Model.
To add metadata to your samples, you first need to assign a metadata model to your project.
Go to Projects > your_project > Project Settings > Details.
Select Edit.
From the Metadata Model drop-down list, select the metadata model you want to use for the project.
Select Save. All fields configured for the metadata model, and all fields in any parent models are applied to the samples in the project.
To manually add metadata to samples in your project, do as follows.
A precondition is that you have a metadata model assigned to your project.
Go to Projects > your_project > Samples > your_sample.
Double-click your sample to open the sample details.
Enter all metadata information as it applies to the selected sample. All required metadata fields must be populated or the pipeline cannot start.
Select Save
To fill metadata by pipeline executions, a pipeline model must be created.
In the Illumina Connected Analytics main navigation, go to Projects > your_project > Flow > Pipelines > your_pipeline.
Double-click on your pipeline to open the pipeline details.
Create/Edit your model under Metadata Model tab. Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled.
In order for your pipeline to fill the metadata model, an output file with the name metadata.response.json must be generated. After adding your group fields to the pipeline model, click on Generate example JSON to view the required format for your pipeline.
❗️ The field names cannot have . in them; e.g. for the metric name Q30 bases (excl. dup & clipped bases), the . after excl must be removed.
Populating metadata models of samples allows having a sample-centric view of all the metadata. It is also possible to synchronize that data into your project's Base warehouse.
In the Illumina Connected Analytics main navigation, select Projects.
In your project menu select Schedule.
Select 'Add new', and then click on the Metadata Schedule option.
Type a name for your schedule, optionally add a description, and select whether you would like the metadata source to be the current project or the entire tenant. It is also possible to select whether ICA references should be anonymized and whether sensitive metadata fields should be included. As a reminder, values of sensitive metadata fields are not visible to users outside of the project.
Select Save.
Navigate to Tables under BASE menu in your project.
Two new table schemas should be added with your current metadata models.
In order to create a Tool or Bench image, a Docker image is required to run the application in a containerized environment. Illumina Connected Analytics supports both public Docker images and private Docker images uploaded to ICA.
Navigate to System Settings > Docker Repository.
Click Create > External image to add a new external image.
Add your full image URL in the Url field, e.g. docker.io/alpine:latest or registry.hub.docker.com/library/alpine:latest. Docker Name and Version will auto-populate. (Tip: do not add http:// or https:// in your URL)
Note: Do not use :latest when the repository has rate limiting enabled as this interferes with caching and incurs additional data transfer.
(Optional) Complete the Description field.
Click Save.
The newly added image will appear in your Docker Repository list.
Verification of the URL is performed during execution of a pipeline which depends on the Docker image, not during configuration.
External images are accessed from the external source whenever required and not stored in ICA. Therefore, it is important not to move or delete the external source. There is no status displayed on external Docker repositories in the overview as ICA cannot guarantee their availability. The use of :stable instead of :latest is recommended.
In order to use private images in your tool, you must first upload them as a TAR file.
Navigate to Projects > your_project.
Select your uploaded TAR file and, in the top menu, click Manage > Change Format.
Navigate to System Settings > Docker Repository (outside of your project).
Click on Create > Image.
Click on the magnifying glass to find your uploaded TAR image file.
Select the appropriate region and if needed, filter on project from the drop-down menus to find your file.
Select that file.
The newly added image should appear in your Docker Repository list. Verify it is marked as Available under the Status column to ensure it is ready to be used in your tool or pipeline.
Navigate to System Settings > Docker Repository.
Either
Select the required image(s) and go to Manage > Add Region.
OR double-click on a required image, check the box matching the region you want to add, and select update.
In both cases, allow a few minutes for the image to become available in the new region (the status becomes available in table view).
To remove regions, go to Manage > Remove Region or unselect the regions from the Docker image detail view.
You can download your created Docker images at System Settings > Docker Images > your_Docker_image > Manage > Download.
In order to be able to download Docker images, the following requirements must be met:
The Docker image can not be from an entitled bundle.
Only self-created Docker images can be downloaded.
The Docker image must be an internal image and in status Available.
You can only select a single Docker image at a time for download.
Docker image size should be kept as small as practically possible. To this end, it is best practice to compress the image. After compressing and uploading the image, select your uploaded file and click Manage > Change Format in the top menu to change it to Docker format so ICA can recognize the file.
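As a hedged sketch with standard Docker tooling (the image name my-tool:1.0 is illustrative), a compressed TAR can be produced like this before uploading it to the project Data tab:

```bash
# Export a local Docker image to a gzip-compressed TAR for upload to ICA.
docker save my-tool:1.0 | gzip > my-tool_1.0.tar.gz

# Uncompressed alternative:
docker save -o my-tool_1.0.tar my-tool:1.0
```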
The Activity view shows the status and history of long-running activities including Data Transfers, Base Jobs, Base Activity, Bench Activity and Batch Jobs.
The Data Transfers tab shows the status of data uploads and downloads. You can sort, search and filter on various criteria and export the information. Show ongoing transfers (top right) allows you to filter out the completed and failed transfers to focus on current activity.
The Base Jobs tab gives an overview of all the actions related to a table or a query that have run or are running (e.g., Copy table, export table, Select * from table, etc.)
The jobs are shown with their:
Creation time: When did the job start
Description: The query or the performed action with some extra information
Type: Which action was taken
Status: Failed or succeeded
Duration: How long the job took
Billed bytes: The used bytes that need to be paid for
Failed jobs provide information on why the job failed. Details are accessed by double-clicking the failed job. Jobs in progress can be aborted here.
The Base Activity tab gives an overview of previous results (e.g., Executed query, Succeeded Exporting table, Created table, etc.) Collecting this information can take considerable time. For performance reasons, only the activity of the last month (rolling window) with a limit of 1000 records is shown and available for download as Excel or JSON. To get the data for the last year without limit on the number of records, use the export as file function. No activity data is retained for more than one year.
The activities are shown with:
Start Time: The moment the action was started
Query: The SQL expression.
Status: Failed or succeeded
Duration: How long the job took
User: The user that requested the action
Size: For SELECT queries, the size of the query results is shown. Queries resulting in less than 100Kb of data will be shown with a size of <100K
The Bench Activity tab shows the actions taken on Bench Workspaces in the project.
The activities are shown with:
Workspace: Workspace where the activity took place
Date: Date and time of the activity
User: User who performed the activity
Action: Which activity was performed
The Batch Jobs tab allows users to monitor progress of Batch Jobs in the project. It lists Data Downloads, Sample Creation (double-click entries for details) and Data Linking (double-click entries for details). The (ongoing) Batch Job details are updated each time they are (re)opened, or when the refresh button is selected at the bottom of the details screen. Batch jobs which have a final state such as Failed or Succeeded are removed from the activity list after 7 days.
Which batch jobs are visible depends on the user role.
A storage configuration provides ICA with information to connect to an external cloud storage provider, such as AWS S3. The storage configuration validates that the information provided is correct, and then continuously monitors the integration.
Refer to the following pages for instructions to setup supported external cloud storage providers:
The storage configuration requires credentials to connect to your storage. AWS uses these security credentials to authenticate and authorize your requests. On the System Settings > Credentials > Create screen, you can enter these credentials. Long-term access keys consist of an access key ID and a secret access key used together as a set.
Fill out the following fields:
Type—The type of access credentials. This will usually be AWS user.
Name—Provide a name to easily identify your access key.
Access key ID—The access key you created.
Secret access key—Your related secret access key.
In the ICA main navigation, select System Settings > Storage > Create.
Configure the following settings for the storage configuration.
Type—Use the default value, e.g., AWS_S3. Do not change.
Region—Select the region where the bucket is located.
Configuration name—You will use this name when creating volumes that reside in the bucket. The name must be between 3 and 63 characters long.
Description—Here you can provide a description for yourself or other users to identify this storage configuration.
Bucket name—Enter the name of your S3 bucket.
Key prefix [Optional]—You can provide a key prefix to allow only files inside the prefix to be accessible. The key prefix must end with "/".
If a key prefix is specified, your projects will only have access to that folder and subfolders. For example, using the key prefix folder-1/ ensures that only the data from the folder-1 directory in your S3 bucket is synced with your ICA project. Using prefixes and distinct folders for each ICA project is the recommended configuration as it allows you to use the same S3 bucket for different projects.
Using no key prefix results in syncing all data in your S3 bucket (starting from root level) with your ICA project. Your project will have access to your entire S3 bucket, which prevents that S3 bucket from being used for other ICA projects. Although possible, this configuration is not recommended.
Secret—Select the credentials to associate with this storage configuration. These were created on the Credentials tab.
Server Side Encryption [Optional]—If needed, you can enter the algorithm and key name for server-side encryption processes.
Select Save.
With the action Set as default for region, you select which storage will be used as default storage in a region for new projects of your tenant. Only one storage can be default at a time for a region, so selecting a new storage as default will unselect the previous default. If you do not want to have a default, you can select the default storage and the action will become Unset as default for region.
The System Settings > Credentials > Share action is used to make the storage available to everyone in your tenant. By default, storage is private per user so that you have complete control over the contents. Once you decide you want to share the storage, simply select it and use the Share action. Do take into account that once shared, you can not unshare the storage. Once your storage is used in a project, it can also no longer be deleted.
Filenames beginning with / are not allowed, so be careful when entering full path names. Otherwise the file will end up on S3 but not be visible in ICA. If this happens, access your S3 storage directly and copy the data to where it was intended. If you are using an Illumina-managed S3 storage, submit a support request to delete the erroneous data.
Every 4 hours, ICA will verify the storage configuration and credentials to ensure availability. When an error is detected, ICA will attempt to reconnect once every 15 minutes. After 200 consecutive failed connection attempts (50 hours), ICA will stop trying to connect.
When you update your credentials, the storage configuration is automatically validated. In addition, you can manually trigger revalidation when ICA has stopped trying to connect by selecting the storage and then clicking Validate under System Settings > Storage > Manage.
Illumina® Connected Analytics is a cloud-based software platform intended to be used to manage, analyze, and interpret large volumes of multi-omics data in a secure, scalable, and flexible environment. The versatility of the system allows the platform to be used for a broad range of applications. When using the applications provided on the platform for diagnostic purposes, it is the responsibility of the user to determine regulatory requirements and to validate for intended use, as appropriate.
The platform is hosted in regions listed below.
The platform hosts a suite of RESTful HTTP-based application programming interfaces (APIs) to perform operations on data and analysis resources. A web application user-interface is hosted alongside the API to deliver an interactive visualization of the resources and enables additional functionality beyond automated analysis and data transfer. Storage and compute costs are presented via usage information in the account console, and a variety of compute resource options are specifiable for applications to fine tune efficiency.
Use the search bar on the top right to navigate through the help docs and find specific topics of interest.
If you have any questions, contact Illumina Technical Support by phone or email:
Illumina Technical Support | techsupport@illumina.com | 1-800-809-4566
For customers outside the United States, Illumina regional Technical Support contact information can be found at www.illumina.com/company/contact-us.html.
To see the current ICA version you are logged in to, click your username found on the top right of the screen and then select About.
To view a list of the products to which you have access, select the 9 dots symbol at the top right of ICA. This will list your products. If you have multiple regional applications for the same product, the region of each is shown between brackets.
The More Tools category presents the following options:
My Illumina Dashboard to monitor instruments, streamline purchases and keep track of upcoming activities.
Link to the Support Center for additional information and help.
Link to the order management from where you can keep track of your current and past orders.
Upload your private image as a TAR file, either by dragging and dropping the file in the Data tab, using the CLI or a Connector. For more information, please refer to the project data documentation.
Select DOCKER from the drop-down menu and Save.
Select the appropriate region, fill in the Docker Name and Version, indicate whether it is a tool or a bench image, and click Save.
You need a connector with a download rule to download the Docker image.
Transfers with a yellow background indicate that rules have been modified in ways that prevent planned files from being uploaded. Please verify your service connectors to resolve this.
For more information, refer to the documentation.
ICA performs a series of steps in the background to verify the connection to your bucket. This can take several minutes. You may need to manually refresh the list to verify that the bucket was successfully configured. Once the storage configuration setup is complete, the configuration can be used when creating new projects.
Refer to the troubleshooting guide for more information.
ICA supports the following storage classes. Please see the AWS documentation for more information on each:
If you are using S3 Intelligent-Tiering, which allows S3 to automatically move files into different cost-effective storage tiers, please do NOT include the Archive and Deep Archive Access tiers, as these are not supported by ICA yet. Instead, you can use lifecycle rules to automatically move files to Archive after 90 days and Deep Archive after 180 days. Lifecycle rules are supported for user-managed buckets.
The user documentation provides material for learning the basics of interacting with the platform including examples and tutorials. Start with the documentation to learn more.
In the release notes section of the documentation, posts are made for new versions of deployments of the core platform components.
Project Creator: All batch jobs
Project Collaborator (same tenant): All batch jobs
Project Collaborator (different tenant): Only batch jobs of own tenant
S3 Standard: Available
S3 Intelligent-Tiering: Available
S3 Express One Zone: Available
S3 Standard-IA: Available
S3 One Zone-IA: Available
S3 Glacier Instant Retrieval: Available
S3 Glacier Flexible Retrieval: Archived
S3 Glacier Deep Archive: Archived
Reduced redundancy (not recommended): Available
Australia (AU)
Canada (CA)
Germany (EU)
India (IN)
Indonesia (ID)
Japan (JP)
Singapore (SG)
South Korea (KR)
United Kingdom (GB)
United Arab Emirates (AE)
United States (US)
You can use samples to group information related to a sample, including input files, output files, and analyses.
You can search for samples (excluding their metadata) with the Search button at the top right.
To add a new sample, do as follows.
Select Projects > your_project > Samples.
To add a new sample, select + Create, and then enter a unique name and description for the sample.
To include files related to the sample, select + Add data to sample.
Your sample is added to the Samples page. To view information on the sample, select the sample, and then select Open Details.
You can add additional files to a sample after creating the sample. Any files that are not currently included in a sample are listed on the Unlinked Files tab.
To add an unlinked file to a sample, do as follows.
Go to Projects > your_project > Samples > Unlinked files tab.
Select a file or files, and then select one of the following options:
Create sample — Create a new sample that includes the selected files.
Link to sample — Select an existing sample in the project to link the file to.
Alternatively, you can add unlinked files from the sample details.
Go to Projects > your_project > Samples.
Select your sample to open the details.
The last section of the details is files, where you select + Add data to sample.
If the data is not in your project, select Choose a file, which will upload the data to your project. This does not automatically add it to your sample; you will still have to select that newly uploaded data and then select Add data to sample.
Data can only be linked to a single sample, so once you have linked data to a sample, it will no longer appear in the list of data to choose from.
To remove files from samples,
Go to Projects > your_project > Samples > your_sample > Details.
Go to the files section and open the file details of the file you want to remove.
Select Remove data from sample.
Save your changes.
A Sample can be linked to a project from a separate project to make it available in read-only capacity.
Navigate to the Samples view in the Project
Click the Link button
Select the Sample(s) to link to the project
Click the Link button
Data linked to Samples is not automatically linked to the project. The data must be linked separately from the Data view. Samples also must be available in a complete state in order to be linked.
If you want to remove a sample, select it and use the delete option from the top navigation row. You will be presented a choice of how to handle the data in the sample.
Unlink all data without deleting it.
Delete input data and unlink other data.
Delete all data.
You can verify the integrity of the data with the MD5 (Message Digest Algorithm 5) checksum. It is a widely used cryptographic hash function that generates a fixed-size, 128-bit hash value from any input data. This hash value is unique to the content of the data, meaning even a slight change in the data will result in a significantly different MD5 checksum.
For files smaller than 16 MB, you can directly retrieve the MD5 checksum using our API endpoints. Make an API GET call to the https://ica.illumina.com/ica/rest/api/projects/{projectId}/data/{dataId} endpoint, specifying the data ID you want to check and the corresponding project ID. The response you receive will be in JSON format, containing various file metadata. Within the JSON response, look for the objectETag field. This value is the MD5 checksum for the file you have queried. You can compare this checksum with the one you compute locally to ensure the file's integrity.
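As an illustration only, here is a small Groovy sketch (Groovy being the language Nextflow is built on) of this comparison. The X-API-Key header, the placeholder project and data IDs, the local file name, and the exact nesting of objectETag in the response are assumptions to adapt to your environment.

```groovy
// Hedged sketch: compare the MD5 reported by the ICA API (objectETag) with a local MD5.
// Placeholder IDs, the local file name, and the X-API-Key header are assumptions.
import groovy.json.JsonSlurper
import java.security.MessageDigest

def projectId = 'your-project-id'            // hypothetical placeholder
def dataId    = 'fil.your-data-id'           // hypothetical placeholder
def apiKey    = System.getenv('ICA_API_KEY') // personal API key

// Retrieve the file metadata from the ICA REST API
def conn = new URL("https://ica.illumina.com/ica/rest/api/projects/${projectId}/data/${dataId}")
        .openConnection()
conn.setRequestProperty('X-API-Key', apiKey)
def response = new JsonSlurper().parse(conn.inputStream)

// The checksum is reported in the objectETag field of the file details;
// the exact nesting below is an assumption, so fall back to a shallower lookup if needed.
def remoteMd5 = (response?.data?.details?.objectETag ?: response?.details?.objectETag)
        ?.replaceAll('"', '')

// Compute the MD5 of the local copy of the file
def localMd5 = MessageDigest.getInstance('MD5')
        .digest(new File('local_copy.fastq').bytes)
        .encodeHex()
        .toString()

println(remoteMd5 == localMd5 ? 'Checksums match' : 'Checksum mismatch')
```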
For larger files, the process is different due to computation limitations. In these cases, we recommend using a dedicated pipeline on our platform to explicitly calculate the MD5 checksum. Below you can find both a main.nf file and the corresponding XML for a possible Nextflow pipeline to calculate the MD5 checksum for FASTQ files.
A Tool is the definition of a containerized application with defined inputs, outputs, and execution environment details including compute resources required, environment variables, command line arguments, and more.
Tools define the inputs, parameters, and outputs for the analysis. Tools are available for use in graphical CWL pipelines by any project in the account.
Select System Settings > Tool Repository > + Create.
Configure tool settings in the tool properties tabs. See Tool Properties.
Select Save.
The following sections describe the tool properties that can be configured in each tab.
Refer to the CWL CommandLineTool Specification for further explanation about many of the properties described below. Not all features described in the specification are supported.
Name
The name of the tool.
Categories
One or more tags to categorize the tool. Select from existing tags or type a new tag name in the field.
Icon
The icon for the tool.
Description
Free text description for information purposes.
Status
The release status of the tool.
Docker image
The registered Docker image for the tool.
Regions
The regions supported by the linked Docker image.
Tool version
The version of the tool specified by the end user. Could be any string.
Release version
The version number of the tool.
Family
A group of tools or tool versions.
Version comment
A description of changes in the updated version.
Links
External reference links.
Tool Status
The release status of the tool. Can be one of Draft, Release Candidate, Released, or Deprecated.
Draft
Fully editable draft.
Release Candidate
The tool is ready for release. Editing is locked but the tool can be cloned to create a new version.
Released
The tool is released. Editing is locked, but the tool can be cloned to create a new version.
Deprecated
The tool is no longer intended for use in pipelines, but there are no restrictions placed on the tool. That is, it can still be added to new pipelines and will continue to work in existing pipelines. It is merely an indication to the user that the tool should no longer be used.
The Documentation tab provides options for configuring the HTML description for the tool. The description appears in the Tool Repository but is excluded from exported CWL definitions.
The General Tool tab provides options to configure the basic command line.
ID
CWL identifier field
CWL version
The CWL version in use. This field cannot be changed.
Base command
Components of the command. Each argument must be added in a separate line.
Standard in
The name of the file that captures Standard In (STDIN) stream information.
Standard out
The name of the file that captures Standard Out (STDOUT) stream information.
Standard error
The name of the file that captures Standard Error (STDERR) stream information.
Requirements
The requirements for triggering an error message.
Hints
The requirements for triggering a warning message.
The Hints/Requirements include CWL features to indicate capabilities expected in the Tool's execution environment.
Inline Javascript
The Tool contains a property with a JavaScript expression to resolve its value.
Initial workdir
The workdir can be any of the following types:
String or Expression — A string or JavaScript expression, eg, $(inputs.InputFASTA)
File or Dir — A map of one or more files or directories, in the following format: {type: array, items: [File, Directory]}
Dirent — A script in the working directory. The Entry name field specifies the file name.
Scatter feature — Indicates that the workflow platform must support the scatter and scatterMethod fields.
The Tool Arguments tab provides options to configure base command parameters that do not require user input.
Tool arguments may be one of two types:
String or Expression — A literal string or JavaScript expression, eg --format=bam.
Binding — An argument constructed from the binding of an input parameter.
The following table describes the argument input fields.
Value
The literal string to be added to the base command.
String or expression
Position
The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Binding
Prefix
The string prefix.
Binding
Item separator
The separator that is used between array values.
Binding
Value from
The source string or JavaScript expression.
Binding
Separate
The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Binding
Shell quote
The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.
Binding
Example
Prefix
--output-filename
Value from
$(inputs.inputSAM.nameroot).bam
Input file
/tmp/storage/SRR45678_sorted.sam
Output file
SRR45678_sorted.bam
The Tool Inputs tab provides options to define the input files and directories for the tool. The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
ID
The file ID.
Label
A short description of the input.
Description
A long description of the input.
Type
The input type, which can be either a file or a directory.
Input options
Checkboxes to add the following options. Optional indicates the input is optional. Multi value indicates there is more than one input file or directory. Streamable indicates the file is read or written sequentially without seeking.
Secondary files
The required secondary files or directories.
Format
The input file format.
Position
The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Prefix
The string prefix.
Item separator
The separator that is used between array values.
Value from
The source string or JavaScript expression.
Load contents
When selected, the first 64 KiB of the file is read into the contents field so it can be used in expressions.
Separate
The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Shell quote
The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.
The Tool Settings tab provides options to define parameters that can be set at the time of execution. The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
ID
The file ID.
Label
A short description of the input.
Description
A long description of the input.
Default Value
The default value to use if the tool setting is not available.
Type
The input type, which can be Boolean, Int, Long, Float, Double or String.
Input options
Checkboxes to add the following options. Optional indicates the input is optional. Multi value indicates there can be more than one value for the input.
Position
The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Prefix
The string prefix.
Item separator
The separator that is used between array values.
Value from
The source string or JavaScript expression.
Separate
The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Shell quote
The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.
The Tool Outputs tab provides options to define the parameters of output files.
The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
ID
The file ID.
Label
A short description of the input.
Description
A long description of the input.
Type
The input type, which can be either a file or a directory.
Output options
Checkboxes to add the following options. Optional indicates the input is optional. Multi value indicates there is more than one input file or directory. Streamable indicates the file is read or written sequentially without seeking.
Secondary files
The required secondary files or directories.
Format
The input file format.
Globs
The pattern for searching file names.
Load contents
Automatically loads the file contents. The system extracts up to the first 64 KiB of text from the file and populates the contents field with it.
Output eval
Evaluate an expression to generate the output value.
The Tool CWL tab displays the complete CWL code constructed from the values entered in the other tabs. The CWL code automatically updates when changes are made in the tool definition tabs, and any changes to the CWL code are reflected in the tool definition tabs.
❗️ Modifying data within the CWL editor can result in invalid code.
From the System Settings > Tool Repository page, select a tool.
Select Edit.
From the System Settings > Tool Repository page, select a tool.
Select the Information tab.
From the Status drop-down menu, select a status.
Select Save.
In addition to the interactive Tool builder, the platform GUI also supports working directly with the raw definition when developing a new Tool. This provides the ability to write the Tool definition manually or bring an existing Tool's definition to the platform.
A simple example CWL Tool definition is provided below.
When creating a new Tool, navigate to System Settings > Tool Repository > your_tool > Tool CWL tab to show the raw CWL definition. Here a CWL CommandLineTool definition may be pasted into the editor. After pasting into the editor, the definition is parsed and the other tabs for visually editing the Tool will populate according to the definition contents.
General Tool - includes your base command and various optional configurations.
The base command is required for your tool to run, e.g. python /path/to/script.py, such that python and /path/to/script.py are added as separate lines.
Inline Javascript requirement - must be enabled if you are using Javascript anywhere in your tool definition.
Initial workdir requirement - Dirent Type
Your tool must point to a script that executes your analysis. That script can either be provided in your Docker image or defined using a Dirent. Defining a script via Dirent allows you to dynamically modify your script without updating your Docker image. To define your Dirent script, enter the script name under Entry name (e.g. runner.sh) and the script content under Entry. Then, point your base command to that custom script, e.g. bash runner.sh.
❗ What's the difference between Settings and Arguments?
Settings are exposed at the pipeline level with the ability to get modified at launch, while Arguments are intended to be immutable and hidden from users launching the pipeline.
How do you reference your tool inputs and settings throughout the tool definition?
You can either reference your inputs using their position or ID.
Settings can be referenced using their defined IDs, e.g. $(inputs.InputSetting)
File/Directory inputs can be referenced using their defined IDs, followed by the desired field, e.g. $(inputs.InputFile.path). For additional information please refer to the File CWL documentation.
All inputs can also be referenced using their position, e.g. bash script.sh $1 $2
This section describes how to connect an AWS S3 Bucket with SSE-KMS Encryption enabled. General instructions for configuring your AWS account to allow ICA to connect to an S3 bucket are found on this page.
Follow the AWS instructions for how to create S3 bucket with SSE-KMS key.
S3-SSE-KMS must be in the same region as your ICA v2.0 project. See the ICA S3 bucket documentation for more information.
In the "Default encryption" section, enable Server-side encryption and choose AWS Key Management Service key (SSE-KMS)
. Then select Choose your AWS KMS key
.
If you do not have an existing customer managed key, click Create a KMS key
and follow these steps from AWS.
Once the bucket is set, create a folder with encryption enabled in the bucket that will be linked in the ICA storage configuration. This folder will be connected to ICA as a prefix. Although it is technically possible to use the root folder, this is not recommended as it will cause the S3 bucket to no longer be available for other projects.
Follow the general instructions for connecting an S3 bucket to ICA.
In the step "Create AWS IAM policy":
Add permission to use the KMS key by adding kms:Decrypt, kms:Encrypt, and kms:GenerateDataKey.
Add the KMS key ARN arn:aws:kms:xxx to the first "Resource".
On Unversioned buckets, the permissions will match the following:
On Versioned OR Suspended buckets, the permissions will match the following:
At the end of the policy setting, there should be 3 permissions listed in the "Summary".
Follow the general instructions for how to create a storage configuration in ICA.
In step 3 of the process above, continue with the [Optional] Server Side Encryption field to enter the algorithm and key name for server-side encryption processes.
On "Algorithm", input aws:kms
On "Key Name", input the ARN KMS key: arn:aws:kms:xxx
Although "Key prefix" is optional, it is highly recommended to use this and not use the root folder of your S3 bucket. "Key prefix" refers to the folder name in the bucket which you created.
In addition to following the instructions to Enable Cross Account Copy, the KMS policy must include the following statement for an AWS S3 bucket with SSE-KMS encryption (refer to the Role ARN table from the linked page for the ASSUME_ROLE_ARN value):
The Data section gives you access to the files and folders stored in the project as well as those linked to the project. Here, you can perform searches and data management operations such as moving, copying, deleting and (un)archiving.
The length of the file name (minus prefixes and delimiters) is ideally limited to 32 characters.
To prevent cost issues, you can not perform actions such as copying and moving data which would write data to the workspace when the project billing mode is set to tenant and the owning tenant of the folder is not the current user's tenant.
On the Projects > your_project > Data page, you can view file information and preview files.
To view file details, click on the filename.
Run input tags identify the last 100 pipelines which used this file as input.
Connector tags indicate if the file was added via browser upload or connector.
To view file contents, select the checkbox at the beginning of the line and then select View from the top menu. Alternatively, you can first click on the filename to see the details and then click View to preview the file.
To see the ongoing actions (copying from, copying to, moving from, moving to) on data in the data overview (Projects > your_project > Data), add the ongoing actions column from the column list. This contains a list of ongoing actions sorted by when they were created. You can also consult the data detail view for ongoing actions by clicking on the data in the overview. When clicking on an ongoing action itself, the data job details of the most recent created data job are shown.
For folders, the list of ongoing actions is displayed on top left of the folder details. When clicking the list, the data job details are shown of the most recent created data job of all actions.
When Secondary Data is added to a data record, those secondary data records are mounted in the same parent folder path as the primary data file when the primary data file is provided as an input to a pipeline. Secondary data is intended to work with the CWL secondaryFiles feature. This is commonly used with genomic data such as BAM files with companion BAM index files (refer to https://www.ncbi.nlm.nih.gov/tools/gbench/tutorial6/ for an example).
To hyperlink to data, use the following syntax:
Normal permission checks still apply with these links. If you try to follow a link to data to which you do not have access, you will be returned to the main project screen or login screen, depending on your permissions.
Uploading data to the platform makes it available for consumption by analysis workflows and tools.
To upload data manually via the drag-and-drop interface in the platform UI, go to Projects > your_project > Data and either
Drag a file from your system into the Choose a file or drag it here box.
Select the Choose a file or drag it here box, and then choose a file. Select Open to upload the file.
Your files are added to the Data page with status partial during upload and become available when upload completes.
Do not close the ICA tab in your browser while data uploads.
You can copy data from the same project to a different folder or from another project to which you have access.
In order to copy data, the following rights must be assigned to the person copying the data:
The following restrictions apply when copying data:
Data in the "Partial" or "Archived" state will be skipped during a copy job.
To use data copy:
Go to the destination project for your data copy and proceed to Projects > your_project > Data > Manage > Copy From.
Optionally, use the filters (Type, Name, Status, Format or additional filters) to filter out the data or search with the search box.
Select the data (individual files or folders with data) you want to copy.
Select any meta data which you want to keep with the copied data (user tags, technical system tags or instrument information).
Select which action to take if the data already exists (overwrite existing data, don't copy, or keep both the original and the new copy by appending a number to the copied data).
Select Copy Data to copy the data to your project. You can see the progress in Projects > your_project > Activity > Batch Jobs and, if your browser permits it, a pop-up message will be displayed when the copy process completes.
The outcome can be one of the following:
INITIALIZED
WAITING_FOR_RESOURCES
RUNNING
STOPPED - When choosing to stop the batch job.
SUCCEEDED - All files and folders are copied.
PARTIALLY_SUCCEEDED - Some files and folders could be copied, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the copy process was running.
FAILED - None of the files and folders could be copied.
To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.
There is a difference in copy type behavior between copying files and folders. The behavior is designed for files and it is best practice to not copy folders if there already is a folder with the same name in the destination location.
Notes on copying data
Copying data comes with an additional storage cost as it will create a copy of the data.
You can copy over the same data multiple times.
On the command-line interface, the command to copy data is icav2 projectdata copy.
You can move data both within a project and between different projects to which you have access. If you allow notifications from your browser, a pop-up will appear when the move is completed.
Move From is used when you are in the destination location.
Move To is used when you are in the source location. Before moving the data, pre-checks are performed to verify that the data can be moved and no currently running operations are being performed on the folder. Conflicting jobs and missing permissions will be reported. Once the move has started, no other operation should be performed on the data being moved to avoid potential data loss or duplication. Adding or (un)archiving files during the move may result in duplicate folders and files with different identifiers. If this happens, you will need to manually delete the duplicate files and move the files which were skipped during the initial move.
When you move data from one location to another, you should not change the source data while the Move job is in progress. This will result in jobs getting aborted. Please expand the "Troubleshooting" section below for information on how to fix this if it occurs.
There are a number of rights and restrictions related to data move as this will delete the data in the source location.
Move jobs will fail if any data being moved is in the "Partial" or "Archived" state.
Move Data From is used when you are in the destination location.
Navigate to Projects > your_project > Data > your_destination_location > Manage > Move From.
Select the files and folders which you want to move.
Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.
Move Data To is used when you are in the source location. You will need to select the data you want to move from the current location and the destination to move it to.
Navigate to Projects > your_project > Data > your_source_location.
Select the files and folders which you want to move.
Select Projects > your_project > Data > your_source_location > Manage > Move To.
Select your target project and location.
Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.
INITIALIZED
WAITING_FOR_RESOURCES
RUNNING
STOPPED - When choosing to stop the batch job.
SUCCEEDED - All files and folders are moved.
PARTIALLY_SUCCEEDED - Some files and folders could be moved, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the move process was running.
FAILED - None of the files and folders could be moved.
To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.
Restrictions:
A total maximum of 1000 items can be moved in one operation. An item can be either a file or a folder. Folders with subfolders and subfiles still count as one item.
You can not move files and folders to a destination where one or more files or folders with the same name already exists.
You can not move data and folders to linked data.
You can not move a folder to itself.
You can not move data which is in the process of being moved.
You can not move data across regions.
You can not move data from externally-managed projects.
You can not move linked data.
You can not move externally managed data.
You can only move data when it has status available.
To move data across projects, it must be owned by the user's tenant.
If you do not select a target folder for Move Data To, the root folder of the target project is used.
If you are only able to select your source project as the target data project, this may indicate that data sharing (Projects > your_project > Project Settings > Details > Data Sharing) is not enabled for your project or that you do not have upload rights in other projects.
Single files can be downloaded directly from within the UI.
Select the checkbox next to the file which you want to download, followed by Download > Download file.
Files for which ICA can display the contents can be viewed by clicking on the filename, followed by the View tab. Select the download action on the view tab to download the file. Note that larger files may take some time to load.
You can trigger an asynchronous download via service connector using the Schedule for Download button with one or more files selected.
Select a file or files to download.
Select Download > Download files or folders using a service connector. This will display a list of all available connectors.
Select a connector, and then select Schedule for Download. If you do not find the connector you need or you do not have a connector, you can click the Don't have a connector yet? option to create a new connector. You must then install this new connector and return to the file selection in step 1 to use it.
You can view the progress of the download or stop the download on the Activity page for the project.
The data records contained in a project can be exported in CSV, JSON, and Excel formats.
Select one or more files to export.
Select Export.
Select the following export options:
To export only the selected file, select the Selected rows as the Rows to export option. To export all files on the page, select Current page.
To export only the columns present for the file, select the Visible columns as the Columns to export option.
Select the export format.
To manually archive or delete files, do as follows:
Select the checkbox next to the file or files to delete or archive.
Select Manage, and then select one of the following options:
Archive — Move the file or files to long-term storage (event code ICA_DATA_110).
Unarchive — Return the file or files from long-term storage. Unarchiving can take up to 48 hours, regardless of file size. Unarchived files can be used in analysis (event code ICA_DATA_114).
Delete — Remove the file completely (event code ICA_DATA_106).
When attempting concurrent archiving or unarchiving of the same file, a message will inform you to wait for the currently running (un)archiving to finish first.
To archive or delete files programmatically, you can use ICA's API endpoints:
Modify the dates of the file to be deleted/archived.
Linking a folder creates a dynamic read-only view of the source data. You can use this to get access to data without running the risk of modifying the source material and to share data between projects. In addition, linking ensures changes to the source data are immediately visible and no additional storage is required.
You can recognise linked data by the green color and see the owning project as part of the details.
Since this is read-only access, you cannot perform actions on linked data that need write access. Actions like (un)archiving, linking, creating, deleting, adding or moving data and folders, and copying data into the linked data are not possible.
Linking data is only possible from the root folder of your destination project. The action is disabled in project subfolders.
Linking a parent folder after linking a file or subfolder will unlink the file or subfolder and link the parent folder. So root\linked_subfolder will become root\linked_parentfolder\linked_subfolder.
Initial linking can take considerable time when there is a large amount of source data. However, once the initial link is made, updates to the source data will be instantaneous.
You can perform analysis on data from other projects by linking data from that project.
Select Projects > your_project > Data > Manage, and then select Link.
To view data by project, select the funnel symbol, and then select Owning Project. If you only know which project the data is linked to, you can choose to filter on linked projects.
Select the checkbox next to the file or files to add.
Select Select Data.
Your files are added to the Data page. To view the linked data file, select Add filter, and then select Links.
If you link a folder instead of individual files, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Projects > your_project > activity > Batch Jobs screen.
To see more details, double-click the batch job.
To see how many individual files are already linked, double-click the item.
To unlink the data, go to the root level of your project and select the linked folder, or, if you have linked individual files separately, select those linked files (limited to 100 at a time) and select Manage > Unlink. As with linking a folder, the progress of unlinking can be monitored at Projects > your_project > Activity > Batch Jobs.
The GUI considers non-indexed folders as a single object. You can access the contents of a non-indexed folder:
as Analysis input/output
in Bench
via the API
To use a reference set from within a project, you have first to add it. From the project's page select Flow > Reference Data > Manage > +Add to project. Then select a reference set to add to your project. You can select the entire reference set, or click the arrow next to it to expand it. After expanding, scroll to the right, to see the individual reference files in the set. You can select individual reference files to add to your project, by checking the boxes next to them.
Note: Reference sets are only supported in Graphical CWL pipelines.
Navigate to Reference Data (outside of Project context).
Select the data set(s) you wish to add to another region and select Actions > Copy to another project.
Select a project located in the region where you want to add your reference data.
You can check in which region(s) Reference data is present by double-clicking on individual files in the Reference set and viewing Copy Details on the Data details tab.
Allow a few minutes for new copies to become available before use.
Note: You only need one copy of each reference data set per region. Adding Reference Data sets to additional projects set in the same region does not result in extra copies, but creates links instead. This is done from inside the project at Projects > <your_project> > Flow > Reference Data > Manage > Add to project.
To create a pipeline with reference data, use the CWL graphical mode (important restriction: as of now, you cannot use reference data for pipelines created in advanced mode). Use the reference data icon instead of the regular input icon. On the right-hand side, use the Reference files submenu to specify the name, the format, and the filters. You can specify the options for an end user to choose from and a default selection. You can select more than one file, but only one at a time (so repeat the process to select multiple reference files). If you only select one reference file, that file will be the only one users can use with your pipeline. In the screenshot, reference data with two options is presented.
If your pipeline was built to give users the option of choosing among multiple input reference files, they will see the option to select among the reference files you configured, under Settings.
After clicking the magnifying glass icon the user can select from provided options.
Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.
The input form XML must adhere to the input form schema.
During the creation of a Nextflow pipeline the user is given an empty form to fill out.
The input files are specified within a single DataInputs node. An individual input is then specified in a separate DataInput node. A DataInput node contains following attributes:
code: a unique id. Required.
format: specifying the format of the input: FASTA, TXT, JSON, UNKNOWN, etc. Multiple entries are possible: example below. Required.
type: is it a FILE or a DIRECTORY? Multiple entries are not allowed. Required.
required: is this input required for the execution of a pipeline? Required.
multiValue: are multiple files as an input allowed? Required.
dataFilter: TBD. Optional.
Additionally, DataInput has two elements: label for labelling the input and description for a free text description of the input.
An example of a single file input which can be in a TXT, CSV, or FASTA format.
To use a folder as an input the following form is required:
For multiple files, set the attribute multiValue to true. The variable will then be treated as a list ([]), so adapt your pipeline when changing from single value to multiValue, as shown in the sketch below.
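As a hedged illustration, the sketch below shows one way a workflow can cope with a parameter that may arrive as a single value or as a list once multiValue is enabled; the parameter name params.reads is hypothetical.

```nextflow
// Hedged sketch: params.reads is a hypothetical multiValue file input.
// Wrap single values in a list so the same code works before and after enabling multiValue.
params.reads = []

workflow {
    def reads = params.reads instanceof List ? params.reads : [params.reads]
    Channel.fromPath(reads).view { "Input file: $it" }
}
```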
Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to, strings, booleans, integers, etc. The following hierarchy of nodes must be followed: steps > step > tool > parameter. The parameter node must contain following attributes:
code: unique id. This is the parameter name that is passed to the workflow
minValues: how many values (at least) should be specified for this setting. If this setting is required, minValues should be set to 1.
maxValues: how many values (at most) should be specified for this setting
classification: is this setting specified by the user?
In the code below a string setting with the identifier inp1 is specified.
Examples of the following types of settings are shown in the subsequent sections. Within each type, the value tag can be used to denote a default value in the UI, or can be left blank to have no default. Note that setting a default value has no impact on analyses launched via the API.
For an integer setting the following schema with an element integerType is to be used. To define an allowed range use the attributes minimumValue and maximumValue.
Option types can be used to designate options from a drop-down list in the UI. The selected option will be passed to the workflow as a string. This currently has no impact when launching from the API, however.
Option types can also be used to specify a boolean, for example
For a string setting, the following schema with an element stringType is to be used.
For a boolean setting, booleanType can be used.
One known limitation of the schema presented above is the inability to specify a parameter that can have multiple types, e.g. File or String. One way to implement this requirement is to define two optional parameters: one for the File input and a second for the String input. At the moment, the ICA UI doesn't validate whether at least one of these parameters is populated; this check can be done within the pipeline itself.
Below you can find both a main.nf and an XML configuration of a generic pipeline with two optional inputs, which can be used as a template to address similar issues. If the file parameter is set, it will be used. If the str parameter is set but file is not, the str parameter will be used. If neither is set, the pipeline aborts with an informative error message.
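For orientation, the main.nf logic described above could look roughly like the following sketch. It is illustrative only and not the full template referenced here; the process is a placeholder.

```nextflow
// Illustrative sketch of the "either file or str" pattern described above.
params.file = null
params.str  = null

process ECHO_INPUT {
    input:
    val x

    output:
    stdout

    script:
    """
    echo "Using input: ${x}"
    """
}

workflow {
    if (params.file) {
        ch_in = Channel.fromPath(params.file)  // file parameter takes precedence
    } else if (params.str) {
        ch_in = Channel.of(params.str)         // fall back to the string parameter
    } else {
        error "Provide either the 'file' or the 'str' parameter."
    }
    ECHO_INPUT(ch_in) | view
}
```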
In order to run Nextflow pipelines, the following process-level attributes within the Nextflow definition must be considered.
(*) Pipelines will still run once 20.10.0 is deprecated, but you will no longer be able to choose it when creating new pipelines.
You can select the Nextflow version while building a pipeline as follows:
For each compute type, you can choose between the scheduler.illumina.com/lifecycle: standard (default - AWS on-demand) or scheduler.illumina.com/lifecycle: economy (AWS spot instance) tiers.
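Assuming the lifecycle tier is requested through a pod annotation in the same way as the compute preset described further below (this is an assumption; verify it against your own pipelines), a process could ask for the economy tier as in this sketch:

```nextflow
// Hedged sketch: request the economy (AWS spot) tier for this process via a pod annotation.
process COUNT_LINES {
    pod annotation: 'scheduler.illumina.com/lifecycle', value: 'economy'

    input:
    path infile

    output:
    stdout

    script:
    """
    wc -l ${infile}
    """
}

workflow {
    COUNT_LINES(Channel.fromPath('*.txt')) | view
}
```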
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list.
If no Docker image is specified, Ubuntu will be used as default.
The following configuration settings will be ignored if provided as they are overridden by the system:
To specify a compute type for a CWL CommandLineTool, use the ResourceRequirement with a custom namespace.
For example, take the following ResourceRequirements:
This would result in a best-fit standard-large ICA Compute Type request for the tool.
If the specified requirements can not be met by any of the presets, the task will be rejected and failed.
FPGA requirements can not be set by means of CWL ResourceRequirements.
The Machine Profile Resource in the graphical editor will override whatever is set for requirements in the ResourceRequirement.
If no Docker image is specified, Ubuntu will be used as default. Both : and / can be used as separator.
In ICA you can provide the "override" recipes as a part of the input JSON. The following example uses CWL overrides to change the environment variable requirement at load time.
Pay close attention to uppercase and lowercase characters when creating pipelines.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > JSON based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.
First, we present the individual processes. Select Nextflow files > + Create and label the file split.nf. Copy and paste the following definition.
Next, select +Create and name the file sort.nf. Copy and paste the following definition.
Select +Create again and label the file merge.nf. Copy and paste the following definition.
Edit the main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.
Here, the operators flatten and collect are used to transform the emitting channels. The flatten operator transforms a channel in such a way that every item of type Collection or Array is flattened so that each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns the resulting object as a sole emission.
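As a standalone illustration (not one of the tutorial files), this toy workflow shows how flatten and collect reshape a channel:

```nextflow
// Toy example: flatten splits list emissions into single items,
// collect gathers everything back into one list emission.
workflow {
    Channel.of( ['a.txt', 'b.txt'], ['c.txt'] )
        .flatten()      // emits: a.txt, b.txt, c.txt
        .collect()      // emits: [a.txt, b.txt, c.txt]
        .view()
}
```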
On the Inputform files tab, edit the inputForm.json to allow selection of a file.
Click the Simulate button (at the bottom of the text editor) to preview the launch form fields.
The onSubmit.js and onRender.js can remain with their default scripts and are just shown here for reference.
Click the Save button to save the changes.
ICA supports UTF-8 characters in file and folder names for data. Please follow the guidelines detailed below. (For more information about recommended approaches to file naming that are applicable across platforms, please refer to external file naming guidelines.)
Folders cannot be renamed after they have been created. To rename a folder, you will need to create a new folder with the desired name, move the contents from the original folder into the new one, and then delete the original folder. Please see the section on moving data for more information.
See the list of supported data formats.
Data privacy should be carefully considered when adding data in ICA, either through storage configurations (ie, AWS S3) or ICA data upload. Be aware that when adding data from cloud storage providers by creating a storage configuration, ICA will provide access to the data. Ensure the storage configuration source settings are correct and ensure uploads do not include unintended data in order to avoid unintentional privacy breaches. More guidance can be found in the security and privacy documentation.
Uploads via the UI are limited to 5 TB and no more than 100 concurrent files at a time, but for practical and performance reasons, it is recommended to use the CLI or a connector when uploading large amounts of data.
For instructions on uploading/downloading data via the CLI, see the CLI documentation.
Copying data from your own S3 storage requires additional configuration. See the storage configuration documentation.
This partial move may cause data at the destination to become unsynchronized between the object store (S3) and ICA. To resolve this, users can create a folder session on the parent folder of the destination directory via the API (create the folder session and then complete it). Ensure the Move job is already aborted before submitting the folder session create and complete requests, and wait for the session status to complete.
Note: You can create a new folder to move data to by filling in the "New folder name (optional)" field. This does NOT rename an existing folder. To rename an existing folder, please see the folder renaming guidance above.
Retrieve the file's information.
Send the updated information back to ICA.
Non-indexed folders are designed for optimal performance in situations where no file actions are needed. They serve as fast storage in situations like temporary analysis file storage where you don't need access or searches via the GUI to individual files or subfolders within the folder. Think of a non-indexed folder as a data container. You can access the container, which contains all the data, but you can not access the individual data files within the container from the GUI. As non-indexed folders contain data, they count towards your total project storage.
ICA supports running pipelines defined using Nextflow. See the tutorial for an example.
To specify a compute type for a Nextflow process, use the pod directive within each process. Set the annotation to scheduler.illumina.com/presetSize and the value to the desired compute type. A list of available compute types can be found in the compute types table. The default compute type, when this directive is not specified, is standard-small (2 CPUs and 8 GB of memory).
Often, there is a need to select the compute size for a process dynamically based on user input and other factors. The Kubernetes executor used on ICA does not use the cpu and memory directives, so instead, you can dynamically set the pod directive, as mentioned above. For example:
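A minimal sketch is shown below; the process and the boolean parameter params.large_run are hypothetical, and the preset names are the ones mentioned elsewhere on this page.

```nextflow
// Hedged sketch: pick the ICA preset size at run time from a pipeline parameter.
params.large_run = false

process SAY_HELLO {
    // The annotation value can depend on params or other dynamic values.
    pod annotation: 'scheduler.illumina.com/presetSize',
        value: params.large_run ? 'standard-large' : 'standard-small'

    output:
    stdout

    script:
    """
    echo "requested preset: ${params.large_run ? 'standard-large' : 'standard-small'}"
    """
}

workflow {
    SAY_HELLO() | view
}
```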
Additionally, it can also be specified in the Nextflow configuration file (nextflow.config). Example configuration file:
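A hedged sketch of such a nextflow.config, assuming a hypothetical process label himem marks the processes that need a larger preset:

```nextflow
// nextflow.config sketch: apply preset sizes via process selectors.
process {
    // default preset for all processes
    pod = [annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small']

    // larger preset for processes labelled 'himem' (hypothetical label)
    withLabel: 'himem' {
        pod = [annotation: 'scheduler.illumina.com/presetSize', value: 'standard-large']
    }
}
```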
Inputs are specified via the XML-based or JSON-based input form. The specified code in the XML will correspond to the field in the params object that is available in the workflow. Refer to the tutorial for an example.
Outputs for Nextflow pipelines are uploaded from the out directory in the attached shared filesystem. The publishDir directive can be used to symlink (recommended), copy or move data to the correct folder. Data will be uploaded to the ICA project after the pipeline execution completes.
Use "" instead of "copy" in the publishDir
directive. Symlinking creates a link to the original file rather than copying it, which doesn’t consume additional disk space. This can prevent the issue of silent file upload failures due to disk space limitations.
Use Nextflow 22.04.0 or later and enable the "failOnError" publishDir option. This option ensures that the workflow will fail and provide an error message if there's an issue with publishing files, rather than completing silently without all expected outputs.
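Putting both recommendations together, a publishing process might look like the following sketch; the process and file names are illustrative.

```nextflow
// Sketch: publish to the ICA "out" directory via symlinks and fail if publishing breaks.
process WRITE_REPORT {
    publishDir 'out', mode: 'symlink', failOnError: true

    output:
    path 'report.txt'

    script:
    """
    echo "analysis finished" > report.txt
    """
}

workflow {
    WRITE_REPORT()
}
```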
During execution, the Nextflow pipeline runner determines the environment settings based on values passed via the command line or via a configuration file (see the Nextflow configuration documentation). When creating a Nextflow pipeline, use the nextflow.config tab in the UI (or API) to specify a Nextflow configuration file to be used when launching the pipeline.
ICA supports running pipelines defined using the Common Workflow Language (CWL).
Refer to the compute types table for available compute types and sizes.
The ICA Compute Type will be determined automatically based on coresMin/coresMax (CPU) and ramMin/ramMax (Memory) values using a "best fit" strategy to meet the minimum specified requirements (refer to the compute types table).
ICA supports overriding workflow requirements at load time using the Command Line Interface (CLI) with JSON input. Please refer to the CWL documentation for more details on the CWL overrides feature.
Let's create a pipeline with a JSON input form.
To add filters, select the funnel/filter symbol at the top right, next to the search field.
To change which columns are displayed, select the three columns symbol and select which columns should be shown.
You can keep track of which files are externally controlled and which are ICA-managed by means of the “managed by” column.
ServerURL: see the browser address bar
projectID: at YourProject > Details > URN > urn:ilmn:ica:project:ProjectID#MyProject
FolderID: at YourProject > Data > folder > folder details > ID
AnalysisID: at YourProject > Flow > Analyses > YourAnalysis > ID
Within a project
Contributor rights
Upload and Download rights
Contributor rights
Upload and Download rights
Between different projects
Download rights
Viewer rights
Upload rights
Contributor rights
Within a project
No linked data
No partial data
No archived data
No Linked data
Between different projects
Data sharing enabled
No partial data
No archived data
Within the same region
No linked data
Within the same region
Replace
Overwrites the existing data. Folders will copy their data in an existing folder with existing files. Existing files will be replaced when a file with the same name is copied and new files will be added. The remaining files in the target folder will remain unchanged.
Don't copy
The original files are kept. If you selected a folder, files that do not yet exist in the destination folder are added to it. Files that already exist at the destination are not copied over and the originals are kept.
Keep both
Files have a number appended to them if they already exist. If you copy folders, the folders are merged, with new files added to the destination folder and original files kept. New files with the same name get copied over into the folder with a number appended.
Within a project
Contributor rights
Contributor rights
Between different projects
Download rights
Contributor rights
Upload rights
Viewer rights
Within a project
No linked data
No partial data
No archived data
No Linked data
Between different projects
Data sharing enabled
Data owned by user's tenant
No linked data
No partial data
No archived data
No externally managed projects
Within the same region
No linked data
Within same region
Creation
Yes
You can create non-indexed folders at Projects > your_project > Data > Manage > Create non-indexed folder. or with the /api/projects/{projectId}/data:createNonIndexedFolder
endpoint
Deletion
Yes
You can delete non-indexed folders by selecting them at Projects > your_project > Data > select the folder > Manage > Delete.
or with the /api/projects/{projectId}/data/{dataId}:delete
endpoint
Uploading Data
API Bench Analysis
Use non-indexed folders as normal folders for Analysis runs and bench. Different methods are available with the API such as creating temporary credentials to upload data to S3 or using /api/projects/{projectId}/data:createFileWithUploadUrl
Downloading Data
Yes
Use non-indexed folders as normal folders for Analysis runs and bench. Use temporary credentials to list and download data with the API.
Analysis Input/Output
Yes
Non-indexed files can be used as input for an analysis and the non-indexed folder can be used as output location. You will not be able to view the contents of the input and output in the analysis details screen.
Bench
Yes
Non-indexed folders can be used in Bench and the output from Bench can be written to non-indexed folders. Non-indexed folders are accessible across Bench workspaces within a project.
Viewing
No
The folder is a single object; you cannot view its contents.
Linking
No
You cannot see non-indexed folder contents.
Copying
No
Prohibited to prevent storage issues.
Moving
No
Prohibited to prevent storage issues.
Managing tags
No
You cannot see non-indexed folder contents.
Managing format
No
You cannot see non-indexed folder contents.
Use as Reference Data
No
You cannot see non-indexed folder contents.
Nextflow version
20.10.0 (deprecated *), 22.04.3, 24.10.2 (Experimental)
Executor
Kubernetes
GUI
Select the Nextflow version at Projects > your_project > Flow > Pipelines > your_pipeline > Details tab.
API
Select the Nextflow version by setting it in the optional field "pipelineLanguageVersionId".
When not set, a default Nextflow version will be used for the pipeline.
Pipelines defined using the "Code" mode require an XML or JSON-based input form to define the fields shown on the launch view in the user interface (UI).
To create a JSON-based Nextflow (or CWL) pipeline, go to Projects > your_project > Flow > Pipelines > +Create > Nextflow (or CWL) > JSON-based.
Three files, located on the inputform files tab, work together for evaluating and presenting JSON-based input.
inputForm.json contains the actual input form which is rendered when starting the pipeline run.
onRender.js is triggered when a value is changed.
onSubmit.js is triggered when starting a pipeline via the GUI or API.
Use + Create to add additional files and Simulate to test your inputForms.
Script execution supports cross-field validation of values, hiding fields, making them required, and similar behavior based on value changes.
The JSON schema allows you to define the input parameters. See the inputForm.json page for syntax details.
textbox
Corresponds to stringType in xml.
checkbox
A checkbox that supports the option of being required, so it can serve as an active consent feature. Corresponds to booleanType in xml.
radio
A radio button group to select one from a list of choices. The values to choose from must be unique.
select
A dropdown selection to select one from a list of choices. This can be used for both single-level lists and tree-based lists.
number
The value is of Number type in javascript and Double type in java. (corresponds to doubleType in xml).
integer
Corresponds to java Integer.
data
Data such as files.
section
For splitting up fields, to give structure. Rendered as subtitles. No values are to be assigned to these fields.
text
To display informational messages. No values are to be assigned to these fields.
fieldgroup
Can contain parameters or other groups. Allows repeating sets of parameters, for instance when a father|mother|child choice needs to be linked to each file input. If you want the same set of elements to appear multiple times in your form, combine them into a fieldgroup.
These attributes can be used to configure all parameter types.
label
The display label for this parameter. Optional but recommended, id will be used if missing.
minValues
The minimum number of values that must be present. Default when not set is 0. Set to >=1 to make the field required.
maxValues
The maximum number of values that may be present. Default when not set is 1.
minMaxValuesMessage
The error message displayed when minValues or maxValues is not adhered to. When not set, a default message is generated.
helpText
A helper text about the parameter. Will be displayed in smaller font with the parameter.
placeHolderText
An optional short hint (a word or short phrase) to aid the user when the field has no value.
value
The value of the parameter. Can be considered the default value.
minLength
Only applied on type="textbox". Value is a positive integer.
maxLength
Only applied on type="textbox". Value is a positive integer.
min
Minimal allowed value for 'integer' and 'number' type.
for 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000.
for 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.
max
Maximal allowed value for 'integer' and 'number' type.
for 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000.
for 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.
choices
A list of choices, each with a "value", "text" (the label), "selected" (only one true supported), and "disabled". "parent" can be used to build hierarchical choice trees. "availableWhen" can be used for conditional presence of the choice based on values of other fields. Parent and value must be unique; you cannot use the same value for both.
fields
The list of sub fields for type fieldgroup.
dataFilter
For defining the filtering when type is 'data'. nameFilter, dataFormat and dataType are additional properties.
regex
The regex pattern the value must adhere to. Only applied on type="textbox".
regexErrorMessage
The optional error message when the value does not adhere to the "regex". A default message will be used if this parameter is not present. It is highly recommended to set this as the default message will show the regex which is typically very technical.
hidden
Makes this parameter hidden. Can be made visible later in onRender.js or can be used to set hardcoded values of which the user should be aware.
disabled
Shows the parameter but makes editing it impossible. The value can still be altered by onRender.js.
emptyValuesAllowed
When maxValues is 1 or not set and emptyValuesAllowed is true, the values may contain null entries. Default is false.
updateRenderOnChange
When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false.
Streamable inputs
Adding "streamable":true
to an input field of type "data" makes it a streamable input.
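To make the attributes above concrete, here is a minimal sketch of what an inputForm.json could look like. The field ids, the top-level wrapper, and the dataFilter value are illustrative assumptions; consult the inputForm.json page for the authoritative syntax.

```json
{
  "fields": [
    {
      "id": "sample_id",
      "type": "textbox",
      "label": "Sample ID",
      "minValues": 1,
      "regex": "^[A-Za-z0-9_-]+$",
      "regexErrorMessage": "Only letters, numbers, '-' and '_' are allowed."
    },
    {
      "id": "reads",
      "type": "data",
      "label": "FASTQ files",
      "minValues": 1,
      "maxValues": 2,
      "streamable": true,
      "dataFilter": { "nameFilter": "*.fastq.gz" },
      "updateRenderOnChange": true
    },
    {
      "id": "consent",
      "type": "checkbox",
      "label": "I confirm this data may be processed",
      "minValues": 1
    }
  ]
}
```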
The onSubmit.js JavaScript function receives an input object which holds information about the chosen values of the input form, the pipeline, and the pipeline execution request parameters. This function is triggered not only when submitting a new pipeline execution request in the user interface, but also when submitting one through the REST API.
settings
The value of the setting fields. Corresponds to settingValues in the onRender.js. This is a map with field id as key and an array of field values as value. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays.
settingValues
To maximize the opportunity for reusing code between onRender and onSubmit, the 'settings' are also exposed as settingValues, like in the onRender input.
pipeline
Info about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.
analysis
Info about this run: userReference, userName, and userTenant are all available in the analysis object as string.
storageSize
The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.
storageSizeOptions
The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.
settings
The value of the setting fields. This allows modifying the values, applying defaults, or deriving values from the pipeline or analysis objects in the input. When settings are not present in the onSubmit return value object, they are assumed to be unmodified.
validationErrors
A list of AnalysisError messages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.
fieldId / FieldId
The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use the storageSize fieldId.
index / Index
The 0-starting index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.
message / Message
The error/warning message to display.
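Purely as an illustration, an onSubmit.js along these lines could validate a value and report an error. The function name, the field id 'sample_id', and anything not described above are assumptions, not the exact ICA contract.

```javascript
// Hypothetical onSubmit.js sketch; "sample_id" is an illustrative field id.
function onSubmit(input) {
    const errors = [];

    // Single-value fields arrive as the individual value, not an array of length 1.
    const sampleId = input.settings['sample_id'];
    if (typeof sampleId === 'string' && sampleId.trim().length === 0) {
        errors.push({ fieldId: 'sample_id', message: 'Sample ID must not be blank.' });
    }

    // Settings not returned are assumed to be unmodified, so only errors are reported here.
    return { validationErrors: errors };
}
```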
Receives an input object which contains information about the current state of the input form, the chosen values, and the field value change that triggered the onRender call. It also contains pipeline information. Changed objects are present in the onRender return value object; any object not present is considered unmodified. Changing the storage size in the start analysis screen triggers an onRender execution with storageSize as the changed field.
context
"Initial"/"FieldChanged"/"Edited".
Initial is the value when first displaying the form when a user opens the start run screen.
The value is FieldChanged when a field with 'updateRenderOnChange'=true is changed by the user.
Edited (not yet supported in ICA) is used when a form is displayed again later; this is intended for draft runs or when editing the form during reruns.
changedFieldId
The id of the field that changed and which triggered this onRender call. context will be FieldChanged. When the storage size is changed, the fieldId will be storageSize.
analysisSettings
The input form json as saved in the pipeline. This is the original json, without changes.
currentAnalysisSettings
The current input form json as rendered to the user. This can contain already applied changes from earlier onRender passes. Null in the first call, when context is Initial.
settingValues
The current value of all settings fields. This is a map with field id as key and an array of field values as value for multivalue fields. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays.
pipeline
Information about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.
analysis
Information about this run: userReference, userName, and userTenant are all available in the analysis object as string.
storageSize
The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.
storageSizeOptions
The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.
analysisSettings
The input form json with potentially applied changes. The discovered changes will be applied in the UI.
settingValues
The current, potentially altered map of all setting values. These will be updated in the UI.
validationErrors
A list of RenderMessages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.
validationWarnings
A list of RenderMessages representing validation warnings. A user may choose to ignore these validation warnings and start the pipeline execution request.
storageSize
The suitable value for storageSize. Must be one of the options of input.storageSizeOptions. When absent or null, it is ignored.
Validation errors and validation warnings can use 'storageSize' as fieldId to make an error appear on the storage size field. 'storageSize' is also the value of changedFieldId when the user alters the chosen storage size.
This is the object used for representing validation errors and warnings. The attributes can be used with first letter lowercase (consistent with the input object attributes) or uppercase.
fieldId / FieldId
The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use the storageSize fieldId.
index / Index
The 0-starting index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.
message / Message
The error/warning message to display.
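Purely as a sketch, an onRender.js could react to a changed field and propose a storage size. The function name, the field id 'reads', and the storage size name 'Large' are assumptions made for illustration.

```javascript
// Hypothetical onRender.js sketch; "reads" and the option name "Large" are illustrative.
function onRender(input) {
    const result = {};

    if (input.context === 'FieldChanged' && input.changedFieldId === 'reads') {
        const reads = input.settingValues['reads'];
        const warnings = [];

        if (Array.isArray(reads) && reads.length > 1) {
            warnings.push({ fieldId: 'reads', message: 'Multiple files selected; consider a larger storage size.' });

            // storageSize must be one of input.storageSizeOptions ({ id, name } objects).
            const large = input.storageSizeOptions.find(option => option.name === 'Large');
            if (large) {
                result.storageSize = large;
            }
        }
        result.validationWarnings = warnings;
    }

    // Objects not present in the return value are considered unmodified.
    return result;
}
```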
An Analysis is the execution of a pipeline.
You can start an analysis from both the dedicated analysis screen or from the actual pipeline.
Navigate to Projects > Your_Project > Flow > Analyses.
Select Start.
Select a single Pipeline.
Configure the analysis settings.
Select Start Analysis.
Refresh to see the analysis status. See lifecycle for more information on statuses.
If for some reason, you want to end the analysis before it can complete, select Projects > Your_Project > Flow > Analyses > Manage > Abort. Refresh to see the status update.
Navigate to Projects > <Your_Project> > Flow > Pipelines
Select the pipeline you want to run or open the pipeline details of the pipeline which you want to run.
Select Start Analysis.
Configure analysis settings.
Select Start Analysis.
View the analysis status on the Analyses page. See lifecycle for more information on statuses.
If for some reason, you want to end the analysis before it can complete, select Manage > Abort on the Analyses page.
You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).
Once an analysis has been executed, you can rerun it with the same settings or choose to modify the parameters when rerunning. Modifying the parameters is possible on a per-analysis basis. When selecting multiple analyses at once, they will be executed with the original parameters. Draft pipelines are subject to updates and thus can result in a different outcome when rerunning. ICA will display a warning message to inform you of this when you try to rerun an analysis based on a draft pipeline.
When rerunning an analysis, the user reference will be the original user reference (up to 231 characters), followed by _rerun_yyyy-MM-dd_HHmmss.
When there is an XML configuration change on a pipeline for which you want to rerun an analysis, ICA will display a warning and not fill out the parameters, as it cannot guarantee their validity for the new XML.
Some restrictions apply when trying to rerun an analysis.
Analyses using external data
Allowed
-
Analyses using mount paths on input data
Allowed
-
Analyses using user-provided input json
Allowed
-
Analyses using advanced output mappings
-
-
Analyses with draft pipeline
Warn
Warn
Analyses with XML configuration change
Warn
Warn
To rerun one or more analyses with the same settings:
Navigate to Projects > Your_Project > Flow > Analyses.
In the overview screen, select one or more analyses.
Select Manage > Rerun. The analyses will now be executed with the same parameters as their original run.
To rerun a single analysis with modified parameters:
Navigate to Projects > Your_Project > Flow > Analyses.
In the overview screen, open the details of the analysis you want to rerun by clicking on the analysis user reference.
Select Rerun (at the top right).
Update the parameters you want to change.
Select Start Analysis. The analysis will now be executed with the updated parameters.
Requested
The request to start the Analysis is being processed
No
Queued
Analysis has been queued
No
Initializing
Initializing environment and performing validations for Analysis
No
Preparing Inputs
Downloading inputs for Analysis
No
In Progress
Analysis execution is in progress
No
Generating outputs
Transferring the Analysis results
No
Aborting
Analysis has been requested to be aborted
No
Aborted
Analysis has been aborted
Yes
Failed
Analysis has finished with error
Yes
Succeeded
Analysis has finished with success
Yes
When an analysis is started, the availability of resources may impact the start time of the pipeline or specific steps after execution has started. Analyses are subject to delay when the system is under high load and the availability of resources is limited.
During analysis start, ICA runs a verification on the input files to see if they are available. When it encounters files that have not completed their upload or transfer, it will report "Data found for parameter [parameter_name], but status is Partial instead of Available". Wait for the file to be available and restart the analysis.
During the execution of an analysis, logs are produced for each process involved in the analysis lifecycle. In the analysis details view, the Steps tab is used to view the steps in near real time as they are produced by the running processes. A grid layout is used for analyses with more than 50 steps and a tiled view for analyses with 50 steps or less, though you can also use the grid layout for those by means of the tile/grid button at the top right of the analysis log tab.
There are system processes involved in the lifecycle of all analyses (i.e., downloading inputs, uploading outputs, etc.) and there are processes which are pipeline-specific, such as processes which execute the pipeline steps. The table below describes the system processes. You can choose to display or hide these system processes with the Show technical steps option.
Setup Environment
Validate analysis execution environment is prepared
Run Monitor
Monitor resource usage for billing and reporting
Prepare Input Data
Download and mount input data to the shared file system
Pipeline Runner
Parent process to execute the pipeline definition
Finalize Output Data
Upload Output Data
Additional log entries will show for the processes which execute the steps defined in the pipeline.
Each process shows as a distinct entry in the steps view with a Queue Date, Start Date, and End Date.
Queue Date
The time when the process is submitted to the process scheduler for execution
Start Date
The time when the process has started execution
End Date
The time when the process has stopped execution
The time between the Start Date and the End Date is used to calculate the duration. The time of the duration is used to calculate the usage-based cost for the analysis. Because this is an active calculation, sorting on this field is not supported.
Each log entry in the Steps view contains a checkbox to view the stdout and stderr log files for the process. Clicking a checkbox adds the log as a tab to the log viewer where the log text is displayed and made available for download.
To see the price of an analysis in iCredits, look at Projects > your_project > Flow > Analyses > your_analysis > Details tab. The pricing section will show you the entitlement bundle, storage detail and price in iCredits.
In the analysis output folder, the ica_logs subfolder will contain the stdout and stderr files.
If you delete these files, no log information will be available on the analysis details > Steps tab.
Logs can also be streamed using websocket client tooling. The API to retrieve analysis step details returns websocket URLs for each step to stream the logs from stdout/stderr during the step's execution. Upon completion, the websocket URL is no longer available.
Currently, this feature is only available when launching analyses via the API.
Currently, only FOLDER type output mappings are supported
By default, analysis outputs are directed to a new folder within the project where the analysis is launched. Analysis output mappings may be specified to redirect outputs to user-specified locations consisting of project and path. An output mapping consists of:
the source path on the local disk of the analysis execution environment, relative to the working directory
the data type, either FILE or FOLDER
the target project ID to direct outputs to; analysis launcher must have contributor access to the project
the target path relative to the root of the project data to write the outputs
If the output directory already exists, any existing contents with the same filenames as those output from the pipeline will be overwritten by the new analysis
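Put together, a single output mapping could conceptually look like the sketch below. The JSON field names are illustrative only and do not represent the exact API schema.

```json
{
  "sourcePath": "out/results",
  "type": "FOLDER",
  "targetProjectId": "<target-project-id>",
  "targetPath": "/analyses/my-run/results/"
}
```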
You can jump from the Analysis Details output section to the individual files and folders by opening the detail view (projects > your_project > Flow > Analyses > your_analysis > Details tab > Output files section > your_output_file) and selecting open in data.
You can add and remove tags from your analyses.
Navigate to Projects > Your_Project > Flow > Analyses.
Select the analyses whose tags you want to change.
Select Manage > Manage tags.
Edit the user tags, reference data tags (if applicable) and technical tags.
Select Save to confirm the changes.
Both system tags and custom tags exist. User tags are custom tags which you set to help identify and process information, while technical tags are set by the system for processing. Run-in and run-out tags are set on data to identify which analyses use the data. Connector tags determine data entry methods, and reference data tags identify where data is used as reference data.
If you want to share a link to an analysis, you can copy and paste the URL from your browser when you have the analysis open. The syntax of the analysis link will be <hostURL>/ica/link/project/<projectUUID>/analysis/<analysisUUID>. Likewise, workflow sessions will use the syntax <hostURL>/ica/link/project/<projectUUID>/workflowSession/<workflowsessionUUID>. To prevent third parties from accessing data via the link when it is shared or forwarded, ICA will verify the access rights of every user when they open the link.
Input for analysis is limited to a total of 50,000 files (including multiple copies of the same file). You can have up to 50 concurrent analyses running per tenant. Additional analyses will be queued and scheduled when currently running analyses complete and free up positions.
When your analysis fails, open the analysis details view (Projects > your_project > Flow > Analyses > your_analysis) and select display failed steps. This gives you the steps view filtered on those steps that had non-zero exit codes. If there is only one failed step which has log files, the stderr of that step will be displayed.
Exit code 55 indicates analysis failure due to an external event such as spot termination or node draining. Retry the analysis.
Data Catalogues provide views on data from Illumina hardware and processes (Instruments, Cloud software, Informatics software and Assays) so that this data can be distributed to different applications. This data consists of read-only tables to prevent updates by the applications accessing it. Access to data catalogues is included with professional and enterprise subscriptions.
Project-level views
ICA_PIPELINE_ANALYSES_VIEW (Lists project-specific ICA pipeline analysis data)
ICA_DRAGEN_QC_METRIC_ANALYSES_VIEW (project-specific quality control metrics)
Tenant-level views
ICA_PIPELINE_ANALYSES_VIEW (Lists ICA pipeline analysis data)
CLARITY_SEQUENCINGRUN_VIEW_tenant (sequencing run data coming from the lab workflow software)
CLARITY_SAMPLE_VIEW_tenant (sample data coming from the lab workflow software)
CLARITY_LIBRARY_VIEW_tenant (library data coming from the lab workflow software)
CLARITY_EVENT_VIEW_tenant (event data coming from the lab workflow software)
ICA_DRAGEN_QC_METRIC_ANALYSES_VIEW (quality control metrics)
DRAGEN metrics will only have content when DRAGEN pipelines have been executed.
Analysis views will only have content when analyses have been executed.
Views containing Clarity data will only have content if you have a Clarity LIMS instance with minimum version 6.0 and the Product Analytics service installed and configured. Please see the Clarity LIMS documentation for more information.
Members of a project who have both Base contributor and project contributor or administrator rights, and who belong to the same tenant as the project, can add views from a Catalogue. Members with the same rights who do not belong to the same tenant can only remove catalogue views from a project. Therefore, if you are invited to collaborate on a project but belong to a different tenant, you can remove catalogue views, but you cannot add them again.
To add Catalogue data,
Go to Projects > your_project > Base > Tables.
Select Add table > Import from Catalogue.
A list of available views will be displayed. (Note that views which are already part of your project are not listed)
Select the table you want to add and choose +Select.
Catalogue data will have View as type, the same as tables which are linked from other projects.
To delete Catalogue data,
Go to Projects > your_project > Base > Tables.
Select the table you want to delete and choose Delete.
A warning will be presented to confirm your choice. Once deleted, you can add the Catalogue data again if needed.
View: The name of the Catalogue table.
Description: An explanation of which data is contained in the view.
Category: The identification of the source system which provided the data.
Tenant/project. Appended to the view name as _tenant or _project. Determines if the data is visible for all projects within the same tenant or only within the project. Only the tenant administrator can see the non-project views.
In the Projects > your_project > Base > Tables view, double-click the Catalogue table to see the details. For an overview of the available actions and details, see Tables.
In this section, we provide examples of querying selected views from the Base UI, starting with ICA_PIPELINE_ANALYSES_VIEW (project view). This table includes the following columns: TENANT_UUID, TENANT_ID, TENANT_NAME, PROJECT_UUID, PROJECT_ID, PROJECT_NAME, USER_UUID, USER_NAME, and PIPELINE_ANALYSIS_DATA. While the first eight columns contain straightforward data types (each holding a single value), the PIPELINE_ANALYSIS_DATA column is of type VARIANT, which can store multiple values in a nested structure. In SQL queries, this column returns data as a JSON object. To filter specific entries within this complex data structure, a combination of JSON functions and conditional logic in SQL queries is essential.
Since Snowflake offers robust JSON processing capabilities, the FLATTEN function can be utilized to expand JSON arrays within the PIPELINE_ANALYSIS_DATA column, allowing for the filtering of entries based on specific criteria. It's important to note that each entry in the JSON array becomes a separate row once flattened. Snowflake aligns fields outside of this FLATTEN operation accordingly, i.e. the record USER_ID in the SQL query below is "recycled".
The following query extracts
USER_NAME directly from the ICA_PIPELINE_ANALYSES_VIEW_project table.
PIPELINE_ANALYSIS_DATA:reference and PIPELINE_ANALYSIS_DATA:price. These are direct accesses into the JSON object stored in the PIPELINE_ANALYSIS_DATA column. They extract specific values from the JSON object.
Entries from the array 'steps' in the JSON object. The query uses LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) to expand the steps array within the PIPELINE_ANALYSIS_DATA JSON object into individual rows. For each of these rows, it selects various elements (like bpeResourceLifeCycle, bpeResourcePresetSize, etc.) from the JSON.
Furthermore, the query filters the rows based on the status being 'FAILED' and the stepId not containing the word 'Workflow': it allows the user to find steps which failed.
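A sketch of such a query is shown below. The JSON paths used inside PIPELINE_ANALYSIS_DATA beyond those named above, and the casts, are assumptions made for illustration.

```sql
-- Illustrative sketch: list failed, non-workflow steps per analysis.
SELECT
    USER_NAME,
    PIPELINE_ANALYSIS_DATA:reference::string  AS analysis_reference,
    PIPELINE_ANALYSIS_DATA:price              AS price,
    step.value:stepId::string                 AS step_id,
    step.value:bpeResourcePresetSize::string  AS preset_size,
    step.value:bpeResourceLifeCycle::string   AS resource_lifecycle
FROM ICA_PIPELINE_ANALYSES_VIEW_project,
     LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) AS step
WHERE step.value:status::string = 'FAILED'
  AND NOT CONTAINS(step.value:stepId::string, 'Workflow');
```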
Now let's have a look at the DRAGEN_METRICS_VIEW_project view. Each DRAGEN pipeline on ICA creates multiple metrics files, e.g. SAMPLE.mapping_metrics.csv, SAMPLE.wgs_coverage_metrics.csv, etc. for the DRAGEN WGS Germline pipeline. Each of these files is represented by a row in the DRAGEN_METRICS_VIEW_project table with columns ANALYSIS_ID, ANALYSIS_UUID, PIPELINE_ID, PIPELINE_UUID, PIPELINE_NAME, TENANT_ID, TENANT_UUID, TENANT_NAME, PROJECT_ID, PROJECT_UUID, PROJECT_NAME, FOLDER, FILE_NAME, METADATA, and ANALYSIS_DATA. The ANALYSIS_DATA column contains the content of the file FILE_NAME as an array of JSON objects. Similarly to the previous query, we will use the FLATTEN command. The following query extracts
Sample name from the file names.
Two metrics 'Aligned bases in genome' and 'Aligned bases' for each sample and the corresponding values.
The query looks for files SAMPLE.wgs_coverage_metrics.csv only and sorts based on the sample name:
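A sketch of this query, assuming each JSON object in ANALYSIS_DATA exposes a metric "name" and "value" key (these key names are assumptions):

```sql
-- Illustrative sketch: extract two coverage metrics per sample.
SELECT
    REPLACE(FILE_NAME, '.wgs_coverage_metrics.csv', '') AS sample_name,
    metric.value:name::string                           AS metric_name,
    metric.value:value                                  AS metric_value
FROM DRAGEN_METRICS_VIEW_project,
     LATERAL FLATTEN(input => ANALYSIS_DATA) AS metric
WHERE FILE_NAME LIKE '%.wgs_coverage_metrics.csv'
  AND metric.value:name::string IN ('Aligned bases', 'Aligned bases in genome')
ORDER BY sample_name;
```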
Lastly, you can combine these views (or rather intermediate results derived from these views) using the WITH and JOIN commands. The SQL snippet below demonstrates how to join two intermediate results referred to as 'flattened_dragen_scrna' and 'pipeline_table'. The query:
Selects two metrics ('Invalid barcode read' and 'Passing cells') associated with single-cell RNA analysis from records where the FILE_NAME ends with 'scRNA.metrics.csv', and then stores these metrics in a temporary table named 'flattened_dragen_scrna'.
Retrieves metadata related to all scRNA analyses by filtering on the pipeline ID from the 'ICA_PIPELINE_ANALYSES_VIEW_project' view and stores this information in another temporary table named 'pipeline_table'.
Joins the two temporary tables using the JOIN operator, specifying the join condition with the ON operator.
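The shape of such a query could resemble the sketch below. The metric key names, the analysis id and pipeline id paths inside PIPELINE_ANALYSIS_DATA, and the join key are all assumptions made for illustration.

```sql
-- Illustrative sketch: join scRNA metrics with pipeline analysis metadata.
WITH flattened_dragen_scrna AS (
    SELECT
        ANALYSIS_UUID,
        metric.value:name::string AS metric_name,
        metric.value:value        AS metric_value
    FROM DRAGEN_METRICS_VIEW_project,
         LATERAL FLATTEN(input => ANALYSIS_DATA) AS metric
    WHERE FILE_NAME LIKE '%scRNA.metrics.csv'
      AND metric.value:name::string IN ('Invalid barcode read', 'Passing cells')
),
pipeline_table AS (
    SELECT
        USER_NAME,
        PIPELINE_ANALYSIS_DATA:id::string        AS analysis_uuid,
        PIPELINE_ANALYSIS_DATA:reference::string AS analysis_reference
    FROM ICA_PIPELINE_ANALYSES_VIEW_project
    WHERE PIPELINE_ANALYSIS_DATA:pipelineId::string = '<your_scrna_pipeline_id>'
)
SELECT p.USER_NAME, p.analysis_reference, f.metric_name, f.metric_value
FROM flattened_dragen_scrna f
JOIN pipeline_table p ON f.ANALYSIS_UUID = p.analysis_uuid;
```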
You can use ICA_PIPELINE_ANALYSES_VIEW to obtain the costs of individual steps of an analysis. Using the following SQL snippet you can retrieve the costs of individual steps for every analysis run in the past week.
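A sketch of such a snippet, assuming each step entry carries "price" and "startDate" keys (both key names are assumptions):

```sql
-- Illustrative sketch: per-step cost for analyses started in the last 7 days.
SELECT
    USER_NAME,
    PIPELINE_ANALYSIS_DATA:reference::string AS analysis_reference,
    step.value:stepId::string                AS step_id,
    step.value:price                         AS step_price
FROM ICA_PIPELINE_ANALYSES_VIEW_project,
     LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) AS step
WHERE TO_TIMESTAMP(step.value:startDate::string) >= DATEADD(day, -7, CURRENT_TIMESTAMP())
ORDER BY analysis_reference, step_id;
```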
Data Catalogue views cannot be shared as part of a Bundle.
Data size is not shown for views because views are a subset of data.
By removing Base from a project, the Data Catalogue will also be removed from that project.
As tenant-level Catalogue views can contain sensitive data, it is best to save this (filtered) data to a new table and share that table instead of sharing the entire view as part of a project. To do so, add your view to a separate project and run a query on the data at Projects > your_project > Base > Query > New Query. When the query completes, you can export the result as a new table. This ensures no new data will be added on subsequent runs.
Developing on the cloud incurs inherent runtime costs due to compute and storage used to execute workflows. Here are a few tips that can facilitate development.
Leverage the cross-platform nature of these workflow languages. Both CWL and Nextflow can be run locally in addition to on ICA. When possible, testing should be performed locally before attempting to run in the cloud. For Nextflow, configuration files can be utilized to specify settings to be used either locally or on ICA. An example of advanced usage of a config is applying the scratch directive to a set of process names (or labels) so that they use the higher-performance local scratch storage attached to an instance instead of the shared network disk, as sketched below.
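A minimal nextflow.config sketch of this idea; the label name is an assumption and should be adapted to your pipeline.

```groovy
// Illustrative only: route processes carrying a given label to local scratch storage.
process {
    withLabel: 'high_io' {
        scratch = true   // use node-local scratch instead of the shared work directory
    }
}
```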
When trying to test on the cloud, it's oftentimes beneficial to create scripts to automate the deployment and launching / monitoring process. This can be performed either using the ICA CLI or by creating your own scripts integrating with the REST API.
For scenarios in which instances are terminated prematurely (for example, while using spot instances) without warning, you can implement scripts like the following to retry the job a certain number of times. Adding the following script to 'nextflow.config' enables five retries for each job, with increasing delays between each try.
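A minimal sketch of such a retry block, using an exponential backoff implemented with a sleep inside the errorStrategy closure (delay values are illustrative):

```groovy
// Illustrative only: retry each failed task up to 5 times with an increasing delay.
process {
    errorStrategy = { sleep(Math.pow(2, task.attempt) * 30000 as long); return 'retry' }
    maxRetries    = 5
}
```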
Note: Adding the retry script where it is not needed might introduce additional delays.
When hardening a Nextflow pipeline to handle resource shortages (for example exit code 2147483647), an immediate retry will in most circumstances fail because the resources have not yet been made available. It is best practice to use a dynamic retry with backoff, which has an increasing delay between attempts, allowing the system time to provide the necessary resources.
When publishing your Nextflow pipeline, make sure you have defined a container such as 'public.ecr.aws/lts/ubuntu:22.04' and are not using the default container 'ubuntu:latest'.
To limit potential costs, there is a timeout of 96 hours: if the analysis does not complete within four days, it will go to a 'Failed' state. This time begins to count as soon as the input data is being downloaded. This takes place during the ICA 'Requested' step of the analysis, before going to 'In Progress'. In case parallel tasks are executed, running time is counted once. As an example, let's assume the initial period before being picked up for execution is 10 minutes and consists of the request, queueing and initializing. Then, the data download takes 20 minutes. Next, a task runs on a single node for 25 minutes, followed by 10 minutes of queue time. Finally, three tasks execute simultaneously, each of them taking 25, 28, and 30 minutes, respectively. Upon completion, this is followed by uploading the outputs for one minute. The overall analysis time is then 20 + 25 + 10 + 30 (as the longest task out of three) + 1 = 86 minutes:
Analysis task
request
queued
initializing
input download
single task
queue
parallel tasks
generating outputs
completed
96 hour limit
1m (not counted)
7m (not counted)
2m (not counted)
20m
25m
10m
30m
1m
-
Status in ICA
status requested
status queued
status initializing
status preparing inputs
status in progress
status in progress
status in progress
status generating outputs
status succeeded
If there are no available resources or your project priority is low, the time before download commences will be substantially longer.
By default, Nextflow will not generate the trace report. If you want to enable generating the report, add the section below to your userNextflow.config file.
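A minimal sketch of such a section, using Nextflow's standard trace scope (the file name and field list are illustrative):

```groovy
// Illustrative only: enable the Nextflow trace report.
trace {
    enabled = true
    file    = 'trace.txt'
    fields  = 'task_id,name,status,exit,realtime,%cpu,rss'
}
```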
All tables created within Base are gathered on the Projects > your_project > Base > Tables page. New tables can be created and existing tables can be updated or deleted here.
To create a new table, click Add table > New table on the Tables page. Tables can be created from scratch or from a template that was previously saved.
If you make a mistake in the order of columns when creating your table, then as long as you have not saved your table, you can switch to Edit as text to change the column order. The text editor can swap or move columns whereas the built-in editor can only delete columns or add columns to the end of the sequence. When editing in text mode, it is best practice to copy the content of the text editor to a notepad before you make changes because a corrupted syntax will result in the text being wiped or reverted when switching between text and non-text mode.
Once a table is saved it is no longer possible to edit the schema, only new fields can be added. The workaround is switching to text mode, copying the schema of the table to which you want to make modifications and paste it into a new empty table where the necessary changes can be made before saving.
Once created, do not try to modify your table column layout via the Query module as even though you can execute ALTER TABLE commands, the definitions and syntax of the table will go out of sync resulting in processing issues.
Be careful when naming tables when you want to use them in bundles. Table names have to be unique per bundle, so no two tables with the same name can be part of the same bundle.
To create a table from scratch, complete the fields listed below and click the Save button. Once saved, a job will be created to create the table. To view table creation progress, navigate to the Activity page.
The table name is a required field and must be unique. The first character of the table must be a letter followed by letters, numbers or underscores. The description is optional.
Including or excluding references can be done by checking or un-checking the Include reference checkbox. These reference fields are not shown on the table creation page, but are added to the schema definition, which is visible after creating the table (Projects > your_project > Base > Tables > your_table > Schema definition). By including references, additional columns will be added to the schema (see next paragraph) which can contain references to the data on the platform:
data_reference: reference to the data element in the Illumina platform from which the record originates
data_name: original name of the data element in the Illumina platform from which the record originates
sample_reference: reference to the sample in the Illumina platform from which the record originates
sample_name: name of the sample in the Illumina platform from which the record originates
pipeline_reference: reference to the pipeline in the Illumina platform from which the record originates
pipeline_name: name of the pipeline in the Illumina platform from which the record originates
execution_reference: reference to the pipeline execution in the Illumina platform from which the record originates
account_reference: reference to the account in the Illumina platform from which the record originates
account_name: name of the account in the Illumina platform from which the record originates
In an empty table, you can create a schema by adding a field for each column of the table and defining it. The + Add field button is located to the right of the schema. At any time during the creation process, it is possible to switch to the edit as text mode and back. The text mode shows the JSON code, whereas the original view shows the fields in a table.
Each field requires:
a name – this has to be unique (*1)
a type
String – collection of characters
Bytes – raw binary data
Integer – whole numbers
Float – fractional numbers (*2)
Numeric – any number (*3)
Boolean – only options are “true” or “false”
Timestamp - Stores number of (milli)seconds passed since the Unix epoch
Date - Stores date in the format YYYY-MM-DD
Time - Stores time in the format HH:MI:SS
Datetime - Stores date and time information in the format YYYY-MM-DD HH:MI:SS
Record – has a child field
Variant - can store a value of any other type, including OBJECT and ARRAY
a mode
Required - Mandatory field
Nullable - Field is allowed to have no value
Repeated - Multiple values are allowed in this field (will be recognized as array in Snowflake)
(*1) Do not use reserved Snowflake keywords such as left, right, sample, select, table,... (https://docs.snowflake.com/en/sql-reference/reserved-keywords) for your schema name as this will lead to SQL compilation errors.
(*2) Float values will be exported differently depending on the output format. For example JSON will use scientific notation so verify that your consecutive processing methods support this.
(*3) Defining the precision when creating tables with SQL is not supported as this will result in rounding issues.
Users can create their own template by making a table which is turned into a template at Projects > your_project > Base > Tables > your_table > Save as template.
If a template is created and available/active, it is possible to create a new table based on this template. The table information and references follow the rules of the empty table but in this case the schema will be pre-filled. It is possible to still edit the schema that is based on the template.
The status of a table can be found at Projects > your_project > Base > Tables. The possible statuses are:
Available: Ready to be used, both with or without data
Pending: The system is still processing the table, there is probably a process running to fill the table with data
Deleted: The table is deleted functionally; it still exists and can be shown in the list again by clicking the “Show deleted tables” button
Additional Considerations
Tables created empty or from a template become available the fastest.
When copying a table with data, it can remain in a Pending state for a longer period of time.
Clicking on the page's refresh button will update the list.
For any available table, the following details are shown:
Table information: Name, description, number of records and data size
Schema definition: An overview of the table schema, also available in text. Fields can be added to the schema but not deleted. For deleting fields: copy the schema as text and paste in a new empty table where the schema is still editable.
Preview: A preview of the table for the 50 first rows (when data is uploaded into the table)
Source Data: the files that are currently uploaded into the table. You can see the Load Status of the files which can be Prepare Started, Prepare Succeeded or Prepare Failed and finally Load Succeeded or Load Failed. You can change the order and visible columns by hovering over the column headers and clicking on the cog symbol.
From within the details of a table it is possible to perform the following actions related to the table:
Copy: Create a copy from this table in the same or a different project. In order to copy to another project, data sharing of the original project should be enabled in the details of this project. The user also has to have access to both original and target project.
Export as file: Export this table as a CSV or JSON file. The exported file can be found in a project where the user has the access to download it.
Save as template: Save the schema or an edited form of it as a template.
Add data: Load additional data into the table manually. This can be done by selecting data files previously uploaded to the project, or by dragging and dropping files directly into the popup window for adding data to the table. It’s also possible to load data into a table manually or automatically via a pre-configured job. This can be done on the Schedule page.
Delete: Delete the table.
To manually add data to your table, go to Projects > your_project > Base > Tables > your_table > +Add Data
The data selection screen will show options to define the structure and location of your source data:
Write preference: Define if data can be written to the table only when the table is empty, if the data should be appended to the table, or if the table should be overwritten.
Data format (required): Select the format of the data which you want to import: CSV (comma-separated), TSV (tab-separated) or JSON (JavaScript Object Notation).
Delimiter: Which delimiter is used in the delimiter separated file. If the required delimiter is not comma, tab or pipe, select custom and define the custom delimiter.
Custom delimiter: If a custom delimiter is used in the source data, it must be defined here.
Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.
References: Choose which references must be added to the table.
Most of the advanced options are legacy functions and should not be used. The only exceptions are
Encoding: Select if the encoding is UTF-8 (any Unicode character) or ISO-8859-1 (first 256 Unicode characters).
Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser can not detect the missing separator and will shift fields to the left, resulting in errors.
If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.
If no headers are used: The fields are loaded in order of occurrence and trailing missing fields are loaded with NULL, trailing additional fields are discarded.
At the bottom of the select data screen, you can select the data you manually want to upload. You can select local files, drop files via the browser or choose files from your project.
To see the status of your data import, go to Projects > your_project > Activity > Base Jobs where you will see a job of type Prepare Data which will have succeeded or failed. If it has failed, you can see the error message and details by double-clicking the base job. You can then take corrective actions if the input mismatched with the table design and try to run the import again (with a new copy of the file as each input file can only be used once)
If you need to cancel the import, you can do so while it is scheduled by navigating to the Base Jobs inventory and selecting the job followed by Abort.
To see which data has been used to populate your table, go to Projects > your_project > Base > Tables > your_table > Source Data. This lists all the source data files, even those that failed to be imported. You cannot use these files again for another import; this prevents double entries.
Base Table schema definitions do not include an array type, but arrays can be ingested using either the Repeated mode for arrays containing a single type (i.e., String), or the Variant type.
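As a sketch only, a schema field using the Repeated mode could look like the snippet below when viewed in the table's text editor; the field name and the exact JSON casing are assumptions.

```json
{ "name": "gene_symbols", "type": "String", "mode": "Repeated" }
```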
Every Base user has one Snowflake username: ICA_U_<id>
For each user/project-bundle combination a role is created: ICA_UR_<id>_<name project/bundle>__<id>
This role receives the viewer or contributor role of the project/bundle, depending on the user's permissions in ICA.
Every project or bundle has a dedicated Snowflake database.
For each database, 2 roles are created:
<project/bundle name>_<id>_VIEWER
<project/bundle name>_<id>_CONTRIBUTOR
The VIEWER role receives:
REFERENCE and SELECT rights on the tables/views within the project's PUBLIC schema.
Grants on the viewer roles of the bundles linked to the project.
The CONTRIBUTOR role receives the following rights on current and future objects in the project's or bundle's database in the PUBLIC schema:
ownership
select, insert, update, delete, truncate and references on tables/views/materialized views
usage on sequences/functions/procedures/file formats
write, read and usage on stages
select on streams
monitor and operate on tasks
It also receives grant on the viewer role of the project.
For each project (not bundle!) two warehouses are created, whose size can be changed in ICA at Projects > your_project > Project Settings > Details.
<projectname>_<id>_QUERY
<projectname>_<id>_LOAD
A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.
Pipelines are created and stored within projects.
Navigate to Projects > your_project > Flow > Pipelines > +Create.
Select CWL Graphical, CWL code (XML / JSON) or Nextflow (XML / JSON) to create a new Pipeline.
Configure pipeline settings in the pipeline property tabs.
When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.
Select Save.
Pipelines use the tool definition that was current when the pipeline was last saved. Tool changes do not automatically propagate to the pipeline. To update the pipeline with the latest tool changes, edit the pipeline definition by removing the tool and re-adding it to the pipeline.
Individual Pipeline files are limited to 20 Megabytes. If you need to add more than this, split your content over multiple files.
You can edit pipelines while they are in Draft or Release Candidate status. Once released, pipelines can no longer be edited.
The following sections describe the tool properties that can be configured in each tab of the pipeline editor.
Depending on how you design the pipeline, the displayed tabs differ between the graphical and code definitions. For CWL you have a choice on how to define the pipeline, Nextflow is always defined in code mode.
Any additional source files related to your pipeline will be displayed here in alphabetical order.
See the following pages for language-specific details for defining pipelines:
The details tab provides options for configuring basic information about the pipeline.
Code
The name of the pipeline.
Nextflow Version
User selectable Nextflow version available only for Nextflow pipelines
Categories
One or more tags to categorize the pipeline. Select from existing tags or type a new tag name in the field.
Description
A short description of the pipeline.
Proprietary
Hide the pipeline scripts and details from users who do not belong to the tenant who owns the pipeline. This also prevents cloning the pipeline.
Status
The release status of the pipeline.
Storage size
User selectable storage size for running the pipeline. This must be large enough to run the pipeline, but setting it too large incurs unnecessary costs.
Family
A group of pipeline versions. To specify a family, select Change, and then select a pipeline or pipeline family. To change the order of the pipeline, select Up or Down. The first pipeline listed is the default and the remainder of the pipelines are listed as Other versions. The current pipeline appears in the list as this pipeline.
Version comment
A description of changes in the updated version.
Links
External reference links. (max 100 chars as name and 2048 chars as link)
The following information becomes visible when viewing the pipeline details.
ID
Unique Identifier of the pipeline.
URN
Identification of the pipeline in Uniform Resource Name (URN) format.
The clone action is shown at the top-right of the pipeline details. Cloning a pipeline allows you to make modifications without impacting the original pipeline. When cloning a pipeline, you become the owner of the cloned pipeline.
When you clone a Nextflow pipeline, a verification of the configured Nextflow version is done to ensure no deprecated versions are used.
The Documentation tab provides options for configuring the HTML description for the tool. The description appears in the tool repository but is excluded from exported CWL definitions. If no documentation has been provided, this tab will be empty.
When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.
Machine profiles
Compute types available to use with Tools in the pipeline.
Shared settings
Settings for pipelines used in more than one tool.
Reference files
Descriptions of reference files used in the pipeline.
Input files
Descriptions of input files used in the pipeline.
Output files
Descriptions of output files used in the pipeline.
Tool
Details about the tool selected in the visualization panel.
Tool repository
A list of tools available to be used in the pipeline.
In graphical mode, you can drag and drop inputs into the visualization panel to connect them to the tools. Make sure to connect the input icons to the tool before editing the input details in the component menu. Required tool inputs are indicated by a yellow connector.
This page is used to specify all relevant information about the pipeline parameters.
The Analysis Report tab provides options for configuring pipeline execution reports. The report is composed of widgets added to the tab.
The pipeline analysis report appears in the pipeline execution results. The report is configured from widgets added to the Analysis Report tab in the pipeline editor.
[Optional] Import widgets from another pipeline.
Select Import from other pipeline.
Select the pipeline that contains the report you want to copy.
Select an import option: Replace current report or Append to current report.
Select Import.
From the Analysis Report tab, select Add widget, and then select a widget type.
Configure widget details.
Title
Add and format title text.
Analysis details
Add heading text and select the analysis metadata details to display.
Free text
Add formatted free text. The widget includes options for placeholder variables that display the corresponding project values.
Inline viewer
Add options to view the content of an analysis output file.
Analysis comments
Add comments that can be edited after an analysis has been performed.
Input details
Add heading text and select the input details to display. The widget includes an option to group details by input name.
Project details
Add heading text and select the project details to display.
Page break
Add a page break widget where page breaks should appear between report sections.
Select Save.
[[BB_PROJECT_NAME]]
The project name.
[[BB_PROJECT_OWNER]]
The project owner.
[[BB_PROJECT_DESCRIPTION]]
The project short description.
[[BB_PROJECT_INFORMATION]]
The project information.
[[BB_PROJECT_LOCATION]]
The project location.
[[BB_PROJECT_BILLING_MODE]]
The project billing mode.
[[BB_PROJECT_DATA_SHARING]]
The project data sharing settings.
[[BB_REFERENCE]]
The analysis reference.
[[BB_USERREFERENCE]]
The user analysis reference.
[[BB_PIPELINE]]
The name of the pipeline.
[[BB_USER_OPTIONS]]
The analysis user options.
[[BB_TECH_OPTIONS]]
The analysis technical options. Technical options include the TECH suffix and are not visible to end users.
[[BB_ALL_OPTIONS]]
All analysis options. Technical options include the TECH suffix and are not visible to end users.
[[BB_SAMPLE]]
The sample.
[[BB_REQUEST_DATE]]
The analysis request date.
[[BB_START_DATE]]
The analysis start date.
[[BB_DURATION]]
The analysis duration.
[[BB_REQUESTOR]]
The user requesting analysis execution.
[[BB_RUNSTATUS]]
The status of the analysis.
[[BB_ENTITLEMENTDETAIL]]
The used entitlement detail.
[[BB_METADATA:path]]
The value or list of values of a metadata field or multi-value fields.
See Metadata Models
The Nextflow project main script.
The Nextflow configuration settings.
The Common Workflow Language main script.
Multiple files can be added to make pipelines more modular and manageable.
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:
DIFF (.diff)
GROOVY (.groovy .nf)
JAVASCRIPT (.js .javascript)
JSON (.json)
SH (.sh)
SQL (.sql)
TXT (.txt)
XML (.xml)
YAML (.yaml .cwl)
For each process defined by the workflow, ICA will launch a compute node to execute the process.
For each compute type, the standard (default - AWS on-demand) or economy (AWS spot instance) tier can be selected.
When selecting an fpga instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.
When no type is specified, the default type of compute node is standard-small.
By default, compute nodes have no scratch space. This is an advanced setting and should only be used when absolutely necessary as it will incur additional costs and may offer only limited performance benefits because it is not local to the compute node.
For simplicity and better integration, consider using the shared storage available at /ces. It is what is provided in the Small/Medium/Large+ compute types. This shared storage is used when writing files with relative paths.
Daemon sets and system processes consume approximately 1 CPU and 2 GB Memory from the base values shown in the table. Consumption will vary based on the activity of the pod.
Compute Type
CPUs
Mem (GB)
Nextflow (pod.value)
CWL (type, size)
standard-small
2
8
standard-small
standard, small
standard-medium
4
16
standard-medium
standard, medium
standard-large
8
32
standard-large
standard, large
standard-xlarge
16
64
standard-xlarge
standard, xlarge
standard-2xlarge
32
128
standard-2xlarge
standard, 2xlarge
hicpu-small
16
32
hicpu-small
hicpu, small
hicpu-medium
36
72
hicpu-medium
hicpu, medium
hicpu-large
72
144
hicpu-large
hicpu, large
himem-small
8
64
himem-small
himem, small
himem-medium
16
128
himem-medium
himem, medium
himem-large
48
384
himem-large
himem, large
himem-xlarge (*1)
96
768
himem-xlarge
himem, xlarge
hiio-small
2
16
hiio-small
hiio, small
hiio-medium
4
32
hiio-medium
hiio, medium
fpga-small (*2)
8
122
fpga-small
fpga, small
fpga-medium
16
244
fpga-medium
fpga, medium
fpga-large (*3)
64
976
fpga-large
fpga, large
transfer-small (*4)
4
10
transfer-small
transfer, small
transfer-medium (*4)
8
15
transfer-medium
transfer, medium
transfer-large (*4)
16
30
transfer-large
transfer, large
(*1) The compute type himem-xlarge has low availability.
(*2) The compute type fpga-small is not available. Use 'fpga-medium' instead.
(*3) The compute type fpga-large is only available in the US (use1) region. This compute type is not recommended as it suffers from low availability and offers little performance benefit over fpga-medium at significant additional cost.
(*4) The transfer size is selected automatically based on the selected storage size and is used during the upload and download system tasks.
Use the following instructions to start a new analysis for a single pipeline.
Select a project.
From the project menu, select Flow > Pipelines.
Select the pipeline or pipeline details of the pipeline you want to run.
Select Start Analysis.
Configure analysis settings. See Analysis Properties.
Select Start Analysis.
View the analysis status on the Analyses page.
Requested—The analysis is scheduled to begin.
In Progress—The analysis is in progress.
Succeeded—The analysis is complete.
Failed and Failed Final—The analysis has failed or was aborted.
To end an analysis, select Abort.
To perform a completed analysis again, select Re-run.
The following sections describe the analysis properties that can be configured in each tab.
The Analysis tab provides options for configuring basic information about the analysis.
User Reference: The unique analysis name.
User tags: One or more tags used to filter the analysis list. Select from existing tags or type a new tag name in the field.
Entitlement Bundle: Select a subscription to charge the analysis to.
Input Files: Select the input files to use in the analysis (max. 50,000).
Settings: Provide the input settings.
You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).
You can view analysis results on the Analyses page or in the output_folder on the Data page.
Select a project, and then select the Flow > Analyses page.
Select an analysis.
On the Details tab, select the square symbol right of the output files.
From the output files view, expand the list and select an output file.
If you want to add or remove any user or technical tags, you can do so from the data details view.
If you want to download the file, select Schedule download.
To preview the file, select the View tab.
Return to Flow > Analyses > your_analysis.
View additional analysis result information on the following tabs:
Details - View information on the pipeline configuration.
Steps - stderr and stdout information
Timeline Report - Nextflow process execution timeline.
Execution Report - Nextflow analysis report. Showing the run times, commands, resource usage and tasks for Nextflow analyses.
Queries can be used for data mining. On the Projects > your_project > Base > Query page:
New queries can be created and executed
Already executed queries can be found in the query history
Saved queries and query templates are listed under the saved queries tab.
All available tables and their details are listed on the New Query tab.
Note that Metadata tables are created by syncing with the Base module. This synchronization is configured on the Details page within the project.
Queries are executed using SQL (for example Select * From table_name). When there is a syntax issue with the query, the error will be displayed on the query screen when trying to run it. The query can be immediately executed or saved for future use.
Do not use queries such as ALTER TABLE to modify your table structure as it will go out of sync with the table definition and will result in processing errors.
When you have duplicate column names in your query, put the columns explicitly in the select clause and use column aliases for columns with the same name.
Case sensitive column names (such as the VARIANTS table) must be surrounded by double quotes. For example, select * from MY_TABLE where "PROJECT_NAME" = 'MyProject'.
The syntax for ICA case-sensitive subfields is without quotes, for example select * from MY_TABLE where ica:Tenant = 'MyTenant'. As these are case sensitive, the upper- and lowercasing must be respected.
For more information on queries, please also see the snowflake documentation: https://docs.snowflake.com/en/user-guide/
Some tables contain columns with an array of values instead of a single value.
As of ICA version 2.27, there is a change in the use of capitals for ICA array fields. In previous versions, the data name within the array would start with a capital letter. As of 2.27, lowercase is used. For example, ICA:Data_reference has become ICA:data_reference.
You can use the GET_IGNORE_CASE option to adapt existing queries when you have both data in the old syntax and new data in the lowercase syntax. The syntax is GET_IGNORE_CASE(Table_Name.Column_Name,'Array_field')
For example:
select ICA:Data_reference as MY_DATA_REFERENCE from TestTable
becomes:
select GET_IGNORE_CASE(TESTTABLE.ICA,'Data_reference') as MY_DATA_REFERENCE from TestTable
You can also modify the data to have consistent capital usage by executing the query update YOUR_TABLE_NAME set ica = object_delete(object_insert(ica, 'data_name', ica:Data_name), 'Data_name')
and repeating this process for all field names (Data_name, Data_reference, Execution_reference, Pipeline_name, Pipeline_reference, Sample_name, Sample_reference, Tenant_name and Tenant_reference).
Suppose you have a table called YOUR_TABLE_NAME consisting of three fields. The first is a name, the second is a code and the third field is an array of data called ArrayField:

| NameField | CodeField | ArrayField |
| --- | --- | --- |
| Name A | Code A | { "userEmail": "email_A@server.com", "bundleName": null, "boolean": false } |
| Name B | Code B | { "userEmail": "email_B@server.com", "bundleName": "thisbundle", "boolean": true } |
You can use the name field and code field to do queries by running Select * from YOUR_TABLE_NAME where NameField = 'Name A'.
If you want to show specific data, such as the email and bundle name from the array, this becomes Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where NameField = 'Name A'.
If you want to use data in the array as your selection criteria, the expression becomes Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:boolean = true.
If your criterion is text in the array, use single quotes (') to delimit the text. For example: Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:userEmail = 'email_A@server.com'.
You can also use the LIKE operator with the % wildcard if you do not know the exact content.
Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:userEmail LIKE '%A@server%'
If the query is valid for execution, the result will be shown as a table underneath the input box. From within the result page of the query, it is possible to save the result in two ways:
Download: As an Excel or JSON file to your computer.
Export: As a new table, as a view, or as a file to the project in CSV (tab, pipe, or a custom delimiter are also allowed) or JSON format. When exporting in JSON format, the result will be saved in a text file that contains a JSON object for each entry, similar to when exporting a table. The exported file can be located on the Data page under the folder named base_export_<user_supplied_name>_<auto generated unique id>.
Navigate to Projects > your_project > Base > Query.
Enter the query to execute using SQL.
Select »Run Query.
Optionally, select Save Query to add the query to your saved queries list.
If the query takes more than 30 seconds without returning a result, a message will be displayed to inform you the query is still in progress and the status can be consulted on Projects > your_project > Activity > Base Jobs. Once this Query is successfully completed, the results can be found in Projects > your_project > Base > Query > Query History tab.
The query history lists all queries that were executed. Historical queries are shown with their date, executing user, returned rows and duration of the run.
Navigate to Projects > your_project > Base > Query.
Select the Query History tab.
Select a query.
Perform one of the following actions:
Open Query—Open the query in the New Query tab. You can then select Run Query to execute the query again.
Save Query—Save the query to the saved queries list.
View Results—Download the results from a query or export results to a new table, view, or file in the project. Results are available for 24 hours after the query is executed. To view results after 24 hours, you need to execute the query again.
All queries saved within the project are listed under the Saved Queries tab together with the query templates.
The saved queries can be:
Opened: This will open the query in the “New query” tab.
Saved as template: The saved query becomes a query template.
Deleted: The query is removed from the list and cannot be opened again.
The query templates can be:
Opened: This will open the query again in the “New query” tab.
Deleted: The query is removed from the list and cannot be opened again.
It is possible to edit the saved queries and templates by double-clicking on each query or template. Specifically for Query Templates, the data classification can be edited to be:
Account: The query template will be available for everyone within the account
User: The query template will be available for the user who created it
If you have saved a query, you can run the query again by selecting it from the list of saved queries.
Navigate to Projects > your_project > Base > Query.
Select the Saved Queries tab.
Select a query.
Select Open Query to open the query in the New Query tab from where it can be edited if needed and run by selecting Run Query.
Shared databases are displayed under the list of Tables as Shared Database for project <project name>.
For ICA Cohorts Customers, shared databases are available in a project Base instance. For more information on specific Cohorts shared database tables that are viewable, See Cohorts Base.
The JupyterLab docker image contains the following environment variables:
ICA_URL: set to the ICA server URL https://ica.illumina.com/ica
ICA_PROJECT: (OBSOLETE) set to the current ICA project ID
ICA_PROJECT_UUID: set to the current ICA project UUID
ICA_SNOWFLAKE_ACCOUNT: set to the ICA Snowflake (Base) Account ID
ICA_SNOWFLAKE_DATABASE: set to the ICA Snowflake (Base) Database ID
ICA_PROJECT_TENANT_NAME: set to the tenant name of the owning tenant of the project where the workspace is created
ICA_STARTING_USER_TENANT_NAME: set to the tenant name of the tenant of the user who last started the workspace
ICA_COHORTS_URL: set to the URL of the Cohorts web application used to support the Cohorts view
Note: To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under projects > your_project > Data.
The ICA Python library API documentation can be found in folder /etc/ica/data/ica_v2_api_docs within the JupyterLab docker image.
The following steps are needed to get your bench image running in ICA.
You need to have Docker installed in order to build your images.
For your Docker bench image to work in ICA, it must run on Linux x86 architecture, use the correct user ID, and include the initialization script in the Dockerfile.
Bench-console provides an example of how to build a minimal image compatible with ICA Bench that runs an SSH daemon.
Bench-web provides an example of how to build a minimal image compatible with ICA Bench that runs a web daemon.
Bench-rstudio provides an example of how to build a minimal image compatible with ICA Bench that runs RStudio Open Source.
These examples come with information on the available parameters.
This script copies the ica_start.sh file, which takes care of the initialization and termination of your workspace, to the location in your project from where it can be started by ICA when you request to start your workspace.
The user settings must be set up so that bench runs with UID 1000.
To do a clean shutdown, you can capture the SIGTERM signal, which is transmitted 30 seconds before the workspace is terminated.
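As an illustration, a minimal sketch of how a start script could trap this signal; the SSH daemon used as the long-running child process and the cleanup steps are assumptions rather than part of the ICA-provided examples:

```bash
#!/bin/bash
# Hypothetical graceful-shutdown handler for a Bench start script.
# The child process (an SSH daemon) and the cleanup steps are illustrative.

cleanup() {
    echo "SIGTERM received, shutting down..."
    kill -TERM "$child_pid" 2>/dev/null   # forward the signal to the child process
    wait "$child_pid"                     # wait for it to exit cleanly
    exit 0
}
trap cleanup TERM

# Start the main long-running process in the background.
/usr/sbin/sshd -D &
child_pid=$!

# Keep this script alive as the main workspace process.
wait "$child_pid"
```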
Once you have Docker installed and completed the configuration of your Docker files, you can build your bench image.
Open the command prompt on your machine.
Navigate to the root folder of your Docker files.
Build the image with docker build -f Dockerfile -t mybenchimage:0.0.1 . Once the image has been built, save it as a Docker tar file with the command docker save mybenchimage:0.0.1 | bzip2 > ../mybenchimage-0.0.1.tar.bz2
The resulting tar file will appear next to the root folder of your docker files.
If you want to build on a mac with Apple Silicon, then the build command is docker buildx build --platform linux/amd64 -f Dockerfile -t mybenchimage:0.0.1 .
Open ICA and log in.
Go to Projects > your_project > Data and upload the Docker image tar file.
Select the uploaded image file and perform Manage > Change Format.
From the format list, select DOCKER and save the change.
Go to System Settings > Docker Repository > Create > Image.
Select the uploaded docker image and fill out the other details.
Name: The name by which your docker image will be seen in the list
Version: A version number to keep track of which version you have uploaded. In our example this was 0.0.1
Description: Provide a description explaining what your docker image does or is suited for.
Type: The type of this image is Bench. The Tool type is reserved for tool images.
Cluster compatible: [For future use, not currently supported] Indicates whether this docker image is suited for cluster computing.
Access: This setting must match the available access options of your Docker image. You can choose web access (HTTP), console access (SSH), or both. What is selected here becomes available on the + New Workspace screen. Enabling an option here which your Docker image does not support will result in access denied errors when trying to run the workspace.
Regions: If your tenant has access to multiple regions, you can select to which regions to replicate the docker image.
Once the settings are entered, select Save. The creation of the Docker image typically takes between 5 and 30 minutes. The status of your docker image will be partial during creation and available once completed.
Navigate to Projects > your_project > Bench > Workspaces.
Create a new workspace with + Create Workspace or edit an existing workspace.
Save your changes.
Select Start Workspace
Wait for the workspace to be started and you can access it either via console or the GUI.
Once your bench image has been started, you can access it via console, web or both, depending on your configuration.
Web access (HTTP) is done from either Projects > your_project > Bench > Workspaces > your_Workspace > Access tab or from the link provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.
Console access (SSH) is performed from your command prompt by going to the path provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.
The bench image will be instantiated as a container which is forcibly started as a user with UID 1000 and GID 100.
You cannot elevate your permissions in a running workspace.
Do not run containers as root as this is bad security practice.
Only the following folders are writeable:
/data
/tmp
All other folders are mounted as read-only.
For inbound access, the following ports on the container are publicly exposed, depending on the selection made at startup.
Web: TCP/8888
Console: TCP/2222
For outbound access, a workspace can be started in two modes:
Public: Access to public IPs is allowed using the TCP protocol.
Restricted: Access to a list of URLs is allowed.
At runtime, the following Bench-specific environment variables are made available to the workspace instantiated from the Bench image.
Following files and folders will be provided to the workspace and made accessible for reading at runtime.
At runtime, ICA-related software will automatically be made available at /data/.software in read-only mode.
New versions of ICA software will be made available after a restart of your workspace.
When a bench workspace is instantiated from your selected bench image, the following script is invoked: /usr/local/bin/ica_start.sh
This script needs to be available and executable otherwise your workspace will not boot.
This script can be used to invoke other scripts.
If you get the error "docker buildx build" requires exactly 1 argument when trying to build your docker image, a possible cause is a missing final . (dot) at the end of the command.
When you stop the workspace when users are still actively using it, they will receive a message showing a Server Connection Error.
ICA provides a tool called Bench for interactive data analysis. This is a sandboxed workspace which runs a docker image with access to the data and pipelines within a project. This workspace runs on the Amazon S3 system and comes with associated processing and provisioning costs. It is therefore best practice not to keep your Bench instances running indefinitely, but to stop them when not in use.
Having access to Bench depends on the following conditions:
Bench needs to be included in your ICA subscription.
The project owner needs to enable Bench for their project.
Individual users of that project need to be given access to Bench.
After creating a project, go to Projects > your_project > Bench > Workspaces page and click the Enable button. If you do not see this option, then either your tenant subscription does not include Bench or you belong to a tenant different from the one where the project was created. Users from other tenants cannot enable the Bench module, but can create workspaces. Once enabled, every user who has the correct permissions has access to the Bench module in that project.
Once Bench has been enabled for your project, the combination of roles and teams settings determines if a user can access Bench.
Tenant administrators and project owners are always able to access Bench and perform all actions.
The teams settings page at Projects > your_project > Project Settings > Team determines the role for the user/workgroup.
No Access means you have no access to the Bench workspace for that project.
Contributor gives you the right to start and stop the Bench workspace and to access the workspace contents, but not to create or edit the workspace.
Administrator gives you the right to create, edit, delete, start and stop the Bench workspace, and to access the actual workspace contents. In addition, the administrator can also build new derived Bench images and tools.
Finally, a verification is done of your user rights against the required workspace permissions. You will only have access when your user rights meet or exceed the required workspace permissions. The possible required Workspace permissions include:
Upload / Download rights (Download rights are mandatory for technical reasons)
Project Level (No Access / Data Provider / Viewer / Contributor)
Flow (No Access / Viewer / Contributor)
Base (No Access / Viewer / Contributor)
On the Schedule page at Projects > your_project > Base > Schedule, it’s possible to create a job for importing different types of data you have access to into an existing table.
When creating or editing a schedule, Automatic import is performed when the Active box is checked. The job will run at 10 minute intervals. In addition, for both active and inactive schedules, a manual import is performed when selecting the schedule and clicking the »run button.
There are different types of schedules that can be set up:
Files
Metadata
Administrative data.
This type will load the content of specific files from this project into a table. When adding or editing this schedule you can define the following parameters:
Name (required): The name of the scheduled job
Description: Extra information about the schedule
File name pattern (required): Define in this field part of or the full file name, or the tag, that the files you want to import contain. For example, if you want to import files named sample1_reads.txt, sample2_reads.txt, …, you can fill in _reads.txt in this field to have all files that contain _reads.txt imported to the table.
Generated by Pipelines: Only files generated by these selected pipelines are taken into account. When left clear, files from all pipelines are used.
Target Base Table (required): The table to which the information needs to be added. A drop-down list with all created tables is shown. This means the table needs to be created before the schedule can be created.
Write preference (required): Define how the data is handled, i.e. whether existing data can be overwritten.
Data format (required): Select the data format of the files (CSV, TSV, JSON)
Delimiter (required): Indicates which delimiter is used in the delimiter-separated file. If the delimiter is not present in the list, it can be indicated as custom.
Active: The job will run automatically if checked
Custom delimiter: the custom delimiter that is used in the file. You can only enter a delimiter here if custom delimiter is selected.
Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.
References: Choose which references must be added to the table
Advanced Options
Encoding (required): Select the encoding of the file.
Null Marker: Specifies a string that represents a null value in a CSV/TSV file.
Quote: The value (single character) that is used to quote data sections in a CSV/TSV file. When this character is encountered at the beginning and end of a field, it will be removed. For example, entering " as quote will remove the quotes from "bunny" and only store the word bunny itself.
Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser can not detect the missing separator and will shift fields to the left, resulting in errors.
If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.
If no headers are used: The fields are loaded in order of occurrence and trailing missing fields are loaded with NULL, trailing additional fields are discarded.
This type will create two new tables: BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL and ICA_PROJECT_SAMPLE_META_DATA. The job will load metadata (added to the samples) into ICA_PROJECT_SAMPLE_META_DATA. The process gathers the metadata from the samples via the data linked to the project and the metadata from the analyses in this project. Furthermore, the scheduler will add provenance data to BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. This process gathers the execution details of all the analyses in the project: the pipeline name and status, the user reference, the input files (with identifiers), and the settings selected at runtime. This enables you to track the lineage of your data and to identify any potential sources of errors or biases. So, for example, the following query counts how many times each of the pipelines was executed and sorts the result accordingly.
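This is a sketch only; it assumes the pipeline name is stored in a column called PIPELINE_NAME, so check the actual column names of your BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL table before running it:

```sql
select PIPELINE_NAME, count(*) as EXECUTION_COUNT
from BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
group by PIPELINE_NAME
order by EXECUTION_COUNT desc;
```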
To obtain a similar table for the failed runs, you can execute the following SQL query.
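Again a sketch, additionally assuming the analysis status is stored in a STATUS column with the value 'Failed' for failed runs; adjust the column name and value to match your table:

```sql
select PIPELINE_NAME, count(*) as EXECUTION_COUNT
from BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
where STATUS = 'Failed'
group by PIPELINE_NAME
order by EXECUTION_COUNT desc;
```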
When adding or editing this schedule you can define the following parameters:
Name (required): the name of this scheduled job
Description: Extra information about the schedule
Anonymize references: when selected, the references will not be added
Include sensitive metadata fields: In the metadata fields configuration, fields can be set to sensitive. When checked, those fields will also be added.
Active: the job will run automatically if ticked
Source (Tenant Administrators Only):
Project (default): All administrative data from this project will be added
Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.
This type will automatically create a table and load administrative data into this table. A usage overview of all executions is considered administrative data.
When adding or editing this schedule the following parameters can be defined:
Name (required): The name of this scheduled job
Description: Extra information about the schedule
Anonymize references: When checked, any platform references will not be added
Include sensitive metadata fields: In the metadata fields configuration, fields can be set to sensitive. When checked, those fields will also be added.
Active: The job will run automatically if checked
Source (Tenant Administrators Only):
Project (default): All administrative data from this project will be added
Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.
Schedules can be deleted. Once deleted, they will no longer run, and they will not be shown in the list of schedules.
When clicking the Run button, or Save & Run when editing, the schedule will start the job of importing the configured data in the correct tables. This way the schedule can be run manually. The result of the job can be seen in the tables.
The main concept in Bench is the Workspace. A workspace is an instance of a Docker image that runs the framework which is defined in the image (for example JupyterLab, R Studio). In this workspace, you can write and run code and graphically represent data. You can use API calls to access data, analyses, Base tables and queries in the platform. Via the command line, R-packages, tools, libraries, IGV browsers, widgets, etc. can be installed.
You can create multiple workspaces within a project and each workspace runs on an individual node and is available in different resource sizes. Each node has local storage capacity, where files and results can be temporarily stored and exported from to be permanently stored in a Project. The size of the storage capacity can range from 1GB – 16TB.
For each workspace, the status is indicated by colour: red means stopped, orange means starting, and green means running.
If this is the first time you are using a workspace in a Project, click Enable to create new Bench Workspaces. In order to use Bench, you first need to have a workspace. This workspace determines which docker image will be used with which node and storage size.
Click Projects > Your_Project > Bench > Workspaces > + Create Workspace
Complete the following fields:
Name: (required) must be a unique name.
Docker image: (required) The list of docker images includes base images from ICA and images uploaded to the docker repository for that domain.
Storage size (GB): (required) Represents the size of the storage available on the workspace. A storage from 10GB to 64TB can be provided.
Description: A place to provide additional information about the workspace.
Web allows you to interact with the workspace via a browser.
Console provides a terminal to interact with the workspace.
Internet Access: (required) Type of access to the internet which should be provided for this workspace
Open: Internet access is allowed
Restricted: Creates a workspace with no internet access. Access to the ICA Project Data is still available in this mode.
Whitelisted URLs: Specify URLs and paths that are allowed in a restricted workspace. Separate URLs with a new line. Only domains and subdomains in the specified URL will be allowed.
URLs must comply with the following:
URLs can be between 1 and 263 characters, including dots (.).
URLs can begin with a leading dot (.).
Domain and sub-domains:
Can include alphanumeric characters (letters A-Z and digits 0-9). Case insensitive.
Can contain hyphens (-) and underscores (_), but not as a first or last character.
Length between 1 and 63 characters.
A dot (.) must be placed after a domain or sub-domain.
Note that if you use a trailing slash like in the path ftp.example.net/folder/ then you will not be able to access the path ftp.example.net/folder without the trailing slash included.
Regex for URL: [(http(s)?):\/\/(www\.)?a-zA-Z0-9@:%._\+~#=-]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)
Accepted Example URLs:
example.com www.example.com https://www.example.com subdomain.example.com subdomain.example.com/folder subdomain.example.com/folder/subfolder sub-domain.example.com sub_domain.example.com example.co.uk subdomain.example.co.uk sub-domain.example.co.uk
Example data science specific whitelist compatible with restricted Bench workspaces. Note there are two required URLs to allow for Python pip installs:
pypi.org files.pythonhosted.org repo.anaconda.com conda.anaconda.org github.com cran.r-project.org bioconductor.org www.npmjs.com mvnrepository.com
Access limited to workspace owner. When this field is selected, only the workspace owner can access the workspace. Everything created in that workspace will belong to the workspace owner.
Download/Upload allowed
Project/Flow/Base access
Click “Save”
The workspace can be edited afterwards when it is stopped, on the Details tab within the workspace. The changes will be applied when the workspace is restarted.
Bench administrators are able to create, edit and delete workspaces and start and stop workspaces. If their permissions match or exceed those of the workspace, they can also access the workspace contents.
Contributors are able to start and stop workspaces and if their permissions match or exceed those of the workspace, they can also access the workspace contents.
For security reasons, the Tenant administrator and Project owner can always access the workspace.
If one of your permissions is not high enough as a Bench contributor, you will see the following message: "You are not allowed to use this workspace as your user permissions are not sufficient compared to the permissions of this workspace".
The permissions that a Bench workspace can receive are the following:
Upload rights
Download rights (required)
Project (No Access - Data Provider - Viewer - Contributor)
Flow (No Access - Viewer - Contributor)
Base (No Access - Viewer - Contributor)
Based on these permissions, you will be able to upload or download data to your ICA project (upload and download rights) and will be allowed to take actions in the Project, Flow and Base modules related to the granted permission.
If you encounter issues when uploading/downloading data in a workspace, the security settings for that workspace may be set to not allow uploads and downloads. This can result in RequestError: send request failed and read: connection reset by peer. This is by design in restricted workspaces and thus limits data access to your project via /data/project to prevent the extraction of large amounts of (proprietary) data.
Workspaces which were created before this functionality existed can be upgraded by enabling these workspace permissions. If the workspaces are not upgraded, they will continue working as before.
To delete a workspace, go to Projects > your_project > Bench > Workspaces > your_workspace and click “Delete”. Note that the delete option is only available when the workspace is stopped.
The workspace will not be accessible anymore, nor will it be shown in the list of workspaces. The content of it will be deleted so if there is any information that should be kept, you can either put it in a docker image which you can use to start from next time, or export it using the API.
The workspace is not always accessible. It needs to be started before it can be used. From the moment a workspace is Running, a node with a specific capacity is assigned to this workspace. From that moment on, you can start working in your workspace.
As long as the workspace is running, the resources provided for this workspace will be charged.
To start the workspace, follow the next steps:
Go to Projects > your_project > Bench > Workspaces > your_workspace > Details
Click on Start Workspace button
On the top of the details tab, the status changes to “Starting”. When you click on the >_Access tab, the message “The workspace is starting” appears.
Wait until the status is “Running” and the “Access” tab can be opened. This can take some time because the necessary resources have to be provisioned.
You can refresh the workspace status by selecting the round refresh symbol at the top right.
If you want to open a running workspace in a new tab, then select the link at Projects > your_project > Bench > Workspaces > Details tab > Access. You can also copy the link with the copy symbol in front of the link.
When you exit a workspace, you can choose to stop the workspace or keep it running. Keeping the workspace running means that it will continue to use resources and incur associated costs. To stop the workspace, select stop in the displayed dialog. You can also stop a workspace by opening it and selecting stop at the top right. If you choose to keep it running, the workspace will be stopped if it is not accessed for more than 7 days to avoid unnecessary costs.
Stopping the workspace will stop the notebook, but will not delete local data. Content will no longer be accessible and no actions can be performed until it is restarted. Any work that has been saved will stay stored.
Storage will continue to be charged until the workspace is deleted. Administrators have a delete option for the workspace in the exit screen.
The project/tenant administrator can enter and stop workspaces for their project/tenant even if they did not start those workspaces at Projects > your_project > Bench > Workspaces > your_workspace > Details. Be careful not to stop workspaces that are processing data. For security reasons, a log entry is added when a project/tenant administrator enters and exits a workspace.
You can see who is using a workspace in the workspace list view.
Once the Workspace is running, the default applications are loaded. These are defined by the start script of the docker image.
The docker images provided by Illumina will load JupyterLab by default. It also contains Tutorial notebooks that can help you get started. Opening a new terminal can be done via the Launcher, + button above the folder structure.
To ensure that packages (and other objects, including data) are permanently installed on a Bench image, a new Bench image needs to be created, using the BUILD option in Bench. A new image can only be derived from an existing one. The build process uses the DOCKERFILE method, where an existing image is the starting point for the new Docker Image (The FROM directive), and any new or updated packages are additive (they are added as new layers to the existing Docker file).
NOTE: The Dockerfile commands are all run as ROOT, so it is possible to delete or interfere with an image in such a way that the image is no longer running correctly. The image does not have access to any underlying parts of the platform so will not be able to harm the platform, but inoperable Bench images will have to be deleted or corrected.
In order to create a derived image, open up the image that you would like to use as the basis and select the Build tab.
Name: By default, this is the same name as the original image and it is recommended to change the name.
Version: Required field which can be any value.
Description: The description for your docker image (for example, indicating which apps it contains).
Code: The Docker file commands must be provided in this section.
The first 4 lines of the Docker file must NOT be edited. It is not possible to start a docker file with a different FROM directive. The main docker file commands are RUN and COPY. More information on them is available in the official Docker documentation.
Once all information is present, click the Build button. Note that the build process can take a while. Once building has completed, the docker image will be available on the Data page within the Project. If the build has failed, the log will be displayed here and the log file will be in the Data list.
From within the workspace it is possible to create a docker image and a tool from it at the same time.
Click the Manage > Create CWL Tool button in the top right corner of the workspace.
Give the tool a name.
Replace the description of the tool to describe what it does.
Add a version number for the tool.
Click the Image tab.
Here the image that accompanies the tool will be created.
Change the name for the image.
Change the version.
Replace the description to describe what the image does.
Below the line where it says “#Add your commands below.” write the code necessary for running this docker image.
Click the Save button in the upper, right-hand corner to start the build process.
The building can take a while. When it has completed, the tool will be available in the Tool Repository.
To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under projects > your_project > Data. Although this storage is slow, it offers read and write access and access to the content from within ICA.
Every workspace you start has a read-only /data/.software/ folder which contains the icav2 command-line interface (and readme file).
The last tab of the workspace is the activity tab. On this tab all actions performed in the workspace are shown, for example the creation of the workspace, starting or stopping of the workspace, etc. The activities are shown with their date, the user that performed the action and the description of the action. This page can be used to check how long the workspace has run.
In the general Activity page of the project, there is also a Bench activity tab. This shows all activities performed in all workspaces within the project, even when the workspace has been deleted. The Activity tab in the workspace only shows the action performed in that workspace. The information shown is the same as per workspace, except that here the workspace in which the action is performed is listed as well.
If you want to query data from a table shared from another tenant (indicated in green), select the table to see the unique name. In the example below, the query will be select * from demo_alpha_8298.public.TestFiles
Bench workspaces require setting a docker image to use as the image for the workspace. Illumina Connected Analytics (ICA) provides a default docker image with JupyterLab installed.
JupyterLab supports notebook documents (.ipynb). Notebook documents consist of a sequence of cells which may contain executable code, markdown, headers, and raw text.
Included in the default JupyterLab docker image is a Python library with APIs to perform actions in ICA, such as adding data, launching pipelines, and operating on Base tables.
See the tutorial notebooks included in the default image for examples on using the ICA Python library.
Bench images are Docker containers tailored to run in ICA with the necessary permissions, configuration and resources. For more information on Docker images, please refer to the official Docker documentation.
For easy reference, you can find examples of preconfigured Bench images on the Illumina website, which you can copy to your local machine and edit to suit your needs.
The following scripts must be part of your Docker bench image. Please refer to the examples from the Illumina website for more details.
Execute docker build -f Dockerfile -t mybenchimage:0.0.1 . with mybenchimage being the name you want to give to your image and 0.0.1 replaced with the version number which you want your bench image to have. For more information on this command, see the official Docker documentation.
For small Docker images, upload the docker image file which you generated in the previous step. For large Docker images use the to better performance and reliability to import the Docker image.
Fill in the bench workspace details according to the workspace creation instructions.
To execute Bench CLI commands, your workspace needs a way to run them, such as an SSH daemon integrated into your web access image or into your console access image. There is no need to download the workspace command-line interface; you can run it from within the workspace.
This script is the main process in your running workspace and must not run to completion, as that will stop the workspace and trigger a restart.
When you stop a workspace, a TERM signal is sent to the main process in your bench workspace. You can trap this signal to handle the stop gracefully and shut down child processes of the main process. The workspace will be forcibly shut down after 30 seconds if your main process has not stopped within that period.
Resource model: (required) Size of the machine on which the workspace will run and whether or not the machine should contain a Graphics Processing Unit (GPU). See the compute types table for available sizes.
Access: The options here are determined by the access options of the selected Docker image. The options you select will become available on the details tab of the Workspace when it is running.
Workspace Permissions: Your workspace will operate with these permissions. For security reasons, users will need permissions matching what you set here to run the workspace, regardless of their role.
The project team role determines whether someone is an administrator or contributor, while the dedicated workspace permissions indicate what the workspace itself can and cannot do within your project. For this reason, users need to meet or exceed the required permissions to enter this workspace and use it.
Click the General Tool tab. This tab and the following tabs will look familiar from Flow. Enter the information required for the tool in each of the tabs. For more detailed instructions, see the corresponding section in the Flow documentation.
For fast read-only access, link folders with workspace-ctl data create-mount --mode read-only.
For fast read/write access, link non-indexed folders, which are visible but whose contents are not accessible from ICA. Use workspace-ctl data create-mount --mode read-write to do so. You cannot have fast read/write access to indexed folders, as the indexing mechanism on those would degrade performance.
Draft: Fully editable draft.
Release Candidate: The pipeline is ready for release. Editing is locked, but the pipeline can be cloned (top right in the details view) to create a new version.
Released: The pipeline is released. To release a pipeline, all tools of that pipeline must also be in released status. Editing a released pipeline is not possible, but the pipeline can be cloned (top right in the details view) to create a new editable version.
CWL Graphical: Details, Documentation, Definition, Analysis Report, Metadata Model
CWL Code: Details, Documentation, Inputform Files (JSON) or XML Configuration (XML), CWL Files, Metadata Model
Nextflow Code: Details, Documentation, Inputform Files (JSON) or XML Configuration (XML), Nextflow Files, Metadata Model
| Environment variable | Description | Example |
| --- | --- | --- |
| ICA_WORKSPACE | The unique identifier related to the started workspace. This value is bound to a workspace and will never change. | 32781195 |
| ICA_CONSOLE_ENABLED | Whether Console access is enabled for this running workspace. | true, false |
| ICA_WEB_ENABLED | Whether Web access is enabled for this running workspace. | true, false |
| ICA_SERVICE_ACCOUNT_USER_API_KEY | An API key that allows interaction with ICA using the ICA CLI and is bound to the permissions defined at startup of the workspace. | |
| ICA_BENCH_URL | The host part of the public URL which provides access to the running workspace. | use1-bench.platform.illumina.com |
| ICA_PROJECT_UUID | The unique identifier related to the ICA project in which the workspace was started. | |
| ICA_URL | The ICA Endpoint URL. | |
| HTTP_PROXY, HTTPS_PROXY | The proxy endpoint in case the workspace was started in restricted mode. | |
| HOME | The home folder. | /data |
| Path | Description |
| --- | --- |
| /etc/workspace-auth | Contains the SSH RSA public/private keypair which is required to run the workspace SSHD. |
| /data | This folder contains all data specific to your workspace. Data in this folder is not persisted in your project and will be removed at deletion of the workspace. |
| /data/project | This folder contains all your project data. |
| /data/.software | This folder contains ICA-related software. |
| Role | Create/Edit | Delete | Start/Stop | Access workspace contents |
| --- | --- | --- | --- | --- |
| Contributor | - | - | X | when permissions match those of the workspace |
| Administrator | X | X | X | when permissions match those of the workspace |
The following is a list of available Bench CLI commands and their options.
Please refer to the examples from the Illumina website for more details.
workspace-ctl compute get-cluster-details
workspace-ctl compute get-logs
workspace-ctl compute get-pools
workspace-ctl compute scale-pool
workspace-ctl data create-mount
workspace-ctl data delete-mount
workspace-ctl data get-mounts
workspace-ctl help completion
workspace-ctl help compute
workspace-ctl help compute get-cluster-details
workspace-ctl help compute get-logs
workspace-ctl help compute get-pools
workspace-ctl help compute scale-pool
workspace-ctl help data
workspace-ctl help data create-mount
workspace-ctl help data delete-mount
workspace-ctl help data get-mounts
workspace-ctl help help
workspace-ctl help software
workspace-ctl help software get-server-metadata
workspace-ctl help software get-software-settings
workspace-ctl help workspace
workspace-ctl help workspace get-cluster-settings
workspace-ctl help workspace get-connection-details
workspace-ctl help workspace get-workspace-settings
workspace-ctl software get-server-metadata
workspace-ctl software get-software-settings
workspace-ctl workspace get-cluster-settings
workspace-ctl workspace get-connection-details
workspace-ctl workspace get-workspace-settings
Bench has the ability to handle containers inside a running workspace. This allows you to install and package software more easily as a container image and provides capabilities to pull and run containers inside a workspace.
Bench offers a container runtime as a service in your running workspace. This allows you to do standardized container operations such as pulling in images from public and private registries, build containers at runtime from a Dockerfile, run containers and eventually publish your container to a registry of choice to be used in different ICA products such as ICA Flow.
The Container Service is accessible from your Bench workspace environment by default.
The container service uses the workspace disk to store any container images you pulled in or created.
To interact with the Container Service, a container remote client CLI is exposed automatically in the /data/.local/bin folder. The Bench workspace environment is preconfigured to automatically detect where the Container Service is made available using environment variables. These environment variables are automatically injected into your environment and are not determined by the Bench Workspace Image.
Use either the docker or podman CLI to interact with the Container Service. Both are interchangeable and support all the commonly known standardized operations.
To run a container, the first step is to either build a container from a source container or pull in a container image from a registry.
A public image registry does not require any form of authentication to pull the container layers.
The following command line example shows how to pull in a commonly known image.
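For example (the image name and tag are illustrative; any public image can be used):

```bash
# Pull a public image from Docker Hub through the Container Service
docker pull ubuntu:22.04
```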
The Container Service uses Dockerhub by default to pull images from if no registry hostname is defined in the container image URI.
To pull images from a private registry, the Container Service needs to authenticate to the Private Registry.
The following command line example shows how to instruct the Container Service to log in to the private registry.hub.docker.com registry.
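A sketch using the standard docker login syntax; <your-username> is a placeholder for your own registry account:

```bash
# Log in to the private registry; you will be prompted for a password
docker login registry.hub.docker.com -u <your-username>
```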
Depending on your authorisations in the private registry you will be able to pull and push images. These authorisations are managed outside of the scope of ICA.
Depending on the Registry setup you can publish Container Images with or without authentication. If Authentication is required, follow the login procedure described in Private Registry
The following command line example shows how to publish a locally available Container Image to a private registry in Dockerhub.
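A sketch using standard docker commands; the image name, version, and <your-namespace> are placeholders for your own repository details:

```bash
# Tag the local image with the registry and repository name, then push it
docker tag myimage:1.0 registry.hub.docker.com/<your-namespace>/myimage:1.0
docker push registry.hub.docker.com/<your-namespace>/myimage:1.0
```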
The following example shows how to save a locally available Container Image as a compressed tar archive.
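A sketch, with an illustrative image name; bzip2 is used here for compression, matching the earlier build instructions:

```bash
# Save the image as a compressed tar archive on the workspace disk
docker save myimage:1.0 | bzip2 > myimage-1.0.tar.bz2
```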
This lets you upload the container image into the Private ICA Docker Registry.
The following example shows how to list all locally available Container Images
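For example:

```bash
# List all locally available container images and their sizes
docker images
```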
Container Images require storage capacity on the Bench Workspace disk. The capacity is shown when listing the locally available container images. The container Images are persisted on disk and remain available whenever a workspace stops and restarts.
The following example shows how to clean up a locally available Container Image
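For example (image name and tag are illustrative):

```bash
# Remove a local image by name and tag to free up disk capacity
docker rmi myimage:1.0
```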
When a Container Image has multiple tags, all the tags need to be removed individually to free up disk capacity.
A Container Image can be instantiated in a Container running inside a Bench Workspace.
By default, the workspace disk (/data) will be made available inside the running Container. This lets you access data from the workspace environment.
When running a Container, the default user defined in the Container Image manifest will be used and mapped to the uid and the gid of the user in the running Bench Workspace (uid:1000, gid: 100). This will ensure files created inside the running container on the workspace disk will have the same file ownership permissions.
The following command line example shows how to run an instance of a locally available Container Image as a normal user.
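A sketch with an illustrative image name; the command simply lists the workspace disk to show that /data is available inside the container:

```bash
# Run a container as the default (non-root) user and list the workspace disk
docker run --rm -it myimage:1.0 ls /data
```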
Running a Container as root user maps the uid and gid inside the running Container to the running non-root user in the Bench Workspace. This lets you act as user with uid 0 and gid 0 inside the context of the container.
By enabling this functionality, you can install system level packages inside the context of the Container. This can be leveraged to run tools that require additional system level packages at runtime.
The following command line example shows how to run an instance of a locally available Container as root user and install system-level packages.
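A sketch assuming a Debian/Ubuntu-based image; --user 0:0 is one standard way to request root inside the container, and the package chosen is purely illustrative:

```bash
# Run the container as root (uid 0, gid 0) and install a system package
docker run --rm -it --user 0:0 ubuntu:22.04 \
  bash -c "apt-get update && apt-get install -y curl"
```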
When no specific mapping is defined using the --userns flag, the user in the running Container will be mapped to an undefined uid and gid based on an offset of id 100000. Files created on your workspace disk from the running Container will also use this uid and gid to define the ownership of the file.
Building a Container
To build a Container Image, you need to describe the instructions in a Dockerfile.
This next example builds a local Container Image and tags it as myimage:1.0. The Dockerfile used in this example is shown in the sketch below.
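A minimal sketch of such a Dockerfile, written into a small dedicated build context; the base image, the installed package, and the folder name are assumptions for illustration:

```bash
# Create a minimal build context with a simple Dockerfile
mkdir -p /data/demo-build
cat > /data/demo-build/Dockerfile <<'EOF'
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
CMD ["bash"]
EOF
```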
The following command line example will build the actual Container Image
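Continuing the sketch above, the build is run against the small dedicated build context rather than /data itself:

```bash
# Build the image from the minimal build context and tag it as myimage:1.0
docker build -t myimage:1.0 /data/demo-build
```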
When defining the build context location, keep in mind that using the HOME folder (/data) will index all files available in /data, which can be a lot and will slow down the build process. Hence the recommendation to use a minimal build context whenever possible.
The GWAS and PheWAS tabs in ICA Cohorts allow you to visualize precomputed analysis results for phenotypes/diseases and genes, respectively. Note that these do not reflect the subjects that are part of the cohort that you created.
ICA Cohorts currently hosts GWAS and PheWAS analysis results for approximately 150 quantitative phenotypes (such as "LDL direct" and "sitting height") and about 700 diseases.
Navigate to the GWAS tab and start looking for phenotypes and diseases in the search box. Cohorts will suggest the best matches against any partial input ("cancer") you provide. After selecting a phenotype/disease, Cohorts will render a Manhattan plot, by default collapsed to gene level and organized by their respective position in each chromosome.
Circles in the Manhattan plot indicate binary traits: potential associations between genes and diseases. Triangles indicate quantitative phenotypes with a regression Beta different from zero, and point up or down to depict positive or negative correlation, respectively.
Hovering over a circle/triangle will display the following information:
gene symbol
variant group (see below)
P-value, both raw and FDR-corrected
number of carriers of variants of the given type
number of carriers of variants of any type
regression Beta
For gene-level results, Cohorts distinguishes five different classes of variants: protein truncating; deleterious; missense; missense with a high ILMN PrimateAI score (indicating likely damaging variants); and synonymous variants. You can limit results to any of these classes, or select All to display all results together.
Deleterious variants (del): the union of all protein-truncating variants (PTVs, defined below), pathogenic missense variants with a PrimateAI score greater than a gene-specific threshold, and variants with a SpliceAI score greater than 0.2.
Protein-truncating variants (ptv): variant consequences matching any of stop_gained, stop_lost, frameshift_variant, splice_donor_variant, splice_acceptor_variant, start_lost, transcript_ablation, transcript_truncation, exon_loss_variant, gene_fusion, or bidirectional_gene_fusion.
missense_all: all missense variants regardless of their pathogenicity.
missense, PrimateAI optimized (missense_pAI_optimized): only pathogenic missense variants with a PrimateAI score greater than a gene-specific threshold.
missenses and PTVs (missenses_and_ptvs_all): the union of all PTVs, SpliceAI > 0.2 variants and all missense variants regardless of their pathogenicity scores.
all synonymous variants (syn).
To zoom in to a particular chromosome, click the chromosome name underneath the plot, or select the chromosome from the drop-down box, which defaults to Whole genome.
To browse PheWAS analysis results by gene, navigate to the PheWAS tab and enter a gene of interest into the search box. The resulting Manhattan plot will show phenotypes and diseases organized into a number of categories, such as "Diseases of the nervous system" and "Neoplasms". Click on the name of a category, shown underneath the plot, to display only those phenotypes/diseases, or select a category from the drop-down, which defaults to All.
A future release of ICA Cohorts will allow you to run your own customized GWAS analysis inside ICA Bench and then upload variant- or gene-level results for visualization in the ICA Cohorts graphical user interface.
ICA Cohorts lets you create a research cohort of subjects and associated samples based on the following criteria:
Project:
Include subjects that are part of any ICA Project that you own or that is shared with you.
Subject:
Demographics such as age, sex, ancestry.
Biometrics such as body height, body mass index.
Family and patient medical history.
Sample:
Sample type such as FFPE.
Tissue type.
Sequencing technology: Whole genome DNA-sequencing, RNAseq, single-cell RNAseq, etc.
Disease:
Phenotypes and diseases from standardized ontologies.
Drug:
Drugs from standardized ontologies along with specific typing, stop reasons, drug administration routes, and time points.
Molecular attributes:
Samples with a somatic mutation in one or multiple, specified genes.
Samples with a germline variant of a specific type in one or multiple, specified genes.
Samples over- or under-expressed in one or multiple, specified genes.
Samples with a copy number gain or loss involving one or multiple, specified genes.
ICA Cohorts currently uses six standard medical ontologies to 1) annotate each subject during ingestion and then to 2) search for subjects: HPO for phenotypes, MeSH, SNOMED-CT, ICD9-CM, ICD10-CM, and OMIM for diseases. By default, any 'type-ahead' search will find matches from all six; and you can limit the search to only the one(s) you prefer. When searching for subjects using names or codes from one of these ontologies, ICA Cohorts will automatically match your query against all the other ontologies, therefore returning subjects that have been ingested using a corresponding entry from another ontology.
In the 'Disease' tab, you can search for subjects diagnosed with one or multiple diseases, as well as phenotypes, in two ways:
Start typing the English name of a disease/phenotype and pick from the suggested matches. Continue typing if your disease/phenotype of interest is not listed initially.
Use the mouse to select the term or navigate to the term in the dropdown using the arrow buttons.
If applicable, the concept hierarchy is shown, with ancestors and immediate children visible.
For diagnostic hierarchies, concept children count and descendant count for each disease name is displayed.
Descendant Count: Displays next to each disease name in the tree hierarchy (e.g., "Disease (10)").
Leaf Nodes: No children count shown for leaf nodes.
Missing Counts: Children count is hidden if unavailable.
Show Term Count: A checkbox below "Age of Onset" that is checked by default. Unchecking it hides the descendant count.
Select a checkbox to include the diagnostic term along with all of its children and descendants.
Expand the categories and select or deselect specific disease concepts.
Paste one or multiple diagnostic codes separated by a pipe (‘|’).
In the 'Drug' tab, you can search for subjects who have a specific medication record:
Start typing the concept name for the drug and pick from suggested matches. Continue typing if the drug is not listed initially.
Paste one or multiple drug concept codes. ICA Cohorts currently use RXNorm as a standard ontology during ingestion. If multiple concepts are in your instance of ICA Cohorts, they will be listed under 'Concept Ontology.'
'Drug Type' is a static list of qualifiers that denote the specific administration of the drug, for example, where the drug was dispensed.
'Stop Reason' is a static list of attributes describing the reason a drug was stopped, if available in the ingested data.
'Drug Route' is a static list of attributes that describe the physical route of administration of the drug, for example, Intravenous Route (IV).
In the ‘Measurements’ tab, you can search for vital signs and laboratory test data using LOINC concept codes.
Start typing the English name of the LOINC term, for example, ‘Body height’. A dropdown will appear with matching terms. Use the mouse or down arrows to select the term.
Upon selecting a term, the term will be available for use in a query.
Terms can be added to your query criteria.
For each term, you can set a value `Greater than or equal`, `Equals`, `Less than or equal`, `In range`, or `Any value`.
`Any value` will find any record where there is an entry for the measurement, regardless of whether a value is present.
Click `Apply` to add your criteria to the query.
Click `Update Now` to update the running count of the Cohort.
Include/Exclude
As attributes are added to the 'Selected Condition' on the right-navigation panel, you can choose to include or exclude the criteria selected.
Select a criterion from 'Subject', 'Disease', and/or 'Molecular' attributes by filling in the appropriate checkbox on the respective attribute selection pages.
When selected, the attribute will appear in the right-navigation panel.
You can use the 'Include' / 'Exclude' dropdown next to the selected attribute to decide if you want to include or exclude subjects and samples matching the attribute.
Note: the semantics of 'Include' are such that a subject needs to match only one of the 'included' attributes in any given category to be included in the cohort (a category being disease, sex, body height, etc.). For example, if you specify multiple diseases as inclusion criteria, subjects only need to be diagnosed with one of them. Using 'Exclude', you can exclude any subject who matches one or more exclusion criteria; subjects do not have to match all exclusion criteria in the same category to be excluded from the cohort.
Note: This feature is not available on the 'Project' level selections as there is no overlap between subjects in datasets.
Note: Exclusion criteria do not account for NULL values. For example, if the Super-population 'Europeans' is excluded, subjects that lack this data point will still be included in your cohort.
Once you select Create Cohort, the above data are organized in tabs such as Project, Subject, Disease, and Molecular. Each tab then contains the aforementioned sections, among others, to help you identify cases and/or controls for further analysis. Navigate through these tabs, or search for an attribute by name to jump directly to that tab and section, and select the attributes and values that describe your subjects and samples of interest. Assign a name to the cohort you created, and click Apply to save the cohort.
After creating a Cohort, select the Duplicate icon. A copy of the Cohort definition will be created and tagged with "_copy".
Deleting a Cohort Definition can be accomplished by clicking the Delete Cohort icon.
This action cannot be undone.
After creating a Cohort, users can set a Cohort bookmark as Shared. Sharing a Cohort makes it available to other users with access to the Project. Cohorts created in a Project are otherwise only accessible to the user who created them; other users in the project cannot see the cohort unless this sharing functionality is used.
Create a Cohort using the directions above.
To make the Cohort available to other users in your Project, click the Share icon.
The Share icon will be filled in black and the Shared Status will change from Private to Shared.
Other users with access to Cohorts in the Project can now apply the Cohort bookmark to their data in the project.
To unshare the Cohort, click the Share icon.
The icon will turn from black to white, and other users within the project will no longer have access to this cohort definition.
A Shared Cohort can be Archived.
Select a Shared Cohort with a black Shared Cohort icon.
Click the Archive Cohort icon.
You will be asked to confirm this selection.
Upon archiving the Cohort definition, the Cohort will no longer be seen by other users in the Project.
The archived Cohort definition can be unarchived by clicking the Unarchive Cohort icon.
When the Cohort definition is unarchived, it will be visible to all users in the Project.
You can link cohorts data sets to a bundle as follows:
Create or edit a bundle at Bundles from the main navigation.
Navigate to Bundles > your_bundle > Cohorts > Data Sets.
Select Link Data Set to Bundle.
Select the data set which you want to link and +Select.
After a brief time, the cohorts data set will be linked to your bundle and ICA_BASE_100 will be logged.
If you cannot find the cohorts data set which you want to link, verify that:
Your data set is part of a project (Projects > your_project > Cohorts > Data Sets)
The project is set to Data Sharing (Projects > your_project > Project Settings > Details)
You can unlink cohorts data sets from bundles as follows:
Edit the desired bundle at Bundles from the main navigation.
Navigate to Bundles > your_bundle > Cohorts > Data Sets.
Select the cohorts data set which you want to unlink.
Select Unlink Data Set from Bundle.
After a brief time, the cohorts data set will be unlinked from your bundle and ICA_BASE_101 will be logged.
Bench Workspaces use a FUSE driver to mount project data directly into the workspace file system. Both read and write are supported, with some write limitations enforced by the underlying AWS S3 storage.
As a user, you can perform the following actions from Bench (provided your user permissions match the workspace permissions) or through the CLI:
Copy project data
Delete project data
Mount project data (CLI only)
Unmount project data (CLI only)
When you have a running workspace, you will find a file system in Bench under the project folder along with the basic and advanced tutorials. When opening that folder, you will see all the data that resides in your project.
WARNING: This is a fully mounted version of the project data. Changes in the workspace to project data cannot be undone.
The FUSE driver allows the user to easily copy data from /data/project to the local workspace and vice versa. There is a file size limit of 500 GB per file for the FUSE driver.
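As a minimal sketch (the paths below are illustrative, not taken from the original documentation), copying between the mounted project data and the local workspace is an ordinary file copy:

```bash
# Copy a file from the mounted project data into the local workspace (hypothetical paths)
cp /data/project/run1/sample1.fastq.gz ~/scratch/

# Copy results back into the project (written sequentially; see the write limitations below)
cp ~/scratch/report.html /data/project/results/
```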
The FUSE driver also allows you to delete data from your project. This differs from earlier Bench behavior, where you worked on a local copy and the original file remained in your project.
WARNING: Deleting project data through a Bench workspace via the FUSE driver will permanently delete the data in the Project. This action cannot be undone.
Using the FUSE driver through the CLI is not supported for Windows users. Linux users can use the CLI without any further actions; Mac users need to install the kernel extension from macFUSE.
macOS creates hidden metadata files beginning with ._ , which are copied over and exposed during CLI copies to your project data. These can be safely deleted from your project.
Mounting and unmounting data must be done through the CLI. In Bench this happens automatically, so no further action is needed there.
WARNING: Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss, as data in the destination location will be deleted.
❗️ Once a file is written, it cannot be changed! You will not be able to update it in the project location because of the restrictions mentioned above.
Trying to update files or to save your notebook in the project folder will typically result in an error such as File Save Error for fusedrivererror.ipynb Invalid response: 500 Internal Server Error.
Some examples of other actions or commands that will not work because of the above mentioned limitations:
Saving a Jupyter notebook or R script to the /project location
Adding or removing a file from an existing zip file
Redirecting with append to an existing file, e.g. echo "This will not work" >> myTextFile.txt
Renaming a file, due to the existing association between ICA and AWS
Moving files or folders
Using vi or another editor
A file can be written only sequentially. This restriction comes from the library the FUSE driver uses to store data in AWS; that library supports only sequential writing, and random writes are currently not supported. The FUSE driver will detect random writes and the write will fail with an IO error return code. Zip will not work since zip writes a table of contents at the end of the file; use gzip instead.
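For example (illustrative paths; a sketch of the behavior described above, not an exhaustive test):

```bash
# Works: gzip streams its output sequentially, so writing straight into the mount is fine
gzip -c ~/scratch/results.txt > /data/project/results/results.txt.gz

# Fails with an IO error: zip seeks back to write its central directory at the end of the file
zip /data/project/results/results.zip ~/scratch/results.txt
```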
Listing data (ls -l) reads data from the platform. The actual data comes from AWS and there can be a short delay between the writing of the data and the listing being up to date. As a result, a file that was just written may temporarily appear as a zero-length file, and a file that was just deleted may still appear in the file list. This is a tradeoff: the FUSE driver caches some information for a limited time, and during that time the information may appear out of date. Note that besides the FUSE driver, the library used to implement the raw FUSE protocol and the OS kernel itself may also do caching.
To use a specific file in a jupyter notebook, you will need to use '/data/project/filename'.
This functionality won't work for old workspaces unless you enable the permissions for that old workspace.
This tutorial shows you how to
monitor the execution
Start Bench workspace
For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:
If using a cluster, choose standard-small or standard-medium for the workspace master node
Otherwise, choose at least standard-large as nf-core pipelines often need more than 4 cores to run.
Select the single user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines
Specify at least 100GB of disk space
Optional: After choosing the image, enable a cluster with at least one standard-large instance type
Start the workspace, then (if applicable) start the cluster
If conda and/or nextflow are not installed, pipeline-dev will offer to install them.
The Nextflow files are pulled into the nextflow-src subdirectory.
A larger example that still runs quickly is nf-core/sarek
All nf-core pipelines conveniently define a "test" profile that specifies a set of validation inputs for the pipeline.
The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance.
The pipeline-dev tool uses "nextflow run ..." to run the pipeline. The full nextflow command is printed on stdout and can be copy-pasted and adjusted if you need additional options.
When a pipeline is running locally (i.e. not on a Bench cluster), you can monitor the task execution from another terminal with docker ps
When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:
qstat to see pending and running tasks
tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)
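For example (a sketch; the autoscaler log suffix depends on your workspace's latest reboot time):

```bash
# Watch the local Docker containers when running without a cluster
docker ps

# On a Bench cluster: list pending/running tasks and follow the autoscaler log
qstat
tail -f /data/logs/sge-scaler.log.*
```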
The output of the pipeline is in the outdir directory.
Nextflow work files are under the work directory.
Log files are .nextflow.log* and output.log.
After generating a few ICA-specific files (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that's what is checked here). It then asks if you want to update the latest version or create a new one.
Choose "3" and enter a name of your choice to avoid conflicts with other users following this same tutorial.
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
This launches an analysis in ICA Flow, using the same inputs as the nf-core pipeline's "test" profile.
Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder bench-pipeline-dev/temp-data.
This tutorial shows you how to
import an existing ICA Flow pipeline with a supporting validation analysis
monitor the execution
Iterative development: modify pipeline code and validate in Bench
Modify nextflow code
Modify Docker image contents (Dockerfile or Interactive method)
Make sure you have access in ICA Flow to:
the pipeline you want to work with
an analysis exercising this pipeline, preferably with a short execution time, to use as validation test
For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:
When using a cluster, choose standard-small or standard-medium for the workspace master node
Otherwise, choose at least standard-large if you re-import a pipeline that originally came from nf-core, as they typically need 4 or more CPUs to run.
Select the "single user workspace" permissions (aka "Access limited to workspace owner "), which allows us to deploy pipelines
Specify at least 100GB of disk space
Optional: After choosing the image, enable a cluster with at least one standard-large instance type.
Start the workspace, then (if applicable) also start the cluster
The starting point is the analysis id that is used as pipeline validation test (the pipeline id is obtained from the analysis metadata).
If no --analysis-id is provided, the tool lists all the successful analyses in the current project and lets the developer pick one.
If conda and/or nextflow are not installed, pipeline-dev will offer to install them.
The Nextflow files are pulled into the nextflow-src subdirectory.
The analysis inputs are converted into a "test" profile for nextflow, stored (among other items) in nextflow-bench.config.
The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance:
The pipeline-dev tool uses "nextflow run ..." to run the pipeline. The full nextflow command is printed on stdout and can be copy-pasted and adjusted if you need additional options.
When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:
qstat to see pending and running tasks
tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)
The output of the pipeline is in the outdir directory.
Nextflow work files are under the work directory.
Log files are .nextflow.log* and output.log.
Nextflow files (located in the nextflow-src directory) are easy to modify.
Depending on your environment (ssh access / docker image with JupyterLab or VNC, with or without Visual Studio Code), various source code editors can be used.
After modifying the source code, you can run a validation iteration with the same command as before:
Modifying the Docker image is the next step.
Nextflow (and ICA) allow the Docker images to be specified at different places:
in config files such as nextflow-src/nextflow.config
in nextflow code files; grep container may help locate the correct files:
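For example (a simple search from the project base directory; adjust the path if your layout differs):

```bash
# List every place a Docker image is referenced in the Nextflow sources
grep -rn "container" nextflow-src/
```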
Use case: Update some of the software (mimalloc) by compiling a new version
With the appropriate permissions, you can then "docker login" and "docker push" the new image.
Fun fact: VScode with the "Dev Containers" extension lets you edit the files inside your running container:
Beware that this extension creates a lot of temp files in /tmp and in $HOME/.vscode-server. Don't include them in your image...
Update the nextflow code and/or configs to use the new image
Validate your changes in Bench:
After generating a few ICA-specific files (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that's what is checked here).
It then asks if we want to update the latest version or create a new one.
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.
Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.
The Pipeline Development Kit in Bench makes it easy to create Nextflow pipelines for ICA Flow. This kit consists of a number of development tools which are installed in /data/.software (regardless of which Bench image is selected) and provides the following features:
Import to Bench
From public nf-core pipelines
From existing ICA Flow Nextflow pipelines
Run in Bench
Modify and re-run in Bench, providing fast development iterations
Deploy to Flow
Launch validation in Flow
Recommended workspace size: Nf-core Nextflow pipelines typically require 4 or more cores to run.
The pipeline development tools require
Conda, which is automatically installed by “pipeline-dev” if conda-miniconda.installer.ica-userspace.sh is present in the image.
Nextflow (version 24.10.2 is automatically installed using conda, or you can use other versions)
jq, curl (which should be made available in the image)
JupyterLab version 1.2.2 (or higher)
Pipeline development tools work best when the following items are defined:
Nextflow profiles:
test profile, specifying inputs appropriate for a validation run
docker profile, instructing NextFlow to use Docker
nextflow_schema.json, as described here. This is useful for the launch UI generation. The nf-core CLI tool (installable via pip install nf-core) offers extensive help to create and maintain this schema.
ICA Flow adds one additional constraint: the output directory out is the only one automatically copied to the Project data when an ICA Flow Analysis completes. The --outdir parameter recommended by nf-core should therefore be set to --outdir=out when running as a Flow pipeline.
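For example, a sketch of a Bench validation run that follows this convention (profile names follow the nf-core conventions described above):

```bash
# Run the pipeline's validation inputs with Docker, writing results where ICA Flow expects them
nextflow run nextflow-src -profile test,docker --outdir=out
```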
These are installed in /data/.software (which should be in your $PATH); the pipeline-dev script is the front-end to the other pipeline-dev-* tools.
Pipeline-dev fulfils a number of roles:
Checks that the environment contains the required tools (conda, nextflow, etc) and offers to install them if needed.
Checks that the fast data mounts are present (/data/mounts/project etc.) – it is useful to check regularly, as they get unmounted when a workspace is stopped and restarted.
Redirects stdout and stderr to .pipeline-dev.log, with the history of log files kept as .pipeline-dev.log.<log date>.
Launches the appropriate sub-tool.
Prints out errors with backtrace, to help report issues.
A pipeline-dev project relies on the following directory structure, which is auto-generated when using the pipeline-dev import* tools.
If you start a project manually, you must follow the same directory structure.
Project base directory
  nextflow-src: Platform-agnostic Nextflow code, for example the github contents of an nf-core pipeline, or your usual nextflow source code.
    main.nf
    nextflow.config
    nextflow_schema.json
  pipeline-dev.project-info: contains project name, description, etc.
  nextflow-bench.config (automatically generated when needed): contains definitions for Bench.
  ica-flow-config: Directory of files used when deploying the pipeline to Flow.
    inputForm.json (if not present, gets generated from nextflow-src/nextflow_schema.json): input form as defined in ICA Flow.
    onrender.js (generated at the same time as inputForm.json): javascript code to go with the input form.
    launchPayload_inputFormValues.json (if not present, gets generated from the test profile): used by “pipeline-dev launch-validation-in-flow”.
The above-mentioned project structure must be generated manually. The nf-core CLI tools can assist in generating the nextflow_schema.json. Tutorial Pipeline from Scratch goes into more details about this use case.
A directory with the same name as the nextflow/nf-core pipeline is created, and the Nextflow files are pulled into the nextflow-src subdirectory.
Tutorial Nf Core Pipelines goes into more details about this use case.
A directory called imported-flow-analysis is created.
Pipeline assets are downloaded into the nextflow-src sub-directory.
Analysis input specs are downloaded as an ica_....json file.
They are converted into a Nextflow test profile, stored in nextflow-bench.config.
Tutorial Updating an Existing Flow Pipeline goes into more details about this use case.
Currently only pipelines with publicly available Docker images are supported. Pipelines with ICA-stored images are not yet supported.
Optional parameters --local / --sge can be added to force the execution on the local workspace node, or on the workspace cluster (when available). Otherwise, the presence of a cluster is automatically detected and used.
The script then launches nextflow. The full nextflow command line is printed and launched.
In case of errors, full logs are saved as .pipeline-dev.log
Currently, not all corner cases are covered by command line options. Please start from the nextflow command printed by the tool and extend it based on your specific needs.
Nextflow can run processes with and without Docker images. In the context of pipeline development, the pipeline-dev tools assume Docker images are used, in particular during execution with the nextflow --profile docker option.
In Nextflow, Docker images can be specified at the process level. This is done with the container "<image_name:version>" directive, which can be specified
in nextflow config files (preferred method when following the nf-core best practices)
or at the start of each process definition.
Each process can use a different docker image.
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.
Resources such as the number of CPUs and memory can be specified as described here. See Containers or our tutorials for details about the Nextflow-Docker syntax.
Bench can push/pull/create/modify Docker images, as described in Containers.
This command does the following:
Generate the JSON file describing the ICA Flow user interface.
If ica-flow-config/inputForm.json doesn’t exist: generate it from nextflow-src/nextflow_schema.json.
Generate the JSON file containing the validation launch inputs.
If ica-flow-config/launchPayload_inputFormValues.json doesn’t exist: generate it from the nextflow --profile test inputs.
If local files are used as validation inputs or as default input values:
copy them to /data/project/pipeline-dev-files/temp.
get their ICA file ids.
use these file ids in the launch specifications.
If remote files are used as validation inputs or as default input values of an input of type “file” (and not “string”): do the same as above.
Identify the pipeline name to use for this new pipeline deployment:
If a deployment has already occurred in this project, or if the project was imported from an existing Flow pipeline, start from this pipeline name. Otherwise start from the project name.
Identify which already-deployed pipelines have the same base name, with or without suffixes that could be some versioning (_v<number>, _<number>, _<date>).
Ask the user if they prefer to update the current version of the pipeline, create a new version, or enter a new name of their choice – or use the --create/--update parameters when specified, for scripting without user interaction.
A new ICA Flow pipeline gets created (except in the case of a pipeline update).
The current Nextflow version in Bench is used to select the best Nextflow version to be used in Flow.
The nextflow-src directory is uploaded file by file as pipeline assets.
Output Example:
The pipeline name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.
Opening the URL of the pipeline and clicking on Start Analysis shows the generated user interface:
The ica_... file generated in the previous step is submitted to ICA Flow to start an analysis with the same validation inputs as “nextflow --profile test”.
Output Example:
The analysis name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.
This tutorial shows you how to start a new pipeline from scratch
Start Bench workspace
For this tutorial, any instance size will work, even the smallest standard-small.
Select the single user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines.
A small amount of disk space (10GB) will be enough.
We are going to wrap the "gzip" linux compression tool with inputs:
1 file
compression level: integer between 1 and 9
We intentionally do not include sanity checks, to keep this scenario simple.
Here is an example of Nextflow code that wraps the gzip command and publishes the final output in the “out” directory:
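A minimal sketch of what such a wrapper might contain, written here as a shell heredoc (the parameter names params.file and params.compression_level are illustrative assumptions; the actual tutorial code may differ):

```bash
mkdir -p nextflow-src
cat > nextflow-src/main.nf <<'EOF'
nextflow.enable.dsl = 2

// Illustrative parameter names (assumptions, not official tutorial values)
params.file              = null   // file to compress
params.compression_level = 6      // integer between 1 and 9

process GZIP {
    // "out" is the directory ICA Flow copies back to Project data
    publishDir 'out', mode: 'copy'

    input:
    path input_file

    output:
    path "${input_file}.gz"

    script:
    """
    gzip -c -${params.compression_level} ${input_file} > ${input_file}.gz
    """
}

workflow {
    GZIP(Channel.fromPath(params.file))
}
EOF
```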
Save this file as nextflow-src/main.nf, and check that it works:
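One way to check it (hypothetical test file and values):

```bash
# Create a small test input and run the wrapper locally
echo "hello pipeline-dev" > /tmp/example.txt
nextflow run nextflow-src/main.nf --file /tmp/example.txt --compression_level 9
ls out/
```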
We now need to:
Use Docker
Follow some nf-core best practices to make our source+test compatible with the pipeline-dev tools
In NextFlow, Docker images can be specified at the process level
Each process may use a different docker image
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.
Specifying the Docker image is done with the container '<image_name:version>' directive, which can be specified
at the start of each process definition
or in nextflow config files (preferred when following nf-core guidelines)
For example, create nextflow-src/nextflow.config:
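A minimal sketch of such a config (the ubuntu:22.04 image is just an illustration; any image that provides gzip works):

```bash
cat > nextflow-src/nextflow.config <<'EOF'
// Default container for every process (can also be set per process)
process.container = 'ubuntu:22.04'
EOF
```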
We can now run with nextflow's -with-docker option:
Following some nf-core best practices to make our source+test compatible with the pipeline-dev tools:
Here is an example of a “test” profile that can be added to nextflow-src/nextflow.config to define some input values appropriate for a validation run:
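A sketch of what this could look like, reusing the illustrative parameter names from the main.nf sketch above:

```bash
cat >> nextflow-src/nextflow.config <<'EOF'
profiles {
    test {
        params.file              = '/tmp/example.txt'  // hypothetical validation input
        params.compression_level = 9
    }
}
EOF
```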
With this profile defined, we can now run the same test as before with this command:
A “docker” profile is also present in all nf-core pipelines. Our pipeline-dev tools will make use of it, so let’s define it:
We can now run the same test as before with this command:
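For example (a sketch, assuming the test and docker profiles described above are defined in nextflow-src/nextflow.config):

```bash
nextflow run nextflow-src -profile test,docker
```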
We also have enough structure in place to start using the pipeline-dev command:
In order to deploy our pipeline to ICA, we need to generate the user interface input form.
This is done by using nf-core's recommended nextflow_schema.json.
For our simple example, we generate a minimal one by hand (using one of the nf-core pipelines as an example):
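A sketch of what a minimal, hand-written schema might look like (the property names mirror the illustrative parameters used earlier; the exact fields your pipeline and the pipeline-dev tools need may differ):

```bash
cat > nextflow-src/nextflow_schema.json <<'EOF'
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "title": "gzip-wrapper pipeline parameters",
  "description": "Minimal hand-written schema for the gzip wrapper example",
  "type": "object",
  "properties": {
    "file": {
      "type": "string",
      "format": "file-path",
      "description": "Input file to compress"
    },
    "compression_level": {
      "type": "integer",
      "minimum": 1,
      "maximum": 9,
      "default": 6,
      "description": "gzip compression level"
    }
  }
}
EOF
```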
In the next step, this gets converted to the ica-flow-config/inputForm.json file.
Note: For large pipelines, as described on the nf-core website:
Manually building JSONSchema documents is not trivial and can be very error prone. Instead, the nf-core pipelines schema build command collects your pipeline parameters and gives interactive prompts about any missing or unexpected params. If no existing schema is found it will create one for you.
We recommend looking into "nf-core pipelines schema build -d nextflow-src/", which comes with a web builder to add descriptions etc.
We just need to create a final file, which we had skipped until now: our project description file, which can be created via the command pipeline-dev project-info --init:
We can now run:
After generating the ICA-Flow-specific files in the ica-flow-config directory (JSON input specs for the Flow launch UI + list of inputs for the next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name).
It then asks if we want to update the latest version or create a new one.
Choose "3" and enter a name of your choice to avoid conflicts with all the others users following this same tutorial.
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.
Some of the input files will have been copied to your ICA project in order for the analysis launch to work. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.
ICA Cohorts can pull any molecular data available in an ICA Project, as well as additional sample- and subject-level metadata information such as demographics, biometrics, sequencing technology, phenotypes, and diseases.
To import a new data set, select Import Jobs from the left navigation tab underneath Cohorts, and click the Import Files button. The Import Files button is also available under the Data Sets left navigation item.
The Data Sets menu item is used to view imported data sets and information. The Import Jobs menu item is used to check the status of data set imports.
Confirm that the project shown is the ICA Project that contains the molecular data you would like to add to ICA Cohorts.
Choose a data type among
Germline variants
Somatic mutations
RNAseq
GWAS
Choose a new study name by selecting the radio button Create new study and entering a Study Name.
To add new data to an existing Study, select the radio button Select from list of studies and select an existing Study Name from the dropdown.
To add data to existing records or add new records, set Job Type to Append. Append does not wipe out any data ingested previously and can be used to ingest the molecular data in an incremental manner.
To replace data, set Job Type to Replace. If you are ingesting data again, use the Replace job type.
Enter an optional Study description.
Select the metadata model (default: Cohorts; alternatively, select OMOP version 5.4 if your data is formatted that way.)
Select the genome build your molecular data is aligned to (default: GRCh38/hg38)
For RNAseq, specify whether you want to run differential expression (see below) or only upload raw TPM.
Click Next.
Navigate to VCFs located in the Project Data.
Select each single-sample VCF or multi-sample VCF to ingest. For GWAS, select CSV files produced by Regenie.
As an alternative to selecting individual files, you can also opt to select a folder instead. Toggle the radio button on Step 2 from "Select files" to "Select folder".
This option is currently only available for germline variant ingestion: any combination of small variants, structural variation, and/or copy number variants.
ICA Cohorts will scan the selected folder and all sub-folders for any VCF files or JSON files and try to match them against the Sample ID column in the metadata TSV file (Step 3).
Files not matching sample IDs will be ignored; allowed file extensions for VCF files after the sample ID are: *.vcf.gz, *.hard-filtered.vcf.gz, *.cnv.vcf.gz, and *.sv.vcf.gz .
Files not matching sample IDs will be ignored; allowed file extensions for JSON files after the sample ID are: *.json, *.json.gz, *.json.bgz, and *.json.gzip.
Click Next.
Navigate to the metadata (phenotype) data tsv in the project Data.
Select the TSV file or files for ingestion.
Click Finish.
Search spinner behavior in the import jobs table:
Search for a term and press Enter.
The search spinner will appear while the results are loading.
Once the results are displayed in the table, the spinner will disappear.
All VCF types, specifically from DRAGEN, can be ingested using the Germline variants selection. Cohorts will distinguish the variant types that it is ingesting. If Cohorts cannot determine the variant file type, it will default to ingest small variants.
As an alternative to VCFs, you can select Nirvana JSON files for DNA variants: small variants, structural variants, and copy number variation.
The maximum number of files in a single manual ingestion batch is 1000.
Alternatively, users can choose a single folder and ICA Cohorts will identify all ingestible files within that folder and its sub-folders. In this scenario, Cohorts will select molecular data files matching the samples listed in the metadata sheet, which is provided in the next step of the import process.
Users have the option to ingest either VCF files or Nirvana JSON files for any given batch, regardless of the chosen ingestion method.
The sample identifiers used in the VCF columns need to match the sample identifiers used in subject/sample metadata files; accordingly, if you are starting from JSON files containing variant- and gene-level annotations provided by ILMN Nirvana, the samples listed in the header need to match the metadata files.
ICA Cohorts supports VCF files formatted according to VCF v4.2 and v4.3 specifications. VCF files require at least one of the following header rows to identify the genome build:
##reference=file://... --- needs to contain a reference to hg38/GRCh38 in the file path or name (numerical value is sufficient)
##contig=<ID=chr1,length=248956422> --- for hg38/GRCh38
##DRAGENCommandLine= ... --ht-reference
ICA Cohorts accepts VCFs aligned to hg38/GRCh38 and hg19/GRCh37. If your data uses hg19/GRCh37 coordinates, Cohorts will convert these to hg38/GRCh38 during the ingestion process [see Reference 1]. Harmonizing data to one genome build facilitates searches across different private, shared, and public projects when building and analyzing a cohort. If your data contains a mixture of samples mapped to hg38 and hg19, please ingest these in separate batches, as each import job into Cohorts is limited to one genome build.
ICA Cohorts can process gene- and transcript-level quantification files produced by the Illumina DRAGEN RNA pipeline. The file naming convention needs to match .quant.genes.sf for genes and .quant.sf for transcript-level TPM (transcripts per million).
Note: If annotating large sets of samples with molecular data, expect the annotation process to take over 20 minutes per whole genome batch of samples. You will receive two e-mail notifications: one when your ingestion starts and one when it completes successfully or fails.
PERSON (mandatory),
CONCEPT (mandatory if any of the following is provided),
CONDITION_OCCURRENCE (optional),
DRUG_EXPOSURE (optional), and
PROCEDURE_OCCURRENCE (optional.)
Additional files such as measurement and observation will be supported in a subsequent release of Cohorts.
Note that Cohorts requires that all such files do not deviate from the OMOP CDM 5.4 standard. Depending on your implementation, you may have to adjust file formatting to be OMOP CDM 5.4-compatible.
[1] VcfMapper: https://stratus-documentation-us-east-1-public.s3.amazonaws.com/downloads/cohorts/main_vcfmapper.py
[2] crossMap: https://crossmap.sourceforge.net/
[3] liftOver: https://genome.ucsc.edu/cgi-bin/hgLiftOver
In ICA Cohorts, metadata describe any subjects and samples imported into the system in terms of attributes, including:
subject:
demographics such as age, sex, ancestry;
phenotypes and diseases;
biometrics such as body height, body mass index, etc.;
pathological classification, tumor stages, etc.;
family and patient medical history;
sample:
sample type such as FFPE,
tissue type,
sequencing technology: whole genome DNA-sequencing, RNAseq, single-cell RNAseq, among others.
A metadata sheet will need to contain at least these four columns per row:
Subject ID - identifier referring to individuals; use the column header "SubjectID".
Sample ID - identifier for a sample. Sample IDs need to match the corresponding column header in VCF/GVCFs; each subject can have multiple samples, these need to be specified in individual rows for the same SubjectID; use the column header "SampleID".
Biological sex - can be "Female (XX)", "Female"; "Male (XY)", "Male"; "X (Turner's)"; "XXY (Klinefelter)"; "XYY"; "XXXY" or "Not provided". Use the column header "DM_Sex" (demographics).
Sequencing technology - can be "Whole genome sequencing", "Whole exome sequencing", "Targeted sequencing panels", or "RNA-seq"; use the column header "TC" (technology).
ICA Cohorts is a cohort analysis tool integrated with Illumina Connected Analytics (ICA). ICA Cohorts combines subject- and sample-level metadata, such as phenotypes, diseases, demographics, and biometrics, with molecular data stored in ICA to perform tertiary analyses on selected subsets of individuals.
Intuitive UI for selecting subjects and samples to analyze and compare: deep phenotypical and clinical metadata, molecular features including germline, somatic, gene expression.
Comprehensive, harmonized data model exposed to ICA Base and ICA Bench users for custom analyses.
Run analyses in ICA Base and ICA Bench and upload final results back into Cohorts for visualization.
Out-of-the-box statistical analyses including genetic burden tests, GWAS/PheWAS.
Rich public data sets covering key disease areas to enrich private data analysis.
Easy-to-use visualizations for gene prioritization and genetic variation inspection.
As an alternative to VCFs, ICA Cohorts accepts the JSON output of Nirvana for hg38/GRCh38-aligned data for small germline variants and somatic mutations, copy number variations, and other structural variants.
Please also see the online documentation for more information on output file formats.
ICA Cohorts currently supports upload of SNV-level GWAS results produced by Regenie and saved as CSV files.
As an alternative to ICA Cohorts' metadata file format, you can provide files formatted according to the OMOP CDM 5.4 standard. Cohorts currently ingests data for these OMOP 5.4 tables, formatted as tab-delimited files:
[4] Chain files:
You can use these attributes while creating a cohort to define the cases and/or controls that you want to include.
During data import, you will be asked to upload a metadata sheet as a tab-delimited (TSV) file. An example sheet is available for download on the Import files page in the ICA Cohorts UI.
A description of all attributes and data types currently supported by ICA Cohorts can be found here:
You can download an example of a metadata sheet, which contains some samples from The Cancer Genome Atlas (TCGA) and their publicly available clinical attributes, here:
A list of concepts and diagnoses that cover all public data subjects to easily navigate the new concept code browser for diagnosis can be found here:
This video is an overview of Illumina Connected Analytics. It walks through a Multi-Omics Cancer workflow that can be found here:
ICA Cohorts contains a variety of freely available data sets covering different disease areas and sequencing technologies. For a list of currently available data sets, see:
Project name: The ICA project for your cohort analysis (cannot be changed).
Study name: Create or select a study. Each study represents a subset of data within the project.
Description: Short description of the data set (optional).
Job type: Append appends values to any existing values (if a field supports only a single value, the value is replaced). Replace overwrites existing values with the values in the uploaded file.
Subject metadata files: Subject metadata file(s) in tab-delimited format. For Append and Replace job types, the following fields are required and cannot be changed: Sample identifier, Sample display name, Subject identifier, Subject display name, Sex.
From the Cohorts menu in the left-hand navigation, select a cohort created in Create Cohort to begin a cohort analysis.
The query details can be accessed by clicking the triangle next to Show Query Details. The query details display the selections used to create the cohort. The selections can be edited by clicking the pencil icon in the top right.
Charts will be open by default. If not, click Show Charts.
Use the gear icon in the top-right to change viewable chart settings.
There are four charts available to view summary counts of attributes within a cohort as histogram plots.
Click Hide Charts to hide the histograms.
Display time-stamped events and observations for a single subject on a timeline. The timeline view is only available for subjects that have time-series data.
The following attributes are displayed in the timeline view:
• Diagnosed and Self-Reported Diseases: start and end dates; progression vs. remission
• Medication and Other Treatments: prescribed and self-medicated; start date, end date, and dosage at every time point
The timeline utilizes age (at diagnosis, at event, at measurement) as the x-axis and attribute name as the y-axis. If the birthdate is not recorded for a subject, the user can now switch to Date to visualize data.
In the default view, the timeline shows the first five disease data and the first five drug/medication data in the plot. Users can choose different attributes or change the order of existing attributes by clicking on the “select attribute” button.
The x-axis shows the person’s age in years, with data points initially displayed between ages 0 to 100. Users can zoom in by selecting the desired range to visualize data points within the selected age range.
Each event is represented by a dot in the corresponding track. Events in the same track can be connected by lines to indicate the start and end period of an event.
Measurement Section: A summary of measurements (without values) is displayed under the section titled "Measurements and Laboratory Values Available." Users can click a link to access the Timeline View for detailed results.
Drug Section: The "Drug Name" section lists drug names without repeating the header "Drug Name" for each entry.
By default, the Subjects tab is displayed.
The Subjects tab, with a list of all subjects matching your criteria, is displayed below Charts with a link to each Subject by ID and other high-level information. By clicking a subject ID, you will be brought to the data collected at the Subject level.
Search for a specific subject by typing the Subject ID into the Search Subjects text box.
Get all details available on a subject by clicking the hyperlinked Subject ID in the Subject list.
To exclude specific subjects from subsequent analysis, such as marker frequencies or gene-level aggregated views, you can uncheck the box at the beginning of each row in the subject list. You will then be prompted to save any exclusion(s).
You can export the list of subjects either to your ICA Project's data folder or to your local disk as a TSV file for subsequent use. Any export will omit subjects that you excluded after you saved those changes. For more information, see the bottom of this page.
Specific subjects can be removed from a Cohort.
Select the Subjects tab.
Subjects in the Cohort are checked by default.
To remove specific subjects from the Cohort, uncheck the checkbox next to those subjects.
Checkbox selections are maintained while browsing through the pages of the subject list.
Click Save Cohort to save the subjects you would like to exclude.
The excluded subjects will no longer be counted in any analysis visualizations.
The excluded subjects will be saved for the Cohort.
To add the subjects back to the Cohort, select the checkboxes again and click Save Cohort.
For each individual cohort, display a table of all observed SVs that overlap with a given gene.
Click the Marker Frequency tab, then click the Gene Expression tab.
Down-regulated genes are displayed in blue and up-regulated genes are displayed in red.
A frequency in the Cohort is displayed, and the Matching number/Total is also displayed in the chart.
Genes can be searched by using the Search Genes text box.
You are brought to the Gene tab under the Gene Summary sub-tab.
Select a Gene by typing the gene name into the Search Genes text box.
A Gene Summary will be displayed that lists information and links to public resources about the selected gene.
A cytogenetic map will be displayed based on the selected gene, and a vertical orange bar represents the gene location on the chromosome.
Click the Variants tab and Show legend and filters if it does not open by default.
Below the interactive legend, you see a set of analysis tracks: Needle Plot, Primate AI, Pathogenic variants, and Exons.
The Needle Plot allows toggling the plot by gnomAD frequency and Sample Count. Select Sample Count in the Plot by legend above the plot. You can also filter the plot to only show variants above/below a certain cut-off for gnomAD frequency (in percent) or absolute sample count.
The Needle Plot allows filtering by PrimateAI Score.
Set a lower (>=) or upper (<=) threshold for the PrimateAI Score to filter variants.
Enter the threshold value in the text box located below the gnomadFreq/SampleCount input box.
If no threshold value is entered, no filter will be applied.
The filter affects both the plot and the table when the “Display only variants shown in the plot above” toggle is enabled.
Filter preferences persist across gene views for a seamless experience.
Click on a variant's needle pin to view details about the variant from public resources and counts of variants in the selected cohort by disease category. If you want to view all subjects that carry the given variant, click on the sample count link, which will take you to the list of subjects (see above).
Use the Exon zoom bar from each end of the Amino Acid sequence to zoom in on the gene domain to better separate observations.
The Pathogenic Variant track shows pop-up details with pathogenicity calls, phenotypes, submitter, and a link to the ClinVar entry when hovering over the purple triangles.
Below the needle plot is a full listing of the variants displayed in the needle plot visualization.
The "Display only variants shown in the plot above" toggle (enabled by default) syncs the table with the Needle Plot. When the toggle is on, the table displays only the variants shown in the Needle Plot, applying all active filters (e.g., variant type, somatic/germline, sample count). When the toggle is off, all reported variants are displayed in the table and table-based filters can be used.
Export to CSV: When the views are synchronized (toggle on), the filtered list of variants can be exported to a CSV file for further analysis.
The Phenotypes tab shows a stacked horizontal bar chart which displays the molecular breakdown (disease type vs. gene) and subject count for the selected gene.
Note on "Stop Lost" Consequence Variants:
The
stop_lost
consequence is mapped asFrameshift, Stop lost
in the tooltip.The l
Stop gained|lost
value includes both stop gain and stop loss variants.When the Stop gained filter is applied, Stop lost variants will not appear in the plot or table if the "Display only variants shown in the plot above" toggle is enabled
The Gene Expression tab shows known gene expression data from tissue types in GTEx.
The Genetic Burden Test is only available for de novo variants.
For every correlation, subjects contained in each count can be viewed by selecting the count on the bubble or the count on the X-axis and Y-axis.
Click the Correlation tab.
In X-axis category, select Clinical.
In X-axis Attribute, select a clinical attribute.
In Y-axis category, select Clinical.
In Y-axis Attribute, select another clinical attribute.
You will be shown a bubble plot comparing the first clinical attribute on the x-axis to the second attribute on the y-axis.
The size of the bubbles corresponds to the number of subjects falling into those categories.
To see a breakdown of Somatic Mutations vs. RNA Expression levels perform the following steps:
Note this comparison is for a Cancer case.
Click the Correlation tab.
In X-axis category, select Somatic.
In X-axis Attribute, select a gene.
In Y-axis category, select RNA expression.
In Y-axis Attribute, type a gene and leave Reference Type set to NORMAL.
Click Continuous to see violin plots of the compared variables.
Note this comparison is for a Cancer case.
Click the Correlation tab.
In X-axis category, select Somatic.
In X-axis Attribute, type a gene name.
In Y-axis category, select Clinical.
In Y-axis Attribute, select a clinical attribute.
Click the Molecular Breakdown tab.
In Enter a clinical attribute, select a clinical attribute.
In Enter a gene, select a gene by typing a gene name.
You are shown a stacked bar chart with the selected clinical attribute values on the Y-axis.
For each attribute value, the bar represents the % of Subjects with RNA Expression, Somatic Mutation, and Multiple Alterations.
Note: for each of the aforementioned bubble plots, you can view the list of subjects by following the link under each subject count associated with an individual bubble or axis label. This will take you to the list of subjects view, see above.
If there is Copy Number Variant data in the cohort:
Click the CNV tab.
A graph will show the CNV Sample Percentage on the Y-axis and Chromosomes on the X-axis.
Any value above zero is a copy number gain, and any value below zero is a copy number loss.
Click Chromosome: to select a specific chromosome position.
ICA allows for integrated analysis in a computation workspace. You can export your cohort definitions and, in combination with molecular data in your ICA Project Data, perform, for example, a GWAS analysis.
Confirm the VCF data for your analysis is in ICA Project Data.
From within your ICA Project, Start a Bench Workspace -- See Bench for more details.
Navigate back to ICA Cohorts.
Create a Cohort of subjects of interest using Create a Cohort.
From the Subjects tab, click Export subjects... at the top-right of the subject list. The file can be downloaded to the Browser or to ICA Project Data.
We suggest using export ...to Data Folder for immediate access to this data in Bench or other areas of ICA.
Create another cohort if needed for your research and repeat the last three steps.
Navigate to the Bench workspace created in the second step.
After the workspace has started up, click Access.
Find the /Project/ folder in the Workspace file navigation.
This folder will contain your cohort files created along with any pipeline output data needed for your workspace analysis.
This walk-through is meant to represent a typical workflow when building and studying a cohort of rare genetic disorder cases.
Create a new Project to track your study:
Log in to ICA.
Navigate to Projects.
Create a new project using the New Project button.
Give your project a name and click Save.
Navigate to the ICA Cohorts module by clicking COHORTS in the left navigation panel, then choose Cohorts.
Click the Create Cohort button.
Enter a name for your cohort, like Rare Disease + 1kGP, at the top, left of the pencil icon.
From the Public Data Sets list select:
DRAGEN-1kGP
All Rare genetic disease cohorts
Notice that a cohort can also be created based on Technology, Disease Type, and Tissue.
Under Selected Conditions in the right panel, click Apply.
A new page opens with your cohort in a top-level tab.
Expand Query Details to see the study makeup of your cohort.
A set of 4 Charts will be open by default. If they are not, click Show Charts.
Use the gear icon in the top-right of the Charts pane to change chart settings.
The bottom section is demarcated by 8 tabs (Subjects, Marker Frequency, Genes, GWAS, PheWAS, Correlation, Molecular Breakdown, CNV).
The Subjects tab displays a list of exportable Subject IDs and attributes.
Clicking on a Subject ID link pops up a Subject details page.
A recent GWAS publication identified 10 risk genes for intellectual disability (ID) and autism. Our task is to evaluate them in ICA Cohorts: TTN, PKHD1, ANKRD11, ARID1B, ASXL3, SCN2A, FHL1, KMT2A, DDX3X, SYNGAP1.
First let’s Hide charts for more visual space.
Click the Genes tab, where you need to query a gene to see and interact with results.
Type SCN2A into the Gene search field and select it from the autocomplete dropdown options.
The Gene Summary tab now lists information and links to public resources about SCN2A.
Click on the Variants tab to see an interactive Legend and analysis tracks.
The Needle Plot displays gnomAD Allele Frequency for variants in your cohort.
Note that some are in SCN2A conserved protein domains.
In the Legend, switch the Plot by option to Sample Count in your cohort.
In the Legend, uncheck all Variant Types except Stop gained. Now you should see 7 variants.
Hover over pin heads to see pop-up information about particular variants.
The Primate AI track displays Scores for potential missense variants, based on polymorphisms observed in primate species. Points above the dashed line for the 75th percentile may be considered "likely pathogenic", as cross-species sequence is highly conserved; you often see high conservation at the functional domains. Points below the 25th percentile may be considered "likely benign".
The Pathogenic variants track displays markers from ClinVar, color-coded by variant type. Hover over them to see pop-ups with more information.
The Exons track shows mRNA exon boundaries with click and zoom functionality at the ends.
Below the Needle Plot and analysis tracks is a list of "Variants observed in the selected cohort".
The Export Gene Variants table icon is above the legend on the right side.
Now let's click on the Gene Expression tab to see a bar chart of 50 normal tissues from GTEx in transcripts per million (TPM). SCN2A is highly expressed in certain brain tissues, indicating specificity to where good markers for intellectual disability and autism could be expected.
As a final exercise in discovering good markers, click on the tab for Genetic Burden Test. The table here associates Phenotypes with Mutations Observed in each Study selected for our cohort, alongside Mutations Expected, to derive p-values. Given all considerations above, SCN2A is a good marker for intellectual disability (p < 1.465 x 10^-22) and autism (p < 5.290 x 10^-9).
Continue to check the other genes of interest in step 1.
ICA Cohorts comes front-loaded with a variety of publicly accessible data sets, covering multiple disease areas and also including healthy individuals.
This walk-through is intended to represent a typical workflow when building and studying a cohort of oncology cases.
Click the Create Cohort button.
Select the following studies to add to your cohort:
TCGA – BRCA – Breast Invasive Carcinoma
TCGA – Ovarian Serous Cystadenocarcinoma
Add a Cohort Name = TCGA Breast and Ovarian_1472
Click on Apply.
Expand Show query details to see the study makeup of your cohort.
Charts will be open by default. If not, click Show charts.
Use the gear icon in the top-right to change viewable chart settings.
Tip: Disease Type, Histological Diagnosis, Technology, and Overall Survival have interesting data about this cohort.
The Subject tab with the list of all Subjects is displayed below Charts, with a link to each Subject by ID and other high-level information, like Data Types measured and reported. By clicking a subject ID, you will be brought to the data collected at the Subject level.
Search for subject TCGA-E2-A14Y and view the data about this Subject.
Click the TCGA-E2-A14Y Subject ID link to view clinical data for this Subject that was imported via the metadata.tsv file on ingest.
Note: the Subject is a 35 year old Female with vital status and other phenotypes that feed up into the Subject attribute selection criteria when creating or editing cohorts.
Click X to close the Subject details.
Click Hide charts to increase the interactive landscape.
Click the Marker Frequency tab, then click the Somatic Mutation tab.
Review the gene list and mutation frequencies.
Note that PIK3CA has a high rate of mutation in the Cohort (ranked 2nd with 33% mutation frequency in 326 of the 987 Subjects that have Somatic Mutation data in this cohort).
Do Subjects with PIK3CA mutations have changes in PIK3CA RNA Expression?
Click the Gene Expression
tab, search for PIK3CA
PIK3CA RNA is down-regulated in 27% of the subjects relative to normal samples.
Switch from normal
to disease
Reference where the Subject’s denominator is the median of all disease samples in your cohort.
Note the count of matching vs. total subjects that have PIK3CA up-regulated RNA, which may indicate a distinctive sub-phenotype.
Click directly on PIK3CA
gene link in the Gene Expression
table.
You are brought to the Gene
tab under the Gene Summary
sub-tab that lists information and links to public resources about PIK3CA.
Click the Variants
tab and Show legend and filters
if it does not open by default.
Below the interactive legend you see a set of analysis tracks: Needle Plot, Primate AI, Pathogenic variants, and Exons.
The Needle Plot allows toggling the plot by gnomAD frequency
and Sample Count
. Select Sample Count
in the Plot by
legend above the plot.
There are 87 mutations distributed across the 1068-amino-acid sequence, listed below the analysis tracks. These can be exported into a table via the icon.
We know that missense variants can severely disrupt translated protein activity. Deselect all Variant Types
except for Missense
from the Show Variant Type
legend above the needle plot.
Many mutations are in the functional domains of the protein as seen by the colored boxes and labels on the x-axis of the Needle Plot.
Hover over the variant with the highest sample count in the yellow PI3Ka
protein domain.
The pop-up shows variant details for the 64 Subjects observed with it: 63 in the Breast Cancer study and 1 in the Ovarian Cancer Study.
Use the Exon zoom bar from each end of the Amino Acid sequence to zoom in to the PI3Ka
domain to better separate observations.
There are three different missense mutations at this locus, changing the wildtype glutamic acid at different frequencies to lysine (64), glycine (6), or alanine (2).
The Pathogenic Variant
track shows 7 ClinVar entries for mutations stacked at this locus affecting amino acid 545. Pop-up details with pathogenicity calls, phenotypes, submitter, and a link to the ClinVar entry are shown by hovering over the purple triangles.
Note the Primate AI
track and high Primate AI score.
Primate AI
track displays scores for potential missense variants, based on polymorphisms observed in primate species. Points above the dashed line for the 75th percentile may be considered likely pathogenic because the cross-species sequence is highly conserved; you often see high conservation at the functional domains. Points below the 25th percentile may be considered "likely benign".
Click the Expression
tab and notice that normal breast and normal ovarian tissue have relatively high PIK3CA RNA expression in GTEx RNAseq tissue data, although the gene is ubiquitously expressed.
You can compare up to four previously created individual cohorts, to view differences in variants and mutations, RNA expression, copy number variation, and distribution of clinical attributes. Once comparisons are created, they are saved in the Comparisons
left-navigation tab of the Cohorts module.
Select Cohorts
from the left-navigation panel.
Select 2 to 4 cohorts already created. If you have not created any cohorts, see the Create a Cohort documentation.
Click Compare Cohorts
in the right-navigation panel.
Note you are now in the Comparisons
left-navigation tab of the Cohorts module.
In the Charts
Section, if the COHORTS
item is not displayed, click the gear icon in the top right and add Cohorts
as the first attribute and click Save
.
The COHORTS
item in the charts panel will provide a count of subjects in each cohort and act as a legend for color representation throughout comparison screens.
For each clinical attribute category, a bar chart is displayed. Use the gear icon to select attributes to display in the charts panel.
You can share a comparison with other team members in the same ICA Project. Please refer to the "Sharing a Cohort" section in "Create a Cohort" for details on sharing, unsharing, deleting, and archiving, which apply analogously to comparisons.
Select the Attributes
tab
Attribute categories are listed and can be expanded using the down-arrows next to the category names. The categories available are based on cohorts selected. Categories and attributes are part of the ICA Cohorts metadata template that map to each Subject.
For example, use the drop-down arrow next to Vital status
to view sub-categories and frequencies across selected cohorts.
Select the Genes
tab
Search for a gene of interest using its HUGO/HGNC gene symbol
As additional filter options, you can view only those variants that occur in every cohort, that are unique to one cohort, or that have been observed in at least two cohorts; or you can view any variant.
Select the Survival Summary
tab.
Attribute categories are listed and can be expanded using the down-arrows next to the category names.
Select the drop-down arrow for Therapeutic interventions
.
In each subcategory there is a sum of the subject counts across the selected cohorts.
For each cohort, designated by a color, there is a Subject count
and Median survival (years)
column.
Type Malignancy
in the Search Box and an auto-complete dropdown suggests three different attributes.
Select Synchronous malignancy
and the results are automatically opened and highlighted in orange.
Click Survival Comparison
tab.
A Kaplan-Meier Curve is rendered based on each cohort.
The P-Value displayed at the top of the Survival Comparison indicates whether there is a statistically significant difference between the survival probabilities over time of any pair of cohorts (CI=0.95).
When comparing two cohorts, the P-Value is shown above the two survival curves. For three or four cohorts, P-Values are shown as a pair-wise heatmap, comparing each cohort to every other cohort.
Select the Marker Frequency
tab.
Select either Gene expression
(default), Somatic mutation
, or Copy number variation
For gene expression (up- versus down-regulated) and for copy number variation (gain versus loss), Cohorts will display a list of all genes with bidirectional bar charts.
For somatic mutations, the bar charts are unidirectional and indicate the percentage of samples with a mutation in each gene per cohort.
Bars are color-coded by cohort; see the accompanying legend.
Each row shows P-value(s) resulting from pairwise comparison of all cohorts. In the case of comparing two cohorts, the numerical P-value will be displayed in the table. In the case of comparing three or more cohorts, the pairwise P-values are shown as a triangular heatmap, with details available as a tooltip.
Select the Correlation
tab.
Similar to the single-cohort view (Cohort Analysis | Correlation
), choose two clinical attributes and/or genes to compare.
Depending on the available data types for the two selections (categorical and/or continuous), Cohorts will display a bubble plot, violin plot, or scatter plot.
Projects may be shared by modifying the project's Team. Team members can be added using one of the following entities:
User within the current tenant
E-mail address
Workgroup within the current tenant
Select the corresponding option under Add more team members.
Each entity added to the project team will have an assigned role with regards to specific categories of functionality in the application. These categories are:
Project
Flow
Base
Bench
While the categories will determine most of what a user can do or see, explicit upload and download rights need to be granted for users. This is done by selecting the appropriate upload and download icons.
Upload and download rights are independent of the assigned role. A user with only viewer rights will still be able to perform uploads and downloads if their upload and download rights are not disabled. Likewise, an administrator can only perform uploads and downloads if their upload and download rights are enabled.
The sections below describe the roles for each category and the allowed actions.
If a user qualifies for multiple entities added to the project team (i.e., added as an individual user and also a member of an added workgroup), the highest level of access provided by any of those roles is granted.
1kGP-DRAGEN
3202 WGS: 2504 original samples plus 698 relateds
Presumed healthy
DDD
4293 (3664 affected), de novos only
Developmental disorders
EPI4K
356, de novos only
Epilepsy
ASD Cohorts
6786 (4266 affected), de novos only
Autism Spectrum disorder
De Ligt et al.
100, de novos only
Intellectual disability
Homsy et al.
1213, de novos only
Congenital heart disease (HP:0030680)
Lelieveld et al.
820, de novos only
Intellectual disability
Rauch et al.
51, de novos only
Intellectual disability
Rare Genomes Project
315 WES (112 pedigrees)
Various
https://raregenomes.org/
TCGA
ca. 4200 WES, ca. 4000 RNAseq
12 tumor types
https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga
GEO
RNAseq
Auto-immune disorders, incl. asthma, arthritis, SLE, MS, Crohn's disease, Psoriasis, Sjögren's Syndrome
For GEO/GSE study identifiers, please refer to the in-product list of studies
RNAseq
Kidney diseases
For GEO/GSE study identifiers, please refer to the in-product list of studies
RNAseq
Central nervous system diseases
For GEO/GSE study identifiers, please refer to the in-product list of studies
RNAseq
Parkinson's disease
For GEO/GSE study identifiers, please refer to the in-product list of studies
Variants and mutations will be displayed as one needle plot for each cohort that is part of the comparison (see in this online help for more details)
Note: Inviting users by email will not automatically send out the email invites when you save the changes, because you might want to hold off on sending the actual invite until you have completed your project configuration. To send out the email invite, select the letter icon on the right.
Create a Connector
x
x
x
x
View project resources
x
x
x
Link/Unlink data to a project
x
x
Subscribe to notifications
x
x
View Activity
x
x
Create samples
x
x
Delete/archive data
x
Manage notification channels
x
Manage project team
x
View analyses results
x
x
Create analyses
x
Create pipelines and tools
x
Edit pipelines and tools
x
Add docker image
x
View table records
x
x
Click on links in table
x
x
Create queries
x
x
Run queries
x
x
Export query
x
x
Save query
x
x
Export tables
x
x
Create tables
x
Load files into a table
x
Execute a notebook
x
x
Start/Stop Workspace
x
x
Create/Delete/Modify workspaces
x
Install additional tools, packages, libraries, …
x
Build a new Bench docker image
x
Create a tool for pipeline-execution
x
ICA Cohorts data can be viewed in an ICA Project Base instance as a shared database. A shared database in ICA Base operates as a database view. To use this feature, enable Base for your project prior to starting any ICA Cohorts ingestions. See Base for more information on enabling this feature in your ICA Project.
After ingesting data into your project, phenotypic and molecular data are available to view in Base. See Cohorts Import for instructions on importing data sets into Cohorts.
Post ingestion, data will be represented in Base.
Select BASE
from the ICA left-navigation and click Query
.
Under the New Query window, a list of tables is displayed. Expand the Shared Database for Project \<your project name\>
.
Cohorts tables will be displayed.
To preview the table and fields click each view listed.
Clicking any of these views then selecting PREVIEW
on the right-hand side will show you a preview of the data in the tables.
Note: If your ingestion includes Somatic variants, there will be two molecular tables: ANNOTATED_SOMATIC_MUTATIONS and ANNOTATED_VARIANTS. All ingestions will include a PHENOTYPE table.
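As a minimal sketch of how these views can be queried, assuming the shared database for your project is expanded in the query tree (fully qualify the view name as it appears there if needed; the column names come from the PHENOTYPE table described below):

SELECT SUBJECTID, STUDY, SEX, AGE
FROM PHENOTYPE
WHERE STUDY = 'TCGA'
LIMIT 10;

This returns a small slice of the harmonized phenotype data and can be saved or exported like any other Base query.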
Note: The PHENOTYPE table includes a harmonized set of attributes collected across all data ingestions and is not representative of all data ingested for the Subject or Sample. Sample information is also stored in this table, if applicable, and drives the annotation process when molecular data is included in the ingestion.
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Sample Identifier
SUBJECTID
STRING
Identifier for Subject entity
STUDY
STRING
Study designation
AGE
NUMERIC
Age in years
SEX
STRING
Sex field to drive annotation
POPULATION
STRING
Population Designation for 1000 Genomes Project
SUPERPOPULATION
STRING
Superpopulation Designation from 1000 Genomes Project
RACE
STRING
Race according to NIH standard
CONDITION_ONTOLOGIES
VARIANT
Diagnosis Ontology Source
CONDITION_IDS
VARIANT
Diagnosis Concept Ids
CONDITIONS
VARIANT
Diagnosis Names
HARMONIZED_CONDITIONS
VARIANT
Diagnosis High-level concept to drive UI
LIBRARYTYPE
STRING
Sequencing technology
ANALYTE
STRING
Substance sequenced
TISSUE
STRING
Tissue source
TUMOR_OR_NORMAL
STRING
Tumor designation for somatic
GENOMEBUILD
STRING
Genome Build to drive annotations - hg38 only
SAMPLE_BARCODE_VCF
STRING
Sample ID from VCF
AFFECTED_STATUS
NUMERIC
Affected, Unaffected, or Unknown for Family Based Analysis
FAMILY_RELATIONSHIP
STRING
Relationship designation for Family Based Analysis
This table will be available for all projects with ingested molecular data
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Original sample barcode used in VCF column
STUDY
STRING
Study designation
GENOMEBUILD
STRING
Only hg38 is supported
CHROMOSOME
STRING
Chromosome without 'chr' prefix
CHROMOSOMEID
NUMERIC
Chromosome ID: 1..22, 23=X, 24=Y, 25=Mt
DBSNP
STRING
dbSNP Identifiers
VARIANT_KEY
STRING
Variant ID in the form "1:12345678:12345678:C"
NIRVANA_VID
STRING
Broad Institute VID: "1-12345678-A-C"
VARIANT_TYPE
STRING
Description of Variant Type (e.g. SNV, Deletion, Insertion)
VARIANT_CALL
NUMERIC
1=germline, 2=somatic
DENOVO
BOOLEAN
true / false
GENOTYPE
STRING
"G|T"
READ_DEPTH
NUMERIC
Sequencing read depth
ALLELE_COUNT
NUMERIC
Counts of each alternate allele for each site across all samples
ALLELE_DEPTH
STRING
Unfiltered count of reads that support a given allele for an individual sample
FILTERS
STRING
Filter field from VCF. If all filters pass, field is PASS
ZYGOSITY
NUMERIC
0 = hom ref, 1 = het ref/alt, 2 = hom alt, 4 = hemi alt
GENEMODEL
NUMERIC
1=Ensembl, 2=RefSeq
GENE_HGNC
STRING
HUGO/HGNC gene symbol
GENE_ID
STRING
Ensembl gene ID ("ENSG00001234")
GID
NUMERIC
NCBI Entrez Gene ID (RefSeq) or numerical part of Ensembl ENSG ID
TRANSCRIPT_ID
STRING
Ensembl ENST or RefSeq NM_
CANONICAL
STRING
Transcript designated 'canonical' by source
CONSEQUENCE
STRING
missense, stop gained, intronic, etc.
HGVSC
STRING
The HGVS coding sequence name
HGVSP
STRING
The HGVS protein sequence name
This table will only be available for data sets with ingested Somatic molecular data.
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Original sample barcode, used in VCF column
SUBJECTID
STRING
Identifier for Subject entity
STUDY
STRING
Study designation
GENOMEBUILD
STRING
Only hg38 is supported
CHROMOSOME
STRING
Chromosome without 'chr' prefix
DBSNP
NUMERIC
dbSNP Identifiers
VARIANT_KEY
STRING
Variant ID in the form "1:12345678:12345678:C"
MUTATION_TYPE
NUMERIC
Rank of consequences by expected impact: 0 = Protein Truncating to 40 = Intergenic Variant
VARIANT_CALL
NUMERIC
1=germline, 2=somatic
GENOTYPE
STRING
"G|T"
REF_ALLELE
STRING
Reference allele
ALLELE1
STRING
First allele call in the tumor sample
ALLELE2
STRING
Second allele call in the tumor sample
GENEMODEL
NUMERIC
1=Ensembl, 2=RefSeq
GENE_HGNC
STRING
HUGO/HGNC gene symbol
GENE_ID
STRING
Ensembl gene ID ("ENSG00001234")
TRANSCRIPT_ID
STRING
Ensembl ENST or RefSeq NM_
CANONICAL
BOOLEAN
Transcript designated 'canonical' by source
CONSEQUENCE
STRING
missense, stop gained, intronic, etc.
HGVSP
STRING
HGVS nomenclature for AA change: p.Pro72Ala
This table will only be available for data sets with ingested CNV molecular data.
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
GENOMEBUILD
STRING
Genome build, always 'hg38'
NIRVANA_VID
STRING
Variant ID of the form 'chr-pos-ref-alt'
CHRID
STRING
Chromosome without 'chr' prefix
CID
NUMERIC
Numerical representation of the chromosome, X=23, Y=24, Mt=25
GENE_ID
STRING
NCBI or Ensembl gene identifier
GID
NUMERIC
Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix
START_POS
NUMERIC
First affected position on the chromosome
STOP_POS
NUMERIC
Last affected position on the chromosome
VARIANT_TYPE
NUMERIC
1 = copy number gain, -1 = copy number loss
COPY_NUMBER
NUMERIC
Observed copy number
COPY_NUMBER_CHANGE
NUMERIC
Fold-change of copy number, assuming 2 for diploid and 1 for haploid as the baseline
SEGMENT_VALUE
NUMERIC
Average FC for the identified chromosomal segment
PROBE_COUNT
NUMERIC
Probes confirming the CNV (arrays only)
REFERENCE
NUMERIC
Baseline taken from normal samples (1) or averaged disease tissue (2)
GENE_HGNC
STRING
HUGO/HGNC gene symbol
This table will only be available for data sets with ingested SV molecular data. Note that ICA Cohorts stores copy number variants in a separate table.
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
GENOMEBUILD
STRING
Genome build, always 'hg38'
NIRVANA_VID
STRING
Variant ID of the form 'chr-pos-ref-alt'
CHRID
STRING
Chromosome without 'chr' prefix
CID
NUMERIC
Numerical representation of the chromosome, X=23, Y=24, Mt=25
BEGIN
NUMERIC
First affected position on the chromosome
END
NUMERIC
Last affected position on the chromosome
BAND
STRING
Chromosomal band
QUALITY
NUMERIC
Quality from the original VCF
FILTERS
ARRAY
Filters from the original VCF
VARIANT_TYPE
STRING
Insertion, deletion, indel, tandem_duplication, translocation_breakend, inversion ("INV"), short tandem repeat ("STR2")
VARIANT_TYPE_ID
NUMERIC
21=insertion, 22=deletion, 23=indel, 24=tandem_duplication, 25=translocation_breakend, 26=inversion ("INV"), 27=short tandem repeat ("STR2")
CIPOS
ARRAY
Confidence interval around first position
CIEND
ARRAY
Confidence interval around last position
SVLENGTH
NUMERIC
Overall size of the structural variant
BONDCHR
STRING
For translocations, the other affected chromosome
BONDCID
NUMERIC
For translocations, the other affected chromosome as a numeric value, X=23, Y=24, Mt=25
BONDPOS
STRING
For translocations, positions on the other affected chromosome
BONDORDER
NUMERIC
3 or 5: Whether this fragment (the current variant/VID) "receives" the other chromosome's fragment on its 3' end, or attaches to the 5' end of the other chromosome fragment
GENOTYPE
STRING
Called genotype from the VCF
GENOTYPE_QUALITY
NUMERIC
Genotype call quality
READCOUNTSSPLIT
ARRAY
Read counts
READCOUNTSPAIRED
ARRAY
Read counts, paired end
REGULATORYREGIONID
STRING
Ensembl ID for the affected regulatory region
REGULATORYREGIONTYPE
STRING
Type of the regulatory region
CONSEQUENCE
ARRAY
Variant consequence according to SequenceOntology
TRANSCRIPTID
STRING
Ensembl or RefSeq transcript identifier
TRANSCRIPTBIOTYPE
STRING
Biotype of the transcript
INTRONS
STRING
Count of impacted introns out of the total number of introns, specified as "M/N"
GENEID
STRING
Ensembl or RefSeq gene identifier
GENEHGNC
STRING
HUGO/HGNC gene symbol
ISCANONICAL
BOOLEAN
Is the transcript ID the canonical one according to Ensembl?
PROTEINID
STRING
RefSeq or Ensembl protein ID
SOURCEID
NUMERICAL
Gene model: 1=Ensembl, 2=RefSeq
These tables will only be available for data sets with ingested RNAseq molecular data.
Table for gene quantification results:
Field Name
Type
Description
GENOMEBUILD
STRING
Genome build, always 'hg38'
STUDY_NAME
STRING
Study designation
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
LABEL
STRING
Group label specified during import: Case or Control, Tumor or Normal, etc.
GENE_ID
STRING
Ensembl or RefSeq gene identifier
GID
NUMERIC
Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix
GENE_HGNC
STRING
HUGO/HGNC gene symbol
SOURCE
STRING
Gene model: 1=Ensembl, 2=RefSeq
TPM
NUMERICAL
Transcripts per million
LENGTH
NUMERICAL
The length of the gene in base pairs.
EFFECTIVE_LENGTH
NUMERICAL
The length as accessible to RNA-seq, accounting for insert-size and edge effects.
NUM_READS
NUMERICAL
The estimated number of reads from the gene. The values are not normalized.
The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.
These tables will only be available for data sets with ingested RNAseq molecular data.
Table for differential gene expression results:
Field Name
Type
Description
GENOMEBUILD
STRING
Genome build, always 'hg38'
STUDY_NAME
STRING
Study designation
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
CASE_LABEL
STRING
Study designation
GENE_ID
STRING
Ensembl or RefSeq gene identifier
GID
NUMERIC
Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix
GENE_HGNC
STRING
HUGO/HGNC gene symbol
SOURCE
STRING
Gene model: 1=Ensembl, 2=RefSeq
BASEMEAN
NUMERICAL
FC
NUMERICAL
Fold-change
LFC
NUMERICAL
Log of the fold-change
LFCSE
NUMERICAL
Standard error for log fold-change
PVALUE
NUMERICAL
P-value
CONTROL_SAMPLECOUNT
NUMERICAL
Number of samples used as control
CONTROL_LABEL
NUMERICAL
Label used for controls
The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.
The project details page contains the properties of the project, such as the location, owner, storage and linked bundles. This is also the place where you add assets in the form of linked bundles.
The project details are configured during project creation and may be updated by the project owner, entities with the project Administrator role, and tenant administrators.
Click the Edit button at the top of the Details page.
Click the + button, under LINKED BUNDLES.
Click on the desired bundle, then click the Link button.
Click Save.
If your linked bundle contained a pipeline, then it will appear in Projects > your_project > Flow > Pipelines.
Name
Name of the project unique within the tenant. Alphanumerics, underscores, dashes, and spaces are permitted.
Short Description
Short description of the project
Project Owner
Owner of the project (has Administrator access to the project)
Storage Configuration
Storage configuration to use for data stored in the project
User Tags
User tags on the project
Technical Tags
Technical tags on the project
Metadata Model
Metadata model assigned to the project
Project Location
Project region where data is stored and pipelines are executed. Options are derived from the Entitlement(s) assigned to user account, based on the purchased subscription
Storage Bundle
Storage bundle assigned to the project. Derived from the selected Project Location based on the Entitlement in the purchased subscription
Billing Mode
Billing mode assigned to the project
Data sharing
Enables data and samples in the project to be linked to other projects
A project's billing mode determines the strategy for how costs are charged to billable accounts.
Project
All incurred costs will be charged to the tenant of the project owner
Tenant
Incurred costs will be charged to the tenant of the user owning the project resource (i.e., data, analysis). The only exceptions are base tables and queries, as well as bench compute and storage costs, which are always billed to the project owner.
For example, with billing mode set to Tenant, if tenant A has created a project resource and uses it in their project, then tenant A will pay for the resource data, compute costs and storage costs of any output they generate within the project. When they share the project with tenant B, then tenant B will pay the compute and storage for the data which they generate in that project. Put simply, in billing mode tenant, the person who generates data pays for the processing and storage of that data, regardless of who owns the actual project.
If the project billing mode is updated after the project has been created, the updated billing mode will only be applied to resources generated after the change.
If you are using your own S3 storage, then the billing mode impacts where collaborator data is stored.
Project billing will result in using your S3 storage for the data.
Tenant billing will result in collaborator data being stored in Illumina-managed storage instead of your own S3 storage.
Tenant billing, when your collaborators also have their own S3 storage and have it set as default, will result in their data being stored in their S3 storage.
Use the Create OAuth access token
button to generate an OAuth access token which is valid for 12 hours after generation. This token can be used by Snowflake and Tableau to access the data in your Base databases and tables for this Project.
See SnowSQL for more information.
The platform provides Connectors to facilitate automation for operations on data (ie, upload, download, linking).
The ICA CLI is a useful tool for uploading, downloading and viewing information about data stored within ICA projects. If not already authenticated, please see the Authentication section of the CLI help pages. Once the CLI has been authenticated with your account, use the command below to list all projects:
icav2 projects list
The first column of the output (table format, which is default) will show the ID
. This is the project ID and will be used in the examples below.
To upload a file called Sample-1_S1_L001_R1_001.fastq.gz
to the project, copy the project id and use the command syntax below:
icav2 projectdata upload Sample-1_S1_L001_R1_001.fastq.gz --project-id <project-id>
To verify the file has uploaded, run the following to get a list of all files stored within the specified project:
icav2 projectdata list --project-id <project-id>
This will show a file ID starting with fil.
which can then be used to get more information about the file and its attributes:
icav2 projectdata get <file-id> --project-id <project-id>
It is necessary to use --project-id
in the above example if not entered into a specific project context. In order to enter a project context use the command below.
icav2 projects enter <project-name or project-id>
This will infer the project id, so that it does not need to be entered into each command.
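For example (the project name is hypothetical), after entering a project you can omit the --project-id flag:

icav2 projects enter my-project
icav2 projectdata list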
Note: filenames beginning with / are not allowed, so be careful when entering full path names, as those will result in the file being stored on S3 but not being visible in ICA. Likewise, folders containing a / in their individual folder name and folders named '.' are not supported.
The ICA CLI can also be used to download files via command line. This can be especially helpful if the download destination is a remote server or HPC cluster that you are logged into from a local machine. To download into the current directory, run the following from the command line terminal:
icav2 projectdata download <file-id> ./
The above assumes you have entered into a project context. If this is not the case, either enter the project that contains the desired data, or be sure to supply the --project-id
option in the command.
To fetch temporary AWS credentials for given project data, use the command icav2 projectdata temporarycredentials [path or data Id] [flags]
. If the path is provided, the project id from the flag --project-id is used. If the --project-id flag is not present, then the project id is taken from the context. The returned AWS credentials for file or folder upload expire after 36 hours.
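As an illustration (the folder path is hypothetical), the following fetches temporary credentials for a folder in a specific project:

icav2 projectdata temporarycredentials /fastq/ --project-id <project-id>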
For information on options such as using the ICA API and AWS CLI to transfer data, visit the Data Transfer Options tutorial.
Flow provides tooling for building and running secondary analysis pipelines. The platform supports analysis workflows constructed using Common Workflow Language (CWL) and Nextflow. Each step of an analysis pipeline executes a containerized application using inputs passed into the pipeline or output from previous steps.
You can configure the following components in Illumina Connected Analytics Flow:
Reference Data — Reference Data for Graphical CWL flows. See Reference Data.
Pipelines — One or more tools configured to process input data and generate output files. See Pipelines.
Analyses — Launched instance of a pipeline with selected input data. See Analyses.
The CLI supports outputs in table, JSON, and YAML formats. The format is set using the output-format
configuration setting through a command line option, environment variable, or configuration file.
Dates are output as UTC times when using JSON/YAML output format and local times when using table format.
To set the output format, use the following setting:
--output-format <string>
json
- Outputs in JSON format
yaml
- Outputs in YAML format
table
- Outputs in tabular format
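For example, the listing command used earlier can be rendered as JSON instead of the default table output:

icav2 projects list --output-format json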
The platform GUI provides the Project Connector utility which allows data to be linked automatically between projects. This creates a one-way dynamic link for files and samples from source to destination, meaning that additions and deletions of data in the source project also affect the destination project. This differs from copying or moving which create editable copies of the data. In the destination project, you can delete data which has been moved or copied and unlink data which has been linked.
move
x
x
x
x
x
copy
x
x
x
x
manual link
x
x
x
project connector
x
x
x
Select the source project (project that will own the data to be linked) from the Projects page (Projects > your_source_project).
Select Project Settings > Details.
Select Edit
Under Data Sharing ensure the value is set to Yes
Select Save
Select the destination project (the project to which data from the source project will be linked) from the Projects page (Projects > your_destination_project).
From the projects menu, select Project Settings > Connectivity > Project Connector
Select + Create and complete the necessary fields.
Check the box next to Active to ensure the connector will be active.
Name (required) — Provide a unique name for the connector.
Type (required) — Select the data type that will be linked (either File or Sample)
Source Project - Select the source project whose data will be linked.
Filter Expression (optional) — Enter an expression to restrict which files will be linked via the connector (see Filter Expression Examples below)
Tags (optional) — Add tags to restrict what data will be linked via the connector. Any data in the source project with matching tags will be linked to the destination project.
The examples below will link Files based on the Format field.
Only Files with Format of FASTQ will be linked:
[?($.details.format.code == 'FASTQ')]
Only Files with Format of VCF will be linked:
[?($.details.format.code == 'VCF')]
The examples below will restrict linked Files based on filenames.
Exact match to 'Sample-1_S1_L001_R1_001.fastq.gz':
[?($.details.name == 'Sample-1_S1_L001_R1_001.fastq.gz')]
Ends with '.fastq.gz':
[?($.details.name =~ /.*\.fastq.gz/)]
Starts with 'Sample-':
[?($.details.name =~ /Sample-.*/)]
Contains '_R1_':
[?($.details.name =~ /.*_R1_.*/)]
The examples below will link Samples based on User Tags and Sample name, respectively.
Only Samples with the User Tag 'WGS-Project-1'
[?('WGS-Project-1' in $.tags.userTags)]
Only Samples with the name 'BSSH_Sample_1':
[?($.name == 'BSSH_Sample_1')]
Notifications (Projects > your_project > Project Settings > Notifications ) are events to which you can subscribe. When they are triggered, they deliver a message to an external target system such as emails, Amazon SQS or SNS systems or HTTP post requests. The following table describes available system events to subscribe to:
Analysis failure
ICA_EXEC_001
Emitted when an analysis fails
Analysis
Analysis success
ICA_EXEC_002
Emitted when an analysis succeeds
Analysis
Analysis aborted
ICA_EXEC_027
Emitted when an analysis is aborted either by the system or the user
Analysis
Analysis status change
ICA_EXEC_028
Emitted when a state transition on an analysis occurs
Analysis
Base Job failure
ICA_BASE_001
Emitted when a Base job fails
BaseJob
Base Job success
ICA_BASE_002
Emitted when a Base job succeeds
BaseJob
Data transfer success
ICA_DATA_002
Emitted when a data transfer is marked as Succeeded
DataTransfer
Data transfer stalled
ICA_DATA_025
Emitted when data transfer hasn't progressed in the past 2 minutes
DataTransfer
Data <action>
ICA_DATA_100
Subscribing to this serves as a wildcard for all project data status changes and covers those changes that have no separate code. This does not include DataTransfer events or changes that trigger no data status changes such as adding tags to data.
ProjectData
Data linked to project
ICA_DATA_104
Emitted when a file is linked to a project
ProjectData
Data can not be created in non-indexed folder
ICA_DATA_105
Emitted when attempting to create data in a non-indexed folder
ProjectData
Data deleted
ICA_DATA_106
Emitted when data is deleted
ProjectData
Data created
ICA_DATA_107
Emitted when data is created
ProjectData
Data uploaded
ICA_DATA_108
Emitted when data is uploaded
ProjectData
Data updated
ICA_DATA_109
Emitted when data is updated
ProjectData
Data archived
ICA_DATA_110
Emitted when data is archived
ProjectData
Data unarchived
ICA_DATA_114
Emitted when data is unarchived
ProjectData
Job status changed
ICA_JOB_001
Emitted when a job changes status (INITIALIZED, WAITING_FOR_RESOURCES, RUNNING, STOPPED, SUCCEEDED, PARTIALLY_SUCCEEDED, FAILED)
JobId
Sample completed
ICA_SMP_002
Emitted when a sample is marked as completed
ProjectSample
Sample linked to a project
ICA_SMP_003
Emitted when a sample is linked to a project
ProjectSample
Workflow session start
ICA_WFS_001
Emitted when workflow is started
WorkflowSession
Workflow session failure
ICA_WFS_002
Emitted when workflow fails
WorkflowSession
Workflow session success
ICA_WFS_003
Emitted when workflow succeeds
WorkflowSession
Workflow session aborted
ICA_WFS_004
Emitted when workflow is aborted
WorkflowSession
When you subscribe to overlapping event codes such as ICA_EXEC_002 (analysis success) and ICA_EXEC_028 (analysis status change) you will get both notifications when analysis success occurs.
When integrating with external systems, it is advised not to rely solely on ICA notifications, but to also add a polling system to check the status of long-running tasks, for example verifying the status of long-running (>24h) analyses at a 12-hour interval.
Event notifications can be delivered to the following delivery targets:
E-mail delivery
E-mail Address
Sqs
AWS SQS Queue
AWS SQS Queue URL
Sns
AWS SNS Topic
AWS SNS Topic ARN
Http
Webhook (POST request)
URL
In order to allow the platform to deliver events to Amazon SQS or SNS delivery targets, a cross-account policy needs to be added to the target Amazon service.
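A minimal sketch of such a cross-account policy (illustrative only; the placeholders are the variables described below):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<platform_aws_account>:root" },
      "Action": "<action>",
      "Resource": "<arn>"
    }
  ]
}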
Substitute the variables in the example above according to the table below.
platform_aws_account
The platform AWS account ID: 079623148045
action
For SNS use SNS:Publish
. For SQS, use SQS:SendMessage
arn
The Amazon Resource Name (ARN) of the target SNS topic or SQS queue
See examples for setting policies in Amazon SQS and Amazon SNS
To create a subscription to deliver events to an Amazon SNS topic, one can use either GUI or API endpoints.
To create a subscription via GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Select an event from the dropdown menu, insert optional filter, select the channel type (SNS), and then insert the ARN from the target SNS topic and the AWS region.
To create a subscription via API, use the endpoint /api/notificationChannel to create a channel and then /api/projects/{projectId}/notificationSubscriptions to create a notification subscription.
To create a subscription to deliver events to an Amazon SQS queue, you can use either GUI or API endpoints.
To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Next, select an event from the dropdown menu, choose SQS as the way to receive the notifications, enter your SQS URL, and if applicable for that event, choose a payload version. Not all payload versions are applicable for all events and targets, so the system will filter the options out for you. Finally, you can enter a filter expression to filter which events are relevant for you. Only those events matching the expression will be received.
To create a subscription via API, use the endpoint /api/notificationChannel to create a channel and then /api/projects/{projectId}/notificationSubscriptions to create a notification subscription.
Messages delivered to AWS SQS contain the following event body attributes:
correlationId
GUID used to identify the event
timestamp
Date when the event was sent
eventCode
Event code of the event
description
Description of the event
payload
Event payload
The following example is a Data Updated event payload sent to an AWS SQS delivery target (condensed for readability):
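A condensed, hypothetical sketch based on the event body attributes listed above; the payload fields shown are illustrative, as the actual payload follows the full ProjectData model:

{
  "correlationId": "11111111-2222-3333-4444-555555555555",
  "timestamp": "2024-01-01T12:00:00Z",
  "eventCode": "ICA_DATA_109",
  "description": "Data updated",
  "payload": {
    "id": "fil.0123456789abcdef",
    "details": {
      "name": "Sample-1_S1_L001_R1_001.fastq.gz",
      "owningProjectName": "my_project_name"
    }
  }
}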
Notification subscriptions will trigger for all events matching the configured event type. A filter may be configured on a subscription to limit the matching strategy to only those event payloads which match the filter.
The filter expressions leverage the JsonPath library for describing the matching pattern to be applied to event payloads. The filter must be in the format [?(<expression>)]
.
The Analysis Success
event delivers a JSON event payload matching the Analysis
data model (as output from the API to retrieve a project analysis).
The below examples demonstrate various filters operating on the above event payload:
Filter on a pipeline, with a code that starts with ‘Copy’. You’ll need a regex expression for this:
[?($.pipeline.code =~ /Copy.*/)]
Filter on status (note that the Analysis success
event is only emitted when the analysis is successful):
[?($.status == 'SUCCEEDED')]
Both payload versions V3 and V4 guarantee the presence of the final state (SUCCEEDED, FAILED, FAILED_FINAL, ABORTED), but the intermediate states depend on the flow (so not every intermediate state is guaranteed):
V3 can have REQUESTED - IN_PROGRESS - SUCCEEDED
V4 can have the status REQUESTED - QUEUED - INITIALIZING - PREPARING_INPUTS - IN_PROGRESS - GENERATING_OUTPUTS - SUCCEEDED
Filter on a pipeline having the technical tag "Demo":
[?('Demo' in $.pipeline.pipelineTags.technicalTags)]
Combination of multiple expressions using &&
. It's best practice to surround each individual expression with parentheses:
[?(($.pipeline.code =~ /Copy.*/) && $.status == 'SUCCEEDED')]
Examples for other events
Filtering ICA_DATA_104 on owning project name. The top level keys on which you can filter are under the payload key, so payload is not included in this filter expression.
[?($.details.owningProjectName == 'my_project_name')]
Custom events enable triggering notification subscriptions using event types beyond the system-defined event types. When creating a custom subscription, a custom event code may be specified to use within the project. Events may then be sent to the specified event code using a POST API with the request body specifying the event payload.
Custom events can be defined using the API. In order to create a custom event for your project please follow the steps below:
Create a new custom event POST {ICA_URL}/ica/rest/api/projects/{projectId}/customEvents
a. Your custom event code must be 1-20 characters long, e.g. 'ICA_CUSTOM_123'.
b. That event code will be used to reference that custom event type.
Create a new notification channel POST {ICA_URL}/ica/rest/api/notificationChannels
a. If there is already a notification channel created with the desired configuration within the same project, it is also possible to get the existing channel ID using the call GET {ICA_URL}/ica/rest/api/notificationChannels
.
Create a notification subscription POST {ICA_URL}/ica/rest/api/projects/{projectId}/customNotificationSubscriptions
.
a. Use the event code created in step 1.
b. Use the channel ID from step 2.
To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > Custom event.
Once the steps above have been completed successfully, the call from the first step POST {ICA_URL}/ica/rest/api/projects/{projectId}/customEvents
could be reused with the same event code to continue sending events through the same channel and subscription.
Following is a sample Python function used inside an ICA pipeline to post custom events for each failed metric:
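The sketch below illustrates the idea only, assuming the endpoint from step 1, an API key passed via the X-API-Key header, and a simple list of metric dictionaries; the request body field names and the metric structure are assumptions, not the exact original function:

import requests

def post_failed_metric_events(ica_url, project_id, api_key, event_code, failed_metrics):
    # Post one custom event per failed metric to the project's custom events endpoint.
    # ica_url        - base URL of your ICA instance (the {ICA_URL} placeholder above)
    # project_id     - the ICA project the events are sent to
    # api_key        - an Illumina API Key (sent here via the X-API-Key header; assumption)
    # event_code     - the custom event code created in step 1, e.g. 'ICA_CUSTOM_123'
    # failed_metrics - list of dicts describing each failed metric (illustrative structure)
    url = f"{ica_url}/ica/rest/api/projects/{project_id}/customEvents"
    headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
    for metric in failed_metrics:
        body = {
            # 'code' and 'payload' are assumed request body fields matching the
            # event code / payload model described above
            "code": event_code,
            "payload": {
                "metricName": metric.get("name"),
                "value": metric.get("value"),
                "threshold": metric.get("threshold"),
                "status": "FAILED",
            },
        }
        response = requests.post(url, json=body, headers=headers)
        response.raise_for_status()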
Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution which provides Information management and knowledge mining. Users are able to analyze, aggregate and query data for new insights that can inform and improve diagnostic assay development, clinical trials, patient testing and patient care. For this, all clinically relevant data generated from routine clinical testing needs to be extracted and clinical questions need to be asked across all data and information sources. As a large data store, Base provides a secure and compliant environment to accumulate data, allowing for efficient exploration of the aggregated data. This data consists of test results, patient data, metadata, reference data, consent and QC data.
Base can be used by different user personas supporting different use cases:
Clinical and Academic Researchers:
Big data storage solution housing all aggregated sample test outcomes
Analyze information by way of a convenient query formalism
Look for signals in combined phenotypic and genotypic data
Analyze QC patterns over large cohorts of patients
Securely share (sub)sets of data with other scientists
Generate reports and analyze trends in a straightforward and simple manner.
Bioinformaticians:
Access, consult, audit, and query all relevant data and QC information for tests run
All accumulated data and accessible pipelines can be used to investigate and improve bioinformatics for clinical analysis
Metadata is captured via automatic pipeline version tracking: information on the individual tools and/or reference files used during processing for each sample analyzed, the duration of the pipeline, the execution path of the different analytical steps, and, in case of failure, the exit codes can all be warehoused.
Product Developers and Service Providers:
Better understand the efficiency of kits and tests
Analyze usage, understand QC data trends, improve products
Store and aggregate business intelligence data such as lab identification, consumption patterns and frequency, as well as allow renderings of test result outcome trends and much more.
Data Warehouse Creation: Build a relational database for your Project in which desired data sets can be selected and aggregated. Typical data sets include pipeline output metrics and other suitable data files generated by the ICA platform which can be complemented by additional public (or privately built) databases.
Report and Export: Once created, a data warehouse can be mined using standard database query instructions. All Base data is stored in a structured and easily accessible way. An interface allows for the selection of specific datasets and conditional reporting. All queries can be stored, shared, and re-used in the future. This type of standard functionality supports most expected basic mining operations, such as variant frequency aggregation. All result sets can be downloaded or exported in various standard data formats for integration in other reporting or analytical applications.
Detect Signals and Patterns: extensive and detailed selection of subsets of patients or samples adhering to any imaginable set of conditions is possible. Users can, for example, group and list subjects based on a combination of (several) specific genetic variants in combination with patient characteristics such as therapeutic (outcome) information. The built-in integration with public datasets allows users to retrieve all relevant publications, or clinically significant information for a single individual or a group of samples with a specific variant. Virtually any possible combination of stored sample and patient information allow for detecting signals and patterns by a simple single query on the big data set.
Profile/Cluster patients: use and re-analyze patient cohort information based on specific sample or individual characteristics. For instance, users might want to run the next agile iteration of a clinical trial with only the patients that respond. Through integrated and structured consent information allowing for time-boxed use, combined with the capability to group subjects with a simple query, patients can be stratified and combined to export all relevant individuals with their genotypic and phenotypic information for further research.
Share your data: Data sharing is subject to strict ethical and regulatory requirements. Base provides built-in functionality to securely share (sub)sets of your aggregated data with third parties. All data access can be monitored and audited, in this way Base data can be shared with people in and outside of an organization in a compliant and controlled fashion.
Base is a module that can be found in a project. It is shown in the menu bar of the project.
To access Base:
On the domain level, Base needs to be included in the subscription
On the project level, the project owner needs to enable Base
On the user level, the project administrator needs to enable workgroups to access the Base pages
Access to activate the Base module is controlled by the subscription chosen when registering the account (full and premium subscriptions give access to Base). This all happens automatically after the first user logs into the system for that account, so from the moment the account is up and running, the Base module is ready to be enabled.
When a user has created a project, they can go to the Base pages and click the Enable button. From that moment on, every user who has the proper permissions has access to the Base module in that project.
Only the project owner can enable Illumina Connected Analytics Base. Make sure that your subscription for the domain includes Base.
Navigate to Projects > your_project > Base > Tables / Query / Schedule.
Select Enable
Access to the projects and all modules located within the project is provided via the Team page within the project.
Authenticate using icav2 config set
command. The CLI will prompt for an x-api-key
value. Input the API Key generated from the product dashboard here. See the example below (replace EXAMPLE_API_KEY
with the actual API Key).
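An illustrative session is shown below; other settings prompted by the command are elided, and the exact prompt wording may differ between CLI versions:

icav2 config set
...
x-api-key : EXAMPLE_API_KEY
...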
The CLI will save the API Key to the config file as an encrypted value.
If you want to overwrite existing environment values, use the command icav2 config set
.
To remove an existing configuration/session file, use the command icav2 config reset
.
Check the server and confirm you are authenticated using icav2 config get
If during these steps or in the future you need to reset the authentication, you can do so using the command: icav2 config reset
ICA provides a Service Connector, which is a small program that runs on your local machine to sync data between the platform's cloud-hosted data store and your local computer or server. The Service Connector securely uploads data or downloads results using TLS 1.2. In order to do this, the Connector makes 2 connections:
A control connection, which the Connector uses to get configuration information from the platform, and to update the platform about its activities
A connection towards the storage node, used to transfer the actual data between your local or network storage and your cloud-based ICA storage.
This Connector runs in the background, and configuration is done in the Illumina Connected Analytics (ICA) platform, where you can add upload and download rules to meet the requirements of the current project and any new projects you may create.
The Service Connector looks at any new files and checks their size. As long as the file size is changing, it knows data is still being added to the file and it is not ready for transfer. Only when the file size is stable and does not change anymore will it consider the file to be complete and initiate transfer. Despite this, it is still best practice to not connect the Service Connector to active folders which are used as streaming output for other processes as this can result in incomplete files being transferred when the active processes have extended compute periods in which the file size remains unchanged.
The service connector will handle integrity checking during file transfer, which requires the calculation of hashes on the data. In addition, transmission speed depends on the available data transfer bandwidth and connection stability. For these reasons, uploading large amounts of data can take considerable time. This can in turn result in temporarily seeing empty folders at the destination location, since these are created at the beginning of the transfer process.
Select Projects > your_project > Project Settings > Connectivity > Service Connectors.
Select + Create.
Fill out the fields in the New Connector configuration page.
Name - Enter the name of the connector.
Status - This is automatically updated with the actual status, you do not need to enter anything here.
Debug Information Accessible by Illumina (optional) - Illumina support can request connector debugging information to help diagnose issues. For security reasons, support can only collect this data if the option Debug Information Accessible by Illumina is active. You can choose to either proactively enable this when encountering issues to speed up diagnosis or to only activate it when support requests access. You can at any time revoke access again by deselecting the option.
Description (optional) - Enter any additional information you want to show for this connector.
Mode (required) - Specify if the connector can upload data, download data, both or neither.
Operating system (required) - Select your server or computer operating system.
Select Save and download the connector (top right). An initialization key will be displayed in the platform now. Copy this value as it will be needed during installation.
Launch the installer after the download completes and follow the on-screen prompts to complete the installation, including entering the initialization key copied in the previous step. Do not install the connector in the upload folder as this will result in the connector attempting to upload itself and the associated log files.
Run the downloaded .exe file. During the installation, the installer will ask for the initialization key. Fill out the initialization key you see in the platform.
The installer will create an Illumina Service Connector, register it as a Windows service, and start the service. That means, if you wait for about 60 seconds and then refresh the screen in the Platform using the refresh button in the top right corner of the page, the connector should display as connected.
You can only install 1 connector on Windows. If for some reason, you need to install a new one, first uninstall the old one. You only need to do this when there is a problem with your existing connector. Upgrading a connector is also possible. To do this, you don’t need to uninstall the old one first.
Double click the downloaded .dmg file. Double click Illumina Service Connector in the window that opens to start the installer. Run through the installer, and fill out the initialization key when asked for it.
To start the connector once installed or after a reboot, open the app. You can find the app on the location where you installed it. The connector icon will appear in your dock when the app is running.
In the platform on the Connectivity page, you can check whether your local connector has been connected with the platform. This can take 60 seconds after you started your connector locally, and you may need to refresh the Connectivity page using the refresh button in the top right corner of the page to see the latest status of your connector.
The connector app needs to be closed to shut down your computer. You can do this from within your dock.
Installations require Java 11 or later. You can check this with 'java -version' from a command line terminal. With Java installed, you can run the installer from the command line using the command bash illumina_unix_develop.sh
.
Depending on whether you have an X server running or not, it will display a UI or follow a command line installation procedure. You can force a command line installation by adding a -c flag: bash illumina_unix_develop.sh -c
.
The connector can be started by running ./illuminaserviceconnector start
from the directory in which the connector was installed.
In the upload and download rules, you define different properties when setting up a connector. A connector can be used by multiple projects and a connector can have multiple upload and download rules. Configuration can be changed anytime. Changes to the configuration will be applied approximately 60 seconds after changes are made in ICA if the connector is already connected. If the connector is not already started when configuration changes are made in ICA, it will take about 60 seconds after the connector is started for the configuration changes to be propagated to the connector. The following are the different properties you can configure when setting up a connector. After adding a rule and installing the connector, you can use the Active checkbox to disable rules.
Below is an example of a new connector setup with an Upload Rule to find all files ending with .tar
or .tar.gz
located within the local folder C:\Users\username\data\docker-images
.
An upload rule tells the connector which folder on your local disk it needs to watch for new files to upload. The connector contacts the platform every minute to pick up changes to upload rules. To configure upload rules for different projects, first switch into the desired project and select Connectivity. Choose the connector from the list and select Click to add a new upload rule and define the rule. The project field will be automatically filled with the project you are currently within.
When you schedule downloads in the platform, you can choose which connector needs to download the files. That connector needs some way to know how and where it needs to download your files. That’s what a download rule is for. The connector contacts the platform every minute to pick up changes to download rules. The following are the different download rule settings.
When you set up your connector for the first time, and your sample files are located on a shared drive, it’s best to create a folder on your local disk, put one of the sample files in there, and do the connector setup with that folder. When this works, try to configure the shared drive.
Transfer to and from a shared drive may be quite slow. That means it can take up to 30 minutes after you configured a shared drive before uploads start. This is due to the integrity check the connector does for each file before it starts uploading. The connector can upload from or download to a shared drive, but there are a few conditions:
The drive needs to be mounted locally. X:\illuminaupload
or /Volumes/shareddrive/illuminaupload
will work, \\shareddrive\illuminaupload
or smb://shareddrive/illuminaupload
will not.
The user running the connector must have access to the shared drive without a password being requested.
The user who runs the Illumina Service Connector process on the Linux machine needs to have read, write and execute permissions on the installation folder.
Illumina might release new versions of the Service Connector, with improvements and/or bug fixes. You can easily download a new version of the Connector with the Download button on the Connectivity screen in the platform. After you downloaded the new installer, run it and choose the option ‘Yes, update the existing installation’.
To uninstall the connector, perform one of the following:
Windows and Linux: Run the uninstaller located in the directory the connector was installed.
Mac: Move the Illumina Service Connector to your Trash folder.
The Connector has a log file containing technical information about what’s happening. When something doesn’t work, it often contains clues to why it doesn’t work. Interpreting this log file is not always easy, but it can help the support team to give a fast answer on what is wrong, so it is suggested to attach it to your email when you have upload or download problems. You can find this log file at the following location:
<Installation Directory>\logs\BSC.out
Default: C:\Program Files (x86)\illumina\logs\BSC.out
/<Installation Directory>/Illumina Service Connector.app/Contents/java/app/logs/BSC.out
Default: /Applications/Illumina Service Connector.app/Contents/java/app/logs/BSC.out
/<Installation Directory>/logs/BSC.out
Default: /usr/local/illumina
After the file is downloaded, place the CLI in a folder that is included in your $PATH environment variable list of paths, typically /usr/local/bin. Open the Terminal application, navigate to the directory where the downloaded CLI file is located (usually your Downloads folder), and run the following command to copy the CLI file to the appropriate folder. If you do not have write access to your /usr/local/bin folder, then you may use sudo prior to the cp command. For example:
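A minimal sketch, assuming the downloaded binary is named icav2 and your terminal is in the folder that contains it:
sudo cp icav2 /usr/local/bin/icav2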
If you do not have sudo access on your system, contact your administrator for installation. Alternately, you may place the file in an alternate location and update your $PATH to include this location (see the documentation for your shell to determine how to update this environment variable).
You will also need to make the file executable so that the CLI can run:
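For example (sudo may again be required, depending on your permissions):
sudo chmod +x /usr/local/bin/icav2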
You will likely want to place the CLI in a folder that is included in your $PATH environment variable list of paths. In Windows, you typically want to save your applications in the C:\Program Files folder. If you do not have write access to that folder, then open a CMD window in administrative mode (hold down the SHIFT key as you right-click on the CMD application and select "Run as administrator"). Type in the following commands (assuming you have saved ica.exe in your current directory):
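A minimal sketch of those commands, assuming the executable is named ica.exe as stated above:
mkdir "C:\Program Files\Illumina"
copy ica.exe "C:\Program Files\Illumina"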
Then you make sure that the C:\Program Files\Illumina directory is included in your %path% list of paths. Please do a web search for how to add a path to your %path% system environment variable for your particular version of Windows.
Upload allowed
Download allowed
No upload allowed
No Download allowed
The status and history of Base activities and jobs are shown on the page.
The ICA CLI uses an Illumina API Key to authenticate. An Illumina API Key can be acquired through the product dashboard after logging into a domain. See for instructions to create an Illumina API Key.
Add any upload or download rules. See below.
Download links for the CLI can be found at the .
Name: Name of the upload rule.
Active: Set to true to have this rule be active. This allows you to deactivate rules without deleting them.
Local folder: The folder path on the local machine where files to be uploaded are stored.
File pattern: Files with filenames that match the string/pattern will be uploaded.
Location: The location the data will be uploaded to.
Project: The project the data will be uploaded to.
Description: Additional information about the upload rule.
Assign Format: Select which data format tag the uploaded files will receive. This is used for various things like filtering.
Data owner: The owner of the data after upload.
Name: Name of the download rule.
Active: Set to true to have this rule be active. This allows you to deactivate rules without deleting them.
Order of execution: If using multiple download rules, set the order in which the rules are performed.
Target Local folder: The folder path on the local machine where the files will be downloaded to.
Description: Additional information about the download rule.
Format: The format the files must comply with in order to be scheduled for download.
Project: The projects the rule applies to.
Windows: Service connector doesn't connect
First, try restarting your computer. If that doesn't help, open the Services application (click the Windows icon and type services). There should be a service called Illumina Service Connector.
• If it doesn't have status Running, try starting it (right mouse click -> Start).
• If it has status Running and still does not connect, you might have a corporate proxy. Proxy configuration is currently not supported for the connector.
• If you do not have a corporate proxy and your connector still doesn't connect, contact Illumina Technical Support and include your connector BSC.out log files.
OS X: Service connector doesn't connect
Check whether the Connector is running. If it is, there should be an Illumina icon in your Dock.
• If the icon does not appear, log out and log back in. An Illumina Service Connector icon should appear in your Dock.
• If it still doesn't, try starting the Connector manually from the Launchpad menu.
• If it is running and still does not connect, you might have a corporate proxy. Proxy configuration is currently not supported for the connector.
• If you do not have a corporate proxy and your connector still doesn't connect, contact Illumina Technical Support and include your connector BSC.out log files.
Linux: Service connector doesn't connect
Check whether the connector process is running with:
ps aux
Linux: Can't define Java version for connector
The connector requires Java version 8 or 11. If you run the installer and get the following error: "Please define INSTALL4J_JAVA_HOME to point to a suitable JVM.":
• When downloading the correct Java version from Oracle, there are two variables that can be defined for the script (INSTALL4J_JAVA_HOME_OVERRIDE and INSTALL4J_JAVA_PREFIX), but not INSTALL4J_JAVA_HOME, which is the variable printed in the error message above. Instead, export the override variable in your environment before running the installation script.
• Note that the Java home should not point to the java executable, but to the jre folder. For example:
export INSTALL4J_JAVA_HOME_OVERRIDE=/usr/lib/jvm/java-1.8.0-openjdk-amd64
sh illumina_unix_1_13_2_0_35.sh
Linux: Corrupted installation script
If you get the following error message: "gzip: sfx_archive.tar.gz: not in gzip format. I am sorry, but the installer file seems to be corrupted. If you downloaded that file please try it again. If you transfer that file with ftp please make sure that you are using binary mode.":
• This indicates the installation script file is corrupted. Editing the shell script will corrupt it. Re-download the installation script from ICA.
Linux: Unsupported version error in log file
If the log file contains the error "Unsupported major.minor version 52.0", an unsupported version of Java is present. The connector requires Java version 8 or 11.
Linux: Manage the connector via the CLI
• Connector installation issues:
It may be necessary to first make the connector installation script executable with:
chmod +x illumina_unix_develop.sh
Once it has been made executable, run the installation script with:
bash illumina_unix_develop.sh
It may be necessary to run with sudo depending on user permissions on the system:
sudo bash illumina_unix_develop.sh
If installing on a headless system, use the -c flag to do everything from the command line:
bash illumina_unix_develop.sh -c
• Start the connector with logging directly to the terminal (stdout), in case the log file is not present (likely due to the absence of Java version 8 or 11). From within the installation directory run:
./illuminaserviceconnector run
• Check status of connector. From within the install location run:
./illuminaserviceconnector status
• Stop the connector with:
./illuminaserviceconnector stop
• Restart the connector with:
./illuminaserviceconnector restart
Connector gets connected, but uploads won’t start
Create a new empty folder on your local disk, put a small file in it, and configure this folder as the upload folder.
• If it works and your sample files are on a shared drive, have a look at the Shared Drives section.
• If it works and your sample files are on your local disk, there are a few possibilities: a) There is an error in how the upload folder name is configured in the platform. b) For big files, or on slow disks, the connector needs quite some time to start the transfer because it needs to calculate a hash to make sure there are no transfer errors. Wait up to 30 minutes without changing anything in your Connector configuration.
• If this doesn't work, you might have a corporate proxy. Proxy configuration is currently not supported for the connector.
Upload from shared drive does not work
Follow the guidelines in the Shared Drives section. Inspect the connector BSC.log file for any error messages indicating the directory was not found.
• If there is such a message, there are two options: a) An issue with the folder name, such as special characters and spaces. As a best practice, use only alphanumeric characters, underscores, dashes and periods. b) A permissions issue. In this case, ensure the user running the connector has read and write access to the network share without a password being requested.
• If there are no messages indicating the directory cannot be found, it may be necessary to wait some time until the integrity checks have completed. This check can take quite long on slow disks and slow networks.
Data Transfers are slow
Many factors can affect the speed:
• Distance from the upload location to the storage location
• Quality of the internet connection (hardlines are preferred over WiFi)
• Restrictions on uploads and downloads imposed by the company or the provider
These factors can change every time the customer switches location (e.g. working from home).
The upload or download progress % goes down instead of up.
This is normal behavior. Instead of one continuous transmission, data is split into blocks so that whenever transmission issues occur, not all data has to be retried. This does result in dropping back to a lower % of transmission completed when retrying.
The ICA CLI accepts configuration settings from multiple places, such as environment variables, configuration file, or passed in as command line arguments. When configuration settings are retrieved, the following precedence is used to determine which setting to apply:
Command line options - Passed in with the command such as --access-token
Environment variables - Stored in system environment variables such as ICAV2_ACCESS_TOKEN
Default config file - Stored by default in ~/.icav2/config.yaml on macOS/Linux and C:\Users\USERNAME\.icav2\.config on Windows
The following global flags are available in the CLI interface:
Environment variables provide another way to specify configuration settings. Variable names align with the command line options with the following modifications:
Upper cased
Prefix ICAV2_
All dashes replaced by underscore
For example, the corresponding environment variable name for the --access-token flag is ICAV2_ACCESS_TOKEN.
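For example, on macOS/Linux the --access-token option can be supplied through its environment variable for the current shell session:
export ICAV2_ACCESS_TOKEN="<your_access_token>"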
The environment variable ICAV2_ICA_NO_RETRY_RATE_LIMITING allows you to disable the retry mechanism. When it is set to 1, no retries are performed. For any other value, HTTP code 429 will result in 4 retry attempts:
after 500 milliseconds
after 2 seconds
after 10 seconds
after 30 seconds
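For example, to disable retries for the current shell session:
export ICAV2_ICA_NO_RETRY_RATE_LIMITING=1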
Upon launching icav2 for the first time, the configuration yaml file is created and the default config settings are set. Enter an alternative server URL or press enter to leave it as the default. Then enter your API Key and press enter.
After installing the CLI, open a terminal window and enter the icav2 command. This will initialize a default configuration file in the home directory at .icav2/config.yaml.
To reset the configuration, use ./icav2 config reset
Resetting the configuration removes the configuration from the host device and cannot be undone. The configuration needs to be recreated.
Configuration settings are stored in the default configuration file:
The file ~/.icav2/.session.ica.yaml on macOS/Linux and C:\Users\USERNAME\.icav2\.session.ica on Windows will contain the access-token and project-id. These are output files and should not be edited, as they are automatically updated.
This variable is used to set the API Key.
Command line options - Passed as --x-api-key <your_api_key> or -k <your_api_key>
Environment variables - Stored in system as ICAV2_X_API_KEY
Default config file - Use icav2 config set to update ~/.icav2/config.yaml (macOS/Linux) or C:\Users\USERNAME\.icav2\.config (Windows)
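For example, to make the API Key available for the current shell session on macOS/Linux:
export ICAV2_X_API_KEY="<your_api_key>"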
The build number, together with the used libraries and licenses are provided in the accompanying readme file.
This command generates custom completion functions for the icav2 tool. These functions facilitate the generation of context-aware suggestions based on the user's input and specific directives provided by the icav2 tool. For example, for the Zsh shell the completion function _icav2() is generated. It can provide suggestions for available commands, flags, and arguments depending on the context, making it easier for the user to interact with the tool without having to constantly refer to documentation.
To enable this custom completion function, you would typically include it in your Zsh configuration (e.g., in .zshrc or a separate completion script) and then use the compdef command to associate the function with the icav2 command:
This way, when the user types icav2 followed by a space and presses the TAB key, Zsh will call the _icav2 function to provide context-aware suggestions based on the user's input and the icav2 tool's directives.
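A hedged setup sketch for Zsh, assuming the generator is invoked as icav2 completion zsh (check icav2 --help for the exact subcommand name on your version):
icav2 completion zsh > ~/.icav2_completion.zsh
echo 'source ~/.icav2_completion.zsh' >> ~/.zshrc
echo 'compdef _icav2 icav2' >> ~/.zshrc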
Example 1
Example 2
Here is an example of how to download all BAM files from a project (we are using some jq features to remove '.bam.bai' and '.bam.md5sum' files).
Tip: If you want to look up a file id from the GUI, go to that file and open the details view. The file id can be found on the top left side and will begin with fil.
It is best practice to always surround your path with quotes if you want to use the * wildcard. Otherwise, you may run into situations where the command results in "accepts at most 1 arg(s), received x" as it returns folders with the same name, but different amounts of subfolders.
If you want to look up a file id from the GUI, go to that file and open the details view. The file id can be found on the top left side and will begin with fil.
Example to list files in the folder SOURCE
Example to list only subfolders in the folder SOURCE
Example for uploading multiple files
In this example, all the fastq.gz files from source will be uploaded to target using the xargs utility.
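A hedged sketch of such a command; the icav2 projectdata upload arguments shown here (local file followed by a target folder path) are assumptions and may need adjusting for your project:
find source -name "*.fastq.gz" | xargs -I {} icav2 projectdata upload {} /target/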
Example for uploading multiple files using a CSV file
In this example we upload multiple BAM files specified with their corresponding paths in the file bam_files.csv. The files will be renamed. We are using screen in detached mode (this creates a new session without attaching to it):
icav2 projectpipelines create cwl
icav2 projectpipelines create cwljson
icav2 projectpipelines create nextflow
icav2 projectpipelines create nextflowjson
icav2 projectpipelines start cwl
icav2 projectpipelines start cwljson
Field definition
A field can only have values (--field) and a data field can only have datavalues (--field-data). To create multiple fields or data fields, you have to repeat the flag.
For example
matches
The following example with --field and --field-data
matches
Group definition
A group will only have values (--group) and a data group can only have datavalues (--group-data). Add flags multiple times for multiple groups and fields in the group.
icav2 projectpipelines start nextflow
icav2 projectpipelines start nextflowjson
Field definition
A field can only have values (--field) and a data field can only have datavalues (--field-data). To create multiple fields or data fields, you have to repeat the flag.
For example
matches
The following example with --field and --field-data
matches
Group definition
A group will only have values (--group) and a data group can only have datavalues (--group-data). Add flags multiple times for multiple groups and fields in the group.
In this tutorial, we will create a pipeline which will split a TSV file into chunks, sort them, and merge them together.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create pipeline > Nextflow > XML based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size default to preassigned values.
First, we present the individual processes. Select +Nextflow files > + Create file and label the file split.nf. Copy and paste the following definition.
Next, select +Create file and name the file sort.nf. Copy and paste the following definition.
Select +Create file again and label the file merge.nf. Copy and paste the following definition.
Add the corresponding main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.
Here, the operators flatten and collect are used to transform the emitted channels. The flatten operator transforms a channel in such a way that every item of type Collection or Array is flattened so that each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns the resulting object as a sole emission.
Finally, copy and paste the following XML configuration into the XML Configuration tab.
Click the Generate button (at the bottom of the text editor) to preview the launch form fields.
Click the Save button to save the changes.
Go to the Pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.
Fill in the required fields indicated by red "*" sign and click on Start Analysis button. You can monitor the run from the Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results page.
Select Projects > your_project > Flow > Analyses, and open the Logs tab. From the log files, it is clear that in the first step, the input file is split into multiple chunks, then these chunks are sorted and merged.
Find the links to CLI builds in the Releases section below.
Checksums are provided alongside each downloadable CLI binary to verify file integrity. The checksums are generated using the SHA256 algorithm. To use the checksums:
Download the CLI binary for your OS
Download the corresponding checksum using the links in the table
Calculate the SHA256 checksum of the downloaded CLI binary
Diff the calculated SHA256 checksum with the downloaded checksum. If the checksums match, the integrity is confirmed.
There are a variety of open source tools for calculating the SHA256 checksum. See the below tables for examples.
For CLI v2.2.0:
For CLI v2.3.0+:
In this tutorial, we will show how to create and launch a pipeline using the Nextflow language in ICA.
After creating the project, select the project from the Projects view to enter the project. Within the project, navigate to the Flow > Pipelines view in the left navigation pane. From the Pipelines view, click +Create Pipeline > Nextflow > XML based to start creating the Nextflow pipeline.
In the Nextflow pipeline creation view, the Information tab is used to add information about the pipeline. Add values for the required Code (unique pipeline name) and Description fields.
Add the container directive to each process with the latest ubuntu image. If no Docker image is specified, public.ecr.aws/lts/ubuntu:22.04_stable is used as default.
Add the publishDir directive with value 'out' to the reverse process.
Modify the reverse process to write the output to a file test.txt instead of stdout.
The description of the pipeline from the linked Nextflow docs:
This example shows a pipeline that is made of two processes. The first process receives a FASTA formatted file and splits it into file chunks whose names start with the prefix seq_.
The process that follows, receives these files and it simply reverses their content by using the rev command line tool.
Syntax example:
Navigate to the Nextflow files > main.nf tab to add the definition to the pipeline. Since this is a single file pipeline, we won't need to add any additional definition files. Paste the following definition into the text editor:
Next we'll create the input form used when launching the pipeline. This is done through the XML Configuration tab. Since the pipeline takes in a single FASTA file as input, the XML-based input form will include a single file input.
Paste the below XML input form into the XML CONFIGURATION text editor. Click the Generate button to preview the launch form fields.
With the definition added and the input form defined, the pipeline is complete.
On the Documentation tab, you can fill out additional information about your pipeline. This information will be presented under the Documentation tab whenever a user starts a new analysis on the pipeline.
Click the Save button at the top right. The pipeline will now be visible from the Pipelines view within the project.
To upload the FASTA file to the project, first navigate to the Data section in the left navigation pane. In the Data view, drag and drop the FASTA file from your local machine into the indicated section in the browser. Once the file upload completes, the file record will show in the Data explorer. Ensure that the format of the file is set to "FASTA".
Now that the input data is uploaded, we can proceed to launch the pipeline. Navigate to the Analyses view and click the button to Start Analysis. Next, select your pipeline from the list. Alternatively you can start your pipeline from Projects > your_project > Flow > Pipelines > Start new analysis.
In the Launch Pipeline view, the input form fields are presented along with some required information to create the analysis.
Enter a User Reference (identifier) for the analysis. This will be used to identify the analysis record after launching.
Set the Entitlement Bundle (there will typically only be a single option).
In the Input Files section, select the FASTA file for the single input file. (chr1_GL383518v1_alt.fa)
Set the Storage size to small. This will attach a 1.2TB shared file system to the environment used to run the pipeline.
With the required information set, click the button to Start Analysis.
After launching the pipeline, navigate to the Analyses view in the left navigation pane.
The analysis record will be visible from the Analyses view. The Status will transition through the analysis states as the pipeline progresses. It may take some time (depending on resource availability) for the environment to initialize and the analysis to move to the In Progress status.
Click the analysis record to enter the analysis details view.
Once the pipeline succeeds, the analysis record will show the "Succeeded" status. Do note that this may take considerable time if it is your first analysis because of the required resource management. (in our example, the analysis took 28 minutes)
From the analysis details view, the logs produced by each process within the nextflow pipeline are accessible via the Logs tab.
Analysis outputs are written to an output directory in the project with the naming convention {Analysis User Reference}-{Pipeline Code}-{GUID} (1).
Inside of the analysis output directory are the files output by the analysis processes written to the 'out' directory. In this tutorial, the file test.txt (2) is written to by the reverse process. Navigating into the analysis output directory, clicking into the test.txt file details, and opening the VIEW tab (3) shows the output file contents.
The "Download" button (4) can be used to download the data to the local machine.
After a project has been created, a DRAGEN bundle must be linked to a project to obtain access to a DRAGEN docker image. Enter the project by clicking on it, and click Edit in the Project Details page. From here, you can link a DRAGEN Demo Tool bundle into the project. The bundle that is selected here will determine the DRAGEN version that you have access to. For this tutorial, you can link DRAGEN Demo Bundle 3.9.5. Once the bundle has been linked to your project, you can now access the docker image and version by navigating back to the All Projects page, clicking on Docker Repository, and double clicking on the docker image dragen-ica-4.0.3. The URL of this docker image will be used later in the container directive for your DRAGEN process defined in Nextflow.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click +Create Pipeline > Nextflow > XML based to start creating a Nextflow pipeline.
In the Nextflow pipeline creation view, the Details tab is used to add information about the pipeline. Add values for the required Code (pipeline name) and Description fields. Nextflow Version and Storage size default to preassigned values. For the customized DRAGEN pipeline, Nextflow Version must be changed to 22.04.3.
Next, add the Nextflow pipeline definition by navigating to the Nextflow files > MAIN.NF tab. You will see a text editor. Copy and paste the following definition into the text editor. Modify the container directive by replacing the current URL with the URL found in the docker image dragen-ica-4.0.3.
This pipeline takes two FASTQ files, one reference file and one sample_id parameter as input.
Paste the following XML input form into the XML CONFIGURATION text editor.
Click the Generate button (at the bottom of the text editor) to preview the launch form fields.
Click the Save button to save the changes.
The dataInputs section specifies file inputs, which will be mounted when the workflow executes. Parameters defined under the steps section refer to string and other input types.
Each of the dataInputs and parameters can be accessed in the Nextflow within the workflow's params object named according to the code defined in the XML (e.g. params.sample_id).
If you have no test data available, you need to link the Dragen Demo Bundle to your project at Projects > your_project > Project Settings > Details > Linked Bundles.
Go to the pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.
Fill in the required fields indicated by red "*" sign and click on Start Analysis button.
You can monitor the run from the analysis page.
Once the Status changes to Succeeded, you can click on the run to access the results page.
This approach is applicable in situations where your main.nf file contains all your pipeline logic, and it illustrates what the liftover process would look like.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create pipeline > Nextflow > XML based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size default to preassigned values.
In the XML configuration, the input files and settings are specified. For this particular pipeline, you need to specify the transcriptome and the reads directory. Navigate to the XML Configuration tab and paste the following:
Click the Generate button (at the bottom of the text editor) to preview the launch form fields.
Click the Save button to save the changes.
Go to the Pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.
Fill in the required fields indicated by red "*" sign and click on Start Analysis button. You can monitor the run from the Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results page.
This is not an official Illumina product, but is intended to make your Nextflow experience in ICA more fruitful.
Some additional repos that can help with your ICA experience can be found below:
What these scripts do:
Parses configuration files and the Nextflow scripts (main.nf, workflows, subworkflows, modules) of a pipeline and updates the pipeline configuration with pod directives that tell ICA which compute instance to run each process on
Strips out parameters that ICA utilizes for workflow orchestration
Migrates the manifest closure to the conf/base.ica.config file
Ensures that docker is enabled
Adds workflow.onError (main.nf, workflows, subworkflows, modules) to aid troubleshooting
Modifies the processes that reference scripts and tools in the bin/ directory of a pipeline's projectDir, so that when ICA orchestrates your Nextflow pipeline, it can find and properly execute your pipeline process
Additional edits to ensure your pipeline runs more smoothly on ICA
Nextflow workflows on ICA are orchestrated by kubernetes and require a parameters XML file containing data inputs (i.e. files + folders) and other string-based options for all configurable parameters to properly be passed from ICA to your Nextflow workflows
Nextflow processes will need to contain a reference to a container, that is, a Docker image that will run that specific process
Nextflow processes will need a pod annotation specified for ICA to know what instance type to run the process on.
The scripts mentioned below can be run in a docker image keng404/nextflow-to-icav2-config:0.0.3
This has:
nf-core installed
All Rscripts in this repo with relevant R libraries installed
The ICA CLI installed, to allow for pipeline creation and CLI templates to request pipeline runs after the pipeline is created in ICA
You'll likely need to run the image with a docker command like this for you to be able to run git commands within the container:
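A minimal sketch (the mount path and interactive entrypoint are assumptions):
docker run -it --rm -v "$(pwd):$(pwd)" -w "$(pwd)" keng404/nextflow-to-icav2-config:0.0.3 /bin/bash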
where pwd is your $HOME directory
If you have a specific pipeline from Github, you can skip this statement below.
You'll first need to download the python module from nf-core via a pip install nf-core command. Then you can use nf-core list --json to return a JSON metadata file containing current pipelines in the nf-core repository.
You can choose which pipelines to git clone, but as a convenience, the wrapper nf-core.conversion_wrapper.R will perform a git pull, parse nextflow_schema.json files and generate parameter XML files, and then read configuration and Nextflow scripts and make some initial modifications for ICA development. Lastly, these pipelines are created in an ICA project of your choosing, so you will need to generate and download an API key from the ICA domain of your choosing.
The Project view should be the default view after logging into your private domain (https://my_domain.login.illumina.com) and clicking on your ICA 'card' ( This will redirect you to https://illumina.ica.com/ica).
GIT_HUB_URL can be specified to grab pipeline code from github. If you intend to liftover anything in the master branch, your GIT_HUB_URL might look like https://github.com/keng404/my_pipeline. If there is a specific release tag you intend to use, you can use the convention https://github.com/keng404/my_pipeline:my_tag.
Alternatively, if you have a local copy/version of a Nextflow pipeline you'd like to convert and use in ICA, you can use the --pipeline-dirs argument to specify this.
In summary, you will need the following prerequisites, either to run the wrapper referenced above or to carry out individual steps below.
git clone the nf-core pipelines of interest
Install the python module nf-core and create a JSON file using the command line nf-core list --json > {PIPELINE_JSON_FILE}
What nf-core.conversion_wrapper.R does for each Nextflow pipeline:
A Nextflow schema JSON is generated by nf-core's python library nf-core. nf-core can be installed via a pip install nf-core command.
Update the nextflow.config and a base config file so that they are compatible with ICA:
This script will update your configuration files so that they integrate better with ICA. The flag --is-simple-config will create a base config file from a template. This flag will also be active if no arguments are supplied to --base-config-files.
This step adds some updates to your module scripts to allow for easier troubleshooting (i.e. copy work directory back to ICA if an analysis fails). It also allows for ICA's orchestration of your Nextflow pipeline to properly handle any script/binary in the bin/ directory of your pipeline $projectDir.
You may have to edit your {PARAMETERS_XML} file if these edits are unnecessary.
Currently ICA supports Nextflow versions nextflow/nextflow:22.04.3 and nextflow/nextflow:20.10.0 (with 20.10.0 to be deprecated soon).
nf-core.create_ica_pipeline.R
Add the flag --developer-mode to the command line above if you have custom groovy libraries or module files referenced in your workflow. When this flag is specified, the script will upload these files and directories to ICA and update the parameters XML file to allow you to specify directories under the parameter project_dir and files under input_files. This will ensure that these files and directories are placed in the $workflow.launchDir when the pipeline is invoked.
As a convenience, you can also get a templated CLI command to help run a pipeline (i.e. submit a pipeline request) in ICA via the following:
There will be a corresponding JSON file (i.e. a file with the file extension *ICAv2_CLI_template.json) that saves these values, which you can modify and configure to build out templates or launch the specific pipeline run you desire. You can specify the name of this JSON file with the parameter --output-json.
Once you modify this file, you can use --template-json and specify this file to create the CLI command you can use to launch your pipeline.
If you have a previously successful analysis with your pipeline, you may find this approach more useful.
Where possible, these scripts search for config files that refer to a test (i.e. test.config, test_full.config, test*config) and create a boolean parameter params.ica_smoke_test that can be toggled on/off as a sanity check that the pipeline works as intended. By default, this parameter is set to false. When set to true, these test config files are loaded in your main nextflow.config.
In this tutorial, we will demonstrate how to create and launch a Nextflow pipeline using the ICA command line interface (CLI).
The 'main.nf' file defines the workflow that orchestrates various RNASeq analysis processes.
The script uses the following tools:
Salmon: Software tool for quantification of transcript abundance from RNA-seq data.
FastQC: QC tool for sequencing data
MultiQC: Tool to aggregate and summarize QC reports
docker pull nextflow/rnaseq-nf
Create a tarball of the image to upload to ICA.
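For example, using the standard docker save command (the tarball name is illustrative):
docker save nextflow/rnaseq-nf -o rnaseq-nf.tar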
The following commands can be used to upload the tarball to your project.
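A hedged sketch, assuming you are already in the project context (icav2 enter <PROJECT NAME or ID>) and that the upload command takes a local path followed by an optional remote folder:
# /docker-images/ is an illustrative target folder in the project
icav2 projectdata upload rnaseq-nf.tar /docker-images/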
Add the image to the ICA Docker repository
The uploaded image can be added to the ICA docker repository from the ICA Graphical User Interface (GUI).
Change the format for the image tarball to DOCKER:
Navigate to Projects > <your_project> > Data
Check the checkbox for the uploaded tarball
Click on the "Manage" dropdown
Click on "Change format". In the new popup window, select the "DOCKER" format and click Save.
To add this image to the ICA Docker repository, first click on "All Projects" to go back to the home page.
From the ICA home page, click on the "Docker Repository" page under "System Settings"
Click the "+ New" button to open the "New Docker Image" window.
In the new window, click on the "Select a file with DOCKER format"
This will open a new window that lets you select the above tarball.
Select the region (US, EU, CA) your project is in.
Select your project. You can start typing the name in the textbox to filter it.
The bottom pane will show the "Data" section of the selected project. If you have the docker image in subfolders, browse the folders to locate the file. Once found, click on the checkbox corresponding to the image and press "Select".
You will be taken back to the "New Docker image" window. The "Data" and "Name" fields will have been populated based on the imported image. You can edit the "Name" field to rename it. For this tutorial, we will change the name to "rnaseq". Select the region, and give it a version number, and description. Click on "Save".
If you have the images hosted in other repositories, you can add them as external image by clicking the "+ New external image" button and completing the form as shown in the example below.
After creating a new docker image, you can double click on the image to get the container URL for the nextflow configuration file.
Create a configuration file called "nextflow.config" in the same directory as the main.nf file above. Use the URL copied above to add the process.container line in the config file.
An empty form looks as follows:
The input files are specified within a single dataInputs node, with each individual input file specified in a separate dataInput node. Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including, but not limited to, strings, booleans, and integers.
For this tutorial, there are no settings parameters, but multiple file inputs are required. The parameters.xml file looks as follows:
Use the following commands to create the pipeline with the above workflow in your project.
If not already in the project context, enter it by using the following command:
icav2 enter <PROJECT NAME or ID>
Create the pipeline using icav2 projectpipelines create nextflow
Example:
If you prefer to organize the processes in different folders/files, you can use the --other parameter to upload the different processes as additional files. Example:
Example command to run the pipeline from CLI:
You can get the pipeline id under "ID" column by running the following command:
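For example (a minimal sketch, assuming a list subcommand is available and you are in the project context):
icav2 projectpipelines list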
You can get the file ids under "ID" column by running the following commands:
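For example (again a minimal sketch; filter or page through the output as needed):
icav2 projectdata list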
Additional Resources:
Using this command all the files starting with VariantCaller- will be downloaded (prerequisite: a tool is installed on the machine):
For more information on how to use pagination, please refer to
Please see the for all content related to Cloud Analysis Auto-Launch:
Nextflow offers support for the scatter-gather pattern natively. The initial uses this pattern by splitting the FASTA file into chunks to channel records in the task splitSequences, then by processing these chunks in the task reverse.
Note: To access release history of CLI versions prior to v2.0.0, please see the ICA v1 documentation .
This tutorial references the example in the Nextflow documentation.
The first step in creating a pipeline is to create a project. For instructions on creating a project, see the page. In this tutorial, we'll use a project called "Getting Started".
Next we'll add the Nextflow pipeline definition. The pipeline we're creating is a modified version of the example from the Nextflow documentation. Modifications to the pipeline definition from the nextflow documentation include:
Resources: For each process, you can use the and to set the . ICA will then determine the best matching compute type based on those settings. Suppose you set memory '10240 GB' and cpus 6, then ICA will determine you need the standard-large ICA Compute Type.
Before we launch the pipeline, we'll need to upload a FASTA file to use as input. In this tutorial, we'll use a public FASTA file from the . Download the file and unzip to decompress the FASTA file.
In this tutorial, we will demonstrate how to create and launch a simple DRAGEN pipeline using the Nextflow language in ICA GUI. More information about Nextflow on ICA can be found . For this example, we will implement the alignment and variant calling example from this for Paired-End FASTQ Inputs.
As of DRAGEN version 4.3.13, creating DRAGEN pipelines is no longer possible because of proprietary code.
The first step in creating a pipeline is to select a project for the pipeline to reside in. If the project doesn't exist, create a project. For instructions on creating a project, see the page. In this tutorial, we'll use a project called Getting Started.
To specify a compute type for a Nextflow process, use the directive within each process.
Outputs for Nextflow pipelines are uploaded from the out directory in the attached shared filesystem. The directive specifies the output folder for a given process. Only data moved to the out folder using the publishDir directive will be uploaded to the ICA project after the pipeline finishes executing.
Refer to the for details on ICA specific attributes within the Nextflow definition.
Next, create the input form used for the pipeline. This is done through the XML CONFIGURATION tab. More information on the specifications for the input form can be found in page.
In this tutorial, we will be using the example RNASeq pipeline to demonstrate the process of lifting a simple Nextflow pipeline over to ICA.
Copy and paste the into the Nextflow files > main.nf tab. The following comparison highlights the differences between the original file and the version for deployment in ICA. The main difference is the explicit specification of containers and pods within processes. Additionally, some channels' specification are modified, and a debugging message is added. When copying and pasting, be sure to remove the text highlighted in red (marked with -) and add the text highlighted in green (marked with +).
This is an to help develop Nextflow pipelines that will run successfully on ICA. There are some syntax bugs that may get introduced in your Nextflow code. One suggestion is to run the steps as described below and then open these files in VisualStudio Code with the Nextflow plugin installed. You may also need to run smoke tests on your code to identify syntax errors you might not catch upon first glance.
Some examples of Nextflow pipelines that have been lifted over with this repo can be found .
Some additional examples of ICA-ported Nextflow pipelines are .
Relaunch pipeline analysis and
Monitor your analysis run in ICA and troubleshoot
Wrap a WDL-based workflow in a
Wrap a Nextflow-based workflow in a
This will allow you to test your main.nf script. If you have a Nextflow pipeline that is more nf-core like (i.e. where you may have several subworkflow and module files), this may be more appropriate. Any and all comments are welcome.
Generates parameter XML file based on nextflow_schema.json, nextflow.config, conf/
Take a look at to understand a bit more of what's done with the XML, as you may want to make further edits to this file for better usability
A table of instance types and the associated CPU + Memory specs can be found under a table named Compute Types
These scripts have been made to be compatible with workflows, so you may find the concepts from the documentation here a better starting point.
Next, you'll need an API key file for ICA that can be generated using the instructions .
Finally, you'll need to create a project in ICA. You can do this via the CLI and API, but you should be able to follow these to create a project via the ICA GUI.
Install ICA CLI by following these .
A table of all CLI releases for mac, linux, and windows can be found .
Relaunch pipeline analysis and .
Please refer to for installing ICA CLI. To authenticate, please follow the steps in the page.
In this tutorial, we will create in ICA. The workflow includes four processes: index creation, quantification, FastQC, and MultiQC. We will also upload a Docker container to the ICA Docker repository for use within the workflow.
We need a Docker container consisting of these tools. You can refer to the section in the help page to build your own docker image with the required tools. For the sake of this tutorial, we will use the container from the
With in your computer, download the image required for this project using the following command.
You can add a pod directive within a process or in the config file to specify a compute type. The following is an example of a configuration file with the 'standard-small' compute type for all processes. Please refer to the page for a list of available compute types.
The parameters file defines the workflow input parameters. Refer to the for detailed information for creating correctly formatted parameters files.
You can refer to page to explore options to automate this process.
Refer to for details on running the pipeline from CLI.
Please refer to command help (icav2 [command] --help) to determine available flags to filter the output of the above commands if necessary. You can also refer to page for available flags for the icav2 commands.
For more help on uploading data to ICA, please refer to the page.
Windows: CertUtil -hashfile icav2.exe SHA256
Linux: sha256sum icav2
Mac: shasum -a 256 icav2
Windows: CertUtil -hashfile ica-windows-amd64.zip SHA256
Linux: sha256sum ica-linux-amd64.zip
Mac: shasum -a 256 ica-darwin-amd64.zip
2.34.0
2.33.0
2.32.2
2.31.0
2.30.0
2.29.0
2.28.0
2.27.0
2.26.0
2.25.0
2.24.0
2.23.0
2.22.0
2.21.0
2.19.0
2.18.0
2.17.0
2.16.0
2.15.0
2.12.0
2.10.0
2.9.0
2.8.0
2.4.0
2.3.0
2.2.0
2.1.0
2.0.0
In bioinformatics and computational biology, the vast and growing amount of data necessitates methods and tools that can process and analyze data in parallel. This demand gave birth to the scatter-gather approach, an essential pattern in creating pipelines that offers efficient data handling and parallel processing capabilities. In this tutorial, we will demonstrate how to create a CWL pipeline utilizing the scatter-gather approach. To this purpose, we will use two widely known tools: fastp and multiqc. Given the functionalities of both fastp and multiqc, their combination in a scatter-gather pipeline is incredibly useful. Individual datasets can be scattered across resources for parallel preprocessing with fastp. Subsequently, the outputs from each of these parallel tasks can be gathered and fed into multiqc, generating a consolidated quality report. This workflow not only accelerates the preprocessing of large datasets but also offers an aggregated perspective on data quality, ensuring that subsequent analyses are built upon a robust foundation.
First, we create the two tools: fastp and multiqc. For this, we need the corresponding Docker images and CWL tool definitions. Please look up this part of our help sites to learn more about how to import a tool into ICA. In a nutshell, once the CWL tool definition is pasted into the editor, the other tabs for editing the tool will be populated. To complete the tool, the user needs to select the corresponding Docker image and provide a tool version (which can be any string).
For this demo, we will use the publicly available Docker images: quay.io/biocontainers/fastp:0.20.0--hdbcaa40_0 for fastp and docker.io/ewels/multiqc:v1.15 for multiqc. In this tutorial one can find how to import publicly available Docker images into ICA.
Furthermore, we will use the following CWL tool definitions:
and
Once the tools are created, we will create the pipeline itself using these two tools at Projects > your_project > Flow > Pipelines > CWL > Graphical:
On the Definition tab, go to the tool repository and drag and drop the two tools which you just created on the pipeline editor.
Connect the JSON output of fastp to multiqc input by hovering over the middle of the round, blue connector of the output until the icon changes to a hand and then drag the connection to the first input of multiqc. You can use the magnification symbols to make it easier to connect these tools.
Above the diagram, drag and drop two input FASTQ files and an output HTML file on to the pipeline editor and connect the blue markers to match the diagram below.
Relevant aspects of the pipeline:
Both inputs are multivalue (as can be seen on the screenshot)
Ensure that the step fastp has scattering configured: it scatters on both inputs using the scatter method 'dotproduct'. This means that as many instances of this step will be executed as there are pairs of FASTQ files. To indicate that this step is executed multiple times, the icons of both inputs have doubled borders.
Both input arrays (Read1 and Read2) must be matched. Automatic sorting of input arrays is not currently supported, so you have to take care of matching the input arrays yourself. There are two ways to achieve this (besides manual specification in the GUI):
invoke this pipeline in CLI using Bash functionality to sort the arrays
add a tool to the pipeline which will take in an array of all FASTQ files, spread them by R1 and R2 suffixes, and sort them.
We will describe the second way in more detail. The tool will be based on the public python Docker image docker.io/python:3.10 and have the following definition. In this tool we are providing the Python script spread_script.py via the Dirent feature.
Now this tool can be added to the pipeline before the fastp step.
This tutorial aims to guide you through the process of creating CWL tools and pipelines from the very beginning. By following the steps and techniques presented here, you will gain the necessary knowledge and skills to develop your own pipelines or transition existing ones to ICA.
The foundation for every tool in ICA is a Docker image (externally published or created by the user). Here we present how to create your own Docker image for the popular tool (FASTQC).
Copy the contents displayed below to a text editor and save it as a Dockerfile. Make sure you use an editor which does not add formatting to the file.
Open a terminal window, place this file in a dedicated folder and navigate to this folder location. Then use the following command:
docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:1 .
Check the image has been successfully built:
docker images
Check that the container is functional:
docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:1
Once inside the container, check that the fastqc command is responsive and prints the expected help message. Remember to exit the container.
Save a tar of the previously built image locally:
docker save fastqc-0.11.9:1 -o fastqc-0.11.9:1.tar.gz
Upload your docker image .tar to an ICA project (browser upload, Connector, or CLI). Important: In Data tab, select the uploaded .tar file, then click “Manage --> Change Format”, select 'DOCKER' and Save.
Now step outside of the Project and go to Docker Repository, Select New and click on the Search Icon. You can filter on Project names and locations, select your docker file (use the checkbox on the left) and Press Select.
While outside of any Project, go to Tool Repository and select New Tool. Fill the mandatory fields (Name and Version) and click on the Search Icon to look for a Docker image to link to the tool. You must double-click on the image row to confirm the selection. Tool creation in ICA adheres to the CWL standard.
There are two ways you can create a (CWL) tool on top of a docker image in the ICA UI:
1: Navigate to the Tool CWL tab and use the text editor to create the tool definition in CWL syntax.
2: Use the other tabs to independently define inputs, outputs, arguments, settings, etc.
In this tutorial we will present the first option, using the CWL file: paste the following content into the Tool CWL tab.
Please, observe the following: since the user needs to specify the output folder for FASTQC application (-o prefix), we are using the $(runtime.outdir) runtime parameter to point to the designated output directory.
While inside a Project, navigate to Pipelines and click on cwl and then Graphical.
Fill the mandatory fields (Code = pipeline name and free text Description) and click on the Definition tab to open the Graphical Editor.
Expand the Tool Repository menu (lower right) and drag your FastQC tool into the Editor field (center).
Now drag one Input and one Output file icon (on top) into the Editor field as well. Both may be given a Name (editable fields on the right when icon is selected) and need a Format attribute. Set the Input Format to fastq and Output Format to html. Connect both Input and Output files to the matching nodes on the tool itself (mouse over the node, then hold-click and drag to connect).
Press Save, you just created your first FastQC pipeline on ICA!
First make sure you have at least one Fastq file uploaded and/or linked to your Project. You may use Fastq files available in the Bundle.
Navigate to Pipelines and select the pipeline you just created, then press Start New Run.
Fill the mandatory field (User Reference = pipeline execution name) and click on the Select button to open the File Selection dialog box. Select any of the Fastq files available to you (use the checkbox on the left and press Select on the lower right).
Press Start Run on the top right, the platform is now orchestrating the workflow execution.
Navigate to Runs and observe that the pipeline execution is now listed and will first appear to be in “Requested” Status. After a few minutes the Status should change to “In Progress” and then to “Succeeded”.
Once this Run is succeeded click on the row (a single click is enough) to enter Result view. You should see the FastQC HTML output file listed on the right. Click on the file to open Data Details view. Since it is an HTML file Format there is a View tab that allows visualizing the HTML within the browser.
In this tutorial, we will demonstrate how to create and launch a DRAGEN pipeline using the CWL language.
In ICA, CWL pipelines are built using tools developed in CWL. For this tutorial, we will use the "DRAGEN Demo Tool" included with DRAGEN Demo Bundle 3.9.5.
1.) Start by selecting a project at the Projects inventory.
2.) In the details page, select Edit.
3.) In the edit mode of the details page, click the + button in the LINKED BUNDLES section.
4.) In the Add Bundle to Project window: Select the dragen demo tool bundle from the list. Once you have selected the bundle, the Link Bundles button becomes available. Select it to continue.
Tip: You can select multiple bundles using Ctrl + Left mouse button or Shift + Left mouse button.
5.) In the project details page, the selected bundle will appear under the LINKED BUNDLES section. If you need to remove a bundle, click on the - button. Click Save to save the project with linked bundles.
1.) From the project details page, select Pipelines > CWL
2.) You will be given options to create pipelines using a graphical interface or code. For this tutorial, we will select Graphical.
3.) Once you have selected the Graphical option, you will see a page with multiple tabs. The first tab is the Information page where you enter pipeline information. You can find the details for different fields in the tab in the GitBook. The following three fields are required for the INFORMATION page.
Code: Provide pipeline name here.
Description: Provide pipeline description here.
Storage size: Select the storage size from the drop-down menu.
4.) The Documentation tab provides options for configuring the HTML description for the tool. The description appears in the tool repository but is excluded from exported CWL definitions.
5.) The Definition tab is used to define the pipeline. When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel (A) and a list of component menus (B). You can find details on each section in the component menu here
6.) To build a pipeline, start by selecting Machine PROFILE from the component menu section on the right. All fields are required and are pre-filled with default values. Change them as needed.
The profile Name field will be updated based on the selected Resource. You can change it as needed.
Color assigns the selected color to the tool in the design view to easily identify the machine profile when more than one tool is used in the pipeline.
Resource lets you choose from various compute resources available. In this case, we are building a DRAGEN pipeline and we will need to select a resource with FPGA in it. Choose from FPGA resources (FPGA Medium/Large) based on your needs.
7.) Once you have selected the Machine Profile for the tool, find your tool from the Tool Repository at the bottom section of the component menu on the right. In this case, we are using the DRAGEN Demo Tool. Drag and drop the tool from the Tool Repository section to the visualization panel.
8.) The dropped tool will show the machine profile color, number of outputs and inputs, and warning to indicate missing parameters, mandatory values, and connections. Selecting the tool in the visualization panel activates the tool (Dragen Demo Tool) component menu. On the component menu section, you will find the details of the tool under Tool - DRAGEN Demo Tool. This section lists the inputs, outputs, additional parameters, and the machine profile required for the tool. In this case, the DRAGEN Demo Tool requires three inputs (FASTQ read 1, FASTQ read 2, and a Reference genome). The tool has two outputs (a VCF file and an output directory). The tool also has a mandatory parameter (Output File Prefix). Enter the value for the input parameter (Output File Prefix) in the text box.
9.) The top right corner of the visualization panel has icons to zoom in and out in the visualization panel followed by three icons: ref, in, and out. Based on the type of input/output needed, drag and drop the icons into the visualization area. In this case, we need three inputs (read 1, read 2, and Reference hash table.) and two outputs (VCF file and output directory). Start by dragging and dropping the first input (a). Connect the input to the tool by clicking on the blue dot at the bottom of the input icon and dragging it to the blue dot representing the first input on the tool (b). Select the input icon to activate the input component menu. The input section for the first input lets you enter the Name, Format, and other relevant information based on tool requirements. In this case, for the first input, enter the following information:
Name: FASTQ read 1
Format: FASTQ
Comments: any optional comments
10.) Repeat the step for other inputs. Note that the Reference hash table is treated as the input for the tool rather than Reference files. So, use the input icon instead of the reference icon.
11.) Repeat the process for two outputs by dragging and connecting them to the tool. Note that when connecting output to the tool, you will need to click on the blue dot at the bottom of the tool and drag it to the output.
12.) Select the tool and enter additional parameters. In this case, the tool requires Output File Prefix. Enter demo_ in the text box.
13.) Click the Save button to save the pipeline. Once saved, you can run it like any other pipeline from the Pipelines page under Flow in the left navigation.
You can access the databases and tables within the Base module using the snowSQL command-line interface. This is useful for external collaborators who do not have access to ICA core functionalities. This tutorial describes how to obtain the token and use it to access the Base module. It does not cover how to install and configure snowSQL.
Once the Base module has been enabled within a project, the following details are shown in Projects > your_project > Project Settings > Details.
After clicking the Create OAuth access token button, the pop-up authenticator is displayed.
After clicking the Generate snowSQL command button, the pop-up authenticator presents the snowSQL command.
Copy the snowSQL command and run it in the console to log in.
You can also get the OAuth access token via API by providing <PROJECT ID> and <YOUR KEY>.
API Call:
Response
Template snowSQL:
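If you prefer to script the login, the same values can be dropped into a small Python wrapper. This is only a sketch: every value below is a placeholder to be replaced with the details shown on the Base details page and the token from the pop-up, and it assumes snowSQL is already installed and on your PATH.

    import subprocess

    # Placeholders: copy the real values from Projects > your_project >
    # Project Settings > Details and from the Create OAuth access token pop-up.
    account = "<snowflake_account>"
    user = "<base_user>"
    database = "<project_database>"
    warehouse = "<warehouse>"
    role = "<role>"
    token = "<oauth_access_token>"

    # Standard snowSQL options: -a account, -u user, -d database, -w warehouse,
    # -r role, plus OAuth authentication with the ICA-generated token.
    subprocess.run([
        "snowsql",
        "-a", account,
        "-u", user,
        "-d", database,
        "-w", warehouse,
        "-r", role,
        "--authenticator", "oauth",
        "--token", token,
    ])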
Now you can perform a variety of tasks such as:
Querying Data: execute SQL queries against tables, views, and other database objects to retrieve data from the Snowflake data warehouse.
Creating and Managing Database Objects: create tables, views, stored procedures, functions, and other database objects in Snowflake. You can also modify and delete these objects as needed.
Loading Data: load data into Snowflake from various sources such as local files, AWS S3, Azure Blob Storage, or Google Cloud Storage.
Overall, the snowSQL CLI provides a powerful and flexible interface to work with Snowflake, allowing external users to manage the data warehouse and perform a variety of tasks efficiently and effectively without access to the ICA core.
Show all tables in the database:
Create a new table:
List records in a table:
Load data from a file: To load data from a file, start by creating a staging area in the internal storage using the following command:
You can then upload the local file to the internal storage using the following command:
You can check if the file was uploaded properly using the LIST command:
Finally, load the data using the COPY INTO command. The command assumes data.tsv is a tab-delimited file. You can easily modify the command to import a JSON file by setting TYPE=JSON.
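The same staging workflow can also be scripted with the Snowflake Python connector, which later sections of this documentation use as well. The sketch below is illustrative only: the connection values are placeholders taken from the Base details page, and my_stage, my_table, and data.tsv are example names.

    import snowflake.connector

    # Connection values are placeholders; use the account details and OAuth token
    # from the Base details page, as in the snowSQL example above.
    conn = snowflake.connector.connect(
        account="<snowflake_account>",
        user="<base_user>",
        authenticator="oauth",
        token="<oauth_access_token>",
        database="<project_database>",
        warehouse="<warehouse>",
        role="<role>",
    )
    cur = conn.cursor()

    # Create an internal stage and upload the local file
    # (PUT compresses it to data.tsv.gz by default).
    cur.execute("CREATE STAGE IF NOT EXISTS my_stage")
    cur.execute("PUT file:///path/to/data.tsv @my_stage")
    cur.execute("LIST @my_stage")
    print(cur.fetchall())

    # Load the staged, tab-delimited file into an existing table.
    cur.execute(
        "COPY INTO my_table FROM @my_stage/data.tsv.gz "
        "FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '\\t')"
    )
    conn.close()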
Load data from a string: If you have data as a JSON string, you can import it into the tables using the following commands.
Load data into specific columns: If you only want to load sample_name into the table, remove "count" from the column list and the value list, as shown below:
List the views of the database to which you are connected. Shared database and catalogue views are created within the project database, so they will be listed. However, this does not show views which are granted via another database or role, or from bundles.
Show grants, both grants directly on tables and views and grants to roles which in turn have grants on tables and views.
Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution which provides information management and knowledge mining. Refer to the Base documentation for more details.
This tutorial provides an example for exercising the basic operations used with Base, including how to create a table, load the table with data, and query the table.
An ICA project with access to Base
If you don't already have a project, please follow the instructions in the Project documentation to create a project.
File to import
A tab delimited gene expression file (sampleX.final.count.tsv). Example format:
Tables are components of databases that store data in a 2-dimensional format of columns and rows. Each row represents a new data record in the table; each column represents a field in the record. On ICA, you can use Base to create custom tables to fit your data. A schema definition defines the fields in a table. On ICA you can create a schema definition from scratch, or from a template. In this activity, you will create a table for RNAseq count data, by creating a schema definition from scratch.
Go to the Projects > your_project > Base > Tables and enable Base by clicking on the Enable button.
Select Add Table > New Table.
Create your table
To create your table from scratch, select Empty Table from the Create table from dropdown.
Name your table FeatureCounts
Uncheck the box next to Include reference, to exclude reference data from your table.
Check the box next to Edit as text. This will reveal a text box that can be used to create your schema.
Copy the schema text below and paste it into the text box to create your schema.
Click the Save button
Upload sampleX.final.count.tsv file with the final count.
Select Data tab (1) from the left menu.
Click on the grey box (2) to choose the file to upload or drag and drop the sampleX.final.count.tsv into the grey box
Refresh the screen (3)
The uploaded file (4) will appear on the data page after successful upload.
Data can be loaded into tables manually or automatically. To load data automatically, you can set up a schedule. The schedule specifies which files' data should be automatically loaded into a table when those files are uploaded to ICA or created by an analysis on ICA. Active schedules check for new files every 24 hours.
In this exercise, you will create a schedule to automatically load RNA transcript counts from .final.count.tsv files into the table you created above.
Go to Projects > your_project > Base > Schedule and click the + Add New button.
Select the option to load the contents from files into a table.
Create your schedule.
Name your schedule LoadFeatureCounts
Choose Project as the source of data for your table.
To specify that data from .final.count.tsv files should be loaded into your table, enter .final.count.tsv in the Search for a part of a specific ‘Original Name’ or Tag text box.
Specify your table as the one to load data into, by selecting your table (FeatureCounts) from the dropdown under Target Base Table.
Under Write preference, select Append to table. New data will be appended to your table, rather than overwriting existing data in your table.
The .final.count.tsv files that will be loaded into your table are TSV/tab-delimited and do not contain a header row. For the Data format, Delimiter, and Header rows to skip fields, use these values:
Data format: TSV
Delimiter: Tab
Header rows to skip: 0
Click the Save button
Highlight your schedule. Click the Run button to run your schedule now.
It will take a short time to prepare and load data into your table.
Check the status of your job on your Projects > your_project > Activity page.
Click the BASE JOBS tab to view the status of scheduled Base jobs.
Click BASE ACTIVITY to view Base activity.
Check the data in the table.
Go back to your Projects > your_project > Base > Tables page.
Double-click your table to view its details.
You will land on the SCHEMA DEFINITION page.
Click the PREVIEW tab to view the records that were loaded into your table.
Click the DATA tab, to view a list of the files whose data has been loaded into your table.
To request data or information from a Base table, you can run an SQL query. You can create and run new queries or saved queries.
In this activity, we will create and run a new SQL query to find out how many records (RNA transcripts) in your table have counts greater than 100.
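For example, a minimal query for this activity (assuming the schema you created defines a numeric count column; adjust the column name to match your schema) is SELECT COUNT(*) FROM FeatureCounts WHERE count > 100.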
Go to your Projects > your_project > Base > Query page.
Paste the above query into the NEW QUERY text box
Click the Run Query button to run your query
View your query results.
Save your query for future use by clicking the Save Query button. You will be asked to name the query before clicking the Create button.
Find the table you want to export on the Tables page under BASE. Go to the table details page by double-clicking the table you want to export.
Click on the Export As File icon and complete the required fields
Name: Name of the exported file
Data Format: A table can be exported in CSV or JSON format. The exported files can be compressed using GZIP, BZ2, DEFLATE or RAW_DEFLATE.
CSV Format: In addition to Comma, the file can be Tab, Pipe or Custom character delimited.
JSON Format: Selecting JSON format exports the table as a text file containing a JSON object for each entry in the table. This is the standard Snowflake behavior.
Export to single/multiple files: This option allows the export of a table as a single (large) file or multiple (smaller) files. If "Export to multiple files" is selected, a user can provide "Maximum file size (in bytes)" for exported files. The default value is 16,000,000 bytes but can be increased to accommodate larger files. The maximum file size supported is 5 GB.
Prerequisite - Launch a CWL or Nextflow pipeline to completion using the ICA GUI with the intended set of parameters.
Configure and authenticate the ICA command-line interface (CLI).
Obtain a list of your projects with their associated IDs:
Use the ID of the project from the previous step to enter the project context:
Find the pipeline you want to start from the CLI by obtaining a list of pipelines associated with your project:
Find the ID associated with your pipeline of interest.
To find the input files parameter, you can use a previously launched analysis with the projectanalyses input command.
Find the previous analyses launched along with their associated IDs:
List the analyses inputs by using the ID found in the previous step:
This will return the Input File Codes, as well as the file names and data IDs of the associated data used to previously launch the pipeline
Currently, this step for CWL requires the use of the ICA API to access the configuration settings of a project analysis that ran successfully. It is optional for Nextflow since the XML configuration file can be accessed in the ICA GUI.
Click the previous GUI run, and select the pipeline that was run. On the pipeline page, select the XML Configuration Tab to view the configuration settings.
In the "steps" section of the XML file, you will find various steps labeled with
and subsequent labels of parameters with a similar structure
The code value should be used to build the corresponding command-line parameters, e.g.
--parameters enable_map_align:true
Generate JWT Token from API Key or Basic login credentials
Instructions on how to get an API Key https://illumina.gitbook.io/ica/account-management/am-iam#api-keys
If your user has access to multiple domains, you will need to add a "?tenant=($domain)" to the request
Response to this request will provide a JWT token {"token":($token)}, use the value of the token in further requests
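As a minimal sketch (assuming the public ICA host used elsewhere in this documentation and an API key generated as described above), the token request could look like this in Python; the tenant parameter is only needed in the multi-domain case mentioned above.

    import requests

    API_KEY = "<your_generated_API_key>"
    response = requests.post(
        "https://ica.illumina.com/ica/rest/api/tokens",
        headers={"X-API-Key": API_KEY},
        # Only needed if your user has access to multiple domains:
        # params={"tenant": "<your_domain>"},
    )
    jwt_token = response.json()["token"]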
Use the API endpoint /api/projects/{projectId}/analyses/{analysisId}/configurations to find the configuration listing all of the required and optional parameters. The response JSON to this API will have the configuration items listed.
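A rough Python sketch of that call, using the JWT obtained above (the Bearer authorization header and the items key in the response are assumptions; check the API Reference for the exact response shape):

    import requests

    jwt_token = "<JWT from the previous step>"
    project_id = "<projectId>"
    analysis_id = "<analysisId>"

    response = requests.get(
        f"https://ica.illumina.com/ica/rest/api/projects/{project_id}"
        f"/analyses/{analysis_id}/configurations",
        headers={"Authorization": f"Bearer {jwt_token}"},
    )
    # Print each configuration item so the codes can be mapped to --parameters flags.
    for item in response.json().get("items", []):
        print(item)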
Structure of the final command
icav2 projectpipelines start cwl $(pipelineID) --user-reference <user-reference>, plus input options
Input Options - For the CLI, the entire input can be broken down into individual command-line arguments.
To launch the same analysis as in the GUI, use the same file IDs and parameters. If using new data, you can use the CLI command icav2 projectdata list to find new file IDs to launch a new instance of the pipeline.
Required information in Input - Input Data and Parameters
This option requires the use of --type input STRUCTURED, along with --input and --parameters.
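Putting these pieces together, a full invocation could look something like the sketch below. This is an illustration only: the IDs are placeholders, the input codes (read1, read2) are hypothetical, the inputCode:fileId format for --input is an assumption, and the exact flag spellings (for example, whether the structured input type is passed as --type input or --input-type) should be verified against icav2 projectpipelines start cwl --help.

    icav2 projectpipelines start cwl <pipelineID> \
        --user-reference my-cli-rerun \
        --type input STRUCTURED \
        --input read1:<fileID_1> \
        --input read2:<fileID_2> \
        --parameters enable_map_align:true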
Successful Response
Unsuccessful Response: Pipeline ID not formatted correctly
Check that the pipeline ID is correct based on icav2 projectpipelines list
File ID not found
Check that the file ID is correct based on icav2 projectdata list
Parameter not found
When using Nextflow to start runs, the input-type parameter is not used, but --project-id is required.
Structure of the final command: icav2 projectpipelines start nextflow $(pipelineID) --user-reference <user-reference>, plus input options
Response status can be used to determine if the pipeline was submitted successfully
Status options: REQUESTED, SUCCEEDED, FAILED, ABORTED
This tutorial demonstrates how to use the ICA Python library packaged with the JupyterLab image for Bench Workspaces.
The tutorial will show how authentication to the ICA API works and how to search, upload, download and delete data from a project into a Bench Workspace. The python code snippets are written for compatibility with a Jupyter Notebook.
Navigate to Bench > Workspaces and click Enable to enable workspaces. Select +New Workspace to create a new workspace. Fill in the required details and select JupyterLab for the Docker image. Click Save and Start to open the workspace. The following snippets of code can be pasted into the workspace you've created.
This snippet defines the required python modules for this tutorial:
This snippet shows how to authenticate using the following methods:
ICA Username & Password
ICA API Token
These snippets show how to manage data in a project. Operations shown are:
Create a Project Data API client instance
List all data in a project
Create a data element in a project
Upload a file to a data element in a project
Download a data element from a project
Search for matching data elements in a project
Delete matching data elements in a project
These snippets show how to get a connection to a base database and run an example query. Operations shown are:
Create a python jdbc connection
Create a table
Insert data into a table
Query the table
Delete the table
This snippet defines the required Python modules for this tutorial:
The platform provides Connectors to facilitate automation for operations on data (i.e., upload, download, linking). The connectors are helpful when you want to sync data between ICA and your local computer or link data between projects in ICA.
The ICA CLI upload/download proves beneficial when handling large files/folders, especially in situations where you're operating on a remote server by connecting from your local computer. You can use icav2 projects enter <project-name/id> to set the project context for the CLI to use for the commands when relevant. If the project context is not set, you can supply the additional parameter --project-id <project-id> to specify the project for the command.
Note: Because of how S3 manages storage, it doesn't have a concept of folders in the traditional sense. So, if you provide the "folder" ID of an empty "folder", you will not see anything downloaded.
In the example above, we're generating a partial file named 'tempFile.txt' within a project identified by the project ID '41d3643a-5fd2-4ae3-b7cf-b89b892228be', situated inside a folder with the folder ID 'fol.579eda846f1b4f6e2d1e08db91408069'. You can access project, file, or folder IDs either by logging into the ICA web interface or through the use of the ICA CLI.
The response will look like this:
Retrieve the data/file ID from the response (for instance: fil.b13c782a67e24d364e0f08db9f537987) and employ the following format for the Post request - /api/projects/{projectId}/data/{dataId}:createUploadUrl:
The response will look like this:
Use the URL from the response to upload a file (tempFile.txt) as follows:
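The three calls can also be scripted. The sketch below is an assumption-heavy illustration: the create-data endpoint, its body field names, and the response field names (data.id, url) are guesses based on the description above and should be checked against the API Reference; only the :createUploadUrl path is taken verbatim from this section.

    import requests

    ICA_HOST = "https://ica.illumina.com/ica/rest"
    headers = {"X-API-Key": "<your_generated_API_key>"}
    project_id = "41d3643a-5fd2-4ae3-b7cf-b89b892228be"
    folder_id = "fol.579eda846f1b4f6e2d1e08db91408069"

    # 1. Create the partial file record (endpoint and body field names are assumptions).
    create = requests.post(
        f"{ICA_HOST}/api/projects/{project_id}/data",
        headers=headers,
        json={"name": "tempFile.txt", "folderId": folder_id, "dataType": "FILE"},
    )
    data_id = create.json()["data"]["id"]  # e.g. fil.b13c782a67e24d364e0f08db9f537987

    # 2. Request an upload URL for that data ID (endpoint as documented above).
    upload = requests.post(
        f"{ICA_HOST}/api/projects/{project_id}/data/{data_id}:createUploadUrl",
        headers=headers,
    )
    upload_url = upload.json()["url"]  # field name assumed

    # 3. Upload the file contents to the returned pre-signed URL.
    with open("tempFile.txt", "rb") as fh:
        requests.put(upload_url, data=fh)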
If you are trying to upload data to the /cli-upload/ folder, you can get the temporary credentials to access the folder using icav2 projectdata temporarycredentials /cli-upload/. It will produce output with an accessKey, secretKey, and sessionToken that you will need to configure the AWS CLI to access this folder.
Copy the awsTempCredentials.accessKey, awsTempCredentials.secretKey and awsTempCredentials.sessionToken to build the credentials file ~/.aws/credentials. It should look something like this:
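For example, using the standard AWS CLI credentials file layout (the values are placeholders for the three fields returned by the temporarycredentials command):

    [default]
    aws_access_key_id = <awsTempCredentials.accessKey>
    aws_secret_access_key = <awsTempCredentials.secretKey>
    aws_session_token = <awsTempCredentials.sessionToken>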
The temporary credentials expire in 36 hours. If they expire before the copy is complete, you can generate new credentials and use the aws s3 sync command to resume the transfer from where it left off.
Following are a few AWS commands to demonstrate their use. The remote path in the commands below is constructed from the output of the temporarycredentials command in this format: s3://<awsTempCredentials.bucket>/<awsTempCredentials.objectPrefix>
You can also write scripts to monitor the progress of your copy operation and regenerate and refresh the temporary credentials before they expire.
You may also use Rclone for data transfer if you prefer. The steps to generate temporary credentials are the same as above. You can run rclone config to set the keys and tokens to configure rclone with the temporary credentials. You will need to select the advanced edit option when asked to enter the session key. After completing the config, your configuration file (~/.config/rclone/rclone.conf) should look like this:
For other operating systems, refer to OS specific documentation for FUSE driver installation.
Identify the project id by running the following command:
Provide the project id under "ID" column above to the mount command to mount the project data for the project.
Check the content of the mount.
WARNING: Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss, as data in the destination location will be deleted.
You can unmount the project data using the 'unmount' command.
See the ICA documentation for details about the JupyterLab Docker image provided by Illumina.
The Snowflake Python API documentation can be found on the Snowflake documentation site.
Another option to upload data to ICA is via the API. This option is helpful when data needs to be transferred via automated scripts. You can use the following two endpoints to upload a file to ICA.
Post - with the following body, which will create a partial file at the desired location and return a dataId for the file to be uploaded. {projectId} is the project id for the destination project. You can find the projectId in your project's details page (Project > Details > URN > urn:ilmn:ica:project:projectId#MyProject).
Post - where dataId is the dataId from the response of the previous call. This call will generate the URL that you can use to upload the file.
Create data in the project by making the API call below. If you don't already have an API Key, refer to the instructions in the API Keys documentation for guidance on generating one.
ICA allows you to directly upload/download data from ICA using the AWS CLI. It is especially helpful when dealing with an unstable internet connection to upload or download a large amount of data. If the transfer gets interrupted midway, you can employ the sync command to resume the transfer from the point it was stopped.
To connect to ICA storage, you must first download and install the AWS CLI on your local system. You will need temporary credentials for the AWS CLI to access ICA storage. You can generate temporary credentials through the ICA CLI, which can be used to authenticate the AWS CLI against ICA. The temporary credentials can be obtained using the icav2 projectdata temporarycredentials command.
icav2 allows project data to be mounted on a local system. This feature is currently available on Linux and Mac systems only. Although not supported, users have successfully used Windows Subsystem for Linux (WSL) on Windows to use the icav2 projectdata mount command. Please refer to the Microsoft documentation for installing WSL.
icav2 (>=2.3.0) and a FUSE driver installed on the local system.
For Mac, refer to the OS-specific FUSE driver documentation (e.g., macFUSE).
A project created on ICA v2 with data in it. If you don't already have a project, please follow the instructions to create a project.
icav2 utilizes the FUSE driver to mount project data, providing both read and write capabilities. However, there are some limitations on the write capabilities that are enforced by the underlying AWS S3 storage. For more information, please refer to the AWS S3 documentation.
You can access the databases and tables within the Base module using Python from your local machine. Once retrieved as, for example, a pandas object, the data can be processed further. In this tutorial, we describe how to create a Python script which retrieves the data and visualizes it using the Dash framework. The script contains the following parts:
Importing dependencies and variables.
Function to fetch the data from Base table.
Creating and running the Dash app.
This part of the code imports the dependencies which have to be installed on your machine (possibly with pip). Furthermore, it imports the variables API_KEY and PROJECT_ID from the file named config.
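A minimal sketch of that file (assuming it is saved as config.py next to the script so that from config import API_KEY, PROJECT_ID works; both values are placeholders):

    # config.py -- keep this file out of version control
    API_KEY = "<your_generated_API_key>"
    PROJECT_ID = "<your_project_id>"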
We will be creating a function called fetch_data to obtain the data from Base table. It can be broken into several logically separated parts:
Retrieving the token to access the Base table, together with other variables, using the API.
Establishing the connection using the token.
SQL query itself. In this particular example, we are extracting values from two tables Demo_Ingesting_Metrics and BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. The table Demo_Ingesting_Metrics contains various metrics from DRAGEN analyses (e.g. the number of bases with quality at least 30 Q30_BASES) and metadata in the column ica which needs to be flattened to access the value Execution_reference. Both tables are joined on this Execution_reference value.
Fetching the data using the connection and the SQL query.
Here is the corresponding snippet:
Once the data is fetched, it is visualized in an app. In this particular example, a scatter plot is presented with END_DATE as the x axis and the metric chosen by the user from the dropdown as the y axis.
Now we can create a single Python script called dashboard.py by concatenating the snippets and running it. The dashboard will be accessible in the browser on your machine.
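As a rough, self-contained sketch of the app portion only (a placeholder DataFrame stands in for the output of fetch_data, and the Q30_BASES column name follows the table description above; everything else is illustrative):

    import pandas as pd
    import plotly.express as px
    from dash import Dash, dcc, html, Input, Output

    # Placeholder for the DataFrame returned by fetch_data(); in the real script
    # this would come from the Snowflake query described above.
    df = pd.DataFrame({
        "END_DATE": pd.to_datetime(["2024-01-01", "2024-01-02"]),
        "Q30_BASES": [1.2e9, 1.4e9],
    })
    metrics = [c for c in df.columns if c != "END_DATE"]

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Dropdown(id="metric", options=metrics, value=metrics[0]),
        dcc.Graph(id="scatter"),
    ])

    @app.callback(Output("scatter", "figure"), Input("metric", "value"))
    def update_figure(metric):
        # Scatter plot with END_DATE on the x axis and the chosen metric on the y axis.
        return px.scatter(df, x="END_DATE", y=metric)

    if __name__ == "__main__":
        app.run(debug=True)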
Any operation from the ICA graphical user interface can also be performed with the API.
The following are some basic examples on how to use the API. These examples are based on using Python as programming language. For other languages, please see their native documentation on API usage.
An installed copy of Python. (https://www.python.org/)
The package installer for python (pip) (https://pip.pypa.io/)
The Python requests library installed (pip install requests)
One of the easiest authentication methods is by means of API keys. To generate an API key, refer to the Get Started section. This key is then used in your Python code to authenticate the API calls. It is best practice to regularly update your API keys.
API keys are valid for a single user, so any information you request is for that user to which the key belongs. For this reason, it is best practice to create a dedicated API user so you can manage the access rights for the API by managing those user rights.
There is a dedicated API Reference where you can enter your API key and try out the different API commands and get an overview of the available parameters.
The examples on the API Reference page use curl (Client URL) while Python uses the requests library. There are a number of online tools to automatically convert from curl to Python.
To get the curl command,
Look up the endpoint you want to use on the API reference page.
Select Try it out.
Enter the necessary parameters.
Select Execute.
Copy the resulting curl command.
Never paste your API authentication key into online tools when performing curl conversion as this poses a significant security risk.
In the most basic form, the curl command
curl my.curlcommand.com
becomes
You will see the following options in the curl commands on the API Reference page:
-H means header.
-X means the string is passed "as is" without interpretation.
A curl command using these options becomes the following in Python:
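For example, a hypothetical curl command using the placeholder address from above could be converted as follows:

    import requests

    # curl -X GET "https://my.curlcommand.com" -H "X-API-Key: <your_generated_API_key>"
    # becomes:
    response = requests.get(
        "https://my.curlcommand.com",
        headers={"X-API-Key": "<your_generated_API_key>"},
    )
    print(response.status_code)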
This is a straightforward request without parameters which can be used to verify your connection.
The API call is
response = requests.get('https://ica.illumina.com/ica/rest/api/eventcodes', headers={'X-API-Key': '<your_generated_API_key>'})
In this example, the API key is written directly into the API call, which means you must update all API calls when the key changes. A better practice is to define the headers with the API key once, in a variable, so it is easier to maintain. The full code then becomes:
The list of event codes is returned as a single line, which makes it difficult to read, so let's pretty-print the result.
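Putting both points together, a sketch of the full snippet with the headers defined once and the JSON response pretty-printed:

    import json
    import requests

    headers = {"X-API-Key": "<your_generated_API_key>"}

    response = requests.get(
        "https://ica.illumina.com/ica/rest/api/eventcodes",
        headers=headers,
    )
    # Pretty-print the JSON body so it is easier to read.
    print(json.dumps(response.json(), indent=4))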
Now that we are able to retrieve information with the API, we can use it for a more practical request like retrieving a list of projects. This API request can also take parameters.
First, we pass the request without parameters to retrieve all projects.
The easiest way to pass a parameter is by appending it to the API request. The following API request will list the projects with a filter on CAT as user tag.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?userTags=CAT', headers=headers)
If you only want entries that have both the tags CAT and WOLF, you would append them like this:
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?userTags=CAT&userTags=WOLF', headers=headers)
To copy data, you need to know:
Your generated API key.
The dataId of the files and folders which you want to copy (their syntax is fil.hexadecimal_identifier and fol.hexadecimal_identifier). You can select a file or folder in the GUI to see its Id (Projects > your_project > Data > your_file > Data details > Id) or you can use the /api/projects/{projectId}/data endpoint.
The destination project to which you want to copy the data.
The destination folder within the destination project to which you want to copy the data (fol.hexadecimal_identifier).
What to do when the destination files or folders already exist (OVERWRITE, SKIP or RENAME).
The full code will then be as follows:
Now that we have done individual API requests, we can combine them and use the output of one request as input for the next request. When you want to run a pipeline, you need a number of input parameters. In order to obtain these parameters, you need to make a number of API calls first and use the returned results as part of your request to run the pipeline. In the examples below, we will build up the requests one by one so you can run them individually first to see how they work. These examples only follow the happy path to keep them as simple as possible. If you program them for a full project, remember to add error handling. You can also use the GUI to get all the parameters or write them down after performing the individual API calls in this section. Then, you can build your final API call with those values fixed.
This block must be added at the beginning of your code
Previously, we already requested a list of all projects; now we add a search parameter to look for a project called MyProject. (Replace MyProject with the name of the project you want to look for.)
Now that we have found our project by name, we need to get the unique project id, which we will use in the combined requests. To get the id, we add the following line to the end of the code above.
Syntax ['items'][0]['id'] means we look for the items list, 0 means we take the first entry (as we presume our filter was accurate enough to only return the correct result and we don't have duplicate project names) and id means we take the data from the id field. Similarly, you can build other expressions to give you the data you want to see, such as ['items'][0]['urn'] to get the urn or ['items'][0]['tags']['userTags'] to get the list of user tags.
Once we have the identifier we need, we add it to a variable which we will call Project_Identifier in our examples.
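A sketch of these steps (the name of the search query parameter is an assumption; check the GET /api/projects entry on the API Reference page for the exact parameter name):

    import requests

    headers = {"X-API-Key": "<your_generated_API_key>"}

    response = requests.get(
        "https://ica.illumina.com/ica/rest/api/projects",
        headers=headers,
        params={"search": "MyProject"},  # parameter name assumed
    )
    My_API_Data = response.json()
    Project_Identifier = My_API_Data["items"][0]["id"]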
Once we have the identifier of our project, we can fill it out in the request to list the pipelines which are part of our project.
This will give us all the available pipelines for that project. As we will only want to run a single pipeline, we can search for our pipeline, which in this example will be the basic_pipeline. Unfortunately, this API call has no direct search parameter, so when we get the list of pipelines, we will look for the id and store that in a variable which we will call Pipeline_Identifier in our examples as follows:
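Continuing from the previous snippet, a sketch of that lookup (the endpoint path, the wrapping of each item, and the code field used for matching are assumptions based on the description above; adjust to the API Reference as needed):

    response = requests.get(
        f"https://ica.illumina.com/ica/rest/api/projects/{Project_Identifier}/pipelines",
        headers=headers,
    )
    Pipeline_Identifier = None
    for item in response.json()["items"]:
        pipeline = item.get("pipeline", item)  # response wrapping may differ
        if pipeline.get("code") == "basic_pipeline":
            Pipeline_Identifier = pipeline["id"]
            break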
Once we know the project identifier and the pipeline identifier, we can create an API request to retrieve the list of input parameters which are needed for the pipeline. We will consider a simple pipeline which only needs a file as input. If your pipeline has more input parameters, you will need to set those as well.
Here we will look for the id of the extra small storage size. This is done with the 0 in the My_API_Data['items'][0]['id']
Now we will look for a file "testExample" which we want to use as input and store the file id.
Finally, we can run the analysis with parameters filled out.
Cohorts
Fixed an issue where users could not search the hierarchical disease concepts because of incorrect URL in the UI configuration.
General
The current tab (e.g. analysis details, analysis steps, pipeline details, pipeline XML config, ...) is now saved in the URL, making the back button bring the user back to the tab they were in
Users are now able to go from the analysis data details to their location in the project data view
Toggling the availability of the "Acceptance list" tab in the legal view by a tenant admin used to be possible in the "Restrictions of monitoring" tab when editing the bundle. It has been moved to the legal tab
Data Management
New data formats available:
TRANSCRIPT: *.quant.sf, *.quant.sf.gz
GENE: *.quant.genes.sf, *.quant.genes.sf.gz
JSON.gz are now recognized as JSON format
New endpoint to create files POST /api/projects/{projectId}/data:createFile
Endpoint POST /api/projects/{projectId}/data has been deprecated
The endpoint GET /api/projects/{projectId}/data/{dataId}/children now has more filters for more granular filtering
Users are now able to filter based on the owning project ID for the endpoint GET /api/projects/{projectId}/data
The links section in Bundle details and pipeline details now has proper URL validation and both fields are now required when adding links. In the case of editing an older links section of a bundle/pipeline, the user won't be able to save until the section is corrected
Flow
The cost of a single analysis is now exposed on its details page
Users can now abort analysis while being in the analysis detail view
'.command.*' files from Nextflow WorkDir are now copied into ICA logs
Base
Expanded the lifespan of Base OAuth token to 12h
Bench
Removed display of the current user using a bench workspace
Experimental Features
Streamable inputs for JSON-based input forms. Adding "streamable":true to an input field of type "data" makes it a streamable input.
General
Fixed an issue which would overgenerate event calls when an analysis would run into diskfull alert
Improved API error handling so that being unable to reach ICA storage will now result in error code 500 instead of error code 400
Added a full name field for users in various grids (Bench activity, Bundle share, ...) to replace the separate first and last name fields
In the event log, the event ICA_EXEC_028 is now shown as INFO; it was previously displayed as ERROR, which was not correct
Data Management
Fixed an issue which would result in a null-pointer error when trying to open the details of a folder which was in the process of being deleted
Fixed an issue with bundle invites, now making it clear that you can only re-invite someone to that bundle if the previously rejected invites are removed
Various improvements to hardening data linking
Fixed an issue where the folder copy job would throw an Access Denied error when copying a file with an underscore (_) in its path
Fixed an issue that would produce a false error notification when changing the format from the data details window
Fixed an issue where an out of order event for Folder Deleting and Deleted would occur in rare scenarios
Fixed an issue regarding path too long error for Folder copy/Move operations for Managed bucket src and destination
Flow
Improved API file handling for post-processing when downloading the results from a successful analysis, which could previously result in the analysis being incorrectly reported as failed
Fixed an issue which resulted in a null-pointer error when starting an XML based CWL pipeline with an input.json
Fixed an issue which caused user references with slashes to prevent errors in failed runs from being displayed
Fixed an issue where the value 0 was not accepted in pipeline's inputForm.json for fields of type number
Fixed an issue where users could not retrieve pipeline_runner.0 logs file while a pipeline is running
List fields in filter options are now saved if closing and reopening the filter panel
Fixed an issue where the start time of an analysis's step would be intermittently reported wrongly
Fixed an issue where retrieving outputs of analysis through API was not consistent between analysis with advanced output-mapping or without
Improvements to the handling of large file uploads to prevent token expiry from blocking uploads
Base
Fixed an issue where a shared database would not be visible in project Base; this was fixed in the newer Snowflake version 9.3
Bench
Removed the rollback failed operations function on docker images as it had little to no benefit for end-users and frequently caused confusion
Fixed issue where users without proper permissions could create a workspace
Cohorts
Fixed issues where users doing large scale inputs of data received timeouts from the ICA API for file retrieval
Fixed issue with large OMOP data sets causing out of memory issues on input
Fixed issue where the 'Search Attributes' box in the 'Create Cohort' was not scrolling after typing a partial string.
Fixed issue with line-up of the exon values under exon track.
Fixed issue where subject attribute search box overlapped with other items when web browser zoom used.
Fixed issue where single subject view displayed concept codes and now shows concept names for diseases, drugs, procedures, and measurements.
Flow
Added retries for analysis process infrastructure provisioning to mitigate intermittent (~1%) CWL analysis failures. This impacts analysis steps failing with error "OCI runtime create failed" in logs.
Features and Enhancements
General
The End User License Agreement has been updated
New API endpoints for Docker Images management:
GET /api/dockerImages
GET /api/dockerImages/{imageId}
POST /api/dockerImages:createExternal
POST /api/dockerImages:createInternal
POST /api/dockerImages/{imageId}:addRegions
POST /api/dockerImages/{imageId}:removeRegions
Split up the CWL endpoint (POST /api/projects/{projectId}/analysis:cwl) into two:
CWL analysis with a JSON input (POST /api/projects/{projectId}/analysis:cwlWithJsonInput)
CWL analysis with a structured input (POST /api/projects/{projectId}/analysis:cwlWithStructuredInput)
Data Management
Next to using the Teams page to invite other tenants to use your assets, a dedicated bundle-sharing feature is now available. This allows you to share assets while also shielding sensitive information from other users, such as who has access to these assets
Improved visibility of ongoing move and copy data actions in the UI
Users can now add/remove bundles in an externally managed project. It will not be possible to link a restricted Bundle to a project containing read-only, externally managed data
Flow
JSON based input form now has a built-in check to make sure a tree does not have any cyclical dependencies
Added commands for creation and start of CWL JSON pipelines in the CLI
Users can now input external data into JSON based input forms from the API
Bench
Bench workspaces can be used in externally managed projects
Cohorts
Users can now filter needles by customizable PrimateAI Score thresholds, affecting both plot and table variants, with persistence across gene views
The 'Single Subject View' now displays a summary of measurements (without values), with a link to the 'Timeline View' for detailed results under the section 'Measurements and Laboratory Values Available'
Fixed Issues
General
Fixed an issue which caused authentication failures when using a direct link
Actions which are not allowed on externally-managed projects are now greyed-out instead of presenting an error when attempting to use them
Improved handling of regions for Docker images so that at least one region must remain. Previously, removing all regions would result in deleting the Docker image
Improved filtering out Docker images which are not relevant to the current user
Tertiary modules are no longer visible in externally-managed projects as they had no functional purpose there
Fixed an issue where adding public domain users to multiple collaborative workgroups would result in inconsistent instrument integration results
Added verification on the filter expressions of notification subscriptions
Fixed an issue where generating a cURL command with empty field values on the Swagger page resulted in invalid commands
Added information in the API swagger page that the GET /api/projects/{projectId}/data endpoint can not retrieve the list of files from a linked folder. To get this list, use parentfolderid instead of parentfolderpath
For consistency reasons, UUID has been renamed to ID in the GUI
The bundle administrator will now see all data present in the bundle, including all versions with older versions in a different color
Data Management
Removed deprecated cloud connector from Activity/Data transfers option
Removed the erroneous 'Import' option from the direction filter which was present in Activity/Data transfers
Fixed an issue where entering multiple Download rules for a Service connector would result in not setting the correct unique sequence numbers
Improved the error message when erroneously trying to link data from an externally-managed subject to a sample. This is not allowed because data can only be linked to a single sample
Fixed an issue where filtering on file formats was not correctly applied when selecting files and folders for downloads with the service connector
Improved the download connector to fix Chrome compatibility issues
Fixed an issue where it was possible to update linked files if you had access to both the original file and the linked file
Fixed an issue where samples from externally-managed projects were not correctly linked to analyses
Flow
Fixed a JSON validation error when attempting to have more than one default value for a field configured as single value which would result in index out of bounds error
Fixed an issue where numerical values with a scientific exponent would not be correctly accepted
Improved the API error validation for usage of duplicate group id fields
Improved error handling when starting analysis via API with an incorrect DATA-ID in the request body
Improved handling of incorrect field types for JSON-based input forms
Improved error handling when trying to use externally-managed data as reference data
Removed the superfluous "save as" button from the create pipelines screen
Fixed an issue where refreshing the analysis page would result in an error when more than 1 log file was opened
Upon clicking "start run" to launch a pipeline, ICA now redirects to the "Runs" view
Fixed an issue where the minimum and maximum numbers of high values were incorrectly rounded for JSON input forms
Fixed an issue where the user could pass a value with the "values" parameter instead of "dataValues" for the data field type
Fixed an issue which caused the "dataValues" parameter to be valid for the textbox field type instead of "values"
Improved timeout handling for autolaunch workflow
Fixed auto-launched TSO500 pipelines using the StartsFromFastq=false setting to direct analysis outputs to the ilmn-analyses folder within the ICA project data
Added JSON validation to ensure only a single radio button can be selected at the same time as part of a radio-type list
Removed the Simulate button from the View mode pipeline detail screen
The proprietary option can now be set via the CLI on create pipeline commands
Added a validation to prevent pipeline input forms from having the same value for 'value' and 'parent'
Bench
Fixed an issue which caused bench workspaces to have incorrect last modified timestamps that were over 2000 years ago. They now will use the correct last updated timestamp
Adding or removing regions to bench images is now possible
Improvements to how workspaces handle deletion
Cohorts
Fixed issue where the error message for invalid disease IDs did not disappear after selecting the correct ontology, and filter chips were incorrectly created as 'UNDEFINED'
Fixed issue where the search functionality in the ingestion file picker was not working correctly in production, causing a long delay and no files to display after entering a filename or folder name
Fixed issue where the Clinvar significance track was not resetting properly, causing the resized track and pointer to not return to the original position, with triangle data points displaying empty whitespace
Fixed issue where the 'PARTIAL' status for HPO filter chips was incorrectly removed when multiple chips were selected
Fixed issue where the pagination on the Variant List Needle Plot incorrectly displayed 741 items across 75 pages, causing a discrepancy with the actual number of displayed variants
The 'Search Attributes' box in the 'Create Cohort' page now properly scrolls and filters results when typing substrings, displaying matching results as the user types
Fixed issue where the search spinner continued loading after the search results were displayed in the Import Jobs table
Fixed issue where the 'stop_lost' consequence in Needleplot is corrected to 'Frameshift, Stop lost,' and the legend updated to 'Stop gained|lost.' The 'Stop gained' filter now excludes 'Stop lost' variants when the 'Display only variants shown in the plot above' toggle is on
Fixed issue where intermittent 500 error codes occurred on USE1 Prod when running Needleplot/VariantList queries with the full AGD/DAC dataset (e.g., LAMA1 gene query)
Features and Enhancements
Flow
Analysis logs (task stdout/stderr files) are now written to a folder named ‘ica_logs’ within the analysis output folder
Default scratch disk size attached to analysis steps reduced from 2TB to 0B to improve cost and performance of analyses. Pipelines created before ICA v2.21.0 will not be impacted
Notifications
Notifications can now be updated and deleted in externally managed Projects
API
Clarified on the Swagger page which sorting options apply to which paging strategy (cursor-based versus offset-based). Changed the default sorting behavior so that:
When no paging strategy is specified and no sort is requested, then cursor-based paging is default
When no paging strategy is specified and sort is requested, then offset-based paging is default
Cohorts
Procedure Search Box: Users can now access additional UI functionalities for Procedures
Users can now access Procedure codes from OMOP
Improved handling of drug codes across all reports, excluding Survival comparison
Ingestion
Users now have enhanced job warning log and API status improvements
Users now require download permissions to facilitate the data ingestion process
Fetch Molecular Files: Improved import – Users can now input a directory path and select sample files individually
Variant Type Summary: Users can now access a new variants tab that summarizes Variant type statistics per gene
Added sorting and filtering capabilities to report tables, such as variants observed in genes
Users can now view sample barcodes, replacing internal auto-increment sample IDs in the Structural Variants table within the Genes tab
“Search subjects” functionality improved with flexible filtering logic that now supports partial matches against a single input string
Fixed Issues
Data Management
Fixed an issue with data copy via the CLI where the file was being copied to a subfolder of the intended location instead of the specified folder
Resolved an issue where browser upload hangs intermittently when creating data
Fixed an issue where the delete popup does not always disappear when deleting data
Fixed an issue where GetFolder API call returns 404 error if the Create and Get operations are performed 100ms apart
Fixed an issue where file copy would fail if the file was located at the root level of User’s S3 storage bucket
Fixed an issue causing data linked from externally managed projects to be incorrectly excluded from the list project data API response
Fixed an issue where User cannot use data URNs to identify the destination folder when interacting with copy data API endpoints
Bundles: Fixed an issue where clicking the back button before saving a new bundle leads to inconsistencies
Flow
Fixed an issue where pipeline documentation is not scrollable when launching pipeline
Fixed an issue with logfiles of a task not being available for streaming while the task is still running
Fixed an issue where using the 're-run' button from the analysis page reverts the storage size selection to default
Fixed an inconsistency where the following two endpoints would show different analysis statuses:
GET /api/projects/{projectId}/analyses
GET /api/projects/{projectId}/analyses/{analysisId}
Improved performance issues with UI loading data records when selecting inputs for analysis
Fixed a caching issue which resulted in delays when running pipelines
Fixed an issue where back button for analysis or pipeline details does not always direct Users back to analysis or pipelines view, respectively
Fixed an issue where system performance is degraded when large batches (e.g., 1,000) of data are added as input to Analyses via the graphical UI. It is recommended to start Analyses with large numbers of input files via API
Base
Fixed an issue where enabling Base from a Base view other than Base Tables returned a warning message
Fixed an issue where Base access was not enabled when a bundle with tables is added to a project without Base (Base is automatically enabled so users can see the bundle's tables). However, access to the bundle's tables is revoked upon the deletion of Base, and was not granted again once Base was re-enabled
Fixed an issue where a Base job to load data into a table never finished because the file was deleted after the job started and before it finished. Now the job will end up in a Failed state
Cohorts
Fixed an issue where needle plot filtered out data points reappear when zooming in the exon when a filter is in place
Fixed an issue where users from a different tenant who accept a project share may encounter a failure at the final step of the data ingestion process
Fixed an issue where users can encounter intermittent errors when browsing and typing for a gene
Fixed an issue where the UI hangs on large genes and returns a 502 error
Fixed Issues
Data Management
Fixed an issue where multiple folder copy jobs with the same destination may get stuck In Progress
Fixed an intermittent issue where tags on the target folder for a batch data update call are not set, but are set for all child data
Flow
Fixed an issue causing intermittent pipeline failures due to an infrastructure error
Features and Enhancements
General
Navigation: If multiple regions are enabled for the same tenant, the region will be indicated in the waffle menu
Logging: Data transfers of BaseSpace Sequence Hub projects with data stored in ICA will be traced in ICA logs
Cohorts
Disease Search Box: Added support for specifying subjects by age of onset of disease(s)
Drug Search Box: Added a new query builder box for Drugs
Ingestion: Support for Drug, drug route, etc. attached to subjects
Cohorts building: Users can build cohorts by specifying drugs, drug route, etc.
Ingestion
Combine different variant types during ingestion (small variants, cnv, sv)
Cohorts supports Illumina Pisces variant caller for hg19 VCFs
Fixed Issues
General
Fixed an issue where the graphical UI hangs with a spinning wheel when saving or executing a command
Fixed an issue where rich text editor for Documentation tab on Pipelines, Tools, Projects and Bundles does not populate with correct styles in edit mode
Data Management
Fixed an issue where multiple clicks on create data in Project API endpoint resulted in multiple requests
Fixed an issue where the secondary data selection screen could not be resized
A spinning wheel icon with ‘copying’ status is displayed at the folder level in the target Project when a folder is being copied. This applies to the actual folder itself and not for folders higher up in the hierarchy
Fixed an issue where API to retrieve a project data update batch is failing with 500 error when either the Technical or the User tags are updated during the batch update request
Fixed an issue where linking jobs fail to complete if other linking jobs are running
Improved performance for data transfer to support BaseSpace Sequence Hub Run transfers
Fixed an issue causing some folder copy jobs to remain in "Partially Succeeded" status despite being completed successfully
Bundles: Fixed an issue where the URL and Region where a Docker image is available were not displayed for a Docker image Tool shared via an entitled Bundle
Fixed an issue where the folder copy job was getting stuck copying large amounts of big files
Fixed an issue where the folder counts were not matching expected counts after Data linking
Fixed an issue where delete data popup would occasionally not disappear after deleting data.
Fixed an issue with data copy where referencing data from another region would not result in immediate failure
Fixed issue where uploading a folder using the CLI was not working
Fixed an issue where a Docker image shared via an entitled Bundle can be added to another region
Workflows
Fixed an issue where workflow does not fail if BCL Convert fails for a BCL Convert-only run
Flow
Improved performance when batches of data up to 1000 are added as input to an Analysis
Nextflow engine will return exit code 55 if the pipeline runner task is preempted
Fixed an issue where log files cannot be opened for any steps in an analysis while the analysis is in progress
Fixed an issue with concurrent updates on analysis
Fixed an issue where unknown data inputs in the XML of an analysis are not being ignored
The warning, close, and machine profile icons for Tools can now be seen in the graphical CWL pipeline editor
Fixed an issue where user cannot expand analysis output folder if user permissions change after starting analysis. Now, if a user has the correct permissions to start an analysis, that analysis should be able to finish correctly no matter the permissions at the time it succeeds
Base
Fixed an issue where switching back from a template to Empty Table did not clear the fields
Data linked from an externally managed project can be added to Base Tables
Fixed an issue in the graphical UI where schema definition does not scroll correctly when many columns are defined
Features and Enhancements
Data Management/API
Added a new endpoint available to change project owner
POST /api/projects/{projectId}:changeOwner with body { "newOwnerId": "<userId>" }
Added a new endpoint to copy data from one project to another:
/api/projects/{projectId}/projectDataCopyBatch
Data Management/CLI
Added the ability to copy files and folders between projects in the UI and CLI. This includes support for copying data from projects with ICA-managed storage (default) to projects with S3-configured storage.
Flow/API
When starting an analysis via the API, you can specify the input files based on HTTP(s). When your analysis is done, you will see the URL corresponding to the inputs in the UI, but you will not be able to start an analysis from the UI using this URL
Added two new endpoints for workflow sessions:
Get /api/projects/{projectId}/workflowSessions
Get /api/projects/{projectId}/workflowSessions/{workflowSessionId}/inputs
Added a new endpoint to retrieve configurations from a workflow session
Flow/CLI
Duplicate analyses submitted via the CLI will be avoided
Flow
Removed the ability to start analyses from data and sample views in the UI where a single input is selected to start analyses in bulk
Flow/Autolaunch
ICA Workflow Sessions and Orchestrated Analyses (launched by the workflow session) now save outputs in an organized folder structure: /ilmn-analysis/<name_used_to_create_sequencer_run_output_folder>
Base
The Base module has a new feature called ‘Data Catalogue’. This allows you to add usage data from your tenant/project if that data is available for you.
Data Catalogue views will be available and can be used in Base to query on
You will be able to preview and query Data Catalogue views through Base Tables and Query screens
The Data Catalogue will always be up to date with the available views for your tenant/project
Data Catalogue views cannot be shared through a Bundle
Data Catalogue views will also be available to team members that were added after the view was added
Data Catalogue views can be removed from the Base tables and corresponding project
By removing Base from a project, the Data Catalogue will also be removed from that project
Cohorts: Disease Search box
Cohorts now includes a disease search box to search for disease concepts. This replaces the disease concept tree explorer
Disease search box located under a Disease tab in main Query builder
Search box allows for a copy/paste action of codes to be processed as separate query elements. Currently, the feature is limited to a complete valid list
Each disease entered into the search box is displayed as a separate query item and can be set to include or exclude.
Diseases in search box can be used with boolean logic in cohort creation
Search box allows for an auto-complete of diagnosis concepts and identifiers
The disease filter is included in the cohort query summary on cohort page
Fixed Issues
Data Management
Data copy between ICA-managed projects and S3 storage configured projects is supported
Fixed an issue where storage configurations matching ICA-managed buckets would cause volume records to get associated with the wrong storage configuration in the system
API
The endpoint GET /api/projects/{ProjectID}/samples/{SampleID} now correctly returns all of the project's own samples and linked samples
Improved handling of bulk update via API when concurrent deletion of file has occurred
CLI
Fixed an issue where projectdata update tags would not update the tags
Fixed an issue to support adding the server-url as a parameter instead of having the config set
Flow
Fixed an issue where a failure to send a notification resulted in a failed workflow
Fixed an issue where one workflow session may override another when both are executed at the same time
Base
Fixed an issue where query download in JSON format returns an error
Added a message in the UI when a query takes longer than 30 seconds to inform the user that the query is ongoing and can be monitored in the Activity view
Added a section describing the Data Catalogue functionality
Bench
Fixed an issue where resizing the workspace to current size would prevent users from resizing for the next 6 hours
Cohorts
Fixed an issue where Gene Expression table does not display with TCGA data or for tenants with a hyphen (e.g., ‘genome-group’)
Fixed an issue where user had no way to delete a cohort comparison from a deleted cohort
Fixed an issue in the UI where multi-cohort needle plot tracks are overlapping
Fixed an issue causing failures during the annotation step with ‘CNV’ data type when selecting ‘GB=hg19’ and ‘CNV data’ for liftover; also observed with ‘SM data’ and ‘hg38’ without liftover (in APS1 and CAC1 regions) due to a ‘404 Not Found’ error.
Fixed Issue
Fixed an issue uploading folders via the CLI
Fixed Issue
Fixed an issue causing CWL pipelines using Docker images that do not contain bash shell executable to fail.
Fixed Issue
Fixed an issue leading to intermittent system instability.
Fixed Issue
Cohorts
Fixed an issue where the GTEx plot was not available for tenants with a hyphen (e.g. ilmn-demo).
Features and Enhancements
General
Versioning: The ICA version can now be found under your user when you select "About"
Versioning/API: It is possible to retrieve system information about ICA, such as the current version, through GET /api/systeminfo
Logging: When an action is initiated by another application, such as BaseSpace Sequence Hub, it will be traced as well in the ICA logs
Data Management
New API endpoints are available for:
Creation of a data update in bulk: POST /api/projects/{projectId}/dataUpdateBatch
A list of data updates for a certain project: GET /api/projects/{projectId}/dataUpdateBatch/{batchId}
A list of items from the batch update: GET /api/projects/{projectId}/dataUpdateBatch/{batchId}/items
A specific item from the batch update: GET /api/projects/{projectId}/dataUpdateBatch/{batchId}/items/{itemId}
Note: Batch updates include tags, format, date to be archived and date to be deleted
Data Management/API
The sequencing run information can be retrieved through its Id by using the API endpoint GET/api/sequencingRuns/{sequencingRunId}
Flow:
Auto launch now supports BCL Convert v3.10.9 pipeline and both TruSight Oncology 500 v2 pipelines (from FASTQs)
Removed "fpga-small" from available compute types. Pipelines using "fpga-small" will use the "fpga-medium"-equivalent compute specifications instead
Analyses launched/tracked by BaseSpace Sequence Hub contain relevant BaseSpace information in analysis details view
Flow/API
getPipelineParameters API returns parameter type in response
Added endpoints to retrieve and update a project pipeline definition
New API endpoint available to request the analyses in which a sample is being used
When leaving activationCodeDetailId empty when starting an analysis, the best match activation code will be used
Flow/API/CLI
Include "mountPaths" field in response for API and CLI command to retrieve analysis inputs
API
Two new API endpoints added to accept Terms and Conditions on a bundle:
GET /api/bundles/{bundleId}/termsOfUse/userAcceptance/currentUser Returns the time at which you, the current user, accepted the Terms & Conditions.
POST /api/bundles/{bundleId}/termsOfUse:accept
Add temporary credentials duration to API documentation
Notifications
List of events to which you can subscribe contains new ICA notification containing analyses updates
Bench
A new Bench permission is being introduced: Administrator. This permission allows users to manage existing workspaces and create new workspaces
The Bench Administrator role allows you to create new Bench workspaces with any permissions, even if you as a Bench Administrator do not have those permissions. In that case, you can create and modify the workspace, but you cannot enter it. Modifying is only possible when the workspace is stopped
As a Bench Contributor you are no longer allowed to delete a Bench Workspace; this requires the Bench Administrator role.
Cohorts
Users can now ingest raw DRAGEN bulk RNAseq results for genes and transcripts (TPM), with the option to precompute differential expression during ingestion
Added support for running multiple DEseq2 analyses in the ingestion workflow through bulk processing based on sample size and specific requirements
In multiple needle plot view, individual needle plots can now be collapsed and expanded
Pop-outs for needle plot variants now contain additional links to external resources, such as UCSC
For a given cohort, display a distribution of raw expression values (TPM per gene) for selected attributes
Using Cohorts maintains the session between core ICA and the Cohorts iFrame to prevent unwanted timeouts
Cohorts displays structural variants that include or overlap with a gene of interest
Fixed Issues
General
Collaboration: Fixed an issue where a user is presented with a blank screen when responding to a project invitation
Data Management/API
Improved error handling for API endpoint: DELETE /api/bundles/{bundleId}/samples/{sampleId}
Fixed an issue where the API endpoint GET /api/samples erroneously returned a 500
API endpoint GET/api/projects/{projectId}/analyses now returns the correct list when filtering on UserTags whereas it previously returned too many
Improved retry mechanism for API endpoint to create folderuploadsession
Data Management/CLI
When an upload of a folder/file is done through the CLI, it returns the information and ID of the folder/file
Data Management
CreatorId is now present on all data, including subfolders
Improved external linking to data inside ICA using deep linking
Improved error handling when creating folders with invalid characters.
Fixed an inconsistency for URN formats on output files from Analyses. This fix will apply only for analyses that are completed starting from ICAv2.18.0
Improved resilience in situations of concurrent linking and unlinking of files and folders from projects
It is only possible to delete a storage configuration if all projects that are using this storage configuration have been hidden and are not active projects anymore
Improved accuracy of the displayed project data size. Prior cost calculations were accurate, but the project data size visualization included technical background data
Fixed an issue where there is a discrepancy in number of configurations between Storage->Configurations and Configurations-> Genomics.Byob.Storage Configuration view
Flow/API
Improved error handling when invalid project-id is used in API endpoint GET /api/projects/{projectId}/pipelines
Fixed an issue where, when an Analysis completed with the error "incomplete folder session", the outputs of the Analysis were not always completely listed in the data listing APIs
Updated ICA Swagger Project > createProject to correctly state that the analysis priority must be in uppercase
Flow
When a spot instance is configured, but revoked by AWS, the pipeline will fail and exit code 55 is returned
Fix to return meaningful error message when instrument run ID is missing from Run Completion event during an auto launched analysis
Improved parallel processing of the same analysis multiple times
Base
Improved error handling when creating queries which use two or more fields with the same name. The error message now reads "Query contains duplicate column names. Please use column alias in the query"
Fixed an issue where queries on tables with many entries fail with NullPointerException
Bench
Clarified that changes to Bench workspace size only take effect after a restart
Cohorts
Fixed issue where counts of subjects are hidden behind attribute names
Fixed issue where the state of checked files are not retained when selecting molecular files that are in multiple nested folders
Fixed issue where projects that contain files from linked bundles cause a time out, resulting in users not being able to select files for ingestion
Fixed an issue where the 'Import Jobs' page loaded within the Data Sets frame, depending on where the import was initiated
Fixed an issue in the Correlation plot where x-axis counts were hidden under attribute names
Fixed an issue where users were previously incorrectly signed out of their active sessions
Fixed Issues
Fixed an issue causing analyses requesting FPGA compute resources to experience long wait times (>24h) or not be scheduled
Features and Enhancements
Data Management
Performance improvements for data link and unlink operations – Larger and more complex folders can now be linked in the graphical UI, and progress can be monitored with a new visual indication under Activity > Batch Jobs
Notifications
Notifications are now available for batch job changes
Flow
Increased the allowed Docker image size from 10GB to >20GB
CWL: Added support for JavaScript expressions in “ResourceRequirements” fields (i.e., type, size, tier, etc.) in CWL Pipeline definitions
Flow/API
Added support for using Pipeline APIs to query Pipelines included in Entitled Bundles (i.e., to retrieve input parameters)
Added support for providing S3 URLs as Pipeline data inputs when launching via the API (using storage credentials)
Added support for specifying multi-value input parameters in a Pipeline launch command
Bench
Project and Tenant Administrators are now allowed to stop running Workspaces
Cohorts
Enhanced ingestion workflow to ingest RNAseq raw data from DRAGEN output into backend Snowflake database
Added support for running multiple DEseq2 analyses in the ingestion workflow through bulk processing based on sample size and specific requirements
Multi-Cohort Marker Frequency - Added Multi-Cohort Marker Frequency tab allowing users to compare expression data across up to four Cohorts at the gene level
Multi-Cohort Marker Frequency includes a pairwise p-value heat map
Multi-Cohort Marker Frequency - Includes frequencies for Somatic and Copy Number Variants
Tab added for a multi-cohort marker frequency analysis in cohort comparisons
Multi-Cohort Needle Plot - Added new tab in the Comparison view with vertically aligned needle plots per cohort for a specified gene, allowing collapsible and expandable individual needle plots
Additional filter logic added to multi-cohort needle plot
Improved DRAGEN data type determination during ingestion allowing for multiple variant type ingestion
Enhanced list of observed variants with grouped phenotypes and individual counts, including a column for total sample count; tooltips/pop-outs provide extended information
Updates to needle plot link outs
Improved the Comparison feature by optimizing API calls to handle subjects with multiple attributes, ensuring successful loading of the page and enabling API invocation only when the user selects or expands a section
Removed unused columns (genotype, mrna_feature_id, allele1, allele2, ref_allele, start_pos, stop_pos, snp_id) from annotated_somatic_mutations table in backend database
Refactored shared functionality for picking consequence type to reduce code duplication in PheWAS-plot and GWAS-plot components
Invalid comparisons on the Comparisons page are now grayed out and disabled. This improvement prevents the selection of invalid options
Automatic retry of import jobs when there are failures accessing data from ICA API
Fixed Issues
General
Navigation: Removed breadcrumb indication in the graphical UI
Data Management
The content of hidden Projects can now be displayed
Fixed the TimeModified timestamp on files
Bundles: Resolved issues when linking a large number of files within a folder to a Bundle
Flow
Single values are now passed as a list when starting an Analysis
Pipelines will succeed if the input and output formats specified on the pipeline level match at the Tool level
Fixed an issue causing Analysis failures due to intermittent AWS S3 network errors when downloading input data
CWL: Improved performance on output processing after a CWL Pipeline Analysis completes
Flow/UI: Mount path details for Analysis input files are now visible
Flow/UI: Improved usability when starting an Analysis by filtering entitlement options based on selected inputs and available entitlements
Flow/API
List of Analyses can now be retrieved via the API based on filters for UserReference and UserTags
Base
Fixed an issue where the Scheduler continues to retry uploading files which cannot be loaded
Bench
Resolved an issue when attempting to access Workspaces with multi-factor authentication (MFA) enabled at the Tenant-level
API
Improved error messaging for POST /api/projects/{projectId}/data/{dataId}:scheduleDownload
Cohorts
Fixed issue where the Correlation bubble plot intermittently did not show for any project
Fixed issue where importing Germline/hg19 test file did not load variants for a specific gene in the Needle plot due to missing entries in the Snowflake table
Fixed a bug causing an HTTP 400 error while loading the Cohort for the second time due to the UI passing "undefined" as variantGroup, which failed to convert to the VariantGroup Enum type
Fixed issue where the scale (y-axis) of the needle plot changed even if the sample count / gnomAD frequency value was not accepted
Fixed an issue where no data was generated in the Base Tables after a successful import job in Canada - Central Region (CAC1)
Fixed issue where long chart axis labels overlap with tick marks on graph
Features and Enhancements
General
Navigation: Updated URLs for Correlation Engine and Emedgene in the waffle menu
Authentication: Using POST /api/tokens:refresh for refreshing the JWT is not possible if it has been created using an API-key.
Authentication: Improved error handling when there is an issue reaching the authentication server
Authentication: Improved usability of "Create OAuth access token" screen
Data Management
You can now select 'CYTOBAND' as format after file upload
Added support for selecting the root folder (of the S3 bucket) for Projects with user-managed storage
Added support for creating an AWS Storage Configuration with an S3 bucket with Versioning enabled
Auto-launch
Added technical tags for upstream BaseSpace Run information to auto-launched analyses
Added support for multiple versions of BCL Convert for auto-launched analyses
Flow
Added support for '/' as separator in CWL ResourceRequirements when specifying Compute Type
Flow/API
The API to retrieve analysis steps now includes exit code for completed steps
Bench
Bench Workspaces (Open or Restricted) always allow access to Project Data within the Workspace
Restricted Bench workspaces have limited internet access through whitelisted URLs that are checked before entry
Bench Workspaces can be created as Open or Restricted. Restricted workspaces do not have access to the internet except for user-entered whitelisted URLs
Fixed Issues
Data Management
Upload for file names including spaces is now consistent for connector and browser upload. We still advise against using spaces in file names in general
Fixed search functionality in Activity > Data Transfers screen
Improved performance on opening samples
Fixed an issue where reference data in download tab initiates an unexpected download
Fixed intermittent issue where the Storage configuration within a Project can go into Error status and can block users from creating records such as folders and files
Service Connector: Improved error message for DELETE/api/connectors/{connectorId}/downloadRules/{downloadRuleId}
Data Management/API
Improved error handling for API endpoints: DELETE /api/projects/{projectId}/bundles/{bundleId} and POST /api/projects/{projectId}/bundles/{bundleId}
Improved error handling for POST/api/projects/{projectId}/base:ConnectionDetails
Bundles
Fixed an issue where the Table view in Bundles is not available when linking to a new Bundle version
Fixed an issue where linking/unlinking a Bundle with Base Tables could result in errors
Bundles/API
Improved error handling for DELETE/api/bundles/{bundleId}/tools/{toolId} and POST/api/bundles/{bundleId}/tools/{toolId}
Improved error message for POST/api/bundles/{bundleId}/samples/{sampleId}
Notifications/API
Custom subscriptions with empty filter expressions will not fail when retrieving them via the API
Improved error handling for POST/api/projects/{projectId}/notificationSubscriptions
Improved notification for Pipeline success events
Flow
When the input for a pipeline is too large, ICA will fail the Analysis and will not retry
Fixed issue where analysis list does not search-filter by ID correctly
Improved error handling when issues occur with provisioning resources
When retry succeeds in a Nextflow pipeline, exit code is now '0' instead of '143'
Flow/API
Fixed an issue causing API error when attempting to launch an Analysis with 50,000 input files
Improved pipeline error code for GET/api/projects/{projectId}/pipelines/{pipelineId} when already unlinked pipeline Id is used for API call
Fixed an issue where Analyses could not be retrieved via API when the Pipeline contained reference data and originated from a different tenant
Fixed filtering analyses on analysisId. Filtering happens via exact match, so part of the Id won't work
Bench/CLI
Fixed issue where the latest CLI version was not available in Bench workspace images
Cohorts
Fixed an issue where CNV data converted from hg19 to hg38 do not show up in Base table views
Fixed an issue accounting for multiple methods of referring to the alternate allele in a deletion from Nirvana data
Fixed intermittent issue where GWAS ingestions not working after Base enabled in a project.
Fixed Issue
Fixed an issue causing incorrect empty storage configuration dropdown during Project creation when using the “I want to manage my own storage” option for users with access to a single region
Features and Enhancements
General
General availability of sequencer integration for Illumina sequencing systems and analysis auto launch
General usability improvements in the graphical interface, including improved navigation structure and ability to switch between applications via the waffle menu in the header
Storage Bundle field will be auto-filled based on the Project location that is being chosen if multiple regions are available
Event Log entries will be paged in the UI and will contain a maximum of 1,000 entries. Exports are limited to the maximum number of entries displayed on the page.
Read-only temporary credentials will be returned when you are not allowed to modify the contents of a file
The ICA UI will only allow selection of storage bundles belonging to ICA during Project creation, and the API will only return storage bundles for ICA
Notifications
Creating Project notifications for BaseSpace externally managed projects is now supported
Flow
Allow attached storage for Pipeline steps to be set to 0 to disable provisioning attached storage and improve performance
Cohorts
GRCh37/hg19-aligned molecular data will get converted to GRCh38/hg38 coordinates to facilitate cross-project analyses and incorporating publicly available data sets.
API
Project list API now contains a parameter to filter on (a) specific workgroup(s)
Two new API endpoints are added to retrieve regular parameters from a pipeline within or outside of a Project context
Fixed Issues
General
Optimized price calculations resulting in less overhead and logging
Improved error handling:
during Project creation
of own storage Project creation failures.
to indicate connection issue with credential
for graphical CWL draft Pipelines being updated during an Analysis
Improved error messaging in cases where the AWS path contains (a) special character(s)
Fixed an issue causing errors when navigating via deep link to the Analysis Details view
Data Management
Fixed an issue causing data records to remain incorrectly in Unarchiving status when an unarchive operation is requested in the US and Germany regions
API
Fixed returning list of unlinked data in a sample that was linked before in GET/api/projects/{projectId}/data
Fixed error for getSampleCreationBatch when using status filter
CLI
Unarchive of folders is supported when archive or unarchive actions are not in progress for the folder
Improved error message to indicate connection issue with credentials
Flow
Fixed an issue causing incorrect naming of Analysis tasks generated from CWL Expression Tools
Fixed an issue when cloning Pipelines linked from Entitled Bundles to preserve the original Tenant as the Owning Tenant of the cloned Pipeline instead of the cloning user’s Tenant
Fixed an issue causing outputs from CWL Pipelines to not show in the Analysis Details despite being uploaded to the Project Data Analysis output folder when an output folder is empty
When a Contributor starts an Analysis, but is removed afterwards, the Analysis still runs as expected
Fixed an issue where Analyses fail when Nextflow is run a second time
Fixed an issue causing API error when attempting to launch an Analysis with up to 50,000 input files
Fixed an issue causing degraded performance in APIs to retrieve Analysis steps in Pipelines with many steps
Fixed an issue causing Analysis failure during output upload with error “use of closed network connection”
Fixed an issue causing the disk capacity alert log to not show when an Analysis fails due to disk capacity, and added an error message
Fixed an issue preventing cross-tenant users from being able to open a shared CWL pipeline
Base
Improved target Table selection for schedulers to be limited to your own Tables
Bench
Fixed an issue causing Workspaces to hang in the Starting or Stopping statuses
Cohorts
Now handles large VCFs/gVCFs correctly by splitting them into smaller files for subsequent annotation by Nirvana
Features and Enhancements
General
Added a limit to Event Log and Audit UI screens to show 10,000 records
API
Parent output folder can be specified in URN format when launching a Workflow session via the API
Flow
Reduced Analysis delays when system is experiencing heavy load
Improved formatting of Pipeline error text shown in Analysis Details view
Users can now start Analyses from the Analysis Overview screen
Superfluous “Namespace check-0” step was removed to reduce Analysis failures
Number of input files for an Analysis is limited to 50,000
Auto launched Workflow sessions will fail if duplicate sample IDs are detected under Analysis Settings in the Sample Sheet
Base
Activity screen now contains the size of the query
Cohorts
Detect and Lift Genome Build: Cohorts documentation provides set-up instructions to convert hg19/GRCh37 VCFs to GRCh38 before import into Cohorts.
Attribute Queries: Improved the user experience choosing a range of values for numerical attributes when defining a cohort
Export Cohort to ICA Project Data: Improved the user experience exporting list of subjects that match cohort definition criteria to their ICA project for further analysis
Ingest Structural Variants into database
The Cohorts ingestion pipeline supports structural variant VCFs and will deposit all such variants into an ICA Base table if Base is enabled for the given project
Structural variants can be ingested and viewed in base tables
Needle Plot Enhancements
Users can input a numerical value in the Needle Plot legend to display variants with a specific gnomAD frequency percentage or sample count
The needle plot combines variants that are observed among subjects in the current project as well as shared and public projects into a single needle, using an additional shape to indicate these occurrences
Needle Plot legend color changes for variant severity: pathogenic color coding now matches the color coding in the visualization, proteins and variants are differentiated by hue, and other color coding changes were made.
Needle plot tool tips that display additional information on variants and mutations are now larger and modal
The needle plot now allows filtering by gnomAD allele frequency and sample count in the selected cohort. Variants include links to view a list of all subjects carrying that variant and to export that list.
Remove Samples Individually from Cohorts
Exclude individual subjects from a cohort and save the refined list
The subjects view allows users to exclude individual subjects from subsequent analyses and plots and to save these changes. Subject exclusions are reset when editing a cohort
Subject Selection in Analysis Visualization: Users can follow the link for subject counts in the needle plot to view a list of subjects carrying the selected variant or mutation.
UI/UX: Start and End time points are available as a date or age with a condition attribute in the subject data summary screen.
Fixed Issues
General
Improved resilience against misconfiguration of the team page when there is an issue with Workgroup availability
Removed ‘IGV (beta)’ button from ‘View’ drop down when selecting Project Data in UI
Data Management
Improved handling of multi-file upload when system is experiencing heavy loads
Fixed an issue to allow upload of zero-byte files via the UI
Fixed issue where other Bundles would not be visible after editing and saving your Bundle
API:
Improved error handling for API endpoint: POST /api/projects/{projectId}/analysisCreationBatch
Improved performance of API endpoint: getbestmatchingfornextflow
Flow
Fixed an issue causing Analysis output mapping to incorrectly use source path as target path
Fixed an issue where the UI may display incorrect or invalid parameters for DRAGEN workflows which do not accurately show the true parameters passed. Settings can be confirmed by looking at the DRAGEN analysis log files.
Base
“Allow jagged rows” setting in the Scheduler has been replaced with “Ignore unknown values” to handle files containing records with more fields than there are Table columns
Improved Base Activity view loading time
Fixed an error message when using the API to load data into a Base Table that has been deleted
Bench
Fixed an issue resulting in incorrect Bench compute pricing calculations
Fixed an issue preventing building Docker images from Workspaces in UK, Australia, and India regions
Fixed an issue where /tmp path is not writeable in a Workspace
Cohorts
Fixed issue where the bubble plot sometimes failed to display results even though the corresponding scatter plot showed data correctly.
The order of messages and warnings for ingestion jobs was not consistent between the UI and an error report sent out via e-mail.
The UI now displays any open cohort view tabs using shortened (“…”) names where appropriate
Issue fixed where ingestions with multiple errors caused the ingestion queue to halt.
The needle plot sometimes showed only one source for a given variant as opposed to all projects in which the variant had been observed.
Issue fixed with unhandled genotype index format in annotation file to base database table conversion
Status updates via e-mail sometimes contained individual error messages or warnings without a text.
Fixed issue where items show in needle plot with incorrect numbering on the y-axis.
Fixed performance issue with subject count.
Fixed an issue where widget bar-chart counts were intermittently cut off above four digits.
Fixed slowness when switching between tabs in query builder
Fixed Issue
Fixed issue with BaseSpace Free Trial and Professional users storing data in ICA
Fixed Issue
Fixed an issue resulting in analysis failures caused by a Kubernetes 404 timeout error
Features and Enhancements
General
Each tenant supports a maximum of 30,000 Projects
.MAF files are now recognized as .TSV files instead of UNKNOWN
Added VCF.IDX as a recognized file format
General scalability optimizations and performance improvements
API
POST /api/projects/{projectId}/data:createDownloadUrls now supports a list of paths (in addition to a list of IDs)
Fixed Issues
General
Fixed an issue preventing the ‘Owning Project’ column from being used outside of a Project
Fixed an issue allowing the region of a Project to be changed. Changing the region of a resource is not supported
Strengthened data separation and improved resilience against cross-Project metadata contamination
Bundles
After creating a new Bundle the user will be taken to the Bundle Overview page
Data Management
Fixed an issue which prevented changing the format of a file back to UNKNOWN
Fixed an issue causing inaccurate upload progress to be displayed for UI uploads. The Service Connector or CLI are recommended for large file uploads.
Fixed an issue showing an incorrect status for data linking batch jobs when data is deleted during the linking job
Service Connector: Fixed an issue allowing download of a Service Connector when no operating system is set
Service Connector: Cleaned up information available on Service Connectors by removing empty address information fields
API
Fixed date formatting for GET /api/eventLog (yyyy-MM-dd’T’HH:mm:ss.SSS’Z’)
Fixed an issue where the GET users API was not case sensitive on email address
Fixed an issue causing the metadata model to be returned twice in POST /api/projects/{projectId}/samples:search
Fixed the listProjects API 500 response when using the pageoffset query parameter
The searchProjectSamples API returns Sample metadata for Samples shared via a Bundle
Fixed an issue causing createProjectDataDownloadUrls API 400 and 502 errors when server is under load
Flow
Fixed analysis failures caused by kubernetes 404 timeout error
Fixed an issue where Workflows would prematurely report completion of an Analysis
Improved Pipeline retry logic to reduce startup delays
Fixed an issue where Nextflow pipelines were created with empty files (Nextflow config is allowed to be empty)
Removed the 1,000 input file limitation when starting an Analysis
Improved the performance of status update messages for pipelines with many parallel steps
Fixed an issue with overlapping fields on the Analysis Details screen
Deactivated the Abort button for Succeeded analyses
Base
Fixed an issue where Pipeline metadata was not captured in the metadata Table generated by the metadata schedule
Error logging and notification enhancements
Bench
Fixed an issue where Workspaces could be started twice
Fixed an issue where the system checkpoint folder was incorrectly created in Project data when opening a file in a Workspace
Features and Enhancements
Analysis system infrastructure updates
Features and Enhancements
Added ability to refresh Batch Jobs updates without needing to leave the Details screen.
Projects will receive a job queuing priority which can be adjusted by an Administrator.
The text "Only showing the first 100 projects. Use the search criteria to find your projects or switch to Table view." when performing queries is now displayed both on the top and bottom of the page for more clarity.
API: Added a new endpoint to retrieve download URLs for data: POST/api/projects/{projectId}/data:createDownloadUrls
API: Added support for paging of the Project Data/getProjectDataChildren endpoint to handle large amounts of data.
API: Added a new endpoint to deprecate a bundle (POST /api/bundles/{bundleId}:deprecate)
API: If the API client provides the request header "Accept-Encoding: gzip", the API applies GZIP compression to the JSON response. This makes the response significantly smaller, which improves download time and results in faster end-to-end API calls. When compression is applied, the API also provides the header "Content-Encoding: gzip" in the response, indicating that compression was effectively applied.
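For example, a client can opt in to compression by sending the header explicitly. The sketch below uses the Python requests library against the project list endpoint; it is illustrative only, and the base URL and API-key header are assumptions to verify against the API reference.

```python
import requests

# requests decompresses gzip responses transparently; the Accept-Encoding
# header below makes the opt-in explicit. Base URL and API key are placeholders.
resp = requests.get(
    "https://ica.illumina.com/ica/rest/api/projects",
    headers={"X-API-Key": "<your-api-key>", "Accept-Encoding": "gzip"},
)
print(resp.headers.get("Content-Encoding"))  # "gzip" when compression was applied
print(len(resp.content), "bytes after decompression")
```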
Flow: Optimized Analysis storage billing, resulting in reduced pipeline charges.
Flow: Internal details of a (non-graphical) pipeline marked ‘Proprietary’ will not be shared with users from a different tenant.
Flow: A new grid layout is used to display Logs for Analyses with more than 50 steps. The classic view is retained for analyses with 50 steps or less, though you can choose to also use the grid layout by means of a grid button on the top right on the Analysis Log tab.
CLI: Command to launch a CWL and Nextflow Pipeline now contains the mount path as a parameter.
CLI: Version command now contains the build number.
CLI: Added support for providing the nextflow.config file when creating a Nextflow pipeline via CLI.
API: HTML documentation for a Pipeline can now be returned with the following requests:
GET /api/pipelines/{pipelineId}/documentation/HTML
GET /api/projects/{projectId}/pipelines/{pipelineId}/documentation/HTML
API: Added a new endpoint for creating and starting multiple analyses in batch: POST /api/projects/{projectId}/analysisCreationBatch
Flow: Linking to individual Analyses and Workflow sessions is now supported by /ica/link/project//analysis/ and /ica/link/project//workflowSession/
Cohorts: Users can now export subject lists to the ICA Project Data as a file.
Cohorts: Users can query their ingested data through ICA Base. For users who already have ingested private data into ICA Cohorts, another ingestion will need to happen prior to seeing available database shares. Customers can contact support to have previously ingested data sets available in Base.
Cohorts: Correlation bubble plot counts now link to a subject/sample list.
Fixed Issues
Tooltip in the Project Team page provides information about the status of an invite
‘Resend invite’ button in the Project Team page will become available only when the invite is expired instead of from the moment the invite is sent out
Folders, subfolders and files all contain information about which user created the data
Files and folders with UTF-8 characters are not supported. Please see the documentation on how to recover in case you have already used them.
Improved performance for creating or hiding a Project in a tenant with many Projects
Service Connector: Updated information in the Service Connector screen to reflect the name change from "Type of Files" to the more accurate "Assign Format"
Service Connector: Folders within a Bundle can be downloaded via the Service Connector
Service Connector: Upload rules can only be modified in the Project where they apply
Service Connector: A message describes when a file is skipped during upload because it already exists in the Project
Service Connector: Fixed an issue where opening the Connectivity tab occasionally results in a null pointer error
Service Connector: Fixed an issue causing excessive logging when downloading files with long file paths
Service Connector: Fixed an issue where the Service Connector log may contain spurious errors which do not impact data transfers
Existing storage configurations are displayed and accessible via API and UI
Newly added storage configurations no longer remain in ‘Initializing’ state
Fixed error when creating a storage configuration with more than 63 characters
Clicking on a Data folder in flat mode will now open the details of the folder
Only Tools in Released state can be added to a Bundle
Fixed issue preventing new Bundle versions to be created from Restricted Bundles
Deprecated Bundles are displayed upon request in card and table view
Bundles view limited to 100 Bundles
API: Fixed the API spec for ProjectDataTransfer.getDataTransfers
API: Fixed an issue with the projectData getChildren endpoint which returned incorrect object and pagination
API: Fixed an issue where multiple clicks on Create sample batch API endpoint resulted in multiple requests
API: POST /api/projects/{projectId}/data/{dataId}:scheduleDownload can now also perform folder downloads
API: Improved information on the Swagger page for GET /api/pipelines, GET/api/projects/{projectId}/pipelines, and GET/api/projects/{projectId}/pipelines/{pipelineId}
API: Fixed an issue where, when a user provides the same input multiple times to a multi-value input on an analysis run, that input was only passed to the pipeline once instead of multiple times: POST /api/projects/{projectId}/analysis:nextflow
CLI: Copying files in the CLI from a local directory on MacOS to your Project can result in both the desired file and the metadata file (beginning with ‘./’) being uploaded. The metadata file can safely be deleted from the Project
CLI: Hardened protection against accidental file overwriting
CLI: Improved handling for FUSE when connection to ICA is lost
CLI: icav2 projectdata mount --list shows updated list of mounted Projects
CLI: Paging improvements made for project list, projectanalyses list, and projectsdata list
CLI: When there is no config or session file the user will not be asked to create one for icav2 config reset and icav2 config get
CLI: Fixed an issue where Bundle data could not be seen through FUSE in Bench
CLI: Fixed an error message when missing config file upon entering the Project context
CLI: The unmount is possible without a path and will work via the stored Project ID or with a directory path resulting in an unmount of that path
CLI: Fixed an error when creating a Pipeline using URN for Project identifier
CLI: Attempting to delete a file from an externally-managed project returns an error indicating this is not allowed
CLI: Fix to delete session file when config file is not detected
CLI: Paging option added to projectsamples list data
CLI: Fixed “Error finding children for data” error in CLI when downloading a folder
CLI: projectdata list now returns the correct page-size results
Flow: Fixed handling of special characters in CWL pipeline file names
Flow: Fixed an issue where task names exceeding 25 characters cause analysis failure in CWL pipelines
Flow: Fixed an issue which prevented requests for economy tier compute
Flow: Fixed an issue limiting CWL workflow concurrency to two running tasks
Flow: Fixed an issue where analysis file inputs specified in the input.json with ‘location’ set to an external URL caused CWL pipelines to fail
Flow: Fixed an issue resulting in out of sync Pipeline statuses
Flow: Improved Nextflow engine resiliency, including occurrences where Nextflow pipelines fail with ‘pod 404 not found’ error
Flow: Fixed an issue with intermittent system runtime failures incorrectly causing analysis failures
Flow: Fixed an issue where links to Analysis Details returned errors
Flow: Enabled scrolling for Pipeline documentation
Flow: Improved performance for handling analyses with large numbers of inputs
Flow: Improved handling of hanging Analyses
Flow: Improved error messages for failed Pipelines
Flow: Added documentation on how to use XML configuration files for CWL Pipelines
Flow: Duplicate values for multi-value parameters are no longer automatically removed
Flow: Correct exit code 0 is shown for successful Pipeline steps
Base: Fixed an issue so that only users with correct permissions are allowed to retrieve a list of Base tables
Base: Fixed an issue with metadata scheduler resulting in a null pointer
Base: An empty Table description will not return an error when requesting to list all Tables in a Project
Base: Jobs failed with an error containing 'has locked table' are not shown on the Base Job activity list. They can be displayed by selecting the 'Show transient failures' checkbox at Projects > Activity > Base Jobs.
Base: Users can see Schedulers and their results for the entire tenant if created by a tenant administrator in their project, but not create, edit or run them
Base: Fixed an issue preventing data format change in a schedule
Base: Fixed an issue preventing exporting data to Excel format
Bench: Improved handling to prevent multiple users in a single running Workspace
Bench: Fixed an issue causing Workspaces to be stuck in "Starting" state
Bench: Fixed an issue where usage did not show up in the CSV-based usage report
Bench: Fixed an issue where Bundle data could not be seen via the Fuse driver
Bench: Users can now consistently exit Workspaces with a single click on the ‘Back’ button.
Bench: After leaving a Workspace by clicking on the ‘Back’ button, the Workspace will remain in a ‘Running’ state and become available for a new user to access
Bench: Workspaces in a ‘Stuck’ state can be manually changed to ‘Error’ state, allowing users to restart or delete them
Cohorts: Fixed issue where file system cleanup was not occurring after delete.
Cohorts: Fixed sign in and authentication issues in APN1 region.
Cohorts: Fixed issue where the gene filter was missing when editing a cohort, removing the edited filter, and cancelling. The filter was preserved and should not have been.
Cohorts: Fixed issue where users see an application tile in the Illumina application dashboard selection screen called "Cohort Analysis Module".
Cohorts: Correlation: Fixed an issue where data type selections only displayed partially when loading the search result
Cohorts: Fixed an issue where users will see an application tile called “Cohort Analysis Module” on the Connected Platform home page screen if the Cohorts module is added to the domain. Users should not enter ICA Cohorts through this page; they should enter through ICA.
This page contains the release notes for the current release. See the subpages for historic release notes.
General
A new Experimental Nextflow version has been made available (v24.10.2). Users will no longer be able to create new pipelines with Nextflow v20.10.0. In early 2026 ICA will no longer support Nextflow v20.10.0
Added an API endpoint to retrieve analysis usage details, exposing the analysis price. The UI now differentiates between errors and ongoing price calculations, displaying 'Price is being calculated' for pending requests instead of a generic error message
Made the project owner field read-only in the project details view and added a button in the Teams view to edit the project owner via a separate dialog
Autolaunch and BCLConvert now support dots and spaces in project names
Data Management
Users are now able to create non-indexed folders. These are special folders which cannot be accessed from the UI and which have certain actions blocked (such as moving or copying those folders)
Enhanced visibility for data transfers by clearly marking those that do not match any download rule as 'ignored' in the UI. This helps users quickly identify transfers that won't start, preventing confusion and improving troubleshooting.
In bundles it is now possible to open the details for docker and tool images by clicking on the name in the overview
User managed storage configurations now allow for the copying of tags when copying/moving/archiving files and folders
Bench
For fast read/write access in Bench, you can now link non-indexed folders with the CLI command workspace-ctl data create-mount --mode read-write
Bench can now be started in a single-user mode allowing only one user to work in the workspace. All assets generated in bench (e.g. pipelines) are owned by the Bench user instead of a service account
UI Changes made to Workspace configuration and splash pages
General
Fixed an issue where labels for success, failure, and other item counts were missing in the Batch job details panel
Improved error message in the API when creating a new project with user managed storage
Fixed an issue where the Save button remained enabled when clicking on the Documentation tab in the Tool Repository
Fixed duplicate project detection to handle the new 400 error response format, ensuring consistency with other unique constraint violations
Changes made to advanced scheduler options that are not applicable any longer
Removed erroneous Link/Unlink buttons in Tool and Docker Images of shared bundles
Fixed issue where project with large number of analyses loads slowly
Data Management
There will be a change in one of the upcoming releases where users are no longer able to edit connectors of other users through the API. This will be made consistent with the UI.
Fixed an issue where the sample list did not automatically refresh after deleting samples using the 'Delete input data and unlink other data' or 'Delete all data' options
Made display color of Bundle-related data more consistent
Removed on-click behavior of an added Cohorts dataset in a bundle that caused a yellow-bar warning
Fixed an issue preventing project creation with user managed storage when specifying a bucket name and prefix in the Storage Config without a subfolder
Fixed an issue where managing tags on data could result in a TransactionalException error, causing long load times and failed saves
When a project data download CLI command returned an error for a file, the command returned status 0, while it should have returned 1. This has now been fixed
Brought API in line with UI for detection of duplicate folder path already existing outside of your project
Fixed local version detection affecting automatic service connector upgrades
Flow
Improved error messaging when developing pipeline JSON based input forms
Fixed an issue where in some cases Nextflow logs were too big but were still copied into the notification, which caused the notification to fail. The log behind the 'Show more' button is now truncated to a size which is accepted by SQS
Updated API behavior for JSON-based CWL and Nextflow pipelines to prevent unintended rounding of 'number' fields with values greater than 15 digits. Added a warning to advise users to pass such values as strings to maintain precision
Each analysis step attempt is now recorded as a separate entry, ensuring accurate billing and providing end users access to stdout/stderr logs for every retry
Fixed an issue when retries or duplicate step names caused improper entity_id identification
Clicking 'Open in Data' from analysis details now correctly redirects users to the file's parent folder in the project data view instead of the root
Refreshing the pipeline/workflow detail view now correctly updates the UI to reflect the latest version, ensuring any changes are displayed
Base
Fixed an issue where filtering columns on a number in Base activity produced an error
Bench
Improved protection against concurrent status changes when stopping workspaces
Added refresh button to Bench workspaces
Improved error handling when special characters are added to the storage size of bench workspaces
Made the behavior when running and stopping workspaces more consistent
Fixed an issue where the UI did not refresh automatically during long workspace initialization times, causing the workspace status to remain outdated until manually refreshed
This tutorial demonstrates major functions of the ICA platform, beginning with setting up a project with instrument run data to prepare to use pipelines, and concluding with viewing pipeline outputs in preparation for eventually ingesting outputs into available modules.
In the following example, we start from an existing ICA project with demultiplexed instrument run data (fastq), use the DRAGEN Germline pipeline, and view output data.
This tutorial assumes you already have an existing project in ICA. To create a new project, please see instructions in the Projects page.
Additionally, you will need the DRAGEN Demo Bundle linked to your existing ICA project. The DRAGEN Demo Bundle is an entitled bundle provided by Illumina with all standard ICA subscriptions and includes DRAGEN pipelines, references, and demo data.
For general steps on creating and linking bundles to your project, see the Bundles page. This tutorial explores the DRAGEN Germline Published Pipeline, so we will need to link the DRAGEN Demo Bundle to our existing project.
Steps:
Go to your project's Details page
Click the Edit button
Click the + button, under LINKED BUNDLES
Click on the DRAGEN Demo Bundle, then click the Link Bundles button
There may be multiple versions of the DRAGEN Demo Bundle. This tutorial details steps for DRAGEN Demo Bundle 3.9.5; steps for versions after 3.9.5 should be similar.
Click the Save button
DRAGEN Demo Bundle assets should now be available in your project's Data and Pipelines pages.
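As an alternative to the UI steps above, a bundle can also be linked to a project through the API. The sketch below is a minimal example in Python; the base URL, the X-API-Key header, and the placeholder IDs are assumptions to verify against the ICA API reference (the POST /api/projects/{projectId}/bundles/{bundleId} endpoint is referenced elsewhere in this documentation).

```python
import requests

# Placeholders -- replace with your own values.
ICA_BASE = "https://ica.illumina.com/ica/rest"
API_KEY = "<your-api-key>"
PROJECT_ID = "<project-uuid>"
BUNDLE_ID = "<dragen-demo-bundle-uuid>"

# Link the bundle to the project; an empty request body is assumed to be sufficient.
resp = requests.post(
    f"{ICA_BASE}/api/projects/{PROJECT_ID}/bundles/{BUNDLE_ID}",
    headers={"X-API-Key": API_KEY},
)
resp.raise_for_status()
print("Bundle linked:", resp.status_code)
```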
After setting up the project in ICA and linking a bundle, we can run various pipelines.
This example demonstrates how to run the DRAGEN Germline Published Pipeline (version 3.9.5) in your ICA project using the demo data from the linked DRAGEN Demo Bundle.
The required pipeline input assets for this tutorial include:
Under the Data page:
Illumina DRAGEN Germline Demo Data folder
Illumina DRAGEN Enrichment Demo Data folder
Illumina References folder
Under the Pipelines page:
DRAGEN Germline
From the Pipelines page, select DRAGEN Germline 3.9.5, and then click Start Analysis. Initial set-up details require a User Reference (a pipeline run name meaningful to the user) and an Entitlement Bundle from the drop-down menu under Pricing.
Running the DRAGEN Germline pipeline uses the following inputs which are to be added in the Input Files section:
FASTQ files
Select the FASTQ files in the Illumina DRAGEN Enrichment Demo Data folder and select Add.
Reference:
Select a reference genome from the Illumina References folder (do not select a methyl-converted reference genome for this tutorial)
E.g., hg38_altaware_nohla-cnv-anchored.v8.tar (suggested if enabling CNV analysis)
The DRAGEN Germline Settings to be selected are:
Enable germline small variant calling: Set to true
Enable SV (structural variant) calling: Set to true
If true, Enable map align output must also be set to true
Enable repeat genotyping: Set to true
Enable map align: Set to true
When using FASTQ files as input, as in this example, set this to true as the default.
When using BAM files as input, set to true to realign reads in input BAMs; set to false to keep alignments in input BAM files.
Enable CNV calling: Set to true
Enabling Copy Number Variant calling requires one of the following:
Enable CNV self normalization is set to true
A panel of normals (PON) is provided in the Input Files
Output format: Set to CRAM
Other available options for alignments output are BAM and SAM format.
Enable CNV self-normalization: Set to true
Required if Enable CNV calling is set to true and no panel of normals (PON) is provided in the Input Files.
Enable duplicate marking: Set to true
Emit Ref Confidence: Set to GVCF to enable banded gVCF generation for this example
To enable base pair resolution in the gVCF, set to BP_RESOLUTION
Additional DRAGEN args: Leave Empty
Users can provide additional DRAGEN arguments here (see the DRAGEN user guide for examples), but we will leave this blank for this example run.
Sample sex: Leave blank
Users may specify the sex of the sample here if known, but we will omit this setting for this example run.
Enable HLA: Set to true to enable HLA typing
Enable map align output: Set to true
The format for alignment output was selected previously in the "Output format setting" above
Resources
Use the default resources settings:
Storage size: Set to small
FPGA Medium Tier: Set to Standard
FPGA Medium Resources: Set to FPGA Medium
Once all parameters have been set, click Start analysis
You can monitor the status of analysis pipeline runs from the Flow > Analysis page in your project. See Analysis Lifecycle for more details.
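If you prefer to monitor runs programmatically rather than in the UI, a minimal sketch using the project analyses endpoint is shown below. The base URL, the X-API-Key header, and the userReference filter parameter are assumptions to verify against your tenant's API reference.

```python
import requests

# Placeholders -- replace with your own values.
ICA_BASE = "https://ica.illumina.com/ica/rest"
API_KEY = "<your-api-key>"
PROJECT_ID = "<project-uuid>"

# List analyses in the project and print their status; the userReference
# filter name is an assumption -- check the API reference for the exact spelling.
resp = requests.get(
    f"{ICA_BASE}/api/projects/{PROJECT_ID}/analyses",
    headers={"X-API-Key": API_KEY},
    params={"userReference": "my-dragen-germline-run"},
)
resp.raise_for_status()
for analysis in resp.json().get("items", []):
    print(analysis.get("userReference"), analysis.get("status"))
```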
Click on the run to view more information about it. The various tabs under a given run provide additional context regarding the status of the completed run.
If you encounter a failed run, you can find more information in the Projects > your_project > Flow > Analyses > your_analysis > Details tab and on the execution report tab.
Analysis run logs can be found on the Steps tab. Use the sliders next to Stderr and Stdout for more details. Check the box next to "Show technical steps" to view additional log files.
DRAGEN analysis output folders are found on the project's Data page, along with all other data loaded to the project (such as assets from a linked entitled bundle). Analysis outputs will be grouped into folders, so users can click through the directory structure to explore outputs.
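If you want to retrieve outputs from a script, one option is the download-URL endpoint mentioned in the release notes above. The sketch below is a minimal example; the request body field name and the response shape are assumptions to confirm in the API reference.

```python
import requests

# Placeholders -- replace with your own values.
ICA_BASE = "https://ica.illumina.com/ica/rest"
API_KEY = "<your-api-key>"
PROJECT_ID = "<project-uuid>"
OUTPUT_FILE_IDS = ["<fil.xxxxxxxx>"]  # IDs of the output files to download

# Request presigned download URLs for the selected output files.
# The body field name "dataIds" is an assumption -- verify it in Swagger.
resp = requests.post(
    f"{ICA_BASE}/api/projects/{PROJECT_ID}/data:createDownloadUrls",
    headers={"X-API-Key": API_KEY},
    json={"dataIds": OUTPUT_FILE_IDS},
)
resp.raise_for_status()
for item in resp.json().get("items", []):
    print(item.get("url"))
```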
DRAGEN Support Site: https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform.html
ICA Pricing: https://help.ica.illumina.com/reference/r-pricing
There are several ways to connect pipelines in ICAv2. One of them is to use the Simple Notification Service (SNS) and a Lambda function deployed on AWS. Once the initial pipeline is completed, SNS triggers the Lambda function. The Lambda function extracts information from the event parameter to create an API call that starts the subsequent pipeline.
Notifications are used to subscribe to events in the platform and trigger the delivery of a message to an external delivery target. You can read more here. Important: In order to allow the platform to deliver events to Amazon SQS or SNS delivery targets, a cross-account policy needs to be added to the target Amazon service.
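A minimal sketch of applying such a cross-account policy with boto3 is shown below. The topic ARN and the ICA platform principal are placeholders; replace them with your topic's ARN and the principal documented by Illumina for your region.

```python
import json
import boto3

# Placeholders -- replace with your topic ARN and the ICA platform principal
# documented by Illumina for your region.
arn = "arn:aws:sns:us-east-1:123456789012:ica-analysis-events"
ica_principal = "arn:aws:iam::000000000000:root"  # placeholder principal

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowIcaPublish",
            "Effect": "Allow",
            "Principal": {"AWS": ica_principal},
            "Action": "SNS:Publish",
            "Resource": arn,
        }
    ],
}

# Attach the access policy to the topic so the platform can deliver events to it.
boto3.client("sns").set_topic_attributes(
    TopicArn=arn, AttributeName="Policy", AttributeValue=json.dumps(policy)
)
```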
Here, arn is the Amazon Resource Name (ARN) of the target SNS topic. Once the SNS topic is created in AWS, you can create a New ICA Subscription in Projects > your_project > Project Settings > Notifications > New ICA Subscription. The following screenshot displays the settings of a subscription for Analysis success of a pipeline with a name starting with Hello.
On this site there is a list of all available API endpoints for ICA. To use it, obtain the API-Key from the Illumina ICA portal.
To start a Nextflow pipeline using the API, use the endpoint /api/projects/{projectId}/analysis:nextflow. Provide the projectId and the request body in JSON format containing userReference, pipelineId, analysisInput, etc. Two parameters, activationCodeDetailId and analysisStorageId, have to be retrieved using the API endpoint /api/activationCodes:findBestMatchingForNextflow from the Entitlement Detail section in Swagger. For example:
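The exact request shape is documented in Swagger; the following is a minimal sketch in Python, assuming the endpoint accepts a POST with the project and pipeline identifiers and that the API key is passed in the X-API-Key header.

```python
import requests

# Placeholders -- replace with your own values.
ICA_BASE = "https://ica.illumina.com/ica/rest"
API_KEY = "<your-api-key>"
PROJECT_ID = "<project-uuid>"
PIPELINE_ID = "<pipeline-uuid>"

HEADERS = {"X-API-Key": API_KEY}

# Ask ICA for the best matching activation code and analysis storage options
# for the pipeline we want to launch. Body field names are assumptions -- check Swagger.
resp = requests.post(
    f"{ICA_BASE}/api/activationCodes:findBestMatchingForNextflow",
    headers=HEADERS,
    json={"projectId": PROJECT_ID, "pipelineId": PIPELINE_ID},
)
resp.raise_for_status()
print(resp.json())
```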
Output of the API call:
In this particular case, the activationCodeDetailId is "6375eb43-e865-4d7c-a9e2-2c153c998a5c" and analysisStorageId is "6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0" (for resource type "Small").
Once you have all these parameters, you can start the pipeline using the API.
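A minimal sketch of the launch call is shown below. The analysisInput structure and the parameter code are assumptions that depend on your pipeline's input form, so verify them against the pipeline definition and Swagger; the activation code and storage IDs shown are the ones from the example output above.

```python
import requests

# Placeholders -- replace with your own values.
ICA_BASE = "https://ica.illumina.com/ica/rest"
API_KEY = "<your-api-key>"
PROJECT_ID = "<project-uuid>"
PIPELINE_ID = "<pipeline-uuid>"
HEADERS = {"X-API-Key": API_KEY}

# Start the Nextflow pipeline once the activation code and storage IDs are known.
body = {
    "userReference": "my-analysis-run",  # a run name meaningful to you
    "pipelineId": PIPELINE_ID,
    "activationCodeDetailId": "6375eb43-e865-4d7c-a9e2-2c153c998a5c",
    "analysisStorageId": "6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0",
    "analysisInput": {
        # "parameterCode" must match an input code defined by the pipeline.
        "inputs": [{"parameterCode": "in", "dataIds": ["<fil.xxxxxxxx>"]}]
    },
}

resp = requests.post(
    f"{ICA_BASE}/api/projects/{PROJECT_ID}/analysis:nextflow",
    headers=HEADERS,
    json=body,
)
resp.raise_for_status()
print(resp.json().get("id"), resp.json().get("status"))
```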
Next, create a new Lambda function in the AWS Management Console. Choose Author from scratch and select Python 3.7 (includes the requests library) as the runtime. In the Function code section, write the code for the Lambda function that will use different Python modules and execute API calls to the existing online application. Add the SNS topic created above as a trigger.
Here is an example of Python code that checks whether a file named 'test.txt' exists in the output of the successful pipeline. If the file exists, a new API call is made to invoke the second pipeline with this 'test.txt' as an input.
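A minimal sketch of such a Lambda handler is shown below, assuming the standard SNS-to-Lambda event shape. The ICA event field names, the filename filter parameter, the input parameter code, and the response shapes are assumptions to confirm against a real event payload and the API reference; the requests library may need to be packaged with the function or supplied via a Lambda layer.

```python
import json
import os
import requests

# Configuration via environment variables (placeholders).
ICA_BASE = "https://ica.illumina.com/ica/rest"
API_KEY = os.environ["ICA_API_KEY"]
SECOND_PIPELINE_ID = os.environ["SECOND_PIPELINE_ID"]
ACTIVATION_CODE_DETAIL_ID = os.environ["ACTIVATION_CODE_DETAIL_ID"]
ANALYSIS_STORAGE_ID = os.environ["ANALYSIS_STORAGE_ID"]

HEADERS = {"X-API-Key": API_KEY}


def lambda_handler(event, context):
    # The ICA notification arrives as the SNS message body.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    # Field name below is an assumption -- inspect a real event to confirm it.
    project_id = message["projectId"]

    # Look for 'test.txt' among the project data; the filename filter
    # parameter is an assumption -- verify it in the API reference.
    search = requests.get(
        f"{ICA_BASE}/api/projects/{project_id}/data",
        headers=HEADERS,
        params={"filename": "test.txt"},
    )
    search.raise_for_status()
    items = search.json().get("items", [])
    if not items:
        return {"started": False, "reason": "test.txt not found"}

    # Start the second pipeline with test.txt as its input.
    body = {
        "userReference": "triggered-by-lambda",
        "pipelineId": SECOND_PIPELINE_ID,
        "activationCodeDetailId": ACTIVATION_CODE_DETAIL_ID,
        "analysisStorageId": ANALYSIS_STORAGE_ID,
        "analysisInput": {
            "inputs": [{"parameterCode": "in", "dataIds": [items[0]["data"]["id"]]}]
        },
    }
    start = requests.post(
        f"{ICA_BASE}/api/projects/{project_id}/analysis:nextflow",
        headers=HEADERS,
        json=body,
    )
    start.raise_for_status()
    return {"started": True, "analysisId": start.json().get("id")}
```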
Fixed Issues
When creating a new cohort, the disease filter’s tree hierarchy was not showing up, meaning it was not possible to add a disease filter to the cohort definition. This has been resolved.
Fixed Issues
Flow
Fixed an issue which caused service degradation where analysis steps were not properly updated until analyses were finished, and log files only became available after completion.
Features and Enhancements
General
General usability improvements for the project overview screen
The timing for when jobs are deleted has been updated so that:
SUCCEEDED remains 7 days
FAILED and PARTIALLY_SUCCEEDED are increased to 90 days
Data Management
Data can now be uploaded into the BaseSpace-managed project
Flow
Analyses can now be started from the pipeline details screen
The analysis details now contain two additional tabs displaying timeline and execution reports for Nextflow analyses to aid in troubleshooting errors
Introduced a start command for starting a Nextflow pipeline with a JSON-based input form
Added new API endpoints to create a new CWL pipeline and start an analysis from a CWL pipeline with JSON-based input forms:
POST/api/projects/{projectId}/pipelines:createCwlJsonPipeline
POST/api/projects/{projectId}/analysis:cwlJson
Pipelines with JSON-based input forms can now pre-determine and validate storage sizes
Added support for tree structures in dropdown boxes on JSON-based input forms to simplify searching for specific values
Introduced a new filtering option on the analyses grid to enable filtering for values which differ from, or do not equal (!=), a given value (such as exit codes in the pipeline steps in the analysis details screen)
The analysis output folder format will now be user reference-analysis id
Cohorts
The side panel now displays the Boolean logic used for a query with ‘AND’, ‘OR’ notations
The needle plot visualization now drives the content of the variant list table below it. By default, the list displays variants in the visualization and can be toggled to display all variants with subsequent filtering
For diagnostic hierarchies, concept children count and descendant count for each disease name is displayed
The measurement/lab value can be removed when creating query criteria
Fixed Issues
General
Notification channels are no longer created at the tenant level and are no longer visible to members of external tenants working on the same project
Data Management
Fixed an issue where move jobs fail when the destination is set to the user’s S3 bucket where the root of the bucket mapped to ICA as storage configuration and volume
Fixed a data synchronization issue when restoring an already restored object from a project configured with S3 storage
Flow
Corrected the status of deleted Docker images from incorrect ‘Available’ to ‘Deleted’
The reference for an analysis has changed to userReference-UUID, where the UUID matches the ID from the analysis. (The previous format was userReference-pipelineCode-UUID.)
Pipeline files are limited to a file size of 20 Megabytes
Bench
Fixed an issue which caused ‘ICA_PROJECT_UUID not found in workspaceSettings.EnvironmentVariables’ when creating a new Workspace
Cohorts
Fixed an issue where the system displays ALL/partial filter chips when the top level tree node is selected in a hierarchical search
Fixed an issue where the system displays 400 bad request error despite valid input of metadata files during import jobs
Fixed an issue where the system displays inconsistent hierarchical disease filter results
Fixed an issue where the system changes the layout when displaying the p-value column
Fixed an issue where the system disables the next button when there is no study available in the dropdown menu
Fixed an issue where studies could not be selected when a project has one study to ingest data into
Fixed Issues
Mitigated an issue causing intermittent system authentication request failures. Impact includes analysis failures with "createFolderSessionForbidden" error
Features and Enhancements
General
The projectdata upload CLI command will from now on give you the credentials to access the data
Data Management
Introduced a limit of 100,000 entries on the number of data elements that can be put in POST /api/projects/{projectId}/dataUpdateBatch
Flow
Users can now access json-based pipeline input forms for both Nextflow and CWL pipelines. API access is not yet available for CWL pipelines
Added GPU compute types (gpu-small, gpu-medium) for use in workflows
Users can now sort analyses by request date instead of start date, which was not always available
The analysis details page has been upgraded with the following features:
The progress bar which could be found on the analyses overview page will now also appear in the details page
A maximum of 5 rows of output are shown for each output parameter, but the output can be displayed in a large popup to have a better overview
Orchestrated analyses are shown in a separate tab
Cohorts
Users can now use the Measurement concept API to create cohorts based on lab measurement data and harmonize their values to perform downstream analysis
Users can now access the Hierarchical concept search API to view the phenotype ontologies
Fixed Issues
General
The mail option is now automatically filtered out for those events that do not support it
Fixed an issue where there was no email sent after rerunning a workflow session
Fixed an issue which caused authentication failures when using a direct link
Made file and folder selection more consistent
Fixed an issue with the CLI where using the “projectsamples get” command to retrieve a sample shared via an entitled bundle in another tenant failed
Fixed filtering so you can only see subscriptions and channels from your own tenant
Improved GUI handling for smaller display sizes
Fixed the workflow session user reference and output folder naming to use BaseSpace Experiment Name when available
Data Management
The unlink action is now greyed out if the selection contains data that is not linked
Fixed an issue where, when deleting folders, the parent folder was deleted first, giving the impression that the parent folder was deleted but not its subfolders and files
Fixed an issue where the connector downloads only downloaded the main folder, not the folder contents
For consistency, it is no longer possible to link to folders or files from within subfolders. Previously, linking was possible, but the files and folders were always linked to the top level instead of the subfolder from which the linking was done
Updated error handling for dataUpdateBatch API endpoint
Moving small files (<8 MB) will not trigger a "moving" event, only a completion event. Out-of-order events caused issues, and moving small files happens fast enough that only the completion of the move needs to be reported, not the intermediate moving status
Improved error handling when encountering issues during cancellation of data copy/move
Improved error message when trying to unlink data from a project via the API when this data is native to that project and not linked
Fixed an issue where an analysis could proceed to download input data when any of the inputs were in a status other than AVAILABLE, including records within folder data inputs
Flow
Redesigned UI component to prevent issues with Analysis summary display
Fixed an issue where the field content was not set to empty when the field input forms have changed between the original analysis and a rerun
Replaced retry exhaustion message, "External Service failed to process after max retries 503 Unique Reference ID: 1234" with a more useful message to end users that advises them to contact Illumina support: "Attempt to launch ICA workflow session failed after max retries. Please contact Illumina Tech Support for assistance. Unique Reference ID: 1234". This does not replace more specific error messages that provide corrective advice to the user, such as "projectId cannot be null"
For efficiency reasons, pipeline files are limited to a file size of 100 Megabytes
Bench
Fixed an issue which caused .bash_profile to no longer source .bashrc
Fixed the status of deleted docker images which previously were displayed as available
After creating a tool, the Information tab and Create Tool are now no longer accessible to prevent erroneous selection
Cohorts
Fixed a layout issue where buttons moved up when the user selected an option
Fixed an issue where the user was not able to view the PheWas plot when multiple cohorts are open and the same gene is searched
Fixed an issue where the user was not able to view the GWAS plot when multiple cohorts are open and the user switched back and forth between cohorts
Fixed an issue where users were not able to see the cytogenetic map in the gene summary page for genes associated with the chromosome
Fixed Issues
General
Fixed an issue where various Data Transfer Ownership API calls were failing with a 'countryView' constraint violation error
Features and Enhancements
General
Dynamically linked folders and files now have their own icon type, which is a folder/file symbol with a link symbol consisting of three connected circles
Data Management
With the move from static to dynamic data linking, unlinking data is now only possible from the project top level to prevent inconsistencies
The user can now manually create a dynamic link to a folder
The icav2 project data mount command now supports the “--allow-other” option to allow other users access to the data
The user can now set a time to be archived or deleted for files and folders with the “timeToBeArchived” and “timeToBeDeleted” parameters on the “POST /api/projects/{projectId}/dataUpdateBatch” endpoint
Added 4 new API endpoints which combine the create and retrieve upload information
Flow
The default Nextflow version is now 22.04.03, from 20.10.0
The user can now specify the Nextflow version when deploying a pipeline via the CLI with the “--nextflow-version” flag
Bench
The user now has the option to choose either a tool image or a Bench image when adding new Docker images
It is now possible to open contents of a Bench workspace in a new tab from the Bench details tab > access section
Fixed Issues
General
Improved handling of API calls with an invalid or expired JWT or API token
Data Management
Renamed the "New storage credential" button to "Create storage credential"
Removed the "Edit storage credential" button. The user can now edit the column directly in the open dialog when clicking on the name
Performance improvements to scheduled data download
Fixed an issue where data records were shown more than once when updating the tags
The data details were erroneously labeled with "size in bytes" while the size was in a variable unit
Fixed an issue where trying to download files could result in the error "Href must not be null" when the file was not available
Fixed an issue where existing data catalog views would return an empty screen caused by a mismatch in role naming
Flow
Fixed an issue that caused opening a pipeline in the read-only view to incorrectly detect there were unsaved changes to the pipeline
Fixed an issue where different pipeline bundles with resource models of the same name would result in duplicate listings of these resources
Improved error handling when encountering output folder creation failure, which previously could result in analysis being stuck in REQUESTED status
By default Nextflow will no longer generate the trace report. If you want to enable generating the report, add the section below to your userNextflow.config file:
trace.enabled = true
trace.file = '.ica/user/trace-report.txt'
trace.fields = 'task_id,hash,native_id,process,tag,name,status,exit,module,container,cpus,time,disk,memory,attempt,submit,start,complete,duration,realtime,queue,%cpu,%mem,rss,vmem,peak_rss,peak_vmem,rchar,wchar,syscr,syscw,read_bytes,write_bytes,vol_ctxt,inv_ctxt,env,workdir,script,scratch,error_action'
Fixed the issue where users not allowed to run or rerun workflows could start them from the API or BaseSpace SequenceHub. Now, users that cannot start workflows cannot rerun them.
Cohorts
Fixed an issue with selecting the lower needles when multiple needles overlap at the same location in the needle plot
Fixed an issue where the user would not be able to view the cytogenetic map in the gene summary page for genes associated with the chromosome
Fixed issue where the user would not be able to view the PheWas plot when multiple cohorts are open and same gene is searched
Features and Enhancements
Data Management
Improved performance of data linking jobs
Fixed Issues
General
Fixed an issue causing slow API responses and 500 errors
Features and Enhancements
General
The CLI readme file will now additionally contain the CLI build number
Data Management
Fixed an issue where there was a discrepancy between the Run Input tags shown to the user and what was stored on the data
Added a 25,000-item limit to the v3 endpoint for batch data linking. Using the v4 endpoint, which does not have this limitation, is recommended
Flow
Analyses and workflow sessions can now be resubmitted, and parameters can be updated upon resubmission
Changed the default image used for CWL pipeline processes with undefined image from docker.io/bash:5 to public.ecr.aws/docker/library/bash:5
Updated the choice of default nextflow docker image which is used when no docker image is defined. It is now public.ecr.aws/lts/ubuntu:22.04_stable
The analysis logs in the analysis details page can be refreshed
The user is now able to write a pipeline which executes files located in the /data/bin directory of the runtime environment
Pipeline files are now shown in a tree structure for easier overview
Cohorts
The updated GWAS UK Biobank database gives users access to more phenotype information
Users can now incrementally ingest their molecular data for germline, CNV, structural variants, and somatic mutation data
Fixed Issues
General
Added an "All" option to the workgroup selection box in the projects view to reset the filter, which previously required you to delete all characters from the filter
Fixed an issue where updating two base permissions at the same time would sometimes not execute correctly
Fixed an issue where creating grid filters could result in a nullpointer error
Fixed an issue where 'Copy to Clipboard' button did not work anymore
After searching for a folder in the search box and going into that folder, the search box is now cleared
Improved the project permissions API to correctly handle empty values
Previously, when attempting to save and send a message from the Websolutions section without a unique subject, the system would report an error and still send the message. Now the non-unique message subject error is reported and no message is sent
Fixed an issue where linking samples in the sample screen would result in receiving the same "sample(s) linked" message twice
Improved error handling for CLI FUSE driver
Hardened log streaming for ongoing runs to better handle network issues which previously would result in missing log streaming
Added retries for "connection reset by peer" network-related errors during analysis upload tasks
Fixed an issue where inviting a user to collaborate on a project containing base would result in the error "entity not managed" if that user did not have base enabled in any project or if base was not enabled in the project tenant
Data Management
Fixed an issue where data could be moved to a restricted location called /analyses/ and no longer be visible after the move. Please contact Illumina Support with your data move job information to recover your data if you have encountered this issue
Fixed an issue where sorting on data format did not work correctly
Copying empty folders no longer results in a partially copied status
ICA now performs an automatic refresh after unlinking or deleting a sample
Improved handling of file path collisions when handling linked projects during data copy / move
Fixed an issue where, even though uploading a file in a linked folder is not permitted, this would erroneously present a success message without copying the file
Analysis events which are too large for SQS (256 KB) are now truncated to the first 1000 characters when using SQS
Improved error handling when trying to upload files which no longer exist
Fixed system degradation under load by introducing a rate limit of 25 spawned tasks per minute for a given analysis
The createUploadUrl endpoint can now be used to upload a file directly from the region where it is located. The user can create both new files and overwrite files in status "partial"
Improved the project data list command with wildcard support; for example (see also the CLI sketch after these examples):
/ or /* will return the contents of the root
/folder/ will return the folder
/folder/* will return the contents of the folder
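Expressed as CLI calls, a minimal sketch of the behaviour above (assuming the path is passed directly to the projectdata list command; verify the exact syntax with icav2 projectdata list --help):
icav2 projectdata list /            # contents of the root (equivalent to /*)
icav2 projectdata list /folder/     # returns the folder itself
icav2 projectdata list "/folder/*"  # returns the contents of the folder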
To optimize performance, a limit has been set to prevent concurrent uploading of more than 100 files
Fixed an issue where folder syncing functionality would sometimes result in “Unhandled exception in scheduling worker”
Flow
Fixed an issue where writing a pipeline which executes files in the /data/bin folder wasn't functioning properly with larger storage classes
Nextflow pipelines no longer require pipeline inputs when starting them via the CLI
Improved error handling when using an unsupported data format in the XML input forms during pipeline creation
Fixed the issue where it was not possible to add links in the detail page for pipelines and bundles
Sorting is no longer supported on duration and average duration columns for analysis grids
In situations where the user would previously get the error "zero choices with weight >= 1" after the first attempt, additional retries will execute to prevent this from occurring
Cohorts
Fixed an issue resulting in a blank error when a cohort with hundreds of diagnostic concepts was created
Features and Enhancements
Flow
Improved analysis queue times during periods of limited compute resource availability
Features and Enhancements
General
New notification to the user when a copy job finishes running
Updated the "GET analysis storage" API endpoint to account for the billing mode of the project. If the billing mode of the project is set to tenant, then the analysis storage of the user's tenant will be returned. If the billing mode of the project is set to project, then the analysis storage of the project's owner tenant will be returned
A ReadMe file containing used libraries and licenses is now available for ICA CLI
Data Management
New DataFormats YAML (.yaml, .cwl), JAVASCRIPT (.js, .javascript), GROOVY (.groovy, .nf), DIFF (.diff), SH (.sh), SQL (.sql) to determine the used syntax highlighting when displaying/editing new pipeline files in the UI
ICAv2 CLI supports moving data both between and within projects
Added an alert to notify users when data sharing or data move is disabled for the project
A new version of the Event Log endpoint has been developed to support paging, retrieval of previous events, and resolution of inconsistencies in date formats. This new endpoint introduces the EventLogListV4 data transfer object
The user is now able to select a single file and download it from the browser directly. This does not apply for folders and multiple files selected at once
Users can subscribe to notifications when data is unarchived
The BaseSpace Sequencing Run Experiment name will now be added to the technical tags when a workflow session is launched
Flow
Fastqs with the .ora extension are now supported when staging these for secondary analysis, either as a list of fastqs or as fastq_list_s3.csv files
Before, users had to click on the pipeline on the pipeline overview screen to start a new analysis. Now, you will enter the pipeline in edit mode when you click on the pipeline name. If you want to select a pipeline to start an analysis, you need to check the checkbox
Fixed Issues
General
Removed the refresh button from the workspace detail view as it was superfluous
Fixed an issue where searching for certain characters in the search field of the Projects or Data overviews screen would result in an indecipherable error
Improved security handling around tenant admin-level users in the context of data move
Data Management
Fixed a bug where copying a folder from another previously copied folder resulted in corrupted files
Fixed an issue where creating a new bundle would result in an error if a project with the same name already exists
Data move between projects from different tenants is now supported
Fixed an issue where not selecting files before using the copy or move commands would result in EmptyDataId errors
For the CLI, improved notifications when files cannot be downloaded correctly
Fixed an issue where scheduled downloads of linked data would fail without warning
Corrected an issue where the tenant billing mode would be erroneously set to Illumina after a data copy
Fixed an issue where BatchCopy on linked data did not work
Flow
Resolved an issue to ensure that when a user creates a pipeline using a docker image shared from an entitled bundle, their analyses utilizing that pipeline can pull the docker image without errors
Removed superfluous options from the analysis status filter
Awaiting input
Pending request
Awaiting previous unit
Fixed an issue where writing a pipeline which executes files in the /data/bin folder wasn't functioning properly with larger storage classes
Fixed an issue where many-step analyses are getting stuck in "In Progress" status
Fixed an issue where the wrapper scripts when running a CommandLineTool in CWL would return a warning
Fixed the issue which caused the "Save as" option not to work when saving pipelines
Base
Fixed an issue where the ICA reference fields in the schema definition had the wrong casing. As a result of this update you might end up with 2 different versions of the reference data (one with keys written with an uppercase letter at the start, another one with keys written entirely in lowercase letters). To fix this:
Update your queries and use the Snowflake function GET_IGNORE_CASE (e.g., select GET_IGNORE_CASE(to_object(ica), 'data_name') from testtableref)
Update the 'old' field names to the new ones (e.g., update testtableref_orig set ica = object_delete(object_insert(ica, 'data_name', ica:Data_name), 'Data_name'))
Fixed an issue where using an expression to filter the "Base Job Success" event is not working
Fixed Issues
Flow
Resolved an issue to ensure that when a user creates a pipeline using a docker image shared from an entitled bundle, their analyses utilizing that pipeline can pull the docker image without errors.
Features and Enhancements
General
The left side navigation bar will collapse by default for screens smaller than 800 pixels. The user can expand it by hovering over it
The browser URL may be copied to share analyses, pipelines, samples, tools, workspaces and data in various contexts (project, bundle)
Data Management
Users are now able to move data within and across projects:
The user can:
Move available data
Move up to 1000 files and/or folders in 1 move operation
Retain links to entities (sample, sequencing run, etc.) and other meta-data (tags, app-info) when moving
Move data within a project if the user is a contributor
Move data across projects if (1) in the source project the user has download rights, has at least contributor rights, and data sharing is enabled, and (2) the user has upload rights and at least viewer rights in the target project
Move data across projects with different types of storage configurations (user-defined or default ICA-managed storage)
Select and move data to the folder they are currently in through the graphical UI
Select and move data in a destination project and/or folder through the API
The user cannot:
Move linked data. Only the source data can be moved
Move data to linked data. Can only move data to the source data location
Move data to a folder that is in the process of being moved
Move data which is already in the first level of the destination folder
Move data to a destination folder which would create a naming conflict such as a file name duplicate
Move data across regions
New Event Log entries are provided when a user links (ICA_BASE_100) or unlinks (ICA_BASE_101) a Cohorts data set to a bundle
Added support for the following data formats: ora, adat, zarr, tiff and wsi
Flow
New compute types (Transfer Small, Transfer Medium, Transfer Large) are supported and can be used in upload and download tasks to significantly reduce overall analysis runtimes (and overall cost)
API: All the endpoints containing pipeline information now contain the status from the pipeline(s) as well
Bench
External Docker images will no longer display a status as they consistently showed 'Available,' even when the URL is not functional
Cohorts
Performance improvements to needle plot by refactoring its API endpoint to return only sample IDs
Users now click a cancel button that returns them to the landing page
Users can now perform time series analysis for a single patient view
Refresh of PrimateAI data now drives data in variant tables
Users can now access the structural variant tab in the Marker frequency section
Fixed Issues
General
Fixed an issue where, when a user is added to or removed from a workgroup, they could be stuck on an infinite redirect loop when logging in
Fixed syncing discrepancy issues about deleted files in user-managed storage projects with Lifecycle rules & Versioning
Data Access & Management
Sorting API responses for the endpoint GET /api/jobs is possible on the following criteria: timeCreated, timeStarted and timeFinished
Improved the error message when trying to link a bundle which is in a different region than the project
More documentation has been added to the GET /eventLog regarding the order of rows to fetch
Fixed an issue where the API call - POST api/projects/{projectId}/permissions would return an error when DATA_PROVIDER was set for roleProject
Fixed an issue stemming from attempts to copy files from the same source to the same destination, which incorrectly updated file statuses to Partial
CLI: Fixed an issue where the environment variable ICAV2_X_API_KEY did not work
Flow
The analysis is no longer started from the API if error 400 ('Content-Type' and 'Accept' do not match) occurs
Base
Fixed an issue where the Base schedule would not run automatically in some cases when files are present in the schedule
Bench
Improved error handling when trying to create a tool with insufficient permissions
Fixed an issue where the user was unable to download a Docker image with an ad hoc subscription
The "version":"string" field is now included in the API response GET /api/referenceSets. If no version is specified, the field is set to "Not Specified"
Fixed an issue where, under some conditions, fetching a job by id would throw an error if the job was in pending status
Features and Enhancements
Data Management
The GUI now has a limit of 100 characters for the name and 2048 characters for the URL for links in pipelines and bundles
Added a link to create a new connector if needed when scheduling a data download
Improved the data view with additional filtering in the side panel
Flow
New CLI environment variable ICA_NO_RETRY_RATE_LIMITING allows users to disable the retry mechanism (see the sketch after this list). When it is set to "1", no retries are performed. For any other value, HTTP code 429 will result in 4 retry attempts after 0.5, 2, 10, and 30 seconds
Code-based pipelines will alphabetically order additional files next to the main.nf or workflow.cwl file
When the Compute Type is unspecified, it will be determined automatically based on CPU and Memory values using a "best fit" strategy to meet the minimum specified requirements
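A minimal shell sketch of disabling the retry mechanism (the variable name and the value "1" are taken from the note above; the follow-up command is just an example):
export ICA_NO_RETRY_RATE_LIMITING=1   # disable retries on HTTP 429 for this shell session
icav2 projects list                   # this and later CLI calls will now fail immediately on rate limiting instead of retrying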
Bench
Paths can be whitelisted as allowed URLs in restricted settings
Fixed Issues
General
Fixed an issue where the online help button does not work upon clicking on it
Data Access & Management
Improved automatic resource cleanup when hiding a project
Fixed an issue with the service connector where leading blanks in the path of an upload/download rule would result in errors. It is no longer possible to define rules with leading or trailing blanks
Fixed an issue where a folder copy job fails if the source folder doesn't have metadata set
Linking data to sample has been made consistent between API and GUI
Improved resource handling when uploading large amounts of files via the GUI
Fixed an issue where the API endpoint to retrieve input parameters for a project pipeline linked to a bundle would fail when the user is not entitled on the bundle
Fixed an issue where deleting and adding a bundle to a project in one action does not work
Flow
The event sending protocol was rewritten to limit prematurely exhausting event retries and potentially leaving workflows stuck when experiencing high server loads or outages
Fixed an issue where specifying the minimum number of CPUs using coresMin in a CWL workflow would always result in the allocation of a standard-small instance, regardless of the coresMin value specified
Fixed an issue in the API endpoint to create a Nextflow analysis where tags were incorrectly marked as mandatory inputs in the request body
Fixed an issue with intermittent failures following completion of a workflow session
Base
Improved syntax highlighting in Base queries by making the different colors more distinguishable
Bench
Fixed an issue where the Bench workspace disk size cannot be adjusted when the workspace is stopped. Now, the adjusted size is reflected when the workspace is resumed
Fixed an issue where regions were not populating correctly for Docker images
Fixed an issue where API keys do not get cleaned up after failed workspace starts, leading to unusable workspaces once the API key limit is reached
Features and Enhancements
Cohorts
Users can now query variant lists with a large number of associated phenotypes
Users can now perform multiple concurrent data import jobs
Fixed Issues
Cohorts
Fixed an issue with displaying shared views when refreshing a Bundle’s shared database in Base
Fixed Issues
Fixed an issue where autolaunch is broken for any users utilizing run and samplesheet inputs stored in BSSH and operating in a personal context, rather than a workgroup.
Features and Enhancements
Data Management
Data (files and folders) may be copied from one folder to another within the same Project
The empty ‘URN’ field in the Project details at Project creation is now removed
The ‘Linked Bundles’ area in the Project details at Project creation is now removed as you are only allowed to link Bundles after Project creation
The card or grid view selected will become the default view when navigating back to the Projects or Bundles views
Added new API endpoints to retrieve and accept the Terms & Conditions of an entitled bundle (a curl sketch follows the list):
/api/entitledbundles/{entitledBundleId}/termsOfUse
/api/entitledbundles/{entitledBundleId}/termsOfUse/userAcceptance/currentUser
/api/entitledbundles/{entitledBundleId}/termsOfUse:accept
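A hedged curl sketch of calling these endpoints (the base URL https://ica.illumina.com/ica/rest and the X-API-Key authentication header are assumptions; substitute the values used in your environment):
# Retrieve the current terms of use for an entitled bundle (assumed base URL and auth header)
curl -H "X-API-Key: $ICA_API_KEY" "https://ica.illumina.com/ica/rest/api/entitledbundles/<entitledBundleId>/termsOfUse"
# Accept the current terms of use on behalf of the calling user
curl -X POST -H "X-API-Key: $ICA_API_KEY" "https://ica.illumina.com/ica/rest/api/entitledbundles/<entitledBundleId>/termsOfUse:accept"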
Flow
Added a new API endpoint to retrieve orchestrated analyses of a workflow session
GET /api/projects/{ProjectID}/workflowSessions/{WorkflowSessionID}/analyses
Code-based pipelines will alphabetically order additional files next to the main.nf or workflow.cwl file
Bench
New JupyterLab - 1.0.19 image published for Bench using the Ubuntu 22.04 base image
Resources have been expanded to include more options for compute families when configuring a workspace. See ICA help documentation for more details
Cohorts
Sample count for an individual cohort may be viewed in the variants table
Filter the variants list table through the filter setting in the needle plot
Execute concurrent jobs from a single tenant
Improved the display of error and warning messages for import jobs
Structural variant tab may be accessed from the Marker frequency section
Fixed Issues
Data Access & Management
Bundles now reflect the correct status when they are released instead of the draft status
Double clicking a file opens the data details popup only once instead of multiple times
Improved performance to prevent timeouts during list generation which resulted in Error 500
The counter is now accurately updated when selecting refresh in the Projects view
Fixed an issue where running two or more file copy jobs at the same time to copy files from the same project to the same destination folder resulted in one job succeeding and the others failing
Fixed an issue resulting in an error in a sample when linking nested files with the same name
Added a new column to the Source Data tab of the Table view which indicates the upload status of the source data
Removed the unused ‘storage-bundle’ field from the Data details window
Fixed an issue where the Project menu does not update when navigating into a Project in Chrome browsers
(CLI) Fixed an issue where deleting a file/folder via path would result in an error on Windows CLI
Base
Improved schedule handling to prevent an issue where some files were not correctly picked up by the scheduler in exceptional circumstances
Fixed an issue where an incorrect owning tenant is set on a schedule when running it before saving
The number of returned results which is displayed on the scheduler when trying to load files now reflects the total number of files instead of the maximum number of files which could be displayed per page
Fixed an issue where Null Pointer Exception is observed when deleting Base within a Project
Bench
Fixed an issue where users were unable to delete their own Bench image(s) from the docker repository
Cohorts
Fixed an issue where the value in the tumor_or_normal field, in the phenotype table in database, would not set properly for germline and somatic mutation data
Fixed an issue where large genes with subjects containing large sets of diagnostic concepts caused a 503 error
Fixed Issues
Fixed an issue where automated analysis after sequencing run in non-US regions may fail for certain analysis configurations
Features and Enhancements
Data Management
The --exclude-source-path flag has been added to the ‘project data download’ command so that subfolders can be downloaded to the target path without including the parent path
The system automatically re-validates storage credentials updated in the graphical UI
Added a new API endpoint to validate storage configurations after credentials are changed: /api/storageConfigurations/{storageConfigurationId}:validate
Notifications
Added support for multi-version notification event payloads corresponding to versioned API response models
Flow
(API) Improved the analysis-dto by adding a new POST search endpoint as a replacement for the search analysis GET endpoint. The GET endpoint will keep working but we advise using the new POST endpoint.
Improved analysis statuses to reflect the actual status more accurately
Parallelized analysis input data downloads and output data uploads to reduce overall analysis time
No scratch size is allocated if tmpdirMin is not specified
Cohorts
Performance improvements of the ingestion pipeline
Performance improvements to subject list retrieval
Increased the character limit of ingestion log messages to the user
Fixed Issues
Data Access & Management
Fixed an issue where the target user cannot see analysis outputs after a successful transfer of analysis ownership in BaseSpace Sequence Hub
Updated the API Swagger documentation to include paging information for: /projects/{projectId}/samples/{sampleId}/data
Fixed an issue resulting in errors when creating a new bundle version
Fixed an issue where the GET API call with the ‘Sort’ parameter returns an error when multiple values are separated by commas followed by a space
Fixed an issue where adding the --eligible-link flag to the ‘projectdata list’ API endpoint caused other flags to not work correctly
Added cursor-based pagination for the ‘projectdata list’ API endpoint
Fixed an issue with the entitled bundles cards view where the region is cut off when the Status is not present
Fixed an issue where bundle filtering on categories did not work as expected
Fixed an issue where file copy across tenants did not work as expected
Added a cross-account permission check so that file copy jobs fail when the cross-account set up is missing instead of being retried indefinitely
Fixed an issue where ‘Get Projects’ API endpoint returns an error when too many projects are in the tenant
Fixed an issue where the UpdateProject API call (PUT /api/projects/{projectId}) returns an error when technical tags are removed from the request
Fixed an issue where users need to confirm they want to cancel an action multiple times when clicking the back button in the graphical UI
Fixed an issue where clicking into a new version of a bundle from the details view does not open the new version, and instead directs to the bundle card view
Flow
Fixed an issue where the analysis logs are returned in the analysis screen “outputs” section and included in the getAnalysisOutputs API response. The log output is no longer considered as part of the analysis outputs
Analysis history screen has been removed
Fixed an issue resulting in inability to retrieve pipeline files via the API when the pipeline is shared cross-tenant
Fixed an issue where the API endpoint to retrieve files for a project pipeline would not return all files for pipelines created via CLI or API
Fixed an issue where the API does not check the proprietary flag of a pipeline before retrieving or downloading the pipeline files
Base
The ‘Download’ button is available to download Base activity data locally (and replaces the non-functional ‘Export’ button for restricted bundles)
Fixed an issue resulting in missing ICA reference fields in table records if the file was loaded into the table with no metadata
Improved consistency of the references included in the scheduler
Bench
Users are now logged out from a terminal window opened in a workspace after a period of inactivity
Fixed an issue where permissions could not be enabled after a workspace has been created
Fixed an issue where a Contributor could not start/stop a workspace
Cohorts
Fixed an issue where large genes with subjects with large sets of diagnostic concepts cause a 503 error
Fixed an issue where the value in tumor_or_normal field in the phenotype table in the database is not set properly for germline and somatic mutation data
Resolved a discrepancy between the number of samples reported when hovering over the needle plot and the variant list
Features and Enhancements
General
Data Management
Users are now able to revalidate storage configurations in an Error state
Improved existing endpoints and added new endpoints to link and unlink data to a bundle or a project in batch:
POST /api/projects/{projectId}/dataUnlinkingBatch
GET /api/projects/{projectId}/dataUnlinkingBatch/{batchId}
GET /api/projects/{projectId}/dataUnlinkingBatch/{batchId}/items
GET /api/projects/{projectId}/dataUnlinkingBatch/{batchId}/items/{itemId}
Flow
Analyses started via the API can now leverage data stored in BaseSpace Sequence Hub as input
ICA now supports auto-launching analysis pipelines upon sequencing run completion with run data stored in BaseSpace Sequence Hub (instead of ICA)
Updated the API for creating pipelines to include "proprietary" setting, which hides pipeline scripts and details from users who do not belong to the tenant which owns the pipeline and prevents pipeline cloning.
Cohorts
Added support for partial matches against a single input string to the “Search subjects” flexible filtering logic
Users can now view an overview page for a gene when they search for it or click on a gene in the marker frequency charts
ICA Cohorts includes access to both pathogenic and benign variants, which are plotted in the “Pathogenic variants” track underneath the needle plot
Ingestion: UI notifications and/or errors will be displayed in the event of partially completed ingestions
Users can share cohort comparisons with any other users with access to the same project
Fixed Issues
General
Improved the project card view in the UI
Fixed an issue with user administration where changing the permissions of multiple users at the same time would result in users receiving Invalid OAuth access token messages
Data Access & Management
Improved the error message when downloading project data if the storage configuration is not ready for use
Fixed an issue causing Folder Copy jobs to time out and restart, resulting in delays in copy operations
Fixed an issue where only the Docker image of the first restricted bundle that was added could be selected
Improved the performance of folder linking with "api/projects/{ProjectID}/dataLinkingBatch"
The URL for links for "post/api/bundles" endpoint can be up to 2048 characters long
Improved the error response when using offset-based paging on API responses which contain too much data and require cursor-based paging
Fixed an issue resulting in failures downloading data from CLI using a path
The correct error message is displayed if the user does not have a valid subscription when creating a new project
Fixed an issue where changing ownership of a project does not change previous owner access for Base tables
Flow
Input parameters of pipelines are now displayed in the "label (code)" format unless there is no label available or the label equals the code, in which case only the code is shown
Fixed an issue where multiple folders were created upon starting new analyses
Fixed an issue preventing analyses from using inputs with BaseSpace v1pre3 APIs
Fixed an issue causing analyses with a specified output path to incorrectly return an error stating that the data does not exist
The following endpoint "/api/projects/{projectId}/workflowSessions/{workflowSessionId}/inputs" now supports using external data as input
Any value other than "economy" or "standard" for submitted analysis jobs will default to "standard"
The parameter to pass an activationcode is now optional for start-analysis API endpoints
Base
Improved the display of errors in the activity jobs screen if a Meta Data schedule fails
If an error occurs when processing metadata, a failed job entry will be added in the Base Activity screen
Fixed an issue where records ingested via schedules from the same file could be duplicated
Fixed an issue where exporting the view shared via bundle would show an error 'Could not find data with ID (fol. ....)'
Resolved a NullPointerException error when clicking on Format and Status filters in the details screen of a Schedule in the Results tab
Fixed an issue where a schedule download would fail when performed by a different user than the initial user
Bench
Fixed an issue when trying to query a Base table with a high limit within a workspace
Fixed an apt-get error when building images due to an outdated repository
Fixed an issue where a stopped workspace would display "Workspace paused" instead of "Workspace stopped"
Fixed an issue where large files (e.g., 150GB+) could not be downloaded to a fuse-driver location from a Workspace, and set the new limit to 500GB
Cohorts
Fixed an issue where split Nirvana JSON files are not recognized during ingestion
Fixed an issue causing the UI to hang on large genes and return a 502 error
Fixed an issue where OMOP files are not correctly converted to CAM data model, preventing OMOP data ingestions
Fixed an issue where large OMOP drug ingestions led to memory issues, preventing further drug data ingestion
Fixed an issue where users from a different tenant accessing a shared project could not ingest data
Click the refresh button in the upper right corner of the ICA environment page to update the status.
System notifications (both regional and global), which were previously available elsewhere, are now also shown in the ICA UI when an important ICA message needs to be communicated
POST Creates a file in this project, and retrieves temporary credentials for it
POST Creates a file in this project, and retrieves an upload url for it
POST Creates a folder in this project, and retrieves temporary credentials for it
POST Creates a folder in this project, and creates a trackable folder upload session
Users can now access the system via or
In this tutorial, we will demonstrate how to create and launch a CWL pipeline using the ICA command line interface (CLI).
Please refer to these instructions for installing ICA CLI.
In this project, we will create two simple tools and build a workflow that we can run on ICA using the CLI. The first tool (tool-fqTOfa.cwl) will convert a FASTQ file to a FASTA file. The second tool (tool-countLines.cwl) will count the number of lines in an input FASTA file. The workflow (workflow.cwl) will combine the two tools to convert an input FASTQ file to a FASTA file and count the number of lines in the resulting FASTA file.
Following are the two CWL tools and the workflow script we will use in the project. If you are new to CWL, please refer to the CWL user guide for a better understanding of CWL code. You will also need cwltool installed to create these tools and workflows. You can find installation instructions on the CWL GitHub page.
[!IMPORTANT] Please note that we don't specify the Docker image used in either tool. In that case, the default behaviour is to use the public.ecr.aws/docker/library/bash:5 image. This image contains basic functionality (sufficient to execute the wc and awk commands).
In case you want to use a different public image, you can specify it using the requirements tag in the CWL file. For example, if you want to use 'ubuntu:latest', you need to add a DockerRequirement with the dockerPull key set to that image.
In case you want to use a Docker image from the ICA Docker repository, you would need the link to AWS ECR from the ICA GUI. Double-click on the image name in the Docker repository and copy the URL to the clipboard. Add the URL to the dockerPull key.
To add a custom or public docker image to the ICA repository, please refer to the Docker Repository.
Before you can use ICA CLI, you will need to authenticate using the Illumina API key. Please follow these instructions to authenticate.
You can create a project or use an existing project for creating a new pipeline. You can create a new project using the "icav2 projects create" command.
If you do not provide the "--region" flag, the value defaults to the existing region when there is only one region available. When there is more than one region available, a selection must be made from the available regions at the command prompt. The region input can be determined by calling the "icav2 regions list" command first.
You can select the project to work on by entering the project using the "icav2 projects enter" command. Thus, you won't need to specify the project as an argument.
You can also use the "icav2 projects list" command to determine the names and ids of the project you have access to.
"projectpipelines" is the root command to perform actions on pipelines in a project. "create" command creates a pipeline in the current project.
The parameter file specifies the input for the workflow with additional parameter settings for each step in the workflow. In this tutorial, the input is a FASTQ file shown inside the <dataInput> tag in the parameter file. There aren't any specific settings for the workflow steps, so the parameter file below has an empty <steps> tag. Create a parameter file (parameters.xml) with the following content using a text editor.
The following command creates a pipeline called "cli-tutorial" using the workflow "workflow.cwl", the tools "tool-fqTOfa.cwl" and "tool-countLines.cwl", and the parameter file "parameters.xml" with small storage size.
Once the pipeline is created, you can view it using the "list" command.
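A hedged sketch of the create and list commands (the subcommand structure and flag names below are illustrative assumptions, not confirmed options; check icav2 projectpipelines create --help for the flags supported by your CLI version):
# Flag names are assumptions: --workflow, --tool, --parameter and --storage-size are placeholders for the real options
icav2 projectpipelines create cwl cli-tutorial --workflow workflow.cwl --tool tool-fqTOfa.cwl --tool tool-countLines.cwl --parameter parameters.xml --storage-size small
icav2 projectpipelines list   # verify that the cli-tutorial pipeline now appears in the project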
Upload data to the project using the "icav2 projectdata upload" command. Please refer to the Data page for advanced data upload features. For this test, we will use a small FASTQ file test.fastq containing the following reads.
The "icav2 projectdata upload" command lets you upload data to ica.
The "list" command lets you view the uploaded file. Note the ID of the file you want to use with the pipeline.
The "icav2 projectpipelines start" command initiates the pipeline run. The following command runs the pipeline. Note the id for exploring the analysis later.
Note: If for some reason your "create" command fails and needs to rerun, you might get an error (ConstraintViolationException). If so, try your command with a different name.
You can check the status of the run using the "icav2 projectanalyses get" command.
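A hedged sketch of starting the pipeline and checking its status (the --input and --user-reference options are assumptions; check icav2 projectpipelines start --help, and replace the placeholders with the IDs noted above):
# Only the commands themselves are taken from this tutorial; the option names are illustrative
icav2 projectpipelines start cwl cli-tutorial --input <input-code>:<file-id> --user-reference cli-tutorial-run
icav2 projectanalyses get <analysis-id>   # check the status using the analysis ID returned by the start command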
The pipelines can be run using JSON input type as well. The following is an example of running pipelines using JSON input type. Note that JSON input works only with file-based CWL pipelines (built using code, not a graphical editor in ICA).
The runtime.ram and runtime.cpu values are by default evaluated using the compute environment running the host CWL runner. CommandLineTool steps within a CWL Workflow run on different compute environments than the host CWL runner, so the evaluated runtime.ram and runtime.cpu values inside a CommandLineTool will not match the runtime environment the tool is actually running in. The evaluation of runtime.ram and runtime.cpu can be overridden by specifying coresMin and ramMin in the ResourceRequirement for the CommandLineTool.