# Analyses

An Analysis is the execution of a pipeline.

## Starting Analyses

You can start an analysis from either the dedicated Analyses screen or from the pipeline itself.

#### From Analyses

1. Navigate to **Projects > Your\_Project > Flow > Analyses**.
2. Select **Start**.
3. Select a single Pipeline.
4. Configure the [analysis settings](https://help.ica.illumina.com/project/f-pipelines#analysis-tab).
5. Select **Start Analysis**.
6. Refresh to see the analysis status. See [lifecycle](#Lifecycle) for more information on statuses.
7. To end the analysis before it completes, select **Projects > Your\_Project > Flow > Analyses > Manage > Abort**. Refresh to see the status update.

#### From Pipelines or Pipeline details

1. Navigate to **Projects > \<Your\_Project> > Flow > Pipelines**.
2. Select the pipeline you want to run, or open its pipeline details.
3. Select **Start Analysis**.
4. Configure [analysis settings](https://help.ica.illumina.com/project/f-pipelines#analysis-tab).
5. Select **Start Analysis**.
6. View the analysis status on the Analyses page. See [lifecycle](#Lifecycle) for more information on statuses.
7. To end the analysis before it completes, select **Manage > Abort** on the Analyses page.

#### Aborting Analyses

You can abort a running analysis from either the analysis overview (**Projects > your\_project > Flow > Analyses > your\_analysis > Manage > Abort**) or from the analysis details (**Projects > your\_project > Flow > Analyses > your\_analysis > Details tab > Abort**).

#### Rerunning Analyses

Once an analysis has been executed, you can rerun it with the same settings or choose to modify the parameters when rerunning. Modifying the parameters is possible on a per-analysis basis. When selecting multiple analyses at once, they will be executed with the original parameters. **Draft pipelines** are subject to updates and thus can result in a different outcome when rerunning. ICA will display a warning message to inform you of this when you try to rerun an analysis based on a draft pipeline.

{% hint style="info" %}
When rerunning an analysis, the user reference will be the original user reference (up to 231 characters), followed by \_rerun\_yyyy-MM-dd\_HHmmss.
{% endhint %}
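The naming rule in the note above can be sketched as a small helper. This is illustrative only (the function name and truncation behavior for over-long references are assumptions, not part of any ICA SDK):

```python
from datetime import datetime

def rerun_user_reference(original: str, now: datetime) -> str:
    """Build the user reference ICA assigns to a rerun: the original
    reference (up to 231 characters), followed by _rerun_yyyy-MM-dd_HHmmss."""
    suffix = now.strftime("_rerun_%Y-%m-%d_%H%M%S")
    return original[:231] + suffix
```

For example, `rerun_user_reference("my-analysis", datetime(2024, 1, 2, 3, 4, 5))` yields `my-analysis_rerun_2024-01-02_030405`.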

When there is an **XML configuration change** on a pipeline for which you want to rerun an analysis, ICA displays a warning and does not fill out the parameters, as it cannot guarantee their validity for the new XML. You will need to provide the input data and settings again to rerun the analysis.

Some restrictions apply when trying to rerun an analysis.

<table><thead><tr><th width="374">Analyses</th><th width="138">Rerun</th><th>Rerun with modified parameters</th></tr></thead><tbody><tr><td>Analyses using external data</td><td>Allowed</td><td>-</td></tr><tr><td>Analyses using mount paths on input data</td><td>Allowed</td><td>-</td></tr><tr><td>Analyses using user-provided input json</td><td>Allowed</td><td>-</td></tr><tr><td>Analyses using advanced output mappings</td><td>-</td><td>-</td></tr><tr><td>Analyses with draft pipeline</td><td>Warn</td><td>Warn</td></tr><tr><td>Analyses with XML configuration change</td><td>Warn</td><td>Warn</td></tr></tbody></table>

To rerun one or more analyses with the same settings:

1. Navigate to **Projects > Your\_Project > Flow > Analyses**.
2. In the overview screen, **select one or more** analyses.
3. Select **Manage > Rerun**. The analyses will now be executed with the same parameters as their original run.

To rerun a single analysis with modified parameters:

1. Navigate to **Projects > Your\_Project > Flow > Analyses**.
2. In the overview screen, **open the details** of the analysis you want to rerun by clicking on the analysis user reference.
3. Select **Rerun** at the top right.
4. Update the parameters you want to change.
5. Select **Start Analysis**. The analysis will now be executed with the updated parameters.

{% hint style="info" %}
You might see parameters (files and folders) on the analysis details tab which are not defined in the XML. This indicates they were added via the xml-pipeline analysis creation endpoint.
{% endhint %}

## Lifecycle <a href="#lifecycle" id="lifecycle"></a>

<table><thead><tr><th>Status</th><th width="393">Description</th><th>Final State</th></tr></thead><tbody><tr><td>Requested</td><td>The request to start the Analysis is being processed</td><td>No</td></tr><tr><td>Queued</td><td>Analysis has been queued</td><td>No</td></tr><tr><td>Initializing</td><td>Initializing environment and performing validations for Analysis</td><td>No</td></tr><tr><td>Preparing Inputs</td><td>Downloading inputs for Analysis</td><td>No</td></tr><tr><td>In Progress</td><td>Analysis execution is in progress</td><td>No</td></tr><tr><td>Generating outputs</td><td>Transferring the Analysis results</td><td>No</td></tr><tr><td>Aborting</td><td>Analysis has been requested to be aborted</td><td>No</td></tr><tr><td>Aborted</td><td>Analysis has been aborted</td><td>Yes</td></tr><tr><td>Failed</td><td>Analysis has finished with error</td><td>Yes</td></tr><tr><td>Succeeded</td><td>Analysis has finished with success</td><td>Yes</td></tr></tbody></table>
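The lifecycle table above implies a simple polling pattern: keep checking the status until a final state is reached. A minimal sketch, where the status fetcher is injected rather than tied to any particular API client (the function names here are assumptions, not ICA tooling):

```python
import time

# Final states from the lifecycle table above.
FINAL_STATES = {"Aborted", "Failed", "Succeeded"}

def wait_for_final_state(get_status, poll_seconds=30, max_polls=1000):
    """Poll an analysis status until it reports a final state.

    get_status: zero-argument callable returning one of the lifecycle
    statuses; wire it to your own API client.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in FINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("analysis did not reach a final state")
```

Note that non-final statuses such as `Queued` can persist for some time when cloud resources are under high load, so choose the polling interval accordingly.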

{% hint style="info" %}
When an analysis is started, the availability of resources may impact the start time of the pipeline or specific steps after execution has started. Analyses are subject to delay when cloud resources are under high load.
{% endhint %}

During analysis start, ICA verifies that the input files are available. When it encounters files that have not completed their upload or transfer, it reports "*Data found for parameter \[parameter\_name], but status is Partial instead of Available*". Wait for the file to become available and restart the analysis.

{% hint style="info" %}
When the underlying storage provider runs out of storage resources, the Status field of the Analysis details will indicate this. There is no need to abort or rerun the analysis.
{% endhint %}

## Analysis steps logs

During the execution of an analysis, logs are produced for each process involved in the analysis lifecycle. In the analysis details view, the **Steps tab** shows the steps in near real time as they are produced by the running processes. Analyses with more than 50 steps use a grid layout; analyses with 50 steps or fewer use a tiled view, though you can switch these to the grid layout with the *tile/grid button* at the top right of the analysis log tab. The Steps tab also shows **which resources were used** as compute type in the main analysis steps. (For child steps, these are displayed on the parent step.)

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-71aee9f8d1778bc0657bf04eaa912ce9d4230ccc%2Fimage%20(75).png?alt=media" alt=""><figcaption></figcaption></figure>

There are system processes involved in the lifecycle of all analyses (e.g., downloading inputs, uploading outputs) and there are pipeline-specific processes, such as those which execute the pipeline steps. The table below describes the system processes. You can display or hide these system processes with the **Show technical steps** option.

<table><thead><tr><th width="241">Process</th><th>Description</th></tr></thead><tbody><tr><td>Setup Environment</td><td>Validate analysis execution environment is prepared</td></tr><tr><td>Run Monitor</td><td>Monitor resource usage for billing and reporting</td></tr><tr><td>Prepare Input Data</td><td>Download and mount input data to the shared file system</td></tr><tr><td>Pipeline Runner</td><td>Parent process to execute the pipeline definition</td></tr><tr><td>Finalize Output Data</td><td>Upload Output Data</td></tr></tbody></table>

Additional log entries will show for the processes which execute the steps defined in the pipeline.

Each process shows as a distinct entry in the steps view with a Queue Date, Start Date, and End Date.

<table><thead><tr><th width="175">Timestamp</th><th>Description</th></tr></thead><tbody><tr><td>Queue Date</td><td>The time when the process is submitted to the process scheduler for execution</td></tr><tr><td>Start Date</td><td>The time when the process has started execution</td></tr><tr><td>End Date</td><td>The time when the process has stopped execution</td></tr></tbody></table>

The time between the Start Date and the End Date is the step duration, which is used to calculate the usage-based cost for the analysis. Because this is an active calculation, sorting on this field is not supported.
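As a worked example of that duration calculation (the timestamp format below is an assumption for illustration; Queue Date is not part of the duration):

```python
from datetime import datetime

def step_duration_seconds(start_date: str, end_date: str) -> float:
    """Duration of a step as described above: End Date minus Start Date.
    Time spent queued (before Start Date) is not included."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # assumed ISO-8601 UTC timestamps
    start = datetime.strptime(start_date, fmt)
    end = datetime.strptime(end_date, fmt)
    return (end - start).total_seconds()
```

For example, a step starting at 10:00:00 and ending at 10:05:30 has a duration of 330 seconds.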

Each log entry in the Steps view contains a checkbox to view the stdout and stderr log files for the process. Clicking a checkbox adds the log as a tab to the log viewer where the log text is displayed and made available for download.

### Analysis Cost

To see the price of an analysis in iCredits, look at **Projects > your\_project > Flow > Analyses > your\_analysis > Details tab**. The pricing section will show you the entitlement bundle, storage detail and price in iCredits once the analysis has succeeded, failed or been aborted.

### Log Files

By default, the **stdout** and **stderr** files are located in the ***ica\_logs*** subfolder within the analysis. This **location can be changed** by selecting a different [logs folder ](https://help.ica.illumina.com/project/f-pipelines#analysis-settings)in the current project at the start of the analysis. **Do not use a folder which already contains log files** as these will be overwritten.\
To set the log file location, you can also use the CreateAnalysisLogs section of the Create Analysis [endpoints](https://ica.illumina.com/ica/api/swagger/index.html).

{% hint style="warning" %}
If you delete these files, no log information will be available on the **analysis details > Steps tab**.
{% endhint %}

You can access the log files from the analysis details (**Projects > your\_project > Flow > Analyses > your\_analysis > Details tab**).

### Log Streaming

Logs can also be streamed using websocket client tooling. The API to retrieve analysis step details returns websocket URLs for each step to stream the logs from stdout/stderr during the step's execution. Upon completion, the websocket URL is no longer available.
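A sketch of consuming those websocket URLs. The payload key names below are assumptions (check the step details response shape in the API reference), and the streaming coroutine uses the third-party `websockets` package:

```python
import asyncio

def websocket_urls(step_details: dict) -> dict:
    """Collect stdout/stderr websocket URLs from a step-details payload.
    Key names are illustrative, not the confirmed response schema."""
    return {k: step_details[k]
            for k in ("stdOutWebsocketUrl", "stdErrWebsocketUrl")
            if step_details.get(k)}

async def stream_log(url: str) -> None:
    """Print log messages as they arrive. Requires `pip install websockets`.
    The URL stops working once the step completes."""
    import websockets  # imported here so the helper above stays dependency-free
    async with websockets.connect(url) as ws:
        async for message in ws:
            print(message)
```

Usage would be along the lines of `asyncio.run(stream_log(urls["stdOutWebsocketUrl"]))` while the step is still executing.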

## Analysis Output Mappings

{% hint style="warning" %}
Currently, only FOLDER type output mappings are supported
{% endhint %}

By default, analysis outputs are directed to a new folder within the project where the analysis is launched. Analysis output mappings may be specified to redirect outputs to user-specified locations consisting of project and path. An output mapping consists of:

* the source path on the local disk of the analysis execution environment, relative to the working folder
* the data type, either FILE or FOLDER
* the target project ID to direct outputs to; the analysis launcher must have contributor access to that project
* the target path, relative to the root of the project data, where the outputs are written

{% hint style="warning" %}
If the output folder already exists, any existing contents with the same filenames as those output from the pipeline will be overwritten by the new analysis
{% endhint %}

<details>

<summary>Example</summary>

In this example, 2 analysis output mappings are specified. The analysis writes data during execution in the working directory at paths `out/test1` and `out/test2`. The data contained in these folders is directed to the project with ID `4d350d0f-88d8-4640-886d-5b8a23de7d81`, at paths `/output-testing-01/` and `/output-testing-02/` respectively, relative to the root of the project data.

The following demonstrates the construction of the request body to start an analysis with the output mappings described above:

```json
{
...
    "analysisOutput":
    [
        {
            "sourcePath": "out/test1",
            "type": "FOLDER",
            "targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
            "targetPath": "/output-testing-01/"
        },
        {
            "sourcePath": "out/test2",
            "type": "FOLDER",
            "targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
            "targetPath": "/output-testing-02/"
        }
    ]
}
```

When the analysis completes, the outputs can be seen in the ICA UI, within the folders designated in the payload JSON during pipeline launch (`output-testing-01` and `output-testing-02`).

</details>
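Constructing and sanity-checking `analysisOutput` entries like those in the example can be sketched as a small helper. This is a hypothetical convenience function, not part of any ICA SDK; the leading-slash check simply mirrors the paths used in the example above:

```python
def output_mapping(source_path: str, target_project_id: str,
                   target_path: str, data_type: str = "FOLDER") -> dict:
    """Build one entry of the analysisOutput list described above.
    Per the warning above, only FOLDER mappings are currently supported."""
    if data_type != "FOLDER":
        raise ValueError("only FOLDER type output mappings are supported")
    if not target_path.startswith("/"):
        # Assumed convention, matching the example paths like /output-testing-01/
        raise ValueError("target path is written relative to the project data root")
    return {
        "sourcePath": source_path,
        "type": data_type,
        "targetProjectId": target_project_id,
        "targetPath": target_path,
    }
```

The returned dictionaries can be collected into the `analysisOutput` list of the request body.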

You can jump from the Analysis Details to the individual files and folders by opening the output files tab on the detail view (**Projects > your\_project > Flow > Analyses > your\_analysis > Output files tab > your\_output\_file**) and selecting **Open in data**.

{% hint style="info" %}
The **Output files** section of the analyses will always show the generated outputs, even when they have since been deleted from storage. This is done so you can always see which files were generated during the analysis.\
In this case it will no longer be possible to navigate to the actual output files.
{% endhint %}

<table><thead><tr><th width="150.98046875">analysis output</th><th width="120.3671875">logs output</th><th>Notes</th></tr></thead><tbody><tr><td>Default</td><td>Default</td><td>Logs are a subfolder of the analysis output.</td></tr><tr><td>Mapped</td><td>Default</td><td>Logs are a subfolder of the analysis output.</td></tr><tr><td>Default</td><td>Mapped</td><td>Outputs and logs may be separated.</td></tr><tr><td>Mapped</td><td>Mapped</td><td>Outputs and logs may be separated.</td></tr></tbody></table>

## Tags

You can add and remove tags from your analyses.

1. Navigate to **Projects > Your\_Project > Flow > Analyses**.
2. Select the analyses whose tags you want to change.
3. Select **Manage > Manage tags**.
4. Edit the user tags, reference data tags (if applicable) and technical tags.
5. Select **Save** to confirm the changes.

Both system tags and custom tags exist. **User** tags are custom tags which you set to help identify and process information, while **technical** tags are set by the system for processing. **Run-in** and **run-out** tags are set on data to identify which analyses use the data. **Connector** tags determine data entry methods, and **reference data** tags identify where data is used as reference data.

## Hyperlinking

If you want to share a link to an analysis, you can copy and paste the URL from your browser when you have the analysis open. The syntax of the analysis link will be `<hostURL>/ica/link/project/<projectUUID>/analysis/<analysisUUID>`. Likewise, workflow sessions will use the syntax `<hostURL>/ica/link/project/<projectUUID>/workflowSession/<workflowsessionUUID>`. To prevent third parties from accessing data via the link when it is shared or forwarded, ICA will verify the access rights of every user when they open the link.
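The link syntax above can be expressed as trivial builder functions (hypothetical helpers, shown only to make the URL structure concrete):

```python
def analysis_link(host_url: str, project_uuid: str, analysis_uuid: str) -> str:
    """Build a shareable analysis link using the syntax described above."""
    return f"{host_url}/ica/link/project/{project_uuid}/analysis/{analysis_uuid}"

def workflow_session_link(host_url: str, project_uuid: str, session_uuid: str) -> str:
    """Build a shareable workflow session link."""
    return f"{host_url}/ica/link/project/{project_uuid}/workflowSession/{session_uuid}"
```

Remember that recipients still need access rights to the project; the link itself grants nothing.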

## Restrictions

Input for an analysis is limited to a total of 50,000 files (including multiple copies of the same file). Concurrency limits on analyses prevent resource hogging, which could result in resource starvation for other tenants. Additional analyses are queued and scheduled when currently running analyses complete and free up positions. The theoretical limit is 20 concurrent analyses, but this can be less in practice, depending on a number of external factors.

## Troubleshooting

When your analysis fails, open the analysis details view (**Projects > your\_project > Flow > Analyses > your\_analysis**) and select **display failed steps**. This filters the steps view to those steps that had non-zero exit codes. If there is only one failed step which has log files, the stderr of that step is displayed.

{% hint style="info" %}
For pipeline developers: **add automatic retrying to the individual steps** that fail with error 55 / 56 (provided these steps are idempotent). See [tips and tricks](https://help.ica.illumina.com/project/p-flow/f-pipelines/pi-tips) for retries.
{% endhint %}

* Exit **code 55** indicates analysis failure on economy instances due to an external event such as spot termination. **You can retry the analysis.**
* Exit **code 56** indicates analysis failure due to pod disruption and deletion by Kubernetes' Pod Garbage Collector (PodGC) because the node it was running on no longer exists. **You can retry the analysis.**
