Analyses

An Analysis is the long-running execution of a pipeline.

Starting Analyses

There are several ways to start analyses.

From Analyses

  1. Navigate to Projects > <Your_Project> > Flow > Analyses.

  2. Select » New Analysis.

  3. Select a single Pipeline.

  4. Configure analysis settings.

  5. Select » Start Analysis.

  6. Refresh to see the analysis status. See Lifecycle.

  7. To end an analysis, select Abort. Refresh to see the status update.

  8. To perform a completed analysis again, select the analysis and choose Re-run.

From Pipelines

  1. Navigate to Projects > <Your_Project> > Flow > Pipelines.

  2. Select the pipeline to run.

  3. Select » Start New Analysis.

  4. Configure analysis settings.

  5. Select » Start Analysis.

  6. View the analysis status on the Analyses page. See Lifecycle.

  7. To end an analysis, select Abort on the Analyses page.

  8. To perform a completed analysis again, select Re-run on the Analyses page.

Lifecycle

| Status | Description | Final State |
|---|---|---|
| Requested | The request to start the Analysis is being processed | No |
| Queued | Analysis has been queued | No |
| Initializing | Initializing environment and performing validations for Analysis | No |
| Preparing Inputs | Downloading inputs for Analysis | No |
| In Progress | Analysis execution is in progress | No |
| Generating outputs | Transferring the Analysis results | No |
| Aborting | Analysis has been requested to be aborted | No |
| Aborted | Analysis has been aborted | Yes |
| Failed | Analysis has finished with error | Yes |
| Succeeded | Analysis has finished with success | Yes |

After an analysis is started, resource availability can delay the start of the pipeline, or the start of individual steps once execution is underway. Such delays are more likely when the system is under high load and resources are limited.
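The Final State column can be turned into a simple check for whether an analysis has finished. The Python sketch below assumes the status labels exactly as shown in the table (the API may spell them differently, for example in upper case) and a hypothetical caller-supplied `get_status` function that returns the current status; no specific ICA endpoint is implied. Because start times can be delayed, the loop polls at an interval with an optional timeout instead of assuming a fixed run time.

```python
import time
from typing import Callable, Optional

# Final-state flags from the lifecycle table.
# Assumption: statuses use the UI labels; the API's spelling may differ.
FINAL_STATE = {
    "Requested": False,
    "Queued": False,
    "Initializing": False,
    "Preparing Inputs": False,
    "In Progress": False,
    "Generating outputs": False,
    "Aborting": False,
    "Aborted": True,
    "Failed": True,
    "Succeeded": True,
}


def is_final(status: str) -> bool:
    """Return True for Aborted, Failed, or Succeeded; reject unknown labels."""
    if status not in FINAL_STATE:
        raise ValueError(f"unknown analysis status: {status!r}")
    return FINAL_STATE[status]


def wait_for_final_state(
    get_status: Callable[[], str],  # hypothetical: fetches the current status
    poll_interval: float = 30.0,
    timeout: Optional[float] = None,
) -> str:
    """Poll get_status until it reports a final state, then return that state."""
    start = time.monotonic()
    while True:
        status = get_status()
        if is_final(status):
            return status
        if timeout is not None and time.monotonic() - start >= timeout:
            raise TimeoutError(f"analysis still {status!r} after {timeout} s")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Simulated status sequence standing in for real API responses.
    statuses = iter(["Queued", "Initializing", "In Progress", "Succeeded"])
    print(wait_for_final_state(lambda: next(statuses), poll_interval=0))
```

Note that Aborting is not final: an analysis in that state still has to reach Aborted, so a client should keep polling rather than treat the abort request as complete.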

Logs

During the execution of an analysis, logs are produced for each process involved in the analysis lifecycle. In the analysis details view, the Logs tab shows these logs in near real time as the running processes produce them. Analyses with more than 50 steps are shown in a grid layout, and analyses with 50 or fewer steps are shown in a tiled view. You can also switch the latter to the grid layout with the tile/grid button at the top right of the Logs tab.

Every analysis involves system processes (for example, downloading inputs and uploading outputs) as well as pipeline-specific processes, such as the processes that execute the pipeline steps. The table below describes the system processes.

| Process | Description |
|---|---|
| Setup Environment | Validate that the analysis execution environment is prepared |
| Run Monitor | Monitor resource usage for billing and reporting |
| Prepare Input Data | Download and mount input data to the shared file system |
| Pipeline Runner | Parent process to execute the pipeline definition |
| Finalize Output Data | Upload output data |

Additional log entries appear for the processes that execute the steps defined in the pipeline.

Each process shows as a distinct entry in the Logs view with a Queue Date, Start Date, and End Date.

| Timestamp | Description |
|---|---|
| Queue Date | The time when the process is submitted to the process scheduler for execution |
| Start Date | The time when the process starts execution |
| End Date | The time when the process stops execution |

The Duration is the time between the Start Date and the End Date, and it is used to calculate the usage-based cost of the analysis.
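As a worked example of the Duration definition, the Python sketch below computes Duration as End Date minus Start Date for a single process. It also computes the time spent waiting for the scheduler (Start Date minus Queue Date), which is shown only for illustration and is not described above as part of the cost calculation. The ISO 8601 timestamps are made-up sample values rather than real log output, the function name is hypothetical, and the conversion from Duration to cost is omitted because no rate is given here.

```python
from datetime import datetime


def process_times(queue_date: str, start_date: str, end_date: str) -> dict:
    """Return the queue wait and the Duration of one process from its timestamps."""
    queued = datetime.fromisoformat(queue_date)
    started = datetime.fromisoformat(start_date)
    ended = datetime.fromisoformat(end_date)
    if not queued <= started <= ended:
        raise ValueError("expected Queue Date <= Start Date <= End Date")
    return {
        # Time waiting for the process scheduler; illustrative only.
        "queue_wait": started - queued,
        # Duration as defined above: End Date minus Start Date.
        "duration": ended - started,
    }


if __name__ == "__main__":
    # Hypothetical sample timestamps for one process.
    times = process_times(
        "2024-01-01T10:00:00+00:00",
        "2024-01-01T10:05:00+00:00",
        "2024-01-01T11:35:30+00:00",
    )
    print(times["queue_wait"])  # 0:05:00
    print(times["duration"])    # 1:30:30
```

In this sample, the five minutes the process spends queued do not count toward its Duration; only the 1 hour 30 minutes 30 seconds between Start Date and End Date does.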

Each log entry in the Logs view contains a checkbox to view the stdout and stderr log files for the process. Clicking a checkbox adds the log as a tab to the log viewer where the log text is displayed and made available for download.

Log Files

In the analysis output folder, the ica_logs subfolder contains the stdout and stderr files. If you delete these files, no log information is available on the Logs tab of the analysis details view.

Log Streaming

Logs can also be streamed with WebSocket client tooling. The API that retrieves analysis step details returns WebSocket URLs for each step, which stream the step's stdout and stderr logs while the step runs. Once the step completes, its WebSocket URLs are no longer available.
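A minimal streaming client in Python is sketched below. It assumes you have already obtained one of the WebSocket URLs returned for a step by the step details API (the response format and any authentication the URL may require are not covered here), and it uses the third-party `websockets` package, which is one possible client library rather than a requirement. Because the URL stops working once the step completes, a client that connects too late should read the stdout and stderr files in the ica_logs folder instead.

```python
import asyncio
import sys
from typing import AsyncIterable, TextIO, Union


async def copy_log_messages(
    messages: AsyncIterable[Union[str, bytes]], sink: TextIO
) -> int:
    """Write each streamed log message to sink and return the message count."""
    count = 0
    async for message in messages:
        # WebSocket messages may arrive as text or as bytes.
        if isinstance(message, bytes):
            message = message.decode("utf-8", errors="replace")
        sink.write(message)
        sink.flush()
        count += 1
    return count


async def stream_step_log(websocket_url: str) -> int:
    """Connect to a step's log WebSocket URL and copy its output to stdout."""
    # Third-party dependency (pip install websockets), imported here so that
    # copy_log_messages can be used without it.
    import websockets

    async with websockets.connect(websocket_url) as ws:
        # Iterating the connection yields messages until the server closes it.
        return await copy_log_messages(ws, sys.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python stream_log.py <websocket-url-from-step-details>
    asyncio.run(stream_step_log(sys.argv[1]))
```

Separating the copy loop from the connection keeps the message handling testable with any async iterable, independent of the client library.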

Analysis Output Mappings

Currently, this feature is only available when launching analyses via the API.

Currently, only FOLDER type output mappings are supported.

By default, analysis outputs are directed to a new folder within the project where the analysis is launched. Analysis output mappings may be specified to redirect outputs to user-specified locations consisting of project and path. An output mapping consists of:

  • the source path on the local disk of the analysis execution environment, relative to the working directory

  • the data type, either FILE or FOLDER (only FOLDER is currently supported)

  • the target project ID to direct outputs to; the user launching the analysis must have Contributor access to that project

  • the target path, relative to the root of the project data, where the outputs are written

Use the example analysis output mapping below for guidance.

If the target folder already exists, any existing files with the same names as the pipeline outputs are overwritten by the new analysis.

In this example, two analysis output mappings are specified. During execution, the analysis writes data to the paths out/test1 and out/test2 in its working directory. The contents of these folders are written to the project with ID 4d350d0f-88d8-4640-886d-5b8a23de7d81, at the paths /output-testing-01/ and /output-testing-02/ respectively, relative to the root of the project data.

The following demonstrates the construction of the request body to start an analysis with the output mappings described above:

```json
{
...
    "analysisOutput":
    [
        {
            "sourcePath": "out/test1",
            "type": "FOLDER",
            "targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
            "targetPath": "/output-testing-01/"
        },
        {
            "sourcePath": "out/test2",
            "type": "FOLDER",
            "targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
            "targetPath": "/output-testing-02/"
        }
    ]
}
```

When the analysis completes, the outputs appear in the ICA UI in the folders designated in the request JSON at analysis launch (output-testing-01 and output-testing-02).
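When the request body is generated programmatically, the rules listed above can be checked before the request is sent. The Python sketch below builds the `analysisOutput` array with the field names from the example request. The class and function names are hypothetical, and the checks (FOLDER only, relative source path, project ID parsing as a UUID) are assumptions drawn from this page's description and example, not validation the API is documented to perform. Contributor access to the target project cannot be checked locally and is not covered.

```python
import json
import uuid
from typing import List, NamedTuple


class OutputMapping(NamedTuple):
    source_path: str          # relative to the analysis working directory
    target_project_id: str    # launcher needs Contributor access to this project
    target_path: str          # relative to the root of the project data
    data_type: str = "FOLDER" # only FOLDER is currently supported


def build_analysis_output(mappings: List[OutputMapping]) -> List[dict]:
    """Convert mappings into the analysisOutput array of the start-analysis request."""
    result = []
    for m in mappings:
        if m.data_type != "FOLDER":
            raise ValueError(f"{m.source_path}: only FOLDER output mappings are supported")
        if m.source_path.startswith("/"):
            raise ValueError(f"{m.source_path}: sourcePath must be relative to the working directory")
        # Assumption: project IDs are UUIDs (as in the example); raises ValueError otherwise.
        uuid.UUID(m.target_project_id)
        result.append(
            {
                "sourcePath": m.source_path,
                "type": m.data_type,
                "targetProjectId": m.target_project_id,
                "targetPath": m.target_path,
            }
        )
    return result


if __name__ == "__main__":
    project = "4d350d0f-88d8-4640-886d-5b8a23de7d81"
    body = {
        # Other request fields are omitted, as in the example request.
        "analysisOutput": build_analysis_output(
            [
                OutputMapping("out/test1", project, "/output-testing-01/"),
                OutputMapping("out/test2", project, "/output-testing-02/"),
            ]
        ),
    }
    print(json.dumps(body, indent=4))
```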

Hyperlinking

To link externally to an individual analysis or workflow session, use the following URL syntax:

  • Analysis: <hostURL>/ica/link/project/<projectUUID>/analysis/<analysisUUID>

  • Workflow session: <hostURL>/ica/link/project/<projectUUID>/workflowSession/<workflowsessionUUID>
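The two link formats differ only in the final path segments, so one helper can build both. The Python sketch below checks that the IDs parse as UUIDs and removes a trailing slash from the host URL so the path joins cleanly; the function names are hypothetical, and the host URL you pass in is whatever address your ICA instance uses.

```python
import uuid


def _require_uuid(value: str, label: str) -> None:
    """Raise ValueError if value does not parse as a UUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        raise ValueError(f"{label} is not a valid UUID: {value!r}") from None


def _ica_link(host_url: str, project_id: str, kind: str, item_id: str) -> str:
    _require_uuid(project_id, "project ID")
    _require_uuid(item_id, f"{kind} ID")
    # Drop a trailing slash so host and path are joined by exactly one "/".
    return f"{host_url.rstrip('/')}/ica/link/project/{project_id}/{kind}/{item_id}"


def analysis_link(host_url: str, project_id: str, analysis_id: str) -> str:
    """<hostURL>/ica/link/project/<projectUUID>/analysis/<analysisUUID>"""
    return _ica_link(host_url, project_id, "analysis", analysis_id)


def workflow_session_link(host_url: str, project_id: str, session_id: str) -> str:
    """<hostURL>/ica/link/project/<projectUUID>/workflowSession/<workflowsessionUUID>"""
    return _ica_link(host_url, project_id, "workflowSession", session_id)
```

Validating the IDs up front catches a swapped or truncated UUID before the link is shared, rather than when someone follows it.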

Restrictions

Input for analysis is limited to a total of 50,000 files (including multiple copies of the same file).
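Because the limit counts every copy of a file, a pre-flight check has to count input entries rather than distinct files. The Python sketch below assumes the inputs are available as a list of file paths or IDs (how that list is assembled depends on your client), and the function name is hypothetical.

```python
from collections import Counter
from typing import Sequence

MAX_INPUT_FILES = 50_000  # limit stated in Restrictions


def check_input_file_count(inputs: Sequence[str]) -> int:
    """Return the input count, or raise ValueError if it exceeds the limit."""
    # len(inputs), not len(set(inputs)): copies of the same file count too.
    total = len(inputs)
    if total > MAX_INPUT_FILES:
        duplicates = sum(n - 1 for n in Counter(inputs).values() if n > 1)
        raise ValueError(
            f"{total} input files exceed the limit of {MAX_INPUT_FILES} "
            f"({duplicates} are repeated copies)"
        )
    return total
```

Reporting the number of repeated copies helps show whether removing duplicates alone would bring the input under the limit.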
