CWL CLI Workflow
In this tutorial, we will demonstrate how to create and launch a CWL pipeline using the ICA command line interface (CLI).
The tutorial is based on the instructions available in the GitBook and the ICA v2 documentation.

Installation

Please refer to the GitBook instructions for installing the ICA CLI.

Tutorial project

In this project, we will create two simple tools and build a workflow that we can run on ICA using the CLI. The first tool (tool-fqTOfa.cwl) converts a FASTQ file to a FASTA file. The second tool (tool-countLines.cwl) counts the number of lines in an input FASTA file. The workflow (workflow.cwl) combines the two tools: it converts an input FASTQ file to FASTA and counts the lines in the resulting FASTA file.
The two CWL tools and the workflow script used in this project are listed below. If you are new to CWL, please refer to the CWL user guide for a better understanding of the CWL code. You will also need cwltool installed to create these tools and workflows; installation instructions are available on the CWL GitHub page.

tool-fqTOfa.cwl

#!/usr/bin/env cwltool
cwlVersion: v1.0
class: CommandLineTool
inputs:
  inputFastq:
    type: File
    inputBinding:
      position: 1
stdout: test.fasta
outputs:
  outputFasta:
    type: File
    streamable: true
    outputBinding:
      glob: test.fasta
arguments:
- 'NR%4 == 1 {print ">" substr($0, 2)}NR%4 == 2 {print}'
baseCommand:
- awk
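
To see what the awk program in this tool does before running it on ICA, you can try the same program locally. The following is a minimal sketch, not part of the ICA CLI: the function name fq_to_fa and the two-read sample are made up for illustration.

```shell
#!/usr/bin/env bash
# Local illustration of the awk program used by tool-fqTOfa.cwl.
# fq_to_fa is a hypothetical helper name chosen for this example.
fq_to_fa() {
  # Line 1 of each 4-line FASTQ record: replace the leading "@" with ">".
  # Line 2 of each record: the sequence, printed unchanged.
  # Lines 3 and 4 (separator and qualities) are dropped.
  awk 'NR%4 == 1 {print ">" substr($0, 2)}NR%4 == 2 {print}' "$@"
}

# Tiny made-up FASTQ sample with two reads.
sample_fastq='@read1
ACGT
+
IIII
@read2
TTGA
+
IIII'

# With no file argument, awk reads the sample from standard input.
printf '%s\n' "$sample_fastq" | fq_to_fa
```

Each four-line FASTQ record becomes a two-line FASTA record, so the output has half as many lines as the input.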

tool-countLines.cwl

#!/usr/bin/env cwltool
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [wc, -l]
inputs:
  inputFasta:
    type: File
    inputBinding:
      position: 1
stdout: lineCount.tsv
outputs:
  outputCount:
    type: File
    streamable: true
    outputBinding:
      glob: lineCount.tsv

workflow.cwl

cwlVersion: v1.0
class: Workflow
inputs:
  ipFQ: File
outputs:
  count_out:
    type: File
    outputSource: count/outputCount
  fqTOfaOut:
    type: File
    outputSource: convert/outputFasta
steps:
  convert:
    run: tool-fqTOfa.cwl
    in:
      inputFastq: ipFQ
    out: [outputFasta]
  count:
    run: tool-countLines.cwl
    in:
      inputFasta: convert/outputFasta
    out: [outputCount]
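
The workflow chains the two steps: the FASTA file written by convert is the input of count. As a rough local equivalent, the sketch below runs the same two commands in sequence without CWL. The function name run_local, the temporary directory, and the two-read input are assumptions made for this illustration.

```shell
#!/usr/bin/env bash
# Local stand-in for workflow.cwl: convert, then count.
# run_local is a hypothetical helper; it is not an ICA or cwltool command.
run_local() {
  local fastq="$1" workdir="$2"
  # Step "convert": same awk program as tool-fqTOfa.cwl, stdout to test.fasta.
  awk 'NR%4 == 1 {print ">" substr($0, 2)}NR%4 == 2 {print}' "$fastq" \
    > "$workdir/test.fasta"
  # Step "count": same command as tool-countLines.cwl, stdout to lineCount.tsv.
  wc -l "$workdir/test.fasta" > "$workdir/lineCount.tsv"
}

workdir="$(mktemp -d)"
# Made-up two-read FASTQ input.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$workdir/in.fastq"

run_local "$workdir/in.fastq" "$workdir"
# The first field of lineCount.tsv is the line count, followed by the file path.
cat "$workdir/lineCount.tsv"
```

Because wc -l is given a file argument, lineCount.tsv contains the count followed by the path of the FASTA file, not the count alone.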

Authentication

Before you can use the ICA CLI, you need to authenticate using an Illumina API key. Please follow the instructions in the GitBook to authenticate.

Enter/Create a Project

You can create a new project or use an existing one for the new pipeline. To create a new project, use the projects create command.
% icav2 projects create basic-cli-tutorial --region c39b1feb-3e94-4440-805e-45e0c76462bf
If you do not provide the "--region" flag, the value defaults to the existing region when there is only one region available. When there is more than one region available, a selection must be made from the available regions at the command prompt. The region input can be determined by calling the separate regions list command first.
You can select the project to work on by entering it with the projects enter command. After that, you do not need to specify the project as an argument to later commands.
% icav2 projects enter basic-cli-tutorial
You can also use the projects list command to look up the names and ids of the projects you have access to.
% icav2 projects list

Create a pipeline on ICA

"projectpipelines" is the root command to perform actions on pipelines in a project. "create" command creates a pipeline in the current project. Please refer to the CLI documentation for additional options.
The parameter file specifies the inputs for the workflow, along with any additional parameter settings for each step. In this tutorial, the only input is a FASTQ file, declared in the <pd:dataInput> element of the parameter file. No step requires specific settings, so the <pd:steps> element is empty. Create a parameter file (parameters.xml) with the following content using a text editor.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="ipFQ" format="FASTQ" type="FILE" required="true" multiValue="false">
<pd:label>ipFQ</pd:label>
<pd:description></pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps/>
</pd:pipeline>
The following command creates a pipeline called "cli-tutorial" using the workflow "workflow.cwl", the tools "tool-fqTOfa.cwl" and "tool-countLines.cwl", and the parameter file "parameters.xml", with the small storage size.
% icav2 projectpipelines create cwl cli-tutorial --workflow workflow.cwl --tool tool-fqTOfa.cwl --tool tool-countLines.cwl --parameter parameters.xml --storage-size small --description "cli tutorial pipeline"
Once the pipeline is created, you can view it using the "list" command.
% icav2 projectpipelines list
ID CODE DESCRIPTION
6779fa3b-e2bc-42cb-8396-32acee8b6338 cli-tutorial cli tutorial pipeline

Running the pipeline

Upload data to the project using the projectdata upload command. Please refer to the Data Overview section of the icav2 user guide for advanced data upload features. For this test, we will use a small FASTQ file, test.fastq, containing the following reads.
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
IIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
IIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
IIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I
The projectdata upload command lets you upload data to ICA.
% icav2 projectdata upload test.fastq /
oldFilename= test.fastq en newFilename= test.fastq
bucket= stratus-gds-use1 prefix= 0a488bb2-578b-404a-e09d-08d9e3343b2b/test.fastq
Using: 1 workers to upload 1 files
15:23:32: [0] Uploading /Users/user1/Documents/icav2_validation/for_tutorial/working/test.fastq
15:23:33: [0] Uploaded /Users/user1/Documents/icav2_validation/for_tutorial/working/test.fastq to /test.fastq in 794.511591ms
Finished uploading 1 files in 795.244677ms
The list command lets you view the uploaded file. Note the ID of the file you want to use with the pipeline.
% icav2 projectdata list
PATH NAME TYPE STATUS ID OWNER
/test.fastq test.fastq FILE AVAILABLE fil.c23246bd7692499724fe08da020b1014 4b197387-e692-4a78-9304-c7f73ad75e44
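
If you script this step, the file ID can be pulled out of the listing instead of being copied by hand. The sketch below assumes the whitespace-separated column layout shown in the sample output above (PATH NAME TYPE STATUS ID OWNER) and paths without spaces; the function name file_id_for_path is made up for this example.

```shell
#!/usr/bin/env bash
# file_id_for_path is a hypothetical helper, not an icav2 command.
# It reads a projectdata listing on stdin and prints the ID column (field 5)
# of the row whose PATH column (field 1) equals the requested path.
# Assumption: columns are whitespace-separated as in the sample output.
file_id_for_path() {
  awk -v p="$1" '$1 == p {print $5}'
}

# Listing copied from the sample output in this tutorial.
listing='PATH NAME TYPE STATUS ID OWNER
/test.fastq test.fastq FILE AVAILABLE fil.c23246bd7692499724fe08da020b1014 4b197387-e692-4a78-9304-c7f73ad75e44'

file_id="$(printf '%s\n' "$listing" | file_id_for_path /test.fastq)"
echo "$file_id"

# Against a live project, the same helper could be fed from the CLI:
#   icav2 projectdata list | file_id_for_path /test.fastq
```

The resulting value can then be used wherever the file ID is required, for example in the ipFQ input of the start command.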
The projectpipelines start command initiates the pipeline run. The following command runs the pipeline. Note the id in the output; you will need it to explore the analysis later.
Note: If your create command fails and needs to be rerun, you might get an error (ConstraintViolationException). If so, try the command again with a different name.
% icav2 projectpipelines start cwl cli-tutorial --type-input STRUCTURED --input ipFQ:fil.c23246bd7692499724fe08da020b1014 --user-reference tut-test
analysisStorage.description 1.2 TB
analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name Small
analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName ica-cp-admin
analysisStorage.timeCreated 2021-11-05T10:28:20Z
analysisStorage.timeModified 2021-11-05T10:28:20Z
id 461d3924-52a8-45ef-ab62-8b2a29621021
ownerId 7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.analysisStorage.description 1.2 TB
pipeline.analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name Small
pipeline.analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName ica-cp-admin
pipeline.analysisStorage.timeCreated 2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code cli-tutorial
pipeline.description Test, prepared parameters file from working GUI
pipeline.id 6779fa3b-e2bc-42cb-8396-32acee8b6338
pipeline.language CWL
pipeline.ownerId 7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.tenantId d0696494-6a7b-4c81-804d-87bda2d47279
pipeline.tenantName icav2-entprod
pipeline.timeCreated 2022-03-10T13:13:05Z
pipeline.timeModified 2022-03-10T13:13:05Z
reference tut-test-cli-tutorial-eda7ee7a-8c65-4c0f-bed4-f6c2d21119e6
status REQUESTED
summary
tenantId d0696494-6a7b-4c81-804d-87bda2d47279
tenantName icav2-entprod
timeCreated 2022-03-10T20:42:42Z
timeModified 2022-03-10T20:42:43Z
userReference tut-test
You can check the status of the run using the projectanalyses get command.
% icav2 projectanalyses get 461d3924-52a8-45ef-ab62-8b2a29621021
analysisStorage.description 1.2 TB
analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name Small
analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName ica-cp-admin
analysisStorage.timeCreated 2021-11-05T10:28:20Z
analysisStorage.timeModified 2021-11-05T10:28:20Z
endDate 2022-03-10T21:00:33Z
id 461d3924-52a8-45ef-ab62-8b2a29621021
ownerId 7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.analysisStorage.description 1.2 TB
pipeline.analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name Small
pipeline.analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName ica-cp-admin
pipeline.analysisStorage.timeCreated 2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code cli-tutorial
pipeline.description Test, prepared parameters file from working GUI
pipeline.id 6779fa3b-e2bc-42cb-8396-32acee8b6338
pipeline.language CWL
pipeline.ownerId 7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.tenantId d0696494-6a7b-4c81-804d-87bda2d47279
pipeline.tenantName icav2-entprod
pipeline.timeCreated 2022-03-10T13:13:05Z
pipeline.timeModified 2022-03-10T13:13:05Z
reference tut-test-cli-tutorial-eda7ee7a-8c65-4c0f-bed4-f6c2d21119e6
startDate 2022-03-10T20:42:42Z
status SUCCEEDED
summary
tenantId d0696494-6a7b-4c81-804d-87bda2d47279
tenantName icav2-entprod
timeCreated 2022-03-10T20:42:42Z
timeModified 2022-03-10T21:00:33Z
userReference tut-test
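
In the two outputs above, the status field moves from REQUESTED to SUCCEEDED. When checking a run from a script, that one field can be extracted from the output. The sketch below assumes the "key value" text layout shown above; the function name analysis_status is made up, and the sample text is an excerpt of the output in this tutorial.

```shell
#!/usr/bin/env bash
# analysis_status is a hypothetical helper, not an icav2 command.
# It reads "key value" lines on stdin and prints the value of the
# top-level "status" key. Keys such as "pipeline.code" do not match
# because the whole first field must equal "status".
analysis_status() {
  awk '$1 == "status" {print $2}'
}

# Excerpt of the projectanalyses get output shown in this tutorial.
sample_output='id 461d3924-52a8-45ef-ab62-8b2a29621021
pipeline.code cli-tutorial
startDate 2022-03-10T20:42:42Z
status SUCCEEDED
summary
userReference tut-test'

printf '%s\n' "$sample_output" | analysis_status

# Against a live analysis, the helper could be fed from the CLI:
#   icav2 projectanalyses get 461d3924-52a8-45ef-ab62-8b2a29621021 | analysis_status
```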
Pipelines can also be run using the JSON input type. The following is an example.
% icav2 projectpipelines start cwl cli-tutorial --data-id fil.c23246bd7692499724fe08da020b1014 --input-json '{
"ipFQ": {
"class": "File",
"path": "test.fastq"
}
}' --type-input JSON --user-reference tut-test-json
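
Quoting a multi-line JSON document inline is easy to get wrong, so it can help to build the payload in a variable first. The sketch below reproduces the JSON from the example above for a given file name. The function name make_input_json is made up, and the helper does no JSON escaping, so it assumes a file name without quotes or backslashes.

```shell
#!/usr/bin/env bash
# make_input_json is a hypothetical helper, not an icav2 command.
# It prints the JSON input document for the ipFQ workflow input.
# Assumption: the file name contains no characters that need JSON escaping.
make_input_json() {
  printf '{\n  "ipFQ": {\n    "class": "File",\n    "path": "%s"\n  }\n}\n' "$1"
}

input_json="$(make_input_json test.fastq)"
printf '%s\n' "$input_json"

# The variable can then replace the inline JSON in the command above:
#   icav2 projectpipelines start cwl cli-tutorial \
#     --data-id fil.c23246bd7692499724fe08da020b1014 \
#     --input-json "$input_json" --type-input JSON --user-reference tut-test-json
```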