CWL CLI Pipeline Execution

In this tutorial, we demonstrate how to create and launch a CWL pipeline using the ICA command line interface (CLI).

Installation

Please refer to these instructions for installing ICA CLI.

Tutorial project

In this project, we will create two simple tools and build a pipeline that we can run on ICA using the CLI. The first tool (tool-fqTOfa.cwl) converts a FASTQ file to a FASTA file. The second tool (tool-countLines.cwl) counts the number of lines in an input FASTA file. The workflow (workflow.cwl) combines the two tools: it converts an input FASTQ file to a FASTA file and then counts the number of lines in the resulting FASTA file.

The two CWL tools and the workflow used in this project are listed below. If you are new to CWL, please refer to the CWL user guide for a better understanding of CWL code. You will also need cwltool installed to create these tools and processes. You can find installation instructions on the CWL GitHub page.

tool-fqTOfa.cwl

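#!/usr/bin/env cwltool

cwlVersion: v1.0
class: CommandLineTool
inputs:
  # The FASTQ file is passed to awk after the program text.
  inputFastq:
    type: File
    inputBinding:
      position: 1
# Capture awk's standard output in test.fasta.
stdout: test.fasta
outputs:
  outputFasta:
    type: File
    streamable: true
    outputBinding:
      glob: test.fasta

arguments:
# Line 1 of each 4-line FASTQ record: replace the leading "@" with ">".
# Line 2: print the sequence unchanged. Lines 3 and 4 are dropped.
- 'NR%4 == 1 {print ">" substr($0, 2)}NR%4 == 2 {print}'
baseCommand:
- awk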
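
To try the tool locally before uploading it, cwltool accepts an input object (job file) alongside the tool description. The following is a minimal sketch, assuming a FASTQ file named test.fastq sits in the working directory and that the job file is saved as job-fqTOfa.yml (a hypothetical name).

```yaml
# job-fqTOfa.yml (hypothetical file name)
# The key must match the input id declared in tool-fqTOfa.cwl.
inputFastq:
  class: File
  path: test.fastq   # assumed location of the sample FASTQ file
```

Running cwltool tool-fqTOfa.cwl job-fqTOfa.yml should then write test.fasta to the current directory.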

tool-countLines.cwl
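
A line-counting tool of this kind could look like the sketch below. It assumes wc -l does the counting; the input id inputFasta, the output id outputCount, and the file name lineCount.tsv are illustrative choices rather than names confirmed by this tutorial.

```yaml
#!/usr/bin/env cwltool

cwlVersion: v1.0
class: CommandLineTool
# wc -l prints the number of lines followed by the file name.
baseCommand: [wc, -l]
inputs:
  inputFasta:            # illustrative input id
    type: File
    inputBinding:
      position: 1
# Capture the count written to standard output.
stdout: lineCount.tsv    # illustrative file name
outputs:
  outputCount:           # illustrative output id
    type: File
    streamable: true
    outputBinding:
      glob: lineCount.tsv
```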

workflow.cwl
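
A workflow that chains the two tools could be sketched as follows. The ids inputFastq and outputFasta come from tool-fqTOfa.cwl; the step names, the workflow-level ids, and the second tool's inputFasta and outputCount ids are assumptions made for illustration.

```yaml
cwlVersion: v1.0
class: Workflow
inputs:
  inputFastq: File          # hypothetical workflow-level input id
outputs:
  fastaFile:                # hypothetical output id
    type: File
    outputSource: convert/outputFasta
  lineCount:                # hypothetical output id
    type: File
    outputSource: count/outputCount
steps:
  # Step 1: convert the FASTQ input to FASTA.
  convert:
    run: tool-fqTOfa.cwl
    in:
      inputFastq: inputFastq
    out: [outputFasta]
  # Step 2: count the lines of the FASTA produced by step 1.
  count:
    run: tool-countLines.cwl
    in:
      inputFasta: convert/outputFasta   # assumed input id of the second tool
    out: [outputCount]                  # assumed output id of the second tool
```

Connecting convert/outputFasta to the second step's input is what makes the CWL runner execute the steps in order.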

If you want to use a different public image, specify it in a DockerRequirement under the requirements key of the CWL file. For example, to use ubuntu:latest, set the dockerPull key to ubuntu:latest.
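
In standard CWL, such a requirement could look like the fragment below. This is a minimal sketch, assuming it is placed at the top level of a CommandLineTool such as tool-fqTOfa.cwl.

```yaml
requirements:
  - class: DockerRequirement
    # Public image name and tag, pulled when the tool runs.
    dockerPull: ubuntu:latest
```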

If you want to use a Docker image from the ICA Docker repository, you need its AWS ECR link from the ICA GUI. Double-click the image name in the Docker repository and copy the URL to the clipboard. Then set the URL as the value of the dockerPull key.

To add a custom or public Docker image to the ICA repository, refer to the Docker Repository.

Authentication

Before you can use the ICA CLI, you need to authenticate using an Illumina API key. Follow these instructions to authenticate.

Enter/Create a Project

Either create a project or use an existing project to create a new pipeline. You can create a new project using the icav2 projects create command.

If you do not provide the --region flag, the value defaults to the existing region when there is only one region available. When there is more than one region available, a selection must be made from the available regions at the command prompt. The region input can be determined by calling the icav2 regions list command first.

You can select the project to work on by entering it with the icav2 projects enter command. Once you have entered a project, you no longer need to specify it as an argument in subsequent commands.

You can also use the icav2 projects list command to determine the names and IDs of the projects you have access to.

Create a pipeline on ICA

projectpipelines is the root command to perform actions on pipelines in a project. The create command creates a pipeline in the current project.

The parameter file specifies the pipeline inputs, along with any additional parameter settings for each step in the pipeline. In this tutorial, the input is a FASTQ file, declared inside the <dataInput> tag of the parameter file. There are no step-specific settings, so the <steps> tag is empty. Create a parameter file (parameters.xml) with the following content using a text editor.

The following command creates a pipeline called "cli-tutorial" from the workflow "workflow.cwl", the tools "tool-fqTOfa.cwl" and "tool-countLines.cwl", and the parameter file "parameters.xml", using the small storage size.

Once the pipeline is created, you can view it using the list command.

Running the pipeline

Upload data to the project using the icav2 projectdata upload command. Refer to the Data page for advanced data upload features. For this test, we will use a small FASTQ file, test.fastq, containing the following reads.

The icav2 projectdata upload command uploads the file to ICA.

The list command lets you view the uploaded file. Note the ID of the file you want to use with the pipeline.

The icav2 projectpipelines start command initiates the pipeline run. The following command runs the pipeline. Note the ID of the resulting analysis so that you can inspect it later.

If your create command fails and needs to be rerun, you might get a ConstraintViolationException error. If so, retry the command with a different name.

You can check the status of the run using the icav2 projectanalyses get command.

Pipelines can also be run using the JSON input type. The following is an example of running a pipeline with JSON input. Note that JSON input works only with file-based CWL pipelines (built from code, not with the graphical editor in ICA).

Notes

runtime.ram and runtime.cpu

By default, runtime.ram and runtime.cpu (named runtime.cores in the CWL specification) are evaluated against the compute environment running the host CWL runner. CommandLineTool steps within a CWL pipeline run on different compute environments than the host CWL runner, so the values of runtime.ram and runtime.cpu seen inside a CommandLineTool will not match the environment the tool actually runs in. To override these values, specify coresMin and ramMin in the ResourceRequirement for the CommandLineTool.
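
Such an override could be sketched as below, assuming a tool that needs 2 cores and 4 GiB of memory (illustrative values). The field names follow the CWL specification, which calls the core count coresMin and exposes it to expressions as runtime.cores; ramMin is given in mebibytes.

```yaml
cwlVersion: v1.0
class: CommandLineTool
requirements:
  - class: ResourceRequirement
    coresMin: 2      # illustrative value
    ramMin: 4096     # mebibytes (4 GiB), illustrative value
baseCommand: echo
arguments:
  # Evaluated against the resources reserved for this tool.
  - $(runtime.cores)
  - $(runtime.ram)
inputs: []
outputs: []
```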
