CWL Graphical Pipeline

This tutorial walks you through creating CWL tools and pipelines from scratch. By following the steps presented here, you will learn how to develop your own pipelines or migrate existing ones to ICA.

Build your own Docker image and push it to ICA

The foundation for every tool in ICA is a Docker image, either externally published or created by the user. This section shows how to create your own Docker image for a popular tool, FastQC. The starting point is a Dockerfile:

FROM centos:7
WORKDIR /usr/local

# DEPENDENCIES
RUN yum -y install java-1.8.0-openjdk wget unzip perl && \
    yum clean all && \
    rm -rf /var/cache/yum

# INSTALLATION fastqc
RUN wget http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip --no-check-certificate && \
    unzip fastqc_v0.11.9.zip && \
    chmod a+rx /usr/local/FastQC/fastqc && rm -rf fastqc_v0.11.9.zip

# Adding FastQC to the PATH
ENV PATH=$PATH:/usr/local/FastQC

# DEFAULTS
ENV LANG=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
ENTRYPOINT []

## How to build and test the Docker image:
## docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:1 .
## docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:1

Save this file as fastqc-0.11.9.Dockerfile in a dedicated folder, open a terminal window, and navigate to that folder. Then run the following command:

docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:1 .

Check that the image was built successfully:

docker images

Check that the container is functional:

docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:1

Once inside the container, check that the fastqc command responds and prints the expected help message. Remember to exit the container afterwards.
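The same check can also be done without opening an interactive shell. The following is a minimal sketch, assuming the image was tagged fastqc-0.11.9:1 as in the build command above; the IMAGE variable is introduced here only for illustration.

```shell
# Image tag from the build step; adjust if you used a different tag.
IMAGE="fastqc-0.11.9:1"

# Only call docker if it is installed and the image exists locally.
if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
  # ENTRYPOINT is empty in the Dockerfile, so the command is passed directly.
  docker run --rm "$IMAGE" fastqc --version
  docker run --rm "$IMAGE" fastqc --help
else
  echo "Image $IMAGE not found locally (or docker is not installed); build it first." >&2
fi
```

Because the container is started with --rm and no shell is opened, there is nothing to exit or clean up afterwards.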

Save a tar of the previously built image locally:

docker save fastqc-0.11.9:1 -o fastqc-0.11.9_1.tar
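Before uploading, it can be worth confirming that the archive is a readable image tar. The sketch below is one way to do this, under the assumption that the archive name is derived from the image tag with the colon replaced by an underscore (GNU tar treats a colon in an archive name as a remote host, and docker save writes an uncompressed tar). The IMAGE and TAR_FILE variables are illustrative names, not something ICA requires.

```shell
IMAGE="fastqc-0.11.9:1"
# Derive a colon-free file name, e.g. fastqc-0.11.9_1.tar
TAR_FILE="$(printf '%s' "$IMAGE" | tr ':' '_').tar"

if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
  docker save -o "$TAR_FILE" "$IMAGE"
  # An image archive written by docker save contains a manifest.json entry.
  if tar -tf "$TAR_FILE" | grep -q 'manifest.json'; then
    echo "Archive $TAR_FILE looks valid"
  else
    echo "Archive $TAR_FILE has no manifest.json" >&2
  fi
fi

echo "$TAR_FILE"
```

If you saved the image under a different file name, set TAR_FILE to that name instead of deriving it.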

Upload your Docker image .tar file to an ICA project (browser upload, Connector, or CLI). Important: in the Data tab, select the uploaded .tar file, then click “Manage --> Change Format”, select 'DOCKER', and click Save.

Now step outside of the Project and go to Docker Repository. Select New and click on the Search icon. You can filter on Project names and locations. Select your Docker file (use the checkbox on the left) and press Select.

Create a CWL tool

While outside of any Project, go to Tool Repository and select New Tool. Fill in the mandatory fields (Name and Version) and click on the Search icon to look for a Docker image to link to the tool. You must double-click on the image row to confirm the selection. Tool creation in ICA adheres to the CWL standard.

There are two ways to create a CWL tool on top of a Docker image in the ICA UI:

1: Navigate to the Tool CWL tab and use the Text Editor to create the tool definition in CWL syntax.
2: Use the other tabs to independently define inputs, outputs, arguments, settings, etc.

This tutorial uses the first option. Paste the following content into the Tool CWL tab:

#!/usr/bin/env cwl-runner

# (Re)generated by BlueBee Platform

$namespaces:
  ilmn-tes: http://platform.illumina.com/rdf/iap/
cwlVersion: cwl:v1.0
class: CommandLineTool
label: FastQC
doc: FastQC aims to provide a simple way to do some quality control checks on raw
  sequence data coming from high throughput sequencing pipelines.
inputs:
  Fastq1:
    type: File
    inputBinding:
      position: 1
  Fastq2:
    type:
    - File
    - 'null'
    inputBinding:
      position: 3
outputs:
  HTML:
    type:
      type: array
      items: File
    outputBinding:
      glob:
      - '*.html'
  Zip:
    type:
      type: array
      items: File
    outputBinding:
      glob:
      - '*.zip'
arguments:
- position: 4
  prefix: -o
  valueFrom: $(runtime.outdir)
- position: 1
  prefix: -t
  valueFrom: '2'
baseCommand:
- fastqc

Note that the output folder for the FastQC application must be specified (-o prefix). The tool definition therefore uses the $(runtime.outdir) runtime parameter to point to the designated output directory, which is also where the '*.html' and '*.zip' globs look for results.
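To make the effect of the position values concrete, the command line that this tool definition is expected to assemble can be sketched as below. This is an illustration, not ICA output: build_fastqc_cmd is a hypothetical helper, the file names are placeholders (a real run uses staged absolute paths), and the ordering assumes that an argument sharing a position with an input (here -t and Fastq1, both at position 1) is placed ahead of that input.

```shell
# Hypothetical helper: prints the command line the tool definition is
# expected to assemble. Not part of ICA or CWL; for illustration only.
build_fastqc_cmd() {
  fastq1="$1"        # required input (Fastq1, position 1)
  fastq2="${2:-}"    # optional input (Fastq2, position 3); empty to skip
  outdir="${3:-.}"   # stands in for $(runtime.outdir)

  cmd="fastqc -t 2 $fastq1"              # baseCommand, then -t 2, then Fastq1
  if [ -n "$fastq2" ]; then
    cmd="$cmd $fastq2"                   # Fastq2 is only added when provided
  fi
  printf '%s -o %s\n' "$cmd" "$outdir"   # -o <outdir> comes last (position 4)
}

build_fastqc_cmd sample_R1.fastq.gz sample_R2.fastq.gz /data/out
# prints: fastqc -t 2 sample_R1.fastq.gz sample_R2.fastq.gz -o /data/out
```

When Fastq2 is left empty, the optional input is simply omitted and -o follows Fastq1 directly.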

Create the pipeline

While inside a Project, navigate to Pipelines, click on CWL, and then on Graphical.

Fill in the mandatory fields (Code, which is the pipeline name, and a free-text Description), then click on the Definition tab to open the Graphical Editor.

Expand the Tool Repository menu (lower right) and drag your FastQC tool into the Editor field (center).

Now drag one Input and one Output file icon (on top) into the Editor field as well. Both can be given a Name (editable fields on the right when the icon is selected) and both need a Format attribute. Set the Input Format to fastq and the Output Format to html. Connect the Input and Output files to the matching nodes on the tool itself (mouse over the node, then click, hold, and drag to connect).

Press Save. You have just created your first FastQC pipeline on ICA!

Run a pipeline

First, make sure you have at least one Fastq file uploaded and/or linked to your Project. You may use the Fastq files available in the Bundle.

Navigate to Pipelines and select the pipeline you just created, then press Start New Run.

Fill in the mandatory field (User Reference, which is the pipeline execution name) and click on the Select button to open the File Selection dialog box. Select any of the Fastq files available to you (use the checkbox on the left and press Select on the lower right).

Press Start Run on the top right. The platform now orchestrates the workflow execution.

View Results

Navigate to Runs. The pipeline execution is now listed, initially with the Status “Requested”. After a few minutes the Status should change to “In Progress” and then to “Succeeded”.

Once the Run has succeeded, click on its row (a single click is enough) to enter the Result view. You should see the FastQC HTML output file listed on the right. Click on the file to open the Data Details view. Because the file Format is HTML, a View tab lets you display the HTML directly in the browser.
