CWL Graphical Pipeline

This tutorial guides you through creating CWL tools and pipelines from scratch. By following the steps and techniques presented here, you will gain the knowledge and skills needed to develop your own pipelines or transition existing ones to ICA.

Build your own Docker image and push it to ICA

The foundation for every tool in ICA is a Docker image (externally published or created by the user). Here we show how to create your own Docker image for the popular FastQC tool.

Copy the contents displayed below to a text editor and save them as fastqc-0.11.9.Dockerfile, the file name used by the build command later in this section. Make sure you use a plain-text editor that does not add formatting to the file.

FROM centos:7
WORKDIR /usr/local

# DEPENDENCIES
RUN yum -y install java-1.8.0-openjdk wget unzip perl && \
    yum clean all && \
    rm -rf /var/cache/yum

# INSTALLATION fastqc
RUN wget http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip --no-check-certificate && \
    unzip fastqc_v0.11.9.zip && \
    chmod a+rx /usr/local/FastQC/fastqc && rm -rf fastqc_v0.11.9.zip

# Adding FastQC to the PATH
ENV PATH=$PATH:/usr/local/FastQC

# DEFAULTS
ENV LANG=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
ENTRYPOINT []

## how to build and test the docker image
## docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:1 .
## docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:1

Open a terminal window, place this file in a dedicated folder, and navigate to that folder. Then run the following command:

docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:1 .

Check that the image has been built successfully:

docker images
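
Reading the full listing works, but the same check can be scripted. The sketch below relies on docker image inspect exiting with a non-zero status when the image is absent. The function name image_exists is hypothetical and is not part of Docker or ICA.

```shell
# Hypothetical helper: succeeds only if the given image reference exists locally.
image_exists() {
  # `docker image inspect` exits non-zero when the image is not found.
  docker image inspect "$1" >/dev/null 2>&1
}

# Example call (requires a running Docker daemon):
#   image_exists fastqc-0.11.9:1 && echo "image found" || echo "image missing"
```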

Check that the container is functional:

docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:1

Once inside the container, check that the fastqc command responds and prints the expected help message (for example, by running fastqc --help). Remember to exit the container afterwards.
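
If you prefer a check that does not require an interactive session, the following sketch runs fastqc directly as the container entrypoint and asks for its version. The function name fastqc_version is hypothetical. It assumes the image puts fastqc on the PATH, as the Dockerfile above does.

```shell
# Hypothetical helper: run fastqc inside the image without opening a shell.
# Overriding the entrypoint mirrors the interactive command above, but the
# container exits as soon as fastqc has printed its version.
fastqc_version() {
  docker run --rm --entrypoint fastqc "$1" --version
}

# Example call (requires a running Docker daemon):
#   fastqc_version fastqc-0.11.9:1
```

A version string in the output suggests the tool and its Java dependency are installed correctly.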

Save a tar of the previously built image locally:

docker save fastqc-0.11.9:1 -o fastqc-0.11.9_1.tar
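
Before uploading, it can be worth confirming that the file really is an image archive. Archives written by docker save contain a top-level manifest.json, so the sketch below simply looks for that entry. The function name looks_like_docker_archive is hypothetical, and the check is a quick sanity test rather than a full validation.

```shell
# Hypothetical helper: succeeds if the tar archive lists a top-level
# manifest.json, which archives written by `docker save` contain.
looks_like_docker_archive() {
  tar -tf "$1" 2>/dev/null | grep -qx 'manifest.json'
}

# Example call, passing the path of the archive you just saved:
#   looks_like_docker_archive /path/to/image.tar && echo "archive looks valid"
```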

Upload your Docker image .tar file to an ICA project (browser upload, Connector, or CLI).

Now leave the Project and go to System Settings > Docker Repository, then select Create > Image. Select your Docker image file, fill out a name and version, set the type to tool, and press Select.

Create a CWL tool

While outside of any Project, go to System Settings > Tool Repository and select +Create. Fill in the mandatory fields (Name and Version) and look for a Docker image to link to the tool.

Tool creation in ICA adheres to the CWL standard.

You can create a tool either by pasting the tool definition into the code syntax field on the right or by using the different tabs to manually define inputs, outputs, arguments, settings, etc.

In this tutorial we will use the CWL tool syntax method. Paste the following content in the General tab.

Other tabs, except for the Details tab, can also be used.

#!/usr/bin/env cwl-runner

# (Re)generated by BlueBee Platform

$namespaces:
  ilmn-tes: http://platform.illumina.com/rdf/iap/
cwlVersion: cwl:v1.0
class: CommandLineTool
label: FastQC
doc: FastQC aims to provide a simple way to do some quality control checks on raw
  sequence data coming from high throughput sequencing pipelines.
inputs:
  Fastq1:
    type: File
    inputBinding:
      position: 1
  Fastq2:
    type:
    - File
    - 'null'
    inputBinding:
      position: 3
outputs:
  HTML:
    type:
      type: array
      items: File
    outputBinding:
      glob:
      - '*.html'
  Zip:
    type:
      type: array
      items: File
    outputBinding:
      glob:
      - '*.zip'
arguments:
- position: 4
  prefix: -o
  valueFrom: $(runtime.outdir)
- position: 1
  prefix: -t
  valueFrom: '2'
baseCommand:
- fastqc

FastQC needs to be told where to write its results (-o prefix), so the tool definition passes the $(runtime.outdir) runtime parameter to point it at the designated output folder.
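
To make the effect of the bindings concrete, the sketch below prints the command line this tool definition is expected to produce. It assumes the standard CWL ordering rules: entries are sorted by position, and at equal position an arguments entry precedes an input. The function name build_fastqc_cmd and the example paths are hypothetical. The real command is assembled by the CWL runner, not by this script.

```shell
# Hypothetical illustration of how the bindings map to a command line.
# Arguments: output directory, first FASTQ, optional second FASTQ.
build_fastqc_cmd() {
  outdir=$1
  fastq1=$2
  fastq2=${3:-}
  # Position 1: the "-t 2" argument, then the Fastq1 input.
  cmd="fastqc -t 2 $fastq1"
  # Position 3: Fastq2 is optional and is omitted when not provided.
  if [ -n "$fastq2" ]; then
    cmd="$cmd $fastq2"
  fi
  # Position 4: the output folder, filled from $(runtime.outdir).
  printf '%s\n' "$cmd -o $outdir"
}

# Example call with made-up paths:
#   build_fastqc_cmd /work/out sample_R1.fastq sample_R2.fastq
```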

Create the pipeline

Navigate to Projects > your_project > Flow > Pipelines > +Create > CWL Graphical.

Fill the mandatory fields and click on the Definition tab to open the Graphical Editor.

Expand the Tool Repository menu (lower right) and drag your FastQC tool into the Editor field (center).

Now drag one Input and one Output file icon (on top) into the Editor field as well. Both may be given a Name (editable fields on the right when icon is selected) and need a Format attribute. Set the Input Format to fastq and Output Format to html. Connect both Input and Output files to the matching nodes on the tool itself (mouse over the node, then hold-click and drag to connect).

Press Save. You just created your first FastQC pipeline on ICA!


Run a pipeline

First make sure you have at least one Fastq file uploaded and/or linked to your Project. You may use Fastq files available in the Bundle.
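
If you have no FASTQ file at hand, a tiny synthetic one is enough to exercise the upload and file selection steps. The sketch below writes a single-read file in the four-line FASTQ layout (identifier, bases, separator, one quality character per base). The function name write_test_fastq, the read name, and the sequence are all made up for illustration, and a one-read file will not give a meaningful quality report.

```shell
# Hypothetical helper: write a one-read FASTQ file for smoke testing.
# Each record has four lines: @identifier, bases, '+', and one quality
# character per base.
write_test_fastq() {
  bases='GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT'
  # Build a quality string of 'I' characters with the same length as bases.
  qual=$(printf '%s' "$bases" | tr 'ACGT' 'IIII')
  printf '@test_read_1\n%s\n+\n%s\n' "$bases" "$qual" > "$1"
}

# Example call:
#   write_test_fastq test_sample.fastq
```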

Navigate to Pipelines and select the pipeline you just created, then press Start analysis.

Fill in the mandatory fields and click on the + button to open the File Selection dialog box. Select one of the Fastq files available to you.

Press Start analysis on the top right. The platform now orchestrates the pipeline execution.

View Results

Navigate to Projects > your_project > Flow > Analyses and observe that the pipeline execution is now listed and will first be in “Requested” Status. After a few minutes the Status should change to “In Progress” and then to “Succeeded”.

Once this Analysis succeeds, click it to enter the Analysis details view. You should see the FastQC HTML output file listed on the right under Output files. Click on the file to open the Data Details view. Because the file Format is HTML, there is a View tab that lets you visualize the HTML within the browser.
