Nextflow Pipeline


In this tutorial, we will show how to create and launch a pipeline using the Nextflow language in ICA.

This tutorial references the Basic pipeline example in the Nextflow documentation.

Create the pipeline

The first step in creating a pipeline is to create a project. For instructions on creating a project, see the Projects page. In this tutorial, we'll use a project called "Getting Started".

After creating the project, select the project from the Projects view to enter the project. Within the project, navigate to the Flow > Pipelines view in the left navigation pane. From the Pipelines view, click +Create Pipeline > Nextflow > XML based to start creating the Nextflow pipeline.

In the Nextflow pipeline creation view, the Information tab is used to add information about the pipeline. Add values for the required Code (unique pipeline name) and Description fields.

Next we'll add the Nextflow pipeline definition. The pipeline we're creating is a modified version of the Basic pipeline example from the Nextflow documentation.

The description of the pipeline from the linked Nextflow docs:

This example shows a pipeline that is made of two processes. The first process receives a FASTA formatted file and splits it into file chunks whose names start with the prefix seq_.

The process that follows receives these files and simply reverses their content using the rev command line tool.

Modifications to the pipeline definition from the Nextflow documentation include:

  • Add the container directive to each process with an Ubuntu image. If no Docker image is specified, public.ecr.aws/lts/ubuntu:22.04_stable is used as the default.

  • Add the publishDir directive with value 'out' to the reverse process.

  • Modify the reverse process to write its output to a file test.txt instead of stdout.

Resources: For each process, you can use the memory and cpus directives to set the Compute Type. ICA will then determine the best matching compute type based on those settings. For example, if you set memory '10240 MB' and cpus 6, ICA will determine that you need the standard-large ICA Compute Type.

Syntax example:

process iwantstandardsmallresources {
    cpus 2
    memory '8 GB'
    ...
}

Navigate to the Nextflow files > main.nf tab to add the definition to the pipeline. Since this is a single-file pipeline, we won't need to add any additional definition files. Paste the following definition into the text editor:

#!/usr/bin/env nextflow

params.in = "$HOME/sample.fa"

sequences = file(params.in)
// csplit is the GNU coreutils tool; on macOS the GNU version is installed as gcsplit
SPLIT = (System.properties['os.name'] == 'Mac OS X' ? 'gcsplit' : 'csplit')

/*
 * Split the input FASTA file into chunks named seq_00, seq_01, ...
 */
process splitSequences {

    container 'public.ecr.aws/lts/ubuntu:22.04'

    input:
    file 'input.fa' from sequences

    output:
    file 'seq_*' into records

    """
    $SPLIT input.fa '%^>%' '/^>/' '{*}' -f seq_
    """

}

/*
 * Reverse the content of the chunks and publish the result to the 'out' folder
 */
process reverse {

    container 'public.ecr.aws/lts/ubuntu:22.04'
    publishDir 'out'

    input:
    file x from records

    output:
    file 'test.txt'

    """
    cat $x | rev > test.txt
    """
}
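If you have Nextflow installed locally, you can optionally sanity-check the definition before saving it in ICA. A minimal sketch, assuming a Nextflow release that still supports DSL1 syntax (DSL1 was removed in Nextflow 22.12) and an example FASTA path:

# Run the pipeline locally against a small FASTA file (example path)
nextflow run main.nf --in /path/to/sample.fa

# The reverse process publishes its result to the 'out' directory
cat out/test.txt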

Next we'll create the input form used when launching the pipeline. This is done through the XML Configuration tab. Since the pipeline takes a single FASTA file as input, the XML-based input form includes a single file input. The dataInput code ("in") corresponds to the params.in parameter used in the pipeline definition.

Paste the below XML input form into the XML CONFIGURATION text editor. Click the Generate button to preview the launch form fields.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:dataInputs>
        <pd:dataInput code="in" format="FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>fasta file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>

With the definition added and the input form defined, the pipeline is complete.

On the Documentation tab, you can fill out additional information about your pipeline. This information will be presented under the Documentation tab whenever a user starts a new analysis on the pipeline.

Click the Save button at the top right. The pipeline will now be visible from the Pipelines view within the project.

Launch the pipeline

Before we launch the pipeline, we'll need to upload a FASTA file to use as input. In this tutorial, we'll use a public FASTA file from the UCSC Genome Browser. Download the chr1_GL383518v1_alt.fa.gz file and unzip it to decompress the FASTA file.

To upload the FASTA file to the project, first navigate to the Data section in the left navigation pane. In the Data view, drag and drop the FASTA file from your local machine into the indicated section in the browser. Once the file upload completes, the file record will show in the Data explorer. Ensure that the format of the file is set to "FASTA".
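If you prefer the command line, the same upload can be done with the ICA CLI. A minimal sketch, assuming icav2 is installed and authenticated; the project and file names below match this tutorial but are placeholders for your own:

# Set the project context
icav2 projects enter "Getting Started"

# Upload the decompressed FASTA file to the project
icav2 projectdata upload chr1_GL383518v1_alt.fa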

Now that the input data is uploaded, we can proceed to launch the pipeline. Navigate to the Analyses view and click the Start Analysis button. Next, select your pipeline from the list. Alternatively, you can start your pipeline from Projects > your_project > Flow > Pipelines > Start new analysis.

In the Launch Pipeline view, the input form fields are presented along with some required information to create the analysis.

  • Enter a User Reference (identifier) for the analysis. This will be used to identify the analysis record after launching.

  • Set the Entitlement Bundle (there will typically only be a single option).

  • In the Input Files section, select the FASTA file (chr1_GL383518v1_alt.fa) as the single input file.

  • Set the Storage size to small. This will attach a 1.2 TB shared file system to the environment used to run the pipeline.

With the required information set, click the Start Analysis button.
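The launch can also be scripted with the ICA CLI. A minimal sketch with placeholder IDs; look up your own pipeline and file IDs first, as the exact values will differ in your project:

# Find the IDs of the pipeline and the uploaded FASTA file
icav2 projectpipelines list
icav2 projectdata list

# Start the Nextflow analysis; 'in' is the dataInput code from the XML input form
icav2 projectpipelines start nextflow <pipeline-id> \
    --user-reference my-first-nextflow-run \
    --input in:<file-id> \
    --storage-size small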

Monitor Analysis

After launching the pipeline, navigate to the Analyses view in the left navigation pane.

The analysis record will be visible from the Analyses view. The Status will transition through the analysis states as the pipeline progresses. It may take some time (depending on resource availability) for the environment to initialize and the analysis to move to the In Progress status.

Click the analysis record to enter the analysis details view.

Once the pipeline succeeds, the analysis record will show the "Succeeded" status. Note that a first analysis may take considerable time because compute resources must be provisioned (in our example, the analysis took 28 minutes).
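The analysis status can also be polled with the ICA CLI; a minimal sketch with a placeholder analysis ID:

# List analyses in the project and note the ID of the new record
icav2 projectanalyses list

# Show the details, including the current status, of a single analysis
icav2 projectanalyses get <analysis-id>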

From the analysis details view, the logs produced by each process within the Nextflow pipeline are accessible via the Logs tab.

View Results

Analysis outputs are written to an output folder in the project with the naming convention {Analysis User Reference}-{Pipeline Code}-{GUID}.

Inside the analysis output folder, the files produced by the analysis processes are written to the 'out' folder. In this tutorial, the file test.txt is written by the reverse process. Navigating into the analysis output folder, clicking into the test.txt file details, and opening the VIEW tab shows the output file contents.

The "Download" button (4) can be used to download the data to the local machine.

