Creating a Pipeline from Scratch


Last updated 4 days ago


Introduction

This tutorial shows you how to create a new pipeline from scratch in a Bench workspace and deploy it to ICA Flow.


Preparation

Start a Bench workspace:

  • For this tutorial, any instance size will work, even the smallest standard-small.

  • Select the single-user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines.

  • A small amount of disk space (10 GB) is enough.

We are going to wrap the "gzip" Linux compression tool with two inputs:

  • a file to compress

  • a compression level: an integer between 1 and 9

We intentionally omit sanity checks to keep this scenario simple.

Create a test file:

mkdir demo_gzip
cd demo_gzip
echo test > test_input.txt
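Before wrapping anything, it can help to confirm that the exact command we intend to automate behaves as expected. A quick shell round-trip (level 5 is the value the test profile will use later):

```shell
# Compress the test file at level 5, then decompress to verify the round-trip
echo test > test_input.txt
gzip -c -5 test_input.txt > test_input.gz
gunzip -c test_input.gz    # prints the original content: test
```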

Wrapping in Nextflow

Here is an example of Nextflow code that wraps the gzip command and publishes the final output in the “out” folder:

mkdir nextflow-src
# Create nextflow-src/main.nf using contents below
vi nextflow-src/main.nf

nextflow-src/main.nf

nextflow.enable.dsl=2

process COMPRESS {
  publishDir 'out', mode: 'symlink' // directives belong here, before input/output

  input:
    path input_file
    val compression_level

  output:
    path "${input_file.simpleName}.gz" // .simpleName keeps just the base filename

  script:
    """
    gzip -c -${compression_level} ${input_file} > ${input_file.simpleName}.gz
    """
}

workflow {
    input_path = file(params.input_file)
    gzip_out = COMPRESS(input_path, params.compression_level)
}

Save this file as nextflow-src/main.nf, and check that it works:

nextflow run nextflow-src/ --input_file test_input.txt --compression_level 5

Result: the run completes and the compressed file is published as out/test_input.gz.

Wrap the Pipeline in Bench

We now need to:

  • Use Docker

  • Follow some nf-core best practices so that our source and tests are compatible with the pipeline-dev tools

Using Docker:

In Nextflow, Docker images can be specified at the process level:

  • Each process may use a different Docker image.

  • It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this; in ICA, a basic image will be used, with no guarantee that the necessary tools are available.

Specifying the Docker image is done with the container '<image_name:version>' directive, which can be placed:

  • at the start of each process definition, or

  • in the Nextflow config files (preferred when following nf-core guidelines).
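For the first option, the directive sits in the process body. A sketch of our COMPRESS process with a process-level image (the image tag is illustrative):

```groovy
// Sketch: specifying the image directly in the process definition
process COMPRESS {
  container 'ubuntu:latest'   // process-level directive; overrides any config-wide default

  input:
    path input_file
    val compression_level

  output:
    path "${input_file.simpleName}.gz"

  script:
    """
    gzip -c -${compression_level} ${input_file} > ${input_file.simpleName}.gz
    """
}
```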

For example, create nextflow-src/nextflow.config:

process.container = 'ubuntu:latest'

We can now run with Nextflow's -with-docker option:

nextflow run nextflow-src/ --input_file test_input.txt --compression_level 5 -with-docker

Create Nextflow “test” profile

Here is an example of a “test” profile that can be added to nextflow-src/nextflow.config to define input values appropriate for a validation run:

nextflow-src/nextflow.config

process.container = 'ubuntu:latest'
 
profiles {
  test {
    params {
      input_file = 'test_input.txt'
      compression_level = 5
    }
  }
}

With this profile defined, we can now run the same test as before with this command:

nextflow run nextflow-src/ -profile test -with-docker

Create Nextflow “docker” profile

A “docker” profile is also present in all nf-core pipelines. Our pipeline-dev tools will make use of it, so let’s define it:

nextflow-src/nextflow.config

process.container = 'ubuntu:latest'
 
profiles {
  test {
    params {
      input_file = 'test_input.txt'
      compression_level = 5
    }
  }
 
  docker {
    docker.enabled = true
  }
}

We can now run the same test as before with this command:

nextflow run nextflow-src/ -profile test,docker

We also have enough structure in place to start using the pipeline-dev command:

pipeline-dev run-in-bench

In order to deploy our pipeline to ICA, we need to generate the user interface input form.

This is done using nf-core's recommended nextflow_schema.json.

For our simple example, we write a minimal one by hand, using one of the nf-core pipelines as an example:

nextflow-src/nextflow_schema.json

{
    "$defs": {
        "input_output_options": {
            "title": "Input/output options",
            "properties": {
                "input_file": {
                    "description": "Input file to compress",
                    "help_text": "The file that will get compressed",
                    "type": "string",
                    "format": "file-path"
                },
                "compression_level": {
                    "type": "integer",
                    "description": "Compression level to use (1-9)",
                    "default": 5,
                    "minimum": 1,
                    "maximum": 9
                }
            }
        }
    }
}

In the next step, this gets converted to the ica-flow-config/inputForm.json file.

Manually building JSONSchema documents is not trivial and can be very error-prone. Instead, the nf-core pipelines schema build command collects your pipeline parameters and gives interactive prompts about any missing or unexpected params. If no existing schema is found, it will create one for you.

We recommend looking into "nf-core pipelines schema build -d nextflow-src/", which includes a web builder for adding descriptions and other details.
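Since the schema is hand-written here, a quick check that it at least parses as valid JSON can save a failed deployment. This uses only the Python standard library and assumes the file layout created above:

```shell
# Fail fast on JSON syntax errors before running pipeline-dev
python3 -m json.tool nextflow-src/nextflow_schema.json > /dev/null \
  && echo "nextflow_schema.json is valid JSON"
```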


Deploy as a Flow Pipeline

We just need to create one final file, which we had skipped until now: our project description file. It can be created via the command pipeline-dev project-info --init:

pipeline-dev.project-info

$ pipeline-dev project-info --init
 
pipeline-dev.project-info not found. Let's create it with 2 questions:
 
Please enter your project name: demo_gzip
Please enter a project description: Bench gzip demo

We can now run:

pipeline-dev deploy-as-flow-pipeline

After generating the ICA-Flow-specific files in the ica-flow-config folder (the JSON input specs for the Flow launch UI and the list of inputs for the next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed. In ICA Flow, pipeline versioning is done by including the version number in the pipeline name.

It then asks if we want to update the latest version or create a new one.

Choose "3" and enter a name of your choice to avoid conflicts with the other users following this same tutorial.

At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.


Run Validation Test in Flow

pipeline-dev launch-validation-in-flow

This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.

Some of the input files will have been copied to your ICA project in order for the analysis launch to work. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.

Result

/data/demo $ pipeline-dev launch-validation-in-flow

pipelineId: 331f209d-2a72-48cd-aa69-070142f57f73
Getting Analysis Storage Id
Launching as ICA Flow Analysis...
ICA Analysis created:
- Name: Test demo_gzip
- Id: 17106efc-7884-4121-a66d-b551a782b620
- Url: https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/17106efc-7884-4121-a66d-b551a782b620

To recap, this tutorial covered the following steps:

  • prepare the Linux tool and validation inputs

  • wrap it in Nextflow

  • wrap the pipeline in Bench

  • deploy the pipeline as an ICA Flow pipeline

  • launch the Flow validation test from Bench

For large pipelines, see the nf-core best practices website.