
Pipeline Development in Bench (Experimental)

Introduction

The Pipeline Development Kit in Bench makes it easy to create Nextflow pipelines for ICA Flow. This kit consists of a number of development tools which are installed in /data/.software (regardless of which Bench image is selected) and provides the following features:

Import to Bench
- From public nf-core pipelines
- From existing ICA Flow Nextflow pipelines
Run in Bench
Modify and re-run in Bench, providing fast development iterations
Deploy to Flow
Launch validation in Flow

Prerequisites

Recommended workspace size: nf-core Nextflow pipelines typically require 4 or more cores to run.
The pipeline development tools require:
- Conda, which is automatically installed by “pipeline-dev” if conda-miniconda.installer.ica-userspace.sh is present in the image.
- Nextflow (version 24.10.2 is automatically installed using conda, or you can use other versions)
- git (automatically installed using conda)
- jq, curl (which should be made available in the image)
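To see at a glance which of these tools are already available in a workspace, a check along the following lines can be run from a terminal. This is a minimal sketch, not part of pipeline-dev (which performs its own checks and offers to install missing tools); the function name check_prereqs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of pipeline-dev: report which of the
# command-line tools used by the pipeline development kit are on the PATH.
check_prereqs() {
  local tool missing=0
  # With no arguments, check the tools listed in the prerequisites.
  if [ "$#" -eq 0 ]; then
    set -- conda nextflow git jq curl
  fi
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # The exit status is the number of missing tools (0 = all present).
  return "$missing"
}

# Example: check_prereqs                 # default list
#          check_prereqs docker qstat    # any other tools
```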

NextFlow Requirements / Best Practices

Pipeline development tools work best when the following items are defined:

Nextflow profiles:
- test profile, specifying inputs appropriate for a validation run
- docker profile, instructing NextFlow to use Docker
nextflow_schema.json, as described here. It is used to generate the launch UI. The nf-core CLI tool (installable via pip install nf-core) offers extensive help to create and maintain this schema.

ICA Flow adds one additional constraint: the output directory out is the only one automatically copied to the project data when an ICA Flow analysis completes. The --outdir parameter recommended by nf-core should therefore be set to --outdir=out when running as a Flow pipeline.

Pipeline Development Tools

New Bench pipeline development tools only become active after a workspace reboot.

These tools are installed in /data/.software (which should be in your $PATH). The pipeline-dev script is the front-end to the other pipeline-dev-* tools.

Pipeline-dev fulfils a number of roles:

Checks that the environment contains the required tools (conda, nextflow, etc) and offers to install them if needed.
Checks that the fast data mounts are present (/data/mounts/project etc.) – it is useful to check regularly, as they get unmounted when a workspace is stopped and restarted.
Redirects stdout and stderr to .pipeline-dev.log, with the history of log files kept as .pipeline-dev.log.<log date>.
Launches the appropriate sub-tool.
Prints out errors with backtrace, to help report issues.
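Two of the housekeeping roles above (checking the data mounts and keeping a history of log files) can be approximated in a few lines of shell. The sketch below is an assumption about how such checks might work, not the actual pipeline-dev code; the function names, the use of mountpoint when it is installed, and the date format of the rotated log are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the pipeline-dev implementation.

# Return 0 if the given fast data mount looks present. Uses mountpoint(1)
# when available; otherwise just checks that the directory is non-empty.
check_mount() {
  local dir="${1:-/data/mounts/project}"
  if command -v mountpoint >/dev/null 2>&1; then
    mountpoint -q "$dir" && return 0
  elif [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    return 0
  fi
  echo "warning: $dir does not look mounted (was the workspace restarted?)" >&2
  return 1
}

# Move an existing log aside as <log>.<date> so that each run starts a
# fresh log while the logs of earlier runs remain available.
rotate_log() {
  local log="${1:-.pipeline-dev.log}"
  if [ -f "$log" ]; then
    mv "$log" "$log.$(date +%Y%m%d-%H%M%S)"
  fi
}

# Typical use at the start of a wrapper script:
#   check_mount /data/mounts/project
#   rotate_log .pipeline-dev.log
#   exec >>.pipeline-dev.log 2>&1   # send stdout and stderr to the log
```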

Usage

1) Starting a new Project

A pipeline-dev project relies on the following folder structure, which is auto-generated when using the pipeline-dev import* tools.

If you start a project manually, you must follow the same folder structure.

Project base folder
- nextflow-src: Platform-agnostic Nextflow code, for example the github contents of an nf-core pipeline, or your usual nextflow source code.
  - main.nf
  - nextflow.config
  - nextflow_schema.json
- pipeline-dev.project-info: contains project name, description, etc.
- nextflow-bench.config (automatically generated when needed): contains definitions for bench.
- ica-flow-config: Directory of files used when deploying pipeline to Flow.
  - inputForm.json (if not present, gets generated from nextflow-src/nextflow_schema.json): input form as defined in ICA Flow.
  - onSubmit.js, onRender.js (optional, generated at the same time as inputForm.json): javascript code to go with the input form.
  - launchPayload_inputFormValues.json (if not present, gets generated from the test profile): used by “pipeline-dev launch-validation-in-flow”.
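When a project is set up by hand, a quick way to see which of the files above are still missing is a check such as the following. It is a hypothetical helper, not a pipeline-dev command; it only looks for the files you provide yourself and skips nextflow-bench.config and the ica-flow-config files, because pipeline-dev generates those when they are absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of pipeline-dev: list the hand-maintained
# files of a pipeline-dev project that are missing from a base folder.
check_project_layout() {
  local base="${1:-.}" f missing=0
  for f in nextflow-src/main.nf \
           nextflow-src/nextflow.config \
           nextflow-src/nextflow_schema.json \
           pipeline-dev.project-info; do
    if [ ! -f "$base/$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  # nextflow-bench.config and ica-flow-config/* are generated when needed,
  # so they are deliberately not checked.
  return "$missing"
}

# Example:
#   mkdir -p my-pipeline/nextflow-src
#   check_project_layout my-pipeline
```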

Pipeline Sources

When starting from scratch, the above-mentioned project structure must be created manually. The nf-core CLI tools can help generate the nextflow_schema.json. The tutorial Creating a Pipeline from Scratch goes into more detail about this use case.

A directory with the same name as the nextflow/nf-core pipeline is created, and the Nextflow files are pulled into the nextflow-src subdirectory.

The tutorial nf-core Pipelines goes into more detail about this use case.

A directory called imported-flow-analysis is created and the analysis+pipeline assets are downloaded.

The tutorial Updating an Existing Flow Pipeline goes into more detail about this use case.

Currently only pipelines with publicly available Docker images are supported. Pipelines with ICA-stored images are not yet supported.

2) Running in Bench

Optional parameters --local / --sge can be added to force the execution on the local workspace node, or on the workspace cluster (when available). Otherwise, the presence of a cluster is automatically detected and used.

The script then prints the full nextflow command line and launches it.

In case of errors, full logs are saved as .pipeline-dev.log.

Currently, not all corner cases are covered by command line options. Please start from the nextflow command printed by the tool and extend it based on your specific needs.
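The choice between --local, --sge and automatic detection can be illustrated with a short shell function. This is a sketch under an assumption: it treats an SGE cluster as available when qsub is on the PATH, which may not be how pipeline-dev detects a Bench cluster. The function name pick_executor is hypothetical; local and sge are the standard Nextflow executor names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: choose a Nextflow executor the way the run-in-bench
# options are described. Assumption: a cluster counts as available when
# the SGE submission command qsub is on the PATH.
pick_executor() {
  case "${1:-}" in
    --local) echo local; return 0 ;;
    --sge)   echo sge;   return 0 ;;
    "")      ;;  # no option: fall through to auto-detection
    *)       echo "unknown option: $1" >&2; return 1 ;;
  esac
  if command -v qsub >/dev/null 2>&1; then
    echo sge
  else
    echo local
  fi
}

# Example: pass the choice to Nextflow through an extra config file.
#   printf "process.executor = '%s'\n" "$(pick_executor)" > executor.config
#   nextflow run nextflow-src/ -profile test,docker -c executor.config
```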

Output Example

Container (Docker) images

Nextflow can run processes with and without Docker images. In the context of pipeline development, the pipeline-dev tools assume that Docker images are used, in particular when running Nextflow with -profile docker.

In Nextflow, Docker images can be specified at the process level.

This is done with the container "<image_name:version>" directive, which can be specified
- in Nextflow config files (preferred method when following the nf-core best practices)
- or at the start of each process definition.
Each process can use a different Docker image.
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.

Resources such as the number of CPUs and memory can be specified as described here. See Containers or our tutorials for details about Nextflow-Docker syntax.

Bench can push/pull/create/modify Docker images, as described in Containers.

3) Deploying to ICA Flow

This command does the following:

Generate the JSON file describing the ICA Flow user interface.
- If ica-flow-config/inputForm.json doesn’t exist: generate it from nextflow-src/nextflow_schema.json.
Generate the JSON file containing the validation launch inputs.
- If ica-flow-config/launchPayload_inputFormValues.json doesn’t exist: generate it from the inputs of nextflow -profile test.
- If local files are used as validation inputs or as default input values:
  - copy them to /data/project/pipeline-dev-files/temp.
  - get their ICA file ids.
  - use these file ids in the launch specifications.
- If remote files are used as validation inputs or as default input values of an input of type “file” (and not “string”): do the same as above.
Identify the pipeline name to use for this new pipeline deployment:
- If a deployment has already occurred in this project, or if the project was imported from an existing Flow pipeline, start from this pipeline name. Otherwise, start from the project name.
- Identify which already-deployed pipelines have the same base name, with or without suffixes that could indicate versioning (_v<number>, _<number>, _<date>).
- Ask the user whether to update the current version of the pipeline, create a new version, or enter a new name of their choice, or use the --create/--update parameters, when specified, for scripting without user interaction.
Create a new ICA Flow pipeline (except when updating an existing pipeline).
- The current Nextflow version in Bench is used to select the best Nextflow version to use in Flow.
Upload the nextflow-src folder file by file as pipeline assets.
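The base-name matching in the pipeline naming step can be sketched as follows. The regular expression is an assumption built from the suffixes listed above; in particular, the exact <date> format used by pipeline-dev is not documented here, so both YYYYMMDD (covered by _<number>) and YYYY-MM-DD are accepted. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of version-suffix matching for pipeline names.
# Assumed suffixes: _v<number>, _<number> (which also covers YYYYMMDD),
# or a _YYYY-MM-DD date. Only one trailing suffix is removed.
pipeline_base_name() {
  printf '%s\n' "$1" |
    sed -E 's/_(v[0-9]+|[0-9]{4}-[0-9]{2}-[0-9]{2}|[0-9]+)$//'
}

# Read deployed pipeline names (one per line) on stdin and print those
# that share the base name of the pipeline given as $1.
same_base_pipelines() {
  local base name
  base=$(pipeline_base_name "$1")
  while IFS= read -r name; do
    if [ "$(pipeline_base_name "$name")" = "$base" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Example:
#   printf '%s\n' demo_gzip demo_gzip_v2 other_tool | same_base_pipelines demo_gzip
```

A consequence of this simple rule is that a name whose last word is legitimately a number (for example tool_2) is also treated as a versioned name.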

Output Example:

The pipeline name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.

Opening the URL of the pipeline and clicking on Start Analysis shows the generated user interface:

4) Launching Validation in Flow

The ica-flow-config/launchPayload_inputFormValues.json file generated in the previous step is submitted to ICA Flow to start an analysis with the same validation inputs as “nextflow -profile test”.

Output Example:

The analysis name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.

Creating a Pipeline from Scratch

Introduction

This tutorial shows you how to start a new pipeline from scratch:

prepare linux tool + validation inputs
wrap in Nextflow
wrap the pipeline in Bench
deploy pipeline as an ICA Flow pipeline
launch Flow validation test from Bench

Preparation

Start Bench workspace

For this tutorial, any instance size will work, even the smallest standard-small.
Select the single user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines.
A small amount of disk space (10GB) will be enough.

We are going to wrap the "gzip" Linux compression tool with the following inputs:

1 file
compression level: integer between 1 and 9

We intentionally do not include sanity checks, to keep this scenario simple.

Creation of test file:

Wrapping in Nextflow

Here is an example of Nextflow code that wraps the gzip command and publishes the final output in the “out” folder:

nextflow-src/main.nf

Save this file as nextflow-src/main.nf, and check that it works:

Result

Wrap the Pipeline in Bench

We now need to:

Use Docker
Follow some nf-core best practices to make our source+test compatible with the pipeline-dev tools

Using Docker:

In Nextflow, Docker images can be specified at the process level.

Each process may use a different Docker image.
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used, but with no guarantee that the necessary tools are available.

Specifying the Docker image is done with the container '<image_name:version>' directive, which can be specified:

at the start of each process definition
or in nextflow config files (preferred when following nf-core guidelines)

For example, create nextflow-src/nextflow.config:

We can now run with nextflow's -with-docker option:

Following some nf-core best practices to make our source+test compatible with the pipeline-dev tools:

Create NextFlow “test” profile

Here is an example of “test” profile that can be added to nextflow-src/nextflow.config to define some input values appropriate for a validation run:

nextflow-src/nextflow.config

With this profile defined, we can now run the same test as before with this command:

Create NextFlow “docker” profile

A “docker” profile is also present in all nf-core pipelines. Our pipeline-dev tools will make use of it, so let’s define it:

nextflow-src/nextflow.config

We can now run the same test as before with this command:

We also have enough structure in place to start using the pipeline-dev command:

In order to deploy our pipeline to ICA, we need to generate the user interface input form.

This is done by using nf-core's recommended nextflow_schema.json.

For our simple example, we write a minimal one by hand (using one of the nf-core pipelines as an example):

nextflow-src/nextflow_schema.json

In the next step, this gets converted to the ica-flow-config/inputForm.json file.

Note: for large pipelines, as described on the nf-core website:

Manually building JSONSchema documents is not trivial and can be very error prone. Instead, the nf-core pipelines schema build command collects your pipeline parameters and gives interactive prompts about any missing or unexpected params. If no existing schema is found it will create one for you.

We recommend looking into "nf-core pipelines schema build -d nextflow-src/", which comes with a web builder to add descriptions etc.

Deploy as a Flow Pipeline

We just need to create one final file, which we have skipped until now: our project description file, which can be created with the command pipeline-dev project-info --init:

pipeline-dev.project-info

We can now run:

After generating the ICA-Flow-specific files in the ica-flow-config folder (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name).

It then asks if we want to update the latest version or create a new one.

Choose "3" and enter a name of your choice to avoid conflicts with all the other users following this same tutorial.

At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.

Run Validation Test in Flow

This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.

Some of the input files will have been copied to your ICA project in order for the analysis launch to work. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.

Result

nf-core Pipelines

Introduction

This tutorial shows you how to

Import any nf-core pipeline from their public repository.
Run the pipeline in Bench.
- monitor the execution

Start Bench workspace
- For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:
  - If using a cluster, choose standard-small or standard-medium for the workspace master node

If conda and/or nextflow are not installed, pipeline-dev will offer to install them.

The Nextflow files are pulled into the nextflow-src subfolder.

All nf-core pipelines conveniently define a "test" profile that specifies a set of validation inputs for the pipeline.

The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance.

When a pipeline is running locally (i.e. not on a Bench cluster), you can monitor the task execution from another terminal with docker ps

When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:

qstat to see which tasks are pending or running
tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)

The output of the pipeline is in the outdir folder
Nextflow work files are under the work folder
Log files are .nextflow.log* and output.log
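Since the scaler log name includes the workspace reboot time, picking the newest file by modification time is a convenient way to follow it. A minimal sketch, assuming the files are named /data/logs/sge-scaler.log.<something> as described above; the helper name latest_with_prefix is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified file named
# <prefix>.<anything>, or nothing if there is no such file.
latest_with_prefix() {
  # ls -t sorts newest first; "no such file" errors are discarded.
  ls -t -- "$1".* 2>/dev/null | head -n 1
}

# Example monitoring commands, each in its own terminal:
#   docker ps      # local runs: containers of the running tasks
#   qstat          # cluster runs: pending and running tasks
#   tail -f "$(latest_with_prefix /data/logs/sge-scaler.log)"
#   tail -f .nextflow.log
```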

After generating a few ICA-specific files (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that's what is checked here). It then asks if you want to update the latest version or create a new one.

Choose "3" and enter a name of your choice to avoid conflicts with other users following this same tutorial.

At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.

This launches an analysis in ICA Flow, using the same inputs as the nf-core pipeline's "test" profile.

Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder bench-pipeline-dev/temp-data.

Updating an Existing Flow Pipeline

Introduction

This tutorial shows you how to

import an existing ICA Flow pipeline with a supporting validation analysis
run the pipeline in Bench
- monitor the execution
Iterative development: modify and validate in Bench
- Modify code
- Modify Docker image contents ( or method)

Make sure you have access in ICA Flow to:

the pipeline you want to work with
an analysis exercising this pipeline, preferably with a short execution time, to use as validation test

For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:

When using a cluster, choose standard-small or standard-medium for the workspace master node
Otherwise, choose at least standard-large if you re-import a pipeline that originally came from nf-core, as they typically need 4 or more CPUs to run.
Select the "single user workspace" permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines.

The starting point is the analysis id that is used as pipeline validation test (the pipeline id is obtained from the analysis metadata).

If no --analysis-id is provided, the tool lists all the successful analyses in the current project and lets the developer pick one.

If conda and/or nextflow are not installed, pipeline-dev will offer to install them.
A folder called imported-flow-analysis is created.
Pipeline Nextflow assets are downloaded into the nextflow-src sub-folder.

The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance:

When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:

qstat to see which tasks are pending or running
tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)

The output of the pipeline is in the outdir folder
Nextflow work files are under the work folder
Log files are .nextflow.log* and output.log

Nextflow files (located in the nextflow-src folder) are easy to modify. Depending on your environment (ssh access / docker image with JupyterLab or VNC, with or without Visual Studio Code), various source code editors can be used.

After modifying the source code, you can run a validation iteration with the same command as before:

Modifying the Docker image is the next step.

Nextflow (and ICA) allow the Docker images to be specified in different places:

in config files such as nextflow-src/nextflow.config
in nextflow code files:

grep container may help locate the correct files:
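One way to run this search is shown below. It restricts grep to Nextflow code and config files and matches container as a whole word, so containerOptions lines are not listed. The function name is hypothetical, and the search may also catch comments that mention the word.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print file:line:text for every line in the Nextflow
# sources (*.nf and *.config) that contains the word "container".
find_containers() {
  grep -rnw --include='*.nf' --include='*.config' container "${1:-nextflow-src}"
}

# Example: find_containers nextflow-src
```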

Use case: Update some of the software (mimalloc) by compiling a new version

With the appropriate permissions, you can then "docker login" and "docker push" the new image.

Update the nextflow code and/or configs to use the new image

Validate your changes in Bench:

It then asks if we want to update the latest version or create a new one.

At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.

This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.

Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.

Pipeline Development in Bench (Experimental)

Introduction

Import to Bench
- From public nf-core pipelines
- From existing ICA Flow Nextflow pipelines
Run in Bench
Modify and re-run in Bench, providing fast development iterations
Deploy to Flow
Launch validation in Flow

Prerequisites

Recommended workspace size: nf-core Nextflow pipelines typically require 4 or more cores to run.
The pipeline development tools require:
- Conda, which is automatically installed by “pipeline-dev” if conda-miniconda.installer.ica-userspace.sh is present in the image.
- Nextflow (version 24.10.2 is automatically installed using conda, or you can use other versions)
- git (automatically installed using conda)
- jq, curl (which should be made available in the image)

NextFlow Requirements / Best Practices

Pipeline development tools work best when the following items are defined:

Nextflow profiles:
- test profile, specifying inputs appropriate for a validation run
- docker profile, instructing NextFlow to use Docker
nextflow_schema.json, as described here. It is used to generate the launch UI. The nf-core CLI tool (installable via pip install nf-core) offers extensive help to create and maintain this schema.

Pipeline Development Tools

New Bench pipeline development tools only become active after a workspace reboot.

These tools are installed in /data/.software (which should be in your $PATH). The pipeline-dev script is the front-end to the other pipeline-dev-* tools.

Pipeline-dev fulfils a number of roles:

Checks that the environment contains the required tools (conda, nextflow, etc) and offers to install them if needed.
Checks that the fast data mounts are present (/data/mounts/project etc.) – it is useful to check regularly, as they get unmounted when a workspace is stopped and restarted.
Redirects stdout and stderr to .pipeline-dev.log, with the history of log files kept as .pipeline-dev.log.<log date>.
Launches the appropriate sub-tool.
Prints out errors with backtrace, to help report issues.

Usage

1) Starting a new Project

A pipeline-dev project relies on the following folder structure, which is auto-generated when using the pipeline-dev import* tools.

If you start a project manually, you must follow the same folder structure.

Project base folder
- nextflow-src: Platform-agnostic Nextflow code, for example the github contents of an nf-core pipeline, or your usual nextflow source code.
  - main.nf
  - nextflow.config
  - nextflow_schema.json
- pipeline-dev.project-info: contains project name, description, etc.
- nextflow-bench.config (automatically generated when needed): contains definitions for bench.
- ica-flow-config: Directory of files used when deploying pipeline to Flow.
  - inputForm.json (if not present, gets generated from nextflow-src/nextflow_schema.json): input form as defined in ICA Flow.
  - onSubmit.js, onRender.js (optional, generated at the same time as inputForm.json): javascript code to go with the input form.
  - launchPayload_inputFormValues.json (if not present, gets generated from the test profile): used by “pipeline-dev launch-validation-in-flow”.

Pipeline Sources

$ pipeline-dev import-from-nextflow <repo name e.g. nf-core/demo>

A directory with the same name as the nextflow/nf-core pipeline is created, and the Nextflow files are pulled into the nextflow-src subdirectory.

The tutorial nf-core Pipelines goes into more detail about this use case.

$ pipeline-dev import-from-flow [--analysis-id=…]

A directory called imported-flow-analysis is created and the analysis+pipeline assets are downloaded.

The tutorial Updating an Existing Flow Pipeline goes into more detail about this use case.

Currently only pipelines with publicly available Docker images are supported. Pipelines with ICA-stored images are not yet supported.

2) Running in Bench

$ pipeline-dev run-in-bench [--local|--sge]

The script then prints the full nextflow command line and launches it.

In case of errors, full logs are saved as .pipeline-dev.log.

Currently, not all corner cases are covered by command line options. Please start from the nextflow command printed by the tool and extend it based on your specific needs.

Output Example

Container (Docker) images

In Nextflow, Docker images can be specified at the process level.

This is done with the container "<image_name:version>" directive, which can be specified
- in Nextflow config files (preferred method when following the nf-core best practices)
- or at the start of each process definition.
Each process can use a different Docker image.
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.

Resources such as the number of CPUs and memory can be specified as described here. See Containers or our tutorials for details about Nextflow-Docker syntax.

Bench can push/pull/create/modify Docker images, as described in Containers.

3) Deploying to ICA Flow

$ pipeline-dev deploy-as-flow-pipeline [--create|--update]

This command does the following:

Generate the JSON file describing the ICA Flow user interface.
- If ica-flow-config/inputForm.json doesn’t exist: generate it from nextflow-src/nextflow_schema.json.
Generate the JSON file containing the validation launch inputs.
- If ica-flow-config/launchPayload_inputFormValues.json doesn’t exist: generate it from the inputs of nextflow -profile test.
- If local files are used as validation inputs or as default input values:
  - copy them to /data/project/pipeline-dev-files/temp.
  - get their ICA file ids.
  - use these file ids in the launch specifications.
- If remote files are used as validation inputs or as default input values of an input of type “file” (and not “string”): do the same as above.
Identify the pipeline name to use for this new pipeline deployment:
- If a deployment has already occurred in this project, or if the project was imported from an existing Flow pipeline, start from this pipeline name. Otherwise, start from the project name.
- Identify which already-deployed pipelines have the same base name, with or without suffixes that could indicate versioning (_v<number>, _<number>, _<date>).
- Ask the user whether to update the current version of the pipeline, create a new version, or enter a new name of their choice, or use the --create/--update parameters, when specified, for scripting without user interaction.
Create a new ICA Flow pipeline (except when updating an existing pipeline).
- The current Nextflow version in Bench is used to select the best Nextflow version to use in Flow.
Upload the nextflow-src folder file by file as pipeline assets.

Output Example:

The pipeline name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.

Opening the URL of the pipeline and clicking on Start Analysis shows the generated user interface:

4) Launching Validation in Flow

$ pipeline-dev launch-validation-in-flow

Output Example:

The analysis name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.

Tutorials

Creating a Pipeline from Scratch
nf-core Pipelines
Updating an Existing Flow Pipeline

Creating a Pipeline from Scratch

Introduction

This tutorial shows you how to start a new pipeline from scratch:

prepare linux tool + validation inputs
wrap in Nextflow
wrap the pipeline in Bench
deploy pipeline as an ICA Flow pipeline
launch Flow validation test from Bench

Preparation

Start Bench workspace

For this tutorial, any instance size will work, even the smallest standard-small.
Select the single user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines.
A small amount of disk space (10GB) will be enough.

We are going to wrap the "gzip" Linux compression tool with the following inputs:

1 file
compression level: integer between 1 and 9

We intentionally do not include sanity checks, to keep this scenario simple.

Creation of test file:

mkdir demo_gzip
cd demo_gzip
echo test > test_input.txt

Wrapping in Nextflow

Here is an example of Nextflow code that wraps the gzip command and publishes the final output in the “out” folder:

mkdir nextflow-src
# Create nextflow-src/main.nf using contents below
vi nextflow-src/main.nf

nextflow-src/main.nf

nextflow.enable.dsl=2
 
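process COMPRESS {
  // Process directives such as publishDir go before the input: block.
  // Results are published to 'out', the folder ICA Flow copies to the project.
  publishDir 'out', mode: 'symlink'

  input:
    path input_file
    val compression_level

  output:
    // .simpleName is the file name without its directory and extension(s)
    path "${input_file.simpleName}.gz"

  script:
    """
    gzip -c -${compression_level} ${input_file} > ${input_file.simpleName}.gz
    """
}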
 
workflow {
    input_path = file(params.input_file)
    gzip_out = COMPRESS(input_path, params.compression_level)
}

Save this file as nextflow-src/main.nf, and check that it works:

nextflow run nextflow-src/ --input_file test_input.txt --compression_level 5
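Besides inspecting the out folder by hand, the result can be checked by decompressing the published file and comparing it with the input. A minimal sketch, assuming it is run from the demo_gzip folder after the command above; check_gzip_output is a hypothetical helper, and it derives the output name the same way .simpleName does (file name without directory or extensions).

```shell
#!/usr/bin/env bash
# Hypothetical check: the published <simpleName>.gz should decompress back
# to the original input file.
check_gzip_output() {
  local input="${1:-test_input.txt}" outdir="${2:-out}" name gz
  # Mimic Nextflow's .simpleName: drop the directory and every extension.
  name=$(basename "$input")
  name=${name%%.*}
  gz="$outdir/$name.gz"
  if [ ! -e "$gz" ]; then
    echo "not found: $gz" >&2
    return 1
  fi
  # gzip -dc decompresses to stdout; cmp -s compares it byte for byte.
  if gzip -dc "$gz" | cmp -s - "$input"; then
    echo "OK: $gz matches $input"
  else
    echo "MISMATCH: $gz differs from $input" >&2
    return 1
  fi
}

# Example (after the nextflow run above):
#   check_gzip_output test_input.txt out
```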

Result

Wrap the Pipeline in Bench

We now need to:

Use Docker
Follow some nf-core best practices to make our source+test compatible with the pipeline-dev tools

Using Docker:

In Nextflow, Docker images can be specified at the process level.

Each process may use a different Docker image.
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used, but with no guarantee that the necessary tools are available.

Specifying the Docker image is done with the container '<image_name:version>' directive, which can be specified:

at the start of each process definition
or in nextflow config files (preferred when following nf-core guidelines)

For example, create nextflow-src/nextflow.config:

process.container = 'ubuntu:latest'

We can now run with nextflow's -with-docker option:

nextflow run nextflow-src/ --input_file test_input.txt --compression_level 5 -with-docker

Following some nf-core best practices to make our source+test compatible with the pipeline-dev tools:

Create NextFlow “test” profile

Here is an example of “test” profile that can be added to nextflow-src/nextflow.config to define some input values appropriate for a validation run:

nextflow-src/nextflow.config

process.container = 'ubuntu:latest'
 
profiles {
  test {
    params {
      input_file = 'test_input.txt'
      compression_level = 5
    }
  }
}

With this profile defined, we can now run the same test as before with this command:

nextflow run nextflow-src/ -profile test -with-docker

Create NextFlow “docker” profile

A “docker” profile is also present in all nf-core pipelines. Our pipeline-dev tools will make use of it, so let’s define it:

nextflow-src/nextflow.config

process.container = 'ubuntu:latest'
 
profiles {
  test {
    params {
      input_file = 'test_input.txt'
      compression_level = 5
    }
  }
 
  docker {
    docker.enabled = true
  }
}

We can now run the same test as before with this command:

nextflow run nextflow-src/ -profile test,docker

We also have enough structure in place to start using the pipeline-dev command:

pipeline-dev run-in-bench

In order to deploy our pipeline to ICA, we need to generate the user interface input form.

This is done by using nf-core's recommended nextflow_schema.json.

For our simple example, we write a minimal one by hand (using one of the nf-core pipelines as an example):

nextflow-src/nextflow_schema.json

{
    "$defs": {
        "input_output_options": {
            "title": "Input/output options",
            "properties": {
                "input_file": {
                    "description": "Input file to compress",
                    "help_text": "The file that will get compressed",
                    "type": "string",
                    "format": "file-path"
                },
                "compression_level": {
                    "type": "integer",
                    "description": "Compression level to use (1-9)",
                    "default": 5,
                    "minimum": 1,
                    "maximum": 9
                }
            }
        }
    }
}
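Before deploying, it can be worth checking that every parameter the workflow reads is described in the schema, since the Flow input form is generated from it. The sketch below is a rough, hypothetical text match with grep (not a JSON parser and not part of pipeline-dev); it only looks at params.<name> references in main.nf.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check, not part of pipeline-dev: report every
# params.<name> used in main.nf that has no "<name>": key anywhere in
# nextflow_schema.json. This is a plain text match, not a JSON parse.
check_schema_params() {
  local src="${1:-nextflow-src}" p missing=0
  for p in $(grep -oE 'params\.[A-Za-z_][A-Za-z0-9_]*' "$src/main.nf" |
             sed 's/^params\.//' | sort -u); do
    if ! grep -q "\"$p\"[[:space:]]*:" "$src/nextflow_schema.json"; then
      echo "not in schema: $p"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Example: check_schema_params nextflow-src
```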

In the next step, this gets converted to the ica-flow-config/inputForm.json file.

Note: for large pipelines, as described on the nf-core website:

Manually building JSONSchema documents is not trivial and can be very error prone. Instead, the nf-core pipelines schema build command collects your pipeline parameters and gives interactive prompts about any missing or unexpected params. If no existing schema is found it will create one for you.

We recommend looking into "nf-core pipelines schema build -d nextflow-src/", which comes with a web builder to add descriptions etc.

Deploy as a Flow Pipeline

We just need to create one final file, which we have skipped until now: our project description file, which can be created with the command pipeline-dev project-info --init:

pipeline-dev.project-info

$ pipeline-dev project-info --init
 
pipeline-dev.project-info not found. Let's create it with 2 questions:
 
Please enter your project name: demo_gzip
Please enter a project description: Bench gzip demo

We can now run:

pipeline-dev deploy-as-flow-pipeline

It then asks if we want to update the latest version or create a new one.

Choose "3" and enter a name of your choice to avoid conflicts with all the other users following this same tutorial.

At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.

Run Validation Test in Flow

pipeline-dev launch-validation-in-flow

This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.

Some of the input files are copied to your ICA project so that the analysis can be launched. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.

Result

/data/demo $ pipeline-dev launch-validation-in-flow

pipelineId: 331f209d-2a72-48cd-aa69-070142f57f73
Getting Analysis Storage Id
Launching as ICA Flow Analysis...
ICA Analysis created:
- Name: Test demo_gzip
- Id: 17106efc-7884-4121-a66d-b551a782b620
- Url: https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/17106efc-7884-4121-a66d-b551a782b620
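If you script around this step, the analysis Id can be extracted from the printed output. The minimal sketch below assumes the exact layout shown above (a line starting with "- Id:"), which is not a documented interface and may change between tool versions. The helper name extract_analysis_id is hypothetical.

```shell
# Hypothetical helper: read launch-validation-in-flow output on stdin and
# print the value of the first "- Id:" line (the ICA analysis Id).
# Assumes the output layout shown in the example above.
extract_analysis_id() {
    sed -n 's/^[[:space:]]*- Id:[[:space:]]*//p' | head -n 1
}
```

For example, if the command's output was saved to a file (hypothetically named launch.log): analysis_id=$(extract_analysis_id < launch.log)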

Updating an Existing Flow Pipeline

Introduction

This tutorial shows you how to:

Import an existing ICA Flow pipeline with a supporting validation analysis
Run the pipeline in Bench
- Monitor the execution
Iterative development: modify and validate in Bench
- Modify code
- Modify Docker image contents ( or method)

Make sure you have access in ICA Flow to:

The pipeline you want to work with
An analysis that exercises this pipeline, preferably one with a short execution time, to use as a validation test

For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:

When using a cluster, choose standard-small or standard-medium for the workspace master node
Otherwise, choose at least standard-large if you re-import a pipeline that originally came from nf-core, as they typically need 4 or more CPUs to run.
Select the "single user workspace" permissions (shown as "Access limited to workspace owner"), which allow you to deploy pipelines

The starting point is the ID of the analysis used as the pipeline validation test (the pipeline ID is obtained from the analysis metadata).

If no --analysis-id is provided, the tool lists all the successful analyses in the current project and lets the developer pick one.

If conda and/or nextflow are not installed, pipeline-dev will offer to install them.
A folder called imported-flow-analysis is created.
Pipeline Nextflow assets are downloaded into the nextflow-src sub-folder.

The following command runs the test profile. If a Bench cluster is active, it runs on your Bench cluster; otherwise, it runs on the main workspace instance:

pipeline-dev run-in-bench

When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:

qstat to see pending and running tasks
tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)
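The placeholder in the log name refers to the most recent workspace reboot. A minimal sketch for picking that file automatically, assuming the newest file by modification time is the one you want; the helper name latest_scaler_log is hypothetical:

```shell
# Hypothetical helper: print the most recently modified sge-scaler log
# in a directory (default: /data/logs). Prints nothing if none exists.
latest_scaler_log() {
    local dir="${1:-/data/logs}"
    # ls -t sorts newest first; 2>/dev/null hides the error when no log exists yet.
    ls -t "$dir"/sge-scaler.log.* 2> /dev/null | head -n 1
}
```

For example, to keep watching scaling events as they happen: tail -f "$(latest_scaler_log)"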

The output of the pipeline is in the outdir folder
Nextflow work files are under the work folder
Log files are .nextflow.log* and output.log
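When a run fails, Nextflow keeps each task's files in its own sub-directory of the work folder, including the task script (.command.sh), its error output (.command.err) and an .exitcode file holding the task's exit status. As a sketch, the hypothetical helper below lists the task directories whose recorded exit code is non-zero, which is a quick way to find the .command.err file to read first:

```shell
# Hypothetical helper: list Nextflow task directories under a work folder
# (default: ./work) whose .exitcode file contains a non-zero value.
failed_tasks() {
    local workdir="${1:-work}"
    local f code
    find "$workdir" -type f -name .exitcode 2> /dev/null | while read -r f; do
        code=$(cat "$f")
        # Report only tasks that recorded an exit code other than 0.
        if [ -n "$code" ] && [ "$code" != "0" ]; then
            dirname "$f"
        fi
    done | sort
}
```

For example, run failed_tasks work, then inspect <task directory>/.command.err for each directory it prints.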

After modifying the source code, you can run a validation iteration with the same command as before:

pipeline-dev run-in-bench

Modifying the Docker image is the next step.

Nextflow (and ICA) allow Docker images to be specified in different places:

in config files such as nextflow-src/nextflow.config
in Nextflow code files, for example through the container directive of a process

Searching for "container" with grep may help locate the correct files.
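A minimal sketch of that search, assuming the nextflow-src layout used on this page and a grep that supports -r and --include (GNU grep does). The helper name find_containers is hypothetical:

```shell
# Hypothetical helper: show every line mentioning "container" in Nextflow
# scripts (*.nf) and config files (*.config) under a source folder
# (default: nextflow-src), prefixed with file name and line number.
find_containers() {
    local src="${1:-nextflow-src}"
    grep -rn --include='*.nf' --include='*.config' 'container' "$src"
}
```

Running find_containers with no argument searches nextflow-src in the current directory.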

Use case: Update some of the software (mimalloc) by compiling a new version

With the appropriate permissions, you can then "docker login" and "docker push" the new image.

Update the nextflow code and/or configs to use the new image
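When the old image is referenced by its full name, the switch can be scripted. A minimal sketch using sed, assuming image names made only of the usual registry characters (letters, digits, ".", "/", ":", "-", "_", "@"); the helper name replace_image and the image names in the usage example are hypothetical. Reviewing the edited lines afterwards, for example with the grep search mentioned above, is still worthwhile.

```shell
# Hypothetical helper: replace every literal occurrence of one image name
# with another in Nextflow scripts (*.nf) and configs (*.config).
replace_image() {
    local old="$1" new="$2" src="${3:-nextflow-src}"
    local old_re new_re f
    # Escape regex-special characters in the old name so it matches literally.
    old_re=$(printf '%s' "$old" | sed 's/[][\.*^$]/\\&/g')
    # Escape characters that are special in a sed replacement.
    new_re=$(printf '%s' "$new" | sed 's/[\&]/\\&/g')
    # Edit only files that contain the old name; "|" is the sed delimiter
    # because image names contain "/". -i.bak works with GNU and BSD sed.
    grep -rlF --include='*.nf' --include='*.config' "$old" "$src" | while read -r f; do
        sed -i.bak "s|$old_re|$new_re|g" "$f" && rm -f "$f.bak"
    done
}
```

For example: replace_image "docker.io/example/tool:1.0" "registry.example.com/myuser/tool:1.1"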

Validate your changes in Bench:

pipeline-dev run-in-bench

Deploy the updated pipeline to Flow with:

pipeline-dev deploy-as-flow-pipeline

As before, it asks whether to update the latest version or create a new one.

At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.

Finally, run the validation test in Flow:

pipeline-dev launch-validation-in-flow

This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.

Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.
