Creating a Pipeline from Scratch
Introduction
This tutorial shows you how to start a new pipeline from scratch.
Preparation
Start Bench workspace
For this tutorial, any instance size will work, even the smallest standard-small.
Select the single-user workspace permission ("Access limited to workspace owner"), which allows us to deploy pipelines.
A small amount of disk space (10GB) will be enough.
We are going to wrap the "gzip" Linux compression tool with two inputs:
1 file
compression level: integer between 1 and 9
We intentionally do not include sanity checks, to keep this scenario simple.
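Before wrapping anything, it can help to see the bare command the pipeline will run. The following is a minimal sketch, assuming gzip is on the PATH; the file name example.txt, the variable names, and the temporary directory are hypothetical and not part of the tutorial.

```shell
# Work in a throwaway directory so nothing in the workspace is touched.
workdir=$(mktemp -d)
input_file="$workdir/example.txt"   # hypothetical input file
compression_level=9                 # integer between 1 and 9

# Generate some compressible content.
seq 1 1000 > "$input_file"

# -c writes to stdout and leaves the input file in place.
gzip -c "-$compression_level" "$input_file" > "$input_file.gz"

# Round-trip check: decompressing must reproduce the original bytes.
gzip -dc "$input_file.gz" | cmp - "$input_file" && echo "round trip OK"
```

The two variables correspond to the two pipeline inputs listed above: one file and one compression level.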
Creation of test file:
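Any small text file works as test input. A minimal sketch, assuming the file is stored at nextflow-src/test_data/test_input.txt (a hypothetical location, chosen here so that the file sits next to the pipeline source created in the next section):

```shell
# Hypothetical location for the test input.
mkdir -p nextflow-src/test_data

# 10,000 numbered lines: small, but large enough for compression to be visible.
seq 1 10000 > nextflow-src/test_data/test_input.txt

# Show the size of the file that was just created.
wc -c nextflow-src/test_data/test_input.txt
```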
Wrapping in Nextflow
Here is an example of Nextflow code that wraps the gzip command and publishes the final output in the “out” directory:
nextflow-src/main.nf
Save this file as nextflow-src/main.nf, and check that it works:
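One possible shape for the file and its test run is sketched below. The process name COMPRESS, the parameter names input_file and compression_level, the default level of 6, and the test input path are assumptions made for illustration; only the file name nextflow-src/main.nf and the “out” directory come from the text above.

```shell
mkdir -p nextflow-src

# Quoted 'EOF' keeps the shell from expanding ${...}; Nextflow must see it verbatim.
cat > nextflow-src/main.nf <<'EOF'
nextflow.enable.dsl = 2

// Hypothetical parameter names, used throughout this sketch.
params.input_file = null
params.compression_level = 6

process COMPRESS {
    // Publish the compressed file in the "out" directory.
    publishDir 'out', mode: 'copy'

    input:
    path input_file
    val compression_level

    output:
    path "${input_file}.gz"

    script:
    """
    gzip -c -${compression_level} ${input_file} > ${input_file}.gz
    """
}

workflow {
    COMPRESS(Channel.fromPath(params.input_file), params.compression_level)
}
EOF

# Run only where Nextflow is installed (for example, inside the Bench workspace).
if command -v nextflow >/dev/null 2>&1; then
    nextflow run nextflow-src/main.nf \
        --input_file nextflow-src/test_data/test_input.txt \
        --compression_level 5
else
    echo "nextflow not found on PATH; main.nf written but not run"
fi
```

Double-dash options such as --input_file set pipeline parameters, while single-dash options are consumed by Nextflow itself.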
Result
Wrap the Pipeline in Bench
We now need to:
Use Docker
Follow some nf-core best practices to make our source+test compatible with the pipeline-dev tools
Using Docker:
In Nextflow, Docker images can be specified at the process level
Each process may use a different Docker image
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.
Specifying the Docker image is done with the container '<image_name:version>' directive, which can be specified
at the start of each process definition
or in Nextflow config files (preferred when following nf-core guidelines)
For example, create nextflow-src/nextflow.config:
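A minimal sketch of such a config file follows. The image ubuntu:22.04 is an assumption: it is used here only because it ships gzip, and any image that provides the tool would do.

```shell
mkdir -p nextflow-src

cat > nextflow-src/nextflow.config <<'EOF'
// Applies to every process unless a process sets its own container.
// 'ubuntu:22.04' is an assumed image; any image that provides gzip works.
process.container = 'ubuntu:22.04'
EOF

# Display the result for a quick visual check.
cat nextflow-src/nextflow.config
```

Setting process.container in the config keeps image names out of main.nf, which matches the nf-core preference mentioned above.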
We can now run with Nextflow's -with-docker option:
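As a sketch, the invocation could look like the following. The parameter names and the test input path are assumptions carried over from this walkthrough, not values confirmed by the tutorial; -with-docker without an argument relies on the container set in nextflow.config.

```shell
# Arguments kept in one variable so they can be inspected before running.
NF_ARGS="run nextflow-src/main.nf -with-docker --input_file nextflow-src/test_data/test_input.txt --compression_level 5"

if command -v nextflow >/dev/null 2>&1; then
    # Intentionally unquoted so the shell splits the arguments.
    nextflow $NF_ARGS
else
    echo "Would run: nextflow $NF_ARGS"
fi
```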
Following some nf-core best practices to make our source+test compatible with the pipeline-dev tools:
Create Nextflow “test” profile
Here is an example of a “test” profile that can be added to nextflow-src/nextflow.config to define some input values appropriate for a validation run:
nextflow-src/nextflow.config
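The complete config file at this stage could look like the sketch below. The parameter names, the image, and the test input path under projectDir are assumptions made for illustration; projectDir is the Nextflow config constant that points at the directory containing main.nf.

```shell
mkdir -p nextflow-src

# Quoted 'EOF' so that ${projectDir} reaches Nextflow unexpanded.
cat > nextflow-src/nextflow.config <<'EOF'
// Assumed image; any image that provides gzip works.
process.container = 'ubuntu:22.04'

profiles {
    // Input values for a quick validation run.
    test {
        params.input_file = "${projectDir}/test_data/test_input.txt"
        params.compression_level = 5
    }
}
EOF

cat nextflow-src/nextflow.config
```

Anchoring the test input to projectDir keeps the profile valid regardless of the directory the run is launched from.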
With this profile defined, we can now run the same test as before with this command:
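A sketch of that command, assuming the pipeline lives in nextflow-src/ and the profile is named test as described above:

```shell
# The test profile supplies the inputs, so no --input_file is needed.
NF_ARGS="run nextflow-src/ -profile test -with-docker"

if command -v nextflow >/dev/null 2>&1; then
    # Intentionally unquoted so the shell splits the arguments.
    nextflow $NF_ARGS
else
    echo "Would run: nextflow $NF_ARGS"
fi
```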
Create Nextflow “docker” profile
A “docker” profile is also present in all nf-core pipelines. Our pipeline-dev tools will make use of it, so let’s define it:
nextflow-src/nextflow.config
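With both profiles in place, the complete config file could look like the sketch below. The docker.enabled setting is the standard Nextflow switch that nf-core pipelines use in their docker profile; the image, parameter names, and test input path remain assumptions made for illustration.

```shell
mkdir -p nextflow-src

# Quoted 'EOF' so that ${projectDir} reaches Nextflow unexpanded.
cat > nextflow-src/nextflow.config <<'EOF'
// Assumed image; any image that provides gzip works.
process.container = 'ubuntu:22.04'

profiles {
    // Input values for a quick validation run.
    test {
        params.input_file = "${projectDir}/test_data/test_input.txt"
        params.compression_level = 5
    }
    // Same effect as passing -with-docker on the command line.
    docker {
        docker.enabled = true
    }
}
EOF

cat nextflow-src/nextflow.config
```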
We can now run the same test as before with this command:
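A sketch of that command, assuming the profiles are named test and docker as described above. Multiple profiles are passed as a comma-separated list without spaces.

```shell
# Both profiles are activated together; -with-docker is no longer needed.
NF_ARGS="run nextflow-src/ -profile test,docker"

if command -v nextflow >/dev/null 2>&1; then
    # Intentionally unquoted so the shell splits the arguments.
    nextflow $NF_ARGS
else
    echo "Would run: nextflow $NF_ARGS"
fi
```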
We also have enough structure in place to start using the pipeline-dev command:
In order to deploy our pipeline to ICA, we need to generate the user interface input form.
This is done by using nf-core's recommended nextflow_schema.json.
For our simple example, we generate a minimal one by hand (using one of the nf-core pipelines as an example):
nextflow-src/nextflow_schema.json
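A hand-written schema for two parameters could look like the sketch below. It follows the layout of recent nf-core templates (JSON Schema draft 2020-12 with $defs); whether the pipeline-dev tools expect this layout or the older draft-07 form with "definitions" is not confirmed here. The parameter names, descriptions, and the default level of 6 are assumptions made for illustration.

```shell
mkdir -p nextflow-src

# Quoted 'EOF' so that "$schema", "$defs" and "$ref" are written literally.
cat > nextflow-src/nextflow_schema.json <<'EOF'
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "gzip wrapper pipeline parameters",
  "description": "Compress one file with gzip",
  "type": "object",
  "$defs": {
    "input_output_options": {
      "title": "Input/output options",
      "type": "object",
      "required": ["input_file"],
      "properties": {
        "input_file": {
          "type": "string",
          "format": "file-path",
          "description": "File to compress"
        },
        "compression_level": {
          "type": "integer",
          "default": 6,
          "minimum": 1,
          "maximum": 9,
          "description": "gzip compression level"
        }
      }
    }
  },
  "allOf": [
    { "$ref": "#/$defs/input_output_options" }
  ]
}
EOF

cat nextflow-src/nextflow_schema.json
```

The minimum and maximum bounds mirror the "integer between 1 and 9" constraint stated at the start of the tutorial.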
In the next step, this gets converted to the ica-flow-config/inputForm.json file.
Note: For large pipelines, as described on the nf-core website:
Manually building JSONSchema documents is not trivial and can be very error prone. Instead, the nf-core pipelines schema build command collects your pipeline parameters and gives interactive prompts about any missing or unexpected params. If no existing schema is found it will create one for you.
We recommend looking into "nf-core pipelines schema build -d nextflow-src/", which comes with a web builder to add descriptions etc.
Deploy as a Flow Pipeline
We just need to create one final file, which we had skipped until now: our project description file, which can be created via the command pipeline-dev project-info --init:
pipeline-dev.project_info
We can now run:
After generating the ICA-Flow-specific files in the ica-flow-config directory (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name).
It then asks if we want to update the latest version or create a new one.
Choose "3" and enter a name of your choice to avoid conflicts with other users following this same tutorial.
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
Run Validation Test in Flow
This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.
Some of the input files will have been copied to your ICA project in order for the analysis launch to work. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.
Result