Nextflow: RNASeq Lift

How to lift a simple Nextflow pipeline to ICA?

In this tutorial, we lift the RNASeq Nextflow pipeline from https://www.nextflow.io/example4.html to ICA.

How to modify the main.nf file

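The following comparison highlights the differences between the original file and the version for deployment in ICA. Lines prefixed with - are removed from the original and lines prefixed with + are added. The main difference is that each process explicitly specifies its container and a pod annotation. Additionally, the default values for the reads and transcriptome parameters are removed, the channel definitions for both inputs are modified, a debugging message that prints all input parameters is added, and the FASTQC process calls fastqc directly instead of the fastqc.sh wrapper script.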
 #!/usr/bin/env nextflow
 /*
  * The following pipeline parameters specify the reference genomes
  * and read pairs and can be provided as command line options
  */
-params.reads = "$baseDir/data/ggal/ggal_gut_{1,2}.fq"
-params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
 params.outdir = "results"
+println("All input parameters: ${params}")
 workflow {
-    read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
+    read_pairs_ch = channel.fromFilePairs("${params.reads}/*_{1,2}.fq")
-    INDEX(params.transcriptome)
+    INDEX(Channel.fromPath(params.transcriptome))
     FASTQC(read_pairs_ch)
     QUANT(INDEX.out, read_pairs_ch)
 }
 process INDEX {
+    container 'quay.io/nextflow/rnaseq-nf:v1.1'
+    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-medium'
     input:
     path transcriptome
     output:
     path 'index'
     script:
     """
     salmon index --threads $task.cpus -t $transcriptome -i index
     """
 }
 process FASTQC {
+    container 'quay.io/nextflow/rnaseq-nf:v1.1'
+    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-medium'
     tag "FASTQC on $sample_id"
     publishDir params.outdir
     input:
     tuple val(sample_id), path(reads)
     output:
     path "fastqc_${sample_id}_logs"
     script:
-    """
-    fastqc.sh "$sample_id" "$reads"
-    """
+    """
+    # we need to explicitly specify the output directory for fastqc tool
+    # we are creating one using sample_id variable
+    mkdir fastqc_${sample_id}_logs
+    fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}
+    """
 }
 process QUANT {
+    container 'quay.io/nextflow/rnaseq-nf:v1.1'
+    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-medium'
     tag "$pair_id"
     publishDir params.outdir
     input:
     path index
     tuple val(pair_id), path(reads)
     output:
     path pair_id
     script:
     """
     salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
     """
 }

The XML configuration

The XML configuration specifies the input files and the settings of the pipeline. This pipeline needs two data inputs: the transcriptome (a FASTA file) and the directory that contains the reads:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="reads" format="UNKNOWN" type="DIRECTORY" required="true" multiValue="false">
            <pd:label>Folder with FASTQ files</pd:label>
            <pd:description></pd:description>
        </pd:dataInput>
        <pd:dataInput code="transcriptome" format="FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>FASTA</pd:label>
            <pd:description>FASTA file</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>
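
To see how the XML configuration and main.nf fit together, the sketch below assumes that ICA passes each data input to the workflow as a parameter named after its code attribute (params.reads and params.transcriptome), which is what the modified main.nf relies on. It only builds the two input channels and prints their contents. The checkIfExists option and the view calls are additions for illustration and are not part of the tutorial pipeline.

```nextflow
#!/usr/bin/env nextflow

// Sketch only. Assumption: the XML input codes surface as pipeline parameters:
//   code="reads"          (type DIRECTORY) -> params.reads is a folder path
//   code="transcriptome"  (type FILE)      -> params.transcriptome is a file path

workflow {
    // Because the reads input is a folder, the glob for the paired files
    // (*_1.fq and *_2.fq) is appended to the folder path.
    read_pairs_ch = Channel.fromFilePairs("${params.reads}/*_{1,2}.fq", checkIfExists: true)

    // The transcriptome input is a single file, so it is wrapped in a path channel.
    transcriptome_ch = Channel.fromPath(params.transcriptome, checkIfExists: true)

    // Print what each channel emits to confirm that the inputs were resolved.
    read_pairs_ch.view { sample_id, files -> "sample ${sample_id}: ${files}" }
    transcriptome_ch.view { fasta -> "transcriptome: ${fasta}" }
}
```

Running a check like this before wiring in the INDEX, FASTQC, and QUANT processes can help confirm that the folder selected for reads actually contains files matching the *_{1,2}.fq naming pattern.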