CWL: Scatter-gather Method
This guide shows how to use scattering in a CWL pipeline on the ICA platform.
First, create a simple tool with two single-value file inputs called left and right. The tool merely runs ‘cat’ on the given input files, so make sure to use a Docker image that contains the cat command (e.g. alpine).
#!/usr/bin/env cwl-runner
cwlVersion: cwl:v1.0
class: CommandLineTool
requirements:
- class: ScatterFeatureRequirement
hints:
- class: InlineJavascriptRequirement
label: catstdoutsinglevalue
doc: 'cat tool for testing: a simple tool with 2 single-value file inputs called left
  and right. It just ‘cats’ the content of the given input files.'
stdout: $(inputs.outputName)
inputs:
  left:
    type: File
    inputBinding:
      position: 1
  right:
    type:
    - File
    - 'null'
    inputBinding:
      position: 2
  outputName:
    type: string
    doc: Output file name
outputs:
  tool_output:
    type: stdout
baseCommand:
- cat
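To try this tool on its own before building the pipeline, it can be run locally with a CWL runner such as cwltool and a job (input object) file. The sketch below is not part of the ICA setup: the job file name, the output name and the local file paths are hypothetical, and it assumes test files like those used in the test run at the end of this guide. Each job of the scattered step later receives an input object of this shape, with a single file in left.

```yaml
# Hypothetical job file for catstdoutsinglevalue (e.g. saved as cat-job.yml).
# Assumes file1.txt and fileB.txt exist in the same directory as this file.
left:
  class: File
  path: file1.txt
right:                       # optional: the tool also accepts null here
  class: File
  path: fileB.txt
outputName: file1fileB.txt   # name of the file captured from cat's stdout
```

For example, `cwltool catstdoutsinglevalue.cwl cat-job.yml` (assuming the tool was saved under that name) should produce file1fileB.txt containing the two input files concatenated.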

We will also use a second, similar tool, this time with multi-value (array) inputs:
#!/usr/bin/env cwl-runner
cwlVersion: cwl:v1.0
class: CommandLineTool
requirements:
- class: ScatterFeatureRequirement
hints:
- class: InlineJavascriptRequirement
label: catstdout2
doc: 'This cat tool can be used as a workaround for scatter-gather approach on ICA:
  this step takes in the output array of the scattered step. The output of this step
  can be mapped as pipeline output.'
stdout: $(inputs.outputName)
inputs:
  left:
    type:
      type: array
      items: File
    inputBinding:
      position: 1
  right:
    type:
    - type: array
      items: File
    - 'null'
    inputBinding:
      position: 2
  outputName:
    type: string
    doc: Output file name
outputs:
  tool_output:
    type: stdout
baseCommand:
- cat

Now we can build a pipeline using these tools:

[Screenshot: the scatter-gather pipeline built from the two tools]
Relevant aspects of this pipeline:
- LeftInput is a multi-value input.
- The step ‘catstdoutsinglevalue’ has scattering configured: it scatters on its input named ‘left’ (see the lower right of the screenshot). As a result, one instance of this step is executed for each entry in the LeftInput array. To indicate that the step is executed multiple times, the LeftInput icon is drawn with a double border.
- Directly mapping the outputs of a scattered step to pipeline outputs is currently not supported. As a workaround, the ‘catstdout2’ step takes the output array of the scattered step as its input, and the output of this step can be mapped as a pipeline output.
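For readers who prefer text over the graphical editor, the pipeline described above can be approximated as a plain CWL Workflow. This is a minimal sketch, not the file ICA generates: it assumes the two tools are saved as catstdoutsinglevalue.cwl and catstdout2.cwl (hypothetical file names), and the outputName wiring (a fixed partial.txt for each scattered job and a result.txt default for the final file) plus the workflow output name result are illustrative choices. Note that ScatterFeatureRequirement is declared on the Workflow, since scatter is a property of workflow steps.

```yaml
#!/usr/bin/env cwl-runner
# Sketch of the scatter-gather pipeline as a plain CWL Workflow.
# Assumes the two tools are saved as catstdoutsinglevalue.cwl and catstdout2.cwl
# (hypothetical file names) in the same directory as this file.
cwlVersion: v1.0
class: Workflow
requirements:
- class: ScatterFeatureRequirement    # enables `scatter` on workflow steps
inputs:
  LeftInput:
    type: File[]                      # multi-value: one scattered job per file
  RightInput:
    type: File?                       # single value, passed to every scattered job
  outputName:                         # hypothetical: name of the final output file
    type: string
    default: result.txt
outputs:
  result:                             # hypothetical output name
    type: File
    outputSource: catstdout2/tool_output    # gathered file mapped as pipeline output
steps:
  catstdoutsinglevalue:
    run: catstdoutsinglevalue.cwl
    scatter: left                     # scatter over the tool's `left` input
    in:
      left: LeftInput
      right: RightInput
      outputName:
        default: partial.txt          # hypothetical: each job writes its own partial.txt
    out: [tool_output]                # collected into a File[] across the scattered jobs
  catstdout2:
    run: catstdout2.cwl
    in:
      left: catstdoutsinglevalue/tool_output  # the File[] from the scattered step
      outputName: outputName
    out: [tool_output]
```

In standard CWL, a scattered step's output array can also be connected straight to a workflow output of type File[] (for example `outputSource: catstdoutsinglevalue/tool_output`); the catstdout2 gather step is only needed because of the ICA limitation noted above.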
To test the pipeline, run it with three files assigned to its inputs: file1.txt and file2.txt for LeftInput and fileB.txt for RightInput. The files contain ‘file1’, ‘file2’ and ‘fileB’, respectively. The result is an output file with the content ‘file1fileBfile2fileB’: the first step is scattered into two jobs that produce the outputs ‘file1fileB’ and ‘file2fileB’, which the second step then concatenates.
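When reproducing this test outside ICA with a CWL runner such as cwltool, the same input assignment can be written as a job (input object) file. A minimal sketch, assuming a CWL workflow whose inputs are named LeftInput and RightInput like the pipeline above; the job file name and the relative paths are assumptions:

```yaml
# Hypothetical job file (e.g. test-job.yml) reproducing the test run above.
# Assumes file1.txt, file2.txt and fileB.txt are in the same directory.
LeftInput:          # multi-value input: two entries -> two scattered jobs
- class: File
  path: file1.txt
- class: File
  path: file2.txt
RightInput:         # single value, reused by every scattered job
  class: File
  path: fileB.txt
```

The expected result depends on the exact file contents: cat copies bytes verbatim, so if the test files end with a newline (as most editors add), each part will appear on its own line in the output.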