
CWL: Scatter-gather Method

The following guide shows how to implement scattering in a CWL pipeline on the ICA platform.
First, create a simple tool at System Settings > Tool Repository > + New Tool with two single-value file inputs called left and right. This tool simply concatenates the contents of the given input files.
  • Paste the content below into the Tool CWL tab.
  • On the Information tab, enter a tool version and select a Docker image that contains the cat command (for example, alpine).
```yaml
#!/usr/bin/env cwl-runner
cwlVersion: cwl:v1.0
class: CommandLineTool
requirements:
- class: ScatterFeatureRequirement
hints:
- class: InlineJavascriptRequirement
label: catstdoutsinglevalue
doc: 'cat tool for testing: a simple tool with 2 single-value file inputs called left
  and right. It just ‘cats’ the content of the given input files.'
stdout: $(inputs.outputName)
inputs:
  left:
    type: File
    inputBinding:
      position: 1
  right:
    type:
    - File
    - 'null'
    inputBinding:
      position: 2
  outputName:
    type: string
    doc: Output file name
outputs:
  tool_output:
    type: stdout
baseCommand:
- cat
```
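Before wiring the tool into a pipeline, it can help to check its behaviour locally. The input object below is a minimal sketch for the reference runner cwltool, used outside ICA. The file names catstdoutsinglevalue.cwl, job_single.yml and single_out.txt are hypothetical; file1.txt and fileB.txt are the test files described at the end of this guide.

```yaml
# Hypothetical job file (job_single.yml) for a local test of catstdoutsinglevalue.
# Assumes the tool CWL above is saved as catstdoutsinglevalue.cwl and run with:
#   cwltool catstdoutsinglevalue.cwl job_single.yml
left:
  class: File
  path: file1.txt
right:
  class: File
  path: fileB.txt
outputName: single_out.txt  # used as the stdout file name via $(inputs.outputName)
```

If neither input file ends with a newline, single_out.txt should contain file1fileB, which is what a single scattered instance produces in the pipeline.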
Now create another tool in the same way but with multi-value inputs:
```yaml
#!/usr/bin/env cwl-runner
cwlVersion: cwl:v1.0
class: CommandLineTool
requirements:
- class: ScatterFeatureRequirement
hints:
- class: InlineJavascriptRequirement
label: catstdout2
doc: 'This cat tool can be used as a workaround for scatter-gather approach on ICA:
  this step takes in the output array of the scattered step. The output of this step
  can be mapped as pipeline output.'
stdout: $(inputs.outputName)
inputs:
  left:
    type:
      type: array
      items: File
    inputBinding:
      position: 1
  right:
    type:
    - type: array
      items: File
    - 'null'
    inputBinding:
      position: 2
  outputName:
    type: string
    doc: Output file name
outputs:
  tool_output:
    type: stdout
baseCommand:
- cat
```
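The array types change how inputs are supplied: left now takes a list of files, and right is optional (an array of files or null). A minimal local input object for cwltool, again with hypothetical file names, could look like this:

```yaml
# Hypothetical job file (job_array.yml) for a local test of catstdout2.
# Assumes the tool CWL above is saved as catstdout2.cwl and run with:
#   cwltool catstdout2.cwl job_array.yml
left:                  # File[]: each item is passed to cat as a separate argument
  - class: File
    path: file1.txt
  - class: File
    path: file2.txt
# right is optional (File[] or null), so it is omitted here
outputName: gathered_out.txt
```

This roughly corresponds to running cat on file1.txt and then file2.txt, so gathered_out.txt should contain file1file2 when the input files have no trailing newline.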
Next, build a graphical CWL pipeline using these tools at Projects > your_project > Flow > Pipelines:
  • On the Definition tab, go to the tool repository and drag and drop the two tools you just created onto the pipeline editor.
  • Connect the catstdoutsinglevalue output to catstdout2: hover over the middle of the round, blue connector of catstdoutsinglevalue until the cursor changes to a hand, then drag the connection to catstdout2. You can use the magnification symbols to make it easier to connect these tools.
  • Above the diagram, drag and drop two input files and an output file onto the pipeline editor and connect the blue markers to match the diagram below.
[Diagram: the scatter-gather pipeline]
Relevant aspects of this pipeline:
  • LeftInput is multi-value.
  • The step ‘catstdoutsinglevalue’ has scattering configured: it scatters on the input named ‘left’. This means that one instance of this step is executed for each entry in the LeftInput array. To indicate that this step is executed multiple times, the icon of the LeftInput has doubled borders.
  • Directly mapping the outputs of a scattered tool to pipeline outputs is currently not supported. As a workaround, the ‘catstdout2’ step takes in the output array of the scattered step, and the output of this step can be mapped as a pipeline output.
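For readers who think in text rather than in the graphical editor, the pipeline roughly corresponds to the plain CWL Workflow below. This is a sketch under assumptions, not the CWL that ICA generates: the tool file names, the output name FinalOutput and the default outputName values are hypothetical, since this guide does not say how outputName is set in the graphical pipeline. The essential parts are scatter: left on the first step and the resulting array output feeding the left input of catstdout2.

```yaml
#!/usr/bin/env cwl-runner
# Sketch of a text equivalent of the graphical scatter-gather pipeline.
# Tool file names and default output names are hypothetical.
cwlVersion: v1.0
class: Workflow
requirements:
- class: ScatterFeatureRequirement   # needed for the scatter field below
inputs:
  LeftInput:
    type: File[]                     # multi-value input that drives the scatter
  RightInput:
    type: File?                      # single file, passed unchanged to every instance
outputs:
  FinalOutput:
    type: File
    outputSource: catstdout2/tool_output   # gathered result mapped as pipeline output
steps:
  catstdoutsinglevalue:
    run: catstdoutsinglevalue.cwl
    scatter: left                    # one instance per entry in LeftInput
    in:
      left: LeftInput
      right: RightInput
      outputName:
        default: scattered.txt       # hypothetical; each scattered job has its own output directory
    out: [tool_output]               # becomes File[] because the step is scattered
  catstdout2:
    run: catstdout2.cwl
    in:
      left: catstdoutsinglevalue/tool_output   # gather: the array of scattered outputs
      outputName:
        default: gathered.txt        # hypothetical name
    out: [tool_output]
```

Routing the scattered File[] through catstdout2 mirrors the workaround described above: the pipeline output is taken from a regular, non-scattered step.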
To test your configuration:
  • Create three text files: file1.txt containing ‘file1’, file2.txt containing ‘file2’ and fileB.txt containing ‘fileB’. Make sure the files do not end with a newline; otherwise the concatenated output will contain line breaks.
  • Upload these files at Projects > your_project > Data and use them as inputs in the following arrangement: file1.txt and file2.txt for LeftInput, and fileB.txt for RightInput.
  • Run your pipeline at Projects > your_project > Flow > Pipelines by selecting your pipeline and choosing Start New Analysis.
  • The result is an output file with the content ‘file1fileBfile2fileB’. The first step was scattered, producing two outputs, ‘file1fileB’ and ‘file2fileB’, which the second step then concatenates.
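If you rewrite the graphical pipeline as a plain CWL Workflow whose inputs are named LeftInput (File[]) and RightInput (File), the same test can be reproduced locally with cwltool using an input object like the one below. The workflow file name scatter_gather.cwl and the job file name are hypothetical.

```yaml
# Hypothetical job file (job_pipeline.yml) mirroring the test arrangement above.
# Assumes a CWL Workflow with inputs LeftInput (File[]) and RightInput (File),
# run locally with:
#   cwltool scatter_gather.cwl job_pipeline.yml
LeftInput:            # scattered: one step instance per file
  - class: File
    path: file1.txt
  - class: File
    path: file2.txt
RightInput:           # shared by every scattered instance
  class: File
  path: fileB.txt
```

Because the output array of a scattered step keeps the order of the scattered input, the gathered output should again read file1fileBfile2fileB when the input files have no trailing newline.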