Input Form

Pipelines defined using the "Code" mode require an XML-based input form to define the fields shown on the launch view in the user interface (UI). This is defined in the "XML Configuration" tab of the pipeline editing view.

The input form XML must adhere to the input form schema.

Empty Form

During the creation of a Nextflow pipeline the user is given an empty form to fill out.

<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <dataInputs>
    </dataInputs>
    <steps>
    </steps>
</pipeline>

Files

The input files are specified within a single DataInputs node. An individual input is then specified in a separate DataInput node. A DataInput node contains following attributes:

  • code: an unique id. Required.

  • format: specifying the format of the input: FASTA, TXT, JSON, UNKNOWN, etc. Multiple entries are possible: example below. Required.

  • type: is it a FILE or a DIRECTORY? Multiple entries are not allowed. Required.

  • required: is this input required for the execution of a pipeline? Required.

  • multiValue: are multiple files as an input allowed? Required.

  • dataFilter: TBD. Optional.

Additionally, DataInput has two elements: label for labelling the input and description for a free text description of the input.

Single file input

An example of a single file input which can be in a TXT, CSV, or FASTA format.

        <pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>Input file</pd:label>
            <pd:description>Input file can be either in TXT, CSV or FASTA format.</pd:description>
        </pd:dataInput>

Folder as an input

To use a folder as an input the following form is required:

    <pd:dataInput code="fastq_folder" format="UNKNOWN" type="DIRECTORY" required="false" multiValue="false">
         <pd:label>fastq folder path</pd:label>
        <pd:description>Providing Fastq folder</pd:description>
    </pd:dataInput>

Multiple files as an input

For multiple files set the attribute multiValue to true.

<pd:dataInput code="tumor_fastqs" format="FASTQ" type="FILE" required="false" multiValue="true">
    <pd:label>Tumor FASTQs</pd:label>
    <pd:description>Tumor FASTQ files to be provided as input. FASTQ files must have "_LXXX" in its filename to denote the lane and "_RX" to denote the read number. If either is omitted, lane 1 and read 1 will be used in the FASTQ list. The tool will automatically write a FASTQ list from all files provided and process each sample in batch in tumor-only mode. However, for tumor-normal mode, only one sample each can be provided.
    </pd:description>
</pd:dataInput>

Settings

Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to, strings, booleans, integers, etc. The following hierarchy of nodes must be followed: steps > step > tool > parameter. The parameter node must contain following attributes:

  • code: unique id. This is the parameter name that is passed to the workflow

  • minValues: how many values (at least) should be specified for this setting. If this setting is required, minValues should be set to 1.

  • maxValues: how many values (at most) should be specified for this setting

  • classification: is this setting specified by the user?

In the code below a string setting with the identifier inp1 is specified.

    <pd:steps>
        <pd:step execution="MANDATORY" code="General">
            <pd:label>General</pd:label>
            <pd:description>General parameters</pd:description>
            <pd:tool code="generalparameters">
                <pd:label>generalparameters</pd:label>
                <pd:description></pd:description>
                <pd:parameter code="inp1" minValues="1" maxValues="3" classification="USER">
                    <pd:label>inp1</pd:label>
                    <pd:description>first</pd:description>
                    <pd:stringType/>
                    <pd:value></pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>

Examples of the following types of settings are shown in the subsequent sections. Within each type, the value tag can be used to denote a default value in the UI, or can be left blank to have no default. Note that setting a default value has no impact on analyses launched via the API.

Integers

For an integer setting the following schema with an element integerType is to be used. To define an allowed range use the attributes minimumValue and maximumValue.

<pd:parameter code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
    <pd:label>Seed Length</pd:label>
    <pd:description>Initial length in nucleotides of seeds from the reference genome to populate into the hash table. Consult the DRAGEN manual for recommended lengths. Corresponds to DRAGEN argument --ht-seed-len.
    </pd:description>
    <pd:integerType minimumValue="10" maximumValue="50"/>
    <pd:value>21</pd:value>
</pd:parameter>

Options

Options types can be used to designate options from a drop-down list in the UI. The selected option will be passed to the workflow as a string. This currently has no impact when launching from the API, however.

<pd:parameter code="cnv_segmentation_mode" minValues="0" maxValues="1" classification="USER">
    <pd:label>Segmentation Algorithm</pd:label>
    <pd:description> DRAGEN implements multiple segmentation algorithms, including the following algorithms, Circular Binary Segmentation (CBS) and Shifting Level Models (SLM).
    </pd:description>
    <pd:optionsType>
        <option>CBS</option>
        <option>SLM</option>
        <option>HSLM</option>
        <option>ASLM</option>
    </pd:optionsType>
    <pd:value>false</pd:value>
</pd:parameter>

Option types can also be used to specify a boolean, for example

<pd:parameter code="output_format" minValues="1" maxValues="1" classification="USER">
    <pd:label>Map/Align Output</pd:label>
    <pd:description></pd:description>
    <pd:optionsType>
        <pd:option>BAM</pd:option>
        <pd:option>CRAM</pd:option>
    </pd:optionsType>
    <pd:value>BAM</pd:value>
</pd:parameter>

Strings

For a string setting the following schema with an element stringType is to be used.

<pd:parameter code="output_file_prefix" minValues="1" maxValues="1" classification="USER">
    <pd:label>Output File Prefix</pd:label>
    <pd:description></pd:description>
    <pd:stringType/>
    <pd:value>tumor</pd:value>
</pd:parameter>

Booleans

For a boolean setting, booleanType can be used.

<pd:parameter code="quick_qc" minValues="0" maxValues="1" classification="USER">
    <pd:label>quick_qc</pd:label>
    <pd:description></pd:description>
    <pd:booleanType/>
    <pd:value></pd:value>
</pd:parameter>

Limitations

One known limitation of the schema presented above is the inability to specify a parameter that can be multiple type, e.g. File or String. One way to implement this requirement would be to define two optional parameters: one for File input and the second for String input. At the moment ICA UI doesn't validate whether at least one of these parameters is populated - this check can be done within the pipeline itself.

Below one can find both a main.nf and XML configuration of a generic pipeline with two optional inputs. One can use it as a template to address similar issues. If the file parameter is set, it will be used. If the str parameter is set but file is not, the str parameter will be used. If neither of both is used, the pipeline aborts with an informative error message.

nextflow.enable.dsl = 2

// Define parameters with default values
params.file = false
params.str = false

// Check that at least one of the parameters is specified
if (!params.file && !params.str) {
    error "You must specify at least one input: --file or --str"
}

process printInputs {
    
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    file(input_file)

    script:
    """
    echo "File contents:"
    cat $input_file
    """
}

process printInputs2 {

    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    val(input_str)

    script:
    """
    echo "String input: $input_str"
    """
}

workflow {
    if (params.file) {
        file_ch = Channel.fromPath(params.file)
        file_ch.view()
        str_ch = Channel.empty()
        printInputs(file_ch)
    }
    else {
        file_ch = Channel.empty()
        str_ch = Channel.of(params.str)
        str_ch.view()
        file_ch.view()
        printInputs2(str_ch)
    } 
}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="file" format="TXT" type="FILE" required="false" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>Generic file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps>
        <pd:step execution="MANDATORY" code="general">
            <pd:label>General Options</pd:label>
            <pd:description locked="false"></pd:description>
            <pd:tool code="general">
                <pd:label locked="false"></pd:label>
                <pd:description locked="false"></pd:description>
                <pd:parameter code="str" minValues="0" maxValues="1" classification="USER">
                    <pd:label>String</pd:label>
                    <pd:description></pd:description>
                    <pd:stringType/>
                    <pd:value>string</pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
</pd:pipeline>

Last updated