# XML Input Form

Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-2d2071cf45820bb5f3d9d6e9d9650ec8402bbe5f%2Fimage%20(74).png?alt=media" alt=""><figcaption></figcaption></figure>

The input form XML must adhere to the input form schema.

## Empty Form

When a Nextflow pipeline is created, the user is given the following empty form to fill out:

{% code overflow="wrap" %}

```xml
<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <dataInputs>
    </dataInputs>
    <steps>
    </steps>
</pipeline>
```

{% endcode %}

## Files

The input files are specified within a single **dataInputs** node. Each individual input is specified in its own **dataInput** node, which has the following attributes:

* *code*: a unique ID. Required.
* *format*: the format of the input, e.g. FASTA, TXT, JSON, or UNKNOWN. Multiple comma-separated formats are allowed (see the example below). Required.
* *type*: either FILE or DIRECTORY. Only one value is allowed. Required.
* *required*: whether the input must be provided to run the pipeline. Required.
* *multiValue*: whether more than one file can be provided for this input. Required.
* *dataFilter*: TBD. Optional.

Additionally, **dataInput** contains two child elements: *label*, which labels the input, and *description*, a free-text description of the input.
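To make the attribute rules above concrete, here is a minimal Python sketch (standard library only) that checks each **dataInput** of a form against them and splits a comma-separated *format* value. The helper names `check_data_inputs` and `parse_formats` are hypothetical, the sample form is made up, and the script only encodes the rules listed on this page; it is not a replacement for validation against the actual input form schema.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the pipeline definition examples on this page
NS = {"pd": "xsd://www.illumina.com/ica/cp/pipelinedefinition"}
REQUIRED_ATTRS = ("code", "format", "type", "required", "multiValue")


def parse_formats(format_attr: str) -> list:
    """Split a comma-separated format attribute such as "TXT, CSV, FASTA"."""
    return [part.strip() for part in format_attr.split(",") if part.strip()]


def check_data_inputs(form_xml: str) -> list:
    """Return human-readable problems found in the dataInput nodes (hypothetical helper)."""
    root = ET.fromstring(form_xml)
    problems = []
    seen_codes = set()
    for node in root.iterfind("pd:dataInputs/pd:dataInput", NS):
        code = node.get("code", "<no code>")
        # code, format, type, required and multiValue are all mandatory
        for attr in REQUIRED_ATTRS:
            if node.get(attr) is None:
                problems.append(f"{code}: missing attribute '{attr}'")
        if code in seen_codes:
            problems.append(f"{code}: code is not unique")
        seen_codes.add(code)
        # type takes exactly one value
        if node.get("type") not in (None, "FILE", "DIRECTORY"):
            problems.append(f"{code}: type must be FILE or DIRECTORY")
        # Assumption: boolean flags are written as true/false, as in the examples
        for flag in ("required", "multiValue"):
            if node.get(flag) not in (None, "true", "false"):
                problems.append(f"{code}: {flag} must be true or false")
    return problems


# Made-up sample; labels and descriptions are omitted to keep it short
SAMPLE_FORM = """\
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
  <pd:dataInputs>
    <pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false"/>
    <pd:dataInput code="fastq_folder" format="UNKNOWN" type="FOLDER" required="false"/>
  </pd:dataInputs>
</pd:pipeline>"""

print(check_data_inputs(SAMPLE_FORM))
print(parse_formats("TXT, CSV, FASTA"))  # ['TXT', 'CSV', 'FASTA']
```

Because the namespace is matched by URI rather than by prefix, the same helper also accepts forms that declare it as the default namespace (as in the empty form) instead of using the `pd:` prefix.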

### Single file input

An example of a single file input that can be in TXT, CSV, or FASTA format:

{% code overflow="wrap" %}

```xml
<pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
    <pd:label>Input file</pd:label>
    <pd:description>Input file can be either in TXT, CSV or FASTA format.</pd:description>
</pd:dataInput>
```

{% endcode %}

### Folder as an input

To use a folder as an input, set *type* to DIRECTORY, as in the following example:

{% code overflow="wrap" %}

```xml
<pd:dataInput code="fastq_folder" format="UNKNOWN" type="DIRECTORY" required="false" multiValue="false">
    <pd:label>fastq folder path</pd:label>
    <pd:description>Providing Fastq folder</pd:description>
</pd:dataInput>
```

{% endcode %}

### Multiple files as an input

To accept multiple files, set the *multiValue* attribute to `true`. The variable passed to the pipeline is then treated as a **list** (`[]`) rather than a single value, so adapt your pipeline code when switching an input from single-value to multi-value.

{% code overflow="wrap" %}

```xml
<pd:dataInput code="tumor_fastqs" format="FASTQ" type="FILE" required="false" multiValue="true">
    <pd:label>Tumor FASTQs</pd:label>
    <pd:description>Tumor FASTQ files to be provided as input. FASTQ files must have "_LXXX" in their filenames to denote the lane and "_RX" to denote the read number. If either is omitted, lane 1 and read 1 will be used in the FASTQ list. The tool will automatically write a FASTQ list from all files provided and process each sample in batch in tumor-only mode. However, for tumor-normal mode, only one sample each can be provided.
    </pd:description>
</pd:dataInput>
```

{% endcode %}

## Settings

Settings (as opposed to files) are specified within the **steps** node. Settings represent any non-file input to the workflow, including, but not limited to, strings, booleans, and integers. The following hierarchy of nodes must be followed: *steps* > *step* > *tool* > *parameter*. The *parameter* node must contain the following attributes:

* *code*: a unique ID. This is the parameter name that is passed to the workflow.
* *minValues*: the minimum number of values that must be specified for this setting. If the setting is required, set `minValues` to 1.
* *maxValues*: the maximum number of values that can be specified for this setting.
* *classification*: whether the setting is specified by the user. All examples on this page use `USER`.

The following example defines a string setting with the identifier *inp1*:

{% code overflow="wrap" %}

```xml
    <pd:steps>
        <pd:step execution="MANDATORY" code="General">
            <pd:label>General</pd:label>
            <pd:description>General parameters</pd:description>
            <pd:tool code="generalparameters">
                <pd:label>generalparameters</pd:label>
                <pd:description></pd:description>
                <pd:parameter code="inp1" minValues="1" maxValues="3" classification="USER">
                    <pd:label>inp1</pd:label>
                    <pd:description>first</pd:description>
                    <pd:stringType/>
                    <pd:value></pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
```

{% endcode %}
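The required *steps* > *step* > *tool* > *parameter* nesting can be walked with Python's standard-library XML parser. The following sketch (hypothetical helper `list_parameters`) summarises each parameter and treats a `minValues` of 1 or more as "required", following the attribute description above; the sample is the *inp1* example with labels and descriptions removed.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the pipeline definition examples on this page
NS = {"pd": "xsd://www.illumina.com/ica/cp/pipelinedefinition"}


def list_parameters(form_xml: str) -> list:
    """Summarise every parameter reachable via steps > step > tool > parameter (hypothetical helper)."""
    root = ET.fromstring(form_xml)
    summary = []
    for step in root.iterfind("pd:steps/pd:step", NS):
        for tool in step.iterfind("pd:tool", NS):
            for param in tool.iterfind("pd:parameter", NS):
                # These attributes are mandatory, so a missing one raises KeyError
                code = param.attrib["code"]
                min_values = int(param.attrib["minValues"])
                max_values = int(param.attrib["maxValues"])
                if min_values > max_values:
                    raise ValueError(f"{code}: minValues is greater than maxValues")
                summary.append({
                    "path": f"{step.get('code')}/{tool.get('code')}/{code}",
                    "min": min_values,
                    "max": max_values,
                    # minValues >= 1 means at least one value must be supplied
                    "required": min_values >= 1,
                    "classification": param.attrib["classification"],
                })
    return summary


SAMPLE_FORM = """\
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
  <pd:steps>
    <pd:step execution="MANDATORY" code="General">
      <pd:tool code="generalparameters">
        <pd:parameter code="inp1" minValues="1" maxValues="3" classification="USER">
          <pd:stringType/>
          <pd:value></pd:value>
        </pd:parameter>
      </pd:tool>
    </pd:step>
  </pd:steps>
</pd:pipeline>"""

for entry in list_parameters(SAMPLE_FORM):
    print(entry)
```

Parameters placed outside this hierarchy are simply not found by the nested loops, which is one quick way to spot a misplaced node.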

The following sections show examples of each setting type. Within each type, the `value` tag can be used to set a default value in the UI, or left blank for no default. Note that setting a default value has **no impact on analyses launched via the API.**
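As a sketch of how blank and non-blank `value` tags differ, the following Python snippet (hypothetical helper `ui_defaults`, made-up sample form) maps each parameter code to its default, using `None` when the tag is empty. Since these defaults only pre-fill the UI, a script launching analyses through the API would presumably need to supply such values itself.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Namespace URI copied from the pipeline definition examples on this page
NS_URI = "xsd://www.illumina.com/ica/cp/pipelinedefinition"
NS = {"pd": NS_URI}


def ui_defaults(form_xml: str) -> Dict[str, Optional[str]]:
    """Map each parameter code to its UI default, or None when <value> is blank (hypothetical helper)."""
    root = ET.fromstring(form_xml)
    defaults = {}
    # iter() with a fully qualified tag finds parameter nodes at any depth
    for param in root.iter(f"{{{NS_URI}}}parameter"):
        # findtext returns "" for an empty <value></value> element
        text = param.findtext("pd:value", default="", namespaces=NS).strip()
        defaults[param.get("code")] = text or None
    return defaults


SAMPLE_FORM = """\
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
  <pd:steps>
    <pd:step execution="MANDATORY" code="general">
      <pd:tool code="general">
        <pd:parameter code="output_file_prefix" minValues="1" maxValues="1" classification="USER">
          <pd:stringType/>
          <pd:value>tumor</pd:value>
        </pd:parameter>
        <pd:parameter code="quick_qc" minValues="0" maxValues="1" classification="USER">
          <pd:booleanType/>
          <pd:value></pd:value>
        </pd:parameter>
      </pd:tool>
    </pd:step>
  </pd:steps>
</pd:pipeline>"""

print(ui_defaults(SAMPLE_FORM))  # {'output_file_prefix': 'tumor', 'quick_qc': None}
```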

### Integers

For an integer setting, use the *integerType* element. To restrict the allowed range, set the *minimumValue* and *maximumValue* attributes.

{% code overflow="wrap" %}

```xml
<pd:parameter code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
    <pd:label>Seed Length</pd:label>
    <pd:description>Initial length in nucleotides of seeds from the reference genome to populate into the hash table. Consult the DRAGEN manual for recommended lengths. Corresponds to DRAGEN argument --ht-seed-len.
    </pd:description>
    <pd:integerType minimumValue="10" maximumValue="50"/>
    <pd:value>21</pd:value>
</pd:parameter>
```

{% endcode %}
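The *minimumValue* and *maximumValue* attributes can also be used to sanity-check the default in `value` before saving a form. This is a minimal Python sketch with a hypothetical helper; it assumes both bounds are inclusive and that either bound may be omitted, which this page does not state explicitly.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the pipeline definition examples on this page
NS = {"pd": "xsd://www.illumina.com/ica/cp/pipelinedefinition"}


def integer_default_in_range(param_xml: str) -> bool:
    """Check an integerType default against minimumValue/maximumValue (hypothetical helper)."""
    param = ET.fromstring(param_xml)
    int_type = param.find("pd:integerType", NS)
    if int_type is None:
        raise ValueError("parameter has no integerType element")
    value = param.findtext("pd:value", default="", namespaces=NS).strip()
    if not value:
        return True  # blank <value>: no default to check
    number = int(value)
    # Assumption: bounds are inclusive and optional (a missing bound does not restrict)
    low = int_type.get("minimumValue")
    high = int_type.get("maximumValue")
    if low is not None and number < int(low):
        return False
    if high is not None and number > int(high):
        return False
    return True


# The seed-length parameter, with the namespace declared so the fragment parses alone
SEED_LEN = """\
<pd:parameter xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition"
              code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
  <pd:integerType minimumValue="10" maximumValue="50"/>
  <pd:value>21</pd:value>
</pd:parameter>"""

print(integer_default_in_range(SEED_LEN))  # True
print(integer_default_in_range(SEED_LEN.replace(">21<", ">60<")))  # False
```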

### Options

Options types can be used to present a drop-down list of choices in the UI. The selected option is passed to the workflow as a string. However, this currently has no effect when launching from the API.

{% code overflow="wrap" %}

```xml
<pd:parameter code="cnv_segmentation_mode" minValues="0" maxValues="1" classification="USER">
    <pd:label>Segmentation Algorithm</pd:label>
    <pd:description>DRAGEN implements multiple segmentation algorithms, including Circular Binary Segmentation (CBS) and Shifting Level Models (SLM).
    </pd:description>
    <pd:optionsType>
        <pd:option>CBS</pd:option>
        <pd:option>SLM</pd:option>
        <pd:option>HSLM</pd:option>
        <pd:option>ASLM</pd:option>
    </pd:optionsType>
    <pd:value></pd:value>
</pd:parameter>
```

{% endcode %}
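Since the selected option is passed to the workflow as a string, a default in `value` is presumably only meaningful if it matches one of the listed options. The Python sketch below (hypothetical helper `options_default_is_valid`) flags a non-blank default that is not in the option list, such as `false` on a list of segmentation algorithms.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the pipeline definition examples on this page
NS = {"pd": "xsd://www.illumina.com/ica/cp/pipelinedefinition"}


def options_default_is_valid(param_xml: str) -> bool:
    """Return True when <value> is blank or names one of the listed options (hypothetical helper)."""
    param = ET.fromstring(param_xml)
    options = [(opt.text or "").strip() for opt in param.iterfind("pd:optionsType/pd:option", NS)]
    if not options:
        raise ValueError("parameter has no optionsType/option elements")
    value = param.findtext("pd:value", default="", namespaces=NS).strip()
    return value == "" or value in options


SEGMENTATION = """\
<pd:parameter xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition"
              code="cnv_segmentation_mode" minValues="0" maxValues="1" classification="USER">
  <pd:optionsType>
    <pd:option>CBS</pd:option>
    <pd:option>SLM</pd:option>
    <pd:option>HSLM</pd:option>
    <pd:option>ASLM</pd:option>
  </pd:optionsType>
  <pd:value>false</pd:value>
</pd:parameter>"""

print(options_default_is_valid(SEGMENTATION))  # False: "false" is not an option
print(options_default_is_valid(SEGMENTATION.replace(">false<", ">SLM<")))  # True
```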

Option types can also be used for a choice between two values, similar to a boolean, for example:

{% code overflow="wrap" %}

```xml
<pd:parameter code="output_format" minValues="1" maxValues="1" classification="USER">
    <pd:label>Map/Align Output</pd:label>
    <pd:description></pd:description>
    <pd:optionsType>
        <pd:option>BAM</pd:option>
        <pd:option>CRAM</pd:option>
    </pd:optionsType>
    <pd:value>BAM</pd:value>
</pd:parameter>
```

{% endcode %}

### Strings

For a string setting, use the `stringType` element.

{% code overflow="wrap" %}

```xml
<pd:parameter code="output_file_prefix" minValues="1" maxValues="1" classification="USER">
    <pd:label>Output File Prefix</pd:label>
    <pd:description></pd:description>
    <pd:stringType/>
    <pd:value>tumor</pd:value>
</pd:parameter>
```

{% endcode %}

### Booleans

For a boolean setting, use the `booleanType` element.

```xml
<pd:parameter code="quick_qc" minValues="0" maxValues="1" classification="USER">
    <pd:label>quick_qc</pd:label>
    <pd:description></pd:description>
    <pd:booleanType/>
    <pd:value></pd:value>
</pd:parameter>
```

## Limitations

One known limitation of the schema presented above is that a parameter cannot accept more than one type, e.g. either a file or a string. One workaround is to define two optional parameters: one for the file input and one for the string input. The ICA UI currently does not validate that at least one of these parameters is populated, so this check can be done within the pipeline itself.

Below are both the `main.nf` and the XML configuration of a generic pipeline with two optional inputs, which can be used as a template for similar cases. If the *file* parameter is set, it is used. If the *str* parameter is set but *file* is not, *str* is used. If neither is set, the pipeline aborts with an informative error message.

```groovy
nextflow.enable.dsl = 2

// Define parameters with default values
params.file = false
params.str = false

// Check that at least one of the parameters is specified
if (!params.file && !params.str) {
    error "You must specify at least one input: --file or --str"
}

process printInputs {
    
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    path(input_file)

    script:
    """
    echo "File contents:"
    cat $input_file
    """
}

process printInputs2 {

    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    val(input_str)

    script:
    """
    echo "String input: $input_str"
    """
}

workflow {
    if (params.file) {
        // A file was provided, so it takes precedence over the string input
        file_ch = Channel.fromPath(params.file)
        file_ch.view()
        printInputs(file_ch)
    }
    else {
        // No file was provided, so fall back to the string input
        str_ch = Channel.of(params.str)
        str_ch.view()
        printInputs2(str_ch)
    }
}
```

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="file" format="TXT" type="FILE" required="false" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>Generic file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps>
        <pd:step execution="MANDATORY" code="general">
            <pd:label>General Options</pd:label>
            <pd:description locked="false"></pd:description>
            <pd:tool code="general">
                <pd:label locked="false"></pd:label>
                <pd:description locked="false"></pd:description>
                <pd:parameter code="str" minValues="0" maxValues="1" classification="USER">
                    <pd:label>String</pd:label>
                    <pd:description></pd:description>
                    <pd:stringType/>
                    <pd:value>string</pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
</pd:pipeline>
```
