# XML Input Form

Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.

<figure><img src="/files/gZKFQC35Sjcy9f5Pgl1x" alt=""><figcaption></figcaption></figure>

The input form XML must adhere to the input form schema.

## Empty Form

During the creation of a Nextflow pipeline the user is given an empty form to fill out.

{% code overflow="wrap" %}

```xml
<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <dataInputs>
    </dataInputs>
    <steps>
    </steps>
</pipeline>
```

{% endcode %}

## Files

The input files are specified within a single **dataInputs** node. Each individual input is then specified in its own **dataInput** node. A **dataInput** node has the following attributes:

* *code*: a unique ID. Required.
* *format*: the format of the input: FASTA, TXT, JSON, UNKNOWN, etc. Multiple comma-separated entries are possible (see the single file example below). Required.
* *type*: whether the input is a FILE or a DIRECTORY. Multiple entries are not allowed. Required.
* *required*: whether this input is required to run the pipeline. Required.
* *multiValue*: whether multiple files are allowed for this input. Required.
* *dataFilter*: TBD. Optional.

Additionally, **dataInput** has two child elements: *label* for labelling the input and *description* for a free-text description of the input.
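
As a rough illustration of how these attributes fit together, the following Python sketch reads a form with the standard library and summarizes each data input. It is a local pre-check only, not part of ICA: the function name `summarize_data_inputs` and the required-attribute list are assumptions derived from the list above, and the `pd` prefix is bound to the namespace used elsewhere on this page.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the pipeline element shown on this page.
NS = {"pd": "xsd://www.illumina.com/ica/cp/pipelinedefinition"}

# Attributes marked "Required" in the list above (dataFilter is optional).
REQUIRED_ATTRS = ("code", "format", "type", "required", "multiValue")

FORM = """<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>Input file</pd:label>
            <pd:description>Input file can be either in TXT, CSV or FASTA format.</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>"""


def summarize_data_inputs(xml_text):
    """Hypothetical helper: return one dict per dataInput node."""
    root = ET.fromstring(xml_text)
    summaries = []
    for node in root.findall("pd:dataInputs/pd:dataInput", NS):
        missing = [attr for attr in REQUIRED_ATTRS if attr not in node.attrib]
        if missing:
            raise ValueError(f"dataInput is missing attributes: {missing}")
        summaries.append({
            "code": node.get("code"),
            # Multiple formats are written as one comma-separated string.
            "formats": [fmt.strip() for fmt in node.get("format").split(",")],
            "type": node.get("type"),
            "required": node.get("required") == "true",
            "multi_value": node.get("multiValue") == "true",
            "label": node.findtext("pd:label", default="", namespaces=NS),
        })
    return summaries


if __name__ == "__main__":
    for entry in summarize_data_inputs(FORM):
        print(entry)
```

A check like this can catch a missing attribute before the form is pasted into the "XML Configuration" tab, but it does not replace validation against the input form schema.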

### Single file input

The following example defines a single file input that can be in TXT, CSV, or FASTA format.

{% code overflow="wrap" %}

```xml
        <pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>Input file</pd:label>
            <pd:description>Input file can be either in TXT, CSV or FASTA format.</pd:description>
        </pd:dataInput>
```

{% endcode %}

### Folder as an input

To use a folder as an input the following form is required:

{% code overflow="wrap" %}

```xml
    <pd:dataInput code="fastq_folder" format="UNKNOWN" type="DIRECTORY" required="false" multiValue="false">
        <pd:label>fastq folder path</pd:label>
        <pd:description>Providing Fastq folder</pd:description>
    </pd:dataInput>
```

{% endcode %}

### Multiple files as an input

For multiple files, set the attribute *multiValue* to true. The variable is then treated as a **list (\[])**, so adapt your pipeline when changing an input from single value to multiValue.

{% code overflow="wrap" %}

```xml
<pd:dataInput code="tumor_fastqs" format="FASTQ" type="FILE" required="false" multiValue="true">
    <pd:label>Tumor FASTQs</pd:label>
    <pd:description>Tumor FASTQ files to be provided as input. FASTQ files must have "_LXXX" in their filenames to denote the lane and "_RX" to denote the read number. If either is omitted, lane 1 and read 1 will be used in the FASTQ list. The tool will automatically write a FASTQ list from all files provided and process each sample in batch in tumor-only mode. However, for tumor-normal mode, only one sample each can be provided.
    </pd:description>
</pd:dataInput>
```

{% endcode %}

## Settings

Settings (as opposed to files) are specified within the **steps** node. Settings represent any non-file input to the workflow, including, but not limited to, strings, booleans, and integers. The following hierarchy of nodes must be followed: *steps* > *step* > *tool* > *parameter*. The *parameter* node must contain the following attributes:

* *code*: unique id. This is the parameter name that is passed to the workflow
* *minValues*: how many values (at least) should be specified for this setting. If this setting is required, `minValues` should be set to 1.
* *maxValues*: how many values (at most) should be specified for this setting
* *classification*: is this setting specified by the user?

In the code below a string setting with the identifier *inp1* is specified.

{% code overflow="wrap" %}

```xml
    <pd:steps>
        <pd:step execution="MANDATORY" code="General">
            <pd:label>General</pd:label>
            <pd:description>General parameters</pd:description>
            <pd:tool code="generalparameters">
                <pd:label>generalparameters</pd:label>
                <pd:description></pd:description>
                <pd:parameter code="inp1" minValues="1" maxValues="3" classification="USER">
                    <pd:label>inp1</pd:label>
                    <pd:description>first</pd:description>
                    <pd:stringType/>
                    <pd:value></pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
```

{% endcode %}
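
To make the roles of `minValues` and `maxValues` concrete, here is a minimal Python sketch that walks the *steps* > *step* > *tool* > *parameter* hierarchy and checks whether a given number of values fits the declared bounds. The helper name `value_count_ok` is hypothetical, and treating both bounds as inclusive is an assumption; this is not how ICA itself validates the form.

```python
import xml.etree.ElementTree as ET

NS = {"pd": "xsd://www.illumina.com/ica/cp/pipelinedefinition"}

# The inp1 example from above, with the namespace declared on the root
# so the fragment can be parsed on its own.
STEPS = """<pd:steps xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:step execution="MANDATORY" code="General">
        <pd:label>General</pd:label>
        <pd:description>General parameters</pd:description>
        <pd:tool code="generalparameters">
            <pd:label>generalparameters</pd:label>
            <pd:description></pd:description>
            <pd:parameter code="inp1" minValues="1" maxValues="3" classification="USER">
                <pd:label>inp1</pd:label>
                <pd:description>first</pd:description>
                <pd:stringType/>
                <pd:value></pd:value>
            </pd:parameter>
        </pd:tool>
    </pd:step>
</pd:steps>"""


def value_count_ok(steps_xml, code, values):
    """Hypothetical check: is len(values) within [minValues, maxValues]?"""
    root = ET.fromstring(steps_xml)
    # The path mirrors the required hierarchy below the steps node.
    for param in root.findall("pd:step/pd:tool/pd:parameter", NS):
        if param.get("code") == code:
            low = int(param.get("minValues"))
            high = int(param.get("maxValues"))
            return low <= len(values) <= high
    raise KeyError(f"no parameter with code {code!r}")


if __name__ == "__main__":
    print(value_count_ok(STEPS, "inp1", ["a", "b"]))
    print(value_count_ok(STEPS, "inp1", []))
```

With `minValues="1"` the setting is effectively required, so an empty list fails the check, while one to three values pass.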

Examples of the following types of settings are shown in the subsequent sections. Within each type, the `value` tag can be used to denote a default value in the UI, or can be left blank to have no default. Note that setting a default value has **no impact on analyses launched via the API.**

### Integers

For an integer setting, use the *integerType* element. To define an allowed range, use the attributes *minimumValue* and *maximumValue*.

{% code overflow="wrap" %}

```xml
<pd:parameter code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
    <pd:label>Seed Length</pd:label>
    <pd:description>Initial length in nucleotides of seeds from the reference genome to populate into the hash table. Consult the DRAGEN manual for recommended lengths. Corresponds to DRAGEN argument --ht-seed-len.
    </pd:description>
    <pd:integerType minimumValue="10" maximumValue="50"/>
    <pd:value>21</pd:value>
</pd:parameter>
```

{% endcode %}
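
The range attributes can be read back from the form, for example to pre-check a value before launching. The Python sketch below assumes that *minimumValue* and *maximumValue* are inclusive bounds, which this page does not state; `integer_in_range` is a hypothetical helper, and the description text is shortened for the sketch.

```python
import xml.etree.ElementTree as ET

NS = {"pd": "xsd://www.illumina.com/ica/cp/pipelinedefinition"}

# Same parameter as above; description shortened for this sketch.
SEED_LEN = """<pd:parameter xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
    <pd:label>Seed Length</pd:label>
    <pd:description>Corresponds to DRAGEN argument --ht-seed-len.</pd:description>
    <pd:integerType minimumValue="10" maximumValue="50"/>
    <pd:value>21</pd:value>
</pd:parameter>"""


def integer_in_range(param_xml, candidate):
    """Hypothetical check of candidate against the integerType bounds.

    Assumption: both bounds are inclusive; a missing bound means "no limit".
    """
    param = ET.fromstring(param_xml)
    int_type = param.find("pd:integerType", NS)
    if int_type is None:
        raise ValueError("parameter is not an integer setting")
    low = int_type.get("minimumValue")
    high = int_type.get("maximumValue")
    if low is not None and candidate < int(low):
        return False
    if high is not None and candidate > int(high):
        return False
    return True


if __name__ == "__main__":
    default = int(ET.fromstring(SEED_LEN).findtext("pd:value", namespaces=NS))
    print(default, integer_in_range(SEED_LEN, default))
```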

### Options

Option types can be used to offer a fixed set of options in a drop-down list in the UI. The selected option is passed to the workflow as a string. Note that this currently has no impact when launching from the API.

{% code overflow="wrap" %}

```xml
<pd:parameter code="cnv_segmentation_mode" minValues="0" maxValues="1" classification="USER">
    <pd:label>Segmentation Algorithm</pd:label>
    <pd:description>DRAGEN implements multiple segmentation algorithms, including Circular Binary Segmentation (CBS) and Shifting Level Models (SLM).
    </pd:description>
    <pd:optionsType>
        <pd:option>CBS</pd:option>
        <pd:option>SLM</pd:option>
        <pd:option>HSLM</pd:option>
        <pd:option>ASLM</pd:option>
    </pd:optionsType>
    <pd:value></pd:value>
</pd:parameter>
```

{% endcode %}

Option types can also be used for a two-way choice, such as a boolean (with the options true and false) or, as in the following example, an output format:

{% code overflow="wrap" %}

```xml
<pd:parameter code="output_format" minValues="1" maxValues="1" classification="USER">
    <pd:label>Map/Align Output</pd:label>
    <pd:description></pd:description>
    <pd:optionsType>
        <pd:option>BAM</pd:option>
        <pd:option>CRAM</pd:option>
    </pd:optionsType>
    <pd:value>BAM</pd:value>
</pd:parameter>
```

{% endcode %}
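
Because the default in `value` is free text, it is easy for it to drift out of sync with the listed options. The following Python sketch checks that a non-empty default matches one of the `option` entries. It rests on the assumption that a default should always be one of the options; the helper name `default_is_listed` is hypothetical and the check is not part of ICA.

```python
import xml.etree.ElementTree as ET

NS = {"pd": "xsd://www.illumina.com/ica/cp/pipelinedefinition"}

OUTPUT_FORMAT = """<pd:parameter xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="output_format" minValues="1" maxValues="1" classification="USER">
    <pd:label>Map/Align Output</pd:label>
    <pd:description></pd:description>
    <pd:optionsType>
        <pd:option>BAM</pd:option>
        <pd:option>CRAM</pd:option>
    </pd:optionsType>
    <pd:value>BAM</pd:value>
</pd:parameter>"""


def default_is_listed(param_xml):
    """Hypothetical check: an empty default is fine, otherwise it must be an option."""
    param = ET.fromstring(param_xml)
    options = [
        (opt.text or "").strip()
        for opt in param.findall("pd:optionsType/pd:option", NS)
    ]
    default = (param.findtext("pd:value", default="", namespaces=NS) or "").strip()
    return default == "" or default in options


if __name__ == "__main__":
    print(default_is_listed(OUTPUT_FORMAT))
```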

### Strings

For a string setting, use the `stringType` element.

{% code overflow="wrap" %}

```xml
<pd:parameter code="output_file_prefix" minValues="1" maxValues="1" classification="USER">
    <pd:label>Output File Prefix</pd:label>
    <pd:description></pd:description>
    <pd:stringType/>
    <pd:value>tumor</pd:value>
</pd:parameter>
```

{% endcode %}

### Booleans

For a boolean setting, `booleanType` can be used.

```xml
<pd:parameter code="quick_qc" minValues="0" maxValues="1" classification="USER">
    <pd:label>quick_qc</pd:label>
    <pd:description></pd:description>
    <pd:booleanType/>
    <pd:value></pd:value>
</pd:parameter>
```

## Limitations

One known limitation of the schema presented above is the inability to specify a parameter that can be of multiple types, e.g. File or String. One way to work around this is to define two optional parameters: one for File input and one for String input. At the moment the ICA UI does not validate whether at least one of these parameters is populated, but this check can be done within the pipeline itself.

Below are both a main.nf and an XML configuration for a generic pipeline with two optional inputs, which can be used as a template for similar cases. If the *file* parameter is set, it is used. If the *str* parameter is set but *file* is not, the *str* parameter is used. If neither is set, the pipeline aborts with an informative error message.

```groovy
nextflow.enable.dsl = 2

// Define parameters with default values
params.file = false
params.str = false

// Check that at least one of the parameters is specified
if (!params.file && !params.str) {
    error "You must specify at least one input: --file or --str"
}

process printInputs {
    
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    // DSL2 uses the `path` qualifier for file inputs
    path input_file

    script:
    """
    echo "File contents:"
    cat $input_file
    """
}

process printInputs2 {

    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    val(input_str)

    script:
    """
    echo "String input: $input_str"
    """
}

workflow {
    if (params.file) {
        file_ch = Channel.fromPath(params.file)
        file_ch.view()
        str_ch = Channel.empty()
        printInputs(file_ch)
    }
    else {
        file_ch = Channel.empty()
        str_ch = Channel.of(params.str)
        str_ch.view()
        file_ch.view()
        printInputs2(str_ch)
    } 
}
```

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="file" format="TXT" type="FILE" required="false" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>Generic file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps>
        <pd:step execution="MANDATORY" code="general">
            <pd:label>General Options</pd:label>
            <pd:description locked="false"></pd:description>
            <pd:tool code="general">
                <pd:label locked="false"></pd:label>
                <pd:description locked="false"></pd:description>
                <pd:parameter code="str" minValues="0" maxValues="1" classification="USER">
                    <pd:label>String</pd:label>
                    <pd:description></pd:description>
                    <pd:stringType/>
                    <pd:value>string</pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
</pd:pipeline>
```
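
In this template the `code` values in the XML (`file` and `str`) line up with `params.file` and `params.str` in main.nf. The Python sketch below lists the codes declared in a form so they can be compared against the parameter names a pipeline reads. That data input codes map to parameter names in the same way as setting codes is an assumption inferred from this example, and `declared_codes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"pd": "xsd://www.illumina.com/ica/cp/pipelinedefinition"}

# The XML configuration from above, without the XML declaration.
FORM = """<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="file" format="TXT" type="FILE" required="false" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>Generic file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps>
        <pd:step execution="MANDATORY" code="general">
            <pd:label>General Options</pd:label>
            <pd:description locked="false"></pd:description>
            <pd:tool code="general">
                <pd:label locked="false"></pd:label>
                <pd:description locked="false"></pd:description>
                <pd:parameter code="str" minValues="0" maxValues="1" classification="USER">
                    <pd:label>String</pd:label>
                    <pd:description></pd:description>
                    <pd:stringType/>
                    <pd:value>string</pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
</pd:pipeline>"""


def declared_codes(form_xml):
    """Hypothetical helper: collect the codes of all data inputs and settings."""
    root = ET.fromstring(form_xml)
    inputs = [n.get("code") for n in root.findall("pd:dataInputs/pd:dataInput", NS)]
    settings = [
        n.get("code")
        for n in root.findall("pd:steps/pd:step/pd:tool/pd:parameter", NS)
    ]
    return {"dataInputs": inputs, "settings": settings}


if __name__ == "__main__":
    print(declared_codes(FORM))
```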


