# Prepare Metadata Sheets

In ICA Cohorts, metadata describe any subjects and samples imported into the system in terms of attributes, including:

* subject:
  * demographics such as age, sex, ancestry;
  * phenotypes and diseases;
  * biometrics such as body height, body mass index, etc.;
  * pathological classification, tumor stages, etc.;
  * family and patient medical history;
* sample:
  * sample type such as FFPE;
  * tissue type;
  * sequencing technology: whole genome DNA-sequencing, RNAseq, single-cell RNAseq, among others.

You can use these attributes while [creating a cohort](https://help.ica.illumina.com/project/p-cohorts/cohorts-create) to define the cases and/or controls that you want to include.

During [import](https://help.ica.illumina.com/project/p-cohorts/cohorts-import), you will be asked to upload a metadata sheet as a tab-delimited (TSV) file. An example sheet is available for download on the **Import files** page in the ICA Cohorts UI.

A metadata sheet must contain at least these four columns, with a value in every row:

* **Subject ID** - identifier for an individual; use the column header "SubjectID".
* **Sample ID** - identifier for a sample; use the column header "SampleID". Each sample ID must match the corresponding sample column header in the VCF/GVCF files. A subject can have multiple samples; list each sample in its own row with the same **SubjectID**.
* **Biological sex** - one of "Female (XX)", "Female", "Male (XY)", "Male", "X (Turner's)", "XXY (Klinefelter)", "XYY", "XXXY", or "Not provided"; use the column header "DM\_Sex" (demographics).
* **Sequencing technology** - one of "Whole genome sequencing", "Whole exome sequencing", "Targeted sequencing panels", or "RNA-seq"; use the column header "TC" (technology).
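
As an illustration of the four required columns, the following Python sketch checks a tab-delimited sheet before upload. It is not part of ICA Cohorts: the function name `validate_metadata_sheet` and the subject and sample IDs are hypothetical, and the sketch assumes that values must match the spellings listed above exactly (including case), which this page does not state explicitly.

```python
import csv
import io

# Column headers and allowed values as listed in the bullets above.
REQUIRED_COLUMNS = ("SubjectID", "SampleID", "DM_Sex", "TC")
ALLOWED_VALUES = {
    "DM_Sex": {
        "Female (XX)", "Female", "Male (XY)", "Male", "X (Turner's)",
        "XXY (Klinefelter)", "XYY", "XXXY", "Not provided",
    },
    "TC": {
        "Whole genome sequencing", "Whole exome sequencing",
        "Targeted sequencing panels", "RNA-seq",
    },
}


def validate_metadata_sheet(tsv_text):
    """Return a list of problems found in a tab-delimited metadata sheet.

    Hypothetical helper; assumes exact, case-sensitive value matching.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing required column(s): " + ", ".join(missing)]

    problems = []
    for row in reader:
        line_no = reader.line_num  # line of the sheet that was just read
        for col in REQUIRED_COLUMNS:
            value = (row[col] or "").strip()
            if not value:
                problems.append(f"line {line_no}: {col} is empty")
            elif col in ALLOWED_VALUES and value not in ALLOWED_VALUES[col]:
                problems.append(
                    f"line {line_no}: unexpected {col} value {value!r}"
                )
    return problems


# One subject with two samples: one row per sample, same SubjectID.
# The IDs are made up for this example.
sheet = (
    "SubjectID\tSampleID\tDM_Sex\tTC\n"
    "SUBJ-001\tSUBJ-001-T\tFemale (XX)\tWhole genome sequencing\n"
    "SUBJ-001\tSUBJ-001-N\tFemale (XX)\tRNA-seq\n"
)
print(validate_metadata_sheet(sheet))  # prints []
```

The check only covers the four required columns; any additional attribute columns are ignored, and matching sample IDs against VCF/GVCF headers is left out of this sketch.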

A description of all attributes and data types currently supported by ICA Cohorts can be found here: [ICA\_Cohorts\_Supported\_Attributes.xlsx](https://stratus-documentation-us-east-1-public.s3.amazonaws.com/downloads/cohorts/ICA_Cohorts_Supported_Attributes.xlsx)

You can download an example metadata sheet, which contains some samples from The Cancer Genome Atlas ([TCGA](https://www.cancer.gov/ccg/research/genome-sequencing/tcga)) and their publicly available clinical attributes, here: [ICA\_Cohorts\_Example\_Metadata.tsv](https://stratus-documentation-us-east-1-public.s3.amazonaws.com/downloads/cohorts/ICA_Cohorts_Example_Metadata.tsv)

To help you navigate the new concept code browser for diagnoses, a list of concepts and diagnoses covering all public data subjects can be found here: [PublicData\_AllConditionsSummarized.xlsx](https://stratus-documentation-us-east-1-public.s3.amazonaws.com/downloads/cohorts/PublicData_AllConditionsSummarized.xlsx)

