Prepare Metadata Sheets

In ICA Cohorts, metadata describe the subjects and samples imported into the system by means of attributes, including:

  • subject:

    • demographics such as age, sex, ancestry;

    • phenotypes and diseases;

    • biometrics such as body height, body mass index, etc.;

    • pathological classification, tumor stages, etc.;

    • family and patient medical history;

  • sample:

    • sample type such as FFPE;

    • tissue type;

    • sequencing technology such as whole-genome DNA sequencing, RNA-seq, or single-cell RNA-seq.

You can use these attributes while creating a cohort to define the cases and/or controls that you want to include.

During import, you will be asked to upload a metadata sheet as a tab-delimited (TSV) file. An example sheet is available for download on the Import files page in the ICA Cohorts UI.

A metadata sheet must contain at least these four columns, with a value in each row:

  • Subject ID - identifier referring to individuals; use the column header "SubjectID".

  • Sample ID - identifier for a sample; use the column header "SampleID". Each Sample ID needs to match the corresponding sample column header in the VCF/gVCF files. A subject can have multiple samples; list each sample in its own row with the same SubjectID.

  • Biological sex - can be "Female (XX)", "Female", or simply "F"; "Male (XY)", "Male", "M"; "X (Turner's)"; "XXY (Klinefelter)"; "XYY"; "XXXY" or "Not provided". Use the column header "DM_Sex" (demographics).

  • Sequencing technology - can be "Whole genome sequencing", "Whole exome sequencing", "Targeted sequencing panels", or "RNA-seq"; use the column header "TC" (technology).
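The four required columns can be checked before upload with a short script. The following is a minimal sketch in Python, not part of ICA Cohorts itself: the function name `validate_metadata` and the example IDs are hypothetical, and it assumes that the values listed above are matched exactly and case-sensitively, which the import may or may not enforce. Sheets with additional attribute columns pass through unchanged, since only the four required columns are inspected.

```python
import csv
import io

# Column headers required by the metadata sheet (from the list above).
REQUIRED_COLUMNS = ["SubjectID", "SampleID", "DM_Sex", "TC"]

# Accepted values as listed above. Exact, case-sensitive matching is an
# assumption of this sketch, not documented import behavior.
ALLOWED_SEX = {
    "Female (XX)", "Female", "F",
    "Male (XY)", "Male", "M",
    "X (Turner's)", "XXY (Klinefelter)", "XYY", "XXXY",
    "Not provided",
}
ALLOWED_TC = {
    "Whole genome sequencing",
    "Whole exome sequencing",
    "Targeted sequencing panels",
    "RNA-seq",
}


def validate_metadata(tsv_text):
    """Return a list of problems found in a tab-delimited metadata sheet.

    An empty list means the four required columns passed these checks.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing required column(s): " + ", ".join(missing)]

    problems = []
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for column in ("SubjectID", "SampleID"):
            if not (row[column] or "").strip():
                problems.append(f"line {line_number}: empty {column}")
        if row["DM_Sex"] not in ALLOWED_SEX:
            problems.append(
                f"line {line_number}: unexpected DM_Sex {row['DM_Sex']!r}"
            )
        if row["TC"] not in ALLOWED_TC:
            problems.append(
                f"line {line_number}: unexpected TC {row['TC']!r}"
            )
    return problems


# Hypothetical sheet: SUBJ-001 has two samples, one row per sample.
example = "\n".join([
    "SubjectID\tSampleID\tDM_Sex\tTC",
    "SUBJ-001\tSAMPLE-001A\tFemale\tWhole genome sequencing",
    "SUBJ-001\tSAMPLE-001B\tFemale\tRNA-seq",
    "SUBJ-002\tSAMPLE-002A\tM\tWhole exome sequencing",
]) + "\n"

if __name__ == "__main__":
    for problem in validate_metadata(example):
        print(problem)
```

The script does not check that each SampleID matches a sample column in the VCF/gVCF files; that comparison needs the variant files themselves.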

A description of all attributes and data types currently supported by ICA Cohorts can be found here: ICA_Cohorts_Supported_Attributes.xlsx

You can download an example metadata sheet, which contains some samples from The Cancer Genome Atlas (TCGA) and their publicly available clinical attributes, here: ICA_Cohorts_Example_Metadata.tsv

To help you navigate the concept code browser for diagnoses, a list of concepts and diagnoses covering all public data subjects can be found here: PublicData_AllConditionsSummarized.xlsx
