Prepare Metadata Sheets

In ICA Cohorts, metadata describe any subjects and samples imported into the system in terms of attributes, including:

  • subject:

    • demographics such as age, sex, ancestry;

    • phenotypes and diseases;

    • biometrics such as body height and body mass index;

    • pathological classification, tumor stages, etc.;

    • family and patient medical history;

  • sample:

    • sample type such as FFPE;

    • tissue type;

    • sequencing technology, such as whole-genome DNA sequencing, RNA-seq, and single-cell RNA-seq.

You can use these attributes while creating a cohort to define the cases and/or controls that you want to include.

During import, you will be asked to upload a metadata sheet as a tab-delimited (TSV) file. An example sheet is available for download on the Import files page in the ICA Cohorts UI.

A metadata sheet must contain at least these four columns, with a value in every row:

  • Subject ID - identifier referring to individuals; use the column header "SubjectID".

  • Sample ID - identifier for a sample; use the column header "SampleID". Each Sample ID must match the corresponding sample column header in the VCF/gVCF files. A subject can have multiple samples; list each sample in its own row, repeating the same SubjectID.

  • Biological sex - accepted values are "Female (XX)", "Female", "Male (XY)", "Male", "X (Turner's)", "XXY (Klinefelter)", "XYY", "XXXY", or "Not provided"; use the column header "DM_Sex" (demographics).

  • Sequencing technology - can be "Whole genome sequencing", "Whole exome sequencing", "Targeted sequencing panels", or "RNA-seq"; use the column header "TC" (technology).
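
The four required columns can be checked locally before upload. The following is a minimal sketch in Python that assumes only what the list above states. The function name `validate_metadata_sheet`, the exact-match (case-sensitive) comparison, and the sample rows are assumptions made for illustration; ICA Cohorts may accept or reject values differently during import.

```python
import csv
import io

# Required column headers, as listed above.
REQUIRED_COLUMNS = ("SubjectID", "SampleID", "DM_Sex", "TC")

# Accepted values, copied from the list above.
# Assumption: values are compared as exact, case-sensitive strings.
ALLOWED_SEX = {
    "Female (XX)", "Female", "Male (XY)", "Male", "X (Turner's)",
    "XXY (Klinefelter)", "XYY", "XXXY", "Not provided",
}
ALLOWED_TC = {
    "Whole genome sequencing", "Whole exome sequencing",
    "Targeted sequencing panels", "RNA-seq",
}


def validate_metadata_sheet(handle):
    """Return a list of problems found in a tab-delimited metadata sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing required column(s): " + ", ".join(missing)]

    problems = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for column in ("SubjectID", "SampleID"):
            if not (row[column] or "").strip():
                problems.append(f"row {row_number}: empty {column}")
        if row["DM_Sex"] not in ALLOWED_SEX:
            problems.append(f"row {row_number}: unexpected DM_Sex {row['DM_Sex']!r}")
        if row["TC"] not in ALLOWED_TC:
            problems.append(f"row {row_number}: unexpected TC {row['TC']!r}")
    return problems


# Hypothetical sheet: one subject with two samples, each in its own row.
EXAMPLE_SHEET = (
    "SubjectID\tSampleID\tDM_Sex\tTC\n"
    "SUBJ-001\tSAMPLE-001A\tFemale (XX)\tWhole genome sequencing\n"
    "SUBJ-001\tSAMPLE-001B\tFemale (XX)\tRNA-seq\n"
)

if __name__ == "__main__":
    for problem in validate_metadata_sheet(io.StringIO(EXAMPLE_SHEET)):
        print(problem)
```

To check a file on disk, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="", encoding="utf-8")`. The sketch does not check that Sample IDs match the VCF/gVCF headers, since that requires the variant files themselves.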

A description of all attributes and data types currently supported by ICA Cohorts can be found here: ICA_Cohorts_Supported_Attributes.xlsx

You can download an example of a metadata sheet, which contains some samples from The Cancer Genome Atlas (TCGA) and their publicly available clinical attributes, here: ICA_Cohorts_Example_Metadata.tsv
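
Because a subject with several samples occupies several rows, it can help to summarize a sheet by subject before import. The following is a minimal sketch that assumes the sheet has at least the "SubjectID" and "SampleID" columns. The helper name `samples_by_subject` and the inline rows are hypothetical and are not taken from the TCGA example file.

```python
import csv
import io
from collections import defaultdict


def samples_by_subject(handle):
    """Map each SubjectID to the list of SampleIDs found in its rows."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        grouped[row["SubjectID"]].append(row["SampleID"])
    return dict(grouped)


# Hypothetical rows for illustration only.
SHEET = (
    "SubjectID\tSampleID\tDM_Sex\tTC\n"
    "SUBJ-001\tSAMPLE-001A\tMale (XY)\tWhole exome sequencing\n"
    "SUBJ-001\tSAMPLE-001B\tMale (XY)\tRNA-seq\n"
    "SUBJ-002\tSAMPLE-002A\tNot provided\tWhole genome sequencing\n"
)

if __name__ == "__main__":
    for subject, samples in samples_by_subject(io.StringIO(SHEET)).items():
        print(subject, len(samples), ", ".join(samples))
```

A subject whose sample count differs from what you expect usually points to a mistyped SubjectID in one of its rows.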

A list of concepts and diagnoses covering all public data subjects, intended to help you navigate the new concept code browser for diagnosis, can be found here: PublicData_AllConditionsSummarized.xlsx
