
Cohorts Data in ICA Base

ICA Cohorts data can be viewed in an ICA Project Base instance as a shared database. A shared database in ICA Base operates as a database view. To use this feature, enable Base for your project prior to starting any ICA Cohorts ingestions. See Base for more information on enabling this feature in your ICA Project.

ICA Cohorts Base Tables

After you ingest data into your project, selected phenotypic and molecular data are available to view in Base. See Cohorts Import for instructions on importing data sets into Cohorts.
  1. After ingestion, the data is represented in Base.
  2. Select BASE from the ICA left-navigation and click Query.
  3. In the New Query window, a list of tables is displayed. Expand Shared Database for Project \<your project name\>.
  4. The Cohorts tables are displayed.
  5. Click any of the listed views, then select PREVIEW on the right-hand side to preview the table's fields and data.
Note: If your ingestion includes Somatic variants, there will be two molecular tables: ANNOTATED_SOMATIC_MUTATIONS and ANNOTATED_VARIANTS. All ingestions will include a PHENOTYPE table.
Note: The PHENOTYPE table contains a harmonized set of fields collected across all data ingestions; it does not represent all data ingested for a Subject or Sample. If applicable, sample information is also stored in this table. When the ingestion includes molecular data, this sample information drives the annotation process.
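The table-availability rules in the notes above can be encoded in a small helper, for example to sanity-check the list of views you see under the shared database. This is a hypothetical sketch: the function names are illustrative, and it assumes that a somatic ingestion counts as molecular data and therefore also produces ANNOTATED_VARIANTS.

```python
def expected_cohorts_tables(has_molecular: bool, has_somatic: bool) -> list[str]:
    """Return the Cohorts tables expected in Base for an ingestion (hypothetical helper)."""
    tables = ["PHENOTYPE"]  # every ingestion includes a PHENOTYPE table
    # Assumption: somatic variants are molecular data, so they also yield ANNOTATED_VARIANTS.
    if has_molecular or has_somatic:
        tables.append("ANNOTATED_VARIANTS")
    if has_somatic:
        tables.append("ANNOTATED_SOMATIC_MUTATIONS")
    return tables


def missing_tables(observed: list[str], has_molecular: bool, has_somatic: bool) -> list[str]:
    """List expected tables that do not appear in `observed` (case-insensitive)."""
    seen = {name.upper() for name in observed}
    return [t for t in expected_cohorts_tables(has_molecular, has_somatic) if t not in seen]


print(expected_cohorts_tables(has_molecular=True, has_somatic=True))
# ['PHENOTYPE', 'ANNOTATED_VARIANTS', 'ANNOTATED_SOMATIC_MUTATIONS']
print(missing_tables(["phenotype", "ANNOTATED_VARIANTS"], has_molecular=True, has_somatic=True))
# ['ANNOTATED_SOMATIC_MUTATIONS']
```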

Phenotype Data

Field Name | Type | Description
--- | --- | ---
SAMPLE_BARCODE | STRING | Sample identifier
SUBJECTID | STRING | Identifier for the Subject entity
STUDY | STRING | Study designation
AGE | NUMERIC | Age in years
SEX | STRING | Sex; used to drive annotation
POPULATION | STRING | Population designation from the 1000 Genomes Project
SUPERPOPULATION | STRING | Superpopulation designation from the 1000 Genomes Project
RACE | STRING | Race according to the NIH standard
CONDITION_ONTOLOGIES | VARIANT | Diagnosis ontology source
CONDITION_IDS | VARIANT | Diagnosis concept IDs
CONDITIONS | VARIANT | Diagnosis names
HARMONIZED_CONDITIONS | VARIANT | High-level diagnosis concept used to drive the UI
LIBRARYTYPE | STRING | Sequencing technology
ANALYTE | STRING | Substance sequenced
TISSUE | STRING | Tissue source
TUMOR_OR_NORMAL | STRING | Tumor designation for somatic samples
GENOMEBUILD | STRING | Genome build used to drive annotations (hg38 only)
SAMPLE_BARCODE_VCF | STRING | Sample ID from the VCF
AFFECTED_STATUS | NUMERIC | Affected, Unaffected, or Unknown, for family-based analysis
FAMILY_RELATIONSHIP | STRING | Relationship designation for family-based analysis
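Once the views are visible, the PHENOTYPE table can be queried from the New Query window. The sketch below only builds the query text and assumes Base accepts standard SQL constructs such as COUNT(DISTINCT ...) and GROUP BY. The helper name is hypothetical, and the table reference is a parameter because the fully qualified name of the shared database depends on your project and is not given here.

```python
import re

# One identifier part: unquoted (PHENOTYPE) or double-quoted ("My Project").
_PART = r'(?:[A-Za-z_][A-Za-z0-9_$]*|"[^"]+")'
# Accept table, schema.table, or database.schema.table.
_TABLE_REF = re.compile(rf"{_PART}(?:\.{_PART}){{0,2}}")


def subjects_per_study_and_sex(table: str = "PHENOTYPE") -> str:
    """Build a query counting distinct subjects per STUDY and SEX (hypothetical helper)."""
    # Reject anything that is not a plain table reference before interpolating it.
    if not _TABLE_REF.fullmatch(table):
        raise ValueError(f"unexpected table reference: {table!r}")
    return (
        "SELECT STUDY, SEX, COUNT(DISTINCT SUBJECTID) AS N_SUBJECTS\n"
        f"FROM {table}\n"
        "GROUP BY STUDY, SEX\n"
        "ORDER BY STUDY, SEX"
    )


print(subjects_per_study_and_sex())
```

COUNT(DISTINCT SUBJECTID) is used rather than COUNT(*) because the table can hold one row per sample, and a subject may have several samples.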

Annotated Variants

This table is available for all projects with ingested molecular data.
Field Name | Type | Description
--- | --- | ---
SAMPLE_BARCODE | STRING | Original sample barcode used in the VCF column
STUDY | STRING | Study designation
GENOMEBUILD | STRING | Genome build; only hg38 is supported
CHROMOSOME | STRING | Chromosome without the 'chr' prefix
CHROMOSOMEID | NUMERIC | Chromosome ID: 1..22, 23=X, 24=Y, 25=Mt
DBSNP | STRING | dbSNP identifiers
VARIANT_KEY | STRING | Variant ID in the form "1:12345678:12345678:C"
NIRVANA_VID | STRING | Broad Institute VID, e.g. "1-12345678-A-C"
VARIANT_TYPE | STRING | Variant type (e.g. SNV, Deletion, Insertion)
VARIANT_CALL | NUMERIC | 1=germline, 2=somatic
DENOVO | BOOLEAN | true / false
GENOTYPE | STRING | Genotype, e.g. "G\|T"
READ_DEPTH | NUMERIC | Sequencing read depth
ALLELE_COUNT | NUMERIC | Count of each alternate allele at each site across all samples
ALLELE_DEPTH | STRING | Unfiltered count of reads that support a given allele for an individual sample
FILTERS | STRING | FILTER field from the VCF; PASS if all filters pass
ZYGOSITY | NUMERIC | 0 = hom ref, 1 = het ref/alt, 2 = hom alt, 4 = hemi alt
GENEMODEL | NUMERIC | 1=Ensembl, 2=RefSeq
GENE_HGNC | STRING | HUGO/HGNC gene symbol
GENE_ID | STRING | Ensembl gene ID (e.g. "ENSG00001234")
GID | NUMERIC | NCBI Entrez Gene ID (RefSeq) or the numerical part of the Ensembl ENSG ID
TRANSCRIPT_ID | STRING | Ensembl ENST or RefSeq NM_ transcript ID
CANONICAL | STRING | Transcript designated 'canonical' by the source
CONSEQUENCE | STRING | Consequence, e.g. missense, stop gained, intronic
HGVSC | STRING | HGVS coding sequence name
HGVSP | STRING | HGVS protein sequence name
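Several ANNOTATED_VARIANTS columns store coded values. If you pull rows into Python (how you connect to Base is not covered here), a sketch like the following can decode them. The code tables are copied from the field descriptions above; the helper names, the "M" alias for the mitochondrial chromosome, and the assumption that rows arrive as column-name-to-value mappings are illustrative. VARIANT_KEY is not parsed because the table only gives an example of its format.

```python
from decimal import Decimal
from typing import Any, Mapping, NamedTuple, Optional

# Code tables copied from the ANNOTATED_VARIANTS field descriptions.
CHROMOSOME_IDS = {**{str(n): n for n in range(1, 23)}, "X": 23, "Y": 24, "MT": 25}
VARIANT_CALL = {1: "germline", 2: "somatic"}
ZYGOSITY = {0: "hom ref", 1: "het ref/alt", 2: "hom alt", 4: "hemi alt"}
GENE_MODEL = {1: "Ensembl", 2: "RefSeq"}


class Vid(NamedTuple):
    chromosome: str
    position: int
    ref: str
    alt: str


def chromosome_id(chromosome: str) -> int:
    """Map a CHROMOSOME value to its CHROMOSOMEID (1..22, 23=X, 24=Y, 25=Mt)."""
    name = chromosome.upper()
    if name.startswith("CHR"):  # CHROMOSOME has no prefix; strip one defensively
        name = name[3:]
    if name == "M":  # assumption: treat "M" as an alias for the mitochondrial "MT"
        name = "MT"
    return CHROMOSOME_IDS[name]


def parse_nirvana_vid(vid: str) -> Vid:
    """Split a NIRVANA_VID such as "1-12345678-A-C" into chromosome, position, ref, alt."""
    chromosome, position, ref, alt = vid.split("-", 3)
    return Vid(chromosome, int(position), ref, alt)


def decode_codes(row: Mapping[str, Any]) -> dict[str, Optional[str]]:
    """Translate the numeric code columns of one row into readable labels."""
    # NUMERIC values may arrive as int, float, or Decimal; equal numbers hash equally,
    # so all three find the int keys. Missing or unknown codes decode to None.
    return {
        "variant_call": VARIANT_CALL.get(row.get("VARIANT_CALL")),
        "zygosity": ZYGOSITY.get(row.get("ZYGOSITY")),
        "gene_model": GENE_MODEL.get(row.get("GENEMODEL")),
    }


print(chromosome_id("X"))  # 23
print(parse_nirvana_vid("1-12345678-A-C"))
# Vid(chromosome='1', position=12345678, ref='A', alt='C')
print(decode_codes({"VARIANT_CALL": Decimal(2), "ZYGOSITY": 1, "GENEMODEL": 1.0}))
# {'variant_call': 'somatic', 'zygosity': 'het ref/alt', 'gene_model': 'Ensembl'}
```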

Annotated Somatic Mutations

This table is only available for data sets with ingested somatic molecular data.
Field Name | Type | Description
--- | --- | ---
SAMPLE_BARCODE | STRING | Original sample barcode used in the VCF column
SUBJECTID | STRING | Identifier for the Subject entity
STUDY | STRING | Study designation
GENOMEBUILD | STRING | Genome build; only hg38 is supported
CHROMOSOME | STRING | Chromosome without the 'chr' prefix
DBSNP | NUMERIC | dbSNP identifiers
VARIANT_KEY | STRING | Variant ID in the form "1:12345678:12345678:C"
MUTATION_TYPE | NUMERIC | Rank of consequence by expected impact, from 0 = Protein Truncating to 40 = Intergenic Variant
VARIANT_CALL | NUMERIC | 1=germline, 2=somatic
GENOTYPE | STRING | Genotype, e.g. "G\|T"
REF_ALLELE | STRING | Reference allele
ALLELE1 | STRING | First allele call in the tumor sample
ALLELE2 | STRING | Second allele call in the tumor sample
GENEMODEL | NUMERIC | 1=Ensembl, 2=RefSeq
GENE_HGNC | STRING | HUGO/HGNC gene symbol
GENE_ID | STRING | Ensembl gene ID (e.g. "ENSG00001234")
TRANSCRIPT_ID | STRING | Ensembl ENST or RefSeq NM_ transcript ID
CANONICAL | BOOLEAN | Transcript designated 'canonical' by the source
CONSEQUENCE | STRING | Consequence, e.g. missense, stop gained, intronic
HGVSP | STRING | HGVS nomenclature for the amino acid change, e.g. p.Pro72Ala
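Because MUTATION_TYPE ranks consequences so that lower values mean higher expected impact, one common step is to keep the most severe call per sample and gene. The following hypothetical sketch assumes rows have already been fetched as column-name-to-value mappings; the function names and the sample rows are illustrative only, not real data. It also compares ALLELE1 and ALLELE2 with REF_ALLELE to list the non-reference alleles called in the tumor sample.

```python
from collections.abc import Iterable, Mapping
from typing import Any

Row = Mapping[str, Any]


def tumor_alt_alleles(row: Row) -> list[str]:
    """Return the alleles called in the tumor sample that differ from REF_ALLELE."""
    ref = row["REF_ALLELE"]
    return sorted({a for a in (row["ALLELE1"], row["ALLELE2"]) if a and a != ref})


def most_severe_per_sample_gene(rows: Iterable[Row]) -> dict[tuple[str, str], Row]:
    """Keep the row with the lowest MUTATION_TYPE (highest expected impact)
    for each (SAMPLE_BARCODE, GENE_HGNC) pair."""
    best: dict[tuple[str, str], Row] = {}
    for row in rows:
        gene, rank = row.get("GENE_HGNC"), row.get("MUTATION_TYPE")
        if gene is None or rank is None:
            continue  # skip rows without a gene symbol or a rank
        key = (row["SAMPLE_BARCODE"], gene)
        if key not in best or rank < best[key]["MUTATION_TYPE"]:
            best[key] = row
    return best


# Illustrative rows only; MUTATION_TYPE values are made up for the example.
rows = [
    {"SAMPLE_BARCODE": "S1", "GENE_HGNC": "TP53", "MUTATION_TYPE": 12,
     "REF_ALLELE": "G", "ALLELE1": "G", "ALLELE2": "T"},
    {"SAMPLE_BARCODE": "S1", "GENE_HGNC": "TP53", "MUTATION_TYPE": 0,
     "REF_ALLELE": "C", "ALLELE1": "C", "ALLELE2": "A"},
]
top = most_severe_per_sample_gene(rows)
print(top[("S1", "TP53")]["MUTATION_TYPE"])  # 0
print(tumor_alt_alleles(rows[0]))  # ['T']
```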