Cohorts Data in ICA Base
ICA Cohorts data can be viewed in an ICA Project Base instance as a shared database. A shared database in ICA Base operates as a database view. To use this feature, enable Base for your project prior to starting any ICA Cohorts ingestions. See Base for more information on enabling this feature in your ICA Project.
After you ingest data into your project, selected phenotypic and molecular data are available to view in Base. See Cohorts Import for instructions on importing data sets into Cohorts.
1. Post ingestion, data will be represented in Base.
2. Select BASE from the ICA left-navigation and click Query.
3. In the New Query window, a list of tables is displayed. Expand the Shared Database for Project \<your project name\>.
4. Cohorts tables will be displayed.
5. To preview a table and its fields, click each view listed.
6. Clicking any of these views and then selecting PREVIEW on the right-hand side shows a preview of the data in the table.
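If you prefer to type a preview query rather than use the PREVIEW button, the following minimal Python sketch builds one. The table names come from this page; the helper name `build_preview_query` and the `LIMIT`-based form of the query are illustrative assumptions, and in your project the table may need to be qualified with the shared database name shown in the Query window.

```python
# Hypothetical helper: builds a simple preview query for a Cohorts view.
# Table names are taken from this page; everything else is an assumption.
COHORTS_TABLES = {
    "PHENOTYPE",
    "ANNOTATED_VARIANTS",
    "ANNOTATED_SOMATIC_MUTATIONS",
}


def build_preview_query(table: str, limit: int = 10) -> str:
    """Return a SELECT statement that previews the first rows of a table."""
    name = table.strip().upper()
    if name not in COHORTS_TABLES:
        raise ValueError(f"Unknown Cohorts table: {table!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {name} LIMIT {limit};"


print(build_preview_query("phenotype"))
```

Restricting the input to a known set of table names keeps the generated SQL predictable; paste the resulting statement into the New Query window and adjust it as needed.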
Note: If your ingestion includes Somatic variants, there will be two molecular tables: ANNOTATED_SOMATIC_MUTATIONS and ANNOTATED_VARIANTS. All ingestions will include a PHENOTYPE table.
Note: The PHENOTYPE table includes a harmonized set that is collected across all data ingestions and is not representative of all data ingested for the Subject or Sample. Sample information is also displayed in this table, if applicable. Sample information drives the annotation process if molecular data is included in the ingestion. That data is stored in the PHENOTYPE table.
Field Name | Type | Description |
SAMPLE_BARCODE | STRING | Sample Identifier |
SUBJECTID | STRING | Identifier for Subject entity |
STUDY | STRING | Study designation |
AGE | NUMERIC | Age in years |
SEX | STRING | Sex field to drive annotation |
POPULATION | STRING | Population Designation for 1000 Genomes Project |
SUPERPOPULATION | STRING | Superpopulation Designation from 1000 Genomes Project |
RACE | STRING | Race according to NIH standard |
CONDITION_ONTOLOGIES | VARIANT | Diagnosis Ontology Source |
CONDITION_IDS | VARIANT | Diagnosis Concept Ids |
CONDITIONS | VARIANT | Diagnosis Names |
HARMONIZED_CONDITIONS | VARIANT | Diagnosis High-level concept to drive UI |
LIBRARYTYPE | STRING | Sequencing technology |
ANALYTE | STRING | Substance sequenced |
TISSUE | STRING | Tissue source |
TUMOR_OR_NORMAL | STRING | Tumor designation for somatic |
GENOMEBUILD | STRING | Genome Build to drive annotations - hg38 only |
SAMPLE_BARCODE_VCF | STRING | Sample ID from VCF |
AFFECTED_STATUS | NUMERIC | Affected, Unaffected, or Unknown for Family Based Analysis |
FAMILY_RELATIONSHIP | STRING | Relationship designation for Family Based Analysis |
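As an illustration of how the PHENOTYPE fields above can be queried, the sketch below counts distinct subjects per study and sex. It uses an in-memory SQLite database as a local stand-in so that it can be run anywhere; the SQL dialect in ICA Base may differ, the function name is hypothetical, and the sample rows (including the `F`/`M` values) are made up for the example.

```python
import sqlite3


def subjects_per_study_and_sex(rows):
    """Count distinct SUBJECTID values per (STUDY, SEX).

    rows: iterable of (SAMPLE_BARCODE, SUBJECTID, STUDY, SEX) tuples.
    SQLite is only a local stand-in here, not the Base query engine.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE PHENOTYPE ("
            "SAMPLE_BARCODE TEXT, SUBJECTID TEXT, STUDY TEXT, SEX TEXT)"
        )
        conn.executemany("INSERT INTO PHENOTYPE VALUES (?, ?, ?, ?)", rows)
        query = """
            SELECT STUDY, SEX, COUNT(DISTINCT SUBJECTID) AS SUBJECTS
            FROM PHENOTYPE
            GROUP BY STUDY, SEX
            ORDER BY STUDY, SEX
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Made-up rows: subject P1 has two samples and is counted once.
example_rows = [
    ("S1", "P1", "STUDY_A", "F"),
    ("S2", "P1", "STUDY_A", "F"),
    ("S3", "P2", "STUDY_A", "M"),
    ("S4", "P3", "STUDY_B", "F"),
]

for study, sex, subjects in subjects_per_study_and_sex(example_rows):
    print(study, sex, subjects)
```

Counting `DISTINCT SUBJECTID` rather than rows matters because one subject can contribute several samples to the table.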
The ANNOTATED_VARIANTS table is available for all projects with ingested molecular data.
Field Name | Type | Description |
SAMPLE_BARCODE | STRING | Original sample barcode used in VCF column |
STUDY | STRING | Study designation |
GENOMEBUILD | STRING | Only hg38 is supported |
CHROMOSOME | STRING | Chromosome without 'chr' prefix |
CHROMOSOMEID | NUMERIC | Chromosome ID: 1..22, 23=X, 24=Y, 25=Mt |
DBSNP | STRING | dbSNP Identifiers |
VARIANT_KEY | STRING | Variant ID in the form "1:12345678:12345678:C" |
NIRVANA_VID | STRING | Broad Institute VID: "1-12345678-A-C" |
VARIANT_TYPE | STRING | Description of Variant Type (e.g. SNV, Deletion, Insertion) |
VARIANT_CALL | NUMERIC | 1=germline, 2=somatic |
DENOVO | BOOLEAN | true / false |
GENOTYPE | STRING | "G|T" |
READ_DEPTH | NUMERIC | Sequencing read depth |
ALLELE_COUNT | NUMERIC | Counts of each alternate allele for each site across all samples |
ALLELE_DEPTH | STRING | Unfiltered count of reads that support a given allele for an individual sample |
FILTERS | STRING | Filter field from VCF. If all filters pass, field is PASS |
ZYGOSITY | NUMERIC | 0 = hom ref, 1 = het ref/alt, 2 = hom alt, 4 = hemi alt |
GENEMODEL | NUMERIC | 1=Ensembl, 2=RefSeq |
GENE_HGNC | STRING | HUGO/HGNC gene symbol |
GENE_ID | STRING | Ensembl gene ID ("ENSG00001234") |
GID | NUMERIC | NCBI Entrez Gene ID (RefSeq) or numerical part of Ensembl ENSG ID |
TRANSCRIPT_ID | STRING | Ensembl ENST or RefSeq NM_ |
CANONICAL | STRING | Transcript designated 'canonical' by source |
CONSEQUENCE | STRING | missense, stop gained, intronic, etc. |
HGVSC | STRING | The HGVS coding sequence name |
HGVSP | STRING | The HGVS protein sequence name |
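Several ANNOTATED_VARIANTS fields store numeric codes. The sketch below shows one way to turn them into readable labels after exporting query results; the code-to-label mappings are taken from the table above, while the function names and the dictionary-shaped row are assumptions made for the example.

```python
# Code-to-label mappings copied from the field descriptions on this page.
ZYGOSITY = {0: "hom ref", 1: "het ref/alt", 2: "hom alt", 4: "hemi alt"}
GENEMODEL = {1: "Ensembl", 2: "RefSeq"}
VARIANT_CALL = {1: "germline", 2: "somatic"}
SPECIAL_CHROMOSOMES = {23: "X", 24: "Y", 25: "Mt"}


def chromosome_label(chromosome_id: int) -> str:
    """Map CHROMOSOMEID (1..22, 23=X, 24=Y, 25=Mt) to a chromosome name."""
    if 1 <= chromosome_id <= 22:
        return str(chromosome_id)
    if chromosome_id in SPECIAL_CHROMOSOMES:
        return SPECIAL_CHROMOSOMES[chromosome_id]
    raise ValueError(f"Unexpected CHROMOSOMEID: {chromosome_id}")


def decode_variant_row(row: dict) -> dict:
    """Return readable labels for the coded fields of one result row.

    Codes not listed on this page are reported as "unknown".
    """
    return {
        "chromosome": chromosome_label(int(row["CHROMOSOMEID"])),
        "zygosity": ZYGOSITY.get(int(row["ZYGOSITY"]), "unknown"),
        "gene_model": GENEMODEL.get(int(row["GENEMODEL"]), "unknown"),
        "variant_call": VARIANT_CALL.get(int(row["VARIANT_CALL"]), "unknown"),
    }


# Hypothetical row, shaped like a dictionary built from a query result.
example = {"CHROMOSOMEID": 23, "ZYGOSITY": 4, "GENEMODEL": 2, "VARIANT_CALL": 1}
print(decode_variant_row(example))
```

Values are passed through `int()` because NUMERIC columns may arrive as decimal or floating-point values, depending on how the results were exported.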
The ANNOTATED_SOMATIC_MUTATIONS table is only available for data sets with ingested somatic molecular data.
Field Name | Type | Description |
SAMPLE_BARCODE | STRING | Original sample barcode, used in VCF column |
SUBJECTID | STRING | Identifier for Subject entity |
STUDY | STRING | Study designation |
GENOMEBUILD | STRING | Only hg38 is supported |
CHROMOSOME | STRING | Chromosome without 'chr' prefix |
DBSNP | NUMERIC | dbSNP Identifiers |
VARIANT_KEY | STRING | Variant ID in the form "1:12345678:12345678:C" |
MUTATION_TYPE | NUMERIC | Rank of consequences by expected impact: 0 = Protein Truncating to 40 = Intergenic Variant |
VARIANT_CALL | NUMERIC | 1=germline, 2=somatic |
GENOTYPE | STRING | "G|T" |
REF_ALLELE | STRING | Reference allele |
ALLELE1 | STRING | First allele call in the tumor sample |
ALLELE2 | STRING | Second allele call in the tumor sample |
GENEMODEL | NUMERIC | 1=Ensembl, 2=RefSeq |
GENE_HGNC | STRING | HUGO/HGNC gene symbol |
GENE_ID | STRING | Ensembl gene ID ("ENSG00001234") |
TRANSCRIPT_ID | STRING | Ensembl ENST or RefSeq NM_ |
CANONICAL | BOOLEAN | Transcript designated 'canonical' by source |
CONSEQUENCE | STRING | missense, stop gained, intronic, etc. |
HGVSP | STRING | HGVS nomenclature for AA change: p.Pro72Ala |
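The HGVSP value can be split into its parts when you need the residue position or the amino acids separately. The following sketch handles only simple substitutions written with three-letter codes, such as the `p.Pro72Ala` example above. The function name is hypothetical, and accepting an optional protein accession prefix or parentheses is an assumption about how values might appear, not something this page specifies.

```python
import re
from typing import Optional, Tuple

# Matches simple substitutions such as "p.Pro72Ala". An optional prefix ending
# in ":" and optional parentheses are tolerated as an assumption; parentheses
# are not checked for balance.
_HGVSP_SUBSTITUTION = re.compile(
    r"(?:^|:)p\.\(?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\)?$"
)


def parse_hgvsp_substitution(hgvsp: str) -> Optional[Tuple[str, int, str]]:
    """Return (reference amino acid, position, alternate amino acid).

    Returns None for anything that is not a simple three-letter substitution
    (frameshifts, deletions, empty values, and so on).
    """
    match = _HGVSP_SUBSTITUTION.search(hgvsp.strip())
    if match is None:
        return None
    ref_aa, position, alt_aa = match.groups()
    return ref_aa, int(position), alt_aa


print(parse_hgvsp_substitution("p.Pro72Ala"))
```

Returning `None` for unsupported forms lets callers skip those rows explicitly instead of working with a partially parsed value.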