Create a Cohort

ICA Cohorts lets you create a research cohort of subjects and associated samples based on the following criteria:

  • Project:

    • Include subjects that are part of any ICA Project that you own or that is shared with you.

  • Subject:

    • Demographics such as age, sex, ancestry.

    • Biometrics such as body height, body mass index.

    • Family and patient medical history.

  • Sample:

    • Sample type such as FFPE.

    • Tissue type.

    • Sequencing technology: Whole genome DNA-sequencing, RNAseq, single-cell RNAseq, etc.

  • Disease:

    • Phenotypes and diseases from standardized ontologies.

  • Drug:

    • Drugs from standardized ontologies along with specific typing, stop reasons, drug administration routes, and time points.

  • Molecular attributes:

    • Samples with a somatic mutation in one or multiple, specified genes.

    • Samples with a germline variant of a specific type in one or multiple, specified genes.

    • Samples over- or under-expressed in one or multiple, specified genes.

    • Samples with a copy number gain or loss involving one or multiple, specified genes.

ICA Cohorts currently uses six standard medical ontologies both to annotate each subject during ingestion and to search for subjects: HPO for phenotypes, and MeSH, SNOMED-CT, ICD9-CM, ICD10-CM, and OMIM for diseases. By default, any 'type-ahead' search finds matches from all six, but you can limit the search to only the one(s) you prefer. When you search for subjects using names or codes from one of these ontologies, ICA Cohorts automatically matches your query against all the other ontologies, and therefore also returns subjects that were ingested using a corresponding entry from another ontology.

In the 'Disease' tab, you can search for subjects diagnosed with one or multiple diseases, as well as phenotypes, in two ways:

  • Start typing the English name of a disease/phenotype and pick from the suggested matches. Continue typing if your disease/phenotype of interest is not listed initially.

    • Use the mouse to select the term, or navigate to the term in the dropdown using the arrow keys.

    • If applicable, the concept hierarchy is shown, with ancestors and immediate children visible.

    • For diagnostic hierarchies, the children count and descendant count for each disease name are displayed.

      • Descendant Count: Displays next to each disease name in the tree hierarchy (e.g., "Disease (10)").

      • Leaf Nodes: No children count shown for leaf nodes.

      • Missing Counts: Children count is hidden if unavailable.

      • Show Term Count: A checkbox below "Age of Onset" that is checked by default. Unchecking it hides the descendant count.

    • Select a checkbox to include the diagnostic term along with all of its children and descendants.

    • Expand the categories and select or deselect specific disease concepts.

  • Paste one or multiple diagnostic codes separated by a pipe (‘|’).
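If your diagnostic codes live in a spreadsheet or script, the pipe-separated string can be assembled before pasting. The following is a minimal sketch, not part of ICA Cohorts: the function name `build_code_query` and the placeholder codes are hypothetical, and trimming whitespace and dropping duplicates are assumptions about what makes a clean input rather than documented requirements.

```python
def build_code_query(codes):
    """Join diagnostic codes into one pipe-separated string for pasting.

    Hypothetical helper: whitespace trimming and de-duplication are
    assumptions, not documented ICA Cohorts behavior.
    """
    cleaned = []
    for code in codes:
        code = code.strip()
        if not code:
            continue  # skip empty entries
        if "|" in code:
            # The pipe is the separator, so it cannot appear inside a code.
            raise ValueError(f"code must not contain '|': {code!r}")
        if code not in cleaned:
            cleaned.append(code)  # keep first occurrence, preserve order
    return "|".join(cleaned)


# Placeholder codes for illustration only; substitute real ontology codes.
print(build_code_query(["CODE_A", " CODE_B ", "", "CODE_A"]))
```

The resulting string can then be pasted into the search field on the 'Disease' tab.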

In the 'Drug' tab, you can search for subjects who have a specific medication record:

  • Start typing the concept name for the drug and pick from suggested matches. Continue typing if the drug is not listed initially.

  • Paste one or multiple drug concept codes. ICA Cohorts currently uses RxNorm as a standard ontology during ingestion. If multiple concept ontologies are in your instance of ICA Cohorts, they will be listed under 'Concept Ontology.'

  • 'Drug Type' is a static list of qualifiers that denote the specific administration of the drug. For example, where the drug was dispensed.

  • 'Stop Reason' is a static list of attributes describing a reason why a drug was stopped if available in the data ingestion.

  • 'Drug Route' is a static list of attributes that describe the physical route of administration of the drug. For example, Intravenous Route (IV).

In the ‘Measurements’ tab, you can search for vital signs and laboratory test data leveraging LOINC concept codes.

  • Start typing the English name of the LOINC term, for example, ‘Body height’. A dropdown will appear with matching terms. Use the mouse or the arrow keys to select the term.

  • Upon selecting a term, the term will be available for use in a query.

  • Terms can be added to your query criteria.

  • For each term, you can set a value `Greater than or equal`, `Equals`, `Less than or equal`, `In range`, or `Any value`.

  • `Any value` will find any record where there is an entry for the measurement independent of an available value.

  • Click `Apply` to add your criteria to the query.

  • Click `Update Now` to update the running count of the Cohort.
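The value conditions listed above can be read as simple comparisons applied to each measurement record. The sketch below is illustrative only and does not reflect how ICA Cohorts is implemented: the function `matches_measurement` and its parameters are hypothetical, and treating `In range` as inclusive at both ends is an assumption.

```python
def matches_measurement(value, operator, threshold=None, upper=None):
    """Return True if one measurement record satisfies the condition.

    `value` is the recorded value, or None when the record has an entry
    for the measurement but no value. Hypothetical helper for illustration.
    """
    if operator == "Any value":
        # Any record for the measurement matches, with or without a value.
        return True
    if value is None:
        return False  # nothing to compare against
    if operator == "Greater than or equal":
        return value >= threshold
    if operator == "Equals":
        return value == threshold
    if operator == "Less than or equal":
        return value <= threshold
    if operator == "In range":
        # Assumption: both bounds are inclusive.
        return threshold <= value <= upper
    raise ValueError(f"unknown operator: {operator!r}")


# Example: 'Body height' records in cm, one of them without a value.
records = [172.0, None, 158.5]
print([matches_measurement(v, "In range", 160.0, 180.0) for v in records])
print([matches_measurement(v, "Any value") for v in records])
```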

Include/Exclude

  • As attributes are added to the 'Selected Condition' on the right-navigation panel, you can choose to include or exclude the criteria selected.

    • Select a criterion from 'Subject', 'Disease', and/or 'Molecular' attributes by filling in the appropriate checkbox on the respective attribute selection pages.

    • When selected, the attribute will appear in the right-navigation panel.

    • You can use the 'Include' / 'Exclude' dropdown next to the selected attribute to decide if you want to include or exclude subjects and samples matching the attribute.

    • Note: With 'Include', a subject needs to match at least one of the 'included' attributes in any given category to be included in the cohort. (Category refers to disease, sex, body height, etc.) For example, if you specify multiple diseases as inclusion criteria, subjects only need to be diagnosed with one of them. With 'Exclude', any subject who matches at least one exclusion criterion is excluded; subjects do not have to match all exclusion criteria in the same category to be excluded from the cohort.

    • Note: This feature is not available on the 'Project' level selections as there is no overlap between subjects in datasets.

    • Note: Using exclusion criteria does not account for NULL values. For example, if the Super-population 'Europeans' is excluded, subjects who have no Super-population value recorded will still be in your cohort.
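The Include/Exclude notes above can be sketched as set logic within a single category. This is an illustrative model under stated assumptions, not the ICA Cohorts implementation: the function `in_cohort` is hypothetical, an empty set stands in for a NULL value, and letting an exclusion match take precedence over an inclusion match is an assumption.

```python
def in_cohort(subject_values, included=(), excluded=()):
    """Decide cohort membership for one category (e.g. disease).

    subject_values: values the subject has in this category; an empty
    collection models a NULL (missing) data point.
    Hypothetical helper for illustration only.
    """
    values = set(subject_values)
    # Exclude: matching any one exclusion criterion removes the subject.
    # Assumption: exclusion wins if a subject matches both lists.
    if values & set(excluded):
        return False
    # Include: matching any one inclusion criterion is enough.
    if included and not (values & set(included)):
        return False
    return True


# A subject diagnosed with only one of two included diseases is kept.
print(in_cohort({"Disease A"}, included=["Disease A", "Disease B"]))
# A subject with no Super-population value is not removed by an exclusion.
print(in_cohort(set(), excluded=["Europeans"]))
# A subject matching an exclusion criterion is removed.
print(in_cohort({"Europeans"}, excluded=["Europeans"]))
```

The second call shows the NULL behavior: because the subject has no value to match against, the exclusion does not apply.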

Once you select Create Cohort, the above data are organized in tabs such as Project, Subject, Disease, and Molecular. Each tab contains the aforementioned sections, among others, to help you identify cases and/or controls for further analysis. Navigate through these tabs, or search for an attribute by name to jump directly to that tab and section, and select the attributes and values that describe your subjects and samples of interest. Assign a new name to the cohort you created, and click Apply to save the cohort.

Duplicate a Cohort Definition

  • After creating a Cohort, select the Duplicate icon.

  • A copy of the Cohort definition will be created and tagged with "_copy".

Delete a Cohort Definition

  • Deleting a Cohort Definition can be accomplished by clicking the Delete Cohort icon.

  • This action cannot be undone.

Sharing a Cohort within an ICA Project

After creating a Cohort, users can set a Cohort bookmark as Shared. A shared Cohort can be applied across the project by other users with access to the Project. By default, Cohorts created in a Project are only accessible to the user who created them. Other users in the project cannot see the Cohort unless it is shared using this functionality.

Share Cohort Definition

  • Create a Cohort using the directions above.

  • To make the Cohort available to other users in your Project, click the Share icon.

  • The Share icon will be filled in black and the Shared Status will change from Private to Shared.

  • Other users with access to Cohorts in the Project can now apply the Cohort bookmark to their data in the project.

Unshare a Cohort Definition

  • To unshare the Cohort, click the Share icon.

  • The icon will turn from black to white, and other users within the project will no longer have access to this cohort definition.

Archive a Cohort Definition

  • A Shared Cohort can be Archived.

  • Select a Shared Cohort with a black Shared Cohort icon.

  • Click the Archive Cohort icon.

  • You will be asked to confirm this selection.

  • Upon archiving the Cohort definition, the Cohort will no longer be seen by other users in the Project.

  • The archived Cohort definition can be unarchived by clicking the Unarchive Cohort icon.

  • When the Cohort definition is unarchived, it will be visible to all users in the Project.

Sharing a Cohort as Bundle

You can link cohorts data sets to a bundle as follows:

  • Create or edit a bundle at Bundles from the main navigation.

  • Navigate to Bundles > your_bundle > Cohorts > Data Sets.

  • Select Link Data Set to Bundle.

  • Select the data set that you want to link, then select +Select.

  • After a brief time, the cohorts data set will be linked to your bundle and ICA_BASE_100 will be logged.

If you cannot find the cohorts data sets that you want to link, verify that:

  • Your data set is part of a project (Projects > your_project > Cohorts > Data Sets)

  • This project is set to Data Sharing (Projects > your_project > Project Settings > Details)

Stop sharing a Cohort as Bundle

You can unlink cohorts data sets from bundles as follows:

  • Edit the desired bundle at Bundles from the main navigation.

  • Navigate to Bundles > your_bundle > Cohorts > Data Sets.

  • Select the cohorts data set which you want to unlink.

  • Select Unlink Data Set from Bundle.

  • After a brief time, the cohorts data set will be unlinked from your bundle and ICA_BASE_101 will be logged.
