Base Basics
Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution which provides information management and knowledge mining. Refer to the Base documentation for more details.
This tutorial provides an example that exercises the basic Base operations: creating a table, loading data into the table, and querying the table.

Prerequisites

  • An ICA project with access to Base
    • If you don't already have a project, please follow the instructions in the Project documentation to create a project.
  • File to import
    • A tab delimited gene expression file (sampleX.final.count.tsv). Example format:
      HES4-NM_021170-T00001 1392
      ISG15-NM_005101-T00002 46
      SLC2A5-NM_003039-T00003 14
      H6PD-NM_004285-T00004 30
      PIK3CD-NM_005026-T00005 200
      MTOR-NM_004958-T00006 156
      FBXO6-NM_018438-T00007 10
      MTHFR-NM_005957-T00008 154
      FHAD1-NM_052929-T00009 10
      PADI2-NM_007365-T00010 12
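Before uploading, it can help to confirm locally that a file really has the two-column layout shown above (a transcript identifier followed by an integer count, no header row). The following is a minimal sketch in Python, not part of ICA; the helper name `read_counts` and the inline sample string are illustrative assumptions.

```python
import csv
import io

# Sample rows taken from the example format above; real files are tab-delimited.
SAMPLE_TSV = (
    "HES4-NM_021170-T00001\t1392\n"
    "ISG15-NM_005101-T00002\t46\n"
    "SLC2A5-NM_003039-T00003\t14\n"
)


def read_counts(handle):
    """Yield (transcript_id, count) pairs from a headerless two-column TSV."""
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        transcript_id, count = row
        # int() raises ValueError if the count is not an integer.
        yield transcript_id, int(count)


records = list(read_counts(io.StringIO(SAMPLE_TSV)))
print(records[0])
```

To check a real file, pass an open file handle instead of the `io.StringIO` wrapper, for example `open("sampleX.final.count.tsv", newline="")`.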

Create table

Tables are components of databases that store data in a 2-dimensional format of columns and rows. Each row represents a new data record in the table; each column represents a field in the record. On ICA, you can use Base to create custom tables to fit your data. A schema definition defines the fields in a table. On ICA you can create a schema definition from scratch, or from a template. In this activity, you will create a table for RNAseq count data, by creating a schema definition from scratch.
  1. Go to the Tables page under Base in your project and enable Base by clicking the Enable button. Then click the New Table button.
  2. Create your table.
    1. To create your table from scratch, select Empty Table from the Create table from dropdown.
    2. Name your table FeatureCounts.
    3. Uncheck the box next to Include reference to exclude reference data from your table.
    4. Check the box next to Edit as text. This reveals a text box that you can use to create your schema.
    5. Copy the schema text below and paste it into the text box to create your schema.
    {
      "Fields": [
        {
          "NAME_PATTERN": "[a-zA-Z][a-zA-Z0-9_]*",
          "Name": "TranscriptID",
          "Type": "STRING",
          "Mode": "REQUIRED",
          "Description": null,
          "DataResolver": null,
          "SubBluebaseFields": []
        },
        {
          "NAME_PATTERN": "[a-zA-Z][a-zA-Z0-9_]*",
          "Name": "ExpressionCount",
          "Type": "INTEGER",
          "Mode": "REQUIRED",
          "Description": null,
          "DataResolver": null,
          "SubBluebaseFields": []
        }
      ]
    }
  3. Click the Save button.
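If you need to define more fields than the two used here, the schema text can also be generated rather than typed by hand. The sketch below builds the same JSON as the schema text above using only the Python standard library. The helper `make_field` is hypothetical, and treating `NAME_PATTERN` as a rule that each `Name` must match is an assumption based on the pattern's appearance in the schema, not something this tutorial states.

```python
import json
import re

NAME_PATTERN = "[a-zA-Z][a-zA-Z0-9_]*"  # copied from the schema text above


def make_field(name, field_type, mode="REQUIRED"):
    """Build one field entry using the same keys as the schema text above."""
    # Assumption: a field name must match NAME_PATTERN in full.
    if re.fullmatch(NAME_PATTERN, name) is None:
        raise ValueError(f"invalid field name: {name!r}")
    return {
        "NAME_PATTERN": NAME_PATTERN,
        "Name": name,
        "Type": field_type,
        "Mode": mode,
        "Description": None,   # serialized as JSON null
        "DataResolver": None,  # serialized as JSON null
        "SubBluebaseFields": [],
    }


schema = {
    "Fields": [
        make_field("TranscriptID", "STRING"),
        make_field("ExpressionCount", "INTEGER"),
    ]
}
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```

The printed text can then be pasted into the Edit as text box in the same way as the hand-written schema.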

Upload data to load into your table

  1. Upload the sampleX.final.count.tsv file containing the final counts.
    1. Select the Data tab from the left menu.
    2. Click the grey box to choose the file to upload, or drag and drop sampleX.final.count.tsv into the grey box.
    3. The uploaded file appears on the Data page after a successful upload.
  Note: an active schedule checks for files to load data from every 24 hours. You can also run your schedule at any time to load data into your table immediately.

Create a schedule to load data into your table

Data can be loaded into tables manually or automatically. To load data automatically, you can set up a schedule. The schedule specifies which files’ data should be automatically loaded into a table when those files are uploaded to ICA or created by an analysis on ICA. Active schedules check for new files every 24 hours.
In this exercise, you will create a schedule to automatically load RNA transcript counts from *.final.count.tsv files into the table you created above.
  1. Go to your project’s Schedule page and click the + Add New button.
  2. Select the option to load the contents from files into a table.
  3. Create your schedule.
    1. Name your schedule LoadFeatureCounts.
    2. Choose Project as the source of data for your table.
    3. To specify that data from *.final.count.tsv files should be loaded into your table, enter .final.count.tsv in the Search for a part of a specific ‘Original Name’ or Tag text box.
    4. Specify the table to load data into by selecting your table (FeatureCounts) from the dropdown under Target Base Table.
    5. Under Write preference, select Append to table. New data will be appended to your table rather than overwriting existing data.
    6. The *.final.count.tsv files that will be loaded into your table are tab-delimited (TSV) and do not contain a header row. For the Data format, Delimiter, and Header rows to skip fields, use these values:
      • Data format: TSV
      • Delimiter: Tab
      • Header rows to skip: 0
    7. Click the Save button.
  4. Highlight your schedule and click the Run button to run your schedule now.
    • It will take a short time to prepare and load data into your table.
      1. Check the status of your job on your project’s Activity page.
      2. Click the BASE JOBS tab to view the status of scheduled Base jobs.
      3. Click BASE ACTIVITY to view Base activity.
  5. Check the data in the table.
    1. Go back to your project’s Tables page.
    2. Double-click your table to view its details.
    3. Explore the different tabs.
      1. You will land on the SCHEMA DEFINITION page.
      2. Click the PREVIEW tab to view the records that were loaded into your table.
      3. Click the DATA tab to view a list of the files whose data has been loaded into your table.
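To make the schedule settings above concrete, the following Python sketch simulates locally what they describe: pick files whose name contains `.final.count.tsv`, read them as tab-delimited text with zero header rows skipped, and append the rows to an existing table. This is only an illustration and does not call ICA. The function `load_matching`, the in-memory `files` dictionary, and the use of a plain substring match are assumptions; how ICA matches names internally is not described in this tutorial.

```python
import csv
import io

NAME_FILTER = ".final.count.tsv"  # the text entered in the schedule's search box
HEADER_ROWS_TO_SKIP = 0           # the count files have no header row


def load_matching(files, table):
    """Append rows from every file whose name contains NAME_FILTER to table."""
    loaded = []
    for name, text in files.items():
        if NAME_FILTER not in name:
            continue  # file does not match the schedule's filter
        rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
        for transcript_id, count in rows[HEADER_ROWS_TO_SKIP:]:
            # "Append to table": existing records are kept, new ones are added.
            table.append((transcript_id, int(count)))
        loaded.append(name)
    return loaded


# Hypothetical project files, keyed by file name.
files = {
    "sampleX.final.count.tsv": (
        "HES4-NM_021170-T00001\t1392\n"
        "ISG15-NM_005101-T00002\t46\n"
    ),
    "notes.txt": "not a count file\n",
}
table = [("MTOR-NM_004958-T00006", 156)]  # a record already in the table
loaded = load_matching(files, table)
print(loaded, len(table))
```

Because the write preference is append, running the same load twice would add the same rows twice in this sketch.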

Query a table

To request data or information from a Base table, you can run a query using the query language SQL. You can create and run new queries or run saved queries.
In this activity, you will create and run a new SQL query to find the records (RNA transcripts) in your table that have counts greater than 1000.
  1. Go to your project’s Query page.
  2. Run the test query below.
    SELECT TranscriptID, ExpressionCount FROM FeatureCounts WHERE ExpressionCount > 1000;
    1. Paste the above query into the NEW QUERY text box.
    2. Click the Run Query button to run your query.
    3. View your query results.
    4. Save your query for future use by clicking the Save Query button. You will be asked to name the query before clicking the Create button.
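To try the query without access to Base, you can run it against a small local stand-in. The sketch below uses Python's built-in `sqlite3` module with three rows from the example file. SQLite is only a substitute here, and the SQL dialect that Base accepts may differ in details. The second statement, using `COUNT(*)`, is an added variation that returns the number of matching records instead of the records themselves.

```python
import sqlite3

# Three rows taken from the example file in the Prerequisites section.
rows = [
    ("HES4-NM_021170-T00001", 1392),
    ("ISG15-NM_005101-T00002", 46),
    ("PIK3CD-NM_005026-T00005", 200),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FeatureCounts ("
    "TranscriptID TEXT NOT NULL, ExpressionCount INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO FeatureCounts VALUES (?, ?)", rows)

# The tutorial's query: list the transcripts with counts above 1000.
high = conn.execute(
    "SELECT TranscriptID, ExpressionCount FROM FeatureCounts "
    "WHERE ExpressionCount > 1000"
).fetchall()

# Variation: count the matching records instead of listing them.
(n_high,) = conn.execute(
    "SELECT COUNT(*) FROM FeatureCounts WHERE ExpressionCount > 1000"
).fetchone()

print(high, n_high)
conn.close()
```

With these three rows, only the HES4 transcript (count 1392) exceeds the threshold.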