Schedule

On the Schedule page at Projects > your_project > Base > Schedule, it’s possible to create a job for importing different types of data you have access to into an existing table.

When creating or editing a schedule, Automatic import is performed when the Active box is checked. The job will run at 10 minute intervals. In addition, for both active and inactive schedules, a manual import is performed when selecting the schedule and clicking the »run button.

Configure a schedule

There are different types of schedules that can be set up:

Files
Metadata
Administrative data.

Files

This type will load the content of specific files from this project into a table. When adding or editing this schedule you can define the following parameters:

Name (required): The name of the scheduled job
Description: Extra information about the schedule
File name pattern (required): Define in this field a part or the full name of the file name or of the tag that the files you want to upload contain. For example, if you want to import files named sample1_reads.txt, sample2_reads.txt, … you can fill in _reads.txt in this field to have all files that contain _reads.txt imported to the table.
Generated by Pipelines: Only files generated by these selected pipelines are taken into account. When left clear, files from all pipelines are used.
Target Base Table (required): The table to which the information needs to be added. A drop-down list with all created tables is shown. This means the table needs to be created before the schedule can be created.
Write preference (required): Define data handling; whether it can overwrite the data
Data format (required): Select the data format of the files (CSV, TSV, JSON)
Delimiter (required): to indicate which delimiter is used in the delimiter separated file. If the delimiter is not present in list, it can be indicated as custom.
Active: The job will run automatically if checked
Custom delimiter: the custom delimiter that is used in the file. You can only enter a delimiter here if custom delimiter is selected.
Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.
References: Choose which references must be added to the table
Advanced Options
- Encoding (required): Select the encoding of the file.
- Null Marker: Specifies a string that represents a null value in a CSV/TSV file.
- Quote: The value (single character) that is used to quote data sections in a CSV/TSV file. When this character is encountered at the beginning and end of a field, it will be removed. For example, entering " as quote will remove the quotes from "bunny" and only store the word bunny itself.
- Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser can not detect the missing separator and will shift fields to the left, resulting in errors.
  - If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.
  - If no headers are used: The fields are loaded in order of occurrence and trailing missing fields are loaded with NULL, trailing additional fields are discarded.

Metadata

This type will create two new tables: BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL and ICA_PROJECT_SAMPLE_META_DATA. The job will load metadata (added to the samples) into ICA_PROJECT_SAMPLE_META_DATA. The process gathers the metadata from the samples via the data linked to the project and the metadata from the analyses in this project. Furthermore, the schedular will add provenance data to BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. This process gathers the execution details of all the analyses in the project: the pipeline name and status, the user reference, the input files (with identifiers), and the settings selected at runtime. This enables you to track the lineage of your data and to identify any potential sources of errors or biases. So, for example, the following query will count how many times each of the pipelines was executed and sort it accordingly:

SELECT PIPELINE_NAME, COUNT(*) AS Appearances
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
GROUP BY PIPELINE_NAME
ORDER BY Appearances DESC;

To obtained the similar table for the failed runs, you can execute the following SQL query:

SELECT PIPELINE_NAME, COUNT(*) AS Appearances
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
WHERE PIPELINE_STATUS = 'Failed'
GROUP BY PIPELINE_NAME
ORDER BY Appearances DESC;

When adding or editing this schedule you can define the following parameters:

Name (required): the name of this scheduled job.
Description: Extra information about the schedule.
Include sensitive meta data fields: in the meta data fields configuration, fields can be set to sensitive. When checked, those fields will also be added.
Active: the job will run automatically if ticked.
Source (Tenant Administrators Only):
- Project (default): All administrative data from this project will be added.
- Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.

Administrative data

This type will automatically create a table and load administrative data into this table. A usage overview of all executions is considered administrative data.

When adding or editing this schedule the following parameters can be defined:

Name (required): The name of this scheduled job.
Description: Extra information about the schedule.
Include sensitive metadata fields: In the metadata fields configuration, fields can be set to sensitive. When checked, those fields will also be added.
Active: The job will run automatically if checked.
Source (Tenant Administrators Only):
- Project (default): All administrative data from this project will be added.
- Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.

Delete schedule

Schedules can be deleted. Once deleted, they will no longer run, and they will not be shown in the list of schedules.

Run schedule

When clicking the Run button, or Save & Run when editing, the schedule will start the job of importing the configured data in the correct tables. This way the schedule can be run manually. The result of the job can be seen in the tables.

PreviousQuery NextSnowflake

Last updated 14 days ago

Was this helpful?