Querying the Database Using the SDK
You can use the Gen3 SDK to query BRAINCommons™ from within the secure
BC-workspace environment. For full details, see the Gen3 SDK documentation.
The basic operations are listed below to help you get started.
1. Initialize variables
Important!
Remember to copy your credentials file into your BC-workspace from the UI and make sure that you use the correct path to your file.
from expansion.expansion import Gen3Expansion
from gen3 import scripts
from gen3.auth import Gen3Auth
from gen3.submission import Gen3Submission

endpoint = "https://internal.api.braincommons.org"
path_to_credentials_file = "credentials.json"  # path to the credentials file copied from the UI

auth = Gen3Auth(endpoint, refresh_file=path_to_credentials_file)
exp = Gen3Expansion(endpoint, auth)
# Note: this rebinds the name Gen3Submission from the class to a client instance (used in step 9).
Gen3Submission = Gen3Submission(endpoint, auth)
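A wrong path or a malformed credentials file is a common reason for this step to fail. The following is a minimal sketch of a check you could run before creating `Gen3Auth`. The helper name `check_credentials_file` is hypothetical and not part of the Gen3 SDK, and the expected keys (`api_key`, `key_id`) are an assumption about the usual layout of a Gen3 credentials file; adjust them if your file differs.

```python
import json
from pathlib import Path


def check_credentials_file(path, required_keys=("api_key", "key_id")):
    """Return the parsed credentials, or raise a descriptive error.

    Hypothetical helper, not part of the Gen3 SDK. The default
    required_keys are an assumption about the file layout.
    """
    cred_path = Path(path)
    if not cred_path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {cred_path.resolve()}")
    with cred_path.open() as handle:
        credentials = json.load(handle)  # raises json.JSONDecodeError if malformed
    missing = [key for key in required_keys if key not in credentials]
    if missing:
        raise KeyError(f"Credentials file is missing keys: {missing}")
    return credentials
```

Calling `check_credentials_file(path_to_credentials_file)` before the `Gen3Auth(...)` line would surface a path problem with a clearer message than a later authentication error.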
2. Get list of accessible programs
programs_list = scripts.get_programs(endpoint, path_to_credentials_file)
print(programs_list)
3. Get list of accessible project IDs
project_ids = exp.get_project_ids()
print(project_ids)
4. Get all nodes that have at least 1 record for a project
project_id = "CVB1-simulate1"  # Replace with the project_id of the project you want to use
node_list = scripts.get_nodes_with_minimum_one_record_in_project(endpoint, path_to_credentials_file, project_id)
print(node_list)
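Later steps ask for both a program name and a project ID. Gen3 project IDs conventionally take the form `<program>-<project code>`, as in the example `CVB1-simulate1`. Assuming that convention holds for your project, the sketch below derives the program name from a project ID. The helper `split_project_id` is hypothetical, and you should confirm the result against the program list from step 2.

```python
def split_project_id(project_id):
    """Split 'PROGRAM-PROJECT' into (program_name, project_code).

    Hypothetical helper. Assumes the text before the first hyphen is the
    program name; confirm against the program list from step 2.
    """
    program_name, separator, project_code = project_id.partition("-")
    if not separator or not program_name or not project_code:
        raise ValueError(f"Unexpected project_id format: {project_id!r}")
    return program_name, project_code


program_name, project_code = split_project_id("CVB1-simulate1")
print(program_name, project_code)  # prints: CVB1 simulate1
```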
5. Download data for a node
Note: See the discussion of retrieving structured data files at the end of this article.
output_file_format = 'tsv'  # one of: tsv, csv, json
program_name = ''  # fill in the program name returned in step 2
project_id = ''  # fill in the project_id you want to use, from step 3
node_type = ''  # fill in the node type returned in step 4
output_filename = ''  # fill in the name of the output file
scripts.export_node(endpoint, path_to_credentials_file, program_name, project_id, node_type, output_file_format, output_filename)
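Because every variable in the snippet above starts out empty, it is easy to run the export with a value left unfilled. The following sketch checks the arguments first. The function `validate_export_args` is a hypothetical helper, not part of the SDK, and the allowed formats are taken from the comment in the snippet above.

```python
ALLOWED_FORMATS = ("tsv", "csv", "json")  # formats listed in the step 5 snippet


def validate_export_args(program_name, project_id, node_type,
                         output_file_format, output_filename):
    """Raise ValueError if an export argument is empty or unsupported.

    Hypothetical pre-flight check; it does not contact the server.
    """
    values = {
        "program_name": program_name,
        "project_id": project_id,
        "node_type": node_type,
        "output_filename": output_filename,
    }
    empty = [name for name, value in values.items() if not value]
    if empty:
        raise ValueError(f"Fill in these values before exporting: {empty}")
    if output_file_format not in ALLOWED_FORMATS:
        raise ValueError(
            f"output_file_format must be one of {ALLOWED_FORMATS}, "
            f"got {output_file_format!r}"
        )
```

You could call `validate_export_args(program_name, project_id, node_type, output_file_format, output_filename)` on the line before `scripts.export_node(...)`.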
6. Download data for all nodes for a project
All files are downloaded to the folder specified in output_dir_path.
Note: See the discussion of retrieving structured data files at the end of this article.
projects_to_get_data_for: list = ['']
optional_nodes: list = None
output_dir_path = ''
exp.get_project_tsvs(projects=projects_to_get_data_for, nodes=optional_nodes, outdir=output_dir_path)
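This article does not say whether the SDK creates the output folder for you or how it arranges the files inside it. As a cautious sketch, you could create the folder beforehand and list the TSV files afterwards. Both helper names below are hypothetical, and the recursive search is used only because the folder layout is not known.

```python
from pathlib import Path


def prepare_output_dir(output_dir_path):
    """Create the output folder (and any parents) if it does not exist yet."""
    out_dir = Path(output_dir_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


def list_tsv_files(output_dir_path):
    """Return every .tsv file under the output folder, sorted by path."""
    return sorted(Path(output_dir_path).rglob("*.tsv"))
```

Calling `prepare_output_dir(output_dir_path)` before `exp.get_project_tsvs(...)` and `list_tsv_files(output_dir_path)` after it gives a quick confirmation of what was written.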
7. List all data files for a project
project_id_for_data_file: str = ''
object_files_dict: dict = exp.list_project_files(project_id_for_data_file)
print(object_files_dict)
8. Download data files
guid_of_files_to_download: list = ['']
exp.download_files_for_guids(guids=guid_of_files_to_download)
9. Execute a GraphQL query
query = "{ project(first:0) { code } }"
# Gen3Submission here is the client instance created in step 1
Gen3Submission.query(query)
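The query above asks for the `code` of each project. As a sketch of how the result might be unpacked, the function below assumes the response is a dictionary that follows the usual GraphQL convention: a `data` key holding the results and an optional `errors` key. The helper `extract_project_codes` is hypothetical, and the sample response is made-up illustrative data, not real output.

```python
def extract_project_codes(response):
    """Return the list of project codes from a GraphQL response dict.

    Hypothetical helper; assumes the conventional
    {"data": {...}, "errors": [...]} response shape.
    """
    if response.get("errors"):
        raise RuntimeError(f"GraphQL query returned errors: {response['errors']}")
    projects = (response.get("data") or {}).get("project", [])
    return [project["code"] for project in projects]


# Made-up example response, for illustration only
sample_response = {"data": {"project": [{"code": "simulate1"}, {"code": "simulate2"}]}}
print(extract_project_codes(sample_response))  # prints: ['simulate1', 'simulate2']
```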
Retrieving Structured Data Files from the BRAINCommons
Important!
When you download BRAINCommons structured data files (to use outside the BC platform) or import them (into your BC-workspace), each file is a table of data as described in the data model (see the article Exploring the Data Model). Each node corresponds to a single table of data.
Each column in a table represents a property, which is a single data element or variable. Each row represents a unique record in that table. For example, in the “medication” table, each column holds one piece of information about a medication (drug name, dose, etc.) and each row represents a single medication for a participant or group of participants. See the example below:
Type | submitter_id | case_id | drug_name | drug_dose |
Medication | 1234-drug1 | 1234 | Ibuprofen | None |
Medication | 1234-drug2 | 1234 | Atorvastatin | 10 mg |
Medication | 5678-drug1 | 5678 | Atorvastatin | 20 mg |
When data are present, the cell where a column and row meet is filled with meaningful data. For example, if a participant has an entry in the medication table, the drug name column in the row matching their participant ID (i.e., submitter_id) would include a drug name like “Ibuprofen”. However, if data are not available for a particular property, the cell is not left blank. Instead, it contains the text “None”. For example, if a row for a patient shows “Ibuprofen” in the drug name column but the drug dose is unknown, the dose column shows “None”.
This is the current output format. A future release is planned to remove the “None” text from fields without data and leave those fields blank. In the meantime, keep this detail in mind when working with BRAINCommons structured data files.
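Since missing values currently appear as the text “None” and may appear as blank cells in a future release, it can help to treat both as missing when you load a file. The sketch below uses only the Python standard library and the example medication table from this article, written as tab-separated text. The helper name `read_structured_tsv` is hypothetical.

```python
import csv
import io

MISSING_MARKERS = ("None", "")  # current placeholder text, plus blank cells


def read_structured_tsv(handle):
    """Read a tab-separated table, mapping missing-value markers to None."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {column: (None if value in MISSING_MARKERS else value)
         for column, value in row.items()}
        for row in reader
    ]


# The example table from this article, as tab-separated text
sample = (
    "Type\tsubmitter_id\tcase_id\tdrug_name\tdrug_dose\n"
    "Medication\t1234-drug1\t1234\tIbuprofen\tNone\n"
    "Medication\t1234-drug2\t1234\tAtorvastatin\t10 mg\n"
    "Medication\t5678-drug1\t5678\tAtorvastatin\t20 mg\n"
)
rows = read_structured_tsv(io.StringIO(sample))
print(rows[0]["drug_name"], rows[0]["drug_dose"])  # prints: Ibuprofen None
```

For a downloaded file, pass an open file handle instead, for example `open(output_filename, newline="")`. If you use pandas, passing `na_values=["None"]` to `read_csv` should have a similar effect. Note that this approach would also blank out a cell whose genuine value is the word “None”, so check whether that can occur in your data.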