Querying the Database Using the SDK

You can use the Gen3 SDK to query BRAINCommons™ from within the secure BC-workspace environment. Full documentation of the Gen3 SDK can be found here.

To help you get started, a list of the basic operations is included below.

Initialize variables
Get List of accessible programs
Get List of accessible Project Ids
Get all nodes that have at least 1 record for a project
Download data for a node
Download data for all nodes for a project
List all data files for a project
Download data files
Execute a GraphQL query

1. Initialize variables

Important!

Remember to copy your credentials file into your BC-workspace from the UI and make sure that you use the correct path to your file.

from expansion.expansion import Gen3Expansion 
from gen3 import scripts 
from gen3.auth import Gen3Auth 
from gen3.submission import Gen3Submission  

endpoint = "https://internal.api.braincommons.org" 
path_to_credentials_file = "credentials.json" 
auth = Gen3Auth(endpoint, refresh_file=path_to_credentials_file) 
exp = Gen3Expansion(endpoint, auth) 
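# Note: this rebinds the name Gen3Submission from the class to a client instance; step 9 calls query() on it
Gen3Submission = Gen3Submission(endpoint, auth)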
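
Because a wrong path to the credentials file is a common stumbling block, it can help to check the file before creating the Gen3Auth object. The following is a minimal sketch using only the Python standard library; check_credentials_file is a hypothetical helper, not part of the Gen3 SDK, and it only assumes that the credentials file is a JSON object.

```python
import json
from pathlib import Path


def check_credentials_file(path):
    """Return the absolute path if the file exists and holds a JSON object.

    Hypothetical helper (not part of the Gen3 SDK). It does not validate
    the credentials themselves, only that the file can be found and parsed.
    """
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"No credentials file found at {resolved}")
    with resolved.open() as handle:
        content = json.load(handle)
    if not isinstance(content, dict):
        raise ValueError(f"{resolved} does not contain a JSON object")
    return str(resolved)
```

For example, path_to_credentials_file = check_credentials_file("credentials.json") would raise a clear error straight away if the file was not copied into the BC-workspace.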

2. Get List of accessible programs

programs_list = scripts.get_programs(endpoint, path_to_credentials_file)
print(programs_list)

3. Get List of accessible Project Ids

project_ids = exp.get_project_ids()
print(project_ids)

4. Get all nodes that have at least 1 record for a project

project_id = "CVB1-simulate1" # Replace with project_id for the project you want to use

node_list = scripts.get_nodes_with_minimum_one_record_in_project(endpoint, path_to_credentials_file, project_id)
print(node_list)

5. Download data for a node

Note: See the discussion of retrieving structured data files at the end of this article.

output_file_format = 'tsv'  # one of: tsv, csv, json
program_name = ''  # fill in a program name from step 2
project_id = ''  # fill in a project_id from step 3
node_type = ''  # fill in a node type from step 4
output_filename = ''  # fill in the name of the output file
scripts.export_node(endpoint, path_to_credentials_file, program_name, project_id, node_type, output_file_format, output_filename)

6. Download data for all nodes for a project

All files are downloaded to the folder specified by output_dir_path.
Note: See the discussion of retrieving structured data files at the end of this article.

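projects_to_get_data_for: list = ['']  # fill in one or more project_ids from step 3
optional_nodes: list = None  # None requests all nodes; otherwise a list of node types from step 4
output_dir_path = ''  # fill in the folder the files should be written to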
exp.get_project_tsvs(projects=projects_to_get_data_for, nodes=optional_nodes, outdir=output_dir_path)
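
After the download finishes, you may want to check what was written to output_dir_path. The following is a minimal sketch using only the Python standard library; summarize_tsvs is a hypothetical helper, and it assumes that the exported files use the .tsv extension and start with a header row.

```python
import csv
from pathlib import Path


def summarize_tsvs(output_dir_path):
    """Map each .tsv file under output_dir_path to its number of data rows.

    Hypothetical helper (not part of the Gen3 SDK). The header row is
    assumed to be the first line and is not counted.
    """
    summary = {}
    for tsv_path in sorted(Path(output_dir_path).rglob("*.tsv")):
        with tsv_path.open(newline="") as handle:
            rows = list(csv.reader(handle, delimiter="\t"))
        relative_name = str(tsv_path.relative_to(output_dir_path))
        summary[relative_name] = max(len(rows) - 1, 0)
    return summary
```

Calling summarize_tsvs(output_dir_path) returns a dictionary keyed by file name, which makes it easy to spot nodes that came back with no records.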

7. List all data files for a project

project_id_for_data_file: str = ''  # fill in a project_id from step 3
object_files_dict: dict = exp.list_project_files(project_id_for_data_file) 
print(object_files_dict)

8. Download data files

guid_of_files_to_download: list = ['']  # fill in the GUIDs of the files you want to download
exp.download_files_for_guids(guids=guid_of_files_to_download)

9. Execute a GraphQL query

query = "{ project(first:0) { code } }" 
Gen3Submission.query(query)
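
Query strings for other nodes follow the same pattern as the one above. The following is a minimal sketch of a helper that assembles such a string; build_node_query is hypothetical and not part of the Gen3 SDK, the node type and field names are borrowed from the example table in this article and may not match your project's data model, and the project_id argument is an assumption about the GraphQL schema.

```python
import json


def build_node_query(node_type, fields, project_id=None, first=0):
    """Assemble a GraphQL query string for one node type.

    Hypothetical helper. `first` is passed through unchanged, mirroring
    the `first:0` argument used in the query above.
    """
    if not fields:
        raise ValueError("at least one field is required")
    arguments = []
    if project_id is not None:
        # json.dumps adds the double quotes and escapes special characters
        arguments.append(f"project_id: {json.dumps(project_id)}")
    arguments.append(f"first: {int(first)}")
    return "{ %s(%s) { %s } }" % (node_type, ", ".join(arguments), " ".join(fields))


query = build_node_query(
    "medication",
    ["submitter_id", "drug_name", "drug_dose"],
    project_id="CVB1-simulate1",
)
print(query)
# { medication(project_id: "CVB1-simulate1", first: 0) { submitter_id drug_name drug_dose } }
```

The resulting string can be passed to the query method in the same way as the handwritten query in this step.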

Retrieving Structured Data Files from the BRAINCommons

Important!

When you download BRAINCommons structured data files (outside of the BC platform) or import them into your BC-workspace, each file is a table of data as described in the data model (see the article Exploring the Data Model). Each node corresponds to a single table of data.

Each column in these tables represents a property, which is a single data element or variable. Each row represents a unique record in the table. For example, in the “medication” table, each column holds one piece of information about a medication (drug name, dose, etc.) and each row represents a single medication for a participant or group of participants. See the example below:

Type	submitter_id	case_id	drug_name	drug_dose
Medication	1234-drug1	1234	Ibuprofen	None
Medication	1234-drug2	1234	Atorvastatin	10 mg
Medication	5678-drug1	5678	Atorvastatin	20 mg

When data are present, the cell where a column and row meet contains the value. For example, if a participant has an entry in the medication table, the drug name column in the row with their participant id (i.e. submitter_id) would contain a drug name such as “Ibuprofen”. However, if data are not available for a particular property, the cell is not left blank. Instead, it contains the text “None”. For example, if a row shows “Ibuprofen” in the drug name column but the drug dose is unknown, the dose column shows “None”.

While this is the current output format, future releases are planned to remove the “None” text from fields without data and leave those fields blank. In the meantime, keep this detail in mind when using BRAINCommons structured data files.
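
Because “None” arrives as ordinary text, it is worth converting it to a real missing value when you load a file. The following is a minimal sketch using only the Python standard library; read_node_tsv is a hypothetical helper, and the sample data simply repeats the example table above.

```python
import csv
import io

MISSING_TOKEN = "None"  # placeholder text written for fields without data


def read_node_tsv(handle):
    """Read a tab-separated node table and map the "None" text to Python None."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {column: (None if value == MISSING_TOKEN else value)
         for column, value in row.items()}
        for row in reader
    ]


# Sample data copied from the example table in this article
sample = (
    "Type\tsubmitter_id\tcase_id\tdrug_name\tdrug_dose\n"
    "Medication\t1234-drug1\t1234\tIbuprofen\tNone\n"
    "Medication\t1234-drug2\t1234\tAtorvastatin\t10 mg\n"
    "Medication\t5678-drug1\t5678\tAtorvastatin\t20 mg\n"
)

records = read_node_tsv(io.StringIO(sample))
missing_dose = [record["submitter_id"] for record in records if record["drug_dose"] is None]
print(missing_dose)  # ['1234-drug1']
```

To read a downloaded file instead of the sample, open it with open(path, newline="") and pass the file handle to read_node_tsv. Mapping only the exact text “None” means the same code keeps working if a future release leaves those fields blank, although blank cells would then come back as empty strings.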
