Gen3 Submission Class

class gen3.submission.Gen3Submission(endpoint=None, auth_provider=None)

Bases: object

Submit/Export/Query data from a Gen3 Submission system.

A class for interacting with the Gen3 submission services. Supports submitting and exporting from Sheepdog. Supports GraphQL queries through Peregrine.

Parameters:

auth_provider (Gen3Auth) – A Gen3Auth class instance.

Examples

This creates a Gen3Submission instance pointed at the sandbox commons, using the credentials.json file downloaded from the commons profile page.

>>> from gen3.auth import Gen3Auth
>>> from gen3.submission import Gen3Submission
>>> auth = Gen3Auth(refresh_file="credentials.json")
>>> sub = Gen3Submission(auth_provider=auth)
create_program(json)

Create a program.

Parameters:

json (object) – The JSON of the program to create.

Examples

This creates a program in the sandbox commons.

>>> sub.create_program(json)
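The json argument is the program record itself. A minimal Python sketch of building one, assuming the commons dictionary's program node uses the `type`, `name` and `dbgap_accession_number` fields (check the dictionary of your commons); `build_program_json` is a hypothetical helper, not part of the SDK, and the accession value is a placeholder.

```python
def build_program_json(name, dbgap_accession_number):
    """Build a program record (hypothetical helper; field names assumed from a typical Gen3 dictionary)."""
    return {
        "type": "program",
        "name": name,
        "dbgap_accession_number": dbgap_accession_number,
    }


program_json = build_program_json("DCF", "DCF")  # accession value is a placeholder
print(program_json)

# With a client created as in the class example:
# sub.create_program(program_json)
```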
create_project(program, json)

Create a project.

Parameters:
  • program (str) – The program to create the project in.

  • json (object) – The JSON of the project to create.

Examples

This creates a project on the DCF program in the sandbox commons.

>>> sub.create_project("DCF", json)
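A project record is built the same way. This Python sketch assumes the project node uses `type`, `code`, `name` and `dbgap_accession_number` (verify against your commons' dictionary); `build_project_json` and `project_id` are hypothetical helpers. The project_id helper mirrors the "DCF-CCLE" form that submit_file uses.

```python
def build_project_json(code, name, dbgap_accession_number):
    """Build a project record (hypothetical helper; field names assumed)."""
    return {
        "type": "project",
        "code": code,
        "name": name,
        "dbgap_accession_number": dbgap_accession_number,
    }


def project_id(program, code):
    """Join program name and project code, as in the "DCF-CCLE" ID passed to submit_file."""
    return f"{program}-{code}"


project_json = build_project_json("CCLE", "CCLE", "CCLE")  # placeholder values
# sub.create_project("DCF", project_json)
print(project_id("DCF", project_json["code"]))  # DCF-CCLE
```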
delete_node(program, project, node_name, batch_size=100, verbose=True)

Delete all records for a node from a project.

Parameters:
  • program (str) – The program to delete from.

  • project (str) – The project to delete from.

  • node_name (str) – The name of the node to delete.

  • batch_size (int, optional) – How many records to query and delete at a time. Defaults to 100.

  • verbose (bool, optional) – Whether to print progress logs. Defaults to True.

Examples

This deletes all records of the “demographic” node from the CCLE project in the sandbox commons.

>>> sub.delete_node("DCF", "CCLE", "demographic")
delete_nodes(program, project, ordered_node_list, batch_size=100, verbose=True)

Delete all records for a list of nodes from a project.

Parameters:
  • program (str) – The program to delete from.

  • project (str) – The project to delete from.

  • ordered_node_list (list) – The nodes to delete, in reverse graph submission order (child nodes before their parents).

  • batch_size (int, optional) – How many records to query and delete at a time. Defaults to 100.

  • verbose (bool, optional) – Whether to print progress logs. Defaults to True.

Examples

This deletes all records of the listed nodes from the CCLE project in the sandbox commons.

>>> sub.delete_nodes("DCF", "CCLE", ["demographic", "subject", "experiment"])
delete_program(program)

Delete a program.

This deletes an empty program from the commons.

Parameters:

program (str) – The program to delete.

Examples

This deletes the “DCF” program.

>>> sub.delete_program("DCF")
delete_project(program, project)

Delete a project.

This deletes an empty project from the commons.

Parameters:
  • program (str) – The program containing the project to delete.

  • project (str) – The project to delete.

Examples

This deletes the “CCLE” project from the “DCF” program.

>>> sub.delete_project("DCF", "CCLE")
delete_record(program, project, uuid)

Delete a record from a project.

Parameters:
  • program (str) – The program to delete from.

  • project (str) – The project to delete from.

  • uuid (str) – The UUID of the record to delete.

Examples

This deletes a record from the CCLE project in the sandbox commons.

>>> sub.delete_record("DCF", "CCLE", uuid)
delete_records(program, project, uuids, batch_size=100)

Delete a list of records from a project.

Parameters:
  • program (str) – The program to delete from.

  • project (str) – The project to delete from.

  • uuids (list) – The UUIDs of the records to delete.

  • batch_size (int, optional) – How many records to delete at a time. Defaults to 100.

Examples

This deletes a list of records from the CCLE project in the sandbox commons.

>>> sub.delete_records("DCF", "CCLE", ["uuid1", "uuid2"])
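A Python sketch of how batch_size partitions the UUID list, so that 250 UUIDs would presumably be handled in three rounds of at most 100. This illustrates the parameter only; it is not the SDK's own code, and `split_into_batches` and the placeholder IDs are hypothetical.

```python
def split_into_batches(uuids, batch_size=100):
    """Split a list of UUIDs into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [uuids[i:i + batch_size] for i in range(0, len(uuids), batch_size)]


uuids = [f"uuid{n}" for n in range(250)]  # placeholder IDs
batches = split_into_batches(uuids, batch_size=100)
print([len(b) for b in batches])  # [100, 100, 50]
```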
export_node(program, project, node_type, fileformat, filename=None)

Export all records in a single node type of a project.

Parameters:
  • program (str) – The program to which records belong.

  • project (str) – The project to which records belong.

  • node_type (str) – The name of the node to export.

  • fileformat (str) – The export format, either ‘json’ or ‘tsv’.

  • filename (str, optional) – The name of the file to export to. If no filename is provided, the data is printed to the screen.

Examples

This exports all records in the “sample” node from the CCLE project in the sandbox commons.

>>> sub.export_node("DCF", "CCLE", "sample", "tsv", filename="DCF-CCLE_sample_node.tsv")
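A TSV export can be loaded back with the standard csv module. This Python sketch parses TSV text into one dict per record, keyed by column header; the two-row sample and its column names are hypothetical, and a real file would be read with open(filename).read().

```python
import csv
import io


def parse_exported_tsv(text):
    """Parse TSV text (e.g. an export_node file) into a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


# Hypothetical two-row export with assumed column names.
sample_tsv = "type\tsubmitter_id\nsample\tsample-01\nsample\tsample-02\n"
rows = parse_exported_tsv(sample_tsv)
print([row["submitter_id"] for row in rows])  # ['sample-01', 'sample-02']
```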
export_record(program, project, uuid, fileformat, filename=None)

Export a single record as JSON or TSV.

Parameters:
  • program (str) – The program the record is under.

  • project (str) – The project the record is under.

  • uuid (str) – The UUID of the record to export.

  • fileformat (str) – The export format, either ‘json’ or ‘tsv’.

  • filename (str, optional) – The name of the file to export to. If no filename is provided, the data is printed to the screen.

Examples

This exports a single record from the sandbox commons.

>>> sub.export_record("DCF", "CCLE", "d70b41b9-6f90-4714-8420-e043ab8b77b9", "json", filename="DCF-CCLE_one_record.json")
get_dictionary_all()

Returns the entire dictionary object for a commons.

This gets the current dictionary schema for a commons as JSON.

Examples

This returns the dictionary schema for a commons.

>>> sub.get_dictionary_all()
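The returned object maps names to schemas. This Python sketch lists the node names, assuming node schemas are keyed by node name and that bookkeeping entries such as "_definitions" and "_settings" start with an underscore; the trimmed dictionary below is hypothetical.

```python
def node_names(dictionary):
    """Return node names, skipping underscore-prefixed entries such as "_definitions"."""
    return sorted(key for key in dictionary if not key.startswith("_"))


# Hypothetical, heavily trimmed dictionary object; a real one would come from
# dictionary = sub.get_dictionary_all()
dictionary = {
    "_definitions": {},
    "_settings": {},
    "program": {"id": "program"},
    "project": {"id": "project"},
    "subject": {"id": "subject"},
}
print(node_names(dictionary))  # ['program', 'project', 'subject']
```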
get_dictionary_node(node_type)

Returns the dictionary schema for a specific node.

This gets the current JSON dictionary schema for a specific node type in a commons.

Parameters:

node_type (str) – The node_type (or name of the node) to retrieve.

Examples

This returns the dictionary schema for the “subject” node.

>>> sub.get_dictionary_node("subject")
get_graphql_schema()

Returns the GraphQL schema for a commons.

This runs the GraphQL introspection query against a commons and returns the results.

Examples

This returns the GraphQL schema.

>>> sub.get_graphql_schema()
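Assuming the result follows the standard GraphQL introspection shape, `{"data": {"__schema": {"types": [...]}}}`, this Python sketch lists the queryable type names and drops the built-in introspection types whose names start with "__". The trimmed result below is hypothetical.

```python
def graphql_type_names(result):
    """Extract non-introspection type names from a GraphQL introspection result."""
    types = result["data"]["__schema"]["types"]
    return sorted(t["name"] for t in types if not t["name"].startswith("__"))


# Hypothetical, trimmed introspection result.
result = {
    "data": {
        "__schema": {
            "types": [
                {"name": "Root"},
                {"name": "subject"},
                {"name": "__Type"},
                {"name": "String"},
            ]
        }
    }
}
print(graphql_type_names(result))  # ['Root', 'String', 'subject']
```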
get_programs()

List the programs registered in the commons.

get_project_dictionary(program, project)

Get the dictionary schema for a given project.

Parameters:
  • program (str) – The name of the program the project belongs to.

  • project (str) – The name of the project to get the dictionary schema from.

Example

>>> sub.get_project_dictionary("DCF", "CCLE")
get_project_manifest(program, project)

Get a project's file manifest.

Parameters:
  • program (str) – The name of the program the project belongs to.

  • project (str) – The name of the project to get the manifest from.

Example

>>> sub.get_project_manifest("DCF", "CCLE")
get_projects(program)

List the projects registered under a given program.

Parameters:

program (str) – The name of the program to list projects from.

Example

This lists all the projects under the DCF program.

>>> sub.get_projects("DCF")
open_project(program, project)

Mark a project as open. Opening a project allows uploads, deletions, and similar changes.

Parameters:
  • program (str) – The name of the program the project belongs to.

  • project (str) – The name of the project to open.

Example

>>> sub.open_project("DCF", "CCLE")
query(query_txt, variables=None, max_tries=1)

Execute a GraphQL query against a Data Commons.

Parameters:
  • query_txt (str) – Query text.

  • variables (object, optional) – Dictionary of variables to pass with the query.

  • max_tries (int, optional) – The maximum number of attempts to make if the request fails. Defaults to 1.

Examples

This queries the codes of all projects in the data commons; first:0 removes the default limit on the number of results returned.

>>> query = "{ project(first:0) { code } }"
>>> sub.query(query)
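A GraphQL response carries results under "data" and any problems under "errors". This Python sketch pulls the project codes out of a response to the query above and raises if an "errors" key is present; whether the SDK itself raises on errors is not assumed here. The response is hypothetical.

```python
def project_codes(result):
    """Return project codes from a GraphQL response, raising if the server reported errors."""
    if result.get("errors"):
        raise RuntimeError(f"GraphQL errors: {result['errors']}")
    return [project["code"] for project in result["data"]["project"]]


# Hypothetical response to "{ project(first:0) { code } }".
result = {"data": {"project": [{"code": "CCLE"}, {"code": "TCGA"}]}}
print(project_codes(result))  # ['CCLE', 'TCGA']

# With a live client, allowing up to three attempts:
# result = sub.query("{ project(first:0) { code } }", max_tries=3)
```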
submit_file(project_id, filename, chunk_size=30, row_offset=0)

Submit data in a spreadsheet file containing multiple records in rows to a Gen3 Data Commons.

Parameters:
  • project_id (str) – The project_id to submit to, in the form "<program>-<project>" (for example "DCF-CCLE").

  • filename (str) – The file containing data to submit. The format can be TSV, CSV or XLSX (first worksheet only for now).

  • chunk_size (int, optional) – The number of rows of data to submit in each request to the API. Defaults to 30.

  • row_offset (int, optional) – The number of rows of data to skip; ‘0’ starts submission from the first row and submits all data. Defaults to 0.

Examples

This submits a spreadsheet file containing multiple records in rows to the CCLE project in the sandbox commons.

>>> sub.submit_file("DCF-CCLE", "data_spreadsheet.tsv")
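The two size parameters combine as follows: the first row_offset data rows are skipped, and the remainder is sent chunk_size rows at a time, which is handy for resuming a partially completed submission. This Python sketch only models that grouping on stand-in rows; `plan_chunks` is a hypothetical helper, not the SDK's implementation.

```python
def plan_chunks(rows, chunk_size=30, row_offset=0):
    """Skip the first row_offset data rows, then group the rest into chunks of chunk_size."""
    remaining = rows[row_offset:]
    return [remaining[i:i + chunk_size] for i in range(0, len(remaining), chunk_size)]


rows = [f"row{n}" for n in range(70)]  # stand-ins for 70 spreadsheet data rows
chunks = plan_chunks(rows, chunk_size=30, row_offset=10)
print([len(c) for c in chunks])  # [30, 30]
print(chunks[0][0])  # row10
```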
submit_record(program, project, json)

Submit one or more records to a project as JSON.

Parameters:
  • program (str) – The program to submit to.

  • project (str) – The project to submit to.

  • json (object) – The JSON defining the record(s) to submit. To submit multiple records, pass a list (JSON array) of records.

Examples

This submits records to the CCLE project in the sandbox commons.

>>> sub.submit_record("DCF", "CCLE", json)
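For several records, json is a list of record objects. This Python sketch builds subject records that each link to a parent experiment by submitter_id; the node type, the "experiments" link name and all IDs are assumptions to be checked against your commons' dictionary, and `build_subject_records` is hypothetical.

```python
def build_subject_records(submitter_ids, experiment_submitter_id):
    """Build a list of subject records linked to one parent experiment (link name assumed)."""
    return [
        {
            "type": "subject",
            "submitter_id": submitter_id,
            # Link to the parent by submitter_id; "experiments" is an assumed link name.
            "experiments": {"submitter_id": experiment_submitter_id},
        }
        for submitter_id in submitter_ids
    ]


records = build_subject_records(["subject-01", "subject-02"], "experiment-01")
print(len(records))  # 2

# sub.submit_record("DCF", "CCLE", records)
```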