vardb.queries package

The queries package contains standalone command-line routines for making production queries to the database. So far, these include:

  • Germline CNV query: identifies CNVs that overlap with specified genes
  • Experimental records: queries the database to obtain the effects and annotations for each variant in a test library, as well as all other libraries/patients in vcall in which the variants in the test library have been seen

Submodules

vardb.queries.CNV_overlap module

Performs a query on the controlfreec table, which contains germline CNVs. The query identifies CNVs that overlap with the genes specified in GENE_FILE.

INPUTS

usage: python CNV_overlap.py [-h] -g GENE_FILE -p PROJECTS_FILE -ov
                             VARIANTS_OUTPUT_FILE -ol LIBRARIES_OUTPUT_FILE
                             [-db DATABASE] [-th THRESHOLD]

optional arguments:
  -h, --help            show this help message and exit
  -db DATABASE, --database DATABASE
                        the database to query
  -th THRESHOLD, --thresh THRESHOLD
                        threshold: only variants with % occurrence in database
                        < threshold will be reported; if option is not
                        selected, all variants are reported (i.e. thresh = 0)

required named arguments:
  -g GENE_FILE, --genes GENE_FILE
                        file containing the gene coordinates
  -p PROJECTS_FILE, --projects_file PROJECTS_FILE
                        file containing the projects to query
  -ov VARIANTS_OUTPUT_FILE, --variants_output VARIANTS_OUTPUT_FILE
                        file name to store variants returned in the query
  -ol LIBRARIES_OUTPUT_FILE, --libraries_output LIBRARIES_OUTPUT_FILE
                        file containing the libraries detected in the query
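
The argument groups above map directly onto Python's argparse. The sketch below rebuilds the documented interface as a hypothetical build_parser() function; it is reconstructed from the usage text, not taken from the CNV_overlap source, and the float type and 0 default for THRESHOLD are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the CNV_overlap command-line interface."""
    parser = argparse.ArgumentParser(prog="CNV_overlap.py")
    # Options that may be omitted ("optional arguments" in the help text)
    parser.add_argument("-db", "--database", help="the database to query")
    parser.add_argument(
        "-th", "--thresh", dest="threshold", type=float, default=0,
        help="only variants with %% occurrence in database < threshold are reported",
    )
    # Options that must be supplied, listed under their own heading
    required = parser.add_argument_group("required named arguments")
    required.add_argument("-g", "--genes", dest="gene_file", required=True,
                          help="file containing the gene coordinates")
    required.add_argument("-p", "--projects_file", required=True,
                          help="file containing the projects to query")
    required.add_argument("-ov", "--variants_output", dest="variants_output_file",
                          required=True,
                          help="file name to store variants returned in the query")
    required.add_argument("-ol", "--libraries_output", dest="libraries_output_file",
                          required=True,
                          help="file containing the libraries detected in the query")
    return parser
```

For example, build_parser().parse_args(["-g", "genes.tsv", "-p", "projects.txt", "-ov", "variants.tsv", "-ol", "libraries.txt"]) yields threshold 0 and database None, while omitting any of the required options makes argparse exit with an error.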

OUTPUTS

VARIANTS_OUTPUT_FILE
  • tsv file with the following columns:
    gene name, chromosome, CNV start position, CNV end position, copy number, fraction of libraries where the exact CNV was found, library name, path to data, project, overlap type [partial_overlap|full_overlap], analysis date
  • variants are in descending order of analysis date (most recent on top)
LIBRARIES_OUTPUT_FILE
  • contains a list of all of the libraries that were queried
  • one library per line, in alphabetical/numerical order
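
The overlap type column can be derived from interval coordinates. This minimal sketch assumes closed [start, end] intervals on the same chromosome and assumes that full_overlap means the CNV spans the entire gene; the overlap_type() helper is hypothetical and the module's actual SQL criteria may differ.

```python
def overlap_type(gene_start, gene_end, cnv_start, cnv_end):
    """Classify how a CNV overlaps a gene (hypothetical helper).

    Assumes closed intervals [start, end] on the same chromosome, and that
    'full_overlap' means the CNV covers the whole gene.
    """
    if cnv_end < gene_start or cnv_start > gene_end:
        return None  # the intervals do not touch at all
    if cnv_start <= gene_start and cnv_end >= gene_end:
        return "full_overlap"
    # Any other intersection, including a CNV lying inside the gene
    return "partial_overlap"
```

Under this assumption a CNV that sits entirely inside a gene is reported as partial_overlap, because part of the gene lies outside it.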

vardb.queries.CNV_overlap.main()

Gets command line arguments and runs query.

vardb.queries.CNV_overlap.query(*args, **kwargs)

Performs a query on the controlfreec table, which contains germline CNVs. The query identifies CNVs that overlap with the genes specified in gene_file.

Parameters:
  • gene_file – list of genes of interest
  • projects_list – list of projects to include
  • variants_output_path – output path of overlapping variants
  • libraries_output_path – output path of all libraries with an overlap
  • database – database to query
  • threshold – upper limit on a variant's % occurrence in the database: only variants below it are reported (0 reports all variants)
Modifies:

creates/overwrites output files
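
The threshold rule can be expressed as a small predicate. This sketch (the passes_threshold() helper is hypothetical) treats THRESHOLD as a percentage and, following the command-line help, treats 0 as "report everything"; how the real query counts libraries is an assumption.

```python
def passes_threshold(libraries_with_cnv, total_libraries, threshold=0.0):
    """Return True if a CNV should be reported (hypothetical helper).

    threshold is a percentage; 0 disables filtering, matching the
    documented default of reporting every variant.
    """
    if threshold == 0:
        return True
    if total_libraries <= 0:
        raise ValueError("total_libraries must be positive")
    percent = 100.0 * libraries_with_cnv / total_libraries
    # Only variants strictly below the threshold are kept
    return percent < threshold
```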

vardb.queries.experimental_records module

Queries the database to obtain

  1. various effects and annotations for each variant in a test library
  2. all libraries/patients in vcall where the above variants have been seen

INPUTS

usage: python experimental_records.py [-h] -l LIBRARY
                                      (-p PROJECTS [PROJECTS ...] | -pf PROJECTS_FILE | -a)
                                      [-db DATABASE] [-e] -d OUTPUT_DIRECTORY
                                      [--log_level {debug,info,warning,error}]

optional arguments:
  -h, --help            show this help message and exit
  -l LIBRARY, --library LIBRARY
                        library name
  -p PROJECTS [PROJECTS ...], --projects PROJECTS [PROJECTS ...]
                        projects to include in the search
  -pf PROJECTS_FILE, --projects_file PROJECTS_FILE
                        file containing the projects to query
  -a, --all             search all projects
  -db DATABASE, --database DATABASE
                        the database to query
  -e, --exclude         exclude, rather than include projects
  -d OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
                        directory for storing the output
  --log_level {debug,info,warning,error}, -log {debug,info,warning,error}
                        the level of logging
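
The parenthesised (-p ... | -pf ... | -a) in the usage line is how argparse renders a required mutually exclusive group. A hedged reconstruction of these options follows; build_parser() is hypothetical and the flags are taken from the usage text above, not from the module's source.

```python
import argparse


def build_parser():
    """Hypothetical sketch of the experimental_records command-line options."""
    parser = argparse.ArgumentParser(prog="experimental_records.py")
    parser.add_argument("-l", "--library", required=True, help="library name")
    # Exactly one of -p, -pf or -a must be given; argparse enforces this.
    scope = parser.add_mutually_exclusive_group(required=True)
    scope.add_argument("-p", "--projects", nargs="+",
                       help="projects to include in the search")
    scope.add_argument("-pf", "--projects_file",
                       help="file containing the projects to query")
    scope.add_argument("-a", "--all", action="store_true",
                       help="search all projects")
    parser.add_argument("-db", "--database", help="the database to query")
    parser.add_argument("-e", "--exclude", action="store_true",
                        help="exclude, rather than include, projects")
    parser.add_argument("-d", "--output_directory", required=True,
                        help="directory for storing the output")
    parser.add_argument("--log_level", "-log",
                        choices=["debug", "info", "warning", "error"],
                        help="the level of logging")
    return parser
```

With this design argparse itself rejects a command that passes, for example, both -p and -a, or none of the three.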

OUTPUTS

Two tsv files containing the variant information, one for non-synonymous coding variants and one for all other variants:

  • OUTPUT_DIRECTORY/output.[LIBRARY].coding.exprecords.tsv
  • OUTPUT_DIRECTORY/output.[LIBRARY].non_coding.exprecords.tsv
Both files have one row per variant in the test library, and the following columns:
  • variant_id
  • chromosome
  • position
  • ref
  • alt
  • var_obs (the number of reads for the variant)
  • total_obs (total number of reads at this position)
  • heterozygosity (currently ‘Unknown’)
  • aligner
  • variant_caller
  • consequence_type (from SnpEff)
  • impact (from SnpEff)
  • variation_name (dbSNP and/or COSMIC)
  • darned_annotation (ch:pos:strand:ref>alt)
  • diseased_patient_count (number of unique diseased patients with the variant)
  • normal_patient_count (number of unique normal patients with the variant)
  • aa_change
  • dna_change
  • other_snpeff (other effects)
  • gene_id
  • ensembl_id (transcript_id)
  • diseases (a list of the pathology aliases of the matching diseased libraries)
  • variant_type (SNV, INS, DEL, INDEL)

The null string is ‘.’
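
The per-variant summary columns can be sketched as a fold over the vcall matches for one variant. The input format (a list of dicts with patient, diseased and pathology keys), the summarize_matches() name and the comma separator for diseases are assumptions, not the module's actual data model.

```python
NULL = "."  # the documented null string


def summarize_matches(matches):
    """Summarize the vcall libraries that share one test-library variant.

    `matches` is a hypothetical list of dicts with keys 'patient',
    'diseased' (bool) and 'pathology' (str or None).
    """
    # Sets give counts of unique patients, as the column definitions require
    diseased_patients = {m["patient"] for m in matches if m["diseased"]}
    normal_patients = {m["patient"] for m in matches if not m["diseased"]}
    pathologies = sorted({m["pathology"] for m in matches
                          if m["diseased"] and m["pathology"]})
    return {
        "diseased_patient_count": len(diseased_patients),
        "normal_patient_count": len(normal_patients),
        # Empty values are written with the null string
        "diseases": ",".join(pathologies) if pathologies else NULL,
    }
```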

vardb.queries.experimental_records.main()

Gets command line arguments and runs query.

vardb.queries.experimental_records.query(library, output_directory, database, projects=None, projects_file=None, all=True, exclude=False)

Queries for unpaired (vcall) libraries which have variants that match those in a comparison library.

Parameters:
  • library – library name of comparison library
  • output_directory – location where output files will be created
  • database – database to query
  • projects | projects_file | all – [all=True] a list of projects to either include (default) or exclude (set the exclude flag to True) | a file with a list of projects as described above | include all projects in the database
  • exclude – [False] exclude listed projects in query, instead of including them
Returns:

0 if execution is successful, -1 otherwise

Writes:

two files: [library].coding.exprecords.tsv for non-synonymous coding variants and [library].non_coding.exprecords.tsv for all other variants
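
The projects, projects_file, all and exclude parameters combine into a single project selection. This is a sketch under stated assumptions: resolve_projects() is hypothetical, the projects file is assumed to hold one project name per line, and unknown project names are dropped by intersecting with the projects available in the database.

```python
def resolve_projects(available, projects=None, projects_file=None,
                     all=True, exclude=False):
    """Return the projects a query should cover (hypothetical helper).

    An explicit list or a file takes precedence over `all`; `exclude`
    inverts the selection of a listed set of projects.
    """
    if projects_file is not None:
        # Assumed format: one project name per line, blank lines ignored
        with open(projects_file) as fh:
            projects = [line.strip() for line in fh if line.strip()]
    if projects is None:
        if not all:
            raise ValueError("give projects, projects_file or all=True")
        return sorted(available)
    chosen = set(projects)
    if exclude:
        return sorted(p for p in available if p not in chosen)
    # Names not present in the database are silently dropped
    return sorted(p for p in available if p in chosen)
```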

vardb.queries.experimental_records_external module

Queries the database to obtain

  1. Various effects and annotations for each variant in a test library read from LIBRARY_FILE. The test library is loaded into a temporary table for this query only, so it leaves no permanent record in the database.
  2. All libraries/patients in vcall where the above variants have been seen
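
Loading the test library into a temporary table keeps the query self-contained. The sketch below uses Python's sqlite3 as a stand-in for the production database and assumes a hypothetical vcall table with chromosome, position, ref and alt columns; load_test_library() and matching_variants() are illustrative names. A TEMP table exists only for the lifetime of the connection and is not recorded in the main schema.

```python
import sqlite3


def load_test_library(conn, variants):
    """Load test-library variants into a connection-scoped TEMP table.

    sqlite3 stands in for the production database here; the table is
    dropped automatically when `conn` is closed.
    """
    conn.execute(
        "CREATE TEMP TABLE test_library "
        "(chromosome TEXT, position INTEGER, ref TEXT, alt TEXT)"
    )
    conn.executemany("INSERT INTO test_library VALUES (?, ?, ?, ?)", variants)


def matching_variants(conn):
    """Return test-library variants also present in the (hypothetical) vcall table."""
    return conn.execute(
        "SELECT DISTINCT t.chromosome, t.position, t.ref, t.alt "
        "FROM test_library AS t JOIN vcall AS v "
        "USING (chromosome, position, ref, alt) "
        "ORDER BY t.chromosome, t.position"
    ).fetchall()
```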

INPUTS

usage: python experimental_records_external.py [-h] [-l LIBRARY_FILE]
                                               (-p PROJECTS [PROJECTS ...] | -pf PROJECTS_FILE | -a)
                                               [-db DATABASE] [-e] -d
                                               OUTPUT_DIRECTORY
                                               [--log_level {debug,info,warning,error}]

optional arguments:
  -h, --help            show this help message and exit
  -l LIBRARY_FILE, --library_file LIBRARY_FILE
                        file name of library
  -p PROJECTS [PROJECTS ...], --projects PROJECTS [PROJECTS ...]
                        projects to include in the search
  -pf PROJECTS_FILE, --projects_file PROJECTS_FILE
                        file containing the projects to query
  -a, --all             search all projects
  -db DATABASE, --database DATABASE
                        the database to query
  -e, --exclude         exclude, rather than include projects
  -d OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
                        directory for storing the output
  --log_level {debug,info,warning,error}, -log {debug,info,warning,error}
                        the level of logging

OUTPUTS

Two tsv files containing the variant information for synonymous and non-synonymous variants:

  • OUTPUT_DIRECTORY/output.[library_name].synonymous.exprecords.tsv
  • OUTPUT_DIRECTORY/output.[library_name].non_synonymous.exprecords.tsv
Both files have one row per variant in the test library, and the following columns:
  • variant_id
  • chromosome
  • position
  • ref
  • alt
  • var_obs (the number of reads for the variant)
  • total_obs (total number of reads at this position)
  • heterozygosity (currently ‘Unknown’)
  • aligner
  • variant_caller
  • consequence_type (from SnpEff)
  • variation_name (dbSNP and/or COSMIC)
  • darned_annotation (ch:pos:strand:ref>alt)
  • diseased_library_count (number of unique diseased libraries with the variant)
  • normal_library_count (number of unique normal libraries with the variant)
  • aa_change
  • other_snpeff (other effects)
  • flanking (flanking amino acids - currently ‘Unknown’)
  • gene_id
  • ensembl_id (gene_id)
  • diseased_patient_count (number of diseased patients with the variant)
  • normal_patient_count (number of normal patients with the variant)
  • diseased_libraries (a list of all the diseased libraries where the variant was found)
  • normal_libraries (a list of all the normal libraries where the variant was found)

The null string is ‘.’
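
Because one patient can contribute several libraries, the library and patient counts above can differ. A sketch with a hypothetical input of (library, patient, diseased) tuples, a hypothetical library_and_patient_counts() name and an assumed comma separator for the library lists:

```python
NULL = "."  # the documented null string


def library_and_patient_counts(matches):
    """Count diseased/normal libraries and patients sharing one variant.

    `matches` is a hypothetical list of (library, patient, diseased) tuples.
    """
    def libraries(flag):
        return sorted({lib for lib, _, d in matches if d == flag})

    def patients(flag):
        return {pt for _, pt, d in matches if d == flag}

    diseased_libs, normal_libs = libraries(True), libraries(False)
    return {
        "diseased_library_count": len(diseased_libs),
        "normal_library_count": len(normal_libs),
        "diseased_patient_count": len(patients(True)),
        "normal_patient_count": len(patients(False)),
        # An empty join gives "", which falls back to the null string
        "diseased_libraries": ",".join(diseased_libs) or NULL,
        "normal_libraries": ",".join(normal_libs) or NULL,
    }
```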

vardb.queries.experimental_records_external.main()

Gets command line arguments and runs query.

vardb.queries.experimental_records_external.query(library_file, output_directory, database, projects=None, projects_file=None, all=True, exclude=False)

Queries for unpaired (vcall) libraries which have variants that match those in a comparison library.

Parameters:
  • library_file – full path to comparison library vcf file
  • output_directory – location where output files will be created
  • database – database to query
  • projects | projects_file | all – [all=True] a list of projects to either include (default) or exclude (set the exclude flag to True) | a file with a list of projects as described above | include all projects in the database
  • exclude – [False] exclude listed projects in query, instead of including them
Returns:

0 if execution is successful, -1 otherwise

Writes:

two files: [library].coding.exprecords.tsv for non-synonymous coding variants and [library].non_coding.exprecords.tsv for all other variants
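
Because library_file is a VCF file, the first step is turning its data lines into (chromosome, position, ref, alt) tuples. This is a minimal, hypothetical reader (read_vcf_variants() is an illustrative name) that assumes an uncompressed VCF and splits multi-allelic ALT fields; the module's own parser may handle more cases.

```python
def read_vcf_variants(lines):
    """Yield (chromosome, position, ref, alt) for each ALT allele in a VCF.

    A minimal sketch: it skips header and blank lines, splits multi-allelic
    ALT fields, and ignores QUAL, FILTER, INFO and sample columns.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        for allele in alt.split(","):
            if allele != ".":  # "." means no alternate allele
                yield chrom, int(pos), ref, allele
```

For example, `with open(library_file) as fh: variants = list(read_vcf_variants(fh))`.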