vardb.variant_data_files package¶

There is one variant data file class per file type that is loaded to vardb. Each class contains all of the information required to load its type of data into the database:

column names and data types for the files
information on the headers
which table the data is loaded to in vardb
PSQL commands for parsing and loading the data to tables on vardb
routines to compute required metadata from the file, such as the creation date and md5sum
the pipelines that the class belongs to

When a new data type is added to vardb, a new variant data file class must be created with this information.
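As a rough illustration of what such a class might carry, the sketch below declares the items listed above as class attributes. It does not reproduce the real VariantDataFile API: the base class VariantDataFileSketch, the attribute and method names, the example table, and the example pipeline are all hypothetical. The load command is a plain PostgreSQL COPY statement chosen for illustration, not the PSQL commands used by vardb.

```python
import os
from datetime import datetime, timezone


class VariantDataFileSketch:
    """Hypothetical stand-in for the real VariantDataFile base class."""

    columns = ()        # (column name, SQL type) pairs, in file order
    header_prefix = ""  # lines starting with this prefix are header lines
    table = None        # table in vardb that receives the data
    pipelines = ()      # pipelines that produce this file type

    def __init__(self, path):
        self.path = path

    def modified_date(self):
        # Assumption: the modification time stands in for the creation date.
        mtime = os.path.getmtime(self.path)
        return datetime.fromtimestamp(mtime, tz=timezone.utc)

    def copy_command(self):
        # PostgreSQL COPY in text format, which is tab-delimited by default.
        names = ", ".join(name for name, _ in self.columns)
        return f"COPY {self.table} ({names}) FROM STDIN WITH (FORMAT text)"


class ExampleSegments(VariantDataFileSketch):
    """Hypothetical new data type: one row per copy-number segment."""

    columns = (
        ("chromosome", "TEXT"),
        ("start_pos", "INTEGER"),
        ("end_pos", "INTEGER"),
        ("copy_number", "FLOAT"),
    )
    header_prefix = "#"
    table = "example_segments"
    pipelines = ("example_pipeline",)
```

Keeping the description declarative means a new file type only needs a new subclass with its own attribute values, while shared behaviour stays in the base class.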

Submodules¶

vardb.variant_data_files.cnv module¶

cnv contains classes for germline (controlfree) and somatic (somatic_cnv) pipelines. The somatic_cnv pipeline actually creates several file types, which are all represented here.

class vardb.variant_data_files.cnv.ControlFreeC(**kwargs)¶

Bases: vardb.variant_data_files.variant_data_file.VariantDataFile

ControlFreeC class gets metadata for cnvs produced by the ControlFreeC pipeline

class vardb.variant_data_files.cnv.HomozygousDeletion(**kwargs)¶: Bases: vardb.variant_data_files.variant_data_file.VariantDataFile

class vardb.variant_data_files.cnv.HomozygousDeletion_v1(**kwargs)¶

Bases: vardb.variant_data_files.cnv.HomozygousDeletion

HomozygousDeletion class gets metadata for homozygous deletions that have been selected during review from the somatic cnv pipeline

class vardb.variant_data_files.cnv.HomozygousDeletion_v2(**kwargs)¶

Bases: vardb.variant_data_files.cnv.HomozygousDeletion

HomozygousDeletion class gets metadata for homozygous deletions that have been selected during review from the somatic cnv pipeline

class vardb.variant_data_files.cnv.HomozygousDeletion_v3(**kwargs)¶

Bases: vardb.variant_data_files.cnv.HomozygousDeletion

HomozygousDeletion class gets metadata for homozygous deletions that have been selected during review from the somatic cnv pipeline

class vardb.variant_data_files.cnv.SomaticCna(**kwargs)¶

Bases: vardb.variant_data_files.variant_data_file.VariantDataFile

SomaticCna class gets metadata for raw cna data produced by the somatic cnv pipeline.

class vardb.variant_data_files.cnv.SomaticCnv(**kwargs)¶

Bases: vardb.variant_data_files.variant_data_file.VariantDataFile

SomaticCnv class gets metadata for cnv segment data produced by the somatic cnv pipeline.

class vardb.variant_data_files.cnv.SomaticLOH(**kwargs)¶

Bases: vardb.variant_data_files.variant_data_file.VariantDataFile

SomaticLOH class gets metadata for loss of heterozygosity (LOH) states produced by APOLLOH.

Zygosity states are:

DLOH = deletion-LOH (state 1)
NLOH = copy-neutral-LOH (states 2,4)
ALOH = amplified-LOH (states 5,8,9,13,14,19)
HET = heterozygous (states 3,6,7)
ASCNA = allele-specific-amplification (states 10,12,15,18)
BCNA = balanced-amplification (states 11,16,17)

class vardb.variant_data_files.cnv.SomaticVAF(**kwargs)¶

Bases: vardb.variant_data_files.variant_data_file.VariantDataFile

SomaticVAF class gets metadata and allele frequencies from APOLLOH.

Tab-delimited output file for position-level results.

9-columns:

chr (‘X’ and ‘Y’ will be output as 23 and 24)
position
reference count
non-reference count
total depth
allelic ratio
copy number (from input)
APOLLOH genotype state
Zygosity state.

N additional columns:

posterior marginal probabilities (responsibilities) for each APOLLOH genotype state.

Zygosity states are:

DLOH = deletion-LOH (state 1)
NLOH = copy-neutral-LOH (states 2,4)
ALOH = amplified-LOH (states 5,8,9,13,14,19)
HET = heterozygous (states 3,6,7)
ASCNA = allele-specific-amplification (states 10,12,15,18)
BCNA = balanced-amplification (states 11,16,17)
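The state groupings and the 9-column layout described above can be expressed as a small lookup table and line parser. This is only a sketch: the dictionary keys, the function names, and the choice of int or float for each column are assumptions, not part of SomaticVAF.

```python
# Zygosity labels and the APOLLOH genotype states they cover, as listed above.
ZYGOSITY_STATES = {
    "DLOH": (1,),
    "NLOH": (2, 4),
    "ALOH": (5, 8, 9, 13, 14, 19),
    "HET": (3, 6, 7),
    "ASCNA": (10, 12, 15, 18),
    "BCNA": (11, 16, 17),
}

# Reverse lookup: genotype state number -> zygosity label.
STATE_TO_ZYGOSITY = {
    state: label
    for label, states in ZYGOSITY_STATES.items()
    for state in states
}


def parse_position_line(line):
    """Split one tab-delimited position-level line into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError("expected at least 9 tab-delimited columns")
    return {
        "chr": fields[0],                  # 'X' and 'Y' appear as 23 and 24
        "position": int(fields[1]),
        "ref_count": int(fields[2]),
        "nonref_count": int(fields[3]),
        "depth": int(fields[4]),
        "allelic_ratio": float(fields[5]),
        "copy_number": int(fields[6]),     # assumption: integer copy number
        "state": int(fields[7]),
        "zygosity": fields[8],
        # Any further columns are per-state posterior marginal probabilities.
        "posteriors": [float(value) for value in fields[9:]],
    }
```

A parsed record can be cross-checked by comparing `record["zygosity"]` with `STATE_TO_ZYGOSITY[record["state"]]`.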

class vardb.variant_data_files.cnv.TcgaCnv(**kwargs)¶: Bases: vardb.variant_data_files.variant_data_file.VariantDataFile

class vardb.variant_data_files.cnv.TcgaGermlineMaskedCnv(**kwargs)¶: Bases: vardb.variant_data_files.variant_data_file.VariantDataFile

vardb.variant_data_files.data_classes module¶

vardb.variant_data_files.data_classes.DataClass(**kwargs)¶

This is a factory for choosing the correct VariantDataFile subclass based on the pipeline information

Parameters:	kwargs – metadata arguments
Returns:	correct class
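A factory of this kind is often built around a lookup from pipeline name to class. The sketch below shows that general pattern only. The stand-in classes, the `pipeline` keyword, the registry, and the decision to return an instance are assumptions, and a real lookup would need more than the pipeline name because the somatic_cnv pipeline produces several file types.

```python
class ControlFreeCSketch:
    """Hypothetical stand-in for a germline cnv data file class."""

    def __init__(self, **kwargs):
        self.metadata = dict(kwargs)


class SomaticCnvSketch:
    """Hypothetical stand-in for a somatic cnv data file class."""

    def __init__(self, **kwargs):
        self.metadata = dict(kwargs)


# Assumed registry keyed on pipeline name.
PIPELINE_CLASSES = {
    "controlfree": ControlFreeCSketch,
    "somatic_cnv": SomaticCnvSketch,
}


def data_class_sketch(**kwargs):
    """Pick the class registered for kwargs['pipeline'] and instantiate it."""
    pipeline = kwargs.get("pipeline")
    if pipeline not in PIPELINE_CLASSES:
        raise ValueError(f"no data file class registered for pipeline {pipeline!r}")
    return PIPELINE_CLASSES[pipeline](**kwargs)
```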

vardb.variant_data_files.expression module¶

class vardb.variant_data_files.expression.RSEM(**kwargs)¶

Bases: vardb.variant_data_files.variant_data_file.VariantDataFile

RSEM class gets metadata for .rsem files

class vardb.variant_data_files.expression.TranscriptNormalized(**kwargs)¶

Bases: vardb.variant_data_files.variant_data_file.VariantDataFile

TranscriptNormalized class gets metadata for transcript.normalized files

vardb.variant_data_files.maf module¶

class vardb.variant_data_files.maf.TCGASimpleSomatic(**kwargs)¶

Bases: vardb.variant_data_files.variant_data_file.VariantDataFile

TCGASimpleSomatic class gets metadata for TCGA simple somatic mutation (maf) files

vardb.variant_data_files.variant_data_file module¶

class vardb.variant_data_files.variant_data_file.Columns(cols)¶

Bases: object

Immutable object containing a list of tuples with column name and type for each column of the data file

valid_types = ('INT', 'FLOAT', 'DATE', 'TIMESTAMP', 'TEXT', 'BIGINT', 'INTEGER', 'BOOLEAN')¶
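One way to picture such an object is a thin wrapper that validates each type against valid_types and stores the pairs in a tuple. The class below is a sketch under that assumption: the name ColumnsSketch, the `names` and `types` properties, and the use of ValueError are hypothetical and may differ from the real Columns class.

```python
class ColumnsSketch:
    """Read-only list of (column name, SQL type) pairs."""

    valid_types = ('INT', 'FLOAT', 'DATE', 'TIMESTAMP', 'TEXT',
                   'BIGINT', 'INTEGER', 'BOOLEAN')

    def __init__(self, cols):
        checked = []
        for name, sql_type in cols:
            sql_type = sql_type.upper()
            if sql_type not in self.valid_types:
                raise ValueError(f"invalid type {sql_type!r} for column {name!r}")
            checked.append((str(name), sql_type))
        # A tuple of tuples cannot be modified after construction.
        self._cols = tuple(checked)

    def __iter__(self):
        return iter(self._cols)

    def __len__(self):
        return len(self._cols)

    @property
    def names(self):
        return tuple(name for name, _ in self._cols)

    @property
    def types(self):
        return tuple(sql_type for _, sql_type in self._cols)
```

Validating at construction time means an unsupported type is reported before any table is created or any data is loaded.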

class vardb.variant_data_files.variant_data_file.VariantDataFile(**kwargs)¶

Bases: object

close()¶: Closes the file, resets the file pointer to None

get_columns_from_pandas(filename, **kwargs)¶

Reads a file into a pandas dataframe and extracts the column names and column types of the data file. This is useful in cases where the data file has variable numbers of columns.

Parameters:	filename – path to data
	kwargs – any optional arguments for the pandas read_csv function
Returns:	the Columns object corresponding to the columns in the data file
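The idea of inferring columns from a dataframe can be sketched as follows. This is not the method's actual implementation: the function name, the dtype-to-SQL mapping, and the plain list return value (instead of a Columns object) are assumptions made to keep the example self-contained.

```python
import pandas as pd
from pandas.api import types as pdt


def columns_from_pandas(source, **kwargs):
    """Read a delimited file and return (column name, SQL type) pairs."""
    frame = pd.read_csv(source, **kwargs)
    columns = []
    for name, dtype in frame.dtypes.items():
        # Assumed mapping from pandas dtypes to a few of the valid SQL types.
        if pdt.is_bool_dtype(dtype):
            sql_type = "BOOLEAN"
        elif pdt.is_integer_dtype(dtype):
            sql_type = "INTEGER"
        elif pdt.is_float_dtype(dtype):
            sql_type = "FLOAT"
        else:
            sql_type = "TEXT"
        columns.append((str(name), sql_type))
    return columns
```

For a tab-delimited file the call would look like `columns_from_pandas(path, sep="\t")`, with any other keyword arguments passed straight through to read_csv.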

get_data()¶

Sets the member variables for header and data. The header is a list of strings, and the file data is a pandas dataframe. Returns the data.

Returns:	the data

get_data_ptr()¶

Sets the file pointer to the first line of data. If the _get_header function has been properly defined in the subclasses, this should always work.

Returns:	file pointer at beginning of data
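Positioning a file pointer at the first data line can be sketched as below. The sketch assumes that every header line starts with a known prefix, which may not hold for all file types handled here. The function name and the `header_prefix` parameter are hypothetical, and the real method relies on the subclass's _get_header function instead.

```python
def seek_to_data(handle, header_prefix="#"):
    """Consume leading header lines and leave handle at the first data line.

    Returns the header lines that were skipped, without trailing newlines.
    """
    header = []
    while True:
        offset = handle.tell()      # remember where this line starts
        line = handle.readline()
        if not line or not line.startswith(header_prefix):
            handle.seek(offset)     # rewind to the start of the data line
            return header
        header.append(line.rstrip("\n"))
```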

get_header()¶

Returns the file header, and closes the file

Returns:	file header

get_md5sum()¶: Gets md5sum and adds it to the metadata
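For reference, an md5sum of a file can be computed with the standard library as sketched here. The function name, the chunk size, and the `metadata` dictionary in the usage note are assumptions; the real method may compute the checksum differently.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex md5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading fixed-size chunks keeps memory use flat for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result could then be stored alongside the other metadata, for example `metadata["md5sum"] = md5sum(path)`.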

line_count()¶

Calculates the line count of the file

Returns:	line count of the file

open()¶

Opens the data file for reading

Raises:	VariantDataFileException if the file couldn’t be opened

exception vardb.variant_data_files.variant_data_file.VariantDataFileException¶: Bases: exceptions.Exception

vardb.variant_data_files.vcf module¶

class vardb.variant_data_files.vcf.MutSeq_v1(**kwargs)¶

Bases: vardb.variant_data_files.vcf.VCF, vardb.variant_data_files.variant_data_file.VariantDataFile

class for somatic vcf files created by mutation seq version 1.0.2

class vardb.variant_data_files.vcf.MutSeq_v2(**kwargs)¶

Bases: vardb.variant_data_files.vcf.VCF, vardb.variant_data_files.variant_data_file.VariantDataFile

class for somatic vcf files created by mutation seq version 4.3.5

class vardb.variant_data_files.vcf.StrelkaIndels(**kwargs)¶

Bases: vardb.variant_data_files.vcf.VCF, vardb.variant_data_files.variant_data_file.VariantDataFile

Class for strelka indel files

class vardb.variant_data_files.vcf.StrelkaSnps(**kwargs)¶

Bases: vardb.variant_data_files.vcf.VCF, vardb.variant_data_files.variant_data_file.VariantDataFile

Class for strelka snp files

class vardb.variant_data_files.vcf.VCF¶

Bases: object

VCF class has functionality applicable to all vcf data classes

class vardb.variant_data_files.vcf.VCall(**kwargs)¶

Bases: vardb.variant_data_files.vcf.VCF, vardb.variant_data_files.variant_data_file.VariantDataFile

class for vcf files created by vcall pipeline (mpileup)

get_md5sum()¶: Gets md5sum and adds it to the metadata

normalize_indels()¶: Normalizes the self.unnormalized file to self.path if self.path does not exist (that is, the file has not already been normalized)

class vardb.variant_data_files.vcf.VcfAnnotations(path)¶

Bases: vardb.variant_data_files.vcf.VCF, vardb.variant_data_files.variant_data_file.VariantDataFile

This class is for annotation VCFs. This VCF does not belong to a library and is only used for importing annotations.

vardb.variant_data_files.vcf_tools module¶

exception vardb.variant_data_files.vcf_tools.VcfToolsException¶: Bases: exceptions.Exception

vardb.variant_data_files.vcf_tools.annotate(log_path, vcf_path)¶

Wrapper function for running bioapps annotator on gphost

Parameters:	log_path – path to place log files
	vcf_path – path to vcf file to annotate
Returns:

vardb.variant_data_files.vcf_tools.check_vt(normalized_vcf_file, log_file)¶

Checks to make sure that the normalized vcf file was correctly created

Parameters:	normalized_vcf_file –
	log_file –
Returns:

vardb.variant_data_files.vcf_tools.normalize(unnormalized_file, normalized_file, log_path)¶

Wrapper function for running vt to normalize indels on gphost

Parameters:	unnormalized_file –
	normalized_file –
	log_path –
Returns:
