biofx.variants package¶
Submodules¶
biofx.variants.Genes module¶
-
class
biofx.variants.Genes.
Inheritance
[source]¶ Bases:
enum.Enum
An enumeration.
-
COMPLEX
= 'COMPLEX'¶
-
DOMINANT
= 'DOMINANT'¶
-
RECESSIVE
= 'RECESSIVE'¶
-
UNKNOWN
= 'UNKNOWN'¶
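The members listed above can be reproduced as a standalone enum. The following is a minimal sketch rather than the package source: the member names and values come from this page, while `parse_inheritance` is a hypothetical helper added only to show lookup by value.

```python
from enum import Enum


class Inheritance(Enum):
    """Stand-in mirroring the members documented for biofx.variants.Genes.Inheritance."""
    COMPLEX = "COMPLEX"
    DOMINANT = "DOMINANT"
    RECESSIVE = "RECESSIVE"
    UNKNOWN = "UNKNOWN"


def parse_inheritance(label):
    """Hypothetical helper: map a free-text label to a member, falling back to UNKNOWN."""
    try:
        # Enum lookup by value raises ValueError for unrecognized labels.
        return Inheritance(label.strip().upper())
    except ValueError:
        return Inheritance.UNKNOWN
```

Because each value equals its member name, `Inheritance("DOMINANT")` and `Inheritance["DOMINANT"]` return the same member.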
-
biofx.variants.attributes module¶
biofx.variants.features module¶
Created: July, 2014
@author: Carolyn Ch’ng
Todo
think about whether logging should be part of module.
- Variant SNV, Indel, CNV inherits Variant
-
class
biofx.variants.features.
CNV
(chromosome, start, end)[source]¶ Bases:
biofx.variants.features.Variant
Generic CNV class with basic chromosome, start, end attributes, as well as annotation attributes. Inherits from Variant
-
GAIN
= 'copy gain'¶
-
LOSS
= 'copy loss'¶
-
NEUTRAL
= 'copy neutral'¶
-
-
class
biofx.variants.features.
Indel
(chromosome, start, end=None)[source]¶ Bases:
biofx.variants.features.Variant
Generic Indel class with basic chromosome, start, end attributes, as well as optional ref/alt alleles, annotation attributes. Inherits methods from Variant.
-
get_allele_frequencies
(mpileup_record)[source]¶ Parameters: mpileup_record (string) – output from
Variants.get_variant_mpileup()
or a single mpileup record
Returns: dictionary containing:
- ref_count (int): reference allele count. 0 if empty record
- alt_count (int): alternate allele count. 0 if empty record
Return type: dict
Raises: - ValueError – more than one record provided
- ValueError – if reference and/or alternate allele not set
Note:
Reference counts for indels are currently obtained by subtracting alternative allele counts from total coverage. Usage of total coverage (self.coverage) is preferred over reference counts.
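The subtraction described in the note can be sketched as follows. `indel_ref_count` is a hypothetical name used for illustration, not a function of this package.

```python
def indel_ref_count(coverage, alt_count):
    """Derive an indel reference count as total coverage minus alternate count."""
    if alt_count > coverage:
        raise ValueError("alt_count cannot exceed total coverage")
    # Every read that does not support the alternate allele is counted as
    # reference here, including reads that support neither allele.
    return coverage - alt_count
```

Reads supporting a third allele are therefore counted as reference, which is one reason to prefer reporting total coverage.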
-
get_length
()[source]¶ Get indel length.
Todo
think about definitions of start and end
Raises: ValueError
– if reference and/or alternate allele not set
-
set_alt
(alt)[source]¶ Set alternative allele. White spaces are stripped off.
Parameters: alt (string) – alt string
-
set_state
(state)[source]¶ Parameters: state (string) – indel type, usually from genome validator
Raises: ValueError – unrecognized state
-
vt_normalize
()[source]¶ Implementation of algorithm in http://genome.sph.umich.edu/wiki/Variant_Normalization.
Modified for samtools/bcftools output, which is left-aligned but not parsimonious.
Raises: - AssertionError – normalization cannot be done if ref/alt bases are not set
- AssertionError – algorithm failed. Should not be reachable; never observed.
Reference: http://genome.sph.umich.edu/wiki/Variant_Normalization
See also: http://bioinformatics.oxfordjournals.org/content/31/13/2202.full.
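For orientation, the normalization procedure described in the reference above (trim shared trailing bases, extend to the left when an allele becomes empty, then trim shared leading bases) can be sketched as a standalone function. This is not the body of `vt_normalize`. The function name, the 1-based `pos` convention, and passing the reference as a plain string are assumptions made for illustration.

```python
def normalize_variant(pos, ref, alt, reference_seq):
    """Left-align and trim a biallelic variant.

    pos is the 1-based position of the first base of ref within reference_seq.
    Returns the normalized (pos, ref, alt).
    """
    ref, alt = ref.upper(), alt.upper()
    changed = True
    while changed:
        changed = False
        # Drop the rightmost base while both alleles end in the same base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos <= 1:
                raise ValueError("cannot extend past the start of the reference")
            pos -= 1
            base = reference_seq[pos - 1].upper()
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

With the reference `GGGCACACACAGGG`, the deletion written as `CAC` to `C` at position 6 and the non-parsimonious form `GCAC` to `GC` at position 3 both normalize to `GCA` to `G` at position 3.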
-
-
class
biofx.variants.features.
SNV
(chromosome, start, end=None)[source]¶ Bases:
biofx.variants.features.Variant
Generic SNV class with basic chromosome, start, end attributes, as well as optional ref/alt alleles, annotation attributes. Inherits from Variant
-
get_allele_frequencies
(mpileup_record)[source]¶ Get allele frequencies for each base at a given position of this SNV. Currently supports samtools 0.1.18.
Parameters: mpileup_record (string) – output from Variants.get_variant_mpileup() or a single mpileup record
Returns: frequency of each base
Return type: base_frequencies (dict)
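To illustrate what counting bases from a single mpileup record involves, here is a minimal sketch. It is not the method's implementation. `count_bases` is a hypothetical name, and the sketch assumes the standard six-column, tab-separated mpileup layout with the read bases in column 5.

```python
import re
from collections import Counter


def count_bases(mpileup_record):
    """Count A/C/G/T observations in the read-bases column of one mpileup record."""
    fields = mpileup_record.rstrip("\n").split("\t")
    ref_base, bases = fields[2].upper(), fields[4]
    counts = Counter({base: 0 for base in "ACGT"})
    i = 0
    while i < len(bases):
        char = bases[i]
        if char == "^":
            # Read start: skip the marker and the mapping-quality character after it.
            i += 2
            continue
        if char in "+-":
            # Indel: skip the sign, the length digits, and the indel sequence.
            # Assumes a well-formed record where digits always follow the sign.
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
            continue
        if char in ".,":
            # '.' and ',' mean a match to the reference on the forward/reverse strand.
            counts[ref_base] += 1
        elif char.upper() in "ACGT":
            counts[char.upper()] += 1
        # '$' (read end), '*' (deleted base) and N calls are ignored.
        i += 1
    return dict(counts)
```

Dividing each count by the sum of the counts gives per-base frequencies.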
-
get_alt_count
(**kwargs)[source]¶ Parameters: **kwargs – used kwargs: mpileup_params
Returns: alt base count
Return type: int
-
get_ref_count
(**kwargs)[source]¶ Get reference base count.
Parameters: **kwargs – used kwargs: mpileup_params
Returns: ref base count
Return type: int
-
set_alt
(alt)[source]¶ Parameters: alt (string) – alt base
Raises: ValueError – alt must be a single base, one of A, T, C, G.
-
set_ref
(ref)[source]¶ Parameters: ref (string) – reference base
Raises: ValueError – reference must be a single base, one of A, T, C, G.
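The check documented for set_ref and set_alt can be sketched as below. `validate_base` is a hypothetical helper. Whether the real setters accept lowercase input is not stated on this page, so the sketch is case-sensitive.

```python
VALID_BASES = frozenset("ATCG")


def validate_base(base, label="ref"):
    """Return base unchanged if it is exactly one of A, T, C, G; otherwise raise ValueError."""
    if len(base) != 1 or base not in VALID_BASES:
        raise ValueError(
            "{} must be a single base, one of A, T, C, G; got {!r}".format(label, base)
        )
    return base
```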
-
-
class
biofx.variants.features.
Variant
(chromosome, start, end=None)[source]¶ Bases:
object
A base class for describing variants. Currently supports small mutations.
-
chromosome
¶ string
-
start
¶ int
-
end
¶ int
-
ref
¶ string – reference base
-
alt
¶ string – alternative base
-
ref_count
¶ int – reference base count
-
alt_count
¶ int – alternative base count
-
coverage
¶ int – total coverage
-
state
¶ string – insertion, deletion, duplication etc.
-
misc
¶ list
-
effects
¶ string
-
snpeff_effects
¶
-
identifier
¶ string – external IDs associated with variant. Usually column 3 in vcf
-
provenance
¶ string – if applicable, tool used to call variant
-
cosmic
¶ string – cosmic ID
-
dbsnp
¶ string – dbSNP ID
-
bamfile
¶ string – path to bamfile where this variant came from
-
source
¶ string – source ID where this variant came from
-
is_special
¶ bool – True if special characters (not ATCG) in ref/alt base
-
homopolymer
¶ bool – True if homopolymer
-
clnsig_values
¶ string – bar delimited integers
-
cgl_pathogenicity
¶ string – PATHOGENIC, VUS, BENIGN
-
info
¶ ? – vcf info
-
add_comment
(comment)[source]¶ Adds a new comment to a list of comments
Parameters: comment (string) – comment string
-
add_misc
(**kwargs)[source]¶ Add miscellaneous info.
Parameters: **kwargs – keyword args for additional info.
-
get_variant_info
(data, samtools, fasta_file)[source]¶ Get read support and total coverage for variant.
Parameters:
Returns: a dictionary with library IDs as keys and attribute keywords
Return type: variant_info (dict)
Raises: AssertionError – each bam file in bams is assumed to come from a different library
-
get_variant_mpileup
(bamfile, samtools='/gsc/software/linux-x86_64-centos6/samtools-0.1.19/samtools', reference='/projects/alignment_references/9606/hg19a/genome/bwa_32/hg19a.fa', mpileup_params=['--ff', '1540', '-BQ0'])[source]¶ Get mpileup record for variant.
Todo
add functionality for region? move executables to a configs module/file or something like that
Parameters: - bamfile (string) – path to bamfile
- samtools (string) – samtools executable. Default: /gsc/software/linux-x86_64-centos6/samtools-0.1.19/samtools
- reference (string) – path to reference fasta file. Default: /projects/alignment_references/9606/hg19a/genome/bwa_32/hg19a.fa
- mpileup_params (list) – list of mpileup parameters
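As a rough illustration of how these parameters could be combined into a samtools call, the sketch below assembles an argument list without executing it. This is an assumption about the command shape, not the method's actual code. `build_mpileup_command` and its placeholder defaults (`samtools`, `reference.fa`) are hypothetical, and only the `-f` (reference) and `-r` (region) options of `samtools mpileup` are used in addition to the documented `mpileup_params`.

```python
def build_mpileup_command(bamfile, chromosome, start, end=None,
                          samtools="samtools", reference="reference.fa",
                          mpileup_params=None):
    """Assemble a samtools mpileup argument list for a single variant region."""
    if mpileup_params is None:
        # Same values as the documented default, created per call so that
        # no list is shared between calls.
        mpileup_params = ["--ff", "1540", "-BQ0"]
    end = start if end is None else end
    region = "{}:{}-{}".format(chromosome, start, end)
    return ([samtools, "mpileup", "-f", reference, "-r", region]
            + list(mpileup_params) + [bamfile])
```

The resulting list can be passed to `subprocess.run(..., capture_output=True, text=True)`. Using `None` as the default sidesteps the shared-state pitfall of a mutable list default such as the one in the signature above.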
-
static
has_pathogenicity_conflict
(p1, p2)[source]¶ Check for pathogenicity conflict between two sources.
Parameters: - p1 (string) – clinvar pathogenicity
- p2 (string) – cgl pathogenicity
Returns: True if there is a conflict
Return type: boolean
-
normalize_splice_site
()[source]¶ Checks the hgvs effect type attributes and changes the value to a normalized splice site string if the effect is a splice site.
-
set_allele_model
(gmaf, maf_cutoff)[source]¶ Get allele model string from gmaf.
Todo
enumerate this?
Parameters: - gmaf (float) – allele model gmaf
- maf_cutoff (float) – cutoff for rare/common
Returns: allele model string
Return type: allele_model (string)
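A cutoff-based classification of this kind might look like the sketch below. The returned labels (`"rare"`, `"common"`) and the treatment of a gmaf exactly equal to the cutoff are assumptions, since this page does not state the actual strings or the boundary rule.

```python
def allele_model_from_gmaf(gmaf, maf_cutoff):
    """Hypothetical classifier: 'rare' below the cutoff, 'common' at or above it."""
    if not 0.0 <= gmaf <= 1.0:
        raise ValueError("gmaf must be a frequency between 0 and 1")
    return "rare" if gmaf < maf_cutoff else "common"
```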
-
set_alternative
(eff_maps, model, to_string=True)[source]¶ Set alternative (not best/canonical) effect descriptions.
Parameters: - eff_maps (list) – a list of dictionaries
- model (string) – transcript or gene
- to_string –
Raises: ValueError
– invalid model
-
set_alternative_genes
(eff_maps, to_string=True)[source]¶ Todo
merge set alternative transcripts and set alternative genes
Parameters: - eff_maps –
- to_string –
Returns:
-
set_alternative_transcripts
(eff_maps, to_string=True)[source]¶ Todo
merge set alternative transcripts and set alternative genes
Parameters: - eff_maps (list) – a list of dictionaries
- to_string –
Returns:
-
set_cosmic
(cosmic)[source]¶ Parameters: cosmic (string) – cosmic ID Raises: ValueError
– not a cosmic ID
-
set_dbsnp
(dbsnp)[source]¶ Parameters: dbsnp (string) – dbSNP ID Raises: ValueError
– not a dbsnp ID
-
set_eff
(eff_map, hgvs=True, classic=False)[source]¶ Parameters:
Note: Set both hgvs and classic to True if it is a merged eff map from merge_eff_maps
-
set_zygosity
(genotype)[source]¶ Assign zygosity to variant.
Parameters: - v (Variant) – an instance of biofx.variants.Variants
- genotype (string) – genotype from vcf record
Raises: - AssertionError – assume no existing zygosity for variant
- AssertionError – assume 1/1 or 0/1 for genotype
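The genotype assumption above can be sketched as a small lookup. `zygosity_from_genotype` and the returned labels are hypothetical. Only the accepted genotypes (1/1 and 0/1) and the use of AssertionError come from this page.

```python
def zygosity_from_genotype(genotype):
    """Map a VCF genotype string to a zygosity label (labels are illustrative)."""
    labels = {"1/1": "homozygous", "0/1": "heterozygous"}
    # Mirrors the documented behaviour: any other genotype is an assertion failure.
    assert genotype in labels, "expected genotype 1/1 or 0/1, got {!r}".format(genotype)
    return labels[genotype]
```

Phased genotypes such as `0|1` would fail this assertion, which matches the stated assumption.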
-
biofx.variants.operations module¶
-
biofx.variants.operations.
chunk_variants_by_chromosome
(infile, outdir='.')[source]¶ Split a file with chromosome positions into separate files grouped by chromosome.
Parameters: - infile (string) – input file path. a tab delimited file with chromosomes in column 1 and positions in column 2.
- outdir (string) – output file directory.
Returns: a dictionary of chromosome as keys and positions as values.
Return type: var_by_chromosome (dict)
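The grouping step can be illustrated with the following sketch, which is not the package implementation. The function name, the `<chromosome>.txt` output naming, and the skipping of `#` comment lines are assumptions.

```python
import os
from collections import defaultdict


def chunk_by_chromosome(infile, outdir="."):
    """Group positions by chromosome and write one tab-delimited file per chromosome."""
    var_by_chromosome = defaultdict(list)
    with open(infile) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment/header lines
            chromosome, position = line.split("\t")[:2]
            var_by_chromosome[chromosome].append(int(position))
    for chromosome, positions in var_by_chromosome.items():
        # Hypothetical naming scheme: one "<chromosome>.txt" file per chromosome.
        path = os.path.join(outdir, "{}.txt".format(chromosome))
        with open(path, "w") as out:
            for position in positions:
                out.write("{}\t{}\n".format(chromosome, position))
    return dict(var_by_chromosome)
```

Positions keep their input order within each chromosome, so sorted input yields sorted per-chromosome files.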
-
biofx.variants.operations.
generate_mpileup
(bamfile, variants, output=None, region=None, samtools='/gsc/software/linux-x86_64-centos6/samtools-0.1.19/samtools', reference='/projects/alignment_references/9606/hg19a/genome/bwa_32/hg19a.fa', mpileup_params=['--ff', '1540', '-BQ0'])[source]¶ Generate mpileup file for a set of variants.
Parameters: - bamfile (string) – path to bamfile
- variants (string) – path to file containing list of positions (chr pos) or regions (BED)
- output (string) – mpileup output file
- region (string) – region in the format of chr:start-end
- samtools (string) – samtools executable. Default: /gsc/software/linux-x86_64-centos6/samtools-0.1.19/samtools
- reference (string) – path to reference fasta file.
- mpileup_params (list) – list of mpileup parameters
Returns: - tuple containing:
- (string): mpileup record
- (string): mpileup command executed
Return type: