GeneticCode
- class GeneticCode(code_sequence, ID=None, name=None, start_codon_sequence=None)
Holds codon to amino acid mapping, and vice versa.
Use the get_code() function to get one of the included code instances. These are created as follows.
>>> from cogent3.core.genetic_code import GeneticCode
>>> code_sequence = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
>>> sgc = GeneticCode(code_sequence)
>>> sgc['UUU'] == 'F'
>>> sgc['TTT'] == 'F'
>>> sgc['F'] == ['TTT', 'TTC']  # in arbitrary order
>>> sgc['*'] == ['TAA', 'TAG', 'TGA']  # in arbitrary order
code_sequence: a 64-character string containing the NCBI genetic code translation.
GeneticCode is immutable once created.
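To make the 64-character format concrete, here is a minimal pure-Python sketch of how such a string can be expanded into the two lookups described above. It is not cogent3's implementation; the helper name `build_maps` is hypothetical, and the only assumption is the NCBI convention that codons are listed in T, C, A, G order with the third base varying fastest.

```python
from itertools import product

BASES = "TCAG"  # NCBI ordering of bases in the 64-character string
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_maps(code_sequence):
    """Return (codon -> amino acid, amino acid -> [codons]) dicts.

    Hypothetical helper for illustration only.
    """
    if len(code_sequence) != 64:
        raise ValueError("code_sequence must have exactly 64 characters")
    # product() varies the last base fastest: TTT, TTC, TTA, TTG, TCT, ...
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    codon_to_aa = dict(zip(codons, code_sequence))
    aa_to_codons = {}
    for codon, aa in codon_to_aa.items():
        aa_to_codons.setdefault(aa, []).append(codon)
    return codon_to_aa, aa_to_codons


codon_to_aa, aa_to_codons = build_maps(CODE)
# RNA codons can be looked up by converting U to T first.
rna_lookup = codon_to_aa["UUU".replace("U", "T")]
```

With the standard code string, `codon_to_aa["TTT"]` is `'F'` and `aa_to_codons["*"]` holds the three stop codons.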
- property blocks
Returns list of lists of codon blocks in the genetic code.
A codon block can be:
- a quartet, if all 4 XYn codons have the same amino acid.
- a doublet, if XYt and XYc, or XYa and XYg, have the same amino acid.
- a singlet, otherwise.
Returns a list of the quartets, doublets, and singlets in the order UUU -> GGG.
Note that a doublet cannot span the purine/pyrimidine boundary, and a quartet cannot span codons whose first two bases differ.
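The block rules above can be sketched in a few lines of plain Python. This is an illustration of the stated definition, not the library's code; the function name `codon_blocks` is hypothetical, and DNA letters (T rather than U) are used throughout.

```python
BASES = "TCAG"  # NCBI ordering: T and C are pyrimidines, A and G are purines
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_blocks(code_sequence):
    """Split the 64 codons into quartets, doublets, and singlets."""
    blocks = []
    for i in range(0, 64, 4):
        # Each group of four shares its first two bases X and Y.
        x, y = BASES[i // 16], BASES[(i // 4) % 4]
        codons = [x + y + n for n in BASES]
        aas = code_sequence[i:i + 4]
        if len(set(aas)) == 1:
            blocks.append(codons)  # quartet
            continue
        # Check the pyrimidine pair (T, C), then the purine pair (A, G).
        for lo in (0, 2):
            if aas[lo] == aas[lo + 1]:
                blocks.append(codons[lo:lo + 2])  # doublet
            else:
                blocks.append([codons[lo]])  # two singlets
                blocks.append([codons[lo + 1]])
    return blocks


blocks = codon_blocks(CODE)
```

Under this definition the isoleucine codons ATT, ATC, and ATA form a doublet plus a singlet, because a doublet may not cross the pyrimidine/purine boundary.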
- changes(other)
Returns dict of {codon: 'XY'} for codons that differ.
X is the string representation of the amino acid in self, and Y is the string representation of the amino acid in other. Each value is always a 2-character string.
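As an illustration of the {codon: 'XY'} result, the sketch below compares two 64-character code strings position by position. It is a standalone approximation rather than the method itself; `code_changes` is a hypothetical name, and the second string is the NCBI vertebrate mitochondrial translation (table 2).

```python
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def code_changes(self_code, other_code):
    """Map each differing codon to 'XY': X from self_code, Y from other_code."""
    return {
        codon: a + b
        for codon, a, b in zip(CODONS, self_code, other_code)
        if a != b
    }


diff = code_changes(STANDARD, VERT_MITO)
# diff == {'TGA': '*W', 'ATA': 'IM', 'AGA': 'R*', 'AGG': 'R*'}
```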
- get_stop_indices(dna, start=0)
Returns indexes for stop codons in the specified frame.
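A rough sketch of frame-restricted stop detection follows. It assumes that the returned indexes are nucleotide positions of the first base of each stop codon and that `start` (0, 1, or 2) selects the frame; the function name `stop_indices` and the hard-coded standard stop codons are illustrative choices, not taken from the library.

```python
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard code (assumption)


def stop_indices(dna, start=0):
    """Return positions of stop codons that lie in the frame given by start."""
    dna = dna.upper().replace("U", "T")
    # Step through complete codons only, beginning at the frame offset.
    return [i for i in range(start, len(dna) - 2, 3) if dna[i:i + 3] in STOPS]


seq = "CTAGATGA"
frame0 = stop_indices(seq, 0)  # codons CTA, GAT
frame1 = stop_indices(seq, 1)  # codons TAG, ATG
frame2 = stop_indices(seq, 2)  # codons AGA, TGA
```

Here `frame0` is empty, `frame1` is `[1]`, and `frame2` is `[5]`.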
- is_start(codon)
Returns True if codon is a start codon, False otherwise.
- is_stop(codon)
Returns True if codon is a stop codon, False otherwise.
- sixframes(dna)
Returns six-frame translation as dict containing {frame: translation}.
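The idea of a six-frame translation can be sketched as three offsets on the forward strand plus three on the reverse complement. The frame keys used here (1, 2, 3 and -1, -2, -3), the `'X'` placeholder for unrecognised codons, and the names `translate_frame` and `six_frames` are all assumptions made for illustration; the actual keys returned by the method are not specified above.

```python
from itertools import product

CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(zip(("".join(c) for c in product("TCAG", repeat=3)), CODE))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(dna, start=0):
    """Translate complete codons from start; unknown codons become 'X'."""
    return "".join(
        CODON_TO_AA.get(dna[i:i + 3], "X") for i in range(start, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return {frame: translation} for three forward and three reverse frames."""
    dna = dna.upper().replace("U", "T")
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate_frame(dna, offset)
        frames[-(offset + 1)] = translate_frame(reverse, offset)
    return frames


frames = six_frames("ATGGCCTAA")
```

For this input, frame 1 reads `'MA*'` and frame -1, read from the reverse complement `TTAGGCCAT`, gives `'LGH'`.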
- to_regex(seq)
Returns a regex pattern in which each amino acid is expanded to its codon set.
- Parameters:
seq – a Sequence or string of amino acids
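One way to picture the expansion is to turn every amino acid into an alternation group of its codons. The sketch below does that for the standard code; the exact pattern syntax produced by the real method may differ, and `protein_to_regex` is a hypothetical name.

```python
import re
from itertools import product

CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
AA_TO_CODONS = {}
for codon, aa in zip(("".join(c) for c in product("TCAG", repeat=3)), CODE):
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def protein_to_regex(protein):
    """Expand each amino acid into a group such as (TTT|TTC).

    Raises KeyError for characters that are not in the code.
    """
    return "".join("(" + "|".join(AA_TO_CODONS[aa]) + ")" for aa in protein)


pattern = protein_to_regex("MF*")
match = re.fullmatch(pattern, "ATGTTCTGA")  # M, F, stop: matches
no_match = re.fullmatch(protein_to_regex("MF"), "ATGTTA")  # TTA codes for L
```

A pattern built this way can be used to search a nucleotide sequence for any stretch that could encode the given peptide.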
- to_table()
Returns the amino acid to codon mapping as a cogent3 Table.
- translate(dna, start=0)
Translates DNA to protein with current GeneticCode.
- Parameters:
dna (str) – a string of nucleotides
start (int) – position to begin translation (used to implement frames)
- Returns:
String containing amino acid sequence. Translates the entire sequence.
It is the caller’s responsibility to find open reading frames.
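Because translation runs over the whole sequence, stop codons show up as `*` inside the result and the caller decides how to handle them. The following standalone sketch mimics that behaviour with the standard code and then splits the product at stops; `translate_dna` is a hypothetical stand-in, and skipping a trailing incomplete codon is an assumption about edge-case handling.

```python
from itertools import product

CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(zip(("".join(c) for c in product("TCAG", repeat=3)), CODE))


def translate_dna(dna, start=0):
    """Translate every complete codon from start to the end of dna."""
    dna = dna.upper().replace("U", "T")
    # A trailing partial codon is ignored (assumption for this sketch).
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(start, len(dna) - 2, 3))


dna = "ATGGCCTAAATGTTT"
protein = translate_dna(dna)            # reads through the stop codon
shifted = translate_dna(dna, start=1)   # same sequence, second frame
# Finding open reading frames is left to the caller, e.g. by splitting at '*'.
segments = protein.split("*")
```

Here `protein` is `'MA*MF'`, `shifted` is `'WPKC'`, and `segments` is `['MA', 'MF']`.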