Annotations¶
Annotations with coordinates¶
For more extensive documentation about annotations see Advanced sequence handling.
Automated introduction from reading genbank files¶
We load a sample genbank file with plenty of features and grab the CDS features.
Customising annotation construction from reading a genbank file¶
You can write your own code to construct annotation objects. One reason you might do this is that some GenBank files do not have a /gene tag on gene-related features, instead only possessing a /locus_tag. To illustrate the approach, we only create annotations for CDS features. We write a custom callback function that uses the locus_tag as the Feature name.
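A minimal plain-Python sketch of the callback idea, not the library's parser API. The record layout, the values, and the name `cds_from_locus_tag` are hypothetical; a real callback would receive whatever structure the GenBank parser provides.

```python
def cds_from_locus_tag(features):
    """Return (type, name, spans) tuples for CDS features, named by /locus_tag."""
    annotations = []
    for feature in features:
        if feature["type"] != "CDS":
            continue  # only CDS features are annotated in this illustration
        name = feature["locus_tag"][0]  # use the locus tag, since /gene may be absent
        annotations.append(("CDS", name, feature["spans"]))
    return annotations


# Hypothetical parsed records; the values are made up for illustration.
records = [
    {"type": "gene", "locus_tag": ["tag_0001"], "spans": [(100, 400)]},
    {"type": "CDS", "locus_tag": ["tag_0001"], "spans": [(100, 400)]},
    {"type": "CDS", "locus_tag": ["tag_0002"], "spans": [(450, 900)]},
]
cds_annotations = cds_from_locus_tag(records)
```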
Creating directly on a sequence¶
Via¶
add_annotation¶
add_feature¶
There are other annotation types.
Adding as a series or item-wise¶
Taking the union of annotations¶
Construct a pseudo-feature (cds) that’s a union of other features (exon1, exon2, exon3).
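The idea of a union can be pictured with plain Python, treating each feature as a list of (start, end) spans. This is a sketch under that assumption, not the library's implementation; the sequence, the coordinates, and the helpers `union` and `slice_by` are made up for illustration.

```python
# A made-up sequence, built in pieces so the coordinates are easy to check.
seq = "AAGAAGAAGA" + "CCCCC" + "A" * 10 + "T" * 10 + "AAAAA" + "GGGAA" + "CCCT"

exon1 = [(10, 15)]
exon2 = [(30, 40)]
exon3 = [(45, 48)]


def union(*features):
    """Combine the spans of several features into one sorted span list."""
    return sorted(span for feature in features for span in feature)


def slice_by(sequence, spans):
    """Concatenate the pieces of sequence covered by each span, in order."""
    return "".join(sequence[start:end] for start, end in spans)


cds = union(exon1, exon2, exon3)
cds_seq = slice_by(seq, cds)
```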
Getting annotation coordinates¶
Coordinates are useful for custom processing, e.g. you could construct intron features from the gaps between exon coordinates.
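As a rough illustration of deriving introns from coordinates, the sketch below assumes the coordinates are (start, end) pairs with exclusive ends. The function name `introns_between` and the example values are hypothetical.

```python
def introns_between(exon_spans):
    """Return the gaps between consecutive exons as (start, end) spans."""
    ordered = sorted(exon_spans)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:  # skip exons that touch or overlap
            introns.append((prev_end, next_start))
    return introns


exon_coords = [(10, 15), (30, 40), (45, 48)]
intron_coords = introns_between(exon_coords)
```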
Annotations have shadows¶
A shadow is a span representing everything but the annotation.
Compare to the coordinates of the original.
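A shadow can be thought of as the complement of the annotation's spans over the full sequence length. The following plain-Python sketch assumes (start, end) spans with exclusive ends; `shadow` is a hypothetical helper, not the library method.

```python
def shadow(spans, length):
    """Return the spans of a sequence of the given length NOT covered by spans."""
    uncovered = []
    cursor = 0
    for start, end in sorted(spans):
        if start > cursor:
            uncovered.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < length:
        uncovered.append((cursor, length))
    return uncovered


original = [(10, 15), (30, 40), (45, 48)]
not_original = shadow(original, length=49)
```

Unlike the gaps between exons, the shadow also includes the flanking regions before the first span and after the last one.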
Adding to a sequence member of an alignment¶
The following annotation is directly applied onto the sequence and so is in ungapped sequence coordinates.
Adding to an alignment¶
We add an annotation directly onto an alignment. In this example we add a Variable that can be displayed as a red line on a drawing. The resulting annotation (red_data here) is in alignment coordinates!
Slicing sequences and alignments by annotations¶
Slicing by a feature or by its coordinates returns the same sequence span.
Using the annotation object's get_slice method returns the same thing.
Slicing by pseudo-feature or feature series¶
Warning
Slices are applied in order!
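To picture why order matters, the sketch below slices a made-up string by the same two spans in two different orders. `slice_in_order` is a hypothetical helper and not part of the library.

```python
def slice_in_order(sequence, spans):
    """Concatenate the span slices in exactly the order the spans are given."""
    return "".join(sequence[start:end] for start, end in spans)


seq = "AAAACCCCGGGGTTTT"
forward = slice_in_order(seq, [(0, 4), (8, 12)])
swapped = slice_in_order(seq, [(8, 12), (0, 4)])
```

The two results contain the same bases but in a different arrangement, so the order of a feature series should be checked before slicing.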
Slice series must not be overlapping¶
But get_region_covering_all resolves this, ensuring no overlaps.
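One way to think about removing overlaps is to merge any spans that overlap or touch into a single span. The sketch below shows that idea in plain Python; it is an assumption about the general technique, not a description of how the library method is implemented, and `merge_spans` is a hypothetical name.

```python
def merge_spans(spans):
    """Merge overlapping or touching (start, end) spans into a sorted list."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # extends (or sits inside) the previous span, so grow it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


overlapping = [(10, 15), (12, 20), (30, 40)]
covering = merge_spans(overlapping)
```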
You can slice an annotation itself¶
Sequence vs Alignment slicing¶
You can’t slice an alignment using an annotation from a sequence.
Copying annotations¶
You can copy annotations onto sequences with the same name, even if the length differs.
However, if the feature lies outside the sequence being copied to, you get a lost span.
You can copy to a sequence with a different name, in a different alignment, if the feature lies within the length.
If the sequence is shorter, again you get a lost span.
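The notion of a lost span can be sketched as the part of a feature that falls beyond the end of the target sequence. The helper `copy_span` below is hypothetical, and the way it clips a partially overlapping feature is an assumption made for illustration; the library's treatment of that case is not described here.

```python
def copy_span(span, target_length):
    """Fit a (start, end) span onto a target; return (kept_span, lost_length)."""
    start, end = span
    kept_start = min(start, target_length)
    kept_end = min(end, target_length)
    kept = (kept_start, kept_end) if kept_end > kept_start else None
    lost = (end - start) - (kept_end - kept_start)
    return kept, lost


inside = copy_span((10, 15), target_length=20)
outside = copy_span((10, 15), target_length=8)
```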
Querying¶
You need to get a corresponding annotation projected into alignment coordinates via a query.
Querying produces objects only valid for their source¶
Querying for absent annotation¶
You get back an empty list, and slicing with this returns an empty sequence.
Querying features that span gaps in alignments¶
If you query for a feature from a sequence, its alignment coordinates may be discontinuous.
Note
The T opposite the gap is missing since this approach only returns positions directly corresponding to the feature.
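The sketch below illustrates why projecting a sequence feature into alignment coordinates can give discontinuous spans. The two-sequence alignment, the feature coordinates, and the helper `project_to_alignment` are made up for illustration and do not use the library.

```python
def project_to_alignment(gapped, start, end, gap="-"):
    """Map the ungapped span [start, end) of one sequence onto alignment columns."""
    columns = [i for i, char in enumerate(gapped) if char != gap][start:end]
    spans = []
    for col in columns:
        if spans and spans[-1][1] == col:
            spans[-1] = (spans[-1][0], col + 1)  # extend a contiguous run
        else:
            spans.append((col, col + 1))  # a gap interrupted the run
    return spans


x = "C-CCCAAAAAGGGAA"
y = "-T----TTTTG-GTT"
aln_spans = project_to_alignment(x, 0, 4)  # feature covers the first 4 bases of x
x_slice = "".join(x[s:e] for s, e in aln_spans)
y_slice = "".join(y[s:e] for s, e in aln_spans)
```

Column 1, where `x` has a gap and `y` has a T, is not part of either span, so the T does not appear in `y_slice`.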
as_one_span unifies features with discontinuous alignment coordinates¶
To get positions spanned by a feature, including gaps, use as_one_span.
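Conceptually, unifying discontinuous coordinates means taking the smallest single span that covers all of them. The plain-Python sketch below uses a hypothetical helper `one_span` and made-up alignment data; it is not the library method itself.

```python
def one_span(spans):
    """Return the single (start, end) span that covers every given span."""
    return (min(start for start, _ in spans), max(end for _, end in spans))


x = "C-CCCAAAAAGGGAA"
y = "-T----TTTTG-GTT"
discontinuous = [(0, 1), (2, 5)]  # alignment coordinates of a feature on x
start, end = one_span(discontinuous)
x_region = x[start:end]
y_region = y[start:end]
```

Here the column where `x` has a gap is included, so the T in `y` is retained.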
Behaviour of annotations on nucleic acid sequences¶
Reverse complementing a sequence does not reverse annotations; they retain the reference to the frame for which they were defined.
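One way to picture this behaviour: a feature defined on the plus strand still refers to the same bases after reverse complementing, even though those bases now sit at mirrored positions. The sketch below is an interpretation in plain Python with made-up data and hypothetical helper names, not the library's mechanism.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def span_on_reverse(span, length):
    """Locate a plus-strand (start, end) span on the reverse complement."""
    start, end = span
    return (length - end, length - start)


plus = "AAGAAGAAGA" + "CCCCC" + "AAAAA"
cds = (10, 15)
minus = reverse_complement(plus)
r_start, r_end = span_on_reverse(cds, len(plus))
# Reading the mirrored region back in the original frame recovers the same bases.
recovered = reverse_complement(minus[r_start:r_end])
```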
Masking annotated regions¶
We mask the CDS regions.
The masked sequence could then be filtered to remove every position containing the ambiguous character ‘?’.
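Masking followed by filtering can be sketched in plain Python as below. The sequence, the spans, and the helper `mask_spans` are made up for illustration, and the spans are assumed to lie within the sequence.

```python
def mask_spans(seq, spans, mask_char="?"):
    """Replace every position covered by a (start, end) span with mask_char."""
    chars = list(seq)
    for start, end in spans:
        chars[start:end] = mask_char * (end - start)
    return "".join(chars)


seq = "AA" + "GGGG" + "AAAA" + "CCCCC" + "AAA"
cds_spans = [(2, 6), (10, 15)]
masked = mask_spans(seq, cds_spans)
# Drop the masked positions, leaving only unannotated bases.
filtered = masked.replace("?", "")
```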
Masking annotated regions on alignments¶
We mask exons on an alignment.
These also persist through reverse complement operations.
You can take the mask of the shadow¶
What features of a certain type are available?¶
Getting all features of a type, or everything but that type¶
The annotation methods get_region_covering_all and get_shadow can be used to grab all the coding sequences or non-coding sequences in a DnaSequence object.
Getting sequence features when you have an alignment object¶
Sequence features can be accessed via a containing Alignment.
Annotation display on sequences¶
We can display annotations on sequences, writing to file.
We first make a sequence and add some annotations.