Annotations

Annotations with coordinates

For more extensive documentation about annotations see Advanced sequence handling.

Automated introduction from reading GenBank files

We load a sample GenBank file with plenty of features and grab the CDS features.

Customising annotation construction from reading a GenBank file

You can write your own code to construct annotation objects. One reason to do this is that some GenBank files have no /gene tag on gene-related features and carry only a /locus_tag. To illustrate the approach, we create annotations for CDS features only, using a custom callback function that takes the locus_tag as the Feature name.
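The naming logic of such a callback can be sketched in plain Python. This is not the library's parser interface: the record shape (a dict with "type", "spans", and list-valued qualifiers), the function names, and the locus tags are all assumptions made for illustration.

```python
def locus_tag_name(feature):
    """Return the first /locus_tag value of a feature record, or None.

    Assumes qualifiers are stored as lists of strings keyed by qualifier
    name (a hypothetical record shape chosen for this sketch).
    """
    values = feature.get("locus_tag") or []
    return values[0] if values else None


def cds_annotations(features):
    """Yield (type, name, spans) tuples for CDS features only."""
    for feature in features:
        if feature.get("type") != "CDS":
            continue
        name = locus_tag_name(feature)
        if name is None:
            continue  # no locus_tag, so there is nothing to name it by
        yield ("CDS", name, feature["spans"])


# Made-up records, standing in for parsed GenBank features.
records = [
    {"type": "gene", "locus_tag": ["abc_0001"], "spans": [(189, 255)]},
    {"type": "CDS", "locus_tag": ["abc_0001"], "spans": [(189, 255)]},
    {"type": "CDS", "spans": [(300, 400)]},  # no locus_tag, skipped
]
annotations = list(cds_annotations(records))
# annotations == [("CDS", "abc_0001", [(189, 255)])]
```

Each yielded tuple holds what would be needed to create one annotation on the sequence.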

Creating directly on a sequence

Via add_annotation

Via add_feature

There are other annotation types.

Adding as a series or item-wise

Taking the union of annotations

Construct a pseudo-feature (cds) that’s a union of other features (exon1, exon2, exon3).
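The idea of a union can be illustrated without the library by treating each feature as a list of (start, end) spans. The function name, the example sequence, and the exon coordinates below are assumptions chosen for this sketch.

```python
def union_spans(*span_lists):
    """Combine the spans of several features into one sorted list."""
    combined = [span for spans in span_lists for span in spans]
    return sorted(combined)


# Illustrative 49-base sequence and three made-up exons.
seq = "AAGAAGAAGA" + "C" * 5 + "A" * 10 + "T" * 10 + "A" * 5 + "GGGAACCCT"
exon1 = [(10, 15)]
exon2 = [(30, 40)]
exon3 = [(45, 48)]

# The pseudo-feature simply carries all exon spans.
cds = union_spans(exon1, exon2, exon3)

# Slicing by the pseudo-feature joins the pieces it covers.
cds_seq = "".join(seq[start:end] for start, end in cds)
# cds_seq == "CCCCCTTTTTAAAAACCC"
```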

Getting annotation coordinates

The coordinates are useful for custom tasks; for example, you could construct intron features from the exon coordinates.
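As a minimal sketch of the intron idea, assuming coordinates arrive as a list of (start, end) tuples, the gaps between consecutive exons can be computed as follows. The function name and the coordinates are hypothetical.

```python
def intron_spans(exon_spans):
    """Return the spans lying between consecutive exons."""
    ordered = sorted(exon_spans)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end  # skip exons that touch or overlap
    ]


introns = intron_spans([(10, 15), (30, 40), (45, 48)])
# introns == [(15, 30), (40, 45)]
```

The resulting spans could then be added to the sequence as new features.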

Annotations have shadows

A shadow is a span representing everything but the annotation.

Compare to the coordinates of the original.
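A shadow can be thought of as the complement of the annotation's spans within the sequence length. The following plain-Python sketch (not the library's code; the function name and inputs are assumptions) makes that concrete.

```python
def shadow(spans, length):
    """Return the spans of [0, length) not covered by the given spans."""
    result = []
    cursor = 0  # first position not yet accounted for
    for start, end in sorted(spans):
        if start > cursor:
            result.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < length:
        result.append((cursor, length))
    return result


original = [(10, 15), (30, 40), (45, 48)]
not_cds = shadow(original, 49)
# not_cds == [(0, 10), (15, 30), (40, 45), (48, 49)]
```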

Adding to a sequence member of an alignment

The following annotation is directly applied onto the sequence and so is in ungapped sequence coordinates.

Adding to an alignment

We add an annotation directly onto an alignment. In this example we add a Variable that can be displayed as a red line on a drawing. The resulting annotation (red_data here) is in alignment coordinates!

Slicing sequences and alignments by annotations

Slicing by a feature or by its coordinates returns the same sequence span

Using the annotation object get_slice method returns the same thing.

Slicing by pseudo-feature or feature series

Warning

Slices are applied in order!
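To see why order matters, consider a plain-Python stand-in for slicing by a series of spans, where the pieces are joined in the order given. The helper name and the data are assumptions for illustration.

```python
def slice_in_order(seq, spans):
    """Join the pieces of seq selected by spans, in the order given."""
    return "".join(seq[start:end] for start, end in spans)


seq = "AAGAAGAAGA" + "C" * 5 + "A" * 10 + "T" * 10 + "A" * 5 + "GGGAACCCT"

forward = slice_in_order(seq, [(10, 15), (30, 40), (45, 48)])
# forward == "CCCCCTTTTTAAAAACCC"

shuffled = slice_in_order(seq, [(45, 48), (10, 15), (30, 40)])
# shuffled == "CCCCCCCCTTTTTAAAAA"
```

The same three spans yield different sequences, so a feature series should be ordered deliberately before slicing.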

Slice series must not be overlapping

But get_region_covering_all resolves this, ensuring no overlaps.
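One way to picture how overlaps can be resolved is to merge overlapping spans into a single covering span. This sketch is an assumption about the general technique, not a description of how get_region_covering_all is implemented.

```python
def merge_overlapping(spans):
    """Merge overlapping spans so the result contains no overlaps."""
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous span, so extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


covering = merge_overlapping([(12, 20), (10, 15), (30, 40)])
# covering == [(10, 20), (30, 40)]
```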

You can slice an annotation itself

Sequence vs Alignment slicing

You can’t slice an alignment using an annotation from a sequence.

Copying annotations

You can copy annotations onto sequences with the same name, even if the length differs.

However, if the feature lies outside the sequence being copied to, you get a lost span.

You can copy to a sequence with a different name, in a different alignment, if the feature lies within the length of the target sequence.

If the sequence is shorter, again you get a lost span.
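The lost-span behaviour in both cases above can be mimicked with a small sketch in which a span that does not fit on the target is replaced by None. The all-or-nothing rule is a simplification assumed here; the library's handling of spans that only partly fit may differ.

```python
def copy_spans(spans, target_length):
    """Return spans as they would land on a target of the given length.

    A span that does not fit entirely inside the target is replaced by
    None, standing in for a "lost span" (a simplifying assumption).
    """
    return [
        (start, end) if 0 <= start and end <= target_length else None
        for start, end in spans
    ]


fits = copy_spans([(2, 6)], target_length=9)
# fits == [(2, 6)]

lost = copy_spans([(10, 15)], target_length=9)
# lost == [None]
```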

Querying

You need to get a corresponding annotation projected into alignment coordinates via a query.

Querying produces objects only valid for their source

Querying for absent annotation

You get back an empty list, and slicing with this returns an empty sequence.

Querying features that span gaps in alignments

If you query for a feature from a sequence, its alignment coordinates may be discontinuous.

Note

The T opposite the gap is missing since this approach only returns positions directly corresponding to the feature.

as_one_span unifies features with discontinuous alignment coordinates

To get positions spanned by a feature, including gaps, use as_one_span.
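The difference between the two results can be reproduced with a plain-Python projection of ungapped sequence coordinates onto alignment columns. The function names and the two-sequence alignment are assumptions for illustration, not the library's implementation.

```python
def seq_to_aln_spans(gapped, start, end, gap="-"):
    """Project ungapped positions [start, end) onto alignment columns.

    Returns a list of contiguous column spans; gap columns break a span.
    """
    spans = []
    seq_pos = 0       # position in the ungapped sequence
    run_start = None  # first column of the span being built
    for col, char in enumerate(gapped):
        if char == gap:
            inside = False
        else:
            inside = start <= seq_pos < end
            seq_pos += 1
        if inside and run_start is None:
            run_start = col
        elif not inside and run_start is not None:
            spans.append((run_start, col))
            run_start = None
    if run_start is not None:
        spans.append((run_start, len(gapped)))
    return spans


def one_span(spans):
    """Collapse discontinuous spans into one span that includes the gaps."""
    return (spans[0][0], spans[-1][1]) if spans else None


x = "C-CCCAAAAA"
y = "-T----TTTT"

exon_cols = seq_to_aln_spans(x, 0, 4)
# exon_cols == [(0, 1), (2, 5)]
y_discontinuous = "".join(y[s:e] for s, e in exon_cols)
# y_discontinuous == "----"

start, end = one_span(exon_cols)
y_one_span = y[start:end]
# y_one_span == "-T---"
```

Only the single-span version includes the T in y that sits opposite the gap in x.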

Behaviour of annotations on nucleic acid sequences

Reverse complementing a sequence does not reverse its annotations; they retain the reference to the frame for which they were defined.
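The coordinate arithmetic involved can be sketched in plain Python: a span defined on the original strand still identifies the same bases after reverse complementing, provided it is mapped onto the reversed coordinates first. The helper names and the sequence are assumptions, and this does not claim to mirror the library's internals.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def span_on_reverse(span, length):
    """Map a plus-strand (start, end) onto reverse-complement coordinates."""
    start, end = span
    return (length - end, length - start)


def feature_seq_from_reverse(rc_seq, span):
    """Recover the feature, in its original frame, from the reverse complement."""
    start, end = span_on_reverse(span, len(rc_seq))
    return reverse_complement(rc_seq[start:end])


plus = "CCCCC" + "A" * 10 + "T" * 10 + "AAAGG"
minus = reverse_complement(plus)

from_plus = plus[5:15]
from_minus = feature_seq_from_reverse(minus, (5, 15))
# from_plus == from_minus == "AAAAAAAAAA"
```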

Masking annotated regions

We mask the CDS regions.

The above sequence could then have positions filtered so no position with the ambiguous character ‘?’ was present.
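Masking and the subsequent filtering step can be sketched as follows, assuming spans are (start, end) tuples and '?' is the mask character. The function names and data are hypothetical.

```python
def mask_spans(seq, spans, mask_char="?"):
    """Replace every position covered by spans with mask_char."""
    chars = list(seq)
    for start, end in spans:
        # Clip to the sequence so the masked result keeps its length.
        start, end = max(start, 0), min(end, len(chars))
        if start < end:
            chars[start:end] = [mask_char] * (end - start)
    return "".join(chars)


def drop_masked(seq, mask_char="?"):
    """Remove all masked positions from a sequence."""
    return seq.replace(mask_char, "")


seq = "AAGAAGAAGA" + "C" * 5 + "A" * 10
masked = mask_spans(seq, [(10, 15)])
# masked == "AAGAAGAAGA?????AAAAAAAAAA"

filtered = drop_masked(masked)
# filtered == "AAGAAGAAGAAAAAAAAAAA"
```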

Masking annotated regions on alignments

We mask exons on an alignment.

These also persist through reverse complement operations.

You can take the mask of the shadow

What features of a certain type are available?

Getting all features of a type, or everything but that type

The annotation methods get_region_covering_all and get_shadow can be used to grab all the coding sequences or non-coding sequences in a DnaSequence object.

Getting sequence features when you have an alignment object

Sequence features can be accessed via a containing Alignment.

Annotation display on sequences

We can display annotations on sequences, writing to file.

We first make a sequence and add some annotations.