Lab Session on Gene Prediction with AUGUSTUS
In this lab session we practice the most common bioinformatics steps for predicting the protein-coding genes in
a eukaryotic genome with AUGUSTUS. We assume the case of a "new"
genome, for which AUGUSTUS has not been trained before, but use well-studied species as examples because
example data is readily available and visualization is easier.
Styles
Assignments are in this color. The lazy ones may go through
this tutorial very fast by just reading these assignments and cutting and pasting the commands
that follow them (more or less).
Results are in this color.
Details are hidden...
You don't have to read this. If the pace of the tutorial bores you, you can read these details boxes.
Example Data
All example files are in the data directory. We recommend
you work directly in this directory.
Drosophila melanogaster (Exercises 1-5)
Human (Exercise 6)
For Cheaters: Result Files
You can use the files in the results directory to catch up if you fall behind or to compare your results.
Exercise 1: Compile a Training Set
There are several typical options for creating a training set
to estimate the parameters of gene finders. Here we go through option 4:
Spliced alignments of protein sequences
We assume that we have a set of protein sequences of the same or a very closely related species and will use Scipio to infer the gene structures.
- Follow the tutorial on "Using Scipio to create a training set"
and create a training set genes.gb.
- Partition genes.gb into a training set and a holdout test set as described in 1.2 Split gene structure set....
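The idea behind the partitioning step can be sketched in a few lines of Python. This is only an illustration of a random holdout split, not the procedure from the referenced tutorial, which should be followed for the actual exercise. The function names `read_records` and `random_split` are hypothetical, and the sketch assumes that every record in genes.gb ends with a line containing only `//`, as is usual for GenBank flat files.

```python
import random


def read_records(text):
    """Split multi-record GenBank text into a list of record strings.

    Assumes each record is terminated by a line containing only "//".
    Trailing lines without such a terminator are ignored.
    """
    records, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "//":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def random_split(records, n_test, seed=None):
    """Return (train, test), with n_test randomly chosen records in test."""
    if not 0 <= n_test <= len(records):
        raise ValueError("n_test must be between 0 and the number of records")
    rng = random.Random(seed)
    test_idx = set(rng.sample(range(len(records)), n_test))
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test
```

Keeping the test genes strictly separate from the training genes is what makes the later accuracy estimate meaningful: genes seen during training would make the predictions look better than they are.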
Exercise 2: Train the Coding Regions of AUGUSTUS
Let's name our species "bug". Pretending that there is not already a parameter set of AUGUSTUS for
Drosophila (named "fly"), we will estimate the parameters from the training set.
- Create a meta parameters file for bug as described in 2. CREATE A META PARAMETERS FILE...
- Estimate the parameters using your training set as described in 3. MAKE AN INITIAL TRAINING
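As a rough orientation, the two steps above usually come down to two command-line calls. The Python sketch below only assembles and prints them and does not run anything. It assumes the `new_species.pl` script and the `etraining` program from the AUGUSTUS distribution are on your PATH, and that the training part of the split is called `genes.gb.train`; that file name and the helper `training_commands` are assumptions made for illustration. The referenced tutorial sections remain the authoritative instructions.

```python
import shlex


def training_commands(species, training_file):
    """Assemble the command lines for a first training round.

    The commands are returned as argument lists and are not executed here.
    """
    return [
        # Create the parameter directory and meta parameters for the species.
        ["new_species.pl", f"--species={species}"],
        # Estimate the parameters from the training genes.
        ["etraining", f"--species={species}", training_file],
    ]


for cmd in training_commands("bug", "genes.gb.train"):
    print(shlex.join(cmd))
```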
Exercise 3: Ab Initio Predict Genes in the Genome
- Predict the protein-coding parts of the genes
in a sample sequence of Drosophila melanogaster as described in
1. PREDICT GENES AB INITIO.
- Visualize your predicted genes as described in 2. MAKE A CUSTOM GENE PREDICTION TRACK....
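Before loading a prediction into a browser, a quick sanity check is to count the features in the output. The following minimal sketch assumes the prediction was written in GFF format, with tab-separated columns, the feature type (such as `gene` or `CDS`) in the third column, and comment lines starting with `#`. The function name `count_features` is hypothetical.

```python
from collections import Counter


def count_features(gff_text):
    """Count GFF lines per feature type (third column)."""
    counts = Counter()
    for line in gff_text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip anything that does not look like a GFF record.
        if len(cols) < 8:
            continue
        counts[cols[2]] += 1
    return counts
```

Calling `count_features(open("prediction.gff").read())["gene"]` would then give the number of predicted genes, where `prediction.gff` stands for whatever file you redirected the prediction to.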
Exercise 4: Prepare hints
Construct extrinsic evidence about genes from transcriptome data (ESTs and RNA-Seq) following
the instructions in 3. PREPARE HINTS.
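To make the notion of a hint more concrete: a hints file is a GFF file in which each line states a piece of evidence, such as an intron supported by spliced reads. The sketch below formats one such line. It reflects our understanding that the last column carries the evidence source as `src=<letter>` and optionally a multiplicity as `mult=<n>`; treat those attribute names, the `b2h` value in the second column, and the helper `format_hint` as assumptions and check them against the referenced instructions.

```python
def format_hint(seqname, feature, start, end, strand, src, mult=1,
                program="b2h"):
    """Format a single hint as a tab-separated GFF line.

    src is the one-letter evidence source (for example "E" for ESTs);
    mult is the number of alignments supporting the same hint.
    """
    if start > end:
        raise ValueError("start must not be greater than end")
    attributes = f"src={src}"
    if mult > 1:
        attributes += f";mult={mult}"
    columns = [seqname, program, feature, str(start), str(end),
               "0", strand, ".", attributes]
    return "\t".join(columns)
```

For example, `format_hint("chr2L", "intron", 100, 200, "+", "E", mult=5)` describes an intron from position 100 to 200 on the forward strand that is supported by five alignments; the coordinates are made up for illustration.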
Exercise 5: Predict Genes Using Hints
Structurally annotate an example sequence from Drosophila based on the hints from exercise 4 by
- setting the hint parameters
- predicting genes using hints
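The two steps above typically meet in a single AUGUSTUS call that names both the hint parameter file and the hints file. The sketch below assembles such a call without running it. It assumes the `--extrinsicCfgFile` and `--hintsfile` options of `augustus`; the file names and the helper `hints_prediction_command` are placeholders chosen for illustration, not the names used in the exercise data.

```python
import shlex


def hints_prediction_command(species, sequence_file, hints_file,
                             extrinsic_cfg):
    """Assemble (but do not run) an AUGUSTUS call that uses hints."""
    return [
        "augustus",
        f"--species={species}",
        # Hint parameters: how strongly each type of evidence is trusted.
        f"--extrinsicCfgFile={extrinsic_cfg}",
        # The hints themselves, in GFF format.
        f"--hintsfile={hints_file}",
        sequence_file,
    ]


cmd = hints_prediction_command("bug", "sequence.fa", "hints.gff",
                               "extrinsic.cfg")
print(shlex.join(cmd))
```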
Exercise 6: Identify Members of a Protein Family
Use the new PPX extension of AUGUSTUS to find the gene structures
based on a multiple alignment of a protein family as described in
AUGUSTUS-PPX.