
Tutorial on comparative gene finding with AUGUSTUS-CGP

CGP (comparative gene prediction) is a recent extension of AUGUSTUS for clade annotation. It takes two or more genomes of related species and simultaneously predicts the genes in all input genomes (protein-coding genes only, one isoform per gene). AUGUSTUS-CGP can integrate the same types of evidence as in single-species mode, for either a subset of the genomes or all of them. A genome alignment of the species is used to transfer evidence across genomes and to exploit evolutionary evidence for genes, e.g. by looking for conserved regions and regions that are under negative selection. For further reading, see König et al. (2016).

In this tutorial we practice the most common applications of AUGUSTUS-CGP: de novo gene finding (i.e. only the raw genomes are used, without extrinsic evidence), integration of RNA-Seq evidence, and lifting over annotations from one (or more) species to the other species in the clade. AUGUSTUS-CGP has three mandatory inputs: the set of genomes, each in Multi-FASTA format, an alignment of the genomes, and a phylogenetic tree in NEWICK format. You don't have a whole-genome alignment for your clade? No problem - the tutorial covers how a whole-genome alignment can be created with progressiveCactus. The output alignment of progressiveCactus is in HAL format, which can be displayed as an Assembly Hub in the Genome Browser. In one of the exercises, we will set up such an Assembly Hub and show how gene tracks can be uploaded for visualization. If the phylogeny of the species is not known, we recommend using a star-like tree with uniform branch lengths.
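As an illustration of the star-like tree mentioned above, the following minimal Python sketch writes a NEWICK string in which every species hangs directly off the root with the same branch length. The function name `star_tree` and the branch length of 0.1 are placeholders chosen for this example, not values prescribed by AUGUSTUS-CGP. The leaf names are the assembly names from the example data, on the assumption that the tree leaves should carry the same names as the genomes in the alignment.

```python
def star_tree(species, branch_length=0.1):
    """Return a star-like NEWICK tree: all species attached to one root,
    each with the same branch length (0.1 is an arbitrary placeholder)."""
    if len(species) < 2:
        raise ValueError("a tree needs at least two species")
    leaves = ",".join(f"{name}:{branch_length}" for name in species)
    return f"({leaves});"


# Assembly names of the eight example genomes used in this tutorial.
assemblies = ["hg38", "mm10", "rn6", "bosTau8",
              "canFam3", "rheMac3", "monDom5", "galGal4"]

print(star_tree(assemblies))
```

A star tree makes no statement about which species are more closely related, so it is only a fallback; a tree with realistic topology and branch lengths is preferable when one is available.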

Styles

Assignments are in this color. Readers in a hurry can move through this tutorial quickly by reading only these assignments and copying and pasting the commands that follow them (more or less).

Results are in this color.

[+] Details are hidden...

Software Requirements

Example Data

Our example data set covers a 2 Megabase syntenic region in 8 vertebrates:
species    assembly    genomic region
human      hg38        chr16:186964-397118
mouse      mm10        chr17:26104939-26283331
rat        rn6         chr10:15470071-15570014
cow        bosTau8     chr25:224163-380253
dog        canFam3     chr6:40126702-40311429
rhesus     rheMac3     chr20:149129-369768
opossum    monDom5     chr6:149454308-149994826
chicken    galGal4     chr14:12108253-12258251
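The regions in the table follow the usual `chrom:start-end` notation. As a small sanity check, the sketch below parses these strings and adds up the region lengths. The helper names `parse_region` and `region_length` are made up for this example, and the length is taken simply as `end - start`, which assumes a half-open coordinate convention (with 1-based inclusive coordinates each region would be one base longer).

```python
import re

# chrom:start-end, e.g. "chr16:186964-397118"
REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")

# Regions from the table above, keyed by assembly name.
REGIONS = {
    "hg38": "chr16:186964-397118",
    "mm10": "chr17:26104939-26283331",
    "rn6": "chr10:15470071-15570014",
    "bosTau8": "chr25:224163-380253",
    "canFam3": "chr6:40126702-40311429",
    "rheMac3": "chr20:149129-369768",
    "monDom5": "chr6:149454308-149994826",
    "galGal4": "chr14:12108253-12258251",
}


def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end) with integer coordinates."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        raise ValueError(f"end before start in region: {region!r}")
    return match["chrom"], start, end


def region_length(region):
    """Length of a region, assuming half-open coordinates (end - start)."""
    _, start, end = parse_region(region)
    return end - start


for assembly, region in REGIONS.items():
    print(f"{assembly}\t{region}\t{region_length(region)} bp")

total = sum(region_length(region) for region in REGIONS.values())
print(f"total: {total / 1e6:.2f} Mb")
```

Summed over all eight genomes the regions come to roughly 1.7 Mb, which matches the order of magnitude stated for the data set.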

All files are in the data directory. We recommend you work directly in this directory.

For Cheaters: Result Files

You can use the files in the results directory to catch up if you fall behind or to compare your results.

Exercise 1: Creating a whole-genome alignment

  1. Create a HAL alignment with progressiveCactus as described in "1. Running progressive Cactus"
  2. Export the HAL alignment to MAF and split the alignment into overlapping chunks for parallel computing as described in "2. Export HAL alignment as MAF"
  3. Build a comparative assembly hub
  4. Load the hub and browse the alignment
  5. Load a reference annotation to the hub
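Step 2 above splits the alignment into overlapping chunks so that the chunks can be processed in parallel. The idea behind the overlap can be sketched in a few lines of Python: consecutive windows share some sequence, so a gene that straddles one chunk boundary can still lie entirely inside a neighbouring chunk, provided the overlap is longer than the gene. The function `overlapping_chunks` and the chunk size and overlap values are hypothetical and chosen only for illustration; they are not the defaults of any tool used in this tutorial, and the actual splitting there operates on the alignment rather than on a bare coordinate range.

```python
def overlapping_chunks(start, end, chunk_size, overlap):
    """Yield (chunk_start, chunk_end) windows that cover [start, end).

    Consecutive windows share `overlap` positions; the last window is
    truncated at `end`.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    pos = start
    while pos < end:
        chunk_end = min(pos + chunk_size, end)
        yield pos, chunk_end
        if chunk_end == end:
            break
        pos += step


# Illustrative values only: 100 kb chunks with 20 kb overlap on the human region.
for chunk_start, chunk_end in overlapping_chunks(186964, 397118, 100_000, 20_000):
    print(f"chr16:{chunk_start}-{chunk_end}")
```

Because the chunks overlap, the same gene can be predicted in two neighbouring chunks, which is why the later exercises include a step that merges the gene predictions from the parallel runs.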

Exercise 2: De novo comparative gene finding with AUGUSTUS-CGP

[+] Prerequisites
  1. Load genomes into an SQLite database
  2. Run AUGUSTUS in CGP mode
  3. Merge gene predictions from parallel runs
  4. Upload gene predictions into the assembly hub

Exercise 3: RNA-Seq-based comparative gene finding with AUGUSTUS-CGP

[+] Prerequisites
  1. Generate hints from RNA-Seq data
  2. Load RNA-Seq hints into the database
  3. Prepare an extrinsic config file
  4. Run AUGUSTUS-CGP with RNA-Seq hints
  5. Merge gene predictions from parallel runs
  6. Upload gene predictions into the assembly hub

Exercise 4: Transferring annotations with AUGUSTUS-CGP

[+] Prerequisites
  1. Generate 'CDS' and 'intron' hints from annotations
  2. Load annotation hints into the database
  3. Prepare an extrinsic config file
  4. Run AUGUSTUS-CGP with annotation hints
  5. Merge gene predictions from parallel runs
  6. Upload gene predictions into the assembly hub

Exercise 5: Combining annotation transfer and RNA-Seq-based prediction

[+] Prerequisites
  1. Create a database with RNA-Seq and annotation hints
  2. Prepare an extrinsic config file
  3. Run AUGUSTUS-CGP with RNA-Seq and annotation hints
  4. Merge gene predictions from parallel runs
  5. Upload gene predictions into the assembly hub

Exercise 6: Cross-species comparison of gene sets

For demonstration purposes, we use the output gene sets from Exercise 5, but you may just as well use the gene sets from Exercise 2, 3 or 4.

[+] Prerequisites
  1. Run homGeneMapping to map coordinates between genomes
  2. Get familiar with the homGeneMapping output format by reading "2. homGeneMapping output explained"