chr16 AUGUSTUS transcript 93718 96048 . + . jg6.t1
chr16 AUGUSTUS CDS 93718 93790 1.0 + 0 transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3M-1,5";
chr16 AUGUSTUS exon 93718 93790 0.0 + . transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3,5";
chr16 AUGUSTUS start_codon 93718 93720 0.0 + . transcript_id "jg6.t1"; gene_id "jg6";
chr16 AUGUSTUS intron 93791 94782 0.0 + . transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3E:M-6,5";
chr16 AUGUSTUS CDS 94783 94962 1.0 + 2 transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3M-1,0,1,2,5,6";
chr16 AUGUSTUS exon 94783 94962 0.0 + . transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3,0,1,2,5,6";
chr16 AUGUSTUS intron 94963 95061 0.0 + . transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3E:M-6,0,1,2E-2335,5,6E-53";
chr16 AUGUSTUS CDS 95062 95145 1.0 + 2 transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3M-1,0,2,5,6";
chr16 AUGUSTUS exon 95062 95145 0.0 + . transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3,0,2,5,6";
chr16 AUGUSTUS intron 95146 95333 0.0 + . transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3E:M-29,0,2E-1908,5,6E-65";
chr16 AUGUSTUS CDS 95334 95410 1.0 + 2 transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3M-1,0,1,2,5,6";
chr16 AUGUSTUS exon 95334 95410 0.0 + . transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3,0,1,2,5,6";
chr16 AUGUSTUS intron 95411 95503 0.0 + . transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3E:M-28,0,1,2E-1877,5,6E*-108";
chr16 AUGUSTUS CDS 95504 95567 1.0 + 0 transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3M-1,0,1,2,5";
chr16 AUGUSTUS exon 95504 95567 0.0 + . transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3,0,1,2,5";
chr16 AUGUSTUS intron 95568 95651 0.0 + . transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3E:M-18,0,1,2E-1823,5";
chr16 AUGUSTUS exon 95652 96048 0.0 + . transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3";
chr16 AUGUSTUS CDS 95652 95851 1.0 + 2 transcript_id "jg6.t1"; gene_id "jg6"; hgm_info "3M-1,0,1,2,5";
chr16 AUGUSTUS stop_codon 95849 95851 0.0 + . transcript_id "jg6.t1"; gene_id "jg6";
In the extended GTF format, the CDS, exon and intron features of T have an additional attribute 'hgm_info' in the last column. It encodes a comma-separated list of
tuples of genome index, sources of evidence and multiplicity, e.g. the tuple
2E-1908
encodes genome 2 (=galGal4, see header), source E and multiplicity 1908. Several sources of one genome are separated by ':' (e.g. 3E:M-6), and a tuple without evidence consists of the genome index only (e.g. 5).
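A minimal Python sketch of how a single tuple could be decoded. The layout `<genome index>[<source>[:<source>...][*]-<multiplicity>]` is inferred from the examples in this section (including the '*' marker explained further down), not taken from the homGeneMapping source. `HgmTuple` and `parse_tuple` are hypothetical names, and using a multiplicity of 0 for tuples without evidence is our own convention.

```python
import re
from dataclasses import dataclass

# Assumed tuple layout (inferred from the examples, not from the tool's source):
#   <genome index>[<source>[:<source>...][*]-<multiplicity>]
TUPLE_RE = re.compile(r"^(\d+)(?:([A-Za-z]+(?::[A-Za-z]+)*)(\*)?-(\d+))?$")


@dataclass
class HgmTuple:
    genome: int          # genome index, resolved to a name via the header
    sources: list[str]   # evidence sources, e.g. ["E"] or ["E", "M"]
    multiplicity: int    # evidence multiplicity; 0 if the tuple has no evidence
    absent: bool         # True if the sources are followed by '*'


def parse_tuple(text: str) -> HgmTuple:
    m = TUPLE_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised hgm_info tuple: {text!r}")
    genome, sources, star, mult = m.groups()
    return HgmTuple(
        genome=int(genome),
        sources=sources.split(":") if sources else [],
        multiplicity=int(mult) if mult else 0,
        absent=star is not None,
    )


print(parse_tuple("2E-1908"))
# HgmTuple(genome=2, sources=['E'], multiplicity=1908, absent=False)
```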
The 'hgm_info' string of the second intron
hgm_info "3E:M-6,0,1,2E-2335,5,6E-53";
states that the intron is consistent with an intron in the gene sets of species 3 (=hg38), 0 (=bosTau8), 1 (=canFam3), 2 (=galGal4), 5 (=monDom5) and 6 (=rheMac3).
An intron/exon of species A is considered consistent with an intron/exon of species B if both of their boundaries are aligned. CDS exons must
additionally be in the same reading frame to be consistent with one another.
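The consistency rule can be written as a small predicate. This sketch assumes the feature of species B has already been projected onto the coordinates and strand of species A through the genome alignment (that mapping step is not shown), and it compares the GTF frame column as a simplified stand-in for "same reading frame". `Feature` and `is_consistent` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    kind: str                    # "CDS", "exon" or "intron"
    start: int                   # first base, in species A coordinates
    end: int                     # last base, in species A coordinates
    frame: Optional[int] = None  # GTF frame (0, 1 or 2); only used for CDS


def is_consistent(a: Feature, b_mapped: Feature) -> bool:
    """Compare feature a of species A with feature b of species B.

    Assumes b_mapped already holds b's boundaries projected onto species A.
    """
    if a.kind != b_mapped.kind:
        return False
    # Both boundaries must be aligned.
    same_boundaries = a.start == b_mapped.start and a.end == b_mapped.end
    if a.kind == "CDS":
        # CDS exons must additionally share the reading frame.
        return same_boundaries and a.frame == b_mapped.frame
    return same_boundaries


# Second CDS exon of jg6.t1 (94783-94962, frame 2)
cds = Feature("CDS", 94783, 94962, frame=2)
print(is_consistent(cds, Feature("CDS", 94783, 94962, frame=2)))  # True
print(is_consistent(cds, Feature("CDS", 94783, 94962, frame=0)))  # False
```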
In species 3, 2 and 6, the intron is supported by RNA-Seq splice junctions (source E) with multiplicities 6, 2335 and 53, respectively.
In species 3, the intron is additionally supported by an existing annotation (source M).
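The reading of the second intron's string given above can be reproduced with a short decoder. This is a sketch that assumes the tuple layout `<genome index>[<source>[:<source>...][*]-<multiplicity>]`, inferred from the examples in this section. `decode` and `GENOME_NAMES` are hypothetical, and the name table contains only the genome indices named in this section; any other index is returned as a bare number.

```python
import re

# Only the genome indices named in this section; others stay as numbers.
GENOME_NAMES = {0: "bosTau8", 1: "canFam3", 2: "galGal4", 3: "hg38",
                5: "monDom5", 6: "rheMac3"}

# Assumed layout: <genome index>[<source>[:<source>...][*]-<multiplicity>]
TUPLE_RE = re.compile(r"^(\d+)(?:([A-Za-z]+(?::[A-Za-z]+)*)(\*)?-(\d+))?$")


def decode(hgm_info: str):
    """Return (genomes containing the feature, {genome: (sources, multiplicity)})."""
    present, evidence = [], {}
    for entry in hgm_info.split(","):
        m = TUPLE_RE.match(entry)
        if m is None:
            raise ValueError(f"unrecognised tuple: {entry!r}")
        genome, sources, star, mult = m.groups()
        name = GENOME_NAMES.get(int(genome), genome)
        if star is None:  # '*' would mean: evidence, but feature absent
            present.append(name)
        if sources:
            evidence[name] = (sources.split(":"), int(mult))
    return present, evidence


present, evidence = decode("3E:M-6,0,1,2E-2335,5,6E-53")
print(present)
# ['hg38', 'bosTau8', 'canFam3', 'galGal4', 'monDom5', 'rheMac3']
rna_seq = {name: mult for name, (srcs, mult) in evidence.items() if "E" in srcs}
print(rna_seq)
# {'hg38': 6, 'galGal4': 2335, 'rheMac3': 53}
```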
If a source is followed by '*', the gene feature is not present in that particular species, although there is evidence for it. For example, the fourth intron has
hgm_info "...,6E*-108";
which means that the gene feature is not present in the gene set of species 6 but has RNA-Seq support there. This is a sign of a false negative in species 6, in particular if
the gene feature is present and has strong evidence in many of the other species.
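Candidate false negatives can be flagged by looking for starred tuples. The sketch below is a heuristic of our own, not part of homGeneMapping: `false_negative_candidates` and its `min_present` threshold are hypothetical, and it only counts how many genomes contain the feature rather than weighing the strength of their evidence.

```python
import re


def false_negative_candidates(hgm_info: str, min_present: int = 3) -> list[int]:
    """Genome indices whose tuple carries '*' (evidence, but no feature),
    reported only if at least min_present genomes do contain the feature."""
    entries = hgm_info.split(",")
    starred = [int(re.match(r"\d+", e).group()) for e in entries if "*" in e]
    n_present = sum("*" not in e for e in entries)
    return starred if n_present >= min_present else []


# Fourth intron of jg6.t1: present in genomes 3, 0, 1, 2 and 5
print(false_negative_candidates("3E:M-28,0,1,2E-1877,5,6E*-108"))  # [6]
```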
# gene feature level:
#
#
# number/% of features with exact homologs in at least k other genomes:
#
#---------------------------------------------------------------------------------------------------
#  k  CDS           Intr          Exon          Intr+Exon
#---------------------------------------------------------------------------------------------------
#
#  0    6  100.0%     5  100.0%     6  100.0%    11  100.0%  *************************
#  1    6  100.0%     5  100.0%     5   83.3%    10   90.9%  **********************
#  2    5   83.3%     4   80.0%     4   66.7%     8   72.7%  ******************
#  3    5   83.3%     4   80.0%     4   66.7%     8   72.7%  ******************
#  4    5   83.3%     4   80.0%     4   66.7%     8   72.7%  ******************
#  5    2   33.3%     1   20.0%     2   33.3%     3   27.3%  ******
#  6    0    0.0%     0    0.0%     0    0.0%     0    0.0%
#  7    0    0.0%     0    0.0%     0    0.0%     0    0.0%
#
# number/% of features supported by extrinsic evidence in at least k genomes :
#
#---------------------------------------------------------------------------------------------------
#  k  CDS           Intr          Exon          Intr+Exon
#---------------------------------------------------------------------------------------------------
#
#  0    6  100.0%     5  100.0%     6  100.0%    11  100.0%  *************************
#  1    6  100.0%     5  100.0%     0    0.0%     5   45.5%  ***********
#  2    0    0.0%     4   80.0%     0    0.0%     4   36.4%  *********
#  3    0    0.0%     3   60.0%     0    0.0%     3   27.3%  ******
#  4    0    0.0%     0    0.0%     0    0.0%     0    0.0%
#  5    0    0.0%     0    0.0%     0    0.0%     0    0.0%
#  6    0    0.0%     0    0.0%     0    0.0%     0    0.0%
#  7    0    0.0%     0    0.0%     0    0.0%     0    0.0%
#  8    0    0.0%     0    0.0%     0    0.0%     0    0.0%
The gene feature level reports, for each k, the number and percentage of CDS, intron and exon features of transcript T that
- are consistent with CDS/intron/exon features in at least k other genomes (first table)
- are supported by hints in at least k genomes (second table)
The counts are cumulative: a feature that is counted for k is also counted for every smaller k.
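Because each row counts features matched in at least k genomes, every column is a suffix sum over the number of features matched in exactly k genomes. A minimal sketch with a hypothetical helper `at_least_k`. Its input lists, for each of the six CDS exons of jg6.t1, how many genomes other than 3 appear without '*' in its `hgm_info` (1, 5, 4, 5, 4, 4); this assumes genome 3 (hg38) is the genome of T.

```python
from collections import Counter
from itertools import accumulate


def at_least_k(per_feature_counts: list[int], k_max: int) -> list[tuple[int, int, float]]:
    """Rows (k, number, percent) of features matched in at least k genomes."""
    hist = Counter(per_feature_counts)          # features matched in exactly c genomes
    top = max([k_max, *per_feature_counts])
    exact = [hist.get(c, 0) for c in range(top + 1)]
    # Suffix sums: features with >= k matches = exactly k + exactly k+1 + ... + exactly top
    at_least = list(accumulate(reversed(exact)))[::-1]
    n = len(per_feature_counts) or 1            # avoid dividing by zero
    return [(k, at_least[k], round(100.0 * at_least[k] / n, 1))
            for k in range(k_max + 1)]


for k, count, pct in at_least_k([1, 5, 4, 5, 4, 4], k_max=7):
    print(k, count, f"{pct:.1f}%")
```

With these inputs the loop prints the CDS column of the first table, from `0 6 100.0%` down to `7 0 0.0%`.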
Let's have a closer look at the first table:
#---------------------------------------------------------------------------------------------------
#  k  CDS           Intr          Exon          Intr+Exon
#---------------------------------------------------------------------------------------------------
#
#  0    6  100.0%     5  100.0%     6  100.0%    11  100.0%  *************************
#  1    6  100.0%     5  100.0%     5   83.3%    10   90.9%  **********************
#  2    5   83.3%     4   80.0%     4   66.7%     8   72.7%  ******************
#  3    5   83.3%     4   80.0%     4   66.7%     8   72.7%  ******************
#  4    5   83.3%     4   80.0%     4   66.7%     8   72.7%  ******************
#  5    2   33.3%     1   20.0%     2   33.3%     3   27.3%  ******
#  6    0    0.0%     0    0.0%     0    0.0%     0    0.0%
#  7    0    0.0%     0    0.0%     0    0.0%     0    0.0%
For k=3, we can see that
- 5 out of the 6 CDS exons (83.3%) are consistent with a CDS exon
- 4 out of the 5 introns (80.0%) are consistent with an intron
- 4 out of the 6 exons (66.7%) are consistent with an exon
in at least k=3 of the other gene sets.
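These numbers can be recomputed from the `hgm_info` attributes of the GTF lines at the top. The sketch makes two assumptions: genome 3 (hg38) is the genome of T and is therefore not counted among the other genomes, and a tuple marked with '*' does not count as a consistent feature (this is what makes the fourth intron drop out at k=5, as in the table). `other_genomes` and `at_least` are hypothetical helpers.

```python
import re

# hgm_info strings of jg6.t1, copied from the GTF lines above.
CDS = ["3M-1,5", "3M-1,0,1,2,5,6", "3M-1,0,2,5,6",
       "3M-1,0,1,2,5,6", "3M-1,0,1,2,5", "3M-1,0,1,2,5"]
INTRONS = ["3E:M-6,5", "3E:M-6,0,1,2E-2335,5,6E-53",
           "3E:M-29,0,2E-1908,5,6E-65", "3E:M-28,0,1,2E-1877,5,6E*-108",
           "3E:M-18,0,1,2E-1823,5"]
EXONS = ["3,5", "3,0,1,2,5,6", "3,0,2,5,6", "3,0,1,2,5,6", "3,0,1,2,5", "3"]

OWN_GENOME = 3  # assumption: T was predicted in genome 3 (hg38)


def other_genomes(hgm_info: str) -> int:
    """Number of other genomes containing the feature ('*' tuples skipped)."""
    genomes = set()
    for entry in hgm_info.split(","):
        if "*" not in entry:
            genomes.add(int(re.match(r"\d+", entry).group()))
    genomes.discard(OWN_GENOME)
    return len(genomes)


def at_least(features: list[str], k: int) -> tuple[int, float]:
    hits = sum(other_genomes(f) >= k for f in features)
    return hits, round(100.0 * hits / len(features), 1)


for name, features in [("CDS", CDS), ("Intr", INTRONS), ("Exon", EXONS)]:
    hits, pct = at_least(features, 3)
    print(f"{name}: {hits} of {len(features)} ({pct}%)")
```

This prints `CDS: 5 of 6 (83.3%)`, `Intr: 4 of 5 (80.0%)` and `Exon: 4 of 6 (66.7%)`, matching the k=3 row.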
For k=1, the second table shows that
#---------------------------------------------------------------------------------------------------
#  k  CDS           Intr          Exon          Intr+Exon
#---------------------------------------------------------------------------------------------------
#
#  0    6  100.0%     5  100.0%     6  100.0%    11  100.0%  *************************
#  1    6  100.0%     5  100.0%     0    0.0%     5   45.5%  ***********
#  2    0    0.0%     4   80.0%     0    0.0%     4   36.4%  *********
#  3    0    0.0%     3   60.0%     0    0.0%     3   27.3%  ******
#  4    0    0.0%     0    0.0%     0    0.0%     0    0.0%
#  5    0    0.0%     0    0.0%     0    0.0%     0    0.0%
#  6    0    0.0%     0    0.0%     0    0.0%     0    0.0%
#  7    0    0.0%     0    0.0%     0    0.0%     0    0.0%
#  8    0    0.0%     0    0.0%     0    0.0%     0    0.0%
- 6 out of the 6 CDS exons (100.0%)
- 5 out of the 5 introns (100.0%)
are supported by evidence (any source) in at least k=1 of the gene sets.
Note that the database contains no 'exon' hints. Thus, none of the exon features have support.
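The second table can also be recomputed from the GTF lines at the top. In this sketch, every genome whose tuple names at least one evidence source counts, including genome 3 itself and genomes marked with '*' (the fourth intron's `6E*-108` is needed to reach 3 introns at k=3). Exon tuples never carry a source, which is why the exon column drops to zero at k=1. `genomes_with_evidence` and `at_least` are hypothetical helpers.

```python
import re

# hgm_info strings of jg6.t1, copied from the GTF lines above.
CDS = ["3M-1,5", "3M-1,0,1,2,5,6", "3M-1,0,2,5,6",
       "3M-1,0,1,2,5,6", "3M-1,0,1,2,5", "3M-1,0,1,2,5"]
INTRONS = ["3E:M-6,5", "3E:M-6,0,1,2E-2335,5,6E-53",
           "3E:M-29,0,2E-1908,5,6E-65", "3E:M-28,0,1,2E-1877,5,6E*-108",
           "3E:M-18,0,1,2E-1823,5"]
EXONS = ["3,5", "3,0,1,2,5,6", "3,0,2,5,6", "3,0,1,2,5,6", "3,0,1,2,5", "3"]


def genomes_with_evidence(hgm_info: str) -> int:
    """Number of genomes whose tuple names at least one evidence source."""
    genomes = set()
    for entry in hgm_info.split(","):
        m = re.match(r"(\d+)[A-Za-z]", entry)  # a source letter follows the index
        if m:
            genomes.add(int(m.group(1)))
    return len(genomes)


def at_least(features: list[str], k: int) -> int:
    return sum(genomes_with_evidence(f) >= k for f in features)


for k in range(4):
    print(k, at_least(CDS, k), at_least(INTRONS, k), at_least(EXONS, k))
```

The loop prints the CDS, Intr and Exon counts for k = 0 to 3, which agree with the second table.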