[ensembl-dev] CDS and stop codons in Ensembl GTF releases

Wibowo Arindrarto bow at bow.web.id
Tue Jun 6 08:40:06 BST 2017


Dear Ensembl developers,

I was wondering what the rules are for stop codon and CDS features in GTF files released by Ensembl. Should stop codons be part of a CDS feature?

As far as I understand, they should not be (see also this Biostars post, to which Ensembl replied: https://www.biostars.org/p/206362/). However, I came across a transcript in the release 89 GTF (Homo sapiens) whose stop codon is included in a CDS feature. The transcript is ENST00000383070, and I am pasting the relevant columns below ($ grep ENST00000383070 {gtf} | cut -f1,2,3,4,5,7):

Y ensembl_havana transcript 2786855 2787699 -
Y ensembl_havana exon 2786855 2787699 -
Y ensembl_havana CDS 2786989 2787603 -
Y ensembl_havana start_codon 2787601 2787603 -
Y ensembl_havana stop_codon 2786989 2786991 -
Y ensembl_havana five_prime_utr 2787604 2787699 -
Y ensembl_havana three_prime_utr 2786855 2786988 -


As you can see, the stop codon spans positions 2786989 to 2786991, but the transcript's only CDS feature (2786989 to 2787603) already covers those positions.
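
To check whether other transcripts show the same overlap, something like the following rough Python sketch could be used. It assumes a tab-separated GTF with nine columns, 1-based inclusive coordinates, and a transcript_id attribute in the last column. The function name, the second transcript ID and the simplified attribute column are made up for illustration; only the first transcript's coordinates come from the rows above.

```python
import re
from collections import defaultdict

# Assumes the attribute column contains: transcript_id "..."
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def stop_codons_inside_cds(gtf_lines):
    """Return sorted transcript IDs whose stop_codon overlaps one of their CDS features."""
    cds = defaultdict(list)    # transcript_id -> [(start, end), ...]
    stops = defaultdict(list)  # transcript_id -> [(start, end), ...]
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in ("CDS", "stop_codon"):
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        interval = (int(fields[3]), int(fields[4]))  # 1-based, inclusive
        target = cds if fields[2] == "CDS" else stops
        target[match.group(1)].append(interval)

    flagged = []
    for tid, stop_intervals in stops.items():
        # Two inclusive intervals overlap when each starts before the other ends.
        if any(s_start <= c_end and c_start <= s_end
               for s_start, s_end in stop_intervals
               for c_start, c_end in cds.get(tid, [])):
            flagged.append(tid)
    return sorted(flagged)


def make_row(feature, start, end, tid):
    # Simplified row for illustration; real Ensembl rows carry more attributes.
    return "\t".join(["Y", "ensembl_havana", feature, str(start), str(end),
                      ".", "-", ".", f'transcript_id "{tid}";'])


rows = [
    # Coordinates taken from the rows pasted above.
    make_row("CDS", 2786989, 2787603, "ENST00000383070"),
    make_row("stop_codon", 2786989, 2786991, "ENST00000383070"),
    # Hypothetical transcript whose stop codon lies outside its CDS.
    make_row("CDS", 100, 200, "TX_HYPOTHETICAL"),
    make_row("stop_codon", 97, 99, "TX_HYPOTHETICAL"),
]

print(stop_codons_inside_cds(rows))
```

On a real file, an open file handle can be passed in place of the list, since the function only iterates over lines.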

Would somebody clarify whether this is intentional or not?

With kind regards,
Wibowo Arindrarto