[ensembl-dev] Difference in CDS lengths from GTF or GFF files

Rainer Johannes Johannes.Rainer at eurac.edu
Mon Jan 18 13:37:57 GMT 2016


Dear all,

I’m a little puzzled at the moment by a difference in CDS lengths for transcripts, depending on whether I extract that information from the Ensembl GTF files, the Ensembl GFF files or the Perl API. When I calculate the CDS length from the GTF files I get, for some transcripts, different results than from the GFF files or the Perl API (using ->coding_region_start and ->coding_region_end). For the GTF and GFF files I’m using the start and end coordinates of the CDS feature. In the end I think it’s just a matter of definition.
Now, the GTF files also contain the feature type stop_codon, which apparently lies downstream of the CDS (according to the GTF specification). I’m unsure whether I should include the stop_codon in the CDS length calculation or not. In other words, does the stop codon belong to the CDS or not? According to the GFF files and the Perl API it does…
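
To make the two definitions concrete, here is a minimal Python sketch. The coordinates are made up, and the cds_length helper and the (type, start, end) tuple layout are hypothetical, not part of any Ensembl tool. It sums the lengths of the CDS features of one transcript and optionally adds the stop_codon features, assuming 1-based inclusive coordinates as in GTF/GFF:

```python
from typing import Iterable, Tuple

# (feature type, start, end) with 1-based, inclusive coordinates,
# as used in GTF/GFF files. This tuple layout is an assumption for the sketch.
Feature = Tuple[str, int, int]


def cds_length(features: Iterable[Feature], include_stop_codon: bool = False) -> int:
    """Sum the lengths of all CDS features, optionally adding stop_codon features."""
    wanted = {"CDS", "stop_codon"} if include_stop_codon else {"CDS"}
    return sum(end - start + 1 for ftype, start, end in features if ftype in wanted)


# Hypothetical features of a single two-exon transcript (made-up coordinates).
features = [
    ("exon", 100, 250),
    ("CDS", 152, 250),
    ("exon", 400, 520),
    ("CDS", 400, 498),
    ("stop_codon", 499, 501),
]

without_stop = cds_length(features)                        # CDS features only
with_stop = cds_length(features, include_stop_codon=True)  # CDS plus stop codon

print(without_stop, with_stop)  # prints: 198 201
```

If the GTF CDS features really do exclude the stop codon while the GFF files and the API include it, the two numbers should differ by exactly 3 bases for the affected transcripts.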

thanks for any help and suggestions, 

cheers, jo

