[ensembl-dev] Cuffmerge GFF Error: duplicate/invalid 'transcript'. Difference in ensembl GTF and GFF3 files are the cause?

Paul Klemm paul.klemm at googlemail.com
Wed Oct 26 14:31:18 BST 2016


Hi dev at ensembl members,

I am investigating a problem with cuffmerge in combination with Ensembl
GFF3 files, and I would like help understanding the difference between the
GFF3 and GTF files in Ensembl.

I align RNA-Seq reads to the reference genome with HISAT2 and then derive
the transcriptome by running cufflinks with the Mus.Musculus e.86 release
*GFF3* file. When I run cuffmerge on the resulting files, it fails with the
error:

GFF Error: duplicate/invalid 'transcript' feature ID=transcript:ENSMUST00000045689
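
One way to check whether an ID really occurs more than once in a GFF3 file is to count the `ID=` attributes directly. The following is a minimal sketch, not part of cufflinks or Ensembl: the function name `duplicated_gff3_ids` and the `transcript:` prefix filter are assumptions made for illustration, and it relies only on the standard nine-column, tab-separated GFF3 layout.

```python
from collections import Counter


def duplicated_gff3_ids(lines, feature_prefix="transcript:"):
    """Return the sorted IDs (with the given prefix) found on more than one GFF3 line.

    `lines` is any iterable of GFF3 lines, e.g. an open file handle.
    `feature_prefix` is an illustrative filter, not something cuffmerge defines.
    """
    counts = Counter()
    for line in lines:
        # Skip directives, comments, and empty lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        for attribute in columns[8].split(";"):
            key, _, value = attribute.partition("=")
            if key == "ID" and value.startswith(feature_prefix):
                counts[value] += 1
    return sorted(feature_id for feature_id, n in counts.items() if n > 1)
```

Passing an open handle to the GFF3 file to this function would list every transcript ID that is defined on more than one line. Note that GFF3 itself allows a repeated ID for discontinuous features, so a hit is only a hint at what the parser may be objecting to.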

When I run the very same analysis with the Mus.Musculus e.86 release *GTF*
file, everything runs fine. I looked into the ENSMUST00000045689 transcript
and did find differences between its entries in the GTF and GFF3 files.
These differences may be what causes the problem in cufflinks.
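
A comparison of this kind can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code from the repository mentioned below: the function names are hypothetical, only `exon` lines are compared (that feature type is spelled the same in both formats), and a line is matched to the transcript by a plain substring search in the attribute column.

```python
def exon_coordinates(lines, transcript_id):
    """Collect (seqid, start, end, strand) for exon lines that mention transcript_id.

    Works on GTF and GFF3 alike because both use nine tab-separated columns;
    the transcript is matched by substring in column 9 (a simplification).
    """
    coordinates = set()
    for line in lines:
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "exon":
            continue
        if transcript_id in columns[8]:
            coordinates.add((columns[0], int(columns[3]), int(columns[4]), columns[6]))
    return coordinates


def exon_differences(gtf_lines, gff3_lines, transcript_id):
    """Return (exons only in the GTF, exons only in the GFF3), each sorted."""
    gtf = exon_coordinates(gtf_lines, transcript_id)
    gff3 = exon_coordinates(gff3_lines, transcript_id)
    return sorted(gtf - gff3), sorted(gff3 - gtf)
```

Two empty lists would mean the exon coordinates agree, in which case any difference must lie in other feature types or in the attributes.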

The description of the difference and a fully functional minimal example
can be found in this repository: https://github.com/paulklemm/cuffmerge_bug.

My question is: why is there a difference in the annotation between the
GFF3 and GTF files? I thought they contained the same information, just
stored in different formats, but that seems not to be the case.

I contributed the code to a recent bug report in the cufflinks repository,
which describes this problem:
https://github.com/cole-trapnell-lab/cufflinks/issues/77.

Thanks for the help.

Paul