[ensembl-dev] Cuffmerge GFF Error: duplicate/invalid 'transcript'. Are differences between the Ensembl GTF and GFF3 files the cause?
Paul Klemm
paul.klemm at googlemail.com
Wed Oct 26 14:31:18 BST 2016
Hi dev at ensembl members,
I am investigating a problem with cuffmerge when it is used with Ensembl
GFF3 files, and I would like help understanding the difference between the
Ensembl GFF3 and GTF files.
I align RNA-Seq reads to the reference genome with HISAT2 and then derive
the transcriptome by running cufflinks with the Mus musculus release 86
(e.86) *GFF3* file. When I run cuffmerge on the resulting files, it fails
with this error:
GFF Error: duplicate/invalid 'transcript' feature ID=transcript:ENSMUST00000045689
When I run the very same analysis with the Mus musculus release 86 *GTF*
file, everything runs fine. I looked into the ENSMUST00000045689 transcript
and did find differences between the GTF and the GFF3 file. These
differences are potentially what causes the problem in cufflinks.
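As a rough way to check whether a GFF3 file contains the kind of repeated
transcript ID the error message complains about, here is a minimal Python
sketch. It assumes the usual nine tab-separated GFF3 columns and that
transcript IDs carry the "transcript:" prefix seen in the error message. The
function names are my own and are not part of cufflinks or any Ensembl
tool, and I do not know whether cuffmerge applies exactly this check.

```python
from collections import Counter


def parse_gff3_attributes(column9):
    """Turn 'ID=transcript:X;Parent=gene:Y' into a dict (hypothetical helper)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def duplicated_transcript_ids(lines):
    """Return transcript IDs that appear as ID= on more than one feature line."""
    counts = Counter()
    for line in lines:
        # Skip directives, comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        feature_id = parse_gff3_attributes(columns[8]).get("ID", "")
        # Assumption: transcript features use the 'transcript:' ID prefix.
        if feature_id.startswith("transcript:"):
            counts[feature_id] += 1
    return sorted(fid for fid, n in counts.items() if n > 1)
```

To use it, pass an open file handle, for example
duplicated_transcript_ids(open(path_to_gff3)), where path_to_gff3 is a
placeholder for the local annotation file.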
The description of the difference and a fully functional minimal example
can be found in this repository: https://github.com/paulklemm/cuffmerge_bug.
My question is: why is there a difference in the annotation between the
GFF3 and GTF files? I thought they contained the same information, just
stored in different formats. That seems not to be the case.
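For anyone who wants to compare the two files for a single transcript, a
minimal Python sketch follows. It only compares exon coordinates and
assumes that GTF exon lines carry a transcript_id "..." attribute while
GFF3 exon lines point to their transcript via Parent=transcript:<id>. The
function names are hypothetical, and this does not reproduce the specific
difference described in the repository above.

```python
import re

# Matches the transcript_id attribute in the ninth GTF column.
GTF_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_exons(lines, transcript_id):
    """Collect (start, end, strand) of exons for one transcript from GTF lines."""
    exons = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "exon":
            continue
        match = GTF_TRANSCRIPT_ID.search(cols[8])
        if match and match.group(1) == transcript_id:
            exons.add((int(cols[3]), int(cols[4]), cols[6]))
    return exons


def gff3_exons(lines, transcript_id):
    """Collect (start, end, strand) of exons for one transcript from GFF3 lines."""
    # Assumption: exon parents are written as 'transcript:<id>'.
    wanted = "transcript:" + transcript_id
    exons = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # A GFF3 feature may list several parents, separated by commas.
        if wanted in attrs.get("Parent", "").split(","):
            exons.add((int(cols[3]), int(cols[4]), cols[6]))
    return exons


def exon_differences(gtf_lines, gff3_lines, transcript_id):
    """Return (exons only in the GTF, exons only in the GFF3)."""
    a = gtf_exons(gtf_lines, transcript_id)
    b = gff3_exons(gff3_lines, transcript_id)
    return a - b, b - a
```

If both returned sets are empty, the exon coordinates agree and any
difference must lie in other feature types or attributes.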
I contributed the code to a recent bug report in the cufflinks repository,
which describes this problem:
https://github.com/cole-trapnell-lab/cufflinks/issues/77.
Thanks for the help.
Paul