[ensembl-dev] Cuffmerge GFF Error: duplicate/invalid 'transcript'. Difference in ensembl GTF and GFF3 files are the cause?

mag mr6 at ebi.ac.uk
Tue Nov 1 13:48:09 GMT 2016


Hi Paul,

Thank you for reporting this.
We run genometools on all the generated GFF3 files, but unfortunately it 
did not spot this error.

The differences I can see between the GTF and GFF3 files are the following:
- the transcript is listed with source ensembl_havana in the GFF3 file 
but with source havana in the GTF file.
All child features have source havana.
This looks like a bug in our pipeline, and we will aim to fix it for 
the next release.

- the transcript has type 'transcript' in the GTF file and type 
'NMD_transcript_variant' in the GFF3 file.
The GFF3 specifications allow us to use more fine-grained SO terms, as 
is the case here.
This should hopefully not break any downstream analyses, but some 
software may not resolve the more specific SO terms to their parent 
type (here, transcript).
NMD_transcript_variant is a valid SO term: 
http://www.sequenceontology.org/browser/current_svn/term/SO:0001621
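The two differences above can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the sample lines use placeholder coordinates and a placeholder gene ID (GENE_PLACEHOLDER), not the real records. Only the transcript ID, the sources, and the types are taken from this thread.

```python
def parse_attributes(field, fmt):
    """Parse column 9 of a GFF3 (key=value;) or GTF (key "value";) line.

    Simplification: assumes no ';' occurs inside a quoted GTF value.
    """
    attrs = {}
    for part in field.strip().strip(";").split(";"):
        part = part.strip()
        if not part:
            continue
        if fmt == "gff3":
            key, _, value = part.partition("=")
        else:
            key, _, value = part.partition(" ")
            value = value.strip().strip('"')
        attrs[key] = value
    return attrs


def transcript_record(lines, fmt, transcript_id):
    """Return (source, type) of the transcript-level line, or None."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8], fmt)
        if fmt == "gff3":
            # Ensembl GFF3 transcript IDs carry a "transcript:" prefix.
            if attrs.get("ID") == "transcript:" + transcript_id:
                return cols[1], cols[2]
        elif cols[2] == "transcript" and attrs.get("transcript_id") == transcript_id:
            return cols[1], cols[2]
    return None


# Illustrative lines only: coordinates and gene ID are placeholders.
gff3_lines = [
    "1\tensembl_havana\tNMD_transcript_variant\t100\t900\t.\t+\t.\t"
    "ID=transcript:ENSMUST00000045689;Parent=gene:GENE_PLACEHOLDER",
]
gtf_lines = [
    "1\thavana\ttranscript\t100\t900\t.\t+\t.\t"
    'gene_id "GENE_PLACEHOLDER"; transcript_id "ENSMUST00000045689";',
]

gff3_rec = transcript_record(gff3_lines, "gff3", "ENSMUST00000045689")
gtf_rec = transcript_record(gtf_lines, "gtf", "ENSMUST00000045689")
print("GFF3:", gff3_rec)
print("GTF: ", gtf_rec)
```

To run it against the real files, pass an open file handle for each file instead of the sample lists.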

As mentioned, we will aim to fix the source discrepancy for the next 
release. Hopefully this will solve the problem for you.
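In the meantime, a check along the following lines could flag features whose source differs from that of their parent in a GFF3 file. It is a rough sketch, not part of our pipeline: the function name is hypothetical, the sample coordinates and exon ID are placeholders, and a mismatch reported by it is not necessarily an error.

```python
def source_mismatches(gff3_lines):
    """List (parent_id, parent_source, child_type, child_source) tuples
    for every child whose source column differs from its parent's."""
    sources = {}   # feature ID -> source (column 2)
    children = []  # (parent ID, child type, child source)
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        if "ID" in attrs:
            sources[attrs["ID"]] = cols[1]
        # GFF3 allows several comma-separated parents.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children.append((parent, cols[2], cols[1]))
    return [
        (parent, sources[parent], ctype, csource)
        for parent, ctype, csource in children
        if parent in sources and sources[parent] != csource
    ]


# Illustrative lines only: coordinates and the exon ID are placeholders.
sample = [
    "1\tensembl_havana\tNMD_transcript_variant\t100\t900\t.\t+\t.\t"
    "ID=transcript:ENSMUST00000045689",
    "1\thavana\texon\t100\t200\t.\t+\t.\t"
    "ID=exon:EXON_PLACEHOLDER;Parent=transcript:ENSMUST00000045689",
]

mismatches = source_mismatches(sample)
for parent, psource, ctype, csource in mismatches:
    print(f"{parent} ({psource}) has {ctype} child with source {csource}")
```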

From the link you mentioned, it also seems that the cufflinks 
developers have patched in a workaround for these data issues.


Regards,
Magali

On 26/10/2016 14:31, Paul Klemm wrote:
>
> Hi dev at ensembl members,
>
> I am investigating a problem with cuffmerge in combination with 
> ensembl GFF3 files and seek help in understanding the difference 
> between the GFF3 and GTF files in ensembl.
>
> I align RNA-Seq reads with HISAT2 to the reference genome and then 
> derive the transcriptome by running cufflinks with the Mus.Musculus 
> e.86 release /GFF3/ file. When I run cuffmerge on these files it fails 
> with an error |GFF Error: duplicate/invalid 'transcript' feature 
> ID=transcript:ENSMUST00000045689|.
>
> When I do the very same analysis with the Mus.Musculus e.86 release 
> /GTF/ file, everything runs fine. I investigated the 
> |ENSMUST00000045689| transcript and indeed found differences 
> between the GTF and GFF3 files! This is potentially causing the 
> problem in cufflinks.
>
> The description of the difference and a fully functional minimal 
> example can be found in this repository: 
> https://github.com/paulklemm/cuffmerge_bug.
>
> My question is: Why is there a difference in the annotation between 
> the GFF3 and GTF files? I thought that it is the same information just 
> stored in different formats. That seems not to be the case.
>
> I contributed the code to a recent bug report in the cufflinks 
> repository, which describes this problem: 
> https://github.com/cole-trapnell-lab/cufflinks/issues/77.
>
> Thanks for the help.
>
> Paul
