[ensembl-dev] Triticum aestivum invalid GFF3

Hans Vasquez-Gross havasquezgross at ucdavis.edu
Thu Apr 17 23:16:16 BST 2014


Hello Ensembl Team,

I recently downloaded the updated GFF3 release (MIPS v22).  Thank
you for fixing the problems I mentioned in my previous email!

However, there is still one major problem that causes parsers to die, and
one minor issue whose fix would help parsers.  The minor suggestion is to
start including the standard '###' directive, on a line by itself, at the
end of each logical feature block for a given seq_id (the first column).
This tells parsers that the feature block is complete, so they can free
memory before reading the next section.  Since your GFF3 files contain no
'###' lines, parsers have to keep everything in memory.
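A minimal Python sketch of the '###' suggestion follows. It assumes that all
features for one seq_id are contiguous in the file and that no feature's
Parent points at a feature on a different seq_id, which is what makes a
terminator at each seq_id change safe. The function name
add_block_terminators is hypothetical.

```python
from typing import Iterable, Iterator


def add_block_terminators(lines: Iterable[str]) -> Iterator[str]:
    """Yield GFF3 lines, emitting a '###' directive whenever column 1 changes.

    Assumes (hypothetically) that features for each seq_id are contiguous and
    never reference IDs on another seq_id, so every block boundary resolves
    all forward references. Trailing newlines are stripped from the output.
    """
    current_seqid = None
    for line in lines:
        line = line.rstrip("\n")
        if line == "###":
            # An existing terminator already closes the current block.
            current_seqid = None
            yield line
            continue
        # Pass other directives, comments and blank lines through unchanged.
        if not line or line.startswith("#"):
            yield line
            continue
        seqid = line.split("\t", 1)[0]
        if current_seqid is not None and seqid != current_seqid:
            yield "###"
        current_seqid = seqid
        yield line
    # Close the final block if it was not already terminated.
    if current_seqid is not None:
        yield "###"
```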

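Second, there are scoping problems with the three_prime_UTR and
five_prime_UTR features: their Parent attributes omit the 'transcript:'
prefix, so they do not match any transcript ID in the file.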
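To see which features are affected, a check along these lines collects every
ID in the file and reports Parent values that match none of them. It is a
hypothetical helper, not the tool that produced the error log below; it
ignores URL escaping and skips any line that is not nine tab-separated
columns.

```python
from typing import Dict, Iterable, List, Tuple


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 column 9 string into a tag -> value dict (values left escaped)."""
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            tag, value = field.split("=", 1)
            attrs[tag] = value
    return attrs


def dangling_parents(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (feature type, Parent value) pairs whose Parent matches no ID."""
    ids = set()
    refs: List[Tuple[str, str]] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines in this sketch
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        # Parent may hold several comma-separated IDs.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                refs.append((cols[2], parent))
    # Check only after the whole file is read, since parents may come later.
    return [(ftype, p) for ftype, p in refs if p not in ids]
```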

Changing this line:
IWGSC_CSS_3AS_scaff_369935      .       five_prime_UTR  199     200     .       -       .       Parent=Traes_3AS_775C097A2.1

to:
IWGSC_CSS_3AS_scaff_369935      .       five_prime_UTR  199     200     .       -       .       Parent=transcript:Traes_3AS_775C097A2.1

fixes the problem.  After writing a Python script to prefix 'transcript:' to
the Parent ID of every five_prime_UTR and three_prime_UTR feature, I was
able to validate the GFF3 files and load them into standard visualization
tools.  If you would like the list of UTRs that have this problem, here is
an error log that you can inspect:
http://169.237.215.34/ftp/temp/test_mips_gff3load.err (7.6MB).
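The script itself was not included in the message. The following is a
hypothetical Python reconstruction of the same idea, assuming UTR lines are
tab-delimited with nine columns and that every UTR Parent should carry the
'transcript:' prefix; the name fix_utr_parents is made up for illustration.

```python
from typing import Iterable, Iterator

UTR_TYPES = {"five_prime_UTR", "three_prime_UTR"}
PREFIX = "transcript:"


def fix_utr_parents(lines: Iterable[str]) -> Iterator[str]:
    """Prefix 'transcript:' to each UTR Parent ID that lacks it.

    All other lines (directives, non-UTR features, malformed lines) are
    passed through unchanged; trailing newlines are stripped.
    """
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] not in UTR_TYPES:
            yield line
            continue
        fields = []
        for field in cols[8].split(";"):
            if field.startswith("Parent="):
                # Parent may list several comma-separated IDs; fix each one.
                parents = field[len("Parent="):].split(",")
                parents = [p if p.startswith(PREFIX) else PREFIX + p for p in parents]
                field = "Parent=" + ",".join(parents)
            fields.append(field)
        cols[8] = ";".join(fields)
        yield "\t".join(cols)
```

Because IDs that already start with 'transcript:' are left alone, running it
twice is harmless. To filter a file you could pipe it through something like
sys.stdout.writelines(l + "\n" for l in fix_utr_parents(sys.stdin)).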

Cheers,
-Hans