[ensembl-dev] Sorting GFF?

James Allen jallen at ebi.ac.uk
Wed Apr 19 10:13:56 BST 2017


Hello,
I use a program in the GenomeTools suite, "gt gff3" (http://genometools.org/tools/gt_gff3.html). In addition to the -sort parameter, you will probably want -retainids (otherwise the Ensembl IDs in the 'ID' attribute will be replaced with arbitrary numbers). I usually also run the program with the -tidy option, which ensures a few fiddly niceties about the GFF3 spec are observed.
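
As a rough sketch of the above (not something I run verbatim), the call could be wrapped in Python like this. It assumes the "gt" binary is on your PATH and that "gt gff3" writes the processed GFF3 to standard output. The function names and any file names you pass in are placeholders of my own.

```python
import subprocess


def build_sort_command(gff_path):
    """Return the argument list for sorting a GFF3 file with 'gt gff3'.

    -sort      sort the features
    -retainids keep the original IDs instead of renumbering them
    -tidy      fix up minor GFF3 spec issues where possible
    """
    return ["gt", "gff3", "-sort", "-retainids", "-tidy", gff_path]


def sort_gff3(gff_path, out_path):
    """Run 'gt gff3' and write its standard output to out_path.

    Assumes 'gt' is installed and on the PATH; raises CalledProcessError
    if the tool exits with a non-zero status.
    """
    with open(out_path, "w") as out:
        subprocess.run(build_sort_command(gff_path), stdout=out, check=True)
```

Calling sort_gff3("input.gff3", "sorted.gff3") would then leave the sorted copy in sorted.gff3 and keep the input file untouched.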

Finally, there is also a good GFF3 validator in the GenomeTools suite, "gff3validator" (doc: http://genometools.org/tools/gt_gff3validator.html; online version: http://genometools.org/cgi-bin/gff3validator.cgi).
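
If you want to script the validation step as well, something along these lines might do. Again this is only a sketch: it assumes "gt" is on your PATH and, as a further assumption, that the validator exits with a non-zero status when the file is not valid. The function names are made up for illustration.

```python
import subprocess


def build_validate_command(gff_path):
    """Return the argument list for validating a GFF3 file with GenomeTools."""
    return ["gt", "gff3validator", gff_path]


def validate_gff3(gff_path):
    """Run the validator and return (passed, messages).

    'passed' is True when the exit status is 0 (assumed to mean the file
    validated); 'messages' is whatever the tool printed to stdout/stderr.
    """
    result = subprocess.run(
        build_validate_command(gff_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr
```

Running this on the sorted output is a quick check that the sort did not break anything.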

Cheers,
James


On Tue, 18 Apr 2017 13:49:28 -0700
Ben Bimber <bbimber at gmail.com> wrote:

> Hello,
> 
> This is tangential to Ensembl; however, I hope the Ensembl dev might
> have some advice.  We're using a GFF provided by Ensembl; however, we
> transformed some of the reference names and would like to re-sort the
> GFF. GFFs have a non-trivial sorting behavior, with the lines
> normally grouped by Parent feature and then sorted on coordinate
> within that group.  The tool gffread is the closest I have found to a
> utility that can handle this; however, it omits lines lacking
> "Parent=", meaning some of the non-gene features from the Ensembl GFF
> are dropped.  Do you have any tools you use internally to work with
> your GFF files?
> 
> Thanks,
> Ben Bimber



