[ensembl-dev] Sorting GFF?

Emily Perry emily at ebi.ac.uk
Wed Apr 19 13:40:44 BST 2017


Hi Ben

We're happy to answer tangential questions here on dev, as are many in 
our developer community. However, we may not always be the best people 
to ask about general bioinformatics questions. If you're not aware of it 
already, a really helpful place to ask general questions about 
bioinformatics tools is BioStars:

https://www.biostars.org/

As I said, we're happy to help where we can, but we might not always be 
able to. You'll find a wealth of bioinformatics knowledge and 
experience on BioStars.

All the best

Emily


On 19/04/2017 10:13, James Allen wrote:
> Hello,
> I use a program in the GenomeTools suite, "gt gff3" (http://genometools.org/tools/gt_gff3.html). In addition to the -sort parameter, you will probably want -retainids (otherwise the Ensembl IDs in the 'ID' attribute will be replaced with arbitrary numbers). I usually also run the program with the -tidy option, which ensures a few fiddly niceties about the GFF3 spec are observed.
>
> Finally, there is also a good GFF3 validator in the GenomeTools suite, "gff3validator" (doc: http://genometools.org/tools/gt_gff3validator.html; online version: http://genometools.org/cgi-bin/gff3validator.cgi).
>
> Cheers,
> James
>
>
> On Tue, 18 Apr 2017 13:49:28 -0700
> Ben Bimber <bbimber at gmail.com> wrote:
>
>> Hello,
>>
>> This is tangential to Ensembl; however, I hope the Ensembl dev might
>> have some advice.  We're using a GFF provided by Ensembl; however, we
>> transformed some of the reference names and would like to re-sort the
>> GFF. GFFs have a non-trivial sorting behavior, with the lines
>> normally grouped by Parent feature and then sorted on coordinate
>> within that group.  The tool gffread is the closest I have found to a
>> utility that can handle this; however, it omits lines lacking
>> "Parent=", meaning some of the non-gene features from the Ensembl GFF
>> are dropped.  Do you have any tools you use internally to work with
>> your GFF files?
>>
>> Thanks,
>> Ben Bimber
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
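
A minimal Python sketch of the ordering Ben describes above (lines grouped
under their Parent feature, then sorted by coordinate within each group),
which keeps features lacking "Parent=" instead of dropping them. It is not
the method used by gffread or "gt gff3". The function names sort_gff3 and
parse_attributes are made up for illustration, seqids are compared as plain
strings, and a feature with several parents is placed under the first one
only.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Turn 'ID=gene1;Name=x' into {'ID': 'gene1', 'Name': 'x'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def sort_gff3(lines):
    """Return GFF3 lines grouped by parent and sorted by coordinate.

    Simplifications (assumptions for this sketch only): seqids are ordered
    as plain strings, a feature with several parents is placed under the
    first one, and every line starting with '#' is moved to the top.
    """
    headers, features = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            headers.append(line)
            continue
        cols = line.split("\t")
        attrs = parse_attributes(cols[8]) if len(cols) > 8 else {}
        features.append({
            "line": line,
            "key": (cols[0], int(cols[3]), int(cols[4])),  # seqid, start, end
            "id": attrs.get("ID"),
            "parent": attrs.get("Parent", "").split(",")[0] or None,
        })

    known_ids = {f["id"] for f in features if f["id"]}
    children = defaultdict(list)
    top_level = []
    for feat in features:
        # Features without Parent= (or with an unknown parent) are kept
        # as top-level records rather than dropped.
        if feat["parent"] in known_ids:
            children[feat["parent"]].append(feat)
        else:
            top_level.append(feat)

    ordered = list(headers)

    def emit(feat):
        ordered.append(feat["line"])
        # pop() so children are written once even if an ID spans several lines
        for child in sorted(children.pop(feat["id"], []), key=lambda f: f["key"]):
            emit(child)

    for feat in sorted(top_level, key=lambda f: f["key"]):
        emit(feat)
    return ordered
```

Something like sort_gff3(open("input.gff3")) would return the reordered
lines, which can then be written out one per line. For real files, the
"gt gff3" command James mentions is likely the more robust choice.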

-- 
Dr Emily Perry (Pritchard)
Ensembl Outreach Project Leader

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge
CB10 1SD
UK



