[ensembl-dev] Loading Gene Info from GFF3 File

Kieron Taylor ktaylor at ebi.ac.uk
Wed Aug 31 13:51:22 BST 2016


Hi Said,

GFF3 is a flexible format (in a bad way): different producers use the type and attribute columns differently, so import scripts written for one source are not very portable. I expect any internal scripts we have for this task will not work for you.

It wouldn't be too hard to assemble one from API calls. Our API provides store methods for all feature types, and we have a simple but functional Bio::EnsEMBL::Utils::IO::GFFParser you could use in the meantime. We are gradually refactoring all of our parsers, but that work is far from complete. You may start seeing deprecation messages next year, at which point something better will be available.

http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Utils_1_1IO_1_1GFFParser.html
http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1GeneAdaptor.html#a6441dcbb164d48ea724c1458f729fb10
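
To give a rough idea of the parsing half of such a script, here is a minimal sketch in plain Python using only the standard library. It is not the Ensembl API and not one of our internal scripts: the function names parse_gff3_line and collect_genes are made up for illustration, and it assumes your file marks genes with the type "gene", which not every GFF3 producer does. Storing the resulting records would still have to go through the API's store methods, which is not shown here.

```python
from urllib.parse import unquote

# The nine tab-separated columns defined by the GFF3 format.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 line into a dict.

    Returns None for blank lines, comments, and '##' directives.
    (Hypothetical helper, named here for illustration only.)
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])

    # Column 9 holds 'key=value' pairs separated by ';'. A value may be
    # a comma-separated list, and reserved characters are URL-escaped.
    attributes = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attributes
    return record


def collect_genes(lines, gene_types=("gene",)):
    """Return the parsed records whose type column is in gene_types.

    The default assumes genes are typed 'gene'; adjust for your file.
    """
    genes = []
    for line in lines:
        record = parse_gff3_line(line)
        if record is not None and record["type"] in gene_types:
            genes.append(record)
    return genes
```

Called as collect_genes(open("genes.gff3")), with the file name being a placeholder, this yields one dict per gene line, with coordinates as integers and the attributes column as a dict of lists. The gene_types argument is there because the type column is exactly where GFF3 files tend to disagree.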

Perhaps also consider: http://search.cpan.org/dist/BioPerl/Bio/DB/SeqFeature/Store/GFF3Loader.pm



Regards,

Kieron

Kieron Taylor PhD.
Ensembl Developer

EMBL, European Bioinformatics Institute






> On 24 Aug 2016, at 15:25, Aktas, Said <said.aktas at roche.com> wrote:
> 
> Hello,
> 
> We have gene information in GFF3 format and would like to load it into the Ensembl core database. Is there already a script available for this?
> 
> Best regards,
> Said
> 
> 
> -- 
> Said Aktas
> Data Scientist, pRED Informatics
> Roche Pharma Research and Early Development
> +41 44 755 79 23
> 
> Roche Innovation Center Zurich
> Roche Glycart AG
> Wagistrasse 18
> 8952 Schlieren
> Switzerland
> 




