[ensembl-dev] alternatives

Emily Perry emily at ebi.ac.uk
Tue Feb 11 10:04:27 GMT 2020


Hi Yossi

There are no flat files that contain the entire MySQL database. The GFF3 files contain all the transcript, exon, UTR and CDS locations, linked together into genes, and the FASTA files contain all the sequences. However, the core database also holds a lot of data that is not in these files, such as GO terms, external references, protein domains, repeats and mappings between versions. If all you need is the gene locations and features, then GFF3 and FASTA are perfectly sufficient.
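
As a rough illustration of how the features in a GFF3 file link together, here is a minimal Python sketch that follows the ID and Parent attributes in the ninth column. It is not an official Ensembl tool: the sample records, the identifiers (gene:G1, transcript:T1) and the helper names parse_attributes and children_by_parent are all made up for this example, and a real file would be read from disk rather than from an inline string.

```python
import io
from collections import defaultdict

# Tiny made-up GFF3 snippet for illustration only; not real Ensembl data.
SAMPLE_GFF3 = """\
##gff-version 3
1\texample\tgene\t100\t900\t.\t+\t.\tID=gene:G1;Name=demo
1\texample\tmRNA\t100\t900\t.\t+\t.\tID=transcript:T1;Parent=gene:G1
1\texample\texon\t100\t300\t.\t+\t.\tParent=transcript:T1
1\texample\texon\t600\t900\t.\t+\t.\tParent=transcript:T1
1\texample\tCDS\t150\t300\t.\t+\t0\tParent=transcript:T1
"""


def parse_attributes(field):
    """Turn the 9th GFF3 column ('key=value;key=value') into a dict."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return dict(pairs)


def children_by_parent(handle):
    """Map each Parent ID to a list of (type, start, end) child features."""
    children = defaultdict(list)
    for line in handle:
        # Skip directives, comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        # A feature may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children[parent].append((cols[2], int(cols[3]), int(cols[4])))
    return children


children = children_by_parent(io.StringIO(SAMPLE_GFF3))
print(children["gene:G1"])        # transcripts belonging to the gene
print(children["transcript:T1"])  # exons and CDS belonging to the transcript
```

Walking from a gene to its transcripts, and from each transcript to its exons and CDS, recovers the gene structure. Anything not encoded in the feature rows or their attributes, such as the annotation listed above, would still have to come from the database.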

All the best

Emily

> On 11 Feb 2020, at 05:19, Joseph Steinberger <joseph.steinberger at weizmann.ac.il> wrote:
> 
> Dear Community,
> 
> I would like to experiment with using flat files instead of the MySQL database. 
> 
> Which combination of species flat files is equivalent to the species MySQL core database?
> Do the species GFF3 and FASTA files correspond exactly to the species MySQL core database?
> That is, is all the data contained in the MySQL core database also contained in the combination of the GFF3 and FASTA files?
> 
> Sincerely,
> Yossi
--
Dr Emily Perry (Pritchard)
Ensembl Outreach Project Leader 
(she/her)

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory 
Wellcome Genome Campus
Hinxton
Cambridge
CB10 1SD
UK 
