[ensembl-dev] is SQL access available for Ensembl Species List and Table of assemblies?

Cook, Malcolm MEC at stowers.org
Wed Jun 9 00:21:01 BST 2021

Thanks Marc,

I’m so glad to have this pointed out to me, as well as to find a handy graphical schema embedded in the README for https://github.com/Ensembl/ensembl-metadata

Duly armed, I can approximately reproduce the content of https://ftp.ensembl.org/pub/release-104/species_EnsemblVertebrates.txt as:

mysql --host=ensembldb.ensembl.org --port=3306 --user=anonymous -A -B ensembl_metadata_104 -e '
select
display_name, scientific_name, strain,
d.name, o.taxonomy_id,
a.assembly_accession, a.assembly_name, a.assembly_default, a.assembly_ucsc,
dr.ensembl_version, dr.ensembl_genomes_version,
gdb.type, gdb.dbname, reference
from genome_annotation ga
join genome_database gdb using (genome_database_id)
join genome g on (gdb.genome_id = g.genome_id)
join assembly a using (assembly_id)
join division d on (g.division_id=d.division_id)
join data_release dr on (g.data_release_id=dr.data_release_id)
join data_release_database drdb on (dr.data_release_id=drdb.data_release_id and d.division_id = drdb.division_id)
join organism o using (organism_id)
where
ga.type = "genebuild_method"
and gdb.type = "core"
and drdb.type != "mart"
and dr.is_current = 1
'
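
Since the query prints tab-separated rows (that is what -B does), the UCSC names can be picked out with a little shell. The following is only a rough sketch: `ens_meta` and `with_ucsc` are names I made up for illustration, `ENS_META_DB` is an assumed environment variable, and the default column number 9 assumes the column order of the query above, where a.assembly_ucsc is the ninth field.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Ensembl tooling): run one SQL
# statement against the public metadata database, tab-separated output.
# ENS_META_DB is an assumed variable for pointing at another release.
ens_meta() {
  mysql --host=ensembldb.ensembl.org --port=3306 --user=anonymous -A -B \
    "${ENS_META_DB:-ensembl_metadata_104}" -e "$1"
}

# Hypothetical filter: keep the header line plus every row whose column
# number $1 (default 9, assumed to be a.assembly_ucsc) is neither empty
# nor the literal string NULL, which is how mysql -B prints SQL NULLs.
with_ucsc() {
  awk -F '\t' -v col="${1:-9}" 'NR == 1 || ($col != "" && $col != "NULL")'
}
```

Piping the query output through `with_ucsc 9` should then leave only the assemblies for which a UCSC name is recorded; if the select list is reordered, the column number needs adjusting to match.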

I’ve been down Ensembl’s Perl API rabbit hole a few times before, and for my current application I prefer just to sling the SQL.

Thanks again,

~Malcolm Cook

From: Dev <dev-bounces at ensembl.org> On Behalf Of mchakiachvili
Sent: Tuesday, June 8, 2021 10:49 AM
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] is SQL access available for Ensembl Species List and Table of assemblies?

Hi Malcolm,

You'll need to access our metadata databases in order to retrieve this information. You can find the `ensembl_metadata_104` database on our public server:
`mysql --host=ensembldb.ensembl.org --port=3306 --user=anonymous`

The database holds an organism <-> genome(s) association per release, which is where you'll find your information.

To access it programmatically, I would invite you to use our dedicated Perl API, available here: https://github.com/Ensembl/ensembl-metadata
If you only need to retrieve information for a particular release, please have a look at our set of useful scripts for retrieving data: https://github.com/Ensembl/ensembl-metadata/tree/release/104/misc_scripts

More info about our Perl software stack is available here: https://www.ensembl.org/info/docs/index.html

Hope that will help you to sort things out.


On Sun, 2021-06-06 at 20:47 +0000, Cook, Malcolm wrote:
Dear Ensembl Devs,

I am looking for a programmatic way to access, in tabular form, the data behind these two pages:

        [Species List](https://www.ensembl.org/info/about/species.html)
        [Table of assemblies](https://www.ensembl.org/info/website/archives/assembly.html)

Preferably via a SQL query against a remotely accessible MySQL server.

Additionally, it would be helpful to include the name used by UCSC for the assembly, if available (perhaps this is the same as asking for NCBI's alternate name).

Thanks for any tips and suggestions.

Malcolm Cook
Database Applications Manager
Stowers Institute for Medical Research
Kansas City, MO  USA

Marc Chakiachvili

Ensembl Production Project Leader - Genomics Technology Infrastructure

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Cambridge CB10 1SD
United Kingdom