[ensembl-dev] is SQL access available for Ensembl Species List and Table of assemblies?
Cook, Malcolm
MEC at stowers.org
Wed Jun 9 00:21:01 BST 2021
Thanks Marc,
I’m so glad to have this pointed out to me, as well to find a handy graphical schema embedded in the README for https://github.com/Ensembl/ensembl-metadata
Duly armed, I can approximately reproduce the content of https://ftp.ensembl.org/pub/release-104/species_EnsemblVertebrates.txt as:
mysql --host=ensembldb.ensembl.org --port=3306 --user=anonymous -A -B -e '
select
display_name, scientific_name, strain,
d.name, o.taxonomy_id,
a.assembly_accession, a.assembly_name, a.assembly_default, a.assembly_ucsc,
g.genebuild,
dr.ensembl_version, dr.ensembl_genomes_version,
gdb.type, gdb.dbname, reference
from genome_annotation ga
join genome_database gdb using (genome_database_id)
join genome g on (gdb.genome_id = g.genome_id)
join assembly a using (assembly_id)
join division d on (g.division_id=d.division_id)
join data_release dr on (g.data_release_id=dr.data_release_id)
join data_release_database drdb on (dr.data_release_id=drdb.data_release_id and d.division_id = drdb.division_id)
join organism o using (organism_id)
where
ga.type = "genebuild_method"
and gdb.type = "core"
and drdb.type != "mart"
and dr.is_current = 1
;’
I’ve been down Ensembl’s Perl API rabbit hole a few times with before, and for my current application, prefer just to sling the SQL.
Thanks again,
~Malcolm Cook
From: Dev <dev-bounces at ensembl.org> On Behalf Of mchakiachvili
Sent: Tuesday, June 8, 2021 10:49 AM
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] is SQL access available for Ensembl Species List and Table of assemblies?
ATTENTION: This email came from an external source. Do not open attachments or click on links from unknown senders or unexpected emails.
Hi Malcom,
You'll need to access our metadata databases in order to retrieve these information, you can find the `ensembl_metadata_104` database on our public server:
`mysql --host=ensembldb.ensembl.org --port=3306 --user=anonymous`
Database hold a organism <-> genome (s) association per release, where you'll find your information.
To access programmatically I would invite you to use our dedicated PERL API available here https://github.com/Ensembl/ensembl-metadata
If you only need to retrieve information for a particular release, please have a look at our set of useful script to retrieve data https://github.com/Ensembl/ensembl-metadata/tree/release/104/misc_scripts
More info about our PERL software stack available here https://www.ensembl.org/info/docs/index.html
Hope that will help you to sort things out.
Marc
On Sun, 2021-06-06 at 20:47 +0000, Cook, Malcolm wrote:
Dear Ensembl Devs,
I am looking for programmatic way to access in tabular form the data behind these two pages:
[Species List](https://www.ensembl.org/info/about/species.html)
[Table of assemblies](https://www.ensembl.org/info/website/archives/assembly.html)
Preferably a SQL query on remotely accessible mysql server.
Additionally, if it were possible to include the name used by UCSC for the assembly, if available.... (perhaps this is the same as asking for NCBI's alternate name)
Thanks for any tips and suggestions.
Malcolm Cook
Database Applications Manager
Stowers Institute for Medical Research
Kansas City, MO USA I
_______________________________________________
Dev mailing list Dev at ensembl.org<mailto:Dev at ensembl.org>
Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
Ensembl Blog: http://www.ensembl.info/
--
Marc Chakiachvili
Ensembl Production Project Leader - Genomics Technology Infrastructure
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20210608/eea27dee/attachment.html>
More information about the Dev
mailing list