[ensembl-dev] Querying E. Coli strains from Ensembl Bacteria Release 12

Harold Rodriguez harold.rodriguez at utoronto.ca
Fri Jul 27 19:02:40 BST 2012


Hi Dan,

Thanks for your reply. We can't use the API at the moment as there's already code in place to pull records out using SQL queries for various organisms. Rewriting it would be a huge task. I will try out your suggestions and will post back if I have any more questions.

Harold
________________________________________
From: Dan Staines [dstaines at ebi.ac.uk]
Sent: Thursday, July 26, 2012 5:18 PM
To: dev at ensembl.org; Harold Rodriguez
Subject: Re: [ensembl-dev] Querying E. Coli strains from Ensembl Bacteria Release 12

Hi Harold,

I'd strongly recommend the API as this takes care of everything for you
in a multispecies database. We can help if you need any advice on this.

If you cannot use the API, you need to use the meta table to identify
the entries with the correct species_id (for K12, I recall this is
species_id 1), and from there find the coordinate system for that
species. You can do this in a join across meta to coord_system to
seq_region and thence to gene, e.g.:
select g.* from gene g
join seq_region s using (seq_region_id)
join coord_system cs using (coord_system_id)
where species_id=1;
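A minimal sketch of this two-step approach, runnable with Python's built-in sqlite3 module against a toy in-memory database rather than a real Ensembl server. The tables keep only the columns these joins touch (the real core tables have many more), and the second species 'e_coli_other' and the gene stable IDs are hypothetical. It first reads species.production_name from meta to find the species_id, then restricts genes to that species through seq_region and coord_system:

```python
import sqlite3

# Hypothetical toy schema: only the columns needed for the joins are kept.
TOY_SCHEMA = """
CREATE TABLE meta (meta_id INTEGER PRIMARY KEY, species_id INTEGER,
                   meta_key TEXT, meta_value TEXT);
CREATE TABLE coord_system (coord_system_id INTEGER PRIMARY KEY,
                           species_id INTEGER, name TEXT);
CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT,
                         coord_system_id INTEGER);
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, seq_region_id INTEGER,
                   stable_id TEXT);
-- Two species in one database; names and IDs are made up for illustration.
INSERT INTO meta VALUES (1, 1, 'species.production_name', 'e_coli_k12'),
                        (2, 2, 'species.production_name', 'e_coli_other');
INSERT INTO coord_system VALUES (1, 1, 'chromosome'), (2, 2, 'chromosome');
INSERT INTO seq_region VALUES (10, 'Chromosome', 1), (20, 'Chromosome', 2);
INSERT INTO gene VALUES (100, 10, 'K12_GENE_A'), (101, 10, 'K12_GENE_B'),
                        (200, 20, 'OTHER_GENE_A');
"""

con = sqlite3.connect(":memory:")
con.executescript(TOY_SCHEMA)

# Step 1: map each production name to its species_id using meta.
species_ids = dict(con.execute(
    "SELECT meta_value, species_id FROM meta "
    "WHERE meta_key = 'species.production_name'"
).fetchall())

# Step 2: keep only genes whose seq_region belongs to a coord_system
# of the chosen species.
k12_genes = con.execute(
    "SELECT g.* FROM gene g "
    "JOIN seq_region s USING (seq_region_id) "
    "JOIN coord_system cs USING (coord_system_id) "
    "WHERE species_id = ? ORDER BY g.gene_id",
    (species_ids["e_coli_k12"],),
).fetchall()

print(k12_genes)  # [(100, 10, 'K12_GENE_A'), (101, 10, 'K12_GENE_B')]
```

The gene on the other species' seq_region is filtered out. Against the real MySQL database the SQL itself should carry over, although the parameter placeholder depends on the driver (Python MySQL drivers typically use %s rather than ?).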

or indeed join to meta on species_id and query by meta_key/meta_value,
e.g. species.production_name = e_coli_k12:
select g.* from gene g
join seq_region s using (seq_region_id)
join coord_system cs using (coord_system_id)
join meta using (species_id)
where meta_key='species.production_name' and meta_value='e_coli_k12';
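The single-query variant can be sketched the same way. Again this is only a toy in-memory SQLite database with trimmed-down tables and hypothetical data ('e_coli_other' and the gene stable IDs are made up), not the real schema. Because the WHERE clause keeps just the one species.production_name row per species, each gene matches a single meta row and is not duplicated by the join:

```python
import sqlite3

# Hypothetical toy schema: only the columns needed for the joins are kept.
TOY_SCHEMA = """
CREATE TABLE meta (meta_id INTEGER PRIMARY KEY, species_id INTEGER,
                   meta_key TEXT, meta_value TEXT);
CREATE TABLE coord_system (coord_system_id INTEGER PRIMARY KEY,
                           species_id INTEGER, name TEXT);
CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT,
                         coord_system_id INTEGER);
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, seq_region_id INTEGER,
                   stable_id TEXT);
-- Two species in one database; names and IDs are made up for illustration.
INSERT INTO meta VALUES (1, 1, 'species.production_name', 'e_coli_k12'),
                        (2, 2, 'species.production_name', 'e_coli_other');
INSERT INTO coord_system VALUES (1, 1, 'chromosome'), (2, 2, 'chromosome');
INSERT INTO seq_region VALUES (10, 'Chromosome', 1), (20, 'Chromosome', 2);
INSERT INTO gene VALUES (100, 10, 'K12_GENE_A'), (101, 10, 'K12_GENE_B'),
                        (200, 20, 'OTHER_GENE_A');
"""

con = sqlite3.connect(":memory:")
con.executescript(TOY_SCHEMA)

# One query: gene -> seq_region -> coord_system -> meta, filtered on the
# production name instead of a hard-coded species_id.
k12_genes = con.execute(
    "SELECT g.* FROM gene g "
    "JOIN seq_region s USING (seq_region_id) "
    "JOIN coord_system cs USING (coord_system_id) "
    "JOIN meta USING (species_id) "
    "WHERE meta_key = 'species.production_name' AND meta_value = ? "
    "ORDER BY g.gene_id",
    ("e_coli_k12",),
).fetchall()

print(k12_genes)  # [(100, 10, 'K12_GENE_A'), (101, 10, 'K12_GENE_B')]
```

Filtering on the production name avoids hard-coding a species_id that could, in principle, differ between database releases.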

Hope this helps!

Dan.
