[ensembl-dev] Fastest mySQL query to retrieve all orthologs between 2 species.
Dan Sheppard
ds23 at sanger.ac.uk
Thu Dec 5 19:32:44 GMT 2013
Hi Thomas,
In case you are interested, the following should be equivalent to
Miguel's first query, but much faster. The reasons are rather tedious
and to do with execution plans (let me know in the unlikely event you
are interested!).
select homology.description,group_concat(stable_id)
from homology
join (select method_link_species_set_id
from method_link_species_set
where name = 'H.sap-M.mus orthologues') mlss
on homology.method_link_species_set_id =
mlss.method_link_species_set_id
join homology_member using (homology_id)
join member using (member_id)
group by homology_id;
As a rule of thumb, if you are using aggregate functions and grouping,
it is a good idea to execute those parts of a query which massively
restrict the set of things you are interested in, but themselves only
result in very few rows, inside sub-queries to ensure they are done
early. (In this case MySQL was doing some operations on all of homology
before the MLSS restriction was applied).
Dan.
On 04/12/13 17:55, Thomas Derrien wrote:
> Dear Miguell,
>
> Thank you very much for your answer!
> The second query is by far quicker than the first one (and also compared
> to what I did before) although I don’t have exactly the same length of
> time than you i.e
> 25166 rows in set (3 min 49.53 sec)
>
> But, I willl definitely use the way you extract ortholog information.
> Just by curiosity, did you index some specific fields that could explain
> the query time difference?
>
> Thanks again!
>
> Thomas
>
More information about the Dev
mailing list