[ensembl-dev] Fastest mySQL query to retrieve all orthologs between 2 species.

Dan Sheppard ds23 at sanger.ac.uk
Thu Dec 5 19:32:44 GMT 2013


Hi Thomas,

In case you are interested, the following should be equivalent to 
Miguel's first query, but much faster. The reasons are rather tedious 
and to do with execution plans (let me know in the unlikely event you 
are interested!).

select homology.description,group_concat(stable_id)
   from homology
   join (select method_link_species_set_id
           from method_link_species_set
           where name = 'H.sap-M.mus orthologues') mlss
     on homology.method_link_species_set_id =
          mlss.method_link_species_set_id
   join homology_member using (homology_id)
   join member using (member_id)
  group by homology_id;

As a rule of thumb, if you are using aggregate functions and grouping, 
it is a good idea to execute those parts of a query which massively 
restrict the set of things you are interested in, but themselves only 
result in very few rows, inside sub-queries to ensure they are done 
early. (In this case MySQL was doing some operations on all of homology 
before the MLSS restriction was applied).

Dan.

On 04/12/13 17:55, Thomas Derrien wrote:
> Dear Miguell,
>
> Thank you very much for your answer!
> The second query is by far quicker than the first one (and also compared
> to what I did before) although I don’t have exactly the same length of
> time than you i.e
> 25166 rows in set (3 min 49.53 sec)
>
> But, I willl definitely use the way you extract ortholog information.
> Just by curiosity, did you index some specific fields that could explain
> the query time difference?
>
> Thanks again!
>
> Thomas
>




More information about the Dev mailing list