[ensembl-dev] Fastest mySQL query to retrieve all orthologs between 2 species.

Miguel Pignatelli mp at ebi.ac.uk
Thu Dec 5 13:26:03 GMT 2013


Hi Thomas,

I'm not using nothing special here. Just the data (and indexes) we provide.
It is possible that you are hitting a higher network latency or our 
public servers are/were a bit busy at that moment.

Could you please try it again (make sure you use the correct 
method_link_species_set_id, since you are 'hardcoding' it) and let us 
know how it goes? (I would say that the query should be definitely 
faster than >3mins).

Cheers,

M;


On 04/12/13 17:55, Thomas Derrien wrote:
> Dear Miguell,
>
> Thank you very much for your answer!
> The second query is by far quicker than the first one (and also compared
> to what I did before) although I don’t have exactly the same length of
> time than you i.e
> 25166 rows in set (3 min 49.53 sec)
>
> But, I willl definitely use the way you extract ortholog information.
> Just by curiosity, did you index some specific fields that could explain
> the query time difference?
>
> Thanks again!
>
> Thomas
>
>
>
> Le 4 déc. 2013 à 17:47, Miguel Pignatelli <mp at ebi.ac.uk
> <mailto:mp at ebi.ac.uk>> a écrit :
>
>> Hi Thomas,
>>
>> You can use the method_link_species_set table to get the orthologues
>> you want:
>>
>> SELECT homology.description, GROUP_CONCAT(stable_id) FROM homology
>> JOIN homology_member USING(homology_id) JOIN member USING(member_id)
>> JOIN method_link_species_set USING(method_link_species_set_id) WHERE
>> name = 'H.sap-M.mus orthologues' GROUP BY homology_id;
>>
>> The query will be faster if you get include the
>> method_link_species_set_id directly:
>>
>> SELECT homology.description, GROUP_CONCAT(stable_id) FROM homology
>> JOIN homology_member USING(homology_id) JOIN member USING(member_id)
>> WHERE method_link_species_set_id = 29697 GROUP BY homology_id;
>> (25271 rows in 0.42 sec)
>>
>> Hope this helps,
>>
>> Cheers,
>>
>> M;
>>
>>
>> On 04/12/13 15:24, Thomas Derrien wrote:
>>> Dear EnsEMBLers,
>>>
>>> We have installed a local version of the Compara database (version 73)
>>> and one of our goal is to rapidly extract ortholog informations
>>> (stable_id, chr_name, chr_start, chr_end) between 2  species using mysql
>>> query. (I believe this topic has already been discussed but only via the
>>> API)
>>>
>>> I used this example query to get ortholog stable_ids information between
>>> human and mouse:
>>>
>>> mysql> SELECT m1.stable_id, m2.stable_id
>>>      FROM genome_db gdb1
>>>      LEFT JOIN member m1 USING (genome_db_id)
>>>      LEFT JOIN homology_member hm1 USING (member_id)
>>>      LEFT JOIN homology using (homology_id)
>>>      LEFT JOIN homology_member hm2 using (homology_id)
>>>      LEFT JOIN member m2 on (hm2.member_id = m2.member_id)
>>>      LEFT JOIN genome_db gdb2 on (m2.genome_db_id = gdb2.genome_db_id)
>>>      WHERE gdb1.name = 'homo_sapiens' and gdb2.name = 'mus_musculus'
>>> LIMIT 1;
>>>
>>> +-----------------+--------------------+
>>> | stable_id       | stable_id          |
>>> +-----------------+--------------------+
>>> | ENSG00000198888 | ENSMUSG00000064341 |
>>> +-----------------+--------------------+
>>> 1 row in set *(24.79 sec*)
>>>
>>>
>>> As you can see, it takes quite a bit of time using this query (which is
>>> also the case when I connect to ensembldb.ensembl.org
>>> <http://ensembldb.ensembl.org>
>>> <http://ensembldb.ensembl.org <http://ensembldb.ensembl.org/>>).
>>> Do you know if there is a clever (thus faster) way to extract our
>>> desired output? Do we need to go through the method_link_species_set
>>> table?
>>>
>>> Thanks in advance for your help!
>>>
>>> All the best,
>>>
>>> Thomas
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>> Posting guidelines and subscribe/unsubscribe
>>> info:http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog:http://www.ensembl.info/
>>>
>>
>> --
>>
>> Miguel Pignatelli, PhD
>>
>> Ensembl Developer - Comparative Genomics
>> European Bioinformatics Institute (EMBL-EBI)
>> Wellcome Trust Genome Campus, Hinxton
>> Cambridge - CB10 1SD - UK
>> Room A3-33
>> Phone + 44 (0) 1223 494 598
>> Fax   + 44 (0) 1223 494 468
>

-- 

Miguel Pignatelli, PhD

Ensembl Developer - Comparative Genomics
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK
Room A3-33
Phone + 44 (0) 1223 494 598
Fax   + 44 (0) 1223 494 468




More information about the Dev mailing list