[ensembl-dev] Fastest mySQL query to retrieve all orthologs between 2 species.

Wed Dec 4 17:55:30 GMT 2013

Dear Miguell,

Thank you very much for your answer!
The second query is by far quicker than the first one (and also compared to what I did before) although I don’t have exactly the same length of time than you i.e
25166 rows in set (3 min 49.53 sec)

But, I willl definitely use the way you extract ortholog information. Just by curiosity, did you index some specific fields that could explain the query time difference?

Thanks again!

Thomas

Le 4 déc. 2013 à 17:47, Miguel Pignatelli <mp at ebi.ac.uk> a écrit :

> Hi Thomas,
> 
> You can use the method_link_species_set table to get the orthologues you want:
> 
> SELECT homology.description, GROUP_CONCAT(stable_id) FROM homology JOIN homology_member USING(homology_id) JOIN member USING(member_id) JOIN method_link_species_set USING(method_link_species_set_id) WHERE name = 'H.sap-M.mus orthologues' GROUP BY homology_id;
> 
> The query will be faster if you get include the method_link_species_set_id directly:
> 
> SELECT homology.description, GROUP_CONCAT(stable_id) FROM homology JOIN homology_member USING(homology_id) JOIN member USING(member_id) WHERE method_link_species_set_id = 29697 GROUP BY homology_id;
> (25271 rows in 0.42 sec)
> 
> Hope this helps,
> 
> Cheers,
> 
> M;
> 
> 
> On 04/12/13 15:24, Thomas Derrien wrote:
>> Dear EnsEMBLers,
>> 
>> We have installed a local version of the Compara database (version 73)
>> and one of our goal is to rapidly extract ortholog informations
>> (stable_id, chr_name, chr_start, chr_end) between 2  species using mysql
>> query. (I believe this topic has already been discussed but only via the
>> API)
>> 
>> I used this example query to get ortholog stable_ids information between
>> human and mouse:
>> 
>> mysql> SELECT m1.stable_id, m2.stable_id
>>      FROM genome_db gdb1
>>      LEFT JOIN member m1 USING (genome_db_id)
>>      LEFT JOIN homology_member hm1 USING (member_id)
>>      LEFT JOIN homology using (homology_id)
>>      LEFT JOIN homology_member hm2 using (homology_id)
>>      LEFT JOIN member m2 on (hm2.member_id = m2.member_id)
>>      LEFT JOIN genome_db gdb2 on (m2.genome_db_id = gdb2.genome_db_id)
>>      WHERE gdb1.name = 'homo_sapiens' and gdb2.name = 'mus_musculus'
>> LIMIT 1;
>> 
>> +-----------------+--------------------+
>> | stable_id       | stable_id          |
>> +-----------------+--------------------+
>> | ENSG00000198888 | ENSMUSG00000064341 |
>> +-----------------+--------------------+
>> 1 row in set *(24.79 sec*)
>> 
>> 
>> As you can see, it takes quite a bit of time using this query (which is
>> also the case when I connect to ensembldb.ensembl.org
>> <http://ensembldb.ensembl.org>).
>> Do you know if there is a clever (thus faster) way to extract our
>> desired output? Do we need to go through the method_link_species_set table?
>> 
>> Thanks in advance for your help!
>> 
>> All the best,
>> 
>> Thomas
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>> 
> 
> -- 
> 
> Miguel Pignatelli, PhD
> 
> Ensembl Developer - Comparative Genomics
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus, Hinxton
> Cambridge - CB10 1SD - UK
> Room A3-33
> Phone + 44 (0) 1223 494 598
> Fax   + 44 (0) 1223 494 468

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20131204/ff34bf8f/attachment.html>