[ensembl-dev] Fastest mySQL query to retrieve all orthologs between 2 species.

Thomas Derrien thomas.derrien at univ-rennes1.fr
Wed Dec 4 15:24:31 GMT 2013


Dear EnsEMBLers,

We have installed a local version of the Compara database (version 73) and one of our goal is to rapidly extract ortholog informations (stable_id, chr_name, chr_start, chr_end) between 2  species using mysql query. (I believe this topic has already been discussed but only via the API)

I used this example query to get ortholog stable_ids information between human and mouse:

mysql> SELECT m1.stable_id, m2.stable_id
     FROM genome_db gdb1
     LEFT JOIN member m1 USING (genome_db_id)
     LEFT JOIN homology_member hm1 USING (member_id)
     LEFT JOIN homology using (homology_id)
     LEFT JOIN homology_member hm2 using (homology_id)
     LEFT JOIN member m2 on (hm2.member_id = m2.member_id)
     LEFT JOIN genome_db gdb2 on (m2.genome_db_id = gdb2.genome_db_id)
     WHERE gdb1.name = 'homo_sapiens' and gdb2.name = 'mus_musculus' LIMIT 1;

+-----------------+--------------------+
| stable_id       | stable_id          |
+-----------------+--------------------+
| ENSG00000198888 | ENSMUSG00000064341 | 
+-----------------+--------------------+
1 row in set (24.79 sec)


As you can see, it takes quite a bit of time using this query (which is also the case when I connect to ensembldb.ensembl.org).
Do you know if there is a clever (thus faster) way to extract our desired output? Do we need to go through the method_link_species_set table?

Thanks in advance for your help!

All the best,

Thomas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20131204/9d7e842c/attachment.html>


More information about the Dev mailing list