[ensembl-dev] Fastest mySQL query to retrieve all orthologs between 2 species.
Thomas Derrien
thomas.derrien at univ-rennes1.fr
Wed Dec 4 15:24:31 GMT 2013
Dear EnsEMBLers,
We have installed a local version of the Compara database (version 73) and one of our goal is to rapidly extract ortholog informations (stable_id, chr_name, chr_start, chr_end) between 2 species using mysql query. (I believe this topic has already been discussed but only via the API)
I used this example query to get ortholog stable_ids information between human and mouse:
mysql> SELECT m1.stable_id, m2.stable_id
FROM genome_db gdb1
LEFT JOIN member m1 USING (genome_db_id)
LEFT JOIN homology_member hm1 USING (member_id)
LEFT JOIN homology using (homology_id)
LEFT JOIN homology_member hm2 using (homology_id)
LEFT JOIN member m2 on (hm2.member_id = m2.member_id)
LEFT JOIN genome_db gdb2 on (m2.genome_db_id = gdb2.genome_db_id)
WHERE gdb1.name = 'homo_sapiens' and gdb2.name = 'mus_musculus' LIMIT 1;
+-----------------+--------------------+
| stable_id | stable_id |
+-----------------+--------------------+
| ENSG00000198888 | ENSMUSG00000064341 |
+-----------------+--------------------+
1 row in set (24.79 sec)
As you can see, it takes quite a bit of time using this query (which is also the case when I connect to ensembldb.ensembl.org).
Do you know if there is a clever (thus faster) way to extract our desired output? Do we need to go through the method_link_species_set table?
Thanks in advance for your help!
All the best,
Thomas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20131204/9d7e842c/attachment.html>
More information about the Dev
mailing list