[ensembl-dev] Fastest mySQL query to retrieve all orthologs between 2 species.

Miguel Pignatelli mp at ebi.ac.uk
Wed Dec 4 16:47:06 GMT 2013


Hi Thomas,

You can use the method_link_species_set table to get the orthologues you 
want:

SELECT homology.description, GROUP_CONCAT(stable_id) FROM homology JOIN 
homology_member USING(homology_id) JOIN member USING(member_id) JOIN 
method_link_species_set USING(method_link_species_set_id) WHERE name = 
'H.sap-M.mus orthologues' GROUP BY homology_id;

The query will be faster if you get include the 
method_link_species_set_id directly:

SELECT homology.description, GROUP_CONCAT(stable_id) FROM homology JOIN 
homology_member USING(homology_id) JOIN member USING(member_id) WHERE 
method_link_species_set_id = 29697 GROUP BY homology_id;
(25271 rows in 0.42 sec)

Hope this helps,

Cheers,

M;


On 04/12/13 15:24, Thomas Derrien wrote:
> Dear EnsEMBLers,
>
> We have installed a local version of the Compara database (version 73)
> and one of our goal is to rapidly extract ortholog informations
> (stable_id, chr_name, chr_start, chr_end) between 2  species using mysql
> query. (I believe this topic has already been discussed but only via the
> API)
>
> I used this example query to get ortholog stable_ids information between
> human and mouse:
>
> mysql> SELECT m1.stable_id, m2.stable_id
>       FROM genome_db gdb1
>       LEFT JOIN member m1 USING (genome_db_id)
>       LEFT JOIN homology_member hm1 USING (member_id)
>       LEFT JOIN homology using (homology_id)
>       LEFT JOIN homology_member hm2 using (homology_id)
>       LEFT JOIN member m2 on (hm2.member_id = m2.member_id)
>       LEFT JOIN genome_db gdb2 on (m2.genome_db_id = gdb2.genome_db_id)
>       WHERE gdb1.name = 'homo_sapiens' and gdb2.name = 'mus_musculus'
> LIMIT 1;
>
> +-----------------+--------------------+
> | stable_id       | stable_id          |
> +-----------------+--------------------+
> | ENSG00000198888 | ENSMUSG00000064341 |
> +-----------------+--------------------+
> 1 row in set *(24.79 sec*)
>
>
> As you can see, it takes quite a bit of time using this query (which is
> also the case when I connect to ensembldb.ensembl.org
> <http://ensembldb.ensembl.org>).
> Do you know if there is a clever (thus faster) way to extract our
> desired output? Do we need to go through the method_link_species_set table?
>
> Thanks in advance for your help!
>
> All the best,
>
> Thomas
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>

-- 

Miguel Pignatelli, PhD

Ensembl Developer - Comparative Genomics
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK
Room A3-33
Phone + 44 (0) 1223 494 598
Fax   + 44 (0) 1223 494 468




More information about the Dev mailing list