[ensembl-dev] Fastest mySQL query to retrieve all orthologs between 2 species.

Olson, Andrew olson at cshl.edu
Wed Dec 4 16:26:36 GMT 2013


Hi Thomas,
This sounds like a job for biomart (www.ensembl.org/biomart/martview<http://www.ensembl.org/biomart/martview>). It is possible to get the human mouse orthologs in a few clicks, but if you really want database access it is possible, but I would suggest using programmatic methods (biomart can generate perl code for you). After a little digging around in the ensembl_mart_73 database on martdb.ensembl.org<http://martdb.ensembl.org> (port 5316), I was able to reconstruct your query. I'm not familiar with the mart database schema and it looks like the column names might not be stable from release to release, so once again, it is probably safer to use the biomart web site or API.

Andrew

select stable_id_4016_r2,stable_id_4016_r1 from mmusculus_gene_ensembl__homolog_hsap__dm where description_4014='ortholog_one2one' limit 10;
+-------------------+--------------------+
| stable_id_4016_r2 | stable_id_4016_r1  |
+-------------------+--------------------+
| ENSG00000163584   | ENSMUSP00000043111 |
| ENSG00000182197   | ENSMUSP00000076505 |
| ENSG00000177427   | ENSMUSP00000018743 |
| ENSG00000128268   | ENSMUSP00000043077 |
| ENSG00000100075   | ENSMUSP00000003622 |
| ENSG00000068878   | ENSMUSP00000045460 |
| ENSG00000157693   | ENSMUSP00000076891 |
| ENSG00000198056   | ENSMUSP00000026461 |
| ENSG00000167378   | ENSMUSP00000036699 |
| ENSG00000141526   | ENSMUSP00000125846 |
+-------------------+--------------------+
10 rows in set (0.07 sec)


On Dec 4, 2013, at 10:24 AM, Thomas Derrien <thomas.derrien at univ-rennes1.fr<mailto:thomas.derrien at univ-rennes1.fr>> wrote:

Dear EnsEMBLers,

We have installed a local version of the Compara database (version 73) and one of our goal is to rapidly extract ortholog informations (stable_id, chr_name, chr_start, chr_end) between 2  species using mysql query. (I believe this topic has already been discussed but only via the API)

I used this example query to get ortholog stable_ids information between human and mouse:

mysql> SELECT m1.stable_id, m2.stable_id
     FROM genome_db gdb1
     LEFT JOIN member m1 USING (genome_db_id)
     LEFT JOIN homology_member hm1 USING (member_id)
     LEFT JOIN homology using (homology_id)
     LEFT JOIN homology_member hm2 using (homology_id)
     LEFT JOIN member m2 on (hm2.member_id = m2.member_id)
     LEFT JOIN genome_db gdb2 on (m2.genome_db_id = gdb2.genome_db_id)
     WHERE gdb1.name = 'homo_sapiens' and gdb2.name = 'mus_musculus' LIMIT 1;

+-----------------+--------------------+
| stable_id       | stable_id          |
+-----------------+--------------------+
| ENSG00000198888 | ENSMUSG00000064341 |
+-----------------+--------------------+
1 row in set (24.79 sec)


As you can see, it takes quite a bit of time using this query (which is also the case when I connect to ensembldb.ensembl.org<http://ensembldb.ensembl.org/>).
Do you know if there is a clever (thus faster) way to extract our desired output? Do we need to go through the method_link_species_set table?

Thanks in advance for your help!

All the best,

Thomas
_______________________________________________
Dev mailing list    Dev at ensembl.org<mailto:Dev at ensembl.org>
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/





More information about the Dev mailing list