[ensembl-dev] all orthologs for species pair

Thu Oct 24 19:10:59 BST 2013

Dear Hardip,

I'm afraid there is nothing in the current API to help you. Your 
diagnosis is correct: getting the list of homologues is pretty fast, 
but getting the gene names (one by one) is very slow.

Gene trees have a preload() method that fetches all the leaves in a 
single operation. It significantly speeds up the process. I think it is 
now time to extend this mechanism to other objects in Compara.
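
For reference, here is a minimal sketch of the preload() pattern on a
single gene tree. It is an illustration under assumptions, not a tested
recipe: it assumes the public server ensembldb.ensembl.org with the
anonymous user, takes the tree stable ID from the command line, and uses
method names (fetch_by_stable_id, root, get_all_leaves, gene_member,
genome_db, release_tree) as they exist in the Compara API to the best of
my knowledge; they may differ between releases. leaf_row() and
print_tree_genes() are hypothetical helper names.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: format one tree leaf as
# "gene stable ID <TAB> species name".
sub leaf_row {
    my ($leaf) = @_;
    return join "\t", $leaf->gene_member->stable_id, $leaf->genome_db->name;
}

# Hypothetical helper: print every gene of one gene tree.
sub print_tree_genes {
    my ($tree_stable_id) = @_;

    # Loaded lazily so that leaf_row() can be used without the API installed.
    require Bio::EnsEMBL::Registry;
    my $reg = 'Bio::EnsEMBL::Registry';

    # Assumption: public Ensembl MySQL server, anonymous access.
    $reg->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );

    my $gene_tree_adaptor = $reg->get_adaptor('Multi', 'compara', 'GeneTree');
    my $tree = $gene_tree_adaptor->fetch_by_stable_id($tree_stable_id)
        or die "No gene tree found for $tree_stable_id\n";

    # Fetch all the leaves in a single operation, instead of loading
    # each member lazily on first access.
    $tree->preload();

    print leaf_row($_), "\n" for @{ $tree->root->get_all_leaves };

    # Break circular references so the tree can be garbage-collected.
    $tree->release_tree;
}

# Only contact the database when a tree stable ID is given.
print_tree_genes($ARGV[0]) if @ARGV;
```

Run it with a gene tree stable ID as the only argument. This only covers
gene trees; it does not speed up the homology loop in your script.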

We'll try to do this for the next release (due in December), and will 
let you (and the dev@ list) know about any progress.

Best regards,
Matthieu

On 24/10/13 00:12, Hardip Patel wrote:
> Dear All
>
> I would like to extract all orthologs for any given two species. I am
> using the following code to extract the required information.
>
> my $genome_db_adaptor = $reg->get_adaptor("Multi", "compara", "GenomeDB");
> my $homology_adaptor  = $reg->get_adaptor('Multi', 'compara', 'Homology');
> my $ref_genome_dbID =
>     $genome_db_adaptor->fetch_by_registry_name($ref_species)->dbID();
>
> foreach my $subject_species (keys %species) {
>     my $subject_genome_dbID =
>         $genome_db_adaptor->fetch_by_registry_name($subject_species)->dbID;
>     print "Getting homologs for $ref_species (dbID=$ref_genome_dbID) and "
>         . "$subject_species (dbID=$subject_genome_dbID)\n";
>     my @homologues = @{ $homology_adaptor->fetch_all_by_genome_pair(
>         $ref_genome_dbID, $subject_genome_dbID) };
>     foreach my $homology (@homologues) {
>         my @gene_members = @{ $homology->gene_list() };
>         printf "%s\t%s\t%s\n", $gene_members[0]->stable_id(),
>             $gene_members[1]->stable_id(), $homology->description;
>     }
> }
>
> I am getting the following results:
>
> ENSG00000199168	ENSTBEG00000017892	ortholog_one2many
> ENSG00000212161	ENSTBEG00000020962	apparent_ortholog_one2one
> ENSG00000252139	ENSTBEG00000020505	ortholog_one2many
> ENSG00000252139	ENSTBEG00000020751	ortholog_one2many
> ENSG00000252139	ENSTBEG00000021130	ortholog_one2many
> ENSG00000207639	ENSTBEG00000017976	ortholog_one2one
> ENSG00000252691	ENSTBEG00000021086	ortholog_one2one
>
> However, this is extremely slow. My guess is that fetching the
> homologues for the pair is the fast step, and that extracting the
> gene-centric information one by one inside the following loop is
> what makes it slow.
>
> foreach my $homology (@homologues){}
>
> Could you please let me know if there is a quicker way to get the
> same information? I am mainly interested in
>
> ref_species_geneID, chromosome, start, end, strand, genename,
> subject_species_geneID, chromosome, start, end, strand, homologytype
>
> When I query for the same using BioMart, it is very quick, and I am
> hoping that I can do the same using the API.
>
> Kind regards
>
> Hardip

-- 
Matthieu Muffato, Ph.D.
Ensembl Developer and Ensembl Compara Manager
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom