[ensembl-dev] Restricting species considered for ortholog requests

Javier Herrero jherrero at ebi.ac.uk
Mon Feb 13 18:16:16 GMT 2012


Hi Jason

You are right, the API will return all the paralogues. It seems you are 
only interested in the in-paralogues. Defining in-paralogues requires you
to define the boundary between an in-paralogue and an out-paralogue. 
That boundary is typically set by an additional species, mouse in your 
case. In other words, you want all the human paralogues that are closer 
to your query gene than the closest mouse orthologue.
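
The in-paralogue boundary can be illustrated with a minimal Perl sketch over a hand-built toy tree. The tree reproduces the three groups from the Hsap/Mmus example further down. It does not use the Compara API. The internal node names, the 'speciation'/'duplication' labels and the helpers node(), leaves_under() and family_of() are all hypothetical. Starting from the query gene, the sketch climbs to the nearest human/mouse speciation node. Every gene below that node is the query itself, one of its in-paralogues, or a mouse orthologue. Out-paralogues split off before that speciation, so they fall outside.

```perl
use strict;
use warnings;

# Build a tree node; children get a back-link to their parent.
# (Toy structure for illustration only, not a Compara object.)
sub node {
    my ($name, $type, @children) = @_;
    my $n = { name => $name, type => $type, children => [@children] };
    $_->{parent} = $n for @children;
    return $n;
}

# Hypothetical tree matching the Hsap/Mmus example: three ancient copies,
# each split by a human/mouse speciation node, plus later duplications.
my %leaf = map { $_ => node($_, 'leaf') }
    ('Hsap1', 'Mmus1', 'Hsap2', "Hsap2'", 'Mmus2', "Mmus2'", 'Hsap3', 'Mmus3', "Mmus3'");
my $root = node('root', 'duplication',
    node('S1', 'speciation', @leaf{'Hsap1', 'Mmus1'}),
    node('S2', 'speciation',
        node('D2h', 'duplication', @leaf{'Hsap2', "Hsap2'"}),
        node('D2m', 'duplication', @leaf{'Mmus2', "Mmus2'"})),
    node('S3', 'speciation',
        $leaf{'Hsap3'},
        node('D3m', 'duplication', @leaf{'Mmus3', "Mmus3'"})));

# All leaf names below a node.
sub leaves_under {
    my ($n) = @_;
    return $n->{name} if $n->{type} eq 'leaf';
    return map { leaves_under($_) } @{ $n->{children} };
}

# Climb from the query leaf to the nearest speciation node and return every
# gene below it: the query, its in-paralogues and its mouse orthologues.
sub family_of {
    my ($gene) = @_;
    my $n = $leaf{$gene} or die "unknown gene: $gene\n";
    $n = $n->{parent} while $n->{type} ne 'speciation' && $n->{parent};
    return sort { $a cmp $b } leaves_under($n);
}

for my $query ('Hsap1', 'Hsap2', 'Hsap3') {
    print "$query: ", join(', ', family_of($query)), "\n";
}
```

With this tree, querying Hsap2 yields Hsap2, Hsap2', Mmus2 and Mmus2', while Hsap1 and Hsap3 stay out because they diverged from Hsap2 above the speciation node.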

Our API doesn't currently support that kind of query. I would have
suggested looking at the taxonomic annotation of the ancestral node
linking the paralogues. I haven't tested it, but it seems you have found
an alternative that works for you.
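
The taxonomic-annotation idea can be sketched the same way. Assume, hypothetically, that each paralogue of the query carries the taxon name of the duplication node joining the pair. A duplication labelled with the query species itself happened after the split from mouse, so only those pairs are in-paralogues. The records, the field name ancestor_taxon and the in_paralogues() helper are invented for illustration; this is mock data, not API output.

```perl
use strict;
use warnings;

# Mock paralogues of the query Hsap2. ancestor_taxon is the (hypothetical)
# taxon label of the duplication node that joins each pair.
my @paralogues = (
    { gene => "Hsap2'", ancestor_taxon => 'Homo sapiens' },
    { gene => 'Hsap1',  ancestor_taxon => 'Euarchontoglires' },
    { gene => 'Hsap3',  ancestor_taxon => 'Euarchontoglires' },
);

# A duplication labelled with the query species post-dates the split from
# mouse (in-paralogue); an older clade label means an out-paralogue.
sub in_paralogues {
    my ($species_taxon, @records) = @_;
    return map { $_->{gene} } grep { $_->{ancestor_taxon} eq $species_taxon } @records;
}

print join(', ', in_paralogues('Homo sapiens', @paralogues)), "\n";
```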

Kind regards

Javier

On 13/02/12 17:57, Jason Merkin wrote:
>
> Hi Javier
>
> Correct me if I am wrong, but won't the paralog query give you all
> three sets of genes in hsap? I would like, for instance, to get hsap1
> without getting hsap2, hsap2', or hsap3.
>
> I wrote the following script to recursively collect homologies of every
> type except paralogues. I think it should find and print all of the
> members of the gene family.
>
> use strict;
> use warnings;
> use Bio::EnsEMBL::Registry;
>
> # Connect to the public Ensembl database server.
> my $reg = 'Bio::EnsEMBL::Registry';
> $reg->load_registry_from_db(
>     -host => 'ensembldb.ensembl.org',
>     -user => 'anonymous',
> );
>
> my $stable = shift or die "usage: $0 GENE_STABLE_ID\n";
>
> # Only follow homologies into human (9606) and mouse (10090).
> my %these_species = map { $_ => 1 } (9606, 10090);
>
> # Follow every orthology type; paralogue types are deliberately left out.
> my %relationships = map { $_ => 1 } qw(
>     ortholog_one2one ortholog_one2many ortholog_many2one
>     ortholog_many2many possible_ortholog apparent_ortholog_one2one
> );
>
> my $homology_adaptor = $reg->get_adaptor('Multi', 'compara', 'Homology');
> my $member_adaptor   = $reg->get_adaptor('Multi', 'compara', 'Member');
>
> my %these_genes;   # genes found in the family
> my %query_used;    # genes already expanded, so the recursion terminates
>
> single_gene($stable);
>
> foreach my $gene (sort keys %these_genes) {
>     print "$gene\n";
> }
>
> # Recursively collect all human/mouse genes reachable through orthologies.
> sub single_gene {
>     my ($this_stable) = @_;
>     return if $query_used{$this_stable}++;
>
>     my $member = $member_adaptor->fetch_by_source_stable_id('ENSEMBLGENE', $this_stable);
>     return unless defined $member;
>
>     foreach my $homology (@{ $homology_adaptor->fetch_all_by_Member($member) }) {
>         next unless $relationships{ $homology->description };
>         foreach my $attr (@{ $homology->get_all_Member_Attribute }) {
>             my ($homolog_member, $attribute) = @{$attr};
>             next unless $these_species{ $homolog_member->taxon_id };
>             my $new_stable = $homolog_member->stable_id;
>             $these_genes{$new_stable} = 1;
>             single_gene($new_stable);
>         }
>     }
>     return;
> }
>
>
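
The recursion in this script is a graph traversal: genes are nodes, ortholog relationships are edges, and %query_used is the visited set. Below is a self-contained Perl sketch of the same idea over a mock homology table. The gene names, the species-by-name-prefix convention and the extra Rnor1 gene are all hypothetical, and gene_family() is an illustrative stand-in for single_gene(), not an Ensembl call. It shows why the traversal returns the full Hsap2/Mmus2 group while skipping paralogue links and genes from other species.

```perl
use strict;
use warnings;

# Mock homology table: [gene, gene, description]. The species of each gene is
# read from its name prefix; all names and the Rnor1 gene are hypothetical.
my @homologies = (
    ['Hsap1',  'Mmus1',  'ortholog_one2one'],
    ['Hsap2',  'Mmus2',  'ortholog_many2many'],
    ['Hsap2',  "Mmus2'", 'ortholog_many2many'],
    ["Hsap2'", 'Mmus2',  'ortholog_many2many'],
    ["Hsap2'", "Mmus2'", 'ortholog_many2many'],
    ['Hsap2',  "Hsap2'", 'within_species_paralog'],
    ['Hsap1',  'Hsap2',  'within_species_paralog'],
    ['Hsap1',  'Rnor1',  'ortholog_one2one'],
);
my %taxon_of_prefix = (Hsap => 9606, Mmus => 10090, Rnor => 10116);
my %these_species   = map { $_ => 1 } (9606, 10090);
my %relationships   = map { $_ => 1 } qw(
    ortholog_one2one ortholog_one2many ortholog_many2one
    ortholog_many2many possible_ortholog apparent_ortholog_one2one
);

sub taxon_of { return $taxon_of_prefix{ substr($_[0], 0, 4) } }

# Breadth-first version of single_gene(): follow only whitelisted ortholog
# edges into the wanted species, visiting each gene once (%seen plays the
# role of %query_used).
sub gene_family {
    my ($query) = @_;
    my %seen  = ($query => 1);
    my @queue = ($query);
    while (defined(my $gene = shift @queue)) {
        foreach my $h (@homologies) {
            my ($x, $y, $description) = @{$h};
            next unless $relationships{$description};
            my $other = $x eq $gene ? $y : $y eq $gene ? $x : undef;
            next unless defined $other;
            next if $seen{$other} || !$these_species{ taxon_of($other) };
            $seen{$other} = 1;
            push @queue, $other;
        }
    }
    return sort { $a cmp $b } keys %seen;
}

print join(', ', gene_family($_)), "\n" for ('Hsap1', 'Hsap2');
```

Because the many-to-many orthologies link Hsap2 to Mmus2 and Mmus2', and those back to Hsap2', the in-paralogue is reached without following any paralogue edge, while Rnor1 is dropped by the species filter.
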
> On Feb 13, 2012 6:48 AM, "Javier Herrero" <jherrero at ebi.ac.uk> wrote:
>
>     Hi Jason
>
>     You can use the HomologyAdaptor
>     (http://www.ensembl.org/info/docs/Doxygen/compara-api/classBio_1_1EnsEMBL_1_1Compara_1_1DBSQL_1_1HomologyAdaptor.html)
>     to get these relationships. You can try the
>     fetch_all_by_Member_paired_species method. For instance,
>
>     $homology_adaptor->fetch_all_by_Member_paired_species($hsap1_member,
>     "mus_musculus", "ENSEMBL_ORTHOLOGUES");
>
>     will return [$mmus1_member], and
>
>     $homology_adaptor->fetch_all_by_Member_paired_species($hsap2_member,
>     "mus_musculus", "ENSEMBL_ORTHOLOGUES");
>
>     will return [$mmus2_member, $mmus2'_member]. If you want to get
>     the intra-species paralogues as well, you can add:
>
>     $homology_adaptor->fetch_all_by_Member_paired_species($hsap2_member,
>     "homo_sapiens", "ENSEMBL_PARALOGUES");
>
>     I hope this helps
>
>     Javier
>
>     On 12/02/12 01:24, Jason Merkin wrote:
>>     Hello. I am trying to identify duplications that have occurred
>>     within a group of species. I have gone through the tutorial and
>>     the mailing list archives and couldn't find anything on it. I
>>     will use the example on the webpage that explains the homology
>>     definitions
>>     (http://ensembl.genomics.org.cn:8058/info/docs/compara/homology_method.html)
>>     to illustrate what I am trying to do. Using just human and mouse,
>>     as on the diagram, I would like to query with Hsap1 and get the
>>     set of (Hsap1, Mmus1); query with Hsap2 and get (Hsap2, Hsap2',
>>     Mmus2, Mmus2'); and query with Hsap3 and get (Hsap3, Mmus3,
>>     Mmus3'). Is there a way to specify the homology type and, more
>>     importantly, restrict the species to be considered for defining
>>     the homology?
>>
>>     Thanks for any help,
>>     Jason Merkin
>>
>>
>>
>>     _______________________________________________
>>     Dev mailing list Dev at ensembl.org
>>     List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>     Ensembl Blog: http://www.ensembl.info/
>
>     -- 
>     Javier Herrero, PhD
>     Ensembl Coordinator and Ensembl Compara Project Leader
>     European Bioinformatics Institute (EMBL-EBI)
>     Wellcome Trust Genome Campus, Hinxton
>     Cambridge - CB10 1SD - UK
>
>
>
>

