[ensembl-dev] Restricting species considered for ortholog requests

Matthieu Muffato muffato at ebi.ac.uk
Tue Feb 14 10:15:26 GMT 2012


Hi Jason

You can filter out out-paralogues by combining homology and taxonomy information.
Here is a sample script to do so.

# The adaptors
my $hom_a = $reg->get_adaptor('Multi', 'compara', 'Homology');
my $ncbi_a = $reg->get_adaptor('Multi', 'compara', 'NCBITaxon');

# Assuming your have a human gene, and the mouse defines the boundary between in-paralogues and out-paralogues
# The taxon_id of any member is in $member->taxon_id

my $spec1 = $ncbi_a->fetch_node_by_taxon_id(9606);

# You can also use the binomial name
my $spec2 = $ncbi_a->fetch_node_by_name("Mus musculus");

# The last common ancestor of $spec1 and $spec2 defines the boundary
my $lca =  $ncbi_a->fetch_first_shared_ancestor_indexed($spec1, $spec2);

[$all_hom would contain the homologies that you want to test]
foreach my $hom (@{$all_hom}) {

    # We only want within_species_paralog
    next unless $hom->description eq 'within_species_paralog';
    
    # The taxon where the homology "appeared"
    my $ancspec = $ncbi_a->fetch_node_by_name($hom->subtype);

    # Compares the homology taxon to the boundary
    next if $ancspec eq $ncbi_a->fetch_first_shared_ancestor_indexed($lca, $ancspec);

    [Here, you only have paralogies "younger" than the boundary: in-paralogues]
}

Hope this helps,
Matthieu

On 13/02/12 18:16, Javier Herrero wrote:
> Hi Jason
> 
> You are right, the API will return all the paralogues. It seems you are 
> only interested in the in-paralogues. Defining in-paralogues require you 
> to define the boundary between an in-paralogue and an out-paralogue. 
> That boundary is typically set by an additional species, mouse in your 
> case. In other words, you want all the human paralogues that are closer 
> to your query gene than the closest mouse orthologue.
> 
> Our API doesn't currently support that kind of query. I would have 
> suggested to look at the taxonomic annotation of the ancestral node 
> linking the paralogues. I haven't tested it, but it seems you have find 
> an alternative that works for you.
> 
> Kind regards
> 
> Javier
> 
> On 13/02/12 17:57, Jason Merkin wrote:
>>
>> Hi Javier
>>
>> Correct me if I am wrong but won't the paralog query give you the 
>> three sets of genes in hsap? I would like to, for instance, get hsap1 
>> without getting hsap2, hsap2', or hsap3.
>>
>> I wrote the following script to recursively get all homologies of all 
>> types except paralog that I think should get and print out all of the 
>> members of the gene family.
>>
>> $stable = shift;
>>
>> my %these_species;
>>
>> foreach (9606, 10090){
>>
>> $these_species{$_} = 1;
>>
>> }
>>
>> my %relationships;
>>
>> foreach ("ortholog_one2one", "ortholog_one2many", "ortholog_many2one",
>>
>> "ortholog_many2many", "possible_ortholog", "apparent_ortholog_one2one"){
>>
>> $relationships{$_} = 1;
>>
>> }
>>
>>
>> my $homology_adaptor = $reg->get_adaptor("Compara", "compara", 
>> "Homology");
>>
>> my $member_adaptor = $reg->get_adaptor('Multi', 'compara', 'Member');
>>
>> my %these_genes;
>>
>> my %query_used;
>>
>> single_gene($stable, \%relationships, \%these_species, 
>> $member_adaptor, $homology_adaptor, \%query_used);
>>
>> while ( my ($key, $value) = each(%these_genes) ) {
>>
>> print "$key => $value\n";
>>
>> }
>>
>>
>> sub single_gene
>>
>> {
>>
>> #($stable, %relationships, %these_species, $member_adaptor, 
>> $homology_adaptor)
>>
>> my $this_stable = @_[0];
>>
>> if ($query_used{$this_stable}){
>>
>> return;
>>
>> }
>>
>> $query_used{$this_stable} = 1;
>>
>>
>> my $member = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", 
>> $this_stable);
>>
>> if (defined $member){
>>
>> my $all_homologies = $homology_adaptor->fetch_by_Member($member);
>>
>> foreach my $homology (@{$all_homologies}) {
>>
>> if ($relationships{$homology->description}){
>>
>> foreach my $attr (@{$homology->get_all_Member_Attribute}) {
>>
>> my ($member, $attribute) = @{$attr};
>>
>> if ($these_species{$member->taxon_id}){
>>
>> my $new_stable = $member->stable_id;
>>
>> $these_genes{$new_stable} = 1;
>>
>> single_gene($new_stable, \%relationships, \%these_species,
>>
>> $member_adaptor, $homology_adaptor, \%query_used);
>>
>>
>> }
>>
>> }
>>
>> }
>>
>> }
>>
>> }
>>
>> return;
>>
>> }
>>
>>
>> On Feb 13, 2012 6:48 AM, "Javier Herrero" <jherrero at ebi.ac.uk 
>> <mailto:jherrero at ebi.ac.uk>> wrote:
>>
>>     Hi Jason
>>
>>     You can use the HomologyAdaptor
>>     (http://www.ensembl.org/info/docs/Doxygen/compara-api/classBio_1_1EnsEMBL_1_1Compara_1_1DBSQL_1_1HomologyAdaptor.html)
>>     to get these relationships. You can try either the
>>     fetch_all_by_Member_paired_species or the
>>     fetch_all_by_Member_paired_species methods. For instance,
>>
>>     $homology_adaptor->fetch_all_by_Member_paired_species($hsap1_member,
>>     "mus_musculus", "ENSEMBL_ORTHOLOGUES");
>>
>>     will return [$mmus1_member], and
>>
>>     $homology_adaptor->fetch_all_by_Member_paired_species($hsap2_member,
>>     "mus_musculus", "ENSEMBL_ORTHOLOGUES");
>>
>>     will return [$mmus2_member, $mmus2'_member]. If you want to get
>>     the intra-species paralogues as well, you can add:
>>
>>     $homology_adaptor->fetch_all_by_Member_paired_species($hsap2_member,
>>     "homo_sapiens", "ENSEMBL_PARALOGUES");
>>
>>     I hope this helps
>>
>>     Javier
>>
>>     On 12/02/12 01:24, Jason Merkin wrote:
>>>     Hello. I am trying to identify duplications that have occured
>>>     within a group of species. I have gone through the tutorial and
>>>     the mailing list archives and couldn't find anything on it. I
>>>     will use the example on the webpage that explains the homology
>>>     definitions
>>>     (http://ensembl.genomics.org.cn:8058/info/docs/compara/homology_method.html)
>>>     to illustrate what I am trying to do. Using just human and mouse,
>>>     as on the diagram, I would like to query with Hsap1 and get the
>>>     set of (Hsap1, Mmus1); query with Hsap2 and get (Hsap2, Hsap2',
>>>     Mmus2, Mmus2'); and query with Hsap3 and get (Hsap3, Mmus3,
>>>     Mmus3'). Is there a way to specify the homology type and, more
>>>     importantly, restrict the species to be considered for definining
>>>     the homology?
>>>
>>>     Thanks for any help,
>>>     Jason Merkin
>>>




More information about the Dev mailing list