[ensembl-dev] Restricting species considered for ortholog requests
Matthieu Muffato
muffato at ebi.ac.uk
Tue Feb 14 10:15:26 GMT 2012
Hi Jason
You can filter out out-paralogues by combining homology and taxonomy information.
Here is a sample script to do so.
# The adaptors
my $hom_a = $reg->get_adaptor('Multi', 'compara', 'Homology');
my $ncbi_a = $reg->get_adaptor('Multi', 'compara', 'NCBITaxon');
# Assuming your have a human gene, and the mouse defines the boundary between in-paralogues and out-paralogues
# The taxon_id of any member is in $member->taxon_id
my $spec1 = $ncbi_a->fetch_node_by_taxon_id(9606);
# You can also use the binomial name
my $spec2 = $ncbi_a->fetch_node_by_name("Mus musculus");
# The last common ancestor of $spec1 and $spec2 defines the boundary
my $lca = $ncbi_a->fetch_first_shared_ancestor_indexed($spec1, $spec2);
[$all_hom would contain the homologies that you want to test]
foreach my $hom (@{$all_hom}) {
# We only want within_species_paralog
next unless $hom->description eq 'within_species_paralog';
# The taxon where the homology "appeared"
my $ancspec = $ncbi_a->fetch_node_by_name($hom->subtype);
# Compares the homology taxon to the boundary
next if $ancspec eq $ncbi_a->fetch_first_shared_ancestor_indexed($lca, $ancspec);
[Here, you only have paralogies "younger" than the boundary: in-paralogues]
}
Hope this helps,
Matthieu
On 13/02/12 18:16, Javier Herrero wrote:
> Hi Jason
>
> You are right, the API will return all the paralogues. It seems you are
> only interested in the in-paralogues. Defining in-paralogues require you
> to define the boundary between an in-paralogue and an out-paralogue.
> That boundary is typically set by an additional species, mouse in your
> case. In other words, you want all the human paralogues that are closer
> to your query gene than the closest mouse orthologue.
>
> Our API doesn't currently support that kind of query. I would have
> suggested to look at the taxonomic annotation of the ancestral node
> linking the paralogues. I haven't tested it, but it seems you have find
> an alternative that works for you.
>
> Kind regards
>
> Javier
>
> On 13/02/12 17:57, Jason Merkin wrote:
>>
>> Hi Javier
>>
>> Correct me if I am wrong but won't the paralog query give you the
>> three sets of genes in hsap? I would like to, for instance, get hsap1
>> without getting hsap2, hsap2', or hsap3.
>>
>> I wrote the following script to recursively get all homologies of all
>> types except paralog that I think should get and print out all of the
>> members of the gene family.
>>
>> $stable = shift;
>>
>> my %these_species;
>>
>> foreach (9606, 10090){
>>
>> $these_species{$_} = 1;
>>
>> }
>>
>> my %relationships;
>>
>> foreach ("ortholog_one2one", "ortholog_one2many", "ortholog_many2one",
>>
>> "ortholog_many2many", "possible_ortholog", "apparent_ortholog_one2one"){
>>
>> $relationships{$_} = 1;
>>
>> }
>>
>>
>> my $homology_adaptor = $reg->get_adaptor("Compara", "compara",
>> "Homology");
>>
>> my $member_adaptor = $reg->get_adaptor('Multi', 'compara', 'Member');
>>
>> my %these_genes;
>>
>> my %query_used;
>>
>> single_gene($stable, \%relationships, \%these_species,
>> $member_adaptor, $homology_adaptor, \%query_used);
>>
>> while ( my ($key, $value) = each(%these_genes) ) {
>>
>> print "$key => $value\n";
>>
>> }
>>
>>
>> sub single_gene
>>
>> {
>>
>> #($stable, %relationships, %these_species, $member_adaptor,
>> $homology_adaptor)
>>
>> my $this_stable = @_[0];
>>
>> if ($query_used{$this_stable}){
>>
>> return;
>>
>> }
>>
>> $query_used{$this_stable} = 1;
>>
>>
>> my $member = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE",
>> $this_stable);
>>
>> if (defined $member){
>>
>> my $all_homologies = $homology_adaptor->fetch_by_Member($member);
>>
>> foreach my $homology (@{$all_homologies}) {
>>
>> if ($relationships{$homology->description}){
>>
>> foreach my $attr (@{$homology->get_all_Member_Attribute}) {
>>
>> my ($member, $attribute) = @{$attr};
>>
>> if ($these_species{$member->taxon_id}){
>>
>> my $new_stable = $member->stable_id;
>>
>> $these_genes{$new_stable} = 1;
>>
>> single_gene($new_stable, \%relationships, \%these_species,
>>
>> $member_adaptor, $homology_adaptor, \%query_used);
>>
>>
>> }
>>
>> }
>>
>> }
>>
>> }
>>
>> }
>>
>> return;
>>
>> }
>>
>>
>> On Feb 13, 2012 6:48 AM, "Javier Herrero" <jherrero at ebi.ac.uk
>> <mailto:jherrero at ebi.ac.uk>> wrote:
>>
>> Hi Jason
>>
>> You can use the HomologyAdaptor
>> (http://www.ensembl.org/info/docs/Doxygen/compara-api/classBio_1_1EnsEMBL_1_1Compara_1_1DBSQL_1_1HomologyAdaptor.html)
>> to get these relationships. You can try either the
>> fetch_all_by_Member_paired_species or the
>> fetch_all_by_Member_paired_species methods. For instance,
>>
>> $homology_adaptor->fetch_all_by_Member_paired_species($hsap1_member,
>> "mus_musculus", "ENSEMBL_ORTHOLOGUES");
>>
>> will return [$mmus1_member], and
>>
>> $homology_adaptor->fetch_all_by_Member_paired_species($hsap2_member,
>> "mus_musculus", "ENSEMBL_ORTHOLOGUES");
>>
>> will return [$mmus2_member, $mmus2'_member]. If you want to get
>> the intra-species paralogues as well, you can add:
>>
>> $homology_adaptor->fetch_all_by_Member_paired_species($hsap2_member,
>> "homo_sapiens", "ENSEMBL_PARALOGUES");
>>
>> I hope this helps
>>
>> Javier
>>
>> On 12/02/12 01:24, Jason Merkin wrote:
>>> Hello. I am trying to identify duplications that have occured
>>> within a group of species. I have gone through the tutorial and
>>> the mailing list archives and couldn't find anything on it. I
>>> will use the example on the webpage that explains the homology
>>> definitions
>>> (http://ensembl.genomics.org.cn:8058/info/docs/compara/homology_method.html)
>>> to illustrate what I am trying to do. Using just human and mouse,
>>> as on the diagram, I would like to query with Hsap1 and get the
>>> set of (Hsap1, Mmus1); query with Hsap2 and get (Hsap2, Hsap2',
>>> Mmus2, Mmus2'); and query with Hsap3 and get (Hsap3, Mmus3,
>>> Mmus3'). Is there a way to specify the homology type and, more
>>> importantly, restrict the species to be considered for definining
>>> the homology?
>>>
>>> Thanks for any help,
>>> Jason Merkin
>>>
More information about the Dev
mailing list