[ensembl-dev] Havana Gene in Compara

Nick Fankhauser lists at nyk.ch
Thu May 12 00:07:52 BST 2011


Hi!

I have now run into a problem with this solution. It translates my 
list of Vega stable_ids into Ensembl core stable_ids, but for all but 3 
of my 524 ncRNA genes it fails to get a Compara member 
(fetch_by_source_stable_id returns nothing)!
Below is my code, and further below the output up to the first gene 
that has both a member and a mouse homology.
Is there any way to map the remaining IDs?

code:
--------------
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

Bio::EnsEMBL::Registry->load_registry_from_db(-host => 'ensembldb.ensembl.org', -user => 'anonymous');
my $ncbi_taxonomy_id = 10090; # mouse (Mus musculus)

# Map a Vega stable_id to its Ensembl core gene(s) and return the first mouse homolog found.
sub core_gene {
     my $vega_stable_id = shift;
     my $human_gene_adaptor = Bio::EnsEMBL::Registry->get_adaptor("Homo sapiens", "core", "Gene");
     my $all_genes = $human_gene_adaptor->fetch_all_by_external_name($vega_stable_id);
     foreach my $gene (@$all_genes) {
         print "$vega_stable_id -> ", $gene->stable_id, "\n";
         my $member_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara', 'Member');
         my $member = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", $gene->stable_id);
         # no member is returned for 521 of 524 ncRNA Vega stable_ids
         next if !$member;
         my $homolog = mouse_homolog($member);
         next if !$homolog;
         print "Homolog: ", $homolog, "\n";
         return $homolog;
     }
     return; # no gene with both a member and a mouse homolog
}

# Return the stable_id of the first mouse member among the homologies of $member.
sub mouse_homolog {
     my $member = shift;
     my $homology_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara', 'Homology');
     my $homologies = $homology_adaptor->fetch_all_by_Member($member);
     foreach my $homology (@{$homologies}) {
         foreach my $member_attribute (@{$homology->get_all_Member_Attribute}) {
             # renamed from $a, which is reserved for sort comparisons
             my ($m, $attribute) = @{$member_attribute};
             if ($m->taxon_id == $ncbi_taxonomy_id) { return $m->stable_id; }
         }
     }
     return; # no mouse homolog
}

# Read one Vega stable_id per line from the file given on the command line.
my $fn = shift or die "Usage: $0 <file of Vega stable_ids>\n";
open(my $fh, '<', $fn) or die "Cannot open $fn: $!";
while (my $line = <$fh>) {
     chomp $line;
     my $homolog = core_gene($line);
}
close($fh);
--------------

output:
--------------
OTTHUMG00000090443 -> ENSG00000224413
OTTHUMG00000150429 -> ENSG00000225007
OTTHUMG00000017909 -> ENSG00000223834
OTTHUMG00000016051 -> ENSG00000234519
OTTHUMG00000016040 -> ENSG00000234768
OTTHUMG00000151887 -> ENSG00000231704
OTTHUMG00000019290 -> ENSG00000226900
OTTHUMG00000032809 -> ENSG00000231078
OTTHUMG00000152824 -> ENSG00000235499
OTTHUMG00000151424 -> ENSG00000224128
OTTHUMG00000151425 -> ENSG00000230090
OTTHUMG00000085959 -> ENSG00000205673
OTTHUMG00000009304 -> ENSG00000231485
OTTHUMG00000020889 -> ENSG00000225565
OTTHUMG00000017195 -> ENSG00000215417
OTTHUMG00000040014 -> ENSG00000226828
OTTHUMG00000132462 -> ENSG00000131797
OTTHUMG00000020188 -> ENSG00000237372
OTTHUMG00000032039 -> ENSG00000225321
OTTHUMG00000153173 -> ENSG00000226791
OTTHUMG00000016038 -> ENSG00000235994
OTTHUMG00000017876 -> ENSG00000235824
OTTHUMG00000007961 -> ENSG00000225028
OTTHUMG00000002406 -> ENSG00000228327
OTTHUMG00000002402 -> ENSG00000237491
OTTHUMG00000150668 -> ENSG00000234928
OTTHUMG00000032149 -> ENSG00000227195
OTTHUMG00000020880 -> ENSG00000235106
OTTHUMG00000040716 -> ENSG00000230699
OTTHUMG00000013955 -> ENSG00000234232
OTTHUMG00000041633 -> ENSG00000233430
OTTHUMG00000086925 -> ENSG00000228709
OTTHUMG00000032037 -> ENSG00000234862
OTTHUMG00000058806 -> ENSG00000141028
OTTHUMG00000153079 -> ENSG00000226508
OTTHUMG00000007855 -> ENSG00000160062
Homolog: ENSMUSG00000028807

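One way to narrow down where the member lookups fail might be to record, for each core gene, its biotype and whether a member came back, and then tally the results per biotype. The following is only a sketch under stated assumptions: `tally_by_biotype` and the record layout are hypothetical names and not part of the Ensembl API, and the records shown are placeholders rather than real results. In the script above they would be filled from `$gene->stable_id`, `$gene->biotype` and the result of `fetch_by_source_stable_id`.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): count, per biotype,
# how many genes did and did not yield a Compara member.
# Each record is a hash ref: { stable_id => ..., biotype => ..., has_member => 0|1 }.
sub tally_by_biotype {
    my ($records) = @_;
    my %tally;
    foreach my $record (@$records) {
        my $outcome = $record->{has_member} ? 'with_member' : 'without_member';
        $tally{ $record->{biotype} }{$outcome}++;
    }
    return \%tally;
}

# Inside core_gene() the records could be collected like this:
#   push @records, {
#       stable_id  => $gene->stable_id,
#       biotype    => $gene->biotype,
#       has_member => $member ? 1 : 0,
#   };

# Placeholder records for illustration only; IDs and biotypes are made up.
my @records = (
    { stable_id => 'GENE_A', biotype => 'biotype_x', has_member => 0 },
    { stable_id => 'GENE_B', biotype => 'biotype_x', has_member => 0 },
    { stable_id => 'GENE_C', biotype => 'biotype_y', has_member => 1 },
);

# Print one summary line per biotype.
my $tally = tally_by_biotype(\@records);
foreach my $biotype (sort keys %$tally) {
    printf "%s: %d with member, %d without\n",
        $biotype,
        $tally->{$biotype}{with_member}    // 0,
        $tally->{$biotype}{without_member} // 0;
}
```

If the genes without a member turn out to share a biotype, that would suggest the gap lies in which gene classes the Compara database covers rather than in the ID translation step.
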
Thanks for any help!

Nick

On 05/10/2011 06:07 PM, muffato at ebi.ac.uk wrote:
> Hi Nick
>
> The only available sources for the Compara Member object are ENSEMBLPEP,
> ENSEMBLGENE, Uniprot/SWISSPROT, and Uniprot/SPTREMBL. With an Havana gene
> name, you can go first through the core database to retrieve the Ensembl
> stable gene id.
>
> my $human_gene_adaptor = Bio::EnsEMBL::Registry->get_adaptor("Homo
> sapiens", "core", "Gene");
> my $all_genes =
> $human_gene_adaptor->fetch_all_by_external_name('OTTHUMG00000023246');
>
> ## For each of these genes...
> foreach my $gene (@$all_genes) {
>    ## Get the compara member
>    my $member = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE",
> $gene->stable_id);
>    print $member->description,"\n";
> }
>
> Hope this helps,
> Matthieu
>
>> I'm trying to get a member adaptor for Havana Genes (ncRNAs), for
>> example OTTHUMG00000002858.
>> The following script works for Ensembl Genes, but how do I have to
>> formulate it for Havana Genes?
>>
>> #!/usr/bin/perl
>> use lib "/opt/ensembl/modules";
>> use lib "/opt/ensembl-compara/modules";
>> use strict;
>> use Bio::EnsEMBL::Registry;
>>
>> Bio::EnsEMBL::Registry->load_registry_from_db(-host =>
>> 'ensembldb.ensembl.org', -user =>  'anonymous');
>> my $member_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi',
>> 'compara', 'Member');
>> throw("Cannot connect to Compara") if (!$member_adaptor);
>>
>> # works:
>> my $member =
>> $member_adaptor->fetch_by_source_stable_id('ENSEMBLGENE','ENSG00000004059');
>> die("Gene not found!") if (!$member);
>> print $member->description,"\n";
>>
>> # does not work:
>> my $member =
>> $member_adaptor->fetch_by_source_stable_id('HAVANAGENE','OTTHUMG00000002858');
>> die("Gene not found") if (!$member);
>> print $member->description,"\n";
>>
>> The program dies on the second "fetch_by_source_stable_id" for
>> OTTHUMG00000002858. How can I fetch an adaptor for it?
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe):
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>
