[ensembl-dev] extracting human mouse between species out paralogs

Stamboulian, Mouses Hrag mstambou at indiana.edu
Thu Oct 8 23:31:25 BST 2015


Dear Matthieu,

Thanks a lot for the script. When I ran it, I started getting output like the sample below. It is also taking a very long time to print anything; the script has been running for hours. Is that normal?

I have some questions about the script's output. I don't really understand what lines such as [10100/20667] mean. My assumption is that they refer to the orthologous gene pairs; is that a safe assumption? For the other kind of output I'm getting, e.g. ENSGT00530000064989     ENSMUSP00000100418      ENSP00000439668, I'm assuming the first column is the gene-tree ID, the second is the mouse paralogous protein ID, and the third is the human paralogous protein ID. Is that correct?

One last thing: could we modify the script so that it prints the output with the columns Tree_ID, Mouse_Gene_ID, Mouse_Protein_ID, Human_Gene_ID, Human_Protein_ID and Paralogy_Confidence?

If the paralogy confidence cannot be inferred, that's fine.
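For illustration, a minimal Perl sketch of the requested row layout. It only formats values it is given: the gene IDs would still have to be looked up through the Compara API, which is not shown here. The hash keys (tree_id, mouse_gene, and so on) and the use of NA for missing values, such as a confidence that cannot be inferred, are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Column order requested above. The key names are hypothetical,
# chosen only for this sketch.
my @COLUMNS = qw(tree_id mouse_gene mouse_protein human_gene human_protein confidence);

# Header line for the tab-separated output.
sub header_line {
    return join("\t", qw(Tree_ID Mouse_Gene_ID Mouse_Protein_ID
                         Human_Gene_ID Human_Protein_ID Paralogy_Confidence));
}

# Format one human-mouse pair as a tab-separated row. Any value that is
# missing (for example a confidence that could not be inferred) is
# printed as "NA".
sub format_paralog_row {
    my (%pair) = @_;
    return join("\t", map { defined $pair{$_} ? $pair{$_} : 'NA' } @COLUMNS);
}

# Example using IDs from the output below; the gene IDs are not known
# from that output, so they come out as NA.
print header_line(), "\n";
print format_paralog_row(
    tree_id       => 'ENSGT00530000064989',
    mouse_protein => 'ENSMUSP00000100418',
    human_protein => 'ENSP00000439668',
), "\n";
```

Printing NA rather than an empty field keeps every row at six columns, so the file stays easy to load into a spreadsheet or R.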

Thanks a lot.    

[10100/20667]
[10200/20667]
[10300/20667]
[10400/20667]
[10500/20667]
[10600/20667]
[10700/20667]
[10800/20667]
[10900/20667]
[11000/20667]
[11100/20667]
[11200/20667]
[11300/20667]
ENSGT00770000120830     ENSMUSP00000051355      ENSP00000476742
[11400/20667]
[11500/20667]
ENSGT00530000064989     ENSMUSP00000100418      ENSP00000439668
ENSGT00530000064989     ENSMUSP00000100418      ENSP00000446309
ENSGT00530000064989     ENSMUSP00000143226      ENSP00000439668
ENSGT00530000064989     ENSMUSP00000143226      ENSP00000446309
ENSGT00530000064989     ENSMUSP00000136007      ENSP00000439668
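To pull the data rows out of mixed output like the above, a small Perl sketch can classify each line. It assumes two things not confirmed in this thread: that "[done/total]" lines are only a progress counter, and that data lines hold three whitespace-separated columns in the order guessed above (gene-tree ID, mouse protein ID, human protein ID).

```perl
use strict;
use warnings;

# Classify one line of the script's output. Assumptions (not confirmed in
# this thread): "[done/total]" lines are a progress counter, and data lines
# have three whitespace-separated columns in the order
# gene-tree ID, mouse protein ID, human protein ID.
sub parse_output_line {
    my ($line) = @_;
    chomp $line;
    if ($line =~ /^\[(\d+)\/(\d+)\]$/) {
        return { type => 'progress', done => $1, total => $2 };
    }
    # split ' ' splits on runs of whitespace (tabs or spaces) and
    # ignores leading whitespace.
    my @cols = split ' ', $line;
    if (@cols == 3) {
        return {
            type          => 'pair',
            tree_id       => $cols[0],
            mouse_protein => $cols[1],
            human_protein => $cols[2],
        };
    }
    return { type => 'other', text => $line };
}

# Example: keep only the data rows from a few lines of the sample output.
my @sample = (
    "[11300/20667]",
    "ENSGT00770000120830     ENSMUSP00000051355      ENSP00000476742",
    "[11400/20667]",
);
for my $parsed (map { parse_output_line($_) } @sample) {
    next unless $parsed->{type} eq 'pair';
    print join("\t", @{$parsed}{qw(tree_id mouse_protein human_protein)}), "\n";
}
```

The same function can be applied line by line to a saved copy of the output, keeping only the rows of type 'pair'.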

________________________________________
From: dev-bounces at ensembl.org <dev-bounces at ensembl.org> on behalf of Matthieu Muffato <muffato at ebi.ac.uk>
Sent: Thursday, October 8, 2015 1:44 PM
To: Ensembl developers list
Subject: Re: [ensembl-dev] extracting human mouse between species out paralogs

Dear Mouses,

Your description of BioMart and the API is correct, but it doesn't work
because we don't store between-species paralogs in the databases.

A solution is to write a (more complicated) script that goes through all
the gene trees and selects, within each tree, all the human-mouse pairs
that are not orthologues. I attach a script that should work. Let me know
if you find any issues.

Regards,
Matthieu, Ensembl Compara

On 08/10/15 00:29, Stamboulian, Mouses Hrag wrote:
> Hi,
>
>
> I'm trying to extract the human-mouse between-species out-paralogs. I
> tried using the Ensembl BioMart web interface, but I could not
> extract the needed data: when I select Homo sapiens as my dataset
> and then select Homologs as the attribute, the Paralogs section
> only offers options for human paralogs (i.e. within-species
> paralogs); there are no options for between-species paralogs.
>
>
> Furthermore, I tried extracting the data through the Perl API. To do
> so, I tried to modify the script below by changing the parameter on
> the bolded line of the code, in the call to
> fetch_by_method_link_type_registry_aliases(), replacing
> 'ENSEMBL_ORTHOLOGUES' with 'ENSEMBL_PARALOGUES' or 'ENSEMBLE_HOMOLOGUES',
> hoping it would return the paralogs or all the homologs in general.
> However, it failed to do that. I could not find which other values I
> could pass to this method instead of 'ENSEMBL_ORTHOLOGUES', as I could
> not find them in your documentation here:
> http://www.ensembl.org/info/docs/Doxygen/compara-api/classBio_1_1EnsEMBL_1_1Compara_1_1DBSQL_1_1MethodLinkSpeciesSetAdaptor.html#aeb42739559569b62ee3bfab6da764976
>
>
> My question is: what script would retrieve such data? Any help would
> be appreciated. Thank you.
>
>
> use strict;
> use warnings;
>
> use Bio::EnsEMBL::Registry;
>
> ## Load the registry automatically
> my $reg = "Bio::EnsEMBL::Registry";
> $reg->load_registry_from_url('mysql://anonymous@ensembldb.ensembl.org');
>
> ## Get the compara mlss adaptor
> my $mlss_adaptor = $reg->get_adaptor("Multi", "compara",
> "MethodLinkSpeciesSet");
>
> ## Get the compara homology adaptor
> my $homology_adaptor = $reg->get_adaptor("Multi", "compara", "Homology");
>
> ## Species definition
> my $species1 = 'human';
> my $species2 = 'mouse';
>
> ## Get the MethodLinkSpeciesSet object describing the orthology between
> ## the two species (this is the "bolded" line mentioned above)
> my $this_mlss =
>     $mlss_adaptor->fetch_by_method_link_type_registry_aliases('ENSEMBL_ORTHOLOGUES',
>     [$species1, $species2]);
>
> ## Get all the homologues
> my $all_homologies =
> $homology_adaptor->fetch_all_by_MethodLinkSpeciesSet($this_mlss);
>
> ## For each homology
> my $count = 0;
> foreach my $this_homology (@{$all_homologies}) {
>
>    ## only keeps the one2one
>    if ($this_homology->description() eq 'ortholog_one2one') {
>      $count++;
>    }
> }
>
> print "There are $count 1-to-1 orthologues between $species1 and $species2\n";
>
> ## Alternative (shorter) version
> my $all_one2one =
> $homology_adaptor->fetch_all_by_MethodLinkSpeciesSet($this_mlss,
> -orthology_type => 'ortholog_one2one');
>
> print "It should be the same number as: ", scalar(@{$all_one2one}), "\n";
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>

--
Matthieu Muffato, Ph.D.
Ensembl Compara and TreeFam Project Leader
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room  A3-145
Phone + 44 (0) 1223 49 4631
Fax   + 44 (0) 1223 49 4468


