[ensembl-dev] compara api: paralogous DNAs

Matthieu Muffato muffato at ebi.ac.uk
Mon Mar 25 12:58:15 GMT 2019


Hi Jinrui,

To configure the Registry on human GRCh38 (i.e. the default Ensembl 
database), use

Bio::EnsEMBL::Registry->load_registry_from_url('mysql://anonymous@ensembldb.ensembl.org');

And for GRCh37:

Bio::EnsEMBL::Registry->load_registry_from_url('mysql://anonymous@ensembldb.ensembl.org:3337');

You can either directly call this in your script, or put it in a 
registry configuration file named "configuration_file" since you are 
calling $registry->load_all("configuration_file");
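As a minimal sketch, a "configuration_file" for GRCh37 could look like the following. It is plain Perl. The trailing "1;" is the usual convention for a Perl file loaded by another script, and I am assuming load_all() relies on it, so please check this against the Registry documentation.

```perl
# configuration_file: read by $registry->load_all("configuration_file")
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# GRCh37 databases are served on port 3337; drop ":3337" for GRCh38
Bio::EnsEMBL::Registry->load_registry_from_url(
    'mysql://anonymous@ensembldb.ensembl.org:3337'
);

# Return a true value so that loading the file is reported as successful
1;
```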

There is more documentation about using and setting up the Registry at 
http://www.ensembl.org/info/docs/api/registry.html
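
If you prefer to call it directly in your script, here is a rough sketch that picks the server URL from an assembly name and then lists the species names known to the Registry. The sub registry_url_for_assembly() and its hash are illustrative names of mine, not part of the Ensembl API; load_registry_from_url() and get_all_species() are the same Registry calls used elsewhere in this thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Map an assembly name to the public MySQL server URL.
# Hypothetical helper for illustration; not part of the Ensembl API.
sub registry_url_for_assembly {
    my ($assembly) = @_;
    my %url_for = (
        GRCh38 => 'mysql://anonymous@ensembldb.ensembl.org',
        GRCh37 => 'mysql://anonymous@ensembldb.ensembl.org:3337',
    );
    die "Unknown assembly '$assembly'\n" unless exists $url_for{$assembly};
    return $url_for{$assembly};
}

# Only contact the server when an assembly is given on the command line
my $assembly = shift @ARGV;
if (defined $assembly) {
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_url(registry_url_for_assembly($assembly));

    # get_all_species() returns an array reference of species names
    my @species_names = @{ $registry->get_all_species() };
    print "$_\n" for sort @species_names;
}
else {
    print STDERR "Usage: $0 GRCh38|GRCh37\n";
}
```

Saved as, say, list_species.pl, it would be run as "perl list_species.pl GRCh37".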

Hope this helps,
Matthieu

On 24/03/2019 16:04, Jin-Rui Xu wrote:
> Hi Matthieu,
>
> The script you provided helps a lot!
> For a genomic region in human, I need its alignment in another 
> species, so I need to know the species names. The short script 
> below returns no names. How can I find the species names used by 
> the Compara API? My other question is about the genome version: I 
> think the Perl API uses human GRCh38 by default, but how can I use 
> GRCh37 with it?
> Many thanks!
> Jinrui
>
> use Bio::EnsEMBL::Registry;
>
> my $registry = 'Bio::EnsEMBL::Registry';
>
> $registry->load_all("configuration_file");
>
> my @species_names = @{ $registry->get_all_species() };
>
> print "@species_names\n";
>
>
>
> On Wed, Mar 20, 2019 at 1:58 PM Matthieu Muffato <muffato at ebi.ac.uk> wrote:
>
>     Hi Jinrui
>
>     Have a look back at my first email. There is an example URL for our
>     REST API server that returns the alignment of a human region.
>
>     If you want to use the Perl API, you can adapt
>     https://github.com/Ensembl/ensembl-presentation/blob/master/API/Compara/exercises/gab1.pl
>
>     Regards,
>     Matthieu
>
>     On 18/03/2019 16:53, Jin-Rui Xu wrote:
>>     Hi Matthieu,
>>
>>     I am going to use the human self-alignment to detect paralogous
>>     genomic regions (particularly non-coding regions), but I cannot
>>     find API examples for this purpose. Could you point me to some
>>     scripts or examples to start from? Say I have a human genomic
>>     coordinate and want to find its paralogous regions and alignments.
>>     Many thanks.
>>     Jinrui
>>
>>     On Wed, Mar 13, 2019 at 8:37 AM Matthieu Muffato
>>     <muffato at ebi.ac.uk> wrote:
>>
>>         Hi Jinrui
>>
>>         In all our pairwise alignments, we refine the LastZ alignment
>>         blocks with two steps called "chaining" and "netting" (see
>>         http://europepmc.org/articles/PMC4852398 and
>>         http://genomewiki.ucsc.edu/index.php/Chains_Nets for more
>>         information). What you get in our database is the product of
>>         these two steps.
>>         The netting phase is done on the reference species only; we
>>         don't do bidirectional netting. This means that there is very
>>         little overlap / nesting on the reference species (human in
>>         the case of the human vs * alignments). Overlap / nesting is
>>         allowed on the non-reference species, though. For instance,
>>         in the human-mouse alignments, there are 20,000 pairs of
>>         blocks that overlap on human, and 1,900,000 pairs of blocks
>>         that overlap on mouse.
>>
>>         So in this case, yes, you can identify human paralogous
>>         regions 1) through the self-alignment and 2) through the
>>         human-mouse alignment (or any pairwise alignment that
>>         involves human) by finding human regions that align to the
>>         same region in the other species.
>>
>>         Hope this helps,
>>
>>         Matthieu
>>
>>         On 11/03/2019 19:45, Jin-Rui Xu wrote:
>>>         Hi Matthieu,
>>>
>>>         Thank you very much for your email.
>>>
>>>         I am wondering about the human self-alignment: one genomic
>>>         region may map to multiple other regions. Such multiple
>>>         hits also exist in, e.g., the human vs mouse genome
>>>         alignment.
>>>         Does Ensembl provide all of these regions or just the
>>>         best one? Can the multiple hits be retrieved with the
>>>         Compara Perl API?
>>>
>>>         Thanks!
>>>         Jinrui
>>>
>>>
>>>
>>>
>>>         On Mon, Mar 11, 2019 at 3:05 PM Matthieu Muffato
>>>         <muffato at ebi.ac.uk> wrote:
>>>
>>>             Dear Jinrui,
>>>
>>>             We have a human self-alignment that has been computed with
>>>             LastZ and identifies paralogous regions within the genome.
>>>             You can find the whole alignment on the FTP
>>>             ftp://ftp.ensembl.org/pub/current_maf/ensembl-compara/pairwise_alignments/
>>>
>>>             but also query specific regions:
>>>             http://rest.ensembl.org/alignment/region/homo_sapiens/17:63997797-64000390:1?species_set=homo_sapiens;content-type=application/json;method=LASTZ_NET
>>>
>>>             Human is the only species for which we have a self-alignment.
>>>
>>>             Kind regards,
>>>             Matthieu
>>>
>>>             On 09/03/2019 03:10, Jin-Rui Xu wrote:
>>>             > Hello,
>>>             >
>>>             > I just started learning the Compara API. However, I am still not
>>>             > sure whether it can address my questions. I am wondering if
>>>             > someone could give me some guidance and example scripts. Here
>>>             > are my questions: (1) I want to identify all paralogous DNA
>>>             > fragments (not necessarily genes) in a genome. One genomic
>>>             > region may have more than one duplicate. (2) Then, I want to
>>>             > find in which of the other species the two paralogous DNAs have
>>>             > a common ancestor.
>>>             > Alternatively, I can focus on two genomic regions in a genome to
>>>             > test whether they are paralogous, and then find which species
>>>             > has their common ancestral DNA.
>>>             > How could I get this done using the Compara API (version 95)?
>>>             >
>>>             > Many thanks!
>>>             >
>>>             > Jinrui
>>>
>>>             -- 
>>>             Matthieu Muffato, Ph.D.
>>>             Ensembl Compara and TreeFam Project Leader
>>>             European Bioinformatics Institute (EMBL-EBI)
>>>             European Molecular Biology Laboratory
>>>             Wellcome Trust Genome Campus, Hinxton
>>>             Cambridge, CB10 1SD, United Kingdom
>>>             Room  A3-145
>>>             Phone + 44 (0) 1223 49 4631
>>>             Fax   + 44 (0) 1223 49 4468
>>>
>>
>
-- 
Matthieu Muffato, Ph.D.
Ensembl Compara and TreeFam Project Leader
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room  A3-145
Phone + 44 (0) 1223 49 4631
Fax   + 44 (0) 1223 49 4468
