[ensembl-dev] compara api: paralogous DNAs
Matthieu Muffato
muffato at ebi.ac.uk
Mon Mar 25 12:58:15 GMT 2019
Hi Jinrui,
To configure the Registry on human GRCh38 (i.e. the default Ensembl
database), use
Bio::EnsEMBL::Registry->load_registry_from_url('mysql://anonymous@ensembldb.ensembl.org');
And for GRCh37:
Bio::EnsEMBL::Registry->load_registry_from_url('mysql://anonymous@ensembldb.ensembl.org:3337');
You can either call this directly in your script, or put it in a
registry configuration file named "configuration_file", since your
script calls $registry->load_all("configuration_file");
There is more documentation about using and setting up the Registry at
http://www.ensembl.org/info/docs/api/registry.html
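For instance, here is a minimal sketch of the first option (calling it
directly in the script). The registry_url_for and print_species_names
helpers and the %REGISTRY_URL table are hypothetical names made up for
this example, not part of the Ensembl API. It assumes the Ensembl Perl
API is installed and the public server is reachable:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Public Ensembl MySQL servers named in this thread: the default port
# serves GRCh38, port 3337 serves GRCh37.
my %REGISTRY_URL = (
    GRCh38 => 'mysql://anonymous@ensembldb.ensembl.org',
    GRCh37 => 'mysql://anonymous@ensembldb.ensembl.org:3337',
);

# Hypothetical helper (not part of the Ensembl API): return the registry
# URL for an assembly name, or die if the name is not known.
sub registry_url_for {
    my ($assembly) = @_;
    my $url = $REGISTRY_URL{$assembly}
        or die "Unknown assembly '$assembly' (expected GRCh38 or GRCh37)\n";
    return $url;
}

# Hypothetical helper: load the Registry for the given assembly and print
# the species names it knows about, one per line.
sub print_species_names {
    my ($assembly) = @_;
    require Bio::EnsEMBL::Registry;    # loaded only when actually needed
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_url( registry_url_for($assembly) );
    my @species_names = @{ $registry->get_all_species() };
    print "$_\n" for sort @species_names;
    return scalar @species_names;
}

# Contact the server only when an assembly is given on the command line.
print_species_names( $ARGV[0] ) if @ARGV;
```

Run it as e.g. "perl list_species.pl GRCh37" (the script name is
arbitrary). Without an argument it does nothing.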
Hope this helps,
Matthieu
On 24/03/2019 16:04, Jin-Rui Xu wrote:
> Hi Matthieu,
>
> The script you provided helps a lot!
> For a genomic region in human, I need its alignment in another
> species. Therefore, I need to know the species names. Here is the
> short script, but it returns no names. How do I find the
> species names used by the Compara API? My other question is about the
> genome version: I think the Perl API uses human genome build 38 by
> default, so how can I use build 37 with the API?
> Many thanks!
> Jinrui
>
> use Bio::EnsEMBL::Registry;
>
> my $registry = 'Bio::EnsEMBL::Registry';
>
> $registry->load_all("configuration_file");
>
> my @species_names = @{ $registry->get_all_species() };
>
> print "@species_names\n";
>
> On Wed, Mar 20, 2019 at 1:58 PM Matthieu Muffato <muffato at ebi.ac.uk> wrote:
>
> Hi Jinrui
>
> Have a look back at my first email. It includes an example URL for our
> REST API server that returns the alignment of a human region.
>
> If you want to use the Perl API, you can adapt
> https://github.com/Ensembl/ensembl-presentation/blob/master/API/Compara/exercises/gab1.pl
>
> Regards,
> Matthieu
>
> On 18/03/2019 16:53, Jin-Rui Xu wrote:
>> Hi Matthieu,
>>
>> I am going to use the human self-alignment to detect paralogous
>> genomic regions (particularly non-coding regions), but I cannot
>> find API examples for this purpose. Could you pass me some
>> scripts or examples to start from? Say I have a human genomic
>> coordinate and want to find its paralogous regions and alignments.
>> Many thanks.
>> Jinrui
>>
>> On Wed, Mar 13, 2019 at 8:37 AM Matthieu Muffato <muffato at ebi.ac.uk> wrote:
>>
>> Hi Jinrui
>>
>> In all our pairwise alignments, we refine the LastZ alignment
>> blocks with two steps called "chaining" and "netting" (see
>> http://europepmc.org/articles/PMC4852398 and
>> http://genomewiki.ucsc.edu/index.php/Chains_Nets for more
>> information). What you get in our database is the product of
>> these two steps.
>> The netting phase is done on the reference species only; we
>> don't do bidirectional netting. This means that there is very
>> little overlap / nesting on the reference species (human in
>> the case of the human vs * alignments). Overlap / nesting is
>> allowed on the non-reference species, though. For instance,
>> in the human-mouse alignments, there are 20,000 pairs of
>> blocks that overlap on human, and 1,900,000 pairs of blocks
>> that overlap on mouse.
>>
>> So in this case, yes, you can identify human paralogous
>> regions 1) through the self-alignment and 2) through the
>> human-mouse alignment (or any pairwise alignment that
>> involves human) by finding human regions that align to the
>> same region in the other species.
>>
>> Hope this helps,
>>
>> Matthieu
>>
>> On 11/03/2019 19:45, Jin-Rui Xu wrote:
>>> Hi Matthieu,
>>>
>>> Thank you very much for your email.
>>>
>>> I am wondering about the human self-alignment: one genomic
>>> region may be mapped to multiple other regions. Such
>>> multiple hits also exist in, e.g., the human vs mouse genome
>>> alignment.
>>> Does Ensembl provide all of these regions or just the best
>>> one? Can these multiple hits be retrieved with the Compara Perl API?
>>>
>>> Thanks!
>>> Jinrui
>>>
>>>
>>>
>>>
>>> On Mon, Mar 11, 2019 at 3:05 PM Matthieu Muffato <muffato at ebi.ac.uk> wrote:
>>>
>>> Dear Jinrui,
>>>
>>> We have a human self-alignment that has been computed with LastZ and
>>> identifies paralogous regions within the genome. You can find the whole
>>> alignment on the FTP
>>> ftp://ftp.ensembl.org/pub/current_maf/ensembl-compara/pairwise_alignments/
>>>
>>> but also query specific regions:
>>> http://rest.ensembl.org/alignment/region/homo_sapiens/17:63997797-64000390:1?species_set=homo_sapiens;content-type=application/json;method=LASTZ_NET
>>>
>>> Human is the only species for which we have a self-alignment.
>>>
>>> Kind regards,
>>> Matthieu
>>>
>>> On 09/03/2019 03:10, Jin-Rui Xu wrote:
>>> > Hello,
>>> >
>>> > I just started learning the Compara API. However, I am still not sure
>>> > whether it can address my questions. I am wondering if someone could
>>> > give me some guidance and example scripts. Here is my question: (1) I
>>> > want to identify all paralogous DNA fragments (not necessarily genes)
>>> > in a genome. One genomic region may have more than one duplicate. (2)
>>> > Then, I want to find in which of the other species the two paralogous
>>> > DNAs have a common ancestor.
>>> > Alternatively, I can focus on two genomic regions in a genome to test
>>> > if they are paralogous, and then find which species has their common
>>> > ancestral DNA.
>>> > How could I get this done using the Compara API (version 95)?
>>> >
>>> > Many thanks!
>>> >
>>> > Jinrui
>>>
>>> --
>>> Matthieu Muffato, Ph.D.
>>> Ensembl Compara and TreeFam Project Leader
>>> European Bioinformatics Institute (EMBL-EBI)
>>> European Molecular Biology Laboratory
>>> Wellcome Trust Genome Campus, Hinxton
>>> Cambridge, CB10 1SD, United Kingdom
>>> Room A3-145
>>> Phone + 44 (0) 1223 49 4631
>>> Fax + 44 (0) 1223 49 4468
>>>
--
Matthieu Muffato, Ph.D.
Ensembl Compara and TreeFam Project Leader
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room A3-145
Phone + 44 (0) 1223 49 4631
Fax + 44 (0) 1223 49 4468