[ensembl-dev] compara api: paralogous DNAs

Matthieu Muffato muffato at ebi.ac.uk
Wed Mar 20 17:58:41 GMT 2019


Hi Jinrui

Have a look back at my first email. It contains an example URL for our 
REST API server that retrieves the alignment of a human region.
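
A minimal Python sketch of such a query, for illustration only. It rebuilds the example URL from the first email; the function names are made up for this sketch, and the fetch step assumes network access to the public server.

```python
import json
import urllib.request

# Server used in the example URL of the first email in this thread.
REST_SERVER = "http://rest.ensembl.org"


def build_alignment_url(region, species="homo_sapiens",
                        species_set="homo_sapiens", method="LASTZ_NET"):
    """Build an alignment/region URL in the same form as the example URL.

    `region` is written as chromosome:start-end:strand,
    e.g. "17:63997797-64000390:1".
    """
    params = [
        ("species_set", species_set),
        ("content-type", "application/json"),
        ("method", method),
    ]
    # The example URL separates parameters with ';', so do the same here.
    query = ";".join(f"{key}={value}" for key, value in params)
    return f"{REST_SERVER}/alignment/region/{species}/{region}?{query}"


def fetch_alignment(url, timeout=30):
    """Download and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    url = build_alignment_url("17:63997797-64000390:1")
    print(url)
    # blocks = fetch_alignment(url)  # uncomment to query the server
```

Setting species_set to homo_sapiens is what selects the human self-alignment; changing it (for example to mus_musculus) would request a different pairwise alignment instead.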

If you want to use the Perl API, you can adapt 
https://github.com/Ensembl/ensembl-presentation/blob/master/API/Compara/exercises/gab1.pl
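
Whichever route is used, the alignment blocks overlapping a human coordinate can then be reduced to a list of candidate paralogous regions. The Python sketch below shows one way to do that. It assumes each block is a dictionary with an "alignments" list whose entries carry "seq_region", "start", "end" and "strand" keys (the layout the REST JSON output is assumed to have here); the function name and the sample coordinates are invented for illustration.

```python
def paralogous_regions(blocks, query_chrom, query_start, query_end):
    """Return sorted (seq_region, start, end, strand) tuples for every
    aligned sequence that is not the query interval itself."""
    hits = set()
    for block in blocks:
        for aln in block.get("alignments", []):
            same_chrom = str(aln["seq_region"]) == str(query_chrom)
            overlaps = aln["start"] <= query_end and aln["end"] >= query_start
            if same_chrom and overlaps:
                # Query side of the block. Note: a tandem duplicate that
                # overlaps the query interval would also be skipped here.
                continue
            hits.add((str(aln["seq_region"]), aln["start"], aln["end"], aln["strand"]))
    return sorted(hits)


# Made-up blocks in the assumed layout, not real Ensembl data.
example_blocks = [
    {"alignments": [
        {"seq_region": "17", "start": 63997800, "end": 63998200, "strand": 1},
        {"seq_region": "2", "start": 1000, "end": 1400, "strand": -1},
    ]},
    {"alignments": [
        {"seq_region": "17", "start": 63999000, "end": 63999300, "strand": 1},
        {"seq_region": "17", "start": 5000, "end": 5300, "strand": 1},
    ]},
]

if __name__ == "__main__":
    for hit in paralogous_regions(example_blocks, "17", 63997797, 64000390):
        print(hit)
```

Because the netting is done on the reference only, a query region can appear in several blocks, so all blocks are scanned rather than just the first one.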

Regards,
Matthieu

On 18/03/2019 16:53, Jin-Rui Xu wrote:
> Hi Matthieu,
>
> I am going to use the human self-alignment to detect paralogous 
> genomic regions (particularly non-coding regions), but I cannot find 
> API examples for this purpose. Could you pass me some scripts or 
> examples to start from? Say I have a human genomic coordinate and 
> want to find its paralogous regions and alignments.
> Many thanks.
> Jinrui
>
> On Wed, Mar 13, 2019 at 8:37 AM Matthieu Muffato <muffato at ebi.ac.uk> wrote:
>
>     Hi Jinrui
>
>     In all our pairwise alignments, we refine the LastZ alignment
>     blocks with two steps called "chaining" and "netting" (see
>     http://europepmc.org/articles/PMC4852398 and
>     http://genomewiki.ucsc.edu/index.php/Chains_Nets for more
>     information). What you get in our database is the product of these
>     two steps.
>     The netting phase is done on the reference species only; we don't
>     do bidirectional netting. This means that there is very little
>     overlap / nesting on the reference species (human in the case of
>     the human vs * alignments). Overlap / nesting is allowed on the
>     non-reference species, though. For instance, in the human-mouse
>     alignments, there are 20,000 pairs of blocks that overlap on
>     human, and 1,900,000 pairs of blocks that overlap on mouse.
>
>     So in this case, yes, you can identify human paralogous regions 1)
>     through the self-alignment and 2) through the human-mouse
>     alignment (or any pairwise alignment that involves human) by
>     finding human regions that align to the same region in the other
>     species.
>
>     Hope this helps,
>
>     Matthieu
>
>     On 11/03/2019 19:45, Jin-Rui Xu wrote:
>>     Hi Matthieu,
>>
>>     Thank you very much for your email.
>>
>>     I am wondering whether, in the human self-alignment, one genomic
>>     region may be mapped to multiple other regions. Such multiple hits
>>     also exist in e.g. the human vs mouse genome alignment.
>>     Does Ensembl provide all of these regions or just the best
>>     one? Can the multiple hits be retrieved with the Compara Perl API?
>>
>>     Thanks!
>>     Jinrui
>>
>>     On Mon, Mar 11, 2019 at 3:05 PM Matthieu Muffato
>>     <muffato at ebi.ac.uk> wrote:
>>
>>         Dear Jinrui,
>>
>>         We have a human self-alignment that has been computed with
>>         LastZ and identifies paralogous regions within the genome.
>>         You can find the whole alignment on the FTP
>>         ftp://ftp.ensembl.org/pub/current_maf/ensembl-compara/pairwise_alignments/
>>
>>         but also query specific regions:
>>         http://rest.ensembl.org/alignment/region/homo_sapiens/17:63997797-64000390:1?species_set=homo_sapiens;content-type=application/json;method=LASTZ_NET
>>
>>         Human is the only species for which we have a self-alignment.
>>
>>         Kind regards,
>>         Matthieu
>>
>>         On 09/03/2019 03:10, Jin-Rui Xu wrote:
>>         > Hello,
>>         >
>>         > I just started learning the Compara API. However, I am still
>>         > not sure whether it can address my questions. I am wondering
>>         > if someone could give me some guidance and example scripts.
>>         > Here are my questions: (1) I want to identify all paralogous
>>         > DNA fragments (not necessarily genes) in a genome. One
>>         > genomic region may have more than one duplicate. (2) Then, I
>>         > want to find in which of the other species the two
>>         > paralogous DNAs have a common ancestor.
>>         > Alternatively, I can focus on two genomic regions in a
>>         > genome to test whether they are paralogous, and then find
>>         > which species has their common ancestral DNA.
>>         > How could I get this done using the Compara API (version 95)?
>>         >
>>         > Many thanks!
>>         >
>>         > Jinrui
>>
>>         -- 
>>         Matthieu Muffato, Ph.D.
>>         Ensembl Compara and TreeFam Project Leader
>>         European Bioinformatics Institute (EMBL-EBI)
>>         European Molecular Biology Laboratory
>>         Wellcome Trust Genome Campus, Hinxton
>>         Cambridge, CB10 1SD, United Kingdom
>>         Room  A3-145
>>         Phone + 44 (0) 1223 49 4631
>>         Fax   + 44 (0) 1223 49 4468
>>
>
-- 
Matthieu Muffato, Ph.D.
Ensembl Compara and TreeFam Project Leader
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room  A3-145
Phone + 44 (0) 1223 49 4631
Fax   + 44 (0) 1223 49 4468
