[ensembl-dev] compara api: paralogous DNAs

Jin-Rui Xu jrxu.bioinf at gmail.com
Mon Mar 18 16:53:32 GMT 2019


Hi Matthieu,

I am going to use the human self-alignment to detect paralogous genomic
regions (particularly non-coding regions), but I cannot find any examples of
using the API for this purpose. Could you send me some scripts or examples
to start from? Say I have a human genomic coordinate and want to find its
paralogous regions and alignments.
Many thanks.
Jinrui

On Wed, Mar 13, 2019 at 8:37 AM Matthieu Muffato <muffato at ebi.ac.uk> wrote:

> Hi Jinrui
>
> In all our pairwise alignments, we refine the LastZ alignment blocks with
> two steps called "chaining" and "netting" (see
> http://europepmc.org/articles/PMC4852398 and
> http://genomewiki.ucsc.edu/index.php/Chains_Nets for more information).
> What you get in our database is the product of these two steps.
> The netting phase is done on the reference species only; we don't do
> bidirectional netting. This means that there is very little overlap or
> nesting on the reference species (human in the case of the human vs *
> alignments). Overlap and nesting are allowed on the non-reference species,
> though. For instance, in the human-mouse alignments, there are 20,000 pairs
> of blocks that overlap on human, and 1,900,000 pairs of blocks that overlap
> on mouse.
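A minimal Python sketch of what "pairs of blocks that overlap" on one species means: per sequence region, sort the blocks by start and, for each block, count how many earlier blocks have not yet ended. The `(seq_region, start, end)` tuples with 1-based inclusive coordinates are an assumed input format; this only illustrates the idea and is not the procedure Compara used to produce the figures above.

```python
import heapq
from collections import defaultdict
from typing import Iterable, Tuple

# Assumed input format: (seq_region, start, end), 1-based inclusive coordinates.
Block = Tuple[str, int, int]


def count_overlapping_pairs(blocks: Iterable[Block]) -> int:
    """Count unordered pairs of blocks that share at least one base."""
    by_region = defaultdict(list)
    for region, start, end in blocks:
        by_region[region].append((start, end))

    total = 0
    for intervals in by_region.values():
        intervals.sort()
        open_ends = []  # min-heap of end coordinates of earlier blocks
        for start, end in intervals:
            # Earlier blocks ending before this start cannot overlap this
            # block or any later one (later blocks start even further right).
            while open_ends and open_ends[0] < start:
                heapq.heappop(open_ends)
            # Every block still open overlaps the current one.
            total += len(open_ends)
            heapq.heappush(open_ends, end)
    return total
```

For example, the blocks `("1", 100, 200)`, `("1", 150, 250)` and `("1", 240, 300)` form two overlapping pairs; a block on another sequence region is never compared with them.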
>
> So yes, in this case you can identify human paralogous regions 1) through
> the self-alignment and 2) through the human-mouse alignment (or any
> pairwise alignment that involves human) by finding human regions that align
> to the same region in the other species.
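The second approach can be sketched in Python as follows, assuming each pairwise block has already been reduced to a hypothetical pair of intervals (human, other species). Two human blocks whose partner intervals overlap in the other species, while not overlapping each other on human, are reported as candidate paralogues. The pairwise comparison is quadratic, so genome-wide data would need a sort-based sweep or an interval tree instead.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    seq_region: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same region."""
    return a.seq_region == b.seq_region and a.start <= b.end and b.start <= a.end


def candidate_paralogues(
    blocks: List[Tuple[Interval, Interval]],
) -> List[Tuple[Interval, Interval]]:
    """Given (human, other_species) interval pairs, return pairs of human
    intervals that align to overlapping regions of the other species."""
    result = []
    for (h1, o1), (h2, o2) in combinations(blocks, 2):
        # Overlapping human intervals are the same locus, not paralogues.
        if overlaps(o1, o2) and not overlaps(h1, h2):
            result.append((h1, h2))
    return result
```

With netting done on human, overlap on the human side should be rare, so most reported pairs come from overlap on the non-reference side, which is exactly the property described above.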
>
> Hope this helps,
>
> Matthieu
> On 11/03/2019 19:45, Jin-Rui Xu wrote:
>
> Hi Matthieu,
>
> Thank you very much for your email.
>
> I am wondering about the following: in the human self-alignment, one
> genomic region may be mapped to multiple other regions. These multiple hits
> also exist in, e.g., the human vs mouse genome alignment.
> Does Ensembl provide all of these regions or just the best one? Can
> these multiple hits be retrieved with the Compara Perl API?
>
> Thanks!
> Jinrui
>
> On Mon, Mar 11, 2019 at 3:05 PM Matthieu Muffato <muffato at ebi.ac.uk>
> wrote:
>
>> Dear Jinrui,
>>
>> We have a human self-alignment that has been computed with LastZ and
>> identifies paralogous regions within the genome. You can find the whole
>> alignment on the FTP site
>> ftp://ftp.ensembl.org/pub/current_maf/ensembl-compara/pairwise_alignments/
>> but you can also query specific regions:
>>
>> http://rest.ensembl.org/alignment/region/homo_sapiens/17:63997797-64000390:1?species_set=homo_sapiens;content-type=application/json;method=LASTZ_NET
>>
>> Human is the only species for which we have a self-alignment.
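A minimal Python sketch of querying the human self-alignment for one region through the REST endpoint in the example URL, then keeping only the aligned segments that do not overlap the query itself. The endpoint path and parameters are copied from that URL; the JSON field names (`alignments`, `seq_region`, `start`, `end`, `strand`) are an assumption about the response layout and should be checked against the REST documentation. Treating any segment that overlaps the query coordinates as the query copy is also an assumption (it would hide a tandem duplicate overlapping the query).

```python
import json
import urllib.request
from typing import Dict, List

# Endpoint and parameters mirror the example URL above. The JSON field
# names used in paralogous_hits() are an assumed response layout.
SERVER = "https://rest.ensembl.org"


def build_url(seq_region: str, start: int, end: int, strand: int = 1) -> str:
    """URL for the human self-alignment (LASTZ_NET) over one region."""
    return (f"{SERVER}/alignment/region/homo_sapiens/{seq_region}:{start}-{end}:{strand}"
            "?species_set=homo_sapiens;content-type=application/json;method=LASTZ_NET")


def fetch_alignments(seq_region: str, start: int, end: int) -> List[dict]:
    """Download the alignment blocks for the region (needs network access)."""
    with urllib.request.urlopen(build_url(seq_region, start, end)) as resp:
        return json.load(resp)


def paralogous_hits(blocks: List[dict], seq_region: str,
                    start: int, end: int) -> List[Dict]:
    """Return aligned human segments that do not overlap the query region."""
    hits = []
    for block in blocks:
        for seg in block.get("alignments", []):
            # Assumption: the segment overlapping the query is the query copy.
            is_query = (seg.get("seq_region") == seq_region
                        and seg.get("start", 0) <= end
                        and start <= seg.get("end", 0))
            if not is_query:
                hits.append({k: seg.get(k)
                             for k in ("seq_region", "start", "end", "strand")})
    return hits
```

For the region in the example URL, `paralogous_hits(fetch_alignments("17", 63997797, 64000390), "17", 63997797, 64000390)` would list the candidate paralogous segments, one dictionary per aligned copy.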
>>
>> Kind regards,
>> Matthieu
>>
>> On 09/03/2019 03:10, Jin-Rui Xu wrote:
>> > Hello,
>> >
>> > I just started learning the Compara API. However, I am still not sure
>> > whether it can address my questions. I am wondering if someone could
>> > give me some guidance and example scripts. Here are my questions: (1) I
>> > want to identify all paralogous DNA fragments (not necessarily genes)
>> > in a genome. One genomic region may have more than one duplicate. (2)
>> > Then, I want to find in which of the other species the two paralogous
>> > DNAs have a common ancestor.
>> > Alternatively, I can focus on two genomic regions in a genome to test
>> > whether they are paralogous, and then determine which species has their
>> > common ancestral DNA.
>> > How could I get this done using the Compara API (version 95)?
>> >
>> > Many thanks!
>> >
>> > Jinrui
>>
>> --
>> Matthieu Muffato, Ph.D.
>> Ensembl Compara and TreeFam Project Leader
>> European Bioinformatics Institute (EMBL-EBI)
>> European Molecular Biology Laboratory
>> Wellcome Trust Genome Campus, Hinxton
>> Cambridge, CB10 1SD, United Kingdom
>> Room  A3-145
>> Phone + 44 (0) 1223 49 4631
>> Fax   + 44 (0) 1223 49 4468
>>