[ensembl-dev] get conservation of a region based on multi-species genomic alignment

Matthieu Muffato muffato at ebi.ac.uk
Tue Jun 16 14:59:44 BST 2015


Hi Julien,

On 11/06/15 10:03, Julien Roux wrote:
> Hi Matthieu and thanks for your reply
>> We have experience with GERP, and we routinely run it on the 39-way
>> and other sets (the ConservationScore object you've found in our API).
>> We don't do it for primates though, so you could use our API to dig
>> conservation between all *mammals*, not only primates.
>> The score() method of GenomicAlignBlock returns the LASTZ score of the
>> aligned regions. It's not a measure of conservation
> Ok, so if I want to dig up conservation across all mammals for a given
> alignment block, would the following command work?
> foreach my $genomic_align_block( @{ $genomic_align_blocks }) {
>    my $restricted_gab =
> $genomic_align_block->restrict_between_reference_positions($seq_region_start,
> $seq_region_end);
>    my $conservation =
> $conservation_score->genomic_align_block($restricted_gab);
> }
>

Can you have a look at 
https://github.com/Ensembl/ensembl-compara/blob/release/80/scripts/examples/dna_getConservationScores.pl 
?

There is a ConvervationScore adaptor, and it has a 
fetch_all_by_MethodLinkSpeciesSet_Slice() method that works like in the 
GenomicalignBlock adaptor. It returns an array of ConservationScore objects.

The MethodLinkSpeciesSet object is of type "GERP_CONSERVATION_SCORE" and 
on the species-set "mammals"


>> I think the main question is: do you want a score that reflects the
>> conservation of the region amongst the 8 species ? Or something that
>> tells how much the human sequence is shared by the other primates ?
>> (GERP is of the 1st category, for instance).
> Actually, rather the second category (how much the human sequence is
> shared by the other primates?)
>> Then how do you want to interpret the conservation score ? Similarity
>> between the primates, or presence of an ancestral state ? We could
>> imagine that a human region is not conserved amongst all primates, but
>> still is the same as in the last common ancestor of primates / mammals
> That is an interesting option too. Do you infer the the ancestral
> sequences for a genomic alignment block? How to access it?
> I could calculate the % similarity between the human sequence and the
> ancestral primate sequence
> Cheers
> Julin
>>


To get the ancestral sequence, you will need to link the 
GenomicAlignBlock object to GenomicAlignTree. Leaves of a 
GenomicAlignTree are pieces of DNA from the extant species. Internal 
nodes represent reconstructed, ancestral, sequences

Your problem is similar to the "AgeOfBase" track that we have recentyl 
developed. See the implementation at 
https://github.com/Ensembl/ensembl-compara/blob/release/77/modules/Bio/EnsEMBL/Compara/RunnableDB/BaseAge/BaseAge.pm#L121

In short
- L140: get all the GenomicAlignTree objects for the query region
- L160: iterate over list of trees
- L190: iterate over the internal nodes of a given tree
- L195+203: get the ancestral sequence of this node

Let me know if you have any further questions

Matthieu




More information about the Dev mailing list