[ensembl-dev] API question about Multi Alignments (technical!)

Fri Jan 25 13:06:20 GMT 2013

Hi Marc, you should be able to get that informion directly from the slice 
object (obtained from the $align_slice_slice object, below).

for example (I'm looking at a small region on human chr7)

#######

my $method_link_species_set =
  $method_link_species_set_adaptor->fetch_by_method_link_type_species_set_name($alignment_type,$set_of_species);

my $slice_adaptor = Bio::EnsEMBL::Registry->get_adaptor($species, 'core', 'Slice');

my $align_slice_adaptor = Bio::EnsEMBL::Registry->get_adaptor("Multi","compara","AlignSlice");

my $query_slice = $slice_adaptor->fetch_by_region('chromosome','7', 113726353, 113726692 );

my $alignSlice = $align_slice_adaptor->fetch_by_Slice_MethodLinkSpeciesSet( $query_slice, $method_link_species_set );

foreach my $align_slice_slice(@{ $alignSlice->get_all_Slices }){
  my $seq = $align_slice_slice->seq;
  next unless $seq=~/\w/;
  my($slice, $position) = $align_slice_slice->get_original_seq_region_position();
  print $slice->name, "\n";
}

########

output:
chromosome:GRCh37:7:113726353:113726692:1
chromosome:GRCm38:6:14901447:14901787:1
chromosome:oryCun2:7:31354211:31354521:-1
chromosome:CHIMP2.1.4:7:115561357:115561698:1
chromosome:gorGor3.1:7:111957332:111957674:1
chromosome:PPYG2:7:110575906:110576246:1
chromosome:MMUL_1:3:151447053:151447375:1
chromosome:C_jacchus3.2.1:8:82905928:82906267:1
chromosome:CanFam3.1:14:53311047:53311423:1
chromosome:EquCab2:4:71503687:71504047:1
chromosome:BROADO5:8:168670777:168671257:1

Hope that helps,
Stephen.

On Fri, 25 Jan 2013, Marc Hoeppner wrote:

> Dear EnsEMBL team,
> 
>  I am currently trying to convert the 19 vertebrate PECAN alignment into something resembling a pairwise synteny map of sorts (I know there are ways to get the actual
> pairwise synteny, but I have something else I wish to test on this data). To elaborate:
> 
> The goal would be to get information on which regions are aligned between human and any of the other 18 species – with base-precision for the start and stop of these
> alignments in gemomic coordinates. The output should look like:
> 
> Human_chr human_chr_start human_chr_end target_chr target_chr_start target_chr_end score target_chr_strand
> 
>  However, as it turns out this is apparently not as straight-forward as I had hoped.  The main steps, I suspect, would be:
> 
> Deconstruct the aligned sequences (AlignSlice) into their underlying AlignSlice::Slices
> 
> Determine the genomic coordinates of the underlying genomic slices (many genomic slices can make up one AlignSlice::Slice, possibly separated by larger GAP slices)
> 
> Convert these genomic coordinates back to the AlignSlice::Slice (to get their position in the fake alignment coordinate system)
> 
> Translate these alignment coordinates to human genomic coordinates via the human AlignSlice::Slice
> 
> However, turns out that for each genomic slice, I bascially seem to end up with the global coordinates of the AlignSlice::Slice I started from. This would make sense if
> each AlignSlice::Slice would consist of exactly one genomic_align and this genomic_align would align full-length to the human query. Doubtful, I think.
> 
> Clearly, I am missing something, somewhere ;) Perhaps there is some padding going on that I can’t seem to get rid of?
> 
> I have attached the script if anyone has an idea of how to do this correctly.
> 
>  
> 
> Cheers,
> 
>  
> 
> Marc
> 
> 
>