[ensembl-dev] API question about Multi Alignments (technical!)
Stephen Fitzgerald
stephenf at ebi.ac.uk
Fri Jan 25 13:06:20 GMT 2013
Hi Marc, you should be able to get that informion directly from the slice
object (obtained from the $align_slice_slice object, below).
for example (I'm looking at a small region on human chr7)
#######
my $method_link_species_set =
$method_link_species_set_adaptor->fetch_by_method_link_type_species_set_name($alignment_type,$set_of_species);
my $slice_adaptor = Bio::EnsEMBL::Registry->get_adaptor($species, 'core', 'Slice');
my $align_slice_adaptor = Bio::EnsEMBL::Registry->get_adaptor("Multi","compara","AlignSlice");
my $query_slice = $slice_adaptor->fetch_by_region('chromosome','7', 113726353, 113726692 );
my $alignSlice = $align_slice_adaptor->fetch_by_Slice_MethodLinkSpeciesSet( $query_slice, $method_link_species_set );
foreach my $align_slice_slice(@{ $alignSlice->get_all_Slices }){
my $seq = $align_slice_slice->seq;
next unless $seq=~/\w/;
my($slice, $position) = $align_slice_slice->get_original_seq_region_position();
print $slice->name, "\n";
}
########
output:
chromosome:GRCh37:7:113726353:113726692:1
chromosome:GRCm38:6:14901447:14901787:1
chromosome:oryCun2:7:31354211:31354521:-1
chromosome:CHIMP2.1.4:7:115561357:115561698:1
chromosome:gorGor3.1:7:111957332:111957674:1
chromosome:PPYG2:7:110575906:110576246:1
chromosome:MMUL_1:3:151447053:151447375:1
chromosome:C_jacchus3.2.1:8:82905928:82906267:1
chromosome:CanFam3.1:14:53311047:53311423:1
chromosome:EquCab2:4:71503687:71504047:1
chromosome:BROADO5:8:168670777:168671257:1
Hope that helps,
Stephen.
On Fri, 25 Jan 2013, Marc Hoeppner wrote:
> Dear EnsEMBL team,
>
> I am currently trying to convert the 19 vertebrate PECAN alignment into something resembling a pairwise synteny map of sorts (I know there are ways to get the actual
> pairwise synteny, but I have something else I wish to test on this data). To elaborate:
>
> The goal would be to get information on which regions are aligned between human and any of the other 18 species – with base-precision for the start and stop of these
> alignments in gemomic coordinates. The output should look like:
>
> Human_chr human_chr_start human_chr_end target_chr target_chr_start target_chr_end score target_chr_strand
>
> However, as it turns out this is apparently not as straight-forward as I had hoped. The main steps, I suspect, would be:
>
> Deconstruct the aligned sequences (AlignSlice) into their underlying AlignSlice::Slices
>
> Determine the genomic coordinates of the underlying genomic slices (many genomic slices can make up one AlignSlice::Slice, possibly separated by larger GAP slices)
>
> Convert these genomic coordinates back to the AlignSlice::Slice (to get their position in the fake alignment coordinate system)
>
> Translate these alignment coordinates to human genomic coordinates via the human AlignSlice::Slice
>
> However, turns out that for each genomic slice, I bascially seem to end up with the global coordinates of the AlignSlice::Slice I started from. This would make sense if
> each AlignSlice::Slice would consist of exactly one genomic_align and this genomic_align would align full-length to the human query. Doubtful, I think.
>
> Clearly, I am missing something, somewhere ;) Perhaps there is some padding going on that I can’t seem to get rid of?
>
> I have attached the script if anyone has an idea of how to do this correctly.
>
>
>
> Cheers,
>
>
>
> Marc
>
>
>
More information about the Dev
mailing list