[ensembl-dev] Retrieving EPO ancestral sequences with compara perl API

Minggao Liang m.liang at mail.utoronto.ca
Wed Sep 29 17:41:12 BST 2021


Hello,

I was wondering whether it is possible to extract ancestral sequences from the EPO MSA, such as those returned in the browser when looking up a multiple alignment for a given region, e.g.:

Ancestral sequences 15 › ((((((homo_sapiens,(pan_paniscus,pan_troglodytes)),gorilla_gorilla),pongo_abelii),(chlorocebus_sabaeus,((macaca_fascicularis,macaca_mulatta),(papio_anubis,theropithecus_gelada)))),((microtus_ochrogaster,((mus_caroli,mus_pahari),rattus_norvegicus)),oryctolagus_cuniculus)),(((((((bos_grunniens,(bos_indicus_hybrid,(bos_tauruse-05,bos_taurus))),(ovis_aries,ovis_aries_rambouillet)),cervus_hanglu_yarkandensis),physeter_catodon),(catagonus_wagneri,sus_scrofa)),(((canis_lupus_dingoe-05,(canis_lupus_familiaris,canis_lupus_familiarisgreatdanee-05)e-05),(ursus_thibetanus_thibetanus,zalophus_californianus)),(((felis_catus,lynx_canadensis),(panthera_leo,panthera_pardus)),suricata_suricatta))),equus_caballus));

We are currently using the Compara Perl API (for Ensembl/Compara v102) to retrieve aligned (gapped) sequences from genomic_align objects, as shown below.
Is there any way to retrieve the above ancestral sequences in a similar fashion? If so, which MySQL tables need to be installed?

my $method_link_species_set_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara', 'MethodLinkSpeciesSet');
my $method_link_species_set = $method_link_species_set_adaptor->fetch_by_method_link_type_species_set_name("EPO", "mammals");

my $slice_adaptor = Bio::EnsEMBL::Registry->get_adaptor('mus_musculus', 'core', 'Slice');
my $genomic_align_block_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara', 'GenomicAlignBlock');

# peak information taken from a BED file of ChIP-seq peaks
my $query_slice = $slice_adaptor->fetch_by_region('toplevel', $peak_chr, $peak_start, $peak_end);
my $genomic_align_blocks = $genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($method_link_species_set, $query_slice);

foreach my $this_genomic_align_block (@$genomic_align_blocks) {
   my $genomic_align_array = $this_genomic_align_block->genomic_align_array();
   foreach my $genomic_align (@$genomic_align_array) {
      # Loop through each species present in the block and only process those in a predefined list of species
      # example @dbnames_used: canis_lupus_familiaris,homo_sapiens,mus_musculus
      if ((List::MoreUtils::first_index { $_ eq $genomic_align->genome_db()->name() } @dbnames_used) != -1) {
         my $genome_id = $genomic_align->genome_db()->name();
         $orthologous_sequences{$query_peak}->{$genome_id} = $genomic_align->aligned_sequence();
      }
   }
}
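
As a side note, the species filter in the loop above could also be written as a hash lookup instead of a first_index scan. This is a minimal, self-contained sketch that runs without the Ensembl API. The %aligned_by_species hash and the 'peak_1' key are made-up stand-ins for the genome_db names and aligned_sequence() strings that would come from the genomic_align objects.

```perl
use strict;
use warnings;

# Species to keep (same example list as in the comment above).
my @dbnames_used = qw(canis_lupus_familiaris homo_sapiens mus_musculus);

# Build a lookup table once so each membership test is a single hash access.
my %is_used = map { $_ => 1 } @dbnames_used;

# Hypothetical stand-in for one alignment block: species name => gapped sequence.
# In the real loop these values would come from $genomic_align->genome_db()->name()
# and $genomic_align->aligned_sequence().
my %aligned_by_species = (
    homo_sapiens => 'ACGT--A',
    pan_paniscus => 'ACGTTTA',
    mus_musculus => 'AC-T--A',
);

my %orthologous_sequences;
my $query_peak = 'peak_1';    # hypothetical peak identifier

foreach my $genome_id (sort keys %aligned_by_species) {
    # Skip species that are not in the predefined list.
    next unless $is_used{$genome_id};
    $orthologous_sequences{$query_peak}->{$genome_id} = $aligned_by_species{$genome_id};
}

# Prints the species that were kept for this peak.
print join(',', sort keys %{ $orthologous_sequences{$query_peak} }), "\n";
```

With the made-up data above, only homo_sapiens and mus_musculus are kept. pan_paniscus is skipped because it is not in the list, and canis_lupus_familiaris is absent from the block.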

Thanks in advance!
Minggao (Michael) Liang
