[ensembl-dev] Another API question about Alignments between mouse / human

Kathryn Beal kbeal at ebi.ac.uk
Mon Jan 28 16:03:45 GMT 2013


Hi,
How about:

    # Each Slice in the AlignSlice corresponds to one species in the alignment
    foreach my $aln_slice (@{$align_slice->get_all_Slices()}) {
        # The underlying Slices carry the original genomic coordinates
        my $slices = $aln_slice->get_all_underlying_Slices;
        foreach my $this_slice (@$slices) {
            print join(":", $aln_slice->genome_db->name, $this_slice->seq_region_name,
                $this_slice->start, $this_slice->end, $this_slice->strand), "\n";
        }
    }

This gives the following output:

mus_musculus:2:28403186:28403879:1
homo_sapiens:9:100149814:100149843:-1
homo_sapiens:GL000220.1:11378:11383:-1
homo_sapiens:9:135978315:135979072:-1
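
If you need to work with these coordinates afterwards, the lines can be split back into fields. The following is only a minimal sketch in plain Perl (no Ensembl API calls): the helper name parse_fragment and the hash keys are made up for illustration, and it assumes the species:seq_region:start:end:strand layout printed by the loop above, with inclusive coordinates.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): split one
# "species:seq_region:start:end:strand" line into a hash reference.
sub parse_fragment {
    my ($line) = @_;
    chomp $line;
    my ($species, $seq_region, $start, $end, $strand) = split /:/, $line;
    return {
        species    => $species,
        seq_region => $seq_region,
        start      => $start,
        end        => $end,
        strand     => $strand,
        length     => $end - $start + 1,    # start and end are both inclusive
    };
}

# The four lines from the output above.
my @lines = (
    "mus_musculus:2:28403186:28403879:1",
    "homo_sapiens:9:100149814:100149843:-1",
    "homo_sapiens:GL000220.1:11378:11383:-1",
    "homo_sapiens:9:135978315:135979072:-1",
);

my @fragments = map { parse_fragment($_) } @lines;

# Sum the lengths of the underlying slices per species.
my %total_length;
$total_length{ $_->{species} } += $_->{length} for @fragments;

# One row per fragment, then one summary row per species.
for my $f (@fragments) {
    printf "%-14s %-12s %10d %10d %2d  (%d bp)\n",
        @{$f}{qw(species seq_region start end strand length)};
}
print "$_ total: $total_length{$_} bp\n" for sort keys %total_length;
```

The same split would work on lines read from a file if you redirect the output of the loop rather than hard-coding it as done here.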

Cheers
Kathryn

Kathryn Beal, PhD
European Bioinformatics Institute  (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK 
Tel. +44 (0)1223 494458
www.ensembl.org

On 28 Jan 2013, at 15:40, Eduardo Andrés León wrote:

> Hmm, but then I lose the coordinates (which I really need).
> 
> 
> On 28 Jan 2013, at 14:35, "Kathryn Beal" <kbeal at ebi.ac.uk> wrote:
> 
>> Hi,
>> You can use the AlignSlice to get the alignment, i.e. add the following lines:
>> 
>> my $align_slice_adaptor =
>>       Bio::EnsEMBL::Registry->get_adaptor("Multi", "compara", "AlignSlice");
>> 
>> my $align_slice = $align_slice_adaptor->fetch_by_Slice_MethodLinkSpeciesSet($source_org_slice, $methodLinkSpeciesSet, 'expanded', 'restrict');
>> 
>> print $alignIO $align_slice->get_SimpleAlign;
>> 
>> I also used "clustalw" as the format for AlignIO.
>> 
>> I hope that helps,
>> Cheers
>> Kathryn
>> 
>> 
>> On 28 Jan 2013, at 10:27, Eduardo Andrés León wrote:
>> 
>>> Dear all,
>>> 	I'm trying to map the mouse region 2:28403186-28403879 onto the human genome using Ensembl 67.
>>> 
>>> Using the website, I get the following:
>>> 
>>> http://may2012.archive.ensembl.org/Mus_musculus/Location/Compara_Alignments?align=410&db=core&r=2%3A28403186-28403879
>>> 
>>> mus_musculus:2 > 	chromosome:NCBIM37:2:28403186:28403879:1
>>> homo_sapiens:9 > 	chromosome:GRCh37:9:100149814:100149843:-1
>>> supercontig:GRCh37:GL000220.1:11378:11383:-1
>>> chromosome:GRCh37:9:135978315:135979072:-1
>>> 
>>> 
>>> 
>>> But when I use the API, I get more than 55 fragments (attached as a zip file):
>>> 
>>> <alignment.3012.17691580863.txt.zip>
>>> 
>>> The code for extracting the data is the following:
>>> 
>>> getAlignMent(2,28403186,28403879);
>>> 
>>> 	sub getAlignMent{
>>> 		my ($source_org_chr,$source_org_start,$source_org_end)=@_;
>>> 
>>> 		#Auto-configure the registry
>>> 		Bio::EnsEMBL::Registry->load_registry_from_db(
>>> 			-host=>"ensembldb.cnio.es",
>>> 			-user=>"ensembl");
>>> 
>>> 
>>> 		# Get the Compara Adaptor for MethodLinkSpeciesSets
>>> 		my $method_link_species_set_adaptor =
>>> 		    Bio::EnsEMBL::Registry->get_adaptor(
>>> 		      "Multi", "compara", "MethodLinkSpeciesSet");
>>> 
>>> 		# Get the MethodLinkSpeciesSet for the mouse-human BLASTZ_NET alignments
>>> 		my $methodLinkSpeciesSet = $method_link_species_set_adaptor->
>>> 			fetch_by_method_link_type_registry_aliases("BLASTZ_NET", ["mouse", "human"]);
>>> 
>>> 		# Define the start and end positions for the alignment
>>> 		# Get the source_org *core* Adaptor for Slices
>>> 		my $source_org_slice_adaptor =
>>> 		    Bio::EnsEMBL::Registry->get_adaptor(
>>> 		      "mouse", "core", "Slice");
>>> 
>>> 		# Get the slice corresponding to the region of interest
>>> 		my $source_org_slice = $source_org_slice_adaptor->fetch_by_region(
>>> 		    "chromosome", $source_org_chr, $source_org_start, $source_org_end);
>>> 
>>> 		# Get the Compara Adaptor for GenomicAlignBlocks
>>> 		my $genomic_align_block_adaptor =
>>> 		    Bio::EnsEMBL::Registry->get_adaptor(
>>> 		      "Multi", "compara", "GenomicAlignBlock");
>>> 
>>> 		# fetch_all_by_MethodLinkSpeciesSet_Slice() returns a reference to an
>>> 		# array of GenomicAlignBlock objects (source_org is the reference species)
>>> 		my $all_genomic_align_blocks = $genomic_align_block_adaptor->
>>> 		    fetch_all_by_MethodLinkSpeciesSet_Slice(
>>> 		        $methodLinkSpeciesSet, $source_org_slice, undef, undef, "restrict");
>>> 
>>> 		# set up an AlignIO to format SimpleAlign output
>>> 		my $outputAl="alignment." . rand(10000) . ".txt";
>>> 		open(OUT,">$outputAl") || die "3 $!\n";
>>> 		my $alignIO = Bio::AlignIO->newFh(-interleaved => 0,
>>> 		                                  -fh => \*OUT,
>>> 		                                  -format => 'pfam',
>>> 		                                  -idlength => 20);
>>> 
>>> 		# print the restricted alignments
>>> 		if (scalar(@{$all_genomic_align_blocks})==0){
>>> 			open(NMR,">>chr$source_org_chr\_No_mapping_regions.txt") || die "$!\n";
>>> 			print NMR "$source_org_chr\t$source_org_start\t$source_org_end\n";
>>> 			close NMR;
>>> 			close OUT;	# also close the (empty) alignment file before returning
>>> 			return;
>>> 		}
>>> 		else{
>>> 			foreach my $genomic_align_block ( @{ $all_genomic_align_blocks } ) {
>>> 				print $alignIO $genomic_align_block->get_SimpleAlign;
>>> 			}
>>> 			close OUT;
>>> 		}
>>> 	}
>>> 
>>> The same happens with other segments, but not all of them.
>>> 
>>> So, can anybody tell me how to extract the same records that the website shows?
>>> 
>>> Regards and thanks in advance !
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
> 
> ===================================================
> Eduardo Andrés León
> Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 5054/3063)
> e-mail: eandres at cnio.es        Fax: (+34) 91 224 69 76
> Unidad de Bioinformática       Bioinformatics Unit
> Centro Nacional de Investigaciones Oncológicas
> C.P.: 28029                Zip Code: 28029
> C/. Melchor Fernández Almagro, 3    Madrid (Spain)
> http://bioinfo.cnio.es	http://bioinfo.cnio.es/people/eandres
> ===================================================