[ensembl-dev] retrieving sequence data from previous assemblies using fetch_by_region method

Andy Yates ayates at ebi.ac.uk
Thu Oct 9 10:43:09 BST 2014


Hi,

Sorry for the late reply. We are considering this. For the moment you should continue to use the GRCh37 database server. This is our official way of supporting sequence retrieval from the old human assembly.
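
As a rough sketch of what that looks like in practice (not an official recipe), the script below picks the server port from the assembly name and then fetches the slice without passing an assembly version to fetch_by_region. The host and port 3337 come from the earlier reply quoted below. The 'anonymous' user, port 3306 for the current-assembly server, and the helper names port_for_assembly and fetch_sequence are assumptions made for illustration. It also assumes the installed Ensembl Perl API matches a release available on the chosen server.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumed mapping of assembly name to public MySQL port on ensembldb.ensembl.org.
# 3337 is the GRCh37 server named in this thread; 3306 is assumed for GRCh38.
my %PORT_FOR = ( GRCh37 => 3337, GRCh38 => 3306 );

# Hypothetical helper: return the port for an assembly, or die if none is known.
sub port_for_assembly {
    my ($assembly) = @_;
    my $port = $PORT_FOR{$assembly}
        or die "No public server known for assembly '$assembly'\n";
    return $port;
}

# Hypothetical helper: connect to the server for the requested assembly
# and return the DNA sequence of the given region.
sub fetch_sequence {
    my ( $assembly, $chrom, $from, $to, $strand ) = @_;

    # Loaded lazily so the port lookup above works without the Ensembl API.
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
        -port => port_for_assembly($assembly),
    );

    my $slice_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Slice' );

    # No assembly version argument: each server holds a single assembly.
    my $slice = $slice_adaptor->fetch_by_region( 'chromosome', $chrom, $from, $to, $strand );
    die "No slice found for $chrom:$from-$to\n" unless defined $slice;
    return $slice->seq();
}

# Usage: perl fetch_seq.pl GRCh37 5 112043202 112046226 1
print fetch_sequence(@ARGV), "\n" if @ARGV == 5;
```

Selecting the server by assembly, rather than passing the assembly version to fetch_by_region, follows from each database holding sequence for only one assembly. NCBI36 is deliberately left out of the mapping because this thread does not name a server for it.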

Regards,

Andy

------------
Andrew Yates - Ensembl Support Coordinator
European Molecular Biology Laboratory
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, United Kingdom
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
Skype: andrewyatz
http://www.ensembl.org/

On 7 Oct 2014, at 11:13, Duarte Molha <duartemolha at gmail.com> wrote:

> Any chance you might consider implementing this?
> I think it would be very useful to be able to retrieve the underlying sequence from previous assemblies, since the transform method can already give us the coordinates of any given feature on an older assembly.
> 
> 
> =========================
>      Duarte Miguel Paulo Molha      
>          http://about.me/duarte         
> =========================
> 
> On Tue, Oct 7, 2014 at 8:53 AM, Andy Yates <ayates at ebi.ac.uk> wrote:
> Hi Duarte,
> 
> You can access GRCh37 from ensembldb.ensembl.org, port number 3337. Ensembl databases currently only hold the contigs and assembly for a single assembly. That's why, when you try to get sequence for GRCh37 from the GRCh38 database, you get N's back.
> 
> Hope this helps,
> 
> Andy
> 
> 
> On 6 Oct 2014, at 17:23, Duarte Molha <duartemolha at gmail.com> wrote:
> 
> > Dear developers
> >
> > I have downloaded the latest API and would like to write a script that retrieves sequence information from a specified assembly.
> >
> > I have made a script that tries to accomplish this.
> >
> > Assuming these coordinates:
> >
> > chr: 5
> > from: 112043202
> > to: 112046226
> > strand = 1
> > assembly = GRCh38
> >
> >     my $slice = $slice_adaptor->fetch_by_region( 'chromosome', $chrom, $from, $to, $strand, $assembly );
> >     $seq = $slice->seq();
> >
> > I can retrieve the dna sequence:
> >
> > >CHR5-112043202-112046226       chr5:112043202-112046226
> > AGTATATAATCACAT..............CTAAAAGCAAACA
> >
> >
> > However, if I give it the variables:
> >
> > chr: 5
> > from: 112043202
> > to: 112046226
> > strand = 1
> > assembly = GRCh37 or NCBI36
> >
> > I get:
> >
> > >CHR5-112043202-112046226       chr5:112043202-112046226
> > NNNNNNNNN.........NNNNNNNNNNNNNNNN
> >
> >
> > How can I get the correct underlying sequence?
> >
> > Best regards
> >
> > Duarte
> > _______________________________________________
> > Dev mailing list    Dev at ensembl.org
> > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> > Ensembl Blog: http://www.ensembl.info/
> 
> 




