[ensembl-dev] retrieving sequence data from previous assemblies using fetch_by_region method

mag mr6 at ebi.ac.uk
Wed Dec 10 10:24:30 GMT 2014


Hi Duarte,

Historically, Ensembl has only supported one assembly per species at any 
given time.

Storing sequence for more than one assembly could have undesired 
consequences.
For example, because contigs are shared between assemblies, we would end 
up storing contigs that are not part of the current assembly but are 
still needed to provide sequence for a previous assembly.

We are aware, though, that this could prove incredibly useful.
To test the idea, we have loaded the additional data into the human 
database used by the REST server.
It is now possible to retrieve sequence for any past or present human 
assembly (back to NCBI34).

http://rest.ensembl.org/sequence/region/human/2:3118382-3127806?content-type=text/x-fasta
or
http://rest.ensembl.org/sequence/region/human/2:3118382-3127806?content-type=text/x-fasta;coord_system_version=GRCh38
will both return the sequence for this region of chromosome 2 in GRCh38

while
http://rest.ensembl.org/sequence/region/human/2:3118382-3127806?content-type=text/x-fasta;coord_system_version=GRCh37
will return the sequence for the same coordinates on chromosome 2 in GRCh37
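
As a rough illustration of the URL pattern above, here is a minimal Python 
sketch that builds these requests. The function names (sequence_region_url, 
fetch_fasta) are hypothetical and not part of any Ensembl client, and the 
';' parameter separator simply copies the URLs quoted above.

```python
from typing import Optional
from urllib.request import urlopen

# Host taken from the URLs quoted above.
REST_SERVER = "http://rest.ensembl.org"


def sequence_region_url(species: str, region: str,
                        assembly: Optional[str] = None) -> str:
    """Build a sequence/region URL that asks for FASTA output.

    `assembly` is passed as coord_system_version; when it is omitted,
    no version is sent and the server picks its default assembly.
    """
    url = (f"{REST_SERVER}/sequence/region/{species}/{region}"
           "?content-type=text/x-fasta")
    if assembly is not None:
        # Parameters are joined with ';' exactly as in the URLs above.
        url += f";coord_system_version={assembly}"
    return url


def fetch_fasta(url: str) -> str:
    """Download the FASTA text for a URL (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling sequence_region_url("human", "2:3118382-3127806", "GRCh37") 
reproduces the GRCh37 URL above. Passing the result to fetch_fasta should 
download the sequence, assuming the server is reachable.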

Please try it out and let us know if you find any issues with this.

If this is successful, we will consider extending it to the main 
human databases, which would mean you could retrieve sequence for 
previous assemblies via the API.


Regards,
Magali

On 07/10/2014 11:13, Duarte Molha wrote:
> Any chance you might consider implementing this?
> I think it would be very useful to be able to retrieve the underlying 
> sequence in previous assemblies, since the transform method can already 
> give us the coordinates of any given feature on an older assembly.
>
>
> =========================
>      Duarte Miguel Paulo Molha
> http://about.me/duarte
> =========================
>
> On Tue, Oct 7, 2014 at 8:53 AM, Andy Yates <ayates at ebi.ac.uk> wrote:
>
>     Hi Duarte,
>
>     You can access GRCh37 from ensembldb.ensembl.org, port number 3337.
>     Ensembl databases currently only hold the contigs and assembly for
>     a single assembly. That's why, when you try to get sequence for
>     GRCh37 in the GRCh38 database, you get N's back.
>
>     Hope this helps,
>
>     Andy
>
>     ------------
>     Andrew Yates - Ensembl Support Coordinator
>     European Molecular Biology Laboratory
>     European Bioinformatics Institute
>     Wellcome Trust Genome Campus
>     Hinxton, Cambridge
>     CB10 1SD, United Kingdom
>     Tel: +44-(0)1223-492538
>     Fax: +44-(0)1223-494468
>     Skype: andrewyatz
>     http://www.ensembl.org/
>
>     On 6 Oct 2014, at 17:23, Duarte Molha <duartemolha at gmail.com> wrote:
>
>     > Dear developers
>     >
>     > I have the latest API downloaded and I want to create a script
>     > that can retrieve sequence information from a specified assembly.
>     >
>     > So I have made a script that tries to accomplish this.
>     >
>     > Assuming these coordinates:
>     >
>     > chr: 5
>     > from: 112043202
>     > to: 112046226
>     > strand = 1
>     > assembly = GRCh38
>     >
>     >     my $slice = $slice_adaptor->fetch_by_region( 'chromosome', $chrom, $from, $to, $strand, $assembly );
>     >
>     >     $seq = $slice->seq();
>     >
>     > I can retrieve the dna sequence:
>     >
>     > >CHR5-112043202-112046226  chr5:112043202-112046226
>     > AGTATATAATCACAT..............CTAAAAGCAAACA
>     >
>     >
>     > However, if I give it the variables:
>     >
>     > chr: 5
>     > from: 112043202
>     > to: 112046226
>     > strand = 1
>     > assembly = GRCh37 or NCBI36
>     >
>     > I get:
>     >
>     > >CHR5-112043202-112046226  chr5:112043202-112046226
>     > NNNNNNNNN.........NNNNNNNNNNNNNNNN
>     >
>     >
>     > How can I get the correct underlying sequence?
>     >
>     > Best regards
>     >
>     > Duarte
>     > _______________________________________________
>     > Dev mailing list Dev at ensembl.org
>     > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>     > Ensembl Blog: http://www.ensembl.info/
>
>
