[ensembl-dev] SQL query to retrieve gene sequence...

Rob Sargent rob.sargent at utah.edu
Tue Dec 16 16:40:36 GMT 2014


A function/stored-procedure using a recursive CTE might be the way for 
Steve to go.

On 12/16/2014 09:28 AM, Andrew Yates wrote:
> Hey Steve,
>
> The problem with using the database is that sequence is not stored 
> against the top-level sequences annotation is held against. Instead 
> sequence is held against the contig sequence regions which requires 
> descending through the assembly table an unspecified number of times 
> (once for each mapping e.g. chromosome -> supercontig -> contig).
>
> I would seriously *not* recommend doing this. Not only do you have to 
> deal with descending down the assembly but also having to think about 
> concatenating the sequence & paying attention to the orientation of 
> assembly. Instead you could use the Perl API (probably not an option 
> considering you’re a Python guy), BioMart (you can access unspliced 
> gene sequence quite easily), the REST API or download the full genome 
> sequence from FTP and doing subslices. The faindex index tool from 
> htslib/samtools is pretty good at extracting arbitrary sequence from 
> very large FASTA files.
>
> Andy
>
> ------------
> Andrew Yates - Ensembl Support Coordinator
> European Molecular Biology Laboratory
> European Bioinformatics Institute
> Wellcome Trust Genome Campus
> Hinxton, Cambridge
> CB10 1SD, United Kingdom
> Tel: +44-(0)1223-492538
> Fax: +44-(0)1223-494468
> Skype: andrewyatz
> http://www.ensembl.org/
>
>> On 16 Dec 2014, at 16:15, Steve Moss <gawbul at gmail.com 
>> <mailto:gawbul at gmail.com>> wrote:
>>
>> Dear EnsEMBL Dev,
>>
>> I'm trying to write a raw SQL query to retrieve the sequence for the 
>> human BRCA2 gene to compare different methods of accessing EnsEMBL 
>> data. I'm currently doing the following, but getting an empty set.
>>
>> SELECT SUBSTRING(sequence, g.seq_region_start, g.seq_region_end)
>> FROM dna d
>> JOIN gene g
>> ON d.seq_region_id = g.seq_region_id
>> WHERE g.stable_id="ENSG00000139618"
>>
>> What am I missing? I think I'm falling short on working out the 
>> coord. system mapping stuff. Any pointers to help in fixing please?
>>
>> Cheers,
>>
>> Steve
>>
>> -- 
>> Steve Moss
>> about.me/gawbul
>> Steve Moss on about.me
>>
>> <http://about.me/gawbul>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>> Posting guidelines and subscribe/unsubscribe info: 
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20141216/71940f4a/attachment.html>


More information about the Dev mailing list