[ensembl-dev] SQL query to retrieve gene sequence...

Rob Sargent rob.sargent at utah.edu
Tue Dec 16 17:08:06 GMT 2014


I don't know which server you are using but I had these do amazing 
things, and amazingly fast. Try it, you might like it. :)

rjs

On 12/16/2014 09:58 AM, Andrew Yates wrote:
> Hi Rob,
>
> I won’t ever say it cannot be done just in the database. More I don’t 
> think it’ll be as performant or as easy as the alternatives myself & 
> Kieron suggested :).
>
> Andy
>
> ------------
> Andrew Yates - Ensembl Support Coordinator
> European Molecular Biology Laboratory
> European Bioinformatics Institute
> Wellcome Trust Genome Campus
> Hinxton, Cambridge
> CB10 1SD, United Kingdom
> Tel: +44-(0)1223-492538
> Fax: +44-(0)1223-494468
> Skype: andrewyatz
> http://www.ensembl.org/
>
>> On 16 Dec 2014, at 16:40, Rob Sargent <rob.sargent at utah.edu 
>> <mailto:rob.sargent at utah.edu>> wrote:
>>
>> A function/stored-procedure using a recursive CTE might be the way 
>> for Steve to go.
>>
>> On 12/16/2014 09:28 AM, Andrew Yates wrote:
>>> Hey Steve,
>>>
>>> The problem with using the database is that sequence is not stored 
>>> against the top-level sequences annotation is held against. Instead 
>>> sequence is held against the contig sequence regions which requires 
>>> descending through the assembly table an unspecified number of times 
>>> (once for each mapping e.g. chromosome -> supercontig -> contig).
>>>
>>> I would seriously *not* recommend doing this. Not only do you have 
>>> to deal with descending down the assembly but also having to think 
>>> about concatenating the sequence & paying attention to the 
>>> orientation of assembly. Instead you could use the Perl API 
>>> (probably not an option considering you’re a Python guy), BioMart 
>>> (you can access unspliced gene sequence quite easily), the REST API 
>>> or download the full genome sequence from FTP and doing subslices. 
>>> The faindex index tool from htslib/samtools is pretty good at 
>>> extracting arbitrary sequence from very large FASTA files.
>>>
>>> Andy
>>>
>>> ------------
>>> Andrew Yates - Ensembl Support Coordinator
>>> European Molecular Biology Laboratory
>>> European Bioinformatics Institute
>>> Wellcome Trust Genome Campus
>>> Hinxton, Cambridge
>>> CB10 1SD, United Kingdom
>>> Tel: +44-(0)1223-492538
>>> Fax: +44-(0)1223-494468
>>> Skype: andrewyatz
>>> http://www.ensembl.org/
>>>
>>>> On 16 Dec 2014, at 16:15, Steve Moss <gawbul at gmail.com 
>>>> <mailto:gawbul at gmail.com>> wrote:
>>>>
>>>> Dear EnsEMBL Dev,
>>>>
>>>> I'm trying to write a raw SQL query to retrieve the sequence for 
>>>> the human BRCA2 gene to compare different methods of accessing 
>>>> EnsEMBL data. I'm currently doing the following, but getting an 
>>>> empty set.
>>>>
>>>> SELECT SUBSTRING(sequence, g.seq_region_start, g.seq_region_end)
>>>> FROM dna d
>>>> JOIN gene g
>>>> ON d.seq_region_id = g.seq_region_id
>>>> WHERE g.stable_id="ENSG00000139618"
>>>>
>>>> What am I missing? I think I'm falling short on working out the 
>>>> coord. system mapping stuff. Any pointers to help in fixing please?
>>>>
>>>> Cheers,
>>>>
>>>> Steve
>>>>
>>>> -- 
>>>> Steve Moss
>>>> about.me/gawbul
>>>> Steve Moss on about.me
>>>>
>>>> <http://about.me/gawbul>
>>>> _______________________________________________
>>>> Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>>> Posting guidelines and subscribe/unsubscribe info: 
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>>
>>> _______________________________________________
>>> Dev mailing listDev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog:http://www.ensembl.info/
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>> Posting guidelines and subscribe/unsubscribe info: 
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20141216/f868a4a2/attachment.html>


More information about the Dev mailing list