[ensembl-dev] SQL query to retrieve gene sequence...
Rob Sargent
rob.sargent at utah.edu
Tue Dec 16 17:08:06 GMT 2014
I don't know which database server you are using, but I have had
recursive CTEs do amazing things, and amazingly fast. Try it, you might
like it. :)
rjs
On 12/16/2014 09:58 AM, Andrew Yates wrote:
> Hi Rob,
>
> I won’t ever say it cannot be done just in the database. It’s more that
> I don’t think it’ll be as performant or as easy as the alternatives
> Kieron and I suggested :).
>
> Andy
>
> ------------
> Andrew Yates - Ensembl Support Coordinator
> European Molecular Biology Laboratory
> European Bioinformatics Institute
> Wellcome Trust Genome Campus
> Hinxton, Cambridge
> CB10 1SD, United Kingdom
> Tel: +44-(0)1223-492538
> Fax: +44-(0)1223-494468
> Skype: andrewyatz
> http://www.ensembl.org/
>
>> On 16 Dec 2014, at 16:40, Rob Sargent <rob.sargent at utah.edu> wrote:
>>
>> A function/stored-procedure using a recursive CTE might be the way
>> for Steve to go.
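
For illustration, here is a rough sketch of what such a recursive CTE
could look like. It uses Python's built-in sqlite3 module rather than a
live Ensembl server, so it is a toy and not the Ensembl way of doing
things. The column names follow the Ensembl assembly table, but the rows,
the seq_region_ids and the contig_pieces helper are all made up. It also
assumes every mapping is on the forward strand (ori = 1); handling
reversed components and fetching the bases from the dna table are left
out.

```python
import sqlite3

# Toy stand-in for the Ensembl `assembly` table. Column names follow the
# real schema; the rows and seq_region_ids are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assembly (
    asm_seq_region_id INTEGER,
    cmp_seq_region_id INTEGER,
    asm_start INTEGER, asm_end INTEGER,
    cmp_start INTEGER, cmp_end INTEGER,
    ori INTEGER
);
-- chromosome (1) -> supercontigs (10, 11) -> contigs (100, 101, 102)
INSERT INTO assembly VALUES
    (1,  10,  1,    1000, 1, 1000, 1),
    (1,  11,  1001, 2000, 1, 1000, 1),
    (10, 100, 1,    600,  1, 600,  1),
    (10, 101, 601,  1000, 1, 400,  1),
    (11, 102, 1,    1000, 1, 1000, 1);
""")

# Each recursive step projects the current interval onto every overlapping
# component, clipping to the overlap and shifting into component coordinates.
# top_start remembers where each piece begins on the original region.
# Assumption: ori = 1 everywhere, so no interval is flipped.
QUERY = """
WITH RECURSIVE walk(region_id, lo, hi, top_start) AS (
    SELECT :region, :lo, :hi, :lo
    UNION ALL
    SELECT a.cmp_seq_region_id,
           MAX(w.lo, a.asm_start) - a.asm_start + a.cmp_start,
           MIN(w.hi, a.asm_end)   - a.asm_start + a.cmp_start,
           w.top_start + MAX(w.lo, a.asm_start) - w.lo
    FROM walk w
    JOIN assembly a
      ON a.asm_seq_region_id = w.region_id
     AND a.asm_start <= w.hi
     AND a.asm_end   >= w.lo
)
SELECT region_id, lo, hi, top_start
FROM walk
WHERE region_id NOT IN (SELECT asm_seq_region_id FROM assembly)
ORDER BY top_start
"""


def contig_pieces(conn, region_id, start, end):
    """Return (contig_id, contig_start, contig_end, top_level_start) rows."""
    params = {"region": region_id, "lo": start, "hi": end}
    return conn.execute(QUERY, params).fetchall()


pieces = contig_pieces(conn, 1, 501, 1200)
for piece in pieces:
    print(piece)
```

The rows come back in top-level order, so the matching substrings of the
contig sequences could be concatenated in that order to rebuild the
region.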
>>
>> On 12/16/2014 09:28 AM, Andrew Yates wrote:
>>> Hey Steve,
>>>
>>> The problem with using the database is that sequence is not stored
>>> against the top-level sequence regions that annotation is held
>>> against. Instead, sequence is held against the contig sequence
>>> regions, which requires descending through the assembly table an
>>> unspecified number of times (once for each mapping, e.g.
>>> chromosome -> supercontig -> contig).
>>>
>>> I would seriously *not* recommend doing this. Not only do you have
>>> to deal with descending down the assembly, but you also have to
>>> think about concatenating the sequence & paying attention to the
>>> orientation of the assembly. Instead you could use the Perl API
>>> (probably not an option considering you’re a Python guy), BioMart
>>> (you can access unspliced gene sequence quite easily), the REST API,
>>> or download the full genome sequence from FTP and take subslices.
>>> The faidx indexing tool from htslib/samtools is pretty good at
>>> extracting arbitrary sequence from very large FASTA files.
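
As a rough illustration of the "download the FASTA and take subslices"
option, here is a small pure-Python sketch. The record name, the toy
sequence and the read_fasta and subslice helpers are invented for the
example. Unlike faidx, which uses an index to seek straight to the
requested bases, this reads every record into memory, so it only shows
the coordinate and strand handling and is not a substitute for the real
tool on a whole genome.

```python
# Complement table for DNA, including N for unknown bases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA-formatted lines into a {name: sequence} dict."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # The record name is the first word after '>'.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def subslice(records, name, start, end, strand=1):
    """Return bases start..end (1-based, inclusive).

    The slice is reverse-complemented when strand is -1.
    """
    seq = records[name][start - 1:end]
    if strand == -1:
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


# Made-up record; a real file would come from the Ensembl FTP site.
toy_lines = [">toy_chr made-up record", "ACGTACGTAC", "GGGTTTAAAC"]
records = read_fasta(toy_lines)

forward = subslice(records, "toy_chr", 9, 12)
reverse = subslice(records, "toy_chr", 9, 12, strand=-1)
print(forward, reverse)
```

The start, end and strand would come from seq_region_start,
seq_region_end and seq_region_strand in the gene table.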
>>>
>>> Andy
>>>
>>>> On 16 Dec 2014, at 16:15, Steve Moss <gawbul at gmail.com> wrote:
>>>>
>>>> Dear EnsEMBL Dev,
>>>>
>>>> I'm trying to write a raw SQL query to retrieve the sequence for
>>>> the human BRCA2 gene to compare different methods of accessing
>>>> EnsEMBL data. I'm currently doing the following, but getting an
>>>> empty set.
>>>>
>>>> SELECT SUBSTRING(sequence, g.seq_region_start, g.seq_region_end)
>>>> FROM dna d
>>>> JOIN gene g
>>>> ON d.seq_region_id = g.seq_region_id
>>>> WHERE g.stable_id="ENSG00000139618"
>>>>
>>>> What am I missing? I think I'm falling short on working out the
>>>> coord. system mapping stuff. Any pointers to help in fixing please?
>>>>
>>>> Cheers,
>>>>
>>>> Steve
>>>>
>>>> --
>>>> Steve Moss
>>>> about.me/gawbul
>>>> _______________________________________________
>>>> Dev mailing list Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/