[ensembl-dev] SQL query to retrieve gene sequence...

Ed Gray gray_ed at hotmail.com
Tue Dec 16 21:42:12 GMT 2014


Plus, a SQL-expert is pretty hard to find in bioinformatics.  But I do agree
with Rob that the fastest systems use SQL stored procedures.  They can be
made to be blazingly fast, BTDT, but I hail originally from commercial
software development.

 

Having said that, I use the ensembl api for accessing ensembl data, when in
Rome, do as the Romans.  But I always wondered what was all the way under
the hood, just standard SQL and perl.  

 

I guess the bottom line is at the base of ensembl is it sql executes or sql
selects, updates and inserts?

 

 

From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of
Rob Sargent
Sent: Tuesday, December 16, 2014 12:08 PM
To: dev at ensembl.org
Subject: Re: [ensembl-dev] SQL query to retrieve gene sequence...

 

I don't know which server you are using but I had these do amazing things,
and amazingly fast. Try it, you might like it. :)

rjs

On 12/16/2014 09:58 AM, Andrew Yates wrote:

Hi Rob, 

 

I won't ever say it cannot be done just in the database. More I don't think
it'll be as performant or as easy as the alternatives myself & Kieron
suggested :).

 

Andy

 

------------
Andrew Yates - Ensembl Support Coordinator
European Molecular Biology Laboratory
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, United Kingdom
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
Skype: andrewyatz
http://www.ensembl.org/ 

 

On 16 Dec 2014, at 16:40, Rob Sargent <rob.sargent at utah.edu> wrote:

 

A function/stored-procedure using a recursive CTE might be the way for Steve
to go.

On 12/16/2014 09:28 AM, Andrew Yates wrote:

Hey Steve, 

 

The problem with using the database is that sequence is not stored against
the top-level sequences annotation is held against. Instead sequence is held
against the contig sequence regions which requires descending through the
assembly table an unspecified number of times (once for each mapping e.g.
chromosome -> supercontig -> contig). 

 

I would seriously *not* recommend doing this. Not only do you have to deal
with descending down the assembly but also having to think about
concatenating the sequence & paying attention to the orientation of
assembly. Instead you could use the Perl API (probably not an option
considering you're a Python guy), BioMart (you can access unspliced gene
sequence quite easily), the REST API or download the full genome sequence
from FTP and doing subslices. The faindex index tool from htslib/samtools is
pretty good at extracting arbitrary sequence from very large FASTA files.

 

Andy

 

------------
Andrew Yates - Ensembl Support Coordinator
European Molecular Biology Laboratory
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, United Kingdom
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
Skype: andrewyatz
http://www.ensembl.org/ 

 

On 16 Dec 2014, at 16:15, Steve Moss <gawbul at gmail.com> wrote:

 

Dear EnsEMBL Dev, 

 

I'm trying to write a raw SQL query to retrieve the sequence for the human
BRCA2 gene to compare different methods of accessing EnsEMBL data. I'm
currently doing the following, but getting an empty set.

 

SELECT SUBSTRING(sequence, g.seq_region_start, g.seq_region_end)

FROM dna d

JOIN gene g

ON d.seq_region_id = g.seq_region_id

WHERE g.stable_id="ENSG00000139618"

 

What am I missing? I think I'm falling short on working out the coord.
system mapping stuff. Any pointers to help in fixing please?

 

Cheers,


Steve

 

-- 


 


Steve Moss

about.me/gawbul


 Steve Moss on about.me
<http://d13pix9kaak6wt.cloudfront.net/signature/colorbar.png> 


 

_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info:
http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/







_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info:
http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/

 

_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info:
http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/

 






_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info:
http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20141216/24a08656/attachment.html>


More information about the Dev mailing list