[ensembl-dev] Dev Digest, Vol 54, Issue 16

Steve Moss gawbul at gmail.com
Tue Dec 16 20:22:50 GMT 2014


Dear Kieron,

Thanks for the detailed email. Just what I needed! Yes, I have included
those other options too, but hadn't thought about the FTP downloads. Will
look at those too, thanks!

I had looked at the assembly table and indeed seen 13,471 records. It makes
much more sense now thinking about it in terms of unassembled contigs.

I was hoping to include the query as a baseline benchmark to compare
against. I'm guessing it would need some optimisation though. Will have a
think about how I can best do that. Will have a play with DBI->trace() too.

Thanks again!

Cheers,

Steve

On 16 December 2014 at 16:48, <dev-request at ensembl.org> wrote:
>
> Message: 2
> Date: Tue, 16 Dec 2014 16:40:44 +0000
> From: Kieron Taylor <ktaylor at ebi.ac.uk>
> Subject: Re: [ensembl-dev] SQL query to retrieve gene sequence...
> To: Ensembl developers list <dev at ensembl.org>
> Message-ID: <5490608C.1080001 at ebi.ac.uk>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Dear Steve,
>
> Firstly, since you're benchmarking, make sure you include our REST
> service, BioMart, and maybe even slicing up FTP downloads. We generally
> recommend the API for this kind of thing as it deals with the schema for
> you, which is non-trivial.
>
> For the SQL, your query needs to be a lot more complicated. Some
> seq_region_ids correspond to single contigs, while others are
> assemblages of several. For this you need the assembly table, which maps
> out the components that are joined together to give the sequence of that
> 'top level' seq_region. Remember that our sequence is built up from the
> output of sequencing facilities, and our data structures reflect that.
>
> If you feed a seq_region_id of 131541 into the assembly table, you'll
> see just how many parts combine to form chromosome 13. The primary
> transcript of BRCA2 crosses over the edge of at least two of those many
> contigs, thus you must subselect from several seq_regions to give you
> multiple bits of sequence to concatenate together.
>
> At this stage, you may decide you'd rather not write the query. It may
> make sense to use a DBI trace on our Perl API to get all of the queries
> that are run during a call to $gene->seq.
>
> Regards,
>
> Kieron Taylor
> Ensembl Core
>
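A rough sketch of the stitching Kieron describes, written in Python rather than through the Perl API. It assumes a single chromosome-to-contig mapping level and simplified stand-ins for the core assembly and dna tables. The column names are modelled on the Ensembl core schema and may differ between releases. The seq_region IDs and sequences are made up for illustration. The sketch selects every component that overlaps the requested region and trims each one to the overlap. It reverse-complements any component stored with ori = -1, then concatenates the pieces. This is not how the Ensembl API works internally. A real query would also have to handle multi-level assemblies, such as chromosome to scaffold to contig.

```python
import sqlite3

# Toy in-memory database. The assembly and dna tables are simplified,
# hypothetical versions of the Ensembl core tables (not the full schema).
SCHEMA = """
CREATE TABLE assembly (
    asm_seq_region_id INTEGER, cmp_seq_region_id INTEGER,
    asm_start INTEGER, asm_end INTEGER,
    cmp_start INTEGER, cmp_end INTEGER, ori INTEGER
);
CREATE TABLE dna (seq_region_id INTEGER PRIMARY KEY, sequence TEXT);
"""

# Every component whose assembled span overlaps [start, end], in assembly order.
OVERLAP_SQL = """
SELECT a.asm_start, a.asm_end, a.cmp_start, a.cmp_end, a.ori, d.sequence
FROM assembly a JOIN dna d ON d.seq_region_id = a.cmp_seq_region_id
WHERE a.asm_seq_region_id = ? AND a.asm_start <= ? AND a.asm_end >= ?
ORDER BY a.asm_start
"""

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fetch_sequence(conn, seq_region_id, start, end, strand=1):
    """Stitch together [start, end] (1-based, inclusive) of an assembled
    seq_region from its components. Positions covered by no component are
    returned as 'N'. strand=-1 returns the reverse complement."""
    out = ["N"] * (end - start + 1)
    rows = conn.execute(OVERLAP_SQL, (seq_region_id, end, start))
    for asm_start, asm_end, cmp_start, cmp_end, ori, seq in rows:
        lo, hi = max(start, asm_start), min(end, asm_end)  # overlap on the assembly
        if ori == 1:
            # Forward component: asm_start lines up with cmp_start.
            c_lo = cmp_start + (lo - asm_start)
            piece = seq[c_lo - 1:c_lo - 1 + (hi - lo + 1)]
        else:
            # Reverse component: asm_start lines up with cmp_end.
            c_hi = cmp_end - (lo - asm_start)
            c_lo = cmp_end - (hi - asm_start)
            piece = revcomp(seq[c_lo - 1:c_hi])
        out[lo - start:hi - start + 1] = piece
    result = "".join(out)
    return revcomp(result) if strand == -1 else result


# Usage: hypothetical seq_region 100 is built from contig 1 (forward)
# followed by contig 2 (reverse complemented).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO dna VALUES (?, ?)",
                 [(1, "ACGTACGTAA"), (2, "GGGCCCTTTA")])
conn.executemany("INSERT INTO assembly VALUES (?, ?, ?, ?, ?, ?, ?)",
                 [(100, 1, 1, 10, 1, 10, 1), (100, 2, 11, 20, 1, 10, -1)])
print(fetch_sequence(conn, 100, 8, 13))  # TAATAA
```

The region 8-13 straddles the contig boundary, so like the BRCA2 transcript it needs pieces from two seq_regions. Doing the overlap search in SQL and the trimming and reverse-complementing in client code keeps the query simple. MySQL has no built-in reverse-complement function.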


-- 

Steve Moss
about.me/gawbul

