[ensembl-dev] SQL query to retrieve gene sequence...

Kieron Taylor ktaylor at ebi.ac.uk
Tue Dec 16 16:40:44 GMT 2014


Dear Steve,

Firstly, since you're benchmarking, make sure you include our REST 
service, BioMart, and maybe even slicing up FTP downloads. We generally 
recommend the API for this kind of thing as it deals with the schema for 
you, which is non-trivial.

For the SQL, your query needs to be a lot more complicated. Some 
seq_region_ids correspond to single contigs, while others are 
assemblages of several. For this you need the assembly table, which maps 
out the components that are joined together to give the sequence of that 
'top level' seq_region. Remember that our sequence is built up from the 
output of sequencing facilities, and our data structures reflect that.

If you feed a seq_region_id of 131541 into the assembly table, you'll 
see just how many parts combine to form chromosome 13. The primary 
transcript of BRCA2 crosses over the edge of at least two of those many 
contigs, thus you must subselect from several seq_regions to give you 
multiple bits of sequence to concatenate together.

At this stage, you may decide you'd rather not write the query. It may 
make sense to use a DBI trace on our Perl API to get all of the queries 
that are run during the call of $gene->seq .

Regards,

Kieron Taylor
Ensembl Core



On 16/12/2014 16:15, Steve Moss wrote:
> Dear EnsEMBL Dev,
>
> I'm trying to write a raw SQL query to retrieve the sequence for the
> human BRCA2 gene to compare different methods of accessing EnsEMBL data.
> I'm currently doing the following, but getting an empty set.
>
> SELECT SUBSTRING(sequence, g.seq_region_start, g.seq_region_end)
> FROM dna d
> JOIN gene g
> ON d.seq_region_id = g.seq_region_id
> WHERE g.stable_id="ENSG00000139618"
>
> What am I missing? I think I'm falling short on working out the coord.
> system mapping stuff. Any pointers to help in fixing please?
>
> Cheers,
>
> Steve
>
> --
> Steve Moss
> about.me/gawbul
> Steve Moss on about.me
>
> <http://about.me/gawbul>
>





More information about the Dev mailing list