[ensembl-dev] SQL query to retrieve gene sequence...
Kieron Taylor
ktaylor at ebi.ac.uk
Tue Dec 16 16:40:44 GMT 2014
Dear Steve,
Firstly, since you're benchmarking, make sure you include our REST
service, BioMart, and maybe even slicing up FTP downloads. We generally
recommend the API for this kind of thing as it deals with the schema for
you, which is non-trivial.
For the SQL, your query needs to be a lot more complicated. Some
seq_region_ids correspond to single contigs, while others are
assemblages of several. For this you need the assembly table, which maps
out the components that are joined together to give the sequence of that
'top level' seq_region. Remember that our sequence is built up from the
output of sequencing facilities, and our data structures reflect that.
If you feed a seq_region_id of 131541 into the assembly table, you'll
see just how many parts combine to form chromosome 13. The primary
transcript of BRCA2 crosses over the edge of at least two of those many
contigs, thus you must subselect from several seq_regions to give you
multiple bits of sequence to concatenate together.
At this stage, you may decide you'd rather not write the query. It may
make sense to use a DBI trace on our Perl API to get all of the queries
that are run during the call of $gene->seq .
Regards,
Kieron Taylor
Ensembl Core
On 16/12/2014 16:15, Steve Moss wrote:
> Dear EnsEMBL Dev,
>
> I'm trying to write a raw SQL query to retrieve the sequence for the
> human BRCA2 gene to compare different methods of accessing EnsEMBL data.
> I'm currently doing the following, but getting an empty set.
>
> SELECT SUBSTRING(sequence, g.seq_region_start, g.seq_region_end)
> FROM dna d
> JOIN gene g
> ON d.seq_region_id = g.seq_region_id
> WHERE g.stable_id="ENSG00000139618"
>
> What am I missing? I think I'm falling short on working out the coord.
> system mapping stuff. Any pointers to help in fixing please?
>
> Cheers,
>
> Steve
>
> --
> Steve Moss
> about.me/gawbul
> Steve Moss on about.me
>
> <http://about.me/gawbul>
>
More information about the Dev
mailing list