[ensembl-dev] Help with querying the homo_sapiens_variation_72_37 database.

Jayaraman, Pushkala pjayaraman at mcw.edu
Thu Aug 29 05:38:59 BST 2013


Hello,
I'm currently a developer at the Rat Genome Database, Human and Molecular Genetics Center, MCW. I have been assigned a project in which the PIs reference Ensembl gene pages, gene sequences, and variant information. The first step in my application pipeline is to get the sequence for the gene of interest and all the variation consequences within that genic region.
For example, here:
http://www.ensembl.org/Homo_sapiens/Gene/Variation_Gene/Table?g=ENSG00000139618;r=13:32889611-32973805#ALL_tablePanel


I now have access to your MySQL database homo_sapiens_variation_72_37, and we also have a database dump (we thought a local copy would make more sense). I'm connecting to useastdb.ensembl.org on port 5306.
The problem arises when I try to write a query that returns the variants exactly as they appear on the gene report page above.
For the same gene, using the homo_sapiens_variation_72_37 schema, I have the following test query:

SELECT vf.variation_name, vf.seq_region_id, vf.seq_region_start, vf.source_id,
       s.name, vf.minor_allele_freq,
       tv.feature_stable_id, tv.allele_string, tv.consequence_types
FROM homo_sapiens_variation_72_37.variation_feature vf
JOIN homo_sapiens_variation_72_37.transcript_variation tv
  ON tv.variation_feature_id = vf.variation_feature_id
JOIN homo_sapiens_variation_72_37.source s
  ON s.source_id = vf.source_id
WHERE vf.seq_region_id = 27513            -- chromosome 13
  AND vf.seq_region_start >= 32889611     -- region start
  AND vf.seq_region_end <= 32973805       -- region end
  AND tv.feature_stable_id IN (           -- transcripts of gene_id 609208
        SELECT t.stable_id
        FROM homo_sapiens_core_72_37.transcript t
        WHERE t.gene_id = 609208
      )
ORDER BY tv.feature_stable_id;


Here seq_region_id 27513 is chromosome 13, and seq_region_start is the start position of the variant.
Even so, my query returns only 13270 results, while the web page reports 13584.
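Two properties of the query above may contribute to the gap; this is offered as an assumption, not a confirmed account of how the Ensembl web page computes its total. First, the WHERE clause keeps only variants lying entirely inside 32889611..32973805, so a variant that straddles a boundary, or sits in flanking sequence but still has a transcript_variation row for one of the gene's transcripts, is dropped. An overlap test (start <= region end AND end >= region start) keeps the straddling ones. Second, transcript_variation holds one row per variant per transcript, so a row count measures variant/transcript pairs rather than distinct variants. The sketch below uses Python's built-in sqlite3 module with made-up toy tables (hypothetical data, only the columns the query uses, not real Ensembl content) to show how each choice changes the total:

```python
import sqlite3

# Toy, hypothetical stand-ins for two Ensembl variation tables; only the
# columns used by the query in the message are modelled here.
SCHEMA = """
CREATE TABLE variation_feature (
    variation_feature_id INTEGER PRIMARY KEY,
    variation_name       TEXT,
    seq_region_id        INTEGER,
    seq_region_start     INTEGER,
    seq_region_end       INTEGER
);
CREATE TABLE transcript_variation (
    variation_feature_id INTEGER,
    feature_stable_id    TEXT
);
"""

# Made-up data: a "gene" spanning 100..200 on seq_region 1, transcripts T1 and T2.
VARIANTS = [
    (1, "rsA", 1, 150, 150),  # entirely inside the gene
    (2, "rsB", 1, 95, 105),   # deletion spanning the gene start
    (3, "rsC", 1, 190, 210),  # deletion spanning the gene end
    (4, "rsD", 1, 50, 50),    # upstream of the gene
]
TRANSCRIPT_VARIANTS = [
    (1, "T1"), (1, "T2"),     # rsA is annotated against both transcripts
    (2, "T1"),
    (3, "T2"),
    (4, "T1"),                # e.g. an upstream consequence for T1
]
PARAMS = {"sr": 1, "start": 100, "end": 200}

# Position filters to compare (extra dict keys are ignored by sqlite3).
FILTERS = {
    "contained": "vf.seq_region_start >= :start AND vf.seq_region_end <= :end",
    "overlapping": "vf.seq_region_start <= :end AND vf.seq_region_end >= :start",
    "no_position_filter": "1 = 1",
}


def build_db():
    """Create an in-memory database holding the toy tables."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO variation_feature VALUES (?, ?, ?, ?, ?)", VARIANTS)
    con.executemany("INSERT INTO transcript_variation VALUES (?, ?)", TRANSCRIPT_VARIANTS)
    return con


def counts(con, position_filter):
    """Return (joined rows, distinct variation features) for one filter."""
    sql = f"""
        SELECT COUNT(*), COUNT(DISTINCT vf.variation_feature_id)
        FROM variation_feature vf
        JOIN transcript_variation tv
          ON tv.variation_feature_id = vf.variation_feature_id
        WHERE vf.seq_region_id = :sr AND {position_filter}
    """
    return con.execute(sql, PARAMS).fetchone()


if __name__ == "__main__":
    con = build_db()
    for name, clause in FILTERS.items():
        rows, distinct = counts(con, clause)
        print(f"{name}: {rows} rows, {distinct} distinct variants")
```

On this toy data the containment filter yields 2 rows (1 distinct variant), the overlap filter 4 rows (3 distinct), and no position filter 5 rows (4 distinct). Which combination, if any, matches the web page's figure would need to be checked against the real tables.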

What I need is to get in touch with a developer at Ensembl who knows this area well and can point me toward the correct query, one that returns the same number of variation consequences as the web page.

Please let me know if you can help me with this, or if you know anyone who can.
Since this is just the first stage of the project, I'm looking for several solid examples in which my query results and the web page match exactly.
Hope you guys have a good rest of your summer!

Pushkala Jayaraman
Programmer/Analyst - Rat Genome Database
Human and Molecular Genetics Center
Medical College of Wisconsin
414-955-2229
http://rgd.mcw.edu
