[ensembl-dev] Help with querying the homo_sapiens_variation_72_37 database.

Will McLaren wm2 at ebi.ac.uk
Thu Aug 29 09:56:17 BST 2013


Hello,

Your query is almost there; the issue is that you are restricting the
results by coordinates to the boundaries of the gene.

The consequence table that you are referring to includes variants
classified as "Upstream gene variant" and "Downstream gene variant". These
are variants that fall within 5kb of the transcript boundaries, so many of
them lie outside the gene coordinates and are excluded by your position
filter.

If you drop the following lines from your statement you should get the
correct result:

    vf.seq_region_id=27513 and
    vf.seq_region_start>=32889611 and
    vf.seq_region_end<=32973805 and
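
As a sketch, with those three conditions removed and everything else in
your statement left unchanged (same tables, aliases and gene_id), the query
would read roughly as follows. The layout is mine; I have not re-run it
against release 72, so treat the resulting count as something to check
against the web page rather than a guarantee.

```sql
-- Original query minus the three coordinate conditions, so that
-- upstream/downstream variants outside the gene boundaries are kept.
SELECT vf.variation_name, vf.seq_region_id, vf.seq_region_start,
       vf.source_id, s.name, vf.minor_allele_freq,
       tv.feature_stable_id, tv.allele_string, tv.consequence_types
FROM homo_sapiens_variation_72_37.variation_feature vf,
     homo_sapiens_variation_72_37.transcript_variation tv,
     homo_sapiens_variation_72_37.source s
WHERE s.source_id = vf.source_id
  AND vf.variation_feature_id = tv.variation_feature_id
  AND tv.feature_stable_id IN (
        -- all transcripts of the gene, taken from the core database
        SELECT t.stable_id
        FROM homo_sapiens_core_72_37.transcript t
        WHERE t.gene_id = 609208
      )
ORDER BY tv.feature_stable_id;
```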

One further source of discrepancy may be failed variants - by default we do
not show variants that have been flagged as failed. Depending on whether
you want to include these, you may want to left join to failed_variation
via vf.variation_id to check the failed status.
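
A minimal sketch of that left join, assuming failed_variation carries a
variation_id column to join on (as the join via vf.variation_id implies).
The alias fv and the IS NULL filter are illustrative choices: the filter
keeps only variants with no entry in failed_variation, which is one way to
mimic the default behaviour. Remove it if you want failed variants
included.

```sql
-- Keep only transcript variations whose variant has no failed_variation row.
SELECT vf.variation_name, tv.feature_stable_id, tv.consequence_types
FROM homo_sapiens_variation_72_37.variation_feature vf
JOIN homo_sapiens_variation_72_37.transcript_variation tv
  ON tv.variation_feature_id = vf.variation_feature_id
LEFT JOIN homo_sapiens_variation_72_37.failed_variation fv
  ON fv.variation_id = vf.variation_id
WHERE fv.variation_id IS NULL          -- no match means not flagged as failed
  AND tv.feature_stable_id IN (
        SELECT t.stable_id
        FROM homo_sapiens_core_72_37.transcript t
        WHERE t.gene_id = 609208
      )
ORDER BY tv.feature_stable_id;
```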

Hope this helps!

Will McLaren
Ensembl Variation


On 29 August 2013 05:38, Jayaraman, Pushkala <pjayaraman at mcw.edu> wrote:

> Hello,
>
> I'm currently a developer at the Rat Genome Database, Human and Molecular
> Genetics Center, MCW. I've been assigned a project in which the PIs are
> referencing Ensembl gene pages, gene sequences and variant information.
> The first step in my application pipeline is to get the sequence for the
> gene of interest and all the variation consequences within that genic
> region.
>
> For example, here:
>
> http://www.ensembl.org/Homo_sapiens/Gene/Variation_Gene/Table?g=ENSG00000139618;r=13:32889611-32973805#ALL_tablePanel
>
> I now have access to your MySQL database for homo_sapiens_variation_72_37
> and also have a database dump (since we thought creating a local copy
> would make more sense). I'm using useastdb.ensembl.org, port 5306.
>
> The problem arises when I try to write a database query that returns the
> variants exactly as they appear on the gene report page above.
>
> For the same gene using the homo_sapiens_variation_72_37 schema, I have
> the following test query:
>
> select vf.variation_name, vf.seq_region_id, vf.seq_region_start,
>        vf.source_id, s.name, vf.minor_allele_freq, tv.feature_stable_id,
>        tv.allele_string, tv.consequence_types
> from homo_sapiens_variation_72_37.variation_feature vf,
>      homo_sapiens_variation_72_37.transcript_variation tv,
>      homo_sapiens_variation_72_37.source s
> where
>   s.source_id=vf.source_id and
>   vf.seq_region_id=27513 and
>   vf.seq_region_start>=32889611 and
>   vf.seq_region_end<=32973805 and
>   vf.variation_feature_id=tv.variation_feature_id and
>   tv.feature_stable_id in (
>     select t.stable_id from homo_sapiens_core_72_37.transcript t
>     where t.gene_id=609208
>   )
> order by tv.feature_stable_id;
>
> where seq_region_id is Chr 13 and region start corresponds to the start of
> the variant.
>
> Even then my count is only 13270, while the web page gives a count of
> 13584.
>
> I need to get in touch with a developer at Ensembl who knows this well
> and can point me towards the correct query to get the same number of
> variation consequences as the web page.
>
> Please let me know if you can help me with this, or if you know anyone
> who can.
>
> Since this is just the first stage of the project, I'm looking for a few
> solid examples in which my query and the results on the web page match
> exactly.
>
> Hope you guys have a good rest of your summer!
>
> Pushkala Jayaraman
> Programmer/Analyst - Rat Genome Database
> Human and Molecular Genetics Center
> Medical College of Wisconsin
> 414-955-2229
> http://rgd.mcw.edu
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>