[ensembl-dev] querying a copy of the ensembl human variation 72_37
Jayaraman, Pushkala
pjayaraman at mcw.edu
Thu Sep 5 22:17:42 BST 2013
Makes sense..
:)
Thanks Andy!
-----Original Message-----
From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Andy Yates
Sent: Thursday, September 05, 2013 3:51 PM
To: Ensembl developers list
Cc: Ensembl developers list
Subject: Re: [ensembl-dev] querying a copy of the ensembl human variation 72_37
Hi
The main change was to switch from a dependent sub query to an outer join into failed_variation. The dependent subquery is evaluated for each row in the outer query. The IS NULL assertion ensures we only return the rows which failed to join into failed_variants i.e. the variants you want
Andy
Sent from my mobile.
On 5 Sep 2013, at 21:04, "Jayaraman, Pushkala" <pjayaraman at mcw.edu> wrote:
> This works!!
> Aaah.. I see your modified query..
>
> AND fv.variation_id IS NULL;
>
> I guess that's where I was going wrong..
> Thanks again!!
>
>
> Pushkala
>
> -----Original Message-----
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On
> Behalf Of Andy Yates
> Sent: Thursday, September 05, 2013 3:01 PM
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] querying a copy of the ensembl human
> variation 72_37
>
> Hi,
>
> If you have loaded the human variation database using the files in ftp://ftp.ensembl.org/pub/release-72/mysql/homo_sapiens_variation_72_37/ and used the homo_sapiens_variation_72_37.sql.gz file to load your schema there should be no problems. All table structures and indexes required for Ensembl are specified in these files.
>
> However I think you just need to make some changes to your query to
> make it faster
>
> select
> vf.variation_name, vf.seq_region_id, vf.seq_region_start,
> vf.source_id, vf.minor_allele_freq, tv.feature_stable_id,
> tv.allele_string, tv.consequence_types, s.name FROM variation_feature
> vf JOIN transcript_variation tv USING (variation_feature_id) JOIN
> source s USING (source_id) LEFT JOIN failed_variation fv ON
> (vf.variation_id = fv.variation_id) WHERE tv.feature_stable_id IN
> ('ENST00000380152','ENST00000530893','ENST00000528762','ENST0000047009
> 4','ENST00000533776','ENST00000544455')
> AND fv.variation_id IS NULL;
>
> This comes back with 13,584 rows in ~13 seconds on ensembldb.ensembl.org *. Can you try the re-written query on your DB please.
>
> Andy
>
> ------------
> Andrew Yates - Ensembl Core Software Project Leader European
> Bioinformatics Institute (EMBL-EBI) European Molecular Biology
> Laboratory Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD
> Tel: +44-(0)1223-492538
> Fax: +44-(0)1223-494468
> http://www.ensembl.org/
>
> On 5 Sep 2013, at 20:38, "Jayaraman, Pushkala" <pjayaraman at mcw.edu> wrote:
>
>> Heres my query:
>>
>> select
>> vf.variation_name, vf.seq_region_id, vf.seq_region_start,
>> vf.source_id, s.name, vf.minor_allele_freq, tv.feature_stable_id,
>> tv.allele_string, tv.consequence_types from variation_feature vf,
>> transcript_variation tv, source s WHERE s.source_id=vf.source_id and
>> vf.variation_feature_id=tv.variation_feature_id and
>> --tv.feature_stable_id ='ENST00000544455'
>> tv.feature_stable_id in
>> ('ENST00000380152','ENST00000530893','ENST00000528762','ENST000004700
>> 9
>> 4','ENST00000533776','ENST00000544455')
>> and
>> vf.variation_id not in
>> (
>> select
>> vf.variation_id
>> from
>> variation_feature vf,
>> transcript_variation tv,
>> failed_variation fv
>> where
>> fv.variation_id=vf.variation_id
>> and
>> vf.variation_feature_id=tv.variation_feature_id
>> and
>> --tv.feature_stable_id ='ENST00000544455'
>> tv.feature_stable_id in
>> ('ENST00000380152','ENST00000530893','ENST00000528762','ENST000004700
>> 9
>> 4','ENST00000533776','ENST00000544455')
>> )
>>
>>
>>
>> When I try this on my local copy.. it runs for 5 mins with no result!
>>
>> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On
>> Behalf Of Jayaraman, Pushkala
>> Sent: Thursday, September 05, 2013 2:25 PM
>> To: dev at ensembl.org
>> Subject: [ensembl-dev] querying a copy of the ensembl human variation
>> 72_37
>>
>> Hello,
>> I seem to be having trouble querying the failed_variation table I
>> cannot even seem to get count of the rows..
>> Is there any additional indexes you guys apply to the failed_variation table?
>>
>> Pushkala Jayaraman
>> Programmer/Analyst - Rat Genome Database Human and Molecular Genetics
>> Center Medical College of Wisconsin
>> 414-955-2229
>> http://rgd.mcw.edu
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
_______________________________________________
Dev mailing list Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/
More information about the Dev
mailing list