[ensembl-dev] querying a copy of the ensembl human variation 72_37
Andy Yates
ayates at ebi.ac.uk
Thu Sep 5 21:00:34 BST 2013
Hi,
If you have loaded the human variation database using the files in ftp://ftp.ensembl.org/pub/release-72/mysql/homo_sapiens_variation_72_37/ and used the homo_sapiens_variation_72_37.sql.gz file to load your schema there should be no problems. All table structures and indexes required for Ensembl are specified in these files.
However I think you just need to make some changes to your query to make it faster
select
vf.variation_name, vf.seq_region_id, vf.seq_region_start, vf.source_id, vf.minor_allele_freq, tv.feature_stable_id, tv.allele_string, tv.consequence_types, s.name
FROM variation_feature vf
JOIN transcript_variation tv USING (variation_feature_id)
JOIN source s USING (source_id)
LEFT JOIN failed_variation fv ON (vf.variation_id = fv.variation_id)
WHERE tv.feature_stable_id IN ('ENST00000380152','ENST00000530893','ENST00000528762','ENST00000470094','ENST00000533776','ENST00000544455')
AND fv.variation_id IS NULL;
This comes back with 13,584 rows in ~13 seconds on ensembldb.ensembl.org *. Can you try the re-written query on your DB please.
Andy
------------
Andrew Yates - Ensembl Core Software Project Leader
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
http://www.ensembl.org/
On 5 Sep 2013, at 20:38, "Jayaraman, Pushkala" <pjayaraman at mcw.edu> wrote:
> Heres my query:
>
> select
> vf.variation_name, vf.seq_region_id, vf.seq_region_start, vf.source_id,
> s.name, vf.minor_allele_freq, tv.feature_stable_id, tv.allele_string, tv.consequence_types
> from
> variation_feature vf,
> transcript_variation tv,
> source s
> WHERE
> s.source_id=vf.source_id and
> vf.variation_feature_id=tv.variation_feature_id and
> --tv.feature_stable_id ='ENST00000544455'
> tv.feature_stable_id in
> ('ENST00000380152','ENST00000530893','ENST00000528762','ENST00000470094','ENST00000533776','ENST00000544455')
> and
> vf.variation_id not in
> (
> select
> vf.variation_id
> from
> variation_feature vf,
> transcript_variation tv,
> failed_variation fv
> where
> fv.variation_id=vf.variation_id
> and
> vf.variation_feature_id=tv.variation_feature_id
> and
> --tv.feature_stable_id ='ENST00000544455'
> tv.feature_stable_id in
> ('ENST00000380152','ENST00000530893','ENST00000528762','ENST00000470094','ENST00000533776','ENST00000544455')
> )
>
>
>
> When I try this on my local copy.. it runs for 5 mins with no result!
>
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Jayaraman, Pushkala
> Sent: Thursday, September 05, 2013 2:25 PM
> To: dev at ensembl.org
> Subject: [ensembl-dev] querying a copy of the ensembl human variation 72_37
>
> Hello,
> I seem to be having trouble querying the failed_variation table
> I cannot even seem to get count of the rows..
> Is there any additional indexes you guys apply to the failed_variation table?
>
> Pushkala Jayaraman
> Programmer/Analyst - Rat Genome Database
> Human and Molecular Genetics Center
> Medical College of Wisconsin
> 414-955-2229
> http://rgd.mcw.edu
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
More information about the Dev
mailing list