[ensembl-dev] querying a copy of the ensembl human variation 72_37

Andy Yates ayates at ebi.ac.uk
Thu Sep 5 21:00:34 BST 2013


Hi,

If you have loaded the human variation database using the files in ftp://ftp.ensembl.org/pub/release-72/mysql/homo_sapiens_variation_72_37/ and used the homo_sapiens_variation_72_37.sql.gz file to load your schema there should be no problems. All table structures and indexes required for Ensembl are specified in these files. 

However I think you just need to make some changes to your query to make it faster

select
vf.variation_name, vf.seq_region_id, vf.seq_region_start, vf.source_id, vf.minor_allele_freq, tv.feature_stable_id, tv.allele_string, tv.consequence_types, s.name
FROM variation_feature vf 
JOIN transcript_variation tv USING (variation_feature_id)
JOIN source s USING (source_id)
LEFT JOIN failed_variation fv ON (vf.variation_id = fv.variation_id)
WHERE tv.feature_stable_id IN ('ENST00000380152','ENST00000530893','ENST00000528762','ENST00000470094','ENST00000533776','ENST00000544455')
AND fv.variation_id IS NULL;

This comes back with 13,584 rows in ~13 seconds on ensembldb.ensembl.org *. Can you try the re-written query on your DB please.

Andy

------------
Andrew Yates - Ensembl Core Software Project Leader
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
http://www.ensembl.org/

On 5 Sep 2013, at 20:38, "Jayaraman, Pushkala" <pjayaraman at mcw.edu> wrote:

> Heres my query:
>  
> select
> vf.variation_name, vf.seq_region_id, vf.seq_region_start, vf.source_id,
> s.name, vf.minor_allele_freq, tv.feature_stable_id, tv.allele_string, tv.consequence_types 
> from
> variation_feature vf,
> transcript_variation tv,
> source s
> WHERE
> s.source_id=vf.source_id and
> vf.variation_feature_id=tv.variation_feature_id and
> --tv.feature_stable_id ='ENST00000544455'
> tv.feature_stable_id in
> ('ENST00000380152','ENST00000530893','ENST00000528762','ENST00000470094','ENST00000533776','ENST00000544455')
> and
> vf.variation_id not in
> (
> select
> vf.variation_id
> from
> variation_feature vf,
> transcript_variation tv,
> failed_variation fv
> where
> fv.variation_id=vf.variation_id
> and
> vf.variation_feature_id=tv.variation_feature_id
> and
> --tv.feature_stable_id ='ENST00000544455'
> tv.feature_stable_id in
> ('ENST00000380152','ENST00000530893','ENST00000528762','ENST00000470094','ENST00000533776','ENST00000544455')
> )
>  
>  
>  
> When I try this on my local copy.. it runs for 5 mins with no result!
>  
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Jayaraman, Pushkala
> Sent: Thursday, September 05, 2013 2:25 PM
> To: dev at ensembl.org
> Subject: [ensembl-dev] querying a copy of the ensembl human variation 72_37
>  
> Hello,
> I seem to be having trouble querying the failed_variation table
> I cannot even seem to get count of the rows..
> Is there any additional indexes you guys apply to the failed_variation table?
>  
> Pushkala Jayaraman
> Programmer/Analyst - Rat Genome Database
> Human and Molecular Genetics Center
> Medical College of Wisconsin
> 414-955-2229
> http://rgd.mcw.edu
>  
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/





More information about the Dev mailing list