[ensembl-dev] getting watson snps

Andrea Edwards edwardsa at cs.man.ac.uk
Sun Dec 12 14:02:28 GMT 2010


Hi

The dump files would be great, but I am also retrieving lots of other 
information about the SNPs along with the SNPs themselves, and that might 
not necessarily be in your dump file, so I think I have to try other 
options too.

This is what I have tried so far to get the Watson SNPs, without getting 
anywhere fast :)

1. Written a Perl script to download them from the Ensembl human variation 
database. This works, but at the rate it seems to be running it will take 
over a month to get all the SNPs, and I imagine you'll block my IP address 
if I leave it running :) Plus I can't leave it a month anyway.

2. I've tried to install the human variation database locally, but that 
also seems to be having problems. It has been installing the allele 
table for 3 days now, I think. It is running on a very slow machine, but 
there are far bigger tables than the allele table, so I dread to think 
how long they will take. I tried to get access to a better machine but 
wasn't given enough hard disk space; perhaps a better machine would solve 
the problem! Roughly how long should it take to install the human 
variation database on a 64-bit Linux machine with 2 GB of RAM and an 
Intel Xeon @ 2.27GHz? Will it take hours or days?
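
As a rough check on the "over a month" figure in option 1, here is a 
back-of-envelope sketch in Python. It assumes the numbers from my earlier 
message quoted below (about 3 million SNPs, about 3000 fetched per hour) 
and a constant fetch rate; the function name is only illustrative.

```python
def estimated_days(total_snps: int, snps_per_hour: float) -> float:
    """Estimate wall-clock days to fetch every SNP at a constant rate."""
    hours = total_snps / snps_per_hour
    return hours / 24


# Assumed inputs: ~3 million Watson SNPs, ~3000 retrieved per hour.
days = estimated_days(3_000_000, 3_000)
print(f"{days:.1f} days")  # prints "41.7 days"
```

That is about 1000 hours of continuous running, so a sustained API loop 
is not practical at this rate.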

Is there anything else I can try? I do appreciate that the dataset is 
vast and these things will be slow. Perhaps the answer is simply a 
faster machine to install the local database, and I am looking into this.

I have already looked at getting the SNPs from dbSNP or directly from 
the source, but I need the information associated with the SNPs, so I 
think I will have the same problems retrieving the associated data even 
if I got the 'raw SNPs' by other means.

Many thanks

On 09/12/2010 16:53, Fiona Cunningham wrote:
> Dear Andrea,
>
> We will look into producing the dump file of all SNPs in Watson for
> the next release, which should make your life easier. BioMart is really
> best suited to specific queries, so we should provide dump files
> where large amounts of information across the entire genome are
> required.
>
> Fiona
>
> ------------------------------------------------------
> Fiona Cunningham
> Ensembl Variation Project Leader, EBI
> www.ensembl.org
> www.lrg-sequence.org
> t: 01223 494612 || e: fiona at ebi.ac.uk
>
>
>
> On 9 December 2010 13:46, Andrea Edwards <edwardsa at cs.man.ac.uk> wrote:
>> Dear all
>>
>> I've tried downloading the Watson SNPs from BioMart, both a) as the whole
>> set and b) chromosome by chromosome, and I can't get the data. I have
>> tried requesting the data by email (no email received) and by direct
>> download (the download starts, but at a rate of 1 kb per second, and
>> times out after about 12 hours / 10 MB downloaded).
>>
>> I have written a script to get the Watson SNPs via the Perl API, but it is
>> still running and taking hours, so I am scared I will get my IP blocked!
>> There are 3 million SNPs and it took an hour to get 3000, I think.
>>
>> I was thinking of getting the human databases directly, but I am awaiting a
>> new machine and am totally out of disk space. Does anyone know how big the
>> human core and variation databases are when installed?
>>
>> thanks a lot