[ensembl-dev] gtf of watson snps

Graham Ritchie grsr at ebi.ac.uk
Mon Jan 10 17:46:02 GMT 2011


Hi Andrea,

The GVF dump will include all variations, i.e. SNPs, indels and anything else we have. Release 62 will probably not be out until early April.
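
Once the dump exists, separating SNPs from indels should be straightforward. Below is a minimal sketch, assuming the file follows the usual GFF3-style layout: nine tab-separated columns, the variant type in column 3, and `key=value` attributes separated by semicolons in column 9. The function names, the example records and the type names `SNV` and `deletion` are illustrative assumptions, not the confirmed contents of the Ensembl dump.

```python
from collections import defaultdict


def parse_attributes(attr_field):
    """Split a column-9 string such as 'ID=1;Variant_seq=A,G' into a dict of lists."""
    attrs = {}
    for pair in attr_field.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # Multiple values for one key are comma-separated.
            attrs[key.strip()] = value.split(",")
    return attrs


def split_gvf_by_type(lines):
    """Group data lines by the feature type found in column 3."""
    by_type = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments and '##' pragma lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed data line
        by_type[cols[2]].append(cols)
    return dict(by_type)


# Made-up records for illustration only (not real Watson data).
example = [
    "# hypothetical header comment",
    "1\texample\tSNV\t1000\t1000\t.\t+\t.\tID=1;Variant_seq=A,G;Reference_seq=G",
    "1\texample\tdeletion\t2000\t2002\t.\t+\t.\tID=2;Variant_seq=-;Reference_seq=CTT",
]

groups = split_gvf_by_type(example)
for vtype, records in groups.items():
    alleles = [parse_attributes(r[8]).get("Variant_seq", []) for r in records]
    print(vtype, len(records), alleles)
```

For a real file you would pass an open file handle instead of the `example` list, since the function only iterates over lines.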

I'm surprised that running the VEP on 150k SNPs takes days, though. Have you downloaded a recent version? We recently made some performance improvements. (You should probably check out the head code, though, to avoid the missing STOP_LOST bug discussed on the mailing list, or wait until the 61 API is released.)

Cheers,

Graham


On 10 Jan 2011, at 17:34, Andrea Edwards wrote:

> Hi
> 
> Thanks for a quick reply.
> 
> It's a speed issue really. I've tried 3 approaches:
> 1) getting the consequences for the Watson SNPs using the Perl API (this looked like it would have taken weeks to run)
> 2) getting the consequences using the variation schema directly, to hopefully avoid a few time-consuming joins
> 3) downloading the SNPs from a GFF file and then using Will McLaren's SNP Effect Predictor program to get the consequences (in case his code runs quicker than mine, as he obviously has more experience in this field than I do)
> 
> In all 3 approaches I was getting the consequences for multi-allelic SNPs and also getting all the consequences (not just the display consequence) for each allele. I'll be quite interested to see how your data model compares to mine when it is released, actually. Perhaps I'll change mine to copy yours :)
> 
> Anyhow, the problem I am having is simply the time it is taking, due to the volume of data involved. Even using approach 3 and looking at just the approximately 150,000 SNPs in exons, it was taking days. And then of course there's the issue that you've spent a week running a program and then find you had a bug :) So if someone has already done it then.........
> 
> When is release 62 due, by the way?
> 
> At present my 'pipeline' has only been run on SNPs, but I'm going to try it on indels after that. Will the GVF dump of Watson include just SNPs, or indels too?
> 
> many thanks
> 
> On 10/01/2011 17:16, Graham Ritchie wrote:
>> Hi Andrea,
>> 
>> We will be making a GVF dump of all Watson SNPs available for the next Ensembl release (61). This will not include consequences until the following release (62), because GVF requires allele-specific consequences, which the current consequence pipeline cannot provide, and the new one will not be ready for 61. I can probably create such a file for you before then (but after release 61), though, because I need to test the GVF generation with the new code.
>> 
>> You should be able to fetch all the information you need from the API already though?
>> 
>> Cheers,
>> 
>> Graham
>> 
>> 
>> On 10 Jan 2011, at 17:07, Andrea Edwards wrote:
>> 
>>> Hello
>>> 
>>> I recently saw this link to a download available from Ensembl:
>>> 
>>> http://galaxy.fml.mpg.de/library_common/ldda_info?library_id=2f94e8ae9edff68a&show_deleted=False&cntrller=library&folder_id=2f94e8ae9edff68a&use_panels=False&id=39b9f5f151019d2f
>>> 
>>> I was wondering if you had anything similar available for Watson SNPs?
>>> 
>>> I was also wondering if you had anything like this available which provides the consequences of the Watson SNPs? I know I can run the Watson SNPs through Will McLaren's excellent SNP Effect Predictor program, but I wondered if you had already done it and made the results available.
>>> 
>>> thanks
>>> 
>>> _______________________________________________
>>> Dev mailing list
>>> Dev at ensembl.org
>>> http://lists.ensembl.org/mailman/listinfo/dev




