[ensembl-dev] Getting to grips with the variation API - retrieving unusual SNPs

Andrea Edwards edwardsa at cs.man.ac.uk
Tue Oct 26 12:22:36 BST 2010


Hi

I'm trying to get to grips with the variation API and there are some 
things I don't know how to do, as the API seems to look at variation 
from a slightly different perspective. I want to find arbitrary SNPs that 
match certain criteria.

Examples

How would I retrieve SNPs that affect genes on the reverse strand of a 
chromosome?
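A minimal Python sketch of the filtering step. It assumes that gene coordinates and SNP positions have already been fetched (for example from a local copy of the Ensembl databases) into plain records. The `Gene` record, its field names and the `(name, chrom, pos)` SNP tuples are hypothetical and are not part of the Ensembl API. Strand is assumed to be encoded as 1 / -1, which is how Ensembl features report it.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    # Hypothetical record; not an Ensembl API class.
    gene_id: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: int  # 1 = forward, -1 = reverse

def reverse_strand_hits(snps, genes):
    """Return names of SNPs lying inside at least one reverse-strand gene.

    snps: iterable of (name, chrom, pos) tuples, pos 1-based.
    """
    hits = []
    for name, chrom, pos in snps:
        if any(g.strand == -1 and g.chrom == chrom and g.start <= pos <= g.end
               for g in genes):
            hits.append(name)
    return hits

genes = [Gene("G1", "1", 100, 500, 1), Gene("G2", "1", 400, 900, -1)]
snps = [("rsA", "1", 150), ("rsB", "1", 450), ("rsC", "1", 800)]
print(reverse_strand_hits(snps, genes))  # ['rsB', 'rsC']
```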

How do I find a SNP that affects genes on both the forward and the 
reverse strand (i.e. it lies in a forward-strand gene and a reverse-strand gene)?
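One way to express this is to collect the set of strands of every gene covering the SNP and keep the SNP when that set contains both 1 and -1. The following is a sketch under the same assumptions as a plain-record approach: the `Gene` record and the SNP tuples are hypothetical, and strand is assumed to be 1 / -1.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    # Hypothetical record; not an Ensembl API class.
    gene_id: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: int  # 1 or -1

def strands_at(chrom, pos, genes):
    """Set of strands of all genes covering chrom:pos."""
    return {g.strand for g in genes
            if g.chrom == chrom and g.start <= pos <= g.end}

def both_strand_hits(snps, genes):
    """SNPs inside at least one forward-strand and one reverse-strand gene."""
    return [name for name, chrom, pos in snps
            if strands_at(chrom, pos, genes) == {1, -1}]

genes = [Gene("G1", "1", 100, 500, 1), Gene("G2", "1", 400, 900, -1)]
snps = [("rsA", "1", 150), ("rsB", "1", 450), ("rsC", "1", 800)]
print(both_strand_hits(snps, genes))  # ['rsB']
```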

How would I find SNPs that are in more than one exon of the same gene?
(e.g. the gene runs from bases 1-1000 and has exon 1 at bases 100-200, 
exon 2 at bases 300-400, exon 3 at bases 500-600 and another exon 4, 
from intron retention, at bases 100-400. A SNP at base 150 is then in 
exons 1 and 4.)
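The check is to group the exons covering the SNP by gene and keep any gene that contributes two or more of them. The sketch below uses the gene from the example above. It assumes all features are on one chromosome and that exons come as hypothetical `(exon_id, gene_id, start, end)` tuples. How those tuples would be pulled out of Ensembl is not shown.

```python
from collections import defaultdict

def multi_exon_hits(snps, exons):
    """SNPs that fall inside two or more distinct exons of the same gene.

    snps:  iterable of (name, pos), pos 1-based
    exons: iterable of (exon_id, gene_id, start, end), 1-based inclusive
    Returns {snp_name: {gene_id: [exon_ids]}} for qualifying SNPs only.
    """
    result = {}
    for name, pos in snps:
        by_gene = defaultdict(list)
        for exon_id, gene_id, start, end in exons:
            if start <= pos <= end:
                by_gene[gene_id].append(exon_id)
        multi = {g: ids for g, ids in by_gene.items() if len(ids) > 1}
        if multi:
            result[name] = multi
    return result

# The example gene: exon 4 (E4) comes from intron retention.
exons = [("E1", "G1", 100, 200), ("E2", "G1", 300, 400),
         ("E3", "G1", 500, 600), ("E4", "G1", 100, 400)]
print(multi_exon_hits([("snp150", 150), ("snp550", 550)], exons))
# {'snp150': {'G1': ['E1', 'E4']}}
```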

How would I find SNPs that are present in exons from different genes? 
(e.g. gene 1 has an exon from 1000-5000 and gene 2 has an exon from 
2000-8000: a SNP at bp 3000 is in exons from 2 different genes.)
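This is the same overlap test as above, but it counts distinct genes rather than exons within one gene. The following is a sketch on the example coordinates, again assuming a single chromosome and hypothetical `(exon_id, gene_id, start, end)` tuples.

```python
def cross_gene_exon_hits(snps, exons):
    """SNPs lying in exons that belong to two or more different genes.

    snps:  iterable of (name, pos), pos 1-based
    exons: iterable of (exon_id, gene_id, start, end), 1-based inclusive
    Returns {snp_name: sorted list of gene_ids}.
    """
    result = {}
    for name, pos in snps:
        gene_ids = {gene_id for _, gene_id, start, end in exons
                    if start <= pos <= end}
        if len(gene_ids) > 1:
            result[name] = sorted(gene_ids)
    return result

exons = [("E1", "gene1", 1000, 5000), ("E2", "gene2", 2000, 8000)]
print(cross_gene_exon_hits([("snp3000", 3000), ("snp6000", 6000)], exons))
# {'snp3000': ['gene1', 'gene2']}
```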

A more obscure example is finding SNPs in exons that are translated in 
different reading frames by different transcripts. I didn't even know an 
exon could be read in more than one frame, but apparently it's not as 
rare as people thought!
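One way to detect this is to compute, for each transcript, the SNP's 0-based offset within that transcript's coding sequence. Taken modulo 3, that offset gives the codon position. If different transcripts give different codon positions for the same SNP, the exon is being read in different frames.

The sketch below is hypothetical throughout. It assumes each transcript is supplied as an ordered list of coding `(start, end)` segments beginning at the translation start. It also only handles forward-strand transcripts; reverse-strand ones would need their segments walked from the other end.

```python
def cds_offset(pos, cds_segments):
    """0-based offset of pos within a transcript's coding sequence, or None.

    cds_segments: coding intervals (start, end), 1-based inclusive, in
    transcript order. Forward-strand transcripts only in this sketch.
    """
    offset = 0
    for start, end in cds_segments:
        if start <= pos <= end:
            return offset + (pos - start)
        offset += end - start + 1
    return None

def frames_at(pos, transcripts):
    """Map transcript_id -> codon position (0, 1 or 2) of pos."""
    frames = {}
    for tid, segments in transcripts.items():
        off = cds_offset(pos, segments)
        if off is not None:
            frames[tid] = off % 3
    return frames

def in_multiple_frames(pos, transcripts):
    """True if transcripts covering pos disagree on its codon position."""
    return len(set(frames_at(pos, transcripts).values())) > 1

# Two hypothetical transcripts sharing the exon 500-600 but with upstream
# coding segments of 101 and 100 bases, so the shared exon is out of frame.
transcripts = {
    "T1": [(100, 200), (500, 600)],
    "T2": [(300, 399), (500, 600)],
}
print(frames_at(550, transcripts))           # {'T1': 1, 'T2': 0}
print(in_multiple_frames(550, transcripts))  # True
```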

I want to find 'unusual' SNPs so I can generate a set of test data from 
them. I was thinking the only way to do this would be to iterate over 
every SNP in the database, fetch its associated genes, transcripts and 
exons, and keep the SNPs that match. I imagine the EBI wouldn't want that 
kind of load on its public server, so I will install a local copy. How 
long might a script like that take to run on a typical desktop PC?
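The running time of a per-SNP loop is likely to be dominated by issuing separate database lookups for every SNP. An alternative is to pull all SNP positions and exon (or gene) coordinates for one chromosome once, then join them locally in a single sort-and-sweep pass.

The sketch below is a generic interval-overlap sweep, not an Ensembl feature. Its input shapes (plain integer positions and hypothetical `(start, end, interval_id)` tuples on one chromosome) are assumptions. Its output pairs could feed any of the filters above.

```python
import heapq

def sweep_overlaps(snp_positions, intervals):
    """Yield (snp_pos, interval_id) for every SNP lying inside an interval.

    snp_positions: iterable of ints (1-based)
    intervals: iterable of (start, end, interval_id), 1-based inclusive,
               all on one chromosome.
    Sorts both inputs once instead of scanning every interval per SNP.
    """
    ivs = sorted(intervals)  # ordered by start
    active = []              # min-heap of (end, interval_id)
    i = 0
    for pos in sorted(snp_positions):
        # Activate intervals that start at or before this SNP.
        while i < len(ivs) and ivs[i][0] <= pos:
            start, end, iid = ivs[i]
            heapq.heappush(active, (end, iid))
            i += 1
        # Drop intervals that ended before this SNP; every remaining
        # interval then has start <= pos <= end.
        while active and active[0][0] < pos:
            heapq.heappop(active)
        for end, iid in active:
            yield pos, iid

intervals = [(100, 200, "E1"), (300, 400, "E2"),
             (500, 600, "E3"), (100, 400, "E4")]
print(sorted(sweep_overlaps([150, 550, 700], intervals)))
# [(150, 'E1'), (150, 'E4'), (550, 'E3')]
```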

Many thanks




