[ensembl-dev] unmapped/un-displayable SNP from dbsnp

Neil Walker neil.walker at cimr.cam.ac.uk
Fri Oct 8 16:33:34 BST 2010

Hi all

Let's say this *is* a triallelic SNP. Then you've got 2 problems:

[1] dbSNP appears to have, or used to have, trouble automatically reverse
complementing a triallelic SNP;

[2] my /guess/ is that most of the population diversity information was
derived from an Affymetrix or Illumina chip - it is on the Illumina
HumanHap550v3 GWAS chip for example - and these can only measure 2 alleles.

Such a chip will record genotypes /as if/ a third allele is not present,
and this information then hits the databases "confirming" that there are
only 2 alleles, when it confirms only that 2 alleles were measured.

However, any person carrying the unmeasured allele will either be missing
a genotype or appear to be homozygous for their other allele, depending
on the genotype-calling algorithm.
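
As a rough illustration of those two outcomes, here is a minimal Python
sketch. The function name call_biallelic and the two "policy" values are
hypothetical, and real calling algorithms cluster probe intensities rather
than looking at true genotypes, so treat this as a toy model only:

```python
def call_biallelic(true_genotype, assayed=("A", "G"), policy="missing"):
    """Genotype a two-allele assay might report for a true diploid genotype.

    policy="missing":    any unmeasured allele gives a no-call (None).
    policy="homozygous": the one measured allele is reported twice.
    Both policies are simplifying assumptions, not a real calling algorithm.
    """
    # Keep only the alleles the assay has probes for.
    measured = [a for a in true_genotype if a in assayed]
    if len(measured) == 2:
        return "".join(sorted(measured))
    # Zero measured alleles, or a caller that refuses to guess: no-call.
    if not measured or policy == "missing":
        return None
    # One measured allele: the sample looks homozygous for it.
    return measured[0] * 2


for truth in ("AG", "AC", "GC", "CC"):
    print(truth, call_biallelic(truth), call_biallelic(truth, policy="homozygous"))
```

Either way, a carrier of the third allele (C in this toy case) never shows
up as such in the reported genotypes.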

Which means, if you're looking this up in Ensembl and/or dbSNP because you
have interesting results at rs1053738, then you should doubt those results.
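
On point [1], reverse complementing an allele string is simple in
principle, which is why a mismatch between strands is worth checking by
hand. A minimal Python sketch follows; the helper name flip_alleles is
hypothetical, and this is not a description of how dbSNP or Ensembl
actually do it:

```python
# Translation table for complementing DNA bases; other characters,
# such as "-" for a deletion, pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def flip_alleles(allele_string):
    """Reverse complement each allele in a '/'-separated allele string.

    Allele order is preserved; only the bases within each allele are
    complemented and reversed.
    """
    return "/".join(
        allele.translate(COMPLEMENT)[::-1]
        for allele in allele_string.split("/")
    )


print(flip_alleles("A/G"))    # two alleles
print(flip_alleles("A/C/G"))  # three alleles, same treatment
```

Flipping twice should give back the original string, which makes a cheap
sanity check when comparing allele strings reported on opposite strands.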


> Although it says on the mentioned dbSNP page:
> RefSNP Alleles:	A/G
> it also says:
> NM_021964.2:c.1749G>A
> NM_021964.2:c.1749G>C
> NM_021964.2:c.1749G>T
> and further down on the page it again says:
> CCG => CCA
> CCG => CCC
> CCG => CCT
> But in the Population diversity part, again only A and G are
> mentioned as alleles.
> So, I get the feeling dbSNP is messing up things here ....
> Cheers,
> Bert
> On Fri, Oct 8, 2010 at 3:54 PM, Kim Brugger <kim.brugger at easih.ac.uk> wrote:
>> Hi
>> If you look at the dbSNP page for this SNP, it lists only two alleles (A/G),
>> so it looks like the counting of alleles is faulty. Furthermore,
>> the SNP is represented in the 1000 Genomes data and in other datasets I deem
>> trustworthy.
>> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1053738
>> Thanks for explaining how/why this filtering is done.
>> Cheers,
>> Kim
>> On 08/10/10 14:14, Graham Ritchie wrote:
>>> Hi Kim,
>>> This SNP has *more than* 3 alleles, and we have taken the decision to fail
>>> all such SNPs. We debated this decision internally recently, and Paul
>>> concluded as follows:
>>> "These are still far, far more likely to be errors than real.  While some
>>> probably exist, true SNPs with all four alleles require very complex
>>> selection pressures to remain in the population and so this number is simply
>>> never likely to grow to "many SNPs."  In fact, the word quadallelic does not
>>> return any results in Pubmed.
>>> This does not mean that it will never happen, only that it is very, very
>>> rare.  Note that we don't fail triallelic SNPs, which are also rare and
>>> enriched for error."
>>> Hope this makes sense. If you have examples of SNPs that don't appear for
>>> other reasons then please let us know. We do track all SNPs we fail and the
>>> reason for doing so in the failed_variation table of the variation database.
>>> Cheers,
>>> Graham
>>> On 8 Oct 2010, at 13:54, Kim Brugger wrote:
>>>> Hi
>>>> I am looking for the rs1053738 snp. When I do a search on the ensembl-web
>>>> it is found and it exists with 2 synonyms, but if I want to display it I am
>>>> told it was not mapped as the variation has 3 alleles.
>>>> The SNP should be located at  3:124951820-124951821. I have a large set
>>>> of snps that I cannot find either with the ensembl-web or using the api.
>>>> Cheers,
>>>> Kim

Neil Walker                         email: neil.walker at cimr.cam.ac.uk
JDRF/WT Diabetes and Inflammation   tel: +44 (0)1223 763210
	Laboratory		    fax: +44 (0)1223 762102
Cambridge, UK                    http://www-gene.cimr.cam.ac.uk/todd/
