[ensembl-dev] unmapped/un-displayable SNP from dbsnp
Neil Walker
neil.walker at cimr.cam.ac.uk
Fri Oct 8 16:33:34 BST 2010
Hi all
Let's say this *is* a triallelic SNP. Then you've got 2 problems:
[1] dbSNP appears to have, or used to have, trouble automatically
reverse-complementing a triallelic SNP;
[2] my /guess/ is that most of the population diversity information was
derived from an Affymetrix or Illumina chip - it is on the Illumina
HumanHap550v3 GWAS chip for example - and these can only measure 2 alleles.
Such a chip records genotypes /as if/ a third allele is not present, and
this information then hits the databases "confirming" that there are
only 2 alleles, when it really confirms only that 2 alleles were measured.
However, any person carrying the unmeasured allele will either be missing
a genotype, or appear to be homozygous for their other allele, depending
on the genotype-calling algorithm.
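
To make [1] and [2] concrete, here is a rough Python sketch. It is only an
illustration, not how dbSNP or any chip vendor's software actually works:
the function names, the two calling "policies" and the choice of C as the
unmeasured third allele are all my own assumptions.

```python
from typing import Iterable, List, Optional, Set, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement_alleles(alleles: Iterable[str]) -> List[str]:
    """Allele set as it reads on the opposite strand (single-base alleles only)."""
    return sorted(COMPLEMENT[a] for a in alleles)


def call_biallelic(
    true_genotype: Tuple[str, str],
    assayed: Set[str],
    policy: str = "missing",
) -> Optional[Tuple[str, str]]:
    """Hypothetical genotype call on an assay that can only see two alleles.

    policy="missing":    any unmeasured allele gives no call (None).
    policy="homozygous": a carrier of one unmeasured allele is reported as
                         homozygous for the allele that was measured.
    """
    seen = [a for a in true_genotype if a in assayed]
    if len(seen) == 2:
        return (min(seen), max(seen))  # both alleles measured: correct call
    if policy == "missing" or not seen:
        return None  # nothing usable measured, or the algorithm refuses to call
    return (seen[0], seen[0])  # only one allele measured: looks homozygous


if __name__ == "__main__":
    assayed = {"A", "G"}  # the chip only has probes for A and G
    for genotype in [("A", "G"), ("G", "C"), ("C", "C")]:  # C is the assumed third allele
        for policy in ("missing", "homozygous"):
            print(genotype, policy, call_biallelic(genotype, assayed, policy))
    # What a correct strand flip of three alleles should give:
    print(complement_alleles(["A", "C", "G"]))
```

Either way no third allele ever shows up in the calls, so the data look
consistent with a 2-allele SNP however many G/C people were genotyped.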
Which means, if you're looking up Ensembl and/or dbSNP because you have
interesting results at rs1053738, then you should doubt those results.
Cheers
Neil
> Although it says on the mentioned dbSNP page:
>
> RefSNP Alleles: A/G
>
> it also says:
>
> NM_021964.2:c.1749G>A
> NM_021964.2:c.1749G>C
> NM_021964.2:c.1749G>T
>
> and further down on the page it again says:
>
> CCG => CCA
> CCG => CCC
> CCG => CCT
>
> But in the Population diversity part, again only A and G are
> mentioned as alleles.
>
> So, I get the feeling dbSNP is messing up things here ....
>
> Cheers,
> Bert
>
>
> On Fri, Oct 8, 2010 at 3:54 PM, Kim Brugger <kim.brugger at easih.ac.uk> wrote:
>> Hi
>>
>> If you look at the dbSNP page for this SNP, it lists only two alleles (A/G),
>> so it looks like the counting of alleles is faulty. Furthermore,
>> the SNP is represented in the 1000 Genomes data and in other datasets I deem
>> trustworthy.
>>
>> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1053738
>>
>> Thanks for explaining how/why this filtering is done.
>>
>> Cheers,
>>
>> Kim
>>
>> On 08/10/10 14:14, Graham Ritchie wrote:
>>> Hi Kim,
>>>
>>> This SNP has *more than* 3 alleles, and we have taken the decision to fail
>>> all such SNPs. We debated this decision internally recently, and Paul
>>> concluded as follows:
>>>
>>> "These are still far, far more likely to be errors than real. While some
>>> probably exist, true SNPs with all four alleles require very complex
>>> selection pressures to remain in the population and so this number is simply
>>> never likely to grow to "many SNPs." In fact, the word quadallelic does not
>>> return any results in Pubmed.
>>>
>>> This does not mean that it will never happen, only that it is very, very
>>> rare. Note that we don't fail triallelic SNPs, which are also rare and
>>> enriched for error."
>>>
>>> Hope this makes sense. If you have examples of SNPs that don't appear for
>>> other reasons then please let us know. We do track all SNPs we fail and the
>>> reason for doing so in the failed_variation table of the variation database.
>>>
>>> Cheers,
>>>
>>> Graham
>>>
>>>
>>> On 8 Oct 2010, at 13:54, Kim Brugger wrote:
>>>
>>>
>>>> Hi
>>>>
>>>> I am looking for the rs1053738 SNP. When I search for it on the Ensembl
>>>> website it is found, with 2 synonyms, but when I try to display it I am
>>>> told it was not mapped because the variation has 3 alleles.
>>>>
>>>> The SNP should be located at 3:124951820-124951821. I have a large set
>>>> of snps that I cannot find either with the ensembl-web or using the api.
>>>>
>>>> Cheers,
>>>>
>>>> Kim
--
---------------------------------------------------------------------
Neil Walker email: neil.walker at cimr.cam.ac.uk
JDRF/WT Diabetes and Inflammation tel: +44 (0)1223 763210
Laboratory fax: +44 (0)1223 762102
Cambridge, UK http://www-gene.cimr.cam.ac.uk/todd/
---------------------------------------------------------------------