[ensembl-dev] unmapped/un-displayable SNP from dbsnp

Kim Brugger kim.brugger at easih.ac.uk
Mon Oct 11 15:00:09 BST 2010


Hi

I have had a further look at my data.

I have selected a set of SNPs that are not present in the Ensembl 
database and should not contain any novel SNPs (they are shared between 
multiple unrelated, geographically distinct families). When I look them 
up in a quickly hacked-together local copy of dbSNP, I can assign a 
dbSNP ID to 223 of the 258 SNPs.

True, many of the SNPs in this list have 3+ alleles, but on the dbSNP 
pages the odd alleles originate from dubious data sources. And then 
there are the SNPs that do not fall into this category at all: 
rs3116816, rs2516393, rs2074470, etc. I know that I found an odd SNP 
page last Friday that could explain the faulty filtering, but that is 
not the case for the latter two.

I suggest that one easy solution would be to add a check to see whether 
the SNPs with 3+ alleles are found in the 1000 Genomes data and, if 
they are, to include them in Ensembl.
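
As a rough sketch of that check, in Python: the names below (Variant, 
should_fail, MAX_ALLELES, seen_in_1000g) are made up for illustration 
and are not part of the Ensembl API or import pipeline. It assumes the 
rule Graham describes further down the thread (fail anything with more 
than 3 alleles) and simply exempts IDs that also appear in a set built 
from the 1000 Genomes data. The second variant is a made-up placeholder.

```python
from dataclasses import dataclass

# Assumption: variants with more than 3 alleles are failed, as described
# further down this thread. Not taken from the Ensembl code base.
MAX_ALLELES = 3


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record: an rs ID and its reported alleles."""
    rs_id: str
    alleles: frozenset


def should_fail(variant, seen_in_1000g):
    """Return True if the variant should be failed.

    A variant is failed only if it has more than MAX_ALLELES alleles
    AND its ID is absent from the set of IDs seen in 1000 Genomes.
    """
    if len(variant.alleles) <= MAX_ALLELES:
        return False
    return variant.rs_id not in seen_in_1000g


# rs1053738 was reported with 4 alleles at import time but is present in
# the 1000 Genomes data, so the extra check rescues it.
seen_in_1000g = {"rs1053738"}
rescued = Variant("rs1053738", frozenset("ACGT"))
# Made-up placeholder ID, not a real dbSNP record.
not_rescued = Variant("rs_hypothetical_1", frozenset("ACGT"))

print(should_fail(rescued, seen_in_1000g))
print(should_fail(not_rescued, seen_in_1000g))
```

In practice the set of 1000 Genomes IDs would have to be built from the 
released call sets, and matching by position and allele rather than by 
ID alone may well be safer.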

Cheers,

Kim




On 08/10/10 16:37, Kim Brugger wrote:
> On 08/10/10 16:23, Graham Ritchie wrote:
>> Hi Kim,
>>
>> Hmm, this does seem to be an odd case. If you look at the dbSNP entry 
>> on this page:
>>
>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=snp&cmd=search&term=+rs1053738 
>>
>>
>> it does appear to have 4 alleles. On the page you link to, the 
>> "RefSNP Alleles" are listed as only A/G but, as Bert pointed out, 
>> the HGVS names are inconsistent with this.
> Actually, one mRNA states G>{A,C,T} at one position, which is 
> quite spectacular, and clearly a bug.
>> This SNP only had 2 alleles in dbSNP 130, and can be seen in ensembl 
>> version 58 here:
>>
>> http://may2010.archive.ensembl.org/Homo_sapiens/Variation/Summary?v=rs1053738;vdb=variation 
>>
>>
>> It is possible that dbSNP have since (partially) corrected the 
>> webpage, but when we did the last import (from dbSNP 131) it was 
>> reported as having 4 alleles.
>> Hopefully this will be resolved in the next release of dbSNP, which 
>> will then filter through to Ensembl (probably in release 62). We'll 
>> certainly take it up with them.
> So that will be a year or more from now? As this is now a major 
> issue for my data analysis, I will investigate further. I have a 
> gut feeling that this is more than a fluke.
>
> Cheers,
>
> Kim
>
>> Cheers,
>>
>> Graham
>>
>>
>> On 8 Oct 2010, at 15:54, Kim Brugger wrote:
>>
>>> Hi
>>>
>>> If you look at the dbSNP page for this SNP, it has only two 
>>> alleles, A/G, so it looks like the counting of alleles is faulty. 
>>> Furthermore, the SNP is represented in the 1000 Genomes data and in 
>>> other datasets I deem trustworthy.
>>>
>>> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1053738
>>>
>>> Thanks for explaining how/why this filtering is done.
>>>
>>> Cheers,
>>>
>>> Kim
>>>
>>> On 08/10/10 14:14, Graham Ritchie wrote:
>>>> Hi Kim,
>>>>
>>>> This SNP has *more than* 3 alleles, and we have taken the decision 
>>>> to fail all such SNPs. We debated this decision internally recently 
>>>> and Paul concluded as follows:
>>>>
>>>> "These are still far, far more likely to be errors than real.  
>>>> While some probably exist, true SNPs with all four alleles require 
>>>> very complex selection pressures to remain in the population and so 
>>>> this number is simply never likely to grow to "many SNPs."  In 
>>>> fact, the word quadallelic does not return any results in Pubmed.
>>>>
>>>> This does not mean that it will never happen, only that it is very, 
>>>> very rare.  Note that we don't fail triallelic SNPs, which are also 
>>>> rare and enriched for error."
>>>>
>>>> Hope this makes sense. If you have examples of SNPs that don't 
>>>> appear for other reasons then please let us know. We do track all 
>>>> SNPs we fail, and the reason for doing so, in the failed_variation 
>>>> table of the variation database.
>>>>
>>>> Cheers,
>>>>
>>>> Graham
>>>>
>>>>
>>>> On 8 Oct 2010, at 13:54, Kim Brugger wrote:
>>>>
>>>>
>>>>> Hi
>>>>>
>>>>> I am looking for the rs1053738 SNP. When I search for it on the 
>>>>> Ensembl website it is found, with 2 synonyms, but when I try to 
>>>>> display it I am told that it was not mapped because the variation 
>>>>> has 3 alleles.
>>>>>
>>>>> The SNP should be located at 3:124951820-124951821. I have a 
>>>>> large set of SNPs that I cannot find either on the Ensembl website 
>>>>> or using the API.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Kim
>>>>>
>>>>> -- 
>>>>> ==========================================================
>>>>> Kim Brugger
>>>>> EASIH, University of Cambridge
>>>>> www.easih.ac.uk
>>>>> ==========================================================
>>>>>
>>>>>
>>>>
>>>
>
>


-- 
==========================================================
Kim Brugger
EASIH, University of Cambridge
www.easih.ac.uk
==========================================================