[ensembl-dev] unmapped/un-displayable SNP from dbsnp

Graham Ritchie grsr at ebi.ac.uk
Tue Oct 12 10:45:38 BST 2010


Hi Kim,

The other SNPs you mention were all failed for other reasons, specifically:

rs3116816 - maps to more than 3 locations
rs2516393 - no mapping (though in this case we suspect there is a display bug: this SNP in fact had multiple mappings and was failed for that reason)
rs2074470 - maps to more than 3 locations

We fail SNPs for these reasons to filter out some of the noise in dbSNP, and to try to improve the accuracy of our annotations.
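
As a rough illustration only, the failure reasons mentioned in this thread could be sketched as below. This is not the actual import code: the function name, the thresholds as constants and the order of the checks are all assumptions made for the example.

```python
from typing import Optional

# Thresholds taken from the reasons quoted in this thread; the names are hypothetical.
MAX_ALLELES = 3
MAX_MAPPINGS = 3


def failure_reason(allele_count: int, mapping_count: int) -> Optional[str]:
    """Return a failure reason for a SNP, or None if it passes these checks.

    The order of the checks is an assumption, not the pipeline's documented behaviour.
    """
    if mapping_count == 0:
        return "no mapping"
    if mapping_count > MAX_MAPPINGS:
        return "maps to more than 3 locations"
    if allele_count > MAX_ALLELES:
        return "more than 3 alleles"
    # Triallelic SNPs (exactly 3 alleles) are not failed.
    return None
```

With these rules a SNP reported with 4 alleles and a single mapping is failed on its allele count, while a triallelic SNP with a single mapping passes.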

For ensembl 59, which was based on dbSNP 131, we imported the mappings directly from dbSNP (i.e. we did not map the SNPs ourselves) and so there shouldn't be any room for disagreement. However, it seems that the dbSNP webpages have been updated with information from dbSNP 132 (according to the "updated in build" field in the RefSNP table) and so the ensembl and dbSNP webpages are somewhat out of sync.

We will certainly consider giving special priority to particular sources, such as 1000 Genomes, in the future. If you have any other suggestions on how we can improve our filtering we would be interested to hear them. 
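
To make the idea concrete, here is a minimal sketch of how a trusted source might override the allele-count filter. Nothing like this exists in the pipeline as far as this thread says: the function name, the source labels and the choice to rescue on any trusted source are assumptions for the example.

```python
from typing import Set

# Hypothetical set of sources that would be given priority.
TRUSTED_SOURCES = {"1000 Genomes"}


def fails_allele_filter(allele_count: int, sources: Set[str], max_alleles: int = 3) -> bool:
    """Return True if a SNP should be failed for having too many alleles.

    A SNP over the limit is kept when at least one of its sources is trusted.
    """
    if allele_count <= max_alleles:
        return False
    # Over the limit: fail only if no trusted source reports the SNP.
    return sources.isdisjoint(TRUSTED_SOURCES)
```

A rule like this would still need some check on which alleles the trusted source actually supports, since the extra alleles may come from a different submitter.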

Cheers,

Graham


On 11 Oct 2010, at 15:00, Kim Brugger wrote:

> Hi
> 
> I have had a further look at my data.
> 
> I have selected a set of snps that are not present in the ensembl database and should not contain any novel snps (they are shared between multiple unrelated, geographically distinct families). When I look in a quickly hacked local copy of dbsnp I can assign a dbsnp id to 223 of 258 snps.
> 
> True, a lot of the snps in this list contain +3 alleles, but when looking at the dbsnp page, the odd alleles originate from dubious data sources. And then there is the list of snps that do not fall within this category: rs3116816, rs2516393, rs2074470 etc. I know that I found an odd snp page last Friday that can explain the faulty filtering, but this is not the case with the latter two.
> 
> I suggest that one easy solution would be to add a check to see whether the snps with +3 alleles are found in the 1000 genomes data and, if they are, include them in ensembl.
> 
> Cheers,
> 
> Kim
> 
> 
> 
> 
> On 08/10/10 16:37, Kim Brugger wrote:
>> On 08/10/10 16:23, Graham Ritchie wrote:
>>> Hi Kim,
>>> 
>>> Hmm, this does seem to be an odd case. If you look at the dbSNP entry on this page:
>>> 
>>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=snp&cmd=search&term=+rs1053738 
>>> 
>>> it does appear to have 4 alleles, but on the page you link the "RefSNP Alleles" are listed as only A/G. As Bert pointed out, though, the HGVS names are inconsistent with this.
>> Actually one mRNA states that G>{A,C,T} at one position, which is quite spectacular, and clearly a bug.
>>> This SNP only had 2 alleles in dbSNP 130, and can be seen in ensembl version 58 here:
>>> 
>>> http://may2010.archive.ensembl.org/Homo_sapiens/Variation/Summary?v=rs1053738;vdb=variation 
>>> 
>>> It is possible that dbSNP have since (partially) corrected the webpage, but when we did the last import (from dbSNP 131) it was reported as having 4 alleles.
>>> Hopefully this will be resolved in the next release of dbSNP which will then filter through to ensembl (probably in release 62). We'll certainly take it up with them.
>> So that will be sometime in one year+ time? As this is now a major issue for my data analysis I will investigate further. I have a gut feeling that this is more than a lucky shot.
>> 
>> Cheers,
>> 
>> Kim
>> 
>>> Cheers,
>>> 
>>> Graham
>>> 
>>> 
>>> On 8 Oct 2010, at 15:54, Kim Brugger wrote:
>>> 
>>>> Hi
>>>> 
>>>> If you look at the dbsnp page for this snp there are only two alleles, A/G, so it looks like the counting of alleles is faulty. Furthermore the SNP is represented in the 1000 genomes data, and other datasets I deem trustworthy.
>>>> 
>>>> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1053738
>>>> 
>>>> Thanks for explaining how/why this filtering is done.
>>>> 
>>>> Cheers,
>>>> 
>>>> Kim
>>>> 
>>>> On 08/10/10 14:14, Graham Ritchie wrote:
>>>>> Hi Kim,
>>>>> 
>>>>> This SNP has *more than* 3 alleles, and we have taken the decision to fail all such SNPs. We debated this decision internally recently, and Paul concluded as follows:
>>>>> 
>>>>> "These are still far, far more likely to be errors than real.  While some probably exist, true SNPs with all four alleles require very complex selection pressures to remain in the population and so this number is simply never likely to grow to "many SNPs."  In fact, the word quadallelic does not return any results in Pubmed.
>>>>> 
>>>>> This does not mean that it will never happen, only that it is very, very rare.  Note that we don't fail triallelic SNPs, which are also rare and enriched for error."
>>>>> 
>>>>> Hope this makes sense. If you have examples of SNPs that don't appear for other reasons then please let us know. We do track all SNPs we fail, and the reason for doing so, in the failed_variation table of the variation database.
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Graham
>>>>> 
>>>>> 
>>>>> On 8 Oct 2010, at 13:54, Kim Brugger wrote:
>>>>> 
>>>>> 
>>>>>> Hi
>>>>>> 
>>>>>> I am looking for the rs1053738 snp. When I do a search on the ensembl-web it is found and it exists with 2 synonyms, but if I want to display it I am told it was not mapped as the variation has 3 alleles.
>>>>>> 
>>>>>> The SNP should be located at 3:124951820-124951821. I have a large set of snps that I cannot find either with the ensembl-web or using the api.
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Kim
>>>>>> 
>>>>>> -- 
>>>>>> ==========================================================
>>>>>> Kim Brugger
>>>>>> EASIH, University of Cambridge
>>>>>> www.easih.ac.uk
>>>>>> ==========================================================
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> -- 
>>>> ==========================================================
>>>> Kim Brugger
>>>> EASIH, University of Cambridge
>>>> www.easih.ac.uk
>>>> ==========================================================
>> 
>> 
> 
> 
> -- 
> ==========================================================
> Kim Brugger
> EASIH, University of Cambridge
> www.easih.ac.uk
> ==========================================================




