[ensembl-dev] unmapped/un-displayable SNP from dbsnp

Pablo Marin-Garcia pg4 at sanger.ac.uk
Mon Oct 11 17:11:24 BST 2010


And related to dbSNP oddities and QC:


[Note: this table is from build 36, but the examples in the ==data== section
are build 37, taken from the web.]
+----------------+-----+------------------+---------------+------------+
| Variation_name | chr | seq_region_start | allele_string | map_weight |
+----------------+-----+------------------+---------------+------------+
| rs2334386      | 22  |         14430353 | G/T           |          2 |
| rs56342815     | 22  |         14430353 | G/T           |          1 |
+----------------+-----+------------------+---------------+------------+

In my data, I filter out as untrusted all SNPs that map more than once, to
reduce genotyping errors. I came across the two SNPs above by chance.

In this case the two SNPs, rs2334386 and rs56342815, fall at the same position
but have different map weights. rs2334386 has no mapping to the reference
builds in dbSNP, but ensembl maps it twice (chr22 and chr14, see below).
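
As a rough illustration of the filter described above (a minimal sketch, not
Ensembl code): it keeps only rows whose map_weight is 1. The VariationFeature
record and the uniquely_mapped helper are hypothetical names made up for this
example; the two rows are the build 36 ones from the table.

```python
from typing import NamedTuple


class VariationFeature(NamedTuple):
    """One row of the table above (hypothetical record type for this sketch)."""
    name: str
    chrom: str
    start: int
    allele_string: str
    map_weight: int


# The two build 36 rows quoted in the table.
rows = [
    VariationFeature("rs2334386", "22", 14430353, "G/T", 2),
    VariationFeature("rs56342815", "22", 14430353, "G/T", 1),
]


def uniquely_mapped(features):
    """Keep only features that map exactly once (map_weight == 1)."""
    return [f for f in features if f.map_weight == 1]


trusted = uniquely_mapped(rows)
print([f.name for f in trusted])
```

Note that this filter alone keeps rs56342815 even though it sits at the same
position as the rejected rs2334386, which is what prompted the questions below.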

==questions==

a) Does rs2334386 have two mappings because dbSNP provided no mapping and
ensembl mapped it itself, finding two positions, whereas rs56342815 has only
one because ensembl trusted the dbSNP mapping? Is there any other explanation?


b) Does ensembl have code to treat rs IDs that share a chr position as
synonyms, in order to filter or QC these SNPs?
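
For what it is worth, the position-based grouping asked about in (b) could be
sketched as follows. This is only an assumption of how such a QC step might
look, not something taken from the Ensembl code base; colocated_rs_ids is a
hypothetical helper, and the input is the build 37 mappings listed in the
==data== section.

```python
from collections import defaultdict

# (rs ID, chromosome, position) as listed for build 37 in the ==data== section.
mappings = [
    ("rs2334386", "14", 19792648),
    ("rs2334386", "22", 16050353),
    ("rs56342815", "22", 16050353),
]


def colocated_rs_ids(rows):
    """Group rs IDs by (chromosome, position).

    Returns only the positions shared by two or more distinct rs IDs.
    """
    by_position = defaultdict(list)
    for rs_id, chrom, pos in rows:
        if rs_id not in by_position[(chrom, pos)]:
            by_position[(chrom, pos)].append(rs_id)
    return {key: ids for key, ids in by_position.items() if len(ids) > 1}


for (chrom, pos), ids in colocated_rs_ids(mappings).items():
    print(f"{chrom}:{pos} -> {', '.join(ids)}")
```

The sketch ignores strand and allele_string; a real QC step would probably
want to compare those too before treating two rs IDs as synonyms.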

==data==
build 37:
[rs2334386]

<ens> two mappings
http://www.ensembl.org/Homo_sapiens/Variation/Summary?v=rs2334386;vdb=variation
14:19792648 (reverse strand)
22:16050353 (forward strand)

<dbSNP> 1 mapping, not in ref genome
dbSNP has only one location, and it is not in the reference build:
http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=2334386
37.1	22	135211	HuRef
36.3	22	135211	alt_assembly_8

[rs56342815]

<ens> one mapping
http://www.ensembl.org/Homo_sapiens/Variation/Summary?r=22:16049853-16050853;v=rs56342815;vdb=variation;vf=11442271
22:16050353 (forward strand)
<dbSNP> 1 mapping in ref genome
http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=56342815
37.1	22	16050353   GRCh37
37.1	22	135211	   HuRef
36.3	22	14430353   ref_assembly
36.3	22	135211	   alt_assembly_8

   -Pablo

On Mon, 11 Oct 2010, Kim Brugger wrote:

> Hi
>
> I have had a further look at my data.
>
> I have selected a set of snps that are not present in the ensembl database 
> and should not contain any novel snps (shared between multiple unrelated, 
> geographically distinct families). When I look in a quickly hacked local 
> dbsnp I can assign a dbsnp id to 223 of 258 snps.
>
> True, many in this list have 3+ alleles, but when looking at the dbsnp 
> page, the odd alleles originate from dubious data sources. And then there is 
> the list of snps that do not fall within this category: rs3116816, 
> rs2516393, rs2074470 etc. I know that I found an odd snp page last Friday 
> that can explain the faulty filtering, but this is not the case with the 
> latter two.
>
> I suggest that one easy solution would be to add a check to see whether the 
> snps with 3+ alleles are found in the 1000 genomes data and, if so, include 
> them in ensembl.
>
> Cheers,
>
> Kim
>
>
>
>
> On 08/10/10 16:37, Kim Brugger wrote:
>> On 08/10/10 16:23, Graham Ritchie wrote:
>>> Hi Kim,
>>> 
>>> Hmm, this does seem to be an odd case. If you look at the dbSNP entry on 
>>> this page:
>>> 
>>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=snp&cmd=search&term=+rs1053738 
>>> 
>>> it does appear to have 4 alleles, but on the page you link the "RefSNP 
>>> Alleles" are listed as only A/G, but as Bert pointed out the HGVS names 
>>> are inconsistent with this.
>> Actually one mRNA states G>{A,C,T} at one position, which is quite 
>> spectacular, and clearly a bug.
>>> This SNP only had 2 alleles in dbSNP 130, and can be seen in ensembl 
>>> version 58 here:
>>> 
>>> http://may2010.archive.ensembl.org/Homo_sapiens/Variation/Summary?v=rs1053738;vdb=variation 
>>> 
>>> It is possible that dbSNP have since (partially) corrected the webpage, 
>>> but when we did the last import (from dbSNP 131) it was reported as having 
>>> 4 alleles.
>>>    Hopefully this will be resolved in the next release of dbSNP which will 
>>> then filter through to ensembl (probably in release 62). We'll certainly 
>>> take it up with them.
>> So that will be sometime in a year or more? As this is now a major issue 
>> for my data analysis I will investigate further. I have a gut feeling 
>> that this is more than a lucky shot.
>> 
>> Cheers,
>> 
>> Kim
>> 
>>> Cheers,
>>> 
>>> Graham
>>> 
>>> 
>>> On 8 Oct 2010, at 15:54, Kim Brugger wrote:
>>> 
>>>> Hi
>>>> 
>>>> If you look at the dbsnp page for this snp, it has only two alleles, A/G, 
>>>> so it looks like the counting of alleles is faulty. Furthermore, 
>>>> the SNP is represented in the 1000 genomes data and in other datasets I 
>>>> deem trustworthy.
>>>> 
>>>> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1053738
>>>> 
>>>> Thanks for explaining how/why this filtering is done.
>>>> 
>>>> Cheers,
>>>> 
>>>> Kim
>>>> 
>>>> On 08/10/10 14:14, Graham Ritchie wrote:
>>>>> Hi Kim,
>>>>> 
>>>>> This SNP has *more than* 3 alleles, and we have taken the decision to 
>>>>> fail all such SNPs. We debated this decision internally recently and 
>>>>> Paul concluded as follows:
>>>>> 
>>>>> "These are still far, far more likely to be errors than real.  While 
>>>>> some probably exist, true SNPs with all four alleles require very 
>>>>> complex selection pressures to remain in the population and so this 
>>>>> number is simply never likely to grow to "many SNPs."  In fact, the word 
>>>>> quadallelic does not return any results in Pubmed.
>>>>> 
>>>>> This does not mean that it will never happen, only that it is very, very 
>>>>> rare.  Note that we don't fail triallelic SNPs, which are also rare and 
>>>>> enriched for error."
>>>>> 
>>>>> Hope this makes sense. If you have examples of SNPs that don't appear for 
>>>>> other reasons then please let us know. We do track all SNPs we fail, and 
>>>>> the reason for doing so, in the failed_variation table of the variation 
>>>>> database.
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Graham
>>>>> 
>>>>> 
>>>>> On 8 Oct 2010, at 13:54, Kim Brugger wrote:
>>>>> 
>>>>> 
>>>>>> Hi
>>>>>> 
>>>>>> I am looking for the rs1053738 snp. When I search for it on the 
>>>>>> ensembl-web it is found, and it exists with 2 synonyms, but when I try 
>>>>>> to display it I am told it was not mapped as the variation has 3 alleles.
>>>>>> 
>>>>>> The SNP should be located at 3:124951820-124951821. I have a large set 
>>>>>> of snps that I cannot find either with the ensembl-web or using the 
>>>>>> api.
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Kim
>>>>>> 
>>>>>> -- 
>>>>>> ==========================================================
>>>>>> Kim Brugger
>>>>>> EASIH, University of Cambridge
>>>>>> www.easih.ac.uk
>>>>>> ==========================================================
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Dev mailing list
>>>>>> Dev at ensembl.org
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> 
>>>>> 
>>>> 
>> 
>> 
>
>
>


-----

   Pablo Marin-Garcia




