[ensembl-dev] about dbSNP deleted snps (was Re: Transcript variation alleles -)

Pablo Marin-Garcia pg4 at sanger.ac.uk
Tue Feb 8 14:14:03 GMT 2011


On Tue, 8 Feb 2011, Pontus Larsson wrote:

> Hi Pablo,
>
> The answer is no, we don't store subsnps to that level of detail. You may be
> able to retrieve the subsnp_id for a particular submitter (e.g. AFFY) via the
> allele and subsnp_handle tables, but since you would have to join to the
> variation (and variation_synonym and source) tables using the variation_id,
> there is no guarantee that the subsnp_id would correspond to the particular
> source you're looking at.

Thanks Pontus,
it seems that I will need to find and learn the dbSNP SQL schema ... but
probably in another life ;-)

Cheers

   -Pablo
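
A minimal sketch of the join described in the quoted reply above, run against
an in-memory SQLite database with toy tables. The table names (variation,
allele, subsnp_handle) come from that reply; the column names, the sample rows
and the helper names build_handle_toy_db and subsnps_for_handle are
assumptions made for illustration and have not been checked against the real
Ensembl variation schema.

```python
import sqlite3


def build_handle_toy_db():
    """Create toy tables; column names and rows are illustrative assumptions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE variation (variation_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE allele (variation_id INTEGER, subsnp_id INTEGER, allele TEXT)")
    conn.execute("CREATE TABLE subsnp_handle (subsnp_id INTEGER PRIMARY KEY, handle TEXT)")
    conn.execute("INSERT INTO variation VALUES (1, 'rs_toy_1')")
    conn.executemany(
        "INSERT INTO allele VALUES (?, ?, ?)",
        [(1, 1001, "A"), (1, 1001, "G"), (1, 1002, "A"), (1, 1003, "G")],
    )
    conn.executemany(
        "INSERT INTO subsnp_handle VALUES (?, ?)",
        [(1001, "AFFY"), (1002, "OTHER_LAB"), (1003, "AFFY")],
    )
    return conn


def subsnps_for_handle(conn, rs_name, handle):
    """Return the distinct subsnp ids under one rs that carry a submitter handle."""
    rows = conn.execute(
        """
        SELECT DISTINCT a.subsnp_id
        FROM variation v
        JOIN allele a ON a.variation_id = v.variation_id
        JOIN subsnp_handle h ON h.subsnp_id = a.subsnp_id
        WHERE v.name = ? AND h.handle = ?
        ORDER BY a.subsnp_id
        """,
        (rs_name, handle),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_handle_toy_db()
print(subsnps_for_handle(conn, "rs_toy_1", "AFFY"))
```

With these toy rows the query returns two subsnp ids for the AFFY handle. The
handle identifies the submitter only, so nothing in the result says which of
them, if either, belongs to a specific source such as affy_500k.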

>
> Cheers
> /Pontus
>
>
> On 08/02/2011 13:46, Pablo Marin-Garcia wrote:
>> On Tue, 8 Feb 2011, Fiona Cunningham wrote:
>> 
>>> hi Pablo,
>>> 
>>> We do not store deleted SS IDs from dbSNP. If you need this
>>> information you can search the NCBI using a URL with this format:
>>> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_retrieve.cgi?subsnp_id=ss113307624 
>>> regards,
>>> 
>> 
>> Thanks Fiona,
>> 
>> one more question about this.
>> 
>> If I have a chip that is listed in variation.source (like affy_500k), then
>> given an rs ID on the chip, can I retrieve (with the API or SQL) the SS that
>> its submitter supplied to dbSNP?
>>
>>   -Pablo
>> 
>> 
>>> Fiona
>>> 
>>> ------------------------------------------------------
>>> Fiona Cunningham
>>> Ensembl Variation Project Leader, EBI
>>> www.ensembl.org
>>> www.lrg-sequence.org
>>> t: 01223 494612 || e: fiona at ebi.ac.uk
>>> 
>>> 
>>> 
>>> On 8 February 2011 11:21, Pontus Larsson <Pontus.Larsson at ebi.ac.uk> wrote:
>>>> In the variation_synonym table, we do store rsIds that have been merged
>>>> (e.g. 'rs67044803' that has been merged into 'rs67041202'), and you can
>>>> get the corresponding variations by e.g. searching on the website or
>>>> querying the API. However, we currently do not store data for rsIds or
>>>> ssIds that have been removed from the database.
>>>> 
>>>> Cheers
>>>> /Pontus
>>>> 
>>>> 
>>>> On 08/02/2011 10:48, Pablo Marin-Garcia wrote:
>>>>> 
>>>>> On Tue, 8 Feb 2011, Pontus Larsson wrote:
>>>>> 
>>>>>> Hi Pablo,
>>>>>> 
>>>>>> The validation status terms in the file on the ftp server were taken
>>>>>> directly from the dbSNP database, and the exact phrasing is not always
>>>>>> the same as the one we use. I've updated the file to use the same terms
>>>>>> that you would find in Ensembl instead. I hope that clears up the
>>>>>> confusion.
>>>>>> 
>>>>>> In Ensembl 60, the variation database was built from dbSNP release 131,
>>>>>> while the current database is built on dbSNP 132. Generally we don't
>>>>>> have access to archived versions of the dbSNP database, so I can't
>>>>>> really tell you why particular validation statuses have changed. For
>>>>>> rs56, I was able to access the dbSNP data for release 130, and one of
>>>>>> the subsnps clustered into this rs was indeed submitted by 1000Genomes
>>>>>> (ss113307624). When you search for this subsnp on the dbSNP website, it
>>>>>> appears to have failed quality controls
>>>>>> (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_retrieve.cgi?subsnp_id=ss113307624),
>>>>>> so that would be your explanation. For the other rsIds, I'd recommend
>>>>>> that you contact the dbSNP helpdesk directly to find out the details.
>>>>>> 
>>>>>> Hope this helps
>>>>>> /Pontus
>>>>> 
>>>>> Thanks a lot Pontus.
>>>>> 
>>>>> Is Ensembl storing this data? If I ask Ensembl for an SS or SNP that has
>>>>> been removed, would I obtain only undef? If you don't have it now, do you
>>>>> plan to have this info in the future? The logic for keeping them would be
>>>>> the same as for the ensembl_qc failed SNPs, which are not removed from
>>>>> the database but are filtered out of queries by default unless
>>>>> explicitly requested.
>>>>> 
>>>>> If I have an SS or RS from an old clinical study, I would like to be
>>>>> able to handle these situations in my scripts (log why this SNP or SS is
>>>>> no longer available instead of returning 'not_found'). If Ensembl does
>>>>> not store this but dbSNP still has it, does anyone know how to retrieve
>>>>> these cases with NCBI biotools/eUtils or a similar tool (I have not used
>>>>> them for several years)?
>>>>> 
>>>>>
>>>>>  -Pablo
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -----
>>>>>
>>>>>  Pablo Marin-Garcia
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> -- 
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> Pontus Larsson, Ph.D.
>>>> Ensembl Variation
>>>> 
>>>> EMBL-EBI
>>>> Wellcome Trust Genome Campus
>>>> Hinxton, Cambridge, CB10 1SD
>>>> UK
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Dev mailing list
>>>> Dev at ensembl.org
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> 
>>> 
>> 
>> 
>
>
>
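
A minimal sketch of how a script could log why an ID is unavailable instead of
returning 'not_found', based only on what is said in this thread: merged rsIds
are kept in variation_synonym, removed IDs are not stored, and dbSNP can be
queried per ss ID with the URL format quoted above. The toy SQLite tables,
their column names, and the helpers dbsnp_subsnp_url, build_synonym_toy_db and
resolve_rs are assumptions for illustration, not the Ensembl API or schema.

```python
import sqlite3

# URL format quoted in this thread for looking up a single ss ID at dbSNP.
DBSNP_SS_URL = "http://www.ncbi.nlm.nih.gov/projects/SNP/snp_retrieve.cgi?subsnp_id={ss}"


def dbsnp_subsnp_url(ss_id):
    """Build the dbSNP lookup URL for an ss ID given as 'ss123' or 123."""
    ss = str(ss_id)
    if not ss.startswith("ss"):
        ss = "ss" + ss
    return DBSNP_SS_URL.format(ss=ss)


def build_synonym_toy_db():
    """Toy tables holding the merged-rs example from this thread."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE variation (variation_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE variation_synonym (variation_id INTEGER, name TEXT)")
    conn.execute("INSERT INTO variation VALUES (1, 'rs67041202')")
    conn.execute("INSERT INTO variation_synonym VALUES (1, 'rs67044803')")
    return conn


def resolve_rs(conn, rs_id):
    """Classify an rs ID as current, merged into another rs, or not stored."""
    row = conn.execute(
        "SELECT name FROM variation WHERE name = ?", (rs_id,)
    ).fetchone()
    if row:
        return ("current", row[0])
    row = conn.execute(
        """
        SELECT v.name
        FROM variation_synonym s
        JOIN variation v ON v.variation_id = s.variation_id
        WHERE s.name = ?
        """,
        (rs_id,),
    ).fetchone()
    if row:
        return ("merged", row[0])
    # Not stored locally: the ID may have been removed from dbSNP.
    return ("not_stored", None)


conn = build_synonym_toy_db()
print(resolve_rs(conn, "rs67044803"))
print(dbsnp_subsnp_url("ss113307624"))
```

A "not_stored" result cannot distinguish a removed ID from one that never
existed, so a script would still need to follow up at dbSNP, for example via
the URL built by dbsnp_subsnp_url for ss IDs.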


-----

   Pablo Marin-Garcia




