[ensembl-dev] about dbSNP deleted snps (was Re: Transcript variation alleles -)
pg4 at sanger.ac.uk
Tue Feb 8 10:48:48 GMT 2011
On Tue, 8 Feb 2011, Pontus Larsson wrote:
> Hi Pablo,
> The validation status terms in the file on the ftp server were taken directly
> from the dbSNP database and the exact phrasing is not always the same as what we
> use. I've updated the file to instead use the same terms that you would find
> in Ensembl. I hope that clears up the confusion.
> In Ensembl 60, the variation database was built from dbSNP release 131 while
> the current database is built on dbSNP 132. Generally we don't have access to
> archived versions of the dbSNP database, so I can't really tell you why
> particular validation statuses have changed. For rs56, I was able to access
> the dbSNP data for release 130 and one of the subsnps clustered into this rs
> was indeed submitted by 1000Genomes (ss113307624). When you search for this
> subsnp on the dbSNP website, it appears to have failed quality controls
> so that would be your explanation. For the other rsIds, I'd recommend that
> you contact the dbSNP helpdesk directly to find out the details.
> Hope this helps
Thanks a lot Pontus.
Does Ensembl store this data? If I query Ensembl for an SS or SNP that has been
removed, would I get only undef? If you don't have it now, do you plan to have
this information in the future? The rationale for keeping them would be the same
as for SNPs that fail the Ensembl QC: they are not removed from the database but
are filtered out of queries by default unless explicitly requested.
If I have an SS or RS from an old clinical study, I would like to be able to
handle these situations in my scripts (log why this SNP or SS is no longer
available instead of returning 'not_found'). If Ensembl does not store this but
dbSNP still has it, does anyone know how to retrieve these cases with NCBI
biotools/eUtils or a similar tool (I have not used them for several years)?
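
To make the request concrete, the behaviour I have in mind could look roughly
like the sketch below. It is only an illustration in Python, not Ensembl or
dbSNP code: the ARCHIVE table, the identifiers in it, the status names and the
lookup/describe functions are all hypothetical. Removed records are hidden by
default, returned when explicitly requested, and always carry a reason that a
script can log.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    ACTIVE = "active"
    FAILED_QC = "failed_qc"
    WITHDRAWN = "withdrawn"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class LookupResult:
    identifier: str
    status: Status
    reason: Optional[str] = None


# Hypothetical archive: identifiers and reasons are invented for illustration
# and do not correspond to real dbSNP or Ensembl records.
ARCHIVE: Dict[str, LookupResult] = {
    "rs_example_1": LookupResult("rs_example_1", Status.ACTIVE),
    "ss_example_2": LookupResult(
        "ss_example_2", Status.FAILED_QC, "failed quality control"
    ),
    "rs_example_3": LookupResult(
        "rs_example_3", Status.WITHDRAWN, "withdrawn by submitter"
    ),
}


def lookup(identifier: str, include_removed: bool = False) -> Optional[LookupResult]:
    """Return the record for an identifier.

    Removed records (anything not ACTIVE) are filtered out by default and
    only returned when include_removed is True. Identifiers that were never
    in the archive get an explicit UNKNOWN result, so callers can tell
    "removed" apart from "never existed".
    """
    result = ARCHIVE.get(identifier)
    if result is None:
        return LookupResult(identifier, Status.UNKNOWN, "never seen in this archive")
    if result.status is not Status.ACTIVE and not include_removed:
        return None
    return result


def describe(identifier: str) -> str:
    """Build a log line explaining why an identifier is or is not available."""
    result = lookup(identifier, include_removed=True)
    if result.status is Status.ACTIVE:
        return f"{identifier}: available"
    return f"{identifier}: not available ({result.status.value}: {result.reason})"


if __name__ == "__main__":
    for name in ("rs_example_1", "ss_example_2", "rs_example_3", "rs_example_4"):
        print(describe(name))
```

With something like this, a script handling identifiers from an old study could
log the reason for each missing entry rather than a bare 'not_found'.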