[ensembl-dev] Re: Transcript variation alleles

Pontus Larsson Pontus.Larsson at ebi.ac.uk
Tue Feb 8 10:19:27 GMT 2011


Hi Pablo,

The validation status terms in the file on the ftp server were taken 
directly from the dbSNP database, and the exact phrasing is not always 
the same as the one we use. I've updated the file to use the same 
terms that you would find in Ensembl. I hope that clears up the confusion.
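
For anyone who already downloaded the earlier version of the file, with 
the dbSNP phrasing, here is a minimal Python sketch of how the terms could 
be normalised so that variations can be grouped by status. It assumes two 
tab-separated columns (the rsId, then a comma-separated status field); that 
layout is an assumption, so check it against the actual file. The names 
normalize_status and group_by_status are made up for illustration and are 
not part of any Ensembl API. The sample rows are the ones quoted from 
Pablo's message below.

```python
import csv
from collections import defaultdict


def normalize_status(raw):
    """Split a status field such as 'by cluster,freq' into a set of bare terms.

    A leading 'by ' is stripped from each term (assumed dbSNP-style phrasing).
    """
    terms = set()
    for term in raw.split(","):
        term = term.strip()
        if term.lower().startswith("by "):
            term = term[3:].strip()
        if term:
            terms.add(term)
    return terms


def group_by_status(lines):
    """Map each bare status term to the rsIds carrying it, in file order.

    `lines` is any iterable of tab-separated text lines (assumed layout:
    rsId <TAB> status field). Rows with fewer than two columns are skipped.
    """
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue
        rsid, raw = row[0], row[1]
        for term in normalize_status(raw):
            groups[term].append(rsid)
    return dict(groups)


# Sample rows copied from the thread, not from the real file.
SAMPLE = (
    "rs6\tby freq\n"
    "rs11\tHapMap\n"
    "rs12\tby submitter\n"
    "rs26\tby cluster\n"
    "rs27\t1000Genome,2hits,cluster\n"
    "rs56\tby cluster,freq\n"
    "rs57\tHapMap,cluster\n"
)

groups = group_by_status(SAMPLE.splitlines())
```

For the real file, the same function could be fed an open handle from 
gzip.open(path, "rt", newline="") instead of the sample lines.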

In Ensembl 60, the variation database was built from dbSNP release 131 
while the current database is built on dbSNP 132. Generally we don't 
have access to archived versions of the dbSNP database, so I can't 
really tell you why particular validation statuses have changed. For 
rs56, I was able to access the dbSNP data for release 130 and one of the 
subsnps clustered into this rs was indeed submitted by 1000Genomes 
(ss113307624). When you search for this subsnp on the dbSNP website, it 
appears to have failed quality controls 
(http://www.ncbi.nlm.nih.gov/projects/SNP/snp_retrieve.cgi?subsnp_id=ss113307624), 
so that would be your explanation. For the other rsIds, I'd recommend 
that you contact the dbSNP helpdesk directly to find out the details.

Hope this helps
/Pontus


On 07/02/2011 14:33, Pablo Marin-Garcia wrote:
> On Mon, 7 Feb 2011, Pontus Larsson wrote:
>
>> Hi Gavin,
>>
>> Please note that in the current release (61), we are missing this 
>> validation status for many variations. This happened because this 
>> data was not present in the dbSNP data we had access to at the time 
>> of the import. (See the 'known bugs' page 
>> http://www.ensembl.org/info/docs/knownbugs.html). I have put a 
>> tab-separated file containing the current dbSNP validation statuses 
>> (exported on Jan 20) for each rsId on the ftp site: 
>> ftp://ftp.ebi.ac.uk/pub/software/ensembl/snp/human/e61_rsid_validation_status.txt.gz.
>
> A) One question. Could someone give some feedback on the following issue:
>
> The validation_status file seems to have an inconsistency in the 
> naming of the statuses. It seems that a single status gets the 
> prefix 'by', but this does not happen for 'HapMap' in rs11. The 'by' 
> prefix gets dropped when there is more than one status for a given 
> SNP, but this does not happen in rs56. Could the 'by' be dropped so we 
> can group by status better?
>
> rs6     by freq
> rs11    HapMap  # <====
> rs12    by submitter
> rs26    by cluster
> rs27    1000Genome,2hits,cluster
> rs56    by cluster,freq  # <====
> rs57    HapMap,cluster
>
>
> B) rs56 in ensembl_variation_60 has the status 
> 'cluster,freq,1000Genome' but now has 'by cluster,freq'. Do you know why 
> the 1000Genome status has been dropped in dbSNP? I have had the same 
> issue with other SNPs where the HapMap status disappeared when moving 
> from the latest Ensembl on build 36 to ens_60 (b37). I assume that it 
> was because of a change in the dbSNP release. Are these missing statuses 
> a bug in dbSNP or is there a good reason for the dropout?
>
>
>   -Pablo
>
>
>
>>
>> You should be able to get the validation status from there. Apologies 
>> for this inconvenience.
>>
>> Thanks
>> /Pontus Larsson - Ensembl Variation
>>
>>
>> On 07/02/2011 09:55, Oliver, Gavin wrote:
>>> Thanks Graham/Fiona,
>>>
>>>
>>> Really all I want to do at the moment is get a 'validated' or
>>> 'unvalidated' value for each variation I consider.
>>>
>>>
>>
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev
>>
>
>
> -----
>
>   Pablo Marin-Garcia
>   Vertebrate Genomics
>   Wellcome Trust Genome Campus
>
>

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Pontus Larsson, Ph.D.
Ensembl Variation

EMBL-EBI
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD
UK
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




