[ensembl-dev] Disappeared Variation synonyms

Anja Thormann anja at ebi.ac.uk
Fri Jul 6 16:35:25 BST 2012


Hello John,

We have found that some individuals and their genotypes were lost due to an error in one of the dbSNP tables from which we import our data.

We have not encountered this error before and are really thankful that you pointed this out to us. dbSNP is already aware of the problem and is investigating the cause of it. Unfortunately, we cannot correct this until after the next release, 68.

You mentioned that the missing individuals do not affect your work because you are using external flat files. However, if you need the functionality of the API, we could try to build a corrected database and put the dump files on our FTP server, from which you could build a local version of the database.

Best regards,
Anja

On 5 Jul 2012, at 21:31, Ma, Man Chun John wrote:

> Hi Anja,
> 
> Thank you for the data.
> 
> While I should have opened a new thread instead, I have noticed some data issues regarding individual genotypes from formerly STAR-gtype SNPs.
> 
> For example, take rs13457237 (http://may2012.archive.ensembl.org/Rattus_norvegicus/Variation/Individual?r=1:3261144-3262144;source=dbSNP;v=rs13457237;vdb=variation;vf=7039489 ). Like all previously STAR-gtype SNPs, the individual genotypes were uploaded to dbSNP by Ensembl in batch 2009-11_STAR-genotype, and its ss# in this submission is ss149342018 (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_retrieve.cgi?subsnp_id=179342018). Comparing the two sets of data, I have noticed the following:
> 
> 1. In the Ensembl data, all individuals were listed twice, once with their strain name as their population ID and once under the population ENSEMBL:STAR_mdc/cng.
> 2. Approximately half of the individuals in the ss submission were neither given an individual ID (http://www.ncbi.nlm.nih.gov/SNP/snp_viewTable.cgi?pop=12942) in dbSNP nor had their genotypes listed in Ensembl.
> 
> While this does not matter much to us (as we routinely use the original genotype file rather than Ensembl), I think this needs to be pointed out.
> 
> Cheers,
> 
> John MC Ma
> Graduate Assistant
> Kwitek Lab
> Department of Pharmacology
> 3125E MERF
> 375 Newton Road
> Iowa City IA 52242
> -----Original Message-----
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Anja Thormann
> Sent: Friday, June 29, 2012 10:59 AM
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] Disappeared Variation synonyms
> 
> Hello John,
> 
> I built a mapping file which maps ENSEMBL IDs to the respective rs IDs in our current rat variation database 67. Additionally, I put the source (ENSEMBL:celera, ENSEMBL:STAR-4strain, ENSEMBL:STAR-gtype) in the mapping file. The file can be found here: ftp://ftp.ebi.ac.uk/pub/databases/ensembl/snp/rat/rat_ENSEMBL_IDs.txt.gz.
> Please let me know if you have any problems with the file.
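
A minimal sketch of how a mapping file like the one described above could be read and grouped by source. The thread does not state the file layout, so the tab-separated column order (ENSEMBL ID, rs ID, source) is an assumption, and the function names `parse_mapping`, `ids_by_source`, and `load_mapping` are hypothetical helpers, not part of any Ensembl API.

```python
import csv
import gzip
from collections import defaultdict


def parse_mapping(lines):
    """Build {ensembl_id: (rs_id, source)} from tab-separated lines.

    ASSUMPTION: columns are ENSEMBL ID, rs ID, source, in that order.
    The actual layout of the file is not stated in the thread.
    """
    mapping = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines, rows with fewer than three columns, and comments.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        ensembl_id, rs_id, source = row[0], row[1], row[2]
        mapping[ensembl_id] = (rs_id, source)
    return mapping


def ids_by_source(mapping):
    """Group ENSEMBL IDs by their original source label."""
    by_source = defaultdict(list)
    for ensembl_id, (_rs_id, source) in mapping.items():
        by_source[source].append(ensembl_id)
    return dict(by_source)


def load_mapping(path):
    """Read a gzip-compressed mapping file from a local path."""
    with gzip.open(path, "rt", newline="") as handle:
        return parse_mapping(handle)
```

With a local copy of the file, `load_mapping("rat_ENSEMBL_IDs.txt.gz")` would return the lookup table (the local path is illustrative), and `ids_by_source(...)` would recover the per-source grouping, provided the assumed column order matches the real file.
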
> 
> Please note that we are also working on providing HGVS names for all our variations for the next release.
> 
> Best regards,
> Anja
> 
> 
> On 27 Jun 2012, at 19:01, Ma, Man Chun John wrote:
> 
>> Hi,
>> 
>> Sorry for one typo, I meant to say "those rs# are invalid again on dbSNP again" in the previous reply.
>> 
>> There's another annoyance with these imports: there's no longer an easy way to identify which of these Variants are from the different original sources (STAR-4strain, STAR-gtype etc). Pre-v66 I could have done a fetch_all_by_source, but in v67 the only residual bit of that information is in the population/individual genotype.
>> 
>> To top that, the individual genotypes for these SNPs in dbSNP 136 were incomplete (http://www.ncbi.nlm.nih.gov/SNP/snp_viewTable.cgi?pop=12942 ). This means you're replacing a complete dataset with an incomplete one which is even more difficult to access...
>> 
>> 
>> John MC Ma
>> Graduate Assistant
>> Kwitek Lab
>> Department of Pharmacology
>> 3125E MERF
>> 375 Newton Road
>> Iowa City IA 52242
>> -----Original Message-----
>> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On
>> Behalf Of Ma, Man Chun John
>> Sent: Wednesday, June 27, 2012 12:45 PM
>> To: 'Ensembl developers list'
>> Subject: Re: [ensembl-dev] Disappeared Variation synonyms
>> 
>> *Sigh*
>> 
>> Those rs# are again on dbSNP again. Given the underlying ss# are still there, I take that to mean NCBI are now re-mapping those to the new rn5.0 assembly. For one reason or another, rat rs# are even more unstable than ENSSNP#, and given that not even all of these SNPs have HGVS names, I'm at a loss as to which ID I should use.
>> 
>> John MC Ma
>> Graduate Assistant
>> Kwitek Lab
>> Department of Pharmacology
>> 3125E MERF
>> 375 Newton Road
>> Iowa City IA 52242
>> -----Original Message-----
>> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On
>> Behalf Of Anja Thormann
>> Sent: Wednesday, June 27, 2012 12:04 PM
>> To: Ensembl developers list
>> Subject: Re: [ensembl-dev] Disappeared Variation synonyms
>> 
>> Hello John,
>> 
>> For release 67 we built a new rat variation database with the dbSNP import version 136, and we retired variants called with the Ensembl pipeline.
>> I can provide you with a file that maps ENSSNP ids to the variation rs-ids from the current database.
>> 
>> Anja
>> On 27 Jun 2012, at 17:17, Ma, Man Chun John wrote:
>> 
>>> Hi all,
>>> 
>>> V67's inclusion of dbSNP 136 updates for rat is clearly good news, as this bridged a gap between the two databases (many Ensembl rat SNPs were pulled from dbSNP post-130 and they were not re-included into dbSNP for the past 3 years).
>>> 
>>> However, when I attempted to access some of those SNPs, I was not able to locate them using the previous ENSSNP IDs. After searching for the SNPs in question by genomic coordinates, I noticed that the v67 entries included none of the synonyms listed in v66.
>>> 
>>> For example:
>>> 
>>> rs105805725 in v67 (http://www.ensembl.org/Rattus_norvegicus/Variation/Summary?db=core;r=17:39585392-39585392;source=dbSNP;v=rs105805725;vdb=variation;vf=5120020) vs rs65331131 aka ENSRNOSNP971187 in v66: (http://feb2012.archive.ensembl.org/Rattus_norvegicus/Variation/Summary?r=17:39584892-39585892;source=dbSNP;v=rs65331131;vdb=variation;vf=3051002). They are essentially the same Variation (same position and alleles and even individual genotypes), yet the v67 entry did not include any synonyms seen in v66.
>>> 
>>> Due to the issue mentioned at the beginning of this email, any rs# for these SNPs in pre-v67 releases are de jure invalid, and our lab has relied on the ENSSNP ID for identification. I hope those ENSSNP IDs have not been retired so abruptly?
>>> 
>>> Cheers,
>>> 
>>> John MC Ma
>>> Graduate Assistant
>>> Kwitek Lab
>>> Department of Pharmacology
>>> 3125E MERF
>>> 375 Newton Road
>>> Iowa City IA 52242
>>> 
>>> 
>>> ________________________________
>>> Notice: This UI Health Care e-mail (including attachments) is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, is confidential and may be legally privileged.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution, or copying of this communication is strictly prohibited.  Please reply to the sender that you have received the message in error, then delete it.  Thank you.
>>> ________________________________
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe):
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>> 
>> 




