[ensembl-dev] FTP + variation rs id synonym mappings

danny.kunz at gmx.de danny.kunz at gmx.de
Thu Mar 25 14:26:19 GMT 2021


Hi Anja,

 

thank you very much for that information!

 

The dbSNP FTP data set was exactly what I was looking for. 

 

The files are pretty large, but that should not be a problem for our pipeline.

 

Thanks for pointing me to it!

 

Best regards,

Danny

 

 

 

 

 

From: Dev <dev-bounces at ensembl.org> On behalf of Anja Thormann
Sent: Monday, 22 March 2021 12:37
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] FTP + variation rs id synonym mappings

 

Hi Danny,

 

you will get the most detailed information on the merge history of an rs id from dbSNP.

 

I recommend that you take a look at dbSNP's API:

https://api.ncbi.nlm.nih.gov/variation/v0/

 

Or flat files from:

https://ftp.ncbi.nih.gov/snp/latest_release/

This file contains the merge information:  https://ftp.ncbi.nih.gov/snp/latest_release/JSON/refsnp-merged.json.bz2
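
A minimal sketch of how the merge file could be turned into a lookup table, assuming it is bzip2-compressed with one JSON object per line, and that each record carries a `refsnp_id` plus a `merged_snapshot_data.merged_into` list. Those field names are an assumption and should be checked against a few real lines of the file; `parse_merge_lines` and `load_merge_file` are hypothetical helper names.

```python
import bz2
import json


def parse_merge_lines(lines):
    """Build {old rs id: [current rs ids]} from JSON-lines merge records."""
    merged = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        old_id = record.get("refsnp_id")
        # Assumed layout: merged_snapshot_data -> merged_into -> list of ids
        snapshot = record.get("merged_snapshot_data") or {}
        targets = snapshot.get("merged_into", [])
        if old_id and targets:
            merged["rs" + str(old_id)] = ["rs" + str(t) for t in targets]
    return merged


def load_merge_file(path):
    """Stream the compressed file line by line instead of loading it whole."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return parse_merge_lines(handle)
```

Because an rs id can be merged more than once, a target id may itself appear as a key in the table, so a lookup may need to be repeated until it reaches an id that is not a key.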

 

And here is an example of using the API:

Getting information for rs10001600 (https://www.ncbi.nlm.nih.gov/snp/rs10001600):

https://api.ncbi.nlm.nih.gov/variation/v0/beta/refsnp/10001600 where merged_snapshot_data stores the id history.
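
As a rough sketch, the same lookup could be scripted like this. The URL prefix is the one from the example above; the `merged_into` key inside `merged_snapshot_data` is an assumption about the response layout and should be verified against a real response. `fetch_refsnp` and `merge_targets` are hypothetical helper names.

```python
import json
import urllib.request

# URL prefix taken from the example in this mail
API_ROOT = "https://api.ncbi.nlm.nih.gov/variation/v0/beta/refsnp/"


def fetch_refsnp(rs_number):
    """Download the JSON record for a numeric rs id (without the 'rs' prefix)."""
    with urllib.request.urlopen(API_ROOT + str(rs_number)) as response:
        return json.load(response)


def merge_targets(record):
    """Return the rs ids a record was merged into, or [] if it was not merged."""
    # Assumed layout: merged_snapshot_data -> merged_into -> list of ids
    snapshot = record.get("merged_snapshot_data") or {}
    return ["rs" + str(t) for t in snapshot.get("merged_into", [])]
```

For example, `merge_targets(fetch_refsnp(10001600))` would query the record mentioned above; an empty list means the record carries no merge targets under the assumed key.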

 

 

We do not import the full merge history for each rs id into Ensembl, so our data could not give a complete picture. We therefore decided against adding this information to our data dumps.

 

Best wishes,

Anja

 





On 18 Feb 2021, at 18:08, Andrew Parton <aparton at ebi.ac.uk> wrote:

 

Hi Danny,

 

Currently, we do not have a file that contains all of these mappings. However, VEP will allow you to annotate your VCFs with the variation synonym data that we have, by providing known synonyms for colocated variants: https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_var_synonyms

 

Additionally, it may be possible for us to generate these synonyms in a single file as part of our next release; however, VEP should be a quicker solution for you.

 

Kind Regards,

Andrew

 





On 12 Feb 2021, at 05:29, danny.kunz at gmx.de wrote:

 

Hi all,

 

Quick question:

 

Our pipeline has to deal with VCF files from older assembly releases on the GRCh37 branch.

 

We tried using the FTP variation VCF files, but realized that only about 40% of the ids in the patient VCFs have a match in the FTP variation data.

 

Obviously the old rs ids (synonyms) from the older assemblies are not contained in those newer releases.

 

Is there any file on the FTP which contains those synonym mappings?

 

-

 

Calling the REST API works fine with the old rs ids, as it translates them to the newer ones. But to reduce the REST communication overhead, it would be helpful to be able to achieve the same with the FTP data, right?
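
To illustrate the kind of offline translation meant here: if a table from old to current rs ids were available locally, applying it to the ID column of a VCF could look roughly like the sketch below. The function name `update_vcf_ids` and the shape of the `synonyms` dict are hypothetical.

```python
def update_vcf_ids(vcf_lines, synonyms):
    """Yield VCF lines with old rs ids in the ID column replaced.

    synonyms is a hypothetical dict such as {"rs111": "rs222"}.
    Header lines and ids without an entry are passed through unchanged.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2:
            # The ID column (third field) may hold several ids separated by ';'
            ids = fields[2].split(";")
            fields[2] = ";".join(synonyms.get(i, i) for i in ids)
        yield "\t".join(fields) + "\n"
```

Writing it as a generator keeps memory use flat, so it can be run over large patient VCFs line by line.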

 

Thanks,

Danny

_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
Ensembl Blog: http://www.ensembl.info/

 


 
