[ensembl-dev] Automate the SNP variant result from "population genetics"

Anja Thormann anja at ebi.ac.uk
Thu Sep 21 11:07:05 BST 2017


Hi DK,

Let me give you some background on where our allele frequency and genotype frequency annotations come from:

We provide allele frequencies, and genotype frequencies where available, for a set of reference populations from projects such as the 1000 Genomes Project, ESP and gnomAD (which supersedes ExAC).

Only 1000 Genomes provides sample genotypes from which we can compute population genotype frequencies.

ESP provides population genotype frequencies, but we don't get a breakdown of genotypes by sample in the population.

gnomAD only provides allele counts in a population.
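To make the difference between the two kinds of annotation concrete, here is a minimal sketch (the helper `population_frequencies` is hypothetical, not Ensembl code) that derives both genotype frequencies and allele frequencies from per-sample genotypes, which is the kind of data 1000 Genomes provides. With only allele counts, as for gnomAD, the genotype breakdown cannot be recovered without further assumptions.

```python
from collections import Counter


def population_frequencies(genotypes):
    """Return (genotype_freqs, allele_freqs) for a list of 'X|Y' or 'X/Y' genotypes.

    Phase is ignored: A|G and G|A are counted as the same genotype.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    normalised = []
    for gt in genotypes:
        # Treat unphased "/" like phased "|" and sort so A|G and G|A collapse.
        alleles = gt.replace("/", "|").split("|")
        normalised.append("|".join(sorted(alleles)))
    genotype_counts = Counter(normalised)
    # Every sample contributes each of its alleles to the allele counts.
    allele_counts = Counter(a for gt in normalised for a in gt.split("|"))
    n_samples = len(normalised)
    n_alleles = sum(allele_counts.values())
    genotype_freqs = {gt: c / n_samples for gt, c in genotype_counts.items()}
    allele_freqs = {a: c / n_alleles for a, c in allele_counts.items()}
    return genotype_freqs, allele_freqs
```

For example, `population_frequencies(["A|A", "A|G", "G|A", "G|G"])` gives genotype frequencies of 0.25, 0.5 and 0.25 for A|A, A|G and G|G, and an allele frequency of 0.5 for each of A and G.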

Here is a variant which has annotations from all of the above projects: http://www.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=1:230709548-230710548;v=rs699;vdb=variation;vf=664

For 1000GENOMES:phase_3:AFR:
  allele frequencies: A: 0.097, G: 0.903
  genotype frequencies: A|A: 0.126, A|G: 0.338, G|G: 0.536
For gnomADe:AFR:
  allele frequencies: A: 0.152, G: 0.848
  no genotype frequencies

Our variation endpoint does not return gnomAD frequencies at the moment; we will include them in the next release.

For now I would recommend that you use our VEP endpoint:
https://rest.ensembl.org/documentation/info/vep_id_get
Examples:
https://rest.ensembl.org/vep/human/id/rs769971095?content-type=application/json
https://rest.ensembl.org/vep/human/id/rs699?content-type=application/json
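A minimal sketch of calling the GET endpoint from Python with only the standard library. The URL pattern is copied from the example links; the helper names `vep_id_url` and `fetch_vep` are hypothetical, and `fetch_vep` needs network access to rest.ensembl.org.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def vep_id_url(variant_id, species="human"):
    """Build the GET /vep/:species/id/:id URL used in the example links."""
    path = "/vep/{}/id/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(variant_id)
    )
    return REST_SERVER + path + "?content-type=application/json"


def fetch_vep(variant_id, species="human", timeout=30):
    """Fetch and decode the VEP annotation for one variant ID (network required)."""
    with urllib.request.urlopen(vep_id_url(variant_id, species), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`vep_id_url("rs699")` reproduces the second example URL exactly; `fetch_vep("rs699")` returns the decoded JSON list for that variant.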

The VEP makes use of cache files which store allele frequencies for the 1000 Genomes Project super populations (AFR, AMR, EAS, EUR, SAS) and the gnomAD exome data.

Please find a list of our populations, their short names and descriptions here:
http://www.ensembl.org/info/genome/variation/data_description.html#populations

The VEP provides annotations from:
gnomADe:ALL - All gnomAD exomes individuals
gnomADe:AFR - African/African American
gnomADe:AMR - Admixed American
gnomADe:ASJ - Ashkenazi Jewish
gnomADe:EAS - East Asian
gnomADe:FIN - Finnish
gnomADe:NFE - Non-Finnish European
gnomADe:OTH - Other
gnomADe:SAS - South Asian
1000GENOMES:phase_3:AFR - African
1000GENOMES:phase_3:AMR - American
1000GENOMES:phase_3:EAS - East Asian
1000GENOMES:phase_3:EUR - European
1000GENOMES:phase_3:SAS - South Asian
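To turn the short names in the REST output into full population names, a lookup table built from the list above is enough. The sketch below is hypothetical helper code, not an Ensembl library: it assumes the VEP key convention described in this thread (a prefix such as `gnomad_nfe` for gnomADe:NFE and `afr` for 1000GENOMES:phase_3:AFR) and returns unrecognised prefixes unchanged.

```python
# Descriptions copied from the population list above.
POPULATION_DESCRIPTIONS = {
    "gnomADe:ALL": "All gnomAD exomes individuals",
    "gnomADe:AFR": "African/African American",
    "gnomADe:AMR": "Admixed American",
    "gnomADe:ASJ": "Ashkenazi Jewish",
    "gnomADe:EAS": "East Asian",
    "gnomADe:FIN": "Finnish",
    "gnomADe:NFE": "Non-Finnish European",
    "gnomADe:OTH": "Other",
    "gnomADe:SAS": "South Asian",
    "1000GENOMES:phase_3:AFR": "African",
    "1000GENOMES:phase_3:AMR": "American",
    "1000GENOMES:phase_3:EAS": "East Asian",
    "1000GENOMES:phase_3:EUR": "European",
    "1000GENOMES:phase_3:SAS": "South Asian",
}

THOUSAND_GENOMES_SUPER_POPS = {"afr", "amr", "eas", "eur", "sas"}


def ensembl_population_name(vep_prefix):
    """Map a VEP key prefix such as 'gnomad_nfe' or 'afr' to an Ensembl population name."""
    if vep_prefix.startswith("gnomad_"):
        return "gnomADe:" + vep_prefix[len("gnomad_"):].upper()
    if vep_prefix in THOUSAND_GENOMES_SUPER_POPS:
        return "1000GENOMES:phase_3:" + vep_prefix.upper()
    return vep_prefix  # unknown prefix: leave it unchanged


def describe(vep_prefix):
    """Return the full population description, or the input if it is not known."""
    name = ensembl_population_name(vep_prefix)
    return POPULATION_DESCRIPTIONS.get(name, name)
```

For example, `describe("gnomad_nfe")` returns "Non-Finnish European" and `describe("afr")` returns "African".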

The populations use the following key names in the VEP endpoint output, for example:
  - for gnomADe:NFE: gnomad_nfe_maf and gnomad_nfe_allele
  - for 1000GENOMES:phase_3:AFR: afr_maf and afr_allele
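Given those key names, the per-population frequencies can be collected from a decoded VEP response. This is a minimal sketch, assuming the layout described here: each entry in `colocated_variants` carries flat `<prefix>_maf` / `<prefix>_allele` pairs. Newer releases may structure frequencies differently, so check a live response first. The function name `population_mafs` is hypothetical.

```python
def population_mafs(vep_response):
    """Group per-population (allele, frequency) pairs by co-located variant ID.

    Assumes flat "<prefix>_maf" / "<prefix>_allele" keys in each entry of
    "colocated_variants", e.g. "gnomad_nfe_maf" and "gnomad_nfe_allele".
    """
    result = {}
    for record in vep_response:  # one record per input variant
        for colocated in record.get("colocated_variants", []):
            populations = {}
            for key, value in colocated.items():
                if not key.endswith("_maf"):
                    continue
                prefix = key[: -len("_maf")]  # e.g. "gnomad_nfe" or "afr"
                populations[prefix] = (colocated.get(prefix + "_allele"), value)
            if populations:
                result[colocated.get("id")] = populations
    return result
```

The keys of each inner dictionary are the populations that have frequency data for that variant. Because allele frequencies are available from all three projects, while genotype frequencies are not, allele frequencies are the more general basis for asking in how many populations a variant is seen.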

We also have a POST VEP endpoint, which lets you send a list of variant IDs for annotation in a single request: https://rest.ensembl.org/documentation/info/vep_id_post
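For many rsIDs, a batched client for the POST endpoint might look like the sketch below. It assumes a JSON body of the form `{"ids": [...]}` sent to `/vep/human/id`, and a per-request limit of 200 IDs; confirm both on the vep_id_post documentation page. The helper names are hypothetical, and `annotate_many` needs network access.

```python
import json
import urllib.request

VEP_POST_URL = "https://rest.ensembl.org/vep/human/id"
MAX_IDS_PER_REQUEST = 200  # assumption: check the documented POST size limit


def batches(ids, size=MAX_IDS_PER_REQUEST):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_vep_post(ids):
    """Build (but do not send) a POST request annotating the given variant IDs."""
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    return urllib.request.Request(
        VEP_POST_URL,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def annotate_many(ids, timeout=60):
    """Send the IDs in batches and concatenate the decoded JSON results."""
    results = []
    for chunk in batches(list(ids)):
        with urllib.request.urlopen(build_vep_post(chunk), timeout=timeout) as resp:
            results.extend(json.loads(resp.read().decode("utf-8")))
    return results
```

Batching keeps each request under the size limit. Separating request construction from sending makes the request easy to inspect before any network call is made.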

I hope that helps you with your use case.

Anja


> On 20 Sep 2017, at 22:05, deepak kumar <deepak.k.choubey at gmail.com> wrote:
> 
> Hi Anja,
> 
> Thank you so much for the reply. It certainly pointed me in the right direction. However, could you please help me with a few questions about it:
> 
> To start with, I find the "Rest API" a very clean way to get variant information.
> 
> a) My aim is to find out whether a SNP (say, rs769971095) is shared across populations, in other words, whether this rsID mutation can be found in more than one population. From the links you provided I can see that I can find an answer, but I am confused between "population allele frequency" and "population genotype frequency". To fulfill my aim, should the data for this rsID be taken from "population allele frequency" or from "population genotype frequency"?
> 
> b) The population names given in the "example output" of the "Rest API" are in short form, like 'AMR', 'SAS' etc. Could you please let me know how I can retrieve the full population name for a given rsID?
> 
> Thanks much!
> DK
> 
> On Tue, Sep 19, 2017 at 7:22 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
> Hi DK,
> 
> You have a few options for getting allele frequencies for a variant.
> 
> You can use:
>     - our Perl API: http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#alleles (to get you started)
>     - our REST API: https://rest.ensembl.org/documentation/info/variation_id (to get you started)
>     - the VEP: https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html, which lets you annotate your input variants with frequency data where available
> 
> Please feel free to contact us again if you have any questions regarding the above approaches.
> 
> Kind regards,
> Anja
> 
> 
>> On 19 Sep 2017, at 16:06, deepak kumar <deepak.k.choubey at gmail.com> wrote:
>> 
>> Dear ALL,
>> 
>> I have been looking for a way to find out which population(s), and if possible which gender, an nsSNP (with an rs ID such as rs769971095) belongs to. I came across the Ensembl "population genetics" view for variants.
>> 
>> I found the population genetics information for two rsIDs: rs559632360 and rs769971095.
>> 
>> For rs769971095, the super-populations shown are: ALL, AFR, AMR, ASJ, EAS, FIN, NFE, OTH, SAS.
>> 
>> For rs559632360, the super-populations shown are: ALL, AFR, AMR, EAS, SAS, EUR.
>> 
>> For rs559632360, it shows population genetics from both "1000 Genomes Project Phase 3" and "gnomAD exomes", along with sub-population information, whereas for rs769971095 it shows only "gnomAD exomes" population genetics.
>> 
>> http://grch37.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=3:12625875-12626875;v=rs769971095;vdb=variation;vf=135759093
>> http://grch37.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=3:12632759-12633759;v=rs559632360;vdb=variation;vf=92299087#population_freq_SAS
>> Does this mean that no "1000 Genomes Project Phase 3" data is available for rs769971095?
>> 
>> I am interested in knowing whether these two rsIDs belong to the same population. Can it be said that they share a population, and if so, which one? It would be great to know how to make a reasonable interpretation of this.
>> 
>> Also, I need to do this for many rsIDs. Could you please let me know how this process can be automated, so that I can generate results like this:
>> 
>> 
>> rsID           Super-populations with allele frequencies     Sub-populations
>> rs769971095    ALL, AFR, AMR, ASJ, EAS, FIN, NFE, OTH, SAS   ...etc.
>> rs559632360    ALL, AFR, AMR, EAS, SAS, EUR                  ...etc.
>> 
>> Thanks much! DK
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
> 
