[ensembl-dev] Automate the SNP variant result from "population genetics"

Anja Thormann anja at ebi.ac.uk
Tue Sep 26 11:50:02 BST 2017


Hi DK,

For a), you can use our lookup endpoint:
https://rest.ensembl.org/documentation/info/lookup

For your example:
https://rest.ensembl.org/lookup/id/NM_007299.3?content-type=application/json;expand=1;utr=1
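
If you want to script this, the lookup call above could be wrapped roughly as follows. This is only a sketch using the Python standard library: the helper names `fetch_json` and `extract_utrs` are made up for illustration, and the JSON field names (`UTR`, `object_type`, `start`, `end`) are assumptions about the response layout that you should check against a live response.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def fetch_json(path):
    """GET a REST path (including its query string) and decode the JSON body.

    Needs network access; hypothetical helper, not part of an Ensembl client.
    """
    with urllib.request.urlopen(SERVER + path) as response:
        return json.load(response)


def extract_utrs(transcript):
    """Return (utr_type, start, end) tuples from a decoded lookup response.

    Assumes the UTR features sit in a list under the "UTR" key and that each
    entry carries "object_type", "start" and "end" fields.
    """
    utrs = []
    for utr in transcript.get("UTR", []):
        utrs.append((utr.get("object_type"), utr["start"], utr["end"]))
    # Sort by genomic start coordinate so the output order is stable.
    return sorted(utrs, key=lambda item: item[1])
```

With network access, `extract_utrs(fetch_json("/lookup/id/NM_007299.3?content-type=application/json;expand=1;utr=1"))` would then give one tuple per UTR segment, provided the assumed field names hold.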


For b), you can use the ID overlap endpoint:
https://rest.ensembl.org/documentation/info/overlap_id

To get all variants overlapping a transcript, use the endpoint like this:
https://rest.ensembl.org/overlap/id/NM_007299.3?feature=variation;content-type=application/json

To return only variants in the 5’ and 3’ UTRs, add so_term filters to the request:
https://rest.ensembl.org/overlap/id/NM_007299.3?feature=variation&so_term=3_prime_UTR_variant;so_term=5_prime_UTR_variant;content-type=application/json
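
A small Python sketch of how the filtered request could be built and how the results might be turned into the kind of table you asked for. The helper names `overlap_url` and `variant_rows` are illustrative only, and the field names `id`, `alleles` and `start` are assumptions about the overlap response that should be verified against real output.

```python
from urllib.parse import quote

SERVER = "https://rest.ensembl.org"
UTR_TERMS = ("5_prime_UTR_variant", "3_prime_UTR_variant")


def overlap_url(feature_id, so_terms=UTR_TERMS):
    """Build the overlap/id URL, repeating so_term once per requested SO term."""
    params = ["feature=variation"]
    params += ["so_term=" + term for term in so_terms]
    params.append("content-type=application/json")
    return SERVER + "/overlap/id/" + quote(feature_id) + "?" + ";".join(params)


def variant_rows(feature_id, variants):
    """Turn decoded overlap results into (transcript, variant, alleles, start) rows.

    Assumes each variant is a dict with "id", "alleles" (a list) and "start".
    """
    rows = []
    for variant in variants:
        alleles = "/".join(variant.get("alleles", []))
        rows.append((feature_id, variant["id"], alleles, variant["start"]))
    return rows
```

The URL from `overlap_url("NM_007299.3")` can be fetched with any HTTP client; passing the decoded JSON list to `variant_rows` gives one row per variant with its alleles and start position.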


For each variant in our database we compute its consequence on overlapping transcripts, and we describe the consequences with Sequence Ontology (SO) terms. You can find a ranked list of all the SO terms we assign here: http://www.ensembl.org/info/genome/variation/predicted_data.html#consequences

We do compute 5_prime_UTR_variant and 3_prime_UTR_variant consequences, which allows you to filter for only those variants.

However, for each variant returned by the overlap endpoint we report only the most severe consequence for that variant and the overlapping transcript. The returned consequence_type will therefore sometimes differ from 5_prime_UTR_variant and 3_prime_UTR_variant: for example, a variant that is a 5 prime UTR variant but also causes a frameshift is reported with the consequence type frameshift_variant.

If you want a detailed list of all the consequences for all variants in the 5’ and 3’ UTRs, you need to first retrieve all the variants from the overlap endpoint and then use those variants as input for the VEP endpoint.
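
The two-step approach could be sketched in Python like this: take the variant IDs from the overlap results, send them in batches to the VEP POST endpoint (vep/human/id, with a JSON body of the form {"ids": [...]}), and collect every consequence term reported per variant. The helper names are hypothetical, the batch size of 200 is an assumption (check the endpoint documentation for the current POST limit), and the response fields `transcript_consequences`, `consequence_terms`, `id` and `input` are assumptions about the VEP JSON output.

```python
import json
import urllib.request

VEP_URL = "https://rest.ensembl.org/vep/human/id"


def batches(ids, size=200):
    """Split a list of variant IDs into chunks of at most `size` (assumed limit)."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def post_vep(ids):
    """POST one batch of variant IDs to the VEP endpoint (needs network access)."""
    body = json.dumps({"ids": ids}).encode("utf-8")
    request = urllib.request.Request(
        VEP_URL,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def consequences_by_variant(vep_results):
    """Map each variant ID to the sorted list of all consequence terms seen for it.

    Unlike the overlap endpoint, this keeps every term across all transcripts,
    not just the most severe one.
    """
    summary = {}
    for result in vep_results:
        terms = set()
        for consequence in result.get("transcript_consequences", []):
            terms.update(consequence.get("consequence_terms", []))
        summary[result.get("id", result.get("input"))] = sorted(terms)
    return summary
```

A typical run would loop over `batches(variant_ids)`, call `post_vep` on each batch, and merge the dictionaries returned by `consequences_by_variant`.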

Best,
Anja


> On 23 Sep 2017, at 14:17, deepak kumar <deepak.k.choubey at gmail.com> wrote:
> 
> Thanks much Anja! 
> 
> I think Ensembl is a very useful platform for such queries. I am curious about the following query; could you please let me know how I can do this using the Ensembl platform:
> 
> a) I want to extract the 5' and 3' UTRs from the mRNA of BRCA1 and BRCA2. For instance, the 5' and 3' UTR information for the RefSeq transcript ID "NM_007299.3" of BRCA1.
> 
> 
> b) Also, find the positions of the SNPs (rsIDs) in the 5' and 3' UTRs. For instance, information like this (for the RefSeq transcript ID "NM_007299.3" of BRCA1):
> 
> refseq-geneID    mutant-allele    position-of-mutation
> NM_007299.3      c to t           400032
> 
> Thanks much! Please let me know if something is not clear.
> 
> 
> On Thu, Sep 21, 2017 at 1:07 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
> Hi DK,
> 
> Let me give you some background on where we get our data for allele frequency or genotype frequency annotations from:
> 
> We provide allele frequencies and genotype frequencies (where available) from a set of reference populations provided by projects like the 1000 Genomes Project, ESP and gnomAD (which supersedes ExAC).
> 
> Only 1000 Genomes provides sample genotypes from which we can compute population genotype frequencies.
> 
> ESP provides population genotype frequencies, but we don't get a breakdown of genotypes by sample in the population.
> 
> gnomAD only provides allele counts in a population.
> 
> Here is a variant which has annotations from all of the above projects: http://www.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=1:230709548-230710548;v=rs699;vdb=variation;vf=664
> 
> For 1000GENOMES:phase_3:AFR allele frequencies: A: 0.097 G: 0.903 genotype frequencies: A|A: 0.126 A|G: 0.338 G|G: 0.536
> For gnomADe:AFR allele frequencies: A: 0.152 G: 0.848 No genotype frequencies
> 
> Our variation endpoint does not return gnomAD frequencies at the moment. We will include the frequencies for the next release.
> 
> For now I would recommend that you use our VEP endpoint:
> https://rest.ensembl.org/documentation/info/vep_id_get
> Examples:
> https://rest.ensembl.org/vep/human/id/rs769971095?content-type=application/json
> https://rest.ensembl.org/vep/human/id/rs699?content-type=application/json
> 
> The VEP makes use of cache files which store allele frequencies for the 1000 Genomes Project super populations (AFR, AMR, EAS, EUR, SAS) and the gnomAD exome data.
> 
> Please find a list of our populations, their short names and descriptions here:
> http://www.ensembl.org/info/genome/variation/data_description.html#populations
> 
> The VEP provides annotations from:
> gnomADe:ALL - All gnomAD exomes individuals
> gnomADe:AFR - African/African American
> gnomADe:AMR - Admixed American
> gnomADe:ASJ - Ashkenazi Jewish
> gnomADe:EAS - East Asian
> gnomADe:FIN - Finnish
> gnomADe:NFE - Non-Finnish European
> gnomADe:OTH - Other
> gnomADe:SAS - South Asian
> 1000GENOMES:phase_3:AFR African
> 1000GENOMES:phase_3:AMR American
> 1000GENOMES:phase_3:EAS East Asian
> 1000GENOMES:phase_3:EUR European
> 1000GENOMES:phase_3:SAS South Asian
> 
> The populations use the following field names in the VEP endpoint output:
>   - for example, for gnomADe:NFE: gnomad_nfe_maf and gnomad_nfe_allele
>   - for example, for 1000GENOMES:phase_3:AFR: afr_maf and afr_allele
> 
> We also have a POST VEP endpoint, which allows you to send a list of variant IDs for annotation: https://rest.ensembl.org/documentation/info/vep_id_post
> 
> I hope that helps you with your use case.
> 
> Anja
> 
> 
>> On 20 Sep 2017, at 22:05, deepak kumar <deepak.k.choubey at gmail.com> wrote:
>> 
>> Hi Anja,
>> 
>> Thank you so much for the reply. It certainly pointed me in the right direction. However, could you please help me with a few follow-up questions:
>> 
>> To start off with, I find the REST API a very clean approach to getting variant information.
>> 
>> a) My aim is to find out whether a SNP (say rs769971095) is shared between populations, in other words, whether this rsID can be found in more than one population. From the links you provided I see that I can find an answer, but I am confused between "population allele frequency" and "population genotype frequency". To fulfil my aim, should the data for this rsID be taken from "population allele frequency" or "population genotype frequency"?
>> 
>> b) The population names given in the example output of the REST API are in short form, like 'AMR', 'SAS' etc. Could you please let me know how I can retrieve the full population name for a given rsID?
>> 
>> Thanks much!
>> DK
>> 
>> On Tue, Sep 19, 2017 at 7:22 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
>> Hi DK,
>> 
>> You have a few options for getting allele frequencies for a variant.
>> 
>> You can use
>>     - our Perl API: http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#alleles (to get you started)
>>     - our REST API: https://rest.ensembl.org/documentation/info/variation_id (to get you started)
>>     - the VEP: https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html (it will annotate your input variants with frequency data where available)
>> 
>> Please feel free to contact us again if you have any questions regarding the above approaches.
>> 
>> Kind regards,
>> Anja
>> 
>> 
>>> On 19 Sep 2017, at 16:06, deepak kumar <deepak.k.choubey at gmail.com> wrote:
>>> 
>>> Dear ALL,
>>> 
>>> I have been looking for a way to find out which nsSNPs (with rs ID numbers like rs769971095) belong to which population(s) and, if possible, which gender. I came across the Ensembl "population genetics" view for variants.
>>> 
>>> I found the respective population genetics info for 2 rsIDs: rs559632360 and rs769971095.
>>> 
>>> For "rs769971095" the super-populations shown are: ALL, AFR, AMR, ASJ, EAS, FIN, NFE, OTH, SAS.
>>> 
>>> For "rs559632360" the super-populations shown are: ALL, AFR, AMR, EAS, SAS, EUR.
>>> 
>>> 
>>> 
>>> For rs559632360, it also shows population genetics from "1000 Genomes Project Phase 3" and "gnomAD exomes" along with "subpopulation" information, whereas for rs769971095 it shows only "gnomAD exomes" population genetics.
>>> 
>>> http://grch37.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=3:12625875-12626875;v=rs769971095;vdb=variation;vf=135759093
>>> http://grch37.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=3:12632759-12633759;v=rs559632360;vdb=variation;vf=92299087#population_freq_SAS
>>> Does this mean that for "rs769971095" there is no "1000 genomes project phase 3" data available? 
>>> 
>>> I am interested to know whether these two rsIDs belong to one population. Can it be said that these rsIDs share the same population? If yes, which population do they share? It would be great if I could know how to make a reasonable interpretation of this.
>>> 
>>> Also, I need to do this for many rsIDs. Could you please let me know how this process can be automated, so that I can generate results like this:
>>> 
>>> 
>>> 
>>> rsID           Super-populations with allele frequencies       Sub-population
>>> 
>>> rs769971095    ALL, AFR, AMR, ASJ, EAS, FIN, NFE, OTH, SAS     .......etc
>>> 
>>> rs559632360    ALL, AFR, AMR, EAS, SAS, EUR                    ......etc
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Thanks much! DK
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
