[ensembl-dev] Automate the SNP variant result from "population genetics"
deepak kumar
deepak.k.choubey at gmail.com
Tue Oct 3 16:24:39 BST 2017
Hi Anja,
Thank you so much for the clarification! I got most of the results by
following your suggestions. There is only one question left, and I would
really appreciate your suggestions on it:
How can I get the reference allele and the mutant allele for an rsID? For
example, for rsID "rs80357307", what are the reference and mutant alleles?
I am trying to automate this for many rsIDs, so your suggestion would really
help.
Thanks much!
DK
On Tue, Sep 26, 2017 at 1:50 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
> Hi DK,
>
> for a) you can use our lookup endpoint
> https://rest.ensembl.org/documentation/info/lookup
>
> for your example:
> https://rest.ensembl.org/lookup/id/NM_007299.3?content-type=application/json;expand=1;utr=1
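A minimal Python sketch of building the lookup request above, using only the standard library. The helper name `lookup_url` is hypothetical; `expand` and `utr` are the parameters that appear in the quoted URL, and the `;` separator simply mirrors it.

```python
from urllib.parse import quote

REST_SERVER = "https://rest.ensembl.org"

def lookup_url(identifier, expand=True, utr=True):
    """Build a lookup/id URL like the one quoted above (hypothetical helper)."""
    params = ["content-type=application/json"]
    if expand:
        params.append("expand=1")
    if utr:
        params.append("utr=1")
    # The URLs in this thread separate parameters with ';', so the same is done here.
    return f"{REST_SERVER}/lookup/id/{quote(identifier)}?" + ";".join(params)

print(lookup_url("NM_007299.3"))
```

The resulting string can be passed to any HTTP client; looping over a list of identifiers gives one URL per transcript.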
>
>
> for b) you can use the ID overlap endpoint:
> https://rest.ensembl.org/documentation/info/overlap_id
>
> In order to get all overlapping variants for a transcript you use the
> endpoint like this:
> https://rest.ensembl.org/overlap/id/NM_007299.3?feature=variation;content-type=application/json
>
> To only return variants in 5’ and 3’ UTRs you can add additional filters
> to the request:
> https://rest.ensembl.org/overlap/id/NM_007299.3?feature=variation&so_term=3_prime_UTR_variant;so_term=5_prime_UTR_variant;content-type=application/json
>
>
> For each variant in our database we compute its consequence on overlapping
> transcripts. We use sequence ontology terms for describing the
> consequences. You can find a ranked list of all the SO terms we assign
> here:
> http://www.ensembl.org/info/genome/variation/predicted_data.html#consequences
>
> We do compute 5_prime_UTR_variant and 3_prime_UTR_variant consequences
> which allows you to filter for only those variants.
>
> However, for each variant that we return in the overlap endpoint we only
> report the most severe consequence for the given variant and overlapping
> transcript. Sometimes the returned consequence_type will be different from
> 5_prime_UTR_variant and 3_prime_UTR_variant because the variant is not only
> a 5 prime UTR variant but also, for example, causes a frameshift. In this
> case the consequence type is frameshift_variant.
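The filtered overlap request and the most-severe-consequence behaviour described above could be handled along these lines. This is a sketch under assumptions: the helper names are hypothetical, the sample records are invented for illustration, and the `id`, `start` and `alleles` keys are assumed record fields to be checked against the overlap endpoint documentation (only `consequence_type` is named in the message above).

```python
REST_SERVER = "https://rest.ensembl.org"
UTR_TERMS = ("3_prime_UTR_variant", "5_prime_UTR_variant")

def overlap_url(identifier, so_terms=UTR_TERMS):
    """Build an overlap/id URL restricted to the given SO terms."""
    params = ["feature=variation"]
    params += [f"so_term={term}" for term in so_terms]
    params.append("content-type=application/json")
    return f"{REST_SERVER}/overlap/id/{identifier}?" + ";".join(params)

def split_by_reported_consequence(variants, so_terms=UTR_TERMS):
    """Separate records whose reported consequence is a UTR term from the rest."""
    utr, other = [], []
    for variant in variants:
        target = utr if variant.get("consequence_type") in so_terms else other
        target.append(variant)
    return utr, other

# Illustrative records only, not real Ensembl output.
sample = [
    {"id": "rsA", "start": 100, "alleles": ["C", "T"],
     "consequence_type": "3_prime_UTR_variant"},
    {"id": "rsB", "start": 200, "alleles": ["G", "-"],
     "consequence_type": "frameshift_variant"},
]
utr, other = split_by_reported_consequence(sample)
print(overlap_url("NM_007299.3"))
print([v["id"] for v in utr], [v["id"] for v in other])
```

Records that land in the second list are the ones where the reported consequence is more severe than the UTR term, so they are worth passing on for a full consequence listing rather than discarding.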
>
> If you want a detailed list of all the consequences for all variants in a
> 3’ and 5’ region you need to first retrieve all the variants from the
> overlap endpoint and then use the variants as input for the VEP endpoint.
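The two-step workflow above (overlap first, then VEP) can be sketched as follows. The function name is hypothetical; the `{"ids": [...]}` request body and the batch size of 200 are assumptions that should be checked against the VEP ID POST endpoint documentation, and the input records are illustrative.

```python
import json

def vep_post_bodies(variant_ids, batch_size=200):
    """Yield JSON request bodies for a VEP ID POST request, one per batch."""
    ids = list(dict.fromkeys(variant_ids))  # drop duplicates, keep order
    for i in range(0, len(ids), batch_size):
        # Assumed body shape: {"ids": [...]}.
        yield json.dumps({"ids": ids[i:i + batch_size]})

# Illustrative overlap records; only the "id" key is used here.
overlap_records = [{"id": "rs699"}, {"id": "rs769971095"}, {"id": "rs699"}]
bodies = list(vep_post_bodies((r["id"] for r in overlap_records), batch_size=2))
print(bodies)
```

Each body would be sent with a JSON content type; batching keeps a long rsID list within whatever POST size limit the server enforces.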
>
> Best,
> Anja
>
>
> On 23 Sep 2017, at 14:17, deepak kumar <deepak.k.choubey at gmail.com> wrote:
>
> Thanks much Anja!
>
> I think Ensembl is a very useful platform for such queries. I am curious
> about the following query; could you please let me know how I can do this
> using the Ensembl platform:
>
> a) I want to extract the 5' and 3' UTRs from the mRNA of BRCA1 and BRCA2,
> for instance the 5' and 3' UTR information for the RefSeq ID "NM_007299.3"
> of BRCA1.
>
>
> b) Also, find the positions of the SNPs (rsIDs) in the 5' and 3' UTRs, for
> instance information like this (for the RefSeq ID "NM_007299.3" of
> BRCA1):
>
> refseq-ID      mutant-allele    position-of-mutation
> NM_007299.3    c to t           400032
>
> Thanks much! Please let me know if something is not clear.
>
>
> On Thu, Sep 21, 2017 at 1:07 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
>
>> Hi DK,
>>
>> Let me give you some background on where we get our data for allele
>> frequency or genotype frequency annotations from:
>>
>> We provide allele frequencies and genotype frequencies (where available)
>> from a set of reference populations provided by projects like 1000 Genomes
>> Project, ESP, gnomAD (supersedes ExAC).
>>
>> Only 1000 Genomes provides sample genotypes from which we can compute
>> population genotype frequencies.
>>
>> ESP provides population genotype frequencies, but we don't get a
>> breakdown of genotypes by sample in the population.
>>
>> gnomAD only provides allele counts in a population.
>>
>> Here is a variant which has annotations from all of the above projects:
>> http://www.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=1:230709548-230710548;v=rs699;vdb=variation;vf=664
>>
>> For 1000GENOMES:phase_3:AFR:
>>   allele frequencies: A: 0.097, G: 0.903
>>   genotype frequencies: A|A: 0.126, A|G: 0.338, G|G: 0.536
>> For gnomADe:AFR:
>>   allele frequencies: A: 0.152, G: 0.848
>>   no genotype frequencies
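To make the relationship between the two kinds of frequency concrete, here is a small Python sketch (the function name is hypothetical). It derives allele frequencies from genotype frequencies by giving each allele half of every genotype it appears in, which is the standard calculation for a diploid locus.

```python
def allele_freqs_from_genotypes(genotype_freqs):
    """Derive allele frequencies from genotype frequencies keyed like 'A|G'."""
    freqs = {}
    for genotype, freq in genotype_freqs.items():
        for allele in genotype.split("|"):
            # Each genotype carries two alleles, so each one gets half its frequency.
            freqs[allele] = freqs.get(allele, 0.0) + freq / 2
    return {allele: round(f, 3) for allele, f in freqs.items()}

# Genotype frequencies as quoted above.
quoted = {"A|A": 0.126, "A|G": 0.338, "G|G": 0.536}
derived = allele_freqs_from_genotypes(quoted)
print(derived)
```

For the genotype frequencies quoted above this gives A: 0.295 and G: 0.705, which does not match the quoted AFR allele frequencies (A: 0.097, G: 0.903). The two sets of numbers may therefore refer to different populations; the thread itself does not say.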
>>
>> Our variation endpoint does not return gnomAD frequencies at the moment.
>> We will include the frequencies for the next release.
>>
>> For now I would recommend that you use our VEP endpoint
>> https://rest.ensembl.org/documentation/info/vep_id_get
>> Examples:
>> https://rest.ensembl.org/vep/human/id/rs769971095?content-type=application/json
>> https://rest.ensembl.org/vep/human/id/rs699?content-type=application/json
>>
>> The VEP makes use of cache files which store allele frequencies for the
>> 1000 Genomes Project super populations (AFR, AMR, EAS, EUR, SAS) and the
>> gnomAD exome data.
>>
>> Please find a list of our populations, their short names and descriptions
>> here:
>> http://www.ensembl.org/info/genome/variation/data_description.html#populations
>>
>> The VEP provides annotations from:
>> gnomADe:ALL - All gnomAD exomes individuals
>> gnomADe:AFR - African/African American
>> gnomADe:AMR - Admixed American
>> gnomADe:ASJ - Ashkenazi Jewish
>> gnomADe:EAS - East Asian
>> gnomADe:FIN - Finnish
>> gnomADe:NFE - Non-Finnish European
>> gnomADe:OTH - Other
>> gnomADe:SAS - South Asian
>> 1000GENOMES:phase_3:AFR - African
>> 1000GENOMES:phase_3:AMR - American
>> 1000GENOMES:phase_3:EAS - East Asian
>> 1000GENOMES:phase_3:EUR - European
>> 1000GENOMES:phase_3:SAS - South Asian
>>
>> The populations use the following names in the vep endpoints:
>> - for example for gnomADe:NFE: gnomad_nfe_maf and gnomad_nfe_allele
>> - for example for 1000GENOMES:phase_3:AFR: afr_maf and afr_allele
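The naming pattern above can be turned into a small lookup, which also gives one possible way to check whether a variant is seen in more than one population. This is a sketch under assumptions: the function names are hypothetical, the pattern is generalised from the two examples given, the "ALL" populations are left out because the message does not give their field names, and the sample record is invented rather than real VEP output.

```python
def vep_field(population, suffix="maf"):
    """Map a population name to the VEP field name pattern quoted above."""
    source, _, code = population.rpartition(":")
    if code == "ALL":
        raise ValueError("field name for ALL is not given in the thread")
    if source == "gnomADe":
        return f"gnomad_{code.lower()}_{suffix}"
    if source == "1000GENOMES:phase_3":
        return f"{code.lower()}_{suffix}"
    raise ValueError(f"no field pattern known for {population!r}")

def populations_seen(record, populations):
    """Return the populations whose *_maf value in a record is above zero."""
    return [p for p in populations if (record.get(vep_field(p)) or 0) > 0]

# Illustrative record only, not real VEP output.
record = {"gnomad_nfe_maf": 0.0001, "gnomad_nfe_allele": "T", "afr_maf": 0}
pops = ["gnomADe:NFE", "1000GENOMES:phase_3:AFR", "gnomADe:FIN"]
seen = populations_seen(record, pops)
print(seen)
```

A missing field is treated the same as a frequency of zero here; whether that is appropriate depends on whether absence means "not observed" or "not assayed" for the population in question.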
>>
>> We have a POST VEP endpoint which allows you to send a list of variant
>> IDs for annotation:
>> https://rest.ensembl.org/documentation/info/vep_id_post
>>
>> I hope that helps you with your use case.
>>
>> Anja
>>
>>
>> On 20 Sep 2017, at 22:05, deepak kumar <deepak.k.choubey at gmail.com>
>> wrote:
>>
>> Hi Anja,
>>
>> Thank you so much for the reply. It certainly pointed me in the right
>> direction. However, could you please help me with a few follow-up
>> questions:
>>
>> To start off with, I find the REST API a very clean approach to getting
>> variant information.
>>
>> a) My aim is to find out whether a SNP (an rsID, say rs769971095) is shared
>> between populations, in other words whether this variant can be found in
>> more than one population. From the links you provided I see that I can find
>> an answer, but I am confused between "population allele frequency" and
>> "population genotype frequency". For my purpose, should the data for this
>> rsID be taken from "population allele frequency" or "population genotype
>> frequency"?
>>
>> b) The population names given in the "example output" of the REST API
>> are in short form, like 'AMR', 'SAS', etc. Could you please let me know how
>> I can retrieve the full population name for a given rsID?
>>
>> Thanks much!
>> DK
>>
>> On Tue, Sep 19, 2017 at 7:22 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
>>
>>> Hi DK,
>>>
>>> you have a few options for getting allele frequencies for a variant.
>>>
>>> You can use
>>> - our Perl API (to get you started):
>>>   http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#alleles
>>> - our REST API (to get you started):
>>>   https://rest.ensembl.org/documentation/info/variation_id
>>> - the VEP, which will allow you to annotate your input variants with
>>>   frequency data if available:
>>>   https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html
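As a rough illustration of the REST option, the Python sketch below builds a variation request and groups the returned frequencies by population. The helper names are hypothetical, and the `pops=1` parameter and the `populations` / `population` / `allele` / `frequency` keys are assumptions to be verified against the variation endpoint documentation. The sample fragment reuses the rs699 AFR allele frequencies quoted elsewhere in this thread.

```python
from collections import defaultdict

REST_SERVER = "https://rest.ensembl.org"

def variation_url(rsid, species="human"):
    """Build a variation request that asks for population frequencies."""
    return (f"{REST_SERVER}/variation/{species}/{rsid}"
            "?content-type=application/json;pops=1")

def frequencies_by_population(variation):
    """Group allele frequencies by population name."""
    grouped = defaultdict(dict)
    for entry in variation.get("populations", []):
        grouped[entry["population"]][entry["allele"]] = entry["frequency"]
    return dict(grouped)

# Illustrative response fragment; the key names are assumed.
sample = {
    "name": "rs699",
    "populations": [
        {"population": "1000GENOMES:phase_3:AFR", "allele": "A", "frequency": 0.097},
        {"population": "1000GENOMES:phase_3:AFR", "allele": "G", "frequency": 0.903},
    ],
}
grouped = frequencies_by_population(sample)
print(variation_url("rs699"))
print(grouped)
```

Running this over a list of rsIDs and collecting the dictionary keys would give one row per variant with the populations it has data for.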
>>>
>>> Please feel free to contact us again if you have any questions regarding
>>> the above approaches.
>>>
>>> Kind regards,
>>> Anja
>>>
>>>
>>> On 19 Sep 2017, at 16:06, deepak kumar <deepak.k.choubey at gmail.com>
>>> wrote:
>>>
>>> Dear ALL,
>>>
>>> I have been looking for a way to find which population(s) an nsSNP (with
>>> an rsID such as rs769971095) belongs to and, if possible, which gender.
>>> I came across the Ensembl "population genetics" view for variants.
>>>
>>> I found the population genetics information for two rsIDs: rs559632360
>>> and rs769971095.
>>>
>>> For "rs769971095" the super-populations shown are: ALL, AFR, AMR, ASJ,
>>> EAS, FIN, NFE, OTH, SAS.
>>>
>>> For "rs559632360" the super-populations shown are: ALL, AFR, AMR, EAS,
>>> SAS, EUR.
>>>
>>>
>>> For rs559632360 it also shows population genetics from "1000 Genomes
>>> Project Phase 3" and "gnomAD exomes", along with "subpopulation"
>>> information, whereas for rs769971095 it shows only "gnomAD exomes"
>>> population genetics.
>>>
>>> http://grch37.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=3:12625875-12626875;v=rs769971095;vdb=variation;vf=135759093
>>>
>>> http://grch37.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=3:12632759-12633759;v=rs559632360;vdb=variation;vf=92299087#population_freq_SAS
>>>
>>> Does this mean that for "rs769971095" there is no "1000 Genomes Project
>>> Phase 3" data available?
>>>
>>> I would like to know whether these two rsIDs are found in the same
>>> population. If so, which population(s) do they share? It would be great
>>> to know how to make a reasonable interpretation of this.
>>>
>>> Also, I need to do this for many rsIDs. Could you please let me know how
>>> this process can be automated, so that I can generate results like this:
>>>
>>>
>>> rsID          Super-population (with allele frequencies)     Sub-population
>>> rs769971095   ALL, AFR, AMR, ASJ, EAS, FIN, NFE, OTH, SAS    ...etc
>>> rs559632360   ALL, AFR, AMR, EAS, SAS, EUR                   ...etc
>>>
>>>
>>>
>>> Thanks much! DK
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/