[ensembl-dev] inconsistency when mapping the same variant from protein to genomics vs genomics to protein

David Tamborero david.tamborero at gmail.com
Sun Dec 30 02:22:08 GMT 2018


(Sorry for the late response; I currently have very little internet access.)

Thanks, Andrew, for your answer. I'm a bit surprised by the general lack of
tools addressing these issues. Maybe the community does not need them that
much, although I would say the contrary.

In any case, I will stay tuned to see whether your next releases can address
some of them.

Thanks again!
Br
D


On Fri, 21 Dec 2018 at 20:57, Andrew Parton <aparton at ebi.ac.uk> wrote:

> Hi David,
>
> One improvement that would make this process a little easier is for
> variant_recoder to give VCF output. This is something that we will look
> into. Thanks.
>
> VEP could definitely do a better job of predicting HGVSg from HGVSp.
> Officially, we require that input HGVS is relative to genomic or transcript
> coordinates. VEP and Variant Recoder will successfully convert from HGVSp
> to HGVSg sometimes, but as you’ve noticed, there are distinct improvements
> that we can make. While the ability of Variant Recoder to convert from
> HGVSp will improve over time, and I’ve added your comments to our list of
> things to look at in the future, I can’t guarantee when or even if
> it’ll happen.
>
> Kind Regards,
> Andrew
>
>
>
>
> On 17 Dec 2018, at 15:24, David Tamborero <david.tamborero at gmail.com>
> wrote:
>
> Thanks for your answer!
>
> I understand that the protein representation can lead to a non-unique
> genomic mapping, but I'm unsure why VEP tries to infer the genomic
> coordinates without considering the reference amino acid that was passed
> (if this is what is happening!). Note that this particular amino acid
> change (TP53:p.E285V) maps to a unique genomic missense mutation in all
> TP53 transcripts.
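
The check David describes here (only accept a transcript if its residue at the given position matches the reference amino acid in the HGVSp string) can be sketched in Python as below. This is a minimal illustration, not how VEP works internally: the helper names, the transcript IDs `tx_A`/`tx_B`, and the toy protein sequences are all hypothetical, and only the short single-letter substitution form (such as `TP53:p.E285V`) is parsed.

```python
import re

# Matches only the short substitution form, e.g. "TP53:p.E285V".
# Three-letter codes, indels, and frameshifts are deliberately not handled.
HGVSP_RE = re.compile(r"^(?P<gene>[^:]+):p\.(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z*])$")


def parse_hgvsp(text):
    """Split a short-form HGVSp substitution into (gene, ref, pos, alt)."""
    match = HGVSP_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported HGVSp string: {text!r}")
    return match["gene"], match["ref"], int(match["pos"]), match["alt"]


def transcripts_matching_reference(hgvsp, proteins):
    """Return IDs of transcripts whose residue at `pos` equals the stated reference.

    `proteins` maps a (hypothetical) transcript ID to its protein sequence.
    Positions are 1-based, as in HGVS.
    """
    _, ref, pos, _ = parse_hgvsp(hgvsp)
    return [
        tx_id
        for tx_id, seq in proteins.items()
        if pos <= len(seq) and seq[pos - 1] == ref
    ]


# Toy data: tx_A has E at position 3, tx_B has Y there (mirroring the E vs Y mismatch).
toy_proteins = {"tx_A": "MAEK", "tx_B": "MAYK"}
matching = transcripts_matching_reference("GENE:p.E3V", toy_proteins)
```

With the toy data, only `tx_A` passes the check, so a mapping derived from `tx_B` would be rejected before any genomic coordinates are reported.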
>
> FYI (you likely know this already): when the mapping is not unique, it is
> not uncommon for other tools dealing with HGVS to give a guess, normally
> the 'most probable' one based on different metrics, as a first 'hit'
> (and to detail the rest). This is especially needed when dealing with indels.
>
> Maybe this is too complicated for VEP. However, I still cannot find a
> good way, using your tools, to go from an HGVS protein representation to
> genomic coordinates (in 'VCF format', meaning chr pos ref alt). This is
> not an uncommon need in the field. If I may use this forum to ask, are you
> planning to support that in e.g. one of your APIs (i.e. like the 'HGVS
> converter' but supporting the VCF-like output)?
>
> many thanks for your time (and your work!)
> best regards from Stockholm
> d
>
> On Sun, 16 Dec 2018 at 14:19, Andrew Parton (<aparton at ebi.ac.uk>)
> wrote:
>
>> Hi David,
>>
>> I’ve taken a look at this issue this morning and I think I can see what’s
>> going on. I can reproduce this issue with the query:
>>
>>     perl vep -id 'TP53:p.E285V' --database --force_overwrite --hgvs --port 3337
>>
>> VEP guesses the genomic location based on this HGVS input (17:7565261),
>> and identifies that overlapping transcript ENST00000413465 has a protein
>> product. However, the 285th amino acid of this transcript is not E, but Y.
>> The alternate allele is guessed by VEP from a collection of options that it
>> has. For example, with the input HGVS 'TP53:p.M237I', VEP has 3
>> potential alternate alleles it can use, obtained by converting the given
>> ATG to one of ATA, ATC or ATT.
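
The M237I example can be reproduced with a short Python sketch that enumerates every single-base change of a reference codon and keeps those that encode the requested amino acid. It assumes the standard genetic code and single-nucleotide substitutions only; the function name is hypothetical and this is not VEP's actual implementation.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_alternatives(ref_codon, alt_aa):
    """Return codons one base away from `ref_codon` that encode `alt_aa`."""
    candidates = []
    for i, ref_base in enumerate(ref_codon):
        for base in BASES:
            if base == ref_base:
                continue  # skip the unchanged codon
            codon = ref_codon[:i] + base + ref_codon[i + 1:]
            if CODON_TABLE[codon] == alt_aa:
                candidates.append(codon)
    return sorted(candidates)


# Methionine (ATG) to isoleucine, as in the M237I example above.
met_to_ile = single_base_alternatives("ATG", "I")
```

Here `met_to_ile` holds the three codons named in the message (ATA, ATC, ATT), which is why a protein-level change alone does not pin down a single alternate allele.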
>>
>> While VEP supports HGVS input, due to the complexity of HGVS and the
>> variety of ways in which people use it, we require that input HGVS is
>> relative to genomic or transcript coordinates. In protein cases, we give a
>> best guess where we can, but this is not guaranteed.
>>
>> Sorry that I couldn’t be of more help.
>>
>> Kind Regards,
>> Andrew
>>
>>
>> On 14 Dec 2018, at 18:00, David Tamborero <david.tamborero at gmail.com>
>> wrote:
>>
>> Hi there,
>>
>> Regarding the conversion from protein to genomic representation supported
>> by VEP, I've found a funny case. If I input
>>
>> TP53:p.E285V
>>
>> VEP gives as output (VCF format)
>>
>> 17    7565261    TP53:p.E285V    T    A
>>
>> And then if I input that VCF entry to VEP, I obtain two TP53 protein
>> annotations:
>>
>> downstream_gene_variant for ENST00000359597
>> missense_variant for ENST00000413465
>>
>> However, the missense variant is annotated as 285 Y/F (and not the E/V
>> that I had at the start!).
>>
>> So it looks like some inconsistency happened here, and I'm not sure why.
>> Am I missing something?
>>
>> thanks in advance!
>> d
>>
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>