[ensembl-dev] inconsistency when mapping the same variant from protein to genomics vs genomics to protein

David Tamborero david.tamborero at gmail.com
Mon Dec 17 15:24:12 GMT 2018


thanks for your answer!

Hmm, I understand that a protein representation can map to more than one
genomic change, but I'm unsure why VEP tries to infer the genomic
coordinates without considering the reference amino acid that was passed (if
this is what is happening!). Note that this particular amino acid change
(TP53:p.E285V) maps to a unique genomic missense mutation in all TP53
transcripts.
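
To illustrate the kind of check meant here, below is a minimal Python sketch
that keeps only the transcripts whose protein actually carries the stated
reference amino acid at the stated position. It is not how VEP works
internally; the function names (parse_hgvs_p, transcripts_matching_ref) and
the toy protein sequences are made up for the example, and only simple
one-letter substitutions such as p.E285V are handled.

```python
import re

# One-letter HGVS protein substitutions only, e.g. "TP53:p.E285V".
HGVS_P = re.compile(r"^(?P<gene>[^:]+):p\.(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$")


def parse_hgvs_p(hgvs):
    """Split 'GENE:p.X123Y' into (gene, ref_aa, position, alt_aa)."""
    m = HGVS_P.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported HGVS protein string: {hgvs!r}")
    return m["gene"], m["ref"], int(m["pos"]), m["alt"]


def transcripts_matching_ref(hgvs, proteins):
    """Return the transcript IDs whose residue at the 1-based position
    equals the reference amino acid given in the HGVS string."""
    _, ref, pos, _ = parse_hgvs_p(hgvs)
    return [
        tx for tx, seq in proteins.items()
        if len(seq) >= pos and seq[pos - 1] == ref
    ]


# Toy data (hypothetical sequences, not real TP53 proteins):
proteins = {
    "tx_E": "A" * 284 + "E" + "A" * 20,   # E at position 285
    "tx_Y": "A" * 284 + "Y" + "A" * 20,   # Y at position 285
    "tx_short": "A" * 100,                # too short to have residue 285
}

print(transcripts_matching_ref("TP53:p.E285V", proteins))  # ['tx_E']
```

With a filter like this, a transcript with Y at position 285 would be
rejected before any genomic coordinates are derived from it.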

FYI (you likely know this already): when the mapping is not unique, it is not
uncommon for other tools dealing with HGVS to give a guess as a first 'hit',
normally the 'most probable' one based on different metrics, and to detail the
rest. This is especially needed when dealing with indels.
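
As a rough sketch of that "first hit plus the rest" idea for simple
substitutions, the Python below enumerates every codon change that turns one
amino acid into another under the standard genetic code and ranks them by the
number of changed bases. The ranking metric is only an assumption chosen for
the example (real tools may use other criteria), and ranked_codon_changes is
a hypothetical helper, not part of VEP or any other tool.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def ranked_codon_changes(ref_aa, alt_aa):
    """Return (ref_codon, alt_codon, n_changed_bases) tuples for every codon
    pair encoding ref_aa -> alt_aa, fewest changed bases first."""
    refs = [c for c, aa in CODON_TABLE.items() if aa == ref_aa]
    alts = [c for c, aa in CODON_TABLE.items() if aa == alt_aa]
    pairs = [(r, a, hamming(r, a)) for r in refs for a in alts]
    # Ties are broken alphabetically, which is arbitrary.
    return sorted(pairs, key=lambda p: (p[2], p[0], p[1]))


# M -> I: the single Met codon ATG can become ATA, ATC or ATT.
first_hit, *others = ranked_codon_changes("M", "I")
print(first_hit, others)

# E -> V: only two of the eight codon pairs need a single base change.
single_base = [(r, a) for r, a, n in ranked_codon_changes("E", "V") if n == 1]
print(single_base)  # [('GAA', 'GTA'), ('GAG', 'GTG')]
```

Once the reference codon at the position is known, the candidate list shrinks
further, which is why checking the reference amino acid first matters.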

Maybe this is too complicated for VEP. However, I'm still not finding a good
way, using your tools, to go from an HGVS protein representation to genomic
coordinates (in 'vcf format', meaning chr pos ref alt). This is not an
uncommon need in the field. If I may use this forum to ask, are you planning
to support that in e.g. one of your APIs (i.e. like the 'hgvs converter' but
supporting the vcf-like output)?

many thanks for your time (and your work!)
best regards from Stockholm
d

On Sun, 16 Dec 2018 at 14:19, Andrew Parton (<aparton at ebi.ac.uk>)
wrote:

> Hi David,
>
> I’ve taken a look at this issue this morning and I think I can see what’s
> going on. I can reproduce this issue with the query: perl vep -id
> 'TP53:p.E285V' --database --force_overwrite --hgvs --port 3337
>
> VEP guesses the genomic location based on this HGVS input (17:7565261),
> and identifies that overlapping transcript ENST00000413465 has a protein
> product. However, the 285th amino acid of this transcript is not E, but Y.
> The alternate allele is guessed by VEP from a collection of options that it
> has. For example, with the input HGVS 'TP53:p.M237I', VEP has 3 potential
> alternate alleles it can use, obtained by converting the given ATG to one
> of ATA, ATC or ATT.
>
> While VEP supports HGVS input, the complexity of HGVS and the variety of
> ways in which people use it mean that we require input HGVS to be
> relative to genomic or transcript coordinates. In protein cases, we give a
> best guess where we can, but this is not guaranteed.
>
> Sorry that I couldn’t be of more help.
>
> Kind Regards,
> Andrew
>
>
> On 14 Dec 2018, at 18:00, David Tamborero <david.tamborero at gmail.com>
> wrote:
>
> Hi there,
>
> Regarding the conversion from protein to genomic representation supported
> by VEP, I've found a funny case. If I input
>
> TP53:p.E285V
>
> VEP gives as output (vcf format)
>
> 17    7565261    TP53:p.E285V    T    A
>
> And then if I input that vcf entry to VEP, I obtain two TP53 protein
> annotations:
>
> downstream_gene_variant for ENST00000359597
> missense_variant for ENST00000413465
>
> However, the missense variant is annotated as 285 Y/F (and not the E/V
> that I had at the start!)
>
> So it looks like some inconsistency happened here, not sure why. Am I
> missing something?
>
> thanks in advance!
> d
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/