[ensembl-dev] Fwd: Problems in using VEP to convert variants from protein to genomic coordinates

David Tamborero david.tamborero at gmail.com
Fri Dec 7 18:39:28 GMT 2018


Hi Andrew,

I have already been in contact with someone from the Ensembl VEP team, and
she very kindly addressed some of my issues and keeps me updated on your
roadmap for HGVS conversions. However, I thought that my question here was
not part of that 'todo' list (and that the whole forum might benefit from
the answer). Apologies for the duplicate query; it was not on purpose.

In any case, and just FYI, I am using VEP for these conversions because I
need the result in 'VCF representation', meaning that I need
chr-pos-ref-alt explicitly stated (e.g. I would need additional steps to
retrieve the ref from the 'g.32336340_32336341delinsAA' result). I have
not been able to find a tool among yours that supports this (I hope I am
wrong?).
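
To illustrate the "additional steps" I mean, here is a rough Python sketch,
not a tested pipeline. It parses a genomic HGVS substitution or delins and
fills in the ref allele through a caller-supplied lookup. The names
hgvsg_to_vcf, VcfFields and fetch_ref_ensembl are my own hypothetical
helpers. The lookup assumes the Ensembl REST sequence/region endpoint
returns the plain-text sequence, and that the server matches the assembly
of the HGVS string (e.g. https://grch37.rest.ensembl.org for hg19). Pure
deletions and insertions, which need an anchor base in VCF, are not handled.

```python
import re
from typing import Callable, NamedTuple
from urllib.request import urlopen


class VcfFields(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Matches e.g. "NC_000013.11:g.32336340_32336341delinsAA"
# or "NC_000010.10:g.89717672C>T" (RefSeq chromosome accessions only).
_HGVSG = re.compile(
    r"^NC_0*(?P<num>\d+)\.\d+:g\.(?P<start>\d+)(?:_(?P<end>\d+))?"
    r"(?:(?P<sub_ref>[ACGT])>(?P<sub_alt>[ACGT])|delins(?P<ins>[ACGT]+))$"
)


def fetch_ref_ensembl(chrom: str, start: int, end: int,
                      server: str = "https://rest.ensembl.org") -> str:
    """Fetch the forward-strand reference sequence (assumed endpoint usage)."""
    url = (f"{server}/sequence/region/human/"
           f"{chrom}:{start}..{end}:1?content-type=text/plain")
    with urlopen(url) as resp:
        return resp.read().decode().strip()


def hgvsg_to_vcf(hgvsg: str,
                 fetch_ref: Callable[[str, int, int], str]) -> VcfFields:
    """Convert a genomic HGVS substitution or delins to chr-pos-ref-alt."""
    m = _HGVSG.match(hgvsg)
    if m is None:
        raise ValueError(f"unsupported HGVSg string: {hgvsg!r}")
    num = int(m["num"])
    # NC_000023 and NC_000024 are chromosomes X and Y.
    chrom = {23: "X", 24: "Y"}.get(num, str(num))
    start = int(m["start"])
    if m["sub_ref"]:
        if m["end"]:
            raise ValueError("a substitution cannot span a range")
        # The ref base is already stated in a substitution.
        return VcfFields(chrom, start, m["sub_ref"], m["sub_alt"])
    end = int(m["end"] or start)
    # A delins does not state the deleted bases, so look them up.
    ref = fetch_ref(chrom, start, end).upper()
    return VcfFields(chrom, start, ref, m["ins"])
```

Calling hgvsg_to_vcf("NC_000013.11:g.32336340_32336341delinsAA",
fetch_ref_ensembl) would then query the REST server for the two reference
bases. Passing the lookup as an argument means a local FASTA reader could be
swapped in instead.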

Thanks for your answer anyway,
and have a good weekend!
d

On Fri, 7 Dec 2018 at 18:16, Andrew Parton (<aparton at ebi.ac.uk>)
wrote:

> Hi David,
>
> Thanks for your query. While VEP supports HGVS input, the complexity of
> HGVS and the variety of ways in which people use it mean that we require
> input HGVS to be relative to genomic or transcript coordinates.
> We have some documentation (that we are in the process of improving) on the
> matter here:
> http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#hgvs
>
> It may be more appropriate for you to use our Variant Recoder tool, also
> contained within the ensembl-vep repository, rather than VEP itself, as
> this will give you HGVSg output. However, without a particular transcript
> in the input, the variant could map to multiple locations. Variant
> Recoder will often guess at a solution (for the BRCA2 example you gave, it
> suggests NC_000013.11:g.32336340_32336341delinsAA). However, as you have
> noticed, a result cannot be guaranteed unless the correct
> genomic/transcript nomenclature is used.
>
> We are currently improving how we handle HGVS inputs, but this work is
> still in development. Sorry that I couldn’t give you a more helpful
> response.
>
> Kind Regards,
> Andrew
>
>
> On 5 Dec 2018, at 09:27, David Tamborero <david.tamborero at gmail.com>
> wrote:
>
> Hi there,
>
> I am very interested in converting protein changes to genomic changes with
> the standalone VEP Perl script. Note that I have the gene (but not the
> specific transcript) in which such a change is annotated (e.g. NRAS:Q61L).
>
> I am experiencing some problems that I cannot figure out how to solve. I
> have looked for documentation addressing them in the dev list archives and
> elsewhere, but without success. So I hope this is a good place to ask for
> help (thanks in advance!).
>
> I am using VEP 93.3 with hg19 (I do not think that the VEP parameters are
> relevant, but note that I am using VCF as the output format since I need
> the coordinates in chr-pos-ref-alt format, not HGVS).
>
> -- insertion of stop codons --
>
> For cases such as CDKN2A:p.Q50* or PTEN:p.R233*, VEP works smoothly;
> however, for other cases (e.g. BRCA2:p.S662*) VEP does not seem happy and
> does not give any output.
>
> My guess is that this has to do with the fact that the ones that work are
> stop codons caused by a specific nucleotide change (e.g. PTEN:p.R233* ->
> chr10 89717672 C>T), whereas those that do not work are caused by
> insertions, which I guess gives a larger universe of possible nucleotide
> changes.
>
> However, why not offer a 'most plausible' guess? E.g. for BRCA2:p.S662*,
> TransVar gives a reasonable chr13:g.32910477_32910478delCTinsAA attempt.
>
> -- indels --
>
> Here I am failing badly. I am not even sure that I am using the correct
> nomenclature. For instance, for frameshifts, DTX1:p.P11fs*2 gives an
> 'Unable to parse HGVS notation' error. I went to the Variant Recoder REST
> API, and this specific variant, given in cDNA nomenclature (which I also
> retrieved with TransVar), is represented as
> "ENSP00000257600.3:p.Pro11LeufsTer2" in protein coordinates:
>
>
> https://rest.ensembl.org/variant_recoder/human/DTX1:c.32delC?content-type=application/json
>
> But when I pass this representation to the VEP command, I still get the
> format error, even after replacing ENSP00000257600 with the gene (DTX1) or
> transcript (ENST00000257600) name and trying some alternatives to the
> p.Pro11LeufsTer2 representation.
>
> Am I missing something?
>
> Many thanks in advance (and congratulations on such a great tool!)
> br
> d
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>