[ensembl-dev] Change in consequence and HGVSp between VEP v99 and v100

Wallace Ko myko at l3-bioinfo.com
Mon Jan 24 06:54:38 GMT 2022


Hello Nakib,

After double checking, I found that my previous claim was wrong because I
had used incorrect paths.
I have rerun the tests with 5 old versions of VEP (92.6, 93.5, 94.5, 99.2,
100.4) and 3 different versions of the GRCh37 RefSeq cache (92, 99, 100).
The observations are below:

   - Example 1 (10-97192324-A-G, NM_001034954.1:c.182T>C)
      - For cache version ≤ 99:
         - HGVSp: NP_001030126.1:p.Pro61=
         - Consequence: synonymous_variant
      - For cache version 100:
         - HGVSp: NP_001030126.1:p.Leu61Pro
         - Consequence: missense_variant
   - Example 2 (4-648595-G-GTGTCTTCTGCTTCTCAGGAAAT,
     NM_000283.3:c.928-9_940dup)
      - For ensembl-vep version ≤ 99:
         - HGVSp: NP_000274.2:p.Tyr314CysfsTer50
         - Consequence: intron_variant
      - For ensembl-vep version = 100:
         - HGVSp: -
         - Consequence: intron_variant
   - Example 3 (14-75483904-T-TCTATGGGAAGAAAGAATAACTTCAATTAGCAATATG,
     NM_001040108.1:c.4243-36_4243-1dup)
      - For ensembl-vep version ≤ 93:
         - HGVSp: NP_001035197.1:p.Gln1414_Ile1415insHisIleAlaAsnTerSerTyrSerPhePheProTer
         - Consequence: stop_gained,inframe_insertion,splice_region_variant
      - For ensembl-vep version ≥ 94:
         - HGVSp: NP_001035197.1:p.Gln1414_Ile1415insHisIleAlaAsnTer
         - Consequence: stop_gained,inframe_insertion,splice_region_variant


For example 1, I would assume that the difference is due to the updated
RefSeq cache. Therefore, I believe I will inevitably have to reanalyze all
the samples with the latest VEP.
For examples 2 and 3, I believe the differences are due to code changes,
though I am not sure which changes would explain them.

The example input file and test commands are below:

test_cases.vcf (spaces are tabs):

##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
10 97192324 example_1 A G . . .
4 648595 example_2 G GTGTCTTCTGCTTCTCAGGAAAT . . .
14 75483904 example_3 T TCTATGGGAAGAAAGAATAACTTCAATTAGCAATATG . . .
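
Because the listing above shows spaces where the real file has tabs, copying it verbatim gives a file that is not tab-delimited. A minimal sketch for writing the same three records with real tabs follows; the function name write_test_vcf is made up for illustration and is not part of VEP.

```shell
# Hypothetical helper: write the three test records with real tab separators.
# The default output name matches the file used by the test commands.
write_test_vcf() {
  out=${1:-test_cases.vcf}
  {
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    # printf reuses the format string for each group of five arguments
    printf '%s\t%s\t%s\t%s\t%s\t.\t.\t.\n' \
      10 97192324 example_1 A G \
      4 648595 example_2 G GTGTCTTCTGCTTCTCAGGAAAT \
      14 75483904 example_3 T TCTATGGGAAGAAAGAATAACTTCAATTAGCAATATG
  } > "$out"
}
```

Calling write_test_vcf with no argument creates test_cases.vcf in the current directory.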


Test commands:

VEP_VERSIONS="92.6 93.5 94.5 99.2 100.4"
CACHE_VERSIONS="92 99 100"
for vep in $VEP_VERSIONS; do
  for cache in $CACHE_VERSIONS; do
    ./ensembl-vep-release-${vep}/vep --cache --offline --force_overwrite \
      --tab --format vcf --no_stats --no_headers \
      --refseq --exclude_predicted --hgvs --total_length --biotype \
      --canonical --pick --pick_order length,canonical,biotype,rank \
      --fields Uploaded_variation,CANONICAL,BIOTYPE,Consequence,CDS_position,HGVSc,HGVSp,PICK \
      --fasta data/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa --dir data \
      --cache_version ${cache} \
      -i test_cases.vcf -o test_cases_vep_${vep}_${cache}.txt
  done
done
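
To see at a glance which VEP/cache combination changes which annotation, the 15 output files can be collated into one table. The following is only a rough sketch: the function name collate_vep_outputs is made up, and it assumes the file naming from the loop above and the column order given by --fields (column 1 = Uploaded_variation, 4 = Consequence, 7 = HGVSp). Lines starting with # are skipped in case a header line is present.

```shell
# Hypothetical helper: merge test_cases_vep_<vep>_<cache>.txt files into rows of
#   variant <TAB> vep_version <TAB> cache_version <TAB> Consequence <TAB> HGVSp
collate_vep_outputs() {
  tab=$(printf '\t')
  for f in test_cases_vep_*_*.txt; do
    [ -e "$f" ] || continue            # no matching files: glob stays literal
    run=${f#test_cases_vep_}           # e.g. 99.2_100.txt
    run=${run%.txt}                    # e.g. 99.2_100
    vep=${run%_*}                      # part before the last underscore
    cache=${run##*_}                   # part after the last underscore
    awk -F'\t' -v OFS='\t' -v vep="$vep" -v cache="$cache" \
      '!/^#/ && NF { print $1, vep, cache, $4, $7 }' "$f"
  done | LC_ALL=C sort -t "$tab" -k1,1 -k2,2n -k3,3n
}
```

Run in the directory holding the outputs, collate_vep_outputs prints the rows grouped by variant and ordered by VEP version, then by cache version, so a change in HGVSp or Consequence shows up as a break between adjacent rows.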


Regards,
Wallace Ko


On Fri, Jan 21, 2022 at 11:23 PM Syed Hossain <snhossain at ebi.ac.uk> wrote:

> Hello Wallace,
>
> Thank you for reporting this issue. But, unfortunately, we were unable
> to reproduce it.
>
> For our case VEP v99 is also giving the same output as what you get for
> VEP v100 (i.e - HGVSc: NM_001034954.1:c.182T>C --> HGVSp:
> NP_001030126.1:p.Leu61Pro).
>
> Can you kindly share the full command you are using and (if possible)
> the input files.
>
> Best regards,
> Nakib
>
> On 2022-01-20 13:02:47 +0800, Wallace Ko wrote:
> > Hello Ensembl Team,
> >
> > During re-evaluation of some old samples, we found that VEP v99 and v100
> > give the same HGVSc, but different annotation, including consequence and
> > HGVSp, e.g.:
> >
> >    1. HGVSc: NM_001034954.1:c.182T>C
> >       - v99:
> >          - HGVSp: NP_001030126.1:p.Pro61=
> >          - Consequence: synonymous_variant
> >       - v100:
> >          - HGVSp: NP_001030126.1:p.Leu61Pro
> >          - Consequence: missense_variant
> >    2. HGVSc: NM_000283.3:c.928-9_940dup
> >       - v99 HGVSp: NP_000274.2:p.Tyr314CysfsTer50
> >       - v100 HGVSp: -
> >    3. HGVSc: NM_001040108.1:c.4243-36_4243-1dup
> >       - v99 HGVSp: NP_001035197.1:p.Gln1414_Ile1415insHisIleAlaAsnTerSerTyrSerPhePheProTer
> >       - v100 HGVSp: NP_001035197.1:p.Gln1414_Ile1415insHisIleAlaAsnTer
> >
> > I tried using offline VEP v100 with v99 cache and vice versa. VEP v100
> > produces the new results regardless of the cache being used. So it should
> > be due to code change instead of data update.
> >
> > The change is sort of significant but there seem no relevant details in the
> > release descriptions
> > (https://github.com/Ensembl/ensembl-vep/releases?q=release%2F100) or the
> > mail list about the change.
> >
> > I wonder what causes such changes and if there is a scope or pattern on the
> > changes. Otherwise, we may need to re-analyze all results from VEP v99 or
> > older.
> >
> > Thank you.
> >
> > Regards,
> > Wallace Ko