[ensembl-dev] Change in consequence and HGVSp between VEP v99 and v100

Syed Hossain snhossain at ebi.ac.uk
Mon Jan 31 17:23:33 GMT 2022


Hello Wallace,

We have looked into this, and the different output between versions for 
Examples 2 and 3 is indeed caused by code changes.

For Example 2:

VEP v100 introduced a new option, "--shift_3prime", which, if enabled, 
"Right aligns all variants relative to their associated transcripts 
prior to consequence calculation". Prior to v100, the default behaviour 
was to shift the variant before HGVSp was calculated.

So, getting back to your example, if you do not use --shift_3prime the 
insertion is placed at position 928-9, which is in a non-coding region. 
VEP by default does not return HGVSp if the variant falls within an 
untranslated region. That is why you are not getting any HGVSp with VEP 
v100 and later.

So, with VEP v100 and later you can use --shift_3prime to get the same 
HGVSp as in previous versions. But do note that the consequence will 
then be different (frameshift instead of intronic).
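
To see which records change between two runs (for example one with and 
one without --shift_3prime), the tab outputs can be compared column by 
column. Below is a minimal sketch, not part of VEP. It assumes output 
written with the --tab and --fields settings quoted further down in this 
thread, so that column 1 is Uploaded_variation, column 4 is Consequence 
and column 7 is HGVSp. The function name compare_vep is made up for 
illustration.

```shell
# compare_vep FILE_A FILE_B
# Print every variant whose Consequence or HGVSp differs between two
# VEP tab outputs. Assumed column order (from the --fields list in this
# thread): 1 = Uploaded_variation, 4 = Consequence, 7 = HGVSp.
compare_vep() {
  awk -F'\t' '
    /^#/ { next }                                     # skip header lines, if any
    NR == FNR { cons[$1] = $4; prot[$1] = $7; next }  # remember values from FILE_A
    ($1 in cons) && (cons[$1] != $4 || prot[$1] != $7) {
      printf "%s\tConsequence: %s -> %s\tHGVSp: %s -> %s\n",
             $1, cons[$1], $4, prot[$1], $7
    }
  ' "$1" "$2"
}
```

This relies on one output line per variant, which is what --pick gives 
in the commands quoted below. Without --pick the key would need to 
include the transcript as well.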

For Example 3:

It is more correct to use the shorter representation, and that is what 
is happening here: the HGVSp representation now stops after the first 
Ter is found.

NP_001035197.1:p.Gln1414_Ile1415insHisIleAlaAsnTerSerTyrSerPhePheProTer

truncates to

NP_001035197.1:p.Gln1414_Ile1415insHisIleAlaAsnTer

This was added in the v94 code.
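
If you need to compare HGVSp strings produced before v94 with newer 
ones, the old strings can be truncated in the same way. The following is 
only an illustrative sketch of the rule described above (cut an inserted 
sequence after its first Ter). It is not the VEP implementation, and the 
function name truncate_ins_at_ter is made up.

```shell
# truncate_ins_at_ter HGVSP
# Cut the inserted sequence of a protein-level insertion after its first
# Ter. Strings without an "ins...Ter" part are printed unchanged.
truncate_ins_at_ter() {
  hgvsp=$1
  case $hgvsp in
    *ins*Ter*)
      head=${hgvsp%%ins*}   # everything before the first "ins"
      tail=${hgvsp#*ins}    # the inserted residues
      # keep the inserted residues up to and including the first Ter
      printf '%s\n' "${head}ins${tail%%Ter*}Ter"
      ;;
    *)
      printf '%s\n' "$hgvsp"
      ;;
  esac
}
```

Usage:

truncate_ins_at_ter 'NP_001035197.1:p.Gln1414_Ile1415insHisIleAlaAsnTerSerTyrSerPhePheProTer'

prints NP_001035197.1:p.Gln1414_Ile1415insHisIleAlaAsnTer.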

Best regards,
Nakib

On 2022-01-25 22:08, Syed Hossain wrote:
> Hello Wallace,
> 
> You are right about Example 1. VEP v100 is giving a different result
> because of the updated RefSeq cache, and the alleles are different.
> 
> We are looking into why the other examples you provided are giving
> different results. We will get back to you once we find out the cause.
> 
> Best regards,
> Nakib
> 
> On 2022-01-24 12:54, Wallace Ko wrote:
>> Hello Nakib,
>> 
>> After double checking, I found that my previous claim was wrong due to
>> incorrect paths being used.
>> I have done tests again with 5 old versions of VEP (92.6, 93.5, 94.5,
>> 99.2, 100.4) and 3 different versions of GRCh37 RefSeq cache (92, 99,
>> 100)
>> Below are the observations:
>> 
>> 	* Example 1 (10-97192324-A-G, NM_001034954.1:c.182T>C)
>> 
>> 	* For cache version ≤ 99:
>> 
>> 	* HGVSp: NP_001030126.1:p.Pro61=
>> 	* Consequence: synonymous_variant
>> 
>> 	* For cache version 100:
>> 
>> 	* HGVSp: NP_001030126.1:p.Leu61Pro
>> 	* Consequence: missense_variant
>> 
>> 	* Example 2 (4-648595-G-GTGTCTTCTGCTTCTCAGGAAAT,
>> NM_000283.3:c.928-9_940dup)
>> 
>> 	* For ensembl-vep version ≤ 99:
>> 
>> 	* HGVSp: NP_000274.2:p.Tyr314CysfsTer50
>> 	* Consequence: intron_variant
>> 
>> 	* For ensembl-vep version = 100:
>> 
>> 	* HGVSp: -
>> 	* Consequence: intron_variant
>> 
>> 	* Example 3 (14-75483904-T-TCTATGGGAAGAAAGAATAACTTCAATTAGCAATATG,
>> NM_001040108.1:c.4243-36_4243-1dup)
>> 
>> 	* For ensembl-vep version ≤93:
>> 
>> 	* HGVSp:
>> NP_001035197.1:p.Gln1414_Ile1415insHisIleAlaAsnTerSerTyrSerPhePheProTer
>> 	* Consequence: stop_gained,inframe_insertion,splice_region_variant
>> 
>> 	* For ensembl-vep version ≥94:
>> 
>> 	* HGVSp: NP_001035197.1:p.Gln1414_Ile1415insHisIleAlaAsnTer
>> 	* Consequence: stop_gained,inframe_insertion,splice_region_variant
>> 
>> For example 1, I would assume that the difference is due to the
>> updated RefSeq cache. Therefore, I believe I will inevitably have to
>> reanalyze all the samples with the latest VEP.
>> For examples 2 and 3, I believe the differences are due to code
>> changes, though I am not sure which changes would explain the
>> differences.
>> 
>> Below are the example input file and test commands anyway:
>> 
>> test_cases.vcf (spaces are tabs):
>> 
>>> ##fileformat=VCFv4.1
>>> #CHROM POS ID REF ALT QUAL FILTER INFO
>>> 10 97192324 example_1 A G . . .
>>> 4 648595 example_2 G GTGTCTTCTGCTTCTCAGGAAAT . . .
>>> 14 75483904 example_3 T TCTATGGGAAGAAAGAATAACTTCAATTAGCAATATG . . .
>> 
>> Test commands:
>> 
>>> VEP_VERSIONS="92.6 93.5 94.5 99.2 100.4"
>>> CACHE_VERSIONS="92 99 100"
>>> 
>>> for vep in $VEP_VERSIONS; do
>>>   for cache in $CACHE_VERSIONS; do
>>>     ./ensembl-vep-release-${vep}/vep --cache --offline \
>>>       --force_overwrite --tab --format vcf --no_stats --no_headers \
>>>       --refseq --exclude_predicted --hgvs --total_length --biotype \
>>>       --canonical --pick --pick_order length,canonical,biotype,rank \
>>>       --fields Uploaded_variation,CANONICAL,BIOTYPE,Consequence,CDS_position,HGVSc,HGVSp,PICK \
>>>       --fasta data/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
>>>       --dir data --cache_version ${cache} \
>>>       -i test_cases.vcf -o test_cases_vep_${vep}_${cache}.txt
>>>   done
>>> done
>> 
>> Regards,
>> Wallace Ko
>> 
>> On Fri, Jan 21, 2022 at 11:23 PM Syed Hossain <snhossain at ebi.ac.uk>
>> wrote:
>> 
>>> Hello Wallace,
>>> 
>>> Thank you for reporting this issue. Unfortunately, we were unable to
>>> reproduce it.
>>> 
>>> In our case VEP v99 also gives the same output as what you get for
>>> VEP v100 (i.e. HGVSc: NM_001034954.1:c.182T>C --> HGVSp:
>>> NP_001030126.1:p.Leu61Pro).
>>> 
>>> Could you kindly share the full command you are using and (if
>>> possible) the input files?
>>> 
>>> Best regards,
>>> Nakib
>>> 
>>> On 2022-01-20 13:02:47 +0800, Wallace Ko wrote:
>>>> Hello Ensembl Team,
>>>> 
>>>> During re-evaluation of some old samples, we found that VEP v99 and
>>>> v100 give the same HGVSc, but different annotation, including
>>>> consequence and HGVSp, e.g.:
>>>> 
>>>> 1. HGVSc: NM_001034954.1:c.182T>C
>>>> - v99:
>>>> - HGVSp: NP_001030126.1:p.Pro61=
>>>> - Consequence: synonymous_variant
>>>> - v100:
>>>> - HGVSp: NP_001030126.1:p.Leu61Pro
>>>> - Consequence: missense_variant
>>>> 2. HGVSc: NM_000283.3:c.928-9_940dup
>>>> - v99 HGVSp:  NP_000274.2:p.Tyr314CysfsTer50
>>>> - v100 HGVSp:  -
>>>> 3. HGVSc: NM_001040108.1:c.4243-36_4243-1dup
>>>> - v99 HGVSp:
>>>> NP_001035197.1:p.Gln1414_Ile1415insHisIleAlaAsnTerSerTyrSerPhePheProTer
>>>> - v100 HGVSp: NP_001035197.1:p.Gln1414_Ile1415insHisIleAlaAsnTer
>>>> 
>>>> I tried using offline VEP v100 with v99 cache and vice versa. VEP
>>>> v100 produces the new results regardless of the cache being used.
>>>> So it should be due to code change instead of data update.
>>>> 
>>>> The change is sort of significant but there seem to be no relevant
>>>> details in the release descriptions (
>>>> https://github.com/Ensembl/ensembl-vep/releases?q=release%2F100)
>>>> or the mail list about the change.
>>>> 
>>>> I wonder what causes such changes and if there is a scope or
>>>> pattern on the changes. Otherwise, we may need to re-analyze all
>>>> results from VEP v99 or older.
>>>> 
>>>> Thank you.
>>>> 
>>>> Regards,
>>>> Wallace Ko
>>>> 
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:
>>>> https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>>> Ensembl Blog: http://www.ensembl.info/