[ensembl-dev] VEP missing annotation for intergenic-corrected variants

William McLaren wm2 at ebi.ac.uk
Mon Nov 27 15:30:09 GMT 2017


Hi again,

The missing BAM_EDIT field will be fixed in release/91 of VEP, due in a few weeks.

https://github.com/Ensembl/ensembl-vep/commit/d1204ccc0a17b8056a8a789ed29f1c5e434ed21c

Regards

Will


On 23 November 2017 at 9:42:21 am, William McLaren (wm2 at ebi.ac.uk) wrote:

Sorry, I wasn’t clear - BAM_EDIT should appear in the VCF output, but I think there’s a bug that means it isn’t being included.

Will


On 23 November 2017 at 7:30:00 am, Luke Goodsell (l.goodsell at achillestx.com) wrote:

Hi Will,

 

Thanks for the information and suggestion. How do I find the BAM_EDIT status in the VCF annotation?

 

Kind regards,

Luke

 

From: Dev <dev-bounces at ensembl.org> on behalf of William McLaren <wm2 at ebi.ac.uk>
Reply-To: Ensembl developers list <dev at ensembl.org>
Date: Wednesday, 22 November 2017 at 13:20
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] VEP missing annotation for intergenic-corrected variants

 

Hi Luke

 

On 22 November 2017 at 11:41:43 am, Luke Goodsell (l.goodsell at achillestx.com) wrote:

Hi Will, 

Thanks for fixing the bug so quickly. I have tested the variant with your change and it now annotates correctly for me. 

If I understand correctly: 

* Using EnsEMBL’s corrected alignment of the RefSeq transcripts (as a consequence of the implicit use of ‘--use_transcript_ref’ from the new BAM-containing VEP cache), these variants hit NBPF1 and LOC645166, but are synonymous and non-coding respectively. 

Correct - these consequence calls really now just represent the location of the called variant, not the impact of any allele change (since there isn’t one!)



* Using RefSeq’s alignment of the transcripts (when forced with ‘--use_given_ref’) they are annotated as missense and non-coding respectively. 

--use_given_ref forces VEP to use your input allele (i.e. the one from the genome) to call the consequences in place of the one from the RefSeq transcript. This annotation should be considered invalid in this case, as any consequence calls would be made on incorrect data (at least for that transcript).



* The variants have been flagged with ‘rseq_cds_mismatch’, so the transcripts’ sequences don’t match the reference genome. 

Correct; this system was used by VEP before we provided the BAM-edited transcripts.



* When I use default output rather than VCF, the BAM_EDIT status is “OK”, so the sequence of the corrected model matches that in the BAM alignment. 

Correct. In very rare cases the BAM edit can fail; these are flagged as such to warn the user there may be errors in any derived annotation.



* When using ‘--use_transcript_ref’, I can check the GIVEN_REF and USED_REF annotations to see if the reference base has changed as a result of the corrected alignment. 

Correct again.
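
The checks discussed in the points above could be scripted roughly as follows. This is a minimal sketch, not part of VEP: it assumes VCF output whose CSQ header declares GIVEN_REF, USED_REF and BAM_EDIT sub-fields, and the function names, the header line, and the transcript ID NM_000000.0 in the example are all hypothetical illustration data rather than real VEP output.

```python
import re

def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in the VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).strip().split("|")
    raise ValueError("no CSQ header line found")

def csq_records(info, fields):
    """Yield one dict per comma-separated CSQ entry in an INFO string."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            for entry in item[4:].split(","):
                yield dict(zip(fields, entry.split("|")))

def ref_changed(record):
    """True when both reference fields are present and differ."""
    given = record.get("GIVEN_REF", "")
    used = record.get("USED_REF", "")
    return bool(given and used and given != used)

# Hypothetical header and INFO value, for illustration only.
header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
    'annotations from Ensembl VEP. Format: '
    'Allele|Consequence|Feature|GIVEN_REF|USED_REF|BAM_EDIT">',
]
info = "CSQ=C|synonymous_variant|NM_000000.0|T|C|OK"

fields = csq_fields(header)
for rec in csq_records(info, fields):
    print(rec["Feature"], rec.get("BAM_EDIT"), ref_changed(rec))
```

Reading the field order from the header rather than hard-coding it keeps the sketch independent of which VEP options were used for a given run.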



Remaining questions: 

1. Does EnsEMBL have any recommendations as to what to do in the event of differing alignments? Use RefSeq/EnsEMBL/caution? The default use of corrected alignments suggests that you advise using them. 

This is an interesting quandary, and we don’t as yet have any formal recommendations. Almost all variant calls fed to VEP are from resequencing experiments where the reference genome has been used to call variants. Ensembl transcript models are built from the reference genome, so it is fully valid to use these variant calls as is to predict how they affect Ensembl transcript models.

RefSeq’s transcript models are built from primary sequence evidence without necessarily referring to the reference genome (and the inherent biases in it), so there is an argument that in some cases it can be more suitable to use these models. However, because variants are typically called using the reference genome, mapping those calls to non-genome-based transcript models may be invalid.



2. If variants subject to differing consequences should be handled with caution, is there an easy way to identify them without running VEP twice? The last point above only tells me if the specific reference base is the same, not if it’s in the same position in the transcript, for example. 

I would not advise combining interpretation across VEP runs with --use_given_ref vs --use_transcript_ref. I would, however, advise using the merged cache containing both Ensembl and RefSeq transcripts, as this allows you to compare annotations between transcript sets and follow up with more detailed investigation where they disagree and the difference could affect your interpretations and conclusions.
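
One way to act on the merged-cache suggestion is to group each variant's annotations by transcript source and flag variants where the sets of consequence terms differ. The sketch below is only an illustration under assumptions: it takes the annotations as plain dicts of CSQ sub-fields, it assumes a SOURCE sub-field is present (as expected when VEP is run against a merged cache), and the function names and example records are hypothetical. Comparing term sets is crude, since the two transcript sets are not matched one-to-one, so a flag is a prompt for closer inspection rather than a verdict.

```python
from collections import defaultdict

def consequences_by_source(records):
    """Map each transcript source to the set of consequence terms it reports."""
    by_source = defaultdict(set)
    for rec in records:
        source = rec.get("SOURCE") or "unknown"
        # VEP joins multiple consequence terms for one transcript with '&'.
        for term in rec.get("Consequence", "").split("&"):
            if term:
                by_source[source].add(term)
    return dict(by_source)

def sources_disagree(records):
    """True when at least two sources report different sets of terms."""
    term_sets = list(consequences_by_source(records).values())
    return len(term_sets) > 1 and any(s != term_sets[0] for s in term_sets[1:])

# Hypothetical annotations for a single variant, for illustration only.
example = [
    {"SOURCE": "Ensembl", "Consequence": "synonymous_variant"},
    {"SOURCE": "RefSeq", "Consequence": "missense_variant"},
]
print(consequences_by_source(example))
print(sources_disagree(example))
```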



Incidentally, it’d be useful to be able to get the BAM_EDIT field in VCF output, too. 

This should be there; I’ll take a look.

Cheers

Will



Kind regards, 
Luke 

From: Dev <dev-bounces at ensembl.org> on behalf of William McLaren <wm2 at ebi.ac.uk> 
Reply-To: Ensembl developers list <dev at ensembl.org> 
Date: Tuesday, 21 November 2017 at 17:21 
To: Ensembl developers list <dev at ensembl.org> 
Subject: Re: [ensembl-dev] VEP missing annotation for intergenic-corrected variants 

Hi Luke, 

Thanks for the report. This was occurring because your input variant has an ALT allele that matches the reference base in the RefSeq transcript; VEP was then dismissing this as non-variant so not producing any output. See [1] for more info on how VEP deals with RefSeqs that do not match the reference genome. 
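
The situation described above can be illustrated with a toy model. This is not VEP's code; the function name and logic are hypothetical and only show why an ALT allele that equals the transcript's own reference base leaves nothing to annotate for that transcript. The example uses the first variant from the report (T>C) and takes the transcript base to be C, as the explanation implies.

```python
def alleles_for_transcript(given_ref, alts, transcript_ref=None):
    """Return (used_ref, remaining_alts) for one transcript.

    When a transcript-specific reference base is supplied it replaces the
    genome reference, and any ALT equal to it is no longer a change.
    """
    used_ref = transcript_ref or given_ref
    remaining = [alt for alt in alts if alt != used_ref]
    return used_ref, remaining

# Genome reference T, called ALT C, transcript sequence carries C.
used_ref, remaining = alleles_for_transcript("T", ["C"], transcript_ref="C")
print(used_ref, remaining)  # an empty list means no allele change remains
```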

I've patched a fix to the ensembl-variation repo which contains the code for handling this; if you re-run INSTALL.pl you should be able to pick up the fix. 

We’ll get the web code updated to match hopefully tomorrow. 

Regards 

Will McLaren 
Ensembl Variation 

[1]: http://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#refseq 

On 21 November 2017 at 2:34:55 pm, Luke Goodsell (l.goodsell at achillestx.com) wrote: 
Apologies for the scrambled VCF data. Here it is with spaces for tabs: 

#CHROM POS ID REF ALT QUAL FILTER INFO 
chr1 16903882 . T C . . . 
chr1 148932885 . C T . . . 

Kind regards, 
Luke 

On 21/11/2017, 14:32, "Dev on behalf of Luke Goodsell" <dev-bounces at ensembl.org on behalf of l.goodsell at achillestx.com> wrote: 

Hi, 

I have a couple of variants (listed below) that are missing output when run through vep v90.7 for GRCh37 RefSeq transcripts with the latest cache. 

#CHROMPOSIDREFALTQUALFILTERINFO 
chr116903882.TC... 
chr1148932885.CT... 

Example command: 

vep --input_file snvs.vcf --output_file snvs_annotated.vcf --dir_cache [PATH] --fasta [PATH] --cache --offline --assembly "GRCh37" --refseq --vcf 

The output file is exactly the same as the input but with vep’s header lines added. I would expect at least something like “CSQ=C|intergenic_variant|MODIFIER||||||||||||||||||||||||||||” to be added to the INFO field to show that the variant was passed through vep. 
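
A quick way to spot records that passed through without any annotation is to scan the output for data lines whose INFO column has no CSQ key. The following is a minimal sketch assuming a tab-delimited VCF; the function name is hypothetical, and the example lines are the two variants from this report written as unannotated records.

```python
def records_missing_csq(vcf_lines):
    """Return (chrom, pos) for each data line whose INFO lacks a CSQ key."""
    missing = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        info = cols[7] if len(cols) > 7 else ""
        if not any(item.startswith("CSQ=") for item in info.split(";")):
            missing.append((cols[0], int(cols[1])))
    return missing

example_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t16903882\t.\tT\tC\t.\t.\t.\n",
    "chr1\t148932885\t.\tC\tT\t.\t.\t.\n",
]
print(records_missing_csq(example_lines))
```

To run it on a file, pass an open file handle, for example `records_missing_csq(open("snvs_annotated.vcf"))`.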

I have reproduced this with the online VEP interface (http://grch37.ensembl.org/Homo_sapiens/Tools/VEP ; select “RefSeq transcripts” and set “Get regulatory region consequences” to “No”). 

Comparing the positions in EnsEMBL’s genome browser and UCSC’s, these regions appear to be intergenic in EnsEMBL while in UCSC they hit RefSeq transcripts - a pseudogene (LOC645166) and a protein-coding gene (NBPF1) respectively. If I add the “--use_given_ref” flag, vep reports the same transcripts as UCSC. However, this raises two questions: 

1. Is adding the “--use_given_ref” flag the right thing to do? Vep reports mismatches (“rseq_mrna_nonmatch&rseq_5p_mismatch&rseq_cds_mismatch&rseq_3p_mismatch&rseq_ens_no_match”), which suggests that this is not the best mapping for the transcript and would explain why EnsEMBL’s alignment is different. 

2. If I don’t add the “--use_given_ref” flag, why isn’t VEP reporting these variants as intergenic? 

Kind regards, 
Luke 

This e-mail message contains confidential information intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, please do not disseminate, distribute or copy this communication, by e-mail or otherwise. Instead, please notify us immediately by return e-mail and then delete and discard all copies of the e-mail. We have taken all reasonable precautions to check this e-mail and any attachments for viruses, but we cannot accept any liability for any damage sustained as a result of any virus, worm or other malicious software. Achilles Therapeutics Limited (10167668) is registered in England and Wales. The registered office is at 215 Euston Road, London, NW1 2BE, UK. 
_______________________________________________ 
Dev mailing list Dev at ensembl.org 
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev 
Ensembl Blog: http://www.ensembl.info/ 


