[ensembl-dev] VEP missing annotation for intergenic-corrected variants

William McLaren wm2 at ebi.ac.uk
Tue Nov 21 17:21:05 GMT 2017


Hi Luke,

Thanks for the report. This was occurring because your input variant has an ALT allele that matches the reference base in the RefSeq transcript; VEP was then dismissing it as non-variant and so produced no output. See [1] for more info on how VEP deals with RefSeqs that do not match the reference genome.

I've patched a fix to the ensembl-variation repo which contains the code for handling this; if you re-run INSTALL.pl you should be able to pick up the fix.

We’ll get the web code updated to match hopefully tomorrow.

Regards

Will McLaren
Ensembl Variation

[1]: http://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#refseq

On 21 November 2017 at 2:34:55 pm, Luke Goodsell (l.goodsell at achillestx.com) wrote:

Apologies for the scrambled VCF data. Here it is with spaces for tabs:  

#CHROM POS ID REF ALT QUAL FILTER INFO  
chr1 16903882 . T C . . .  
chr1 148932885 . C T . . .  
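
For anyone wanting to reproduce this from the space-separated records above, the tabs need to be restored first. Below is a minimal Python sketch of one way to do that; the function name spaces_to_tabs and the ##fileformat line are illustrative assumptions and not part of the original report. It only suits simple records like these, where no field contains whitespace.

```python
def spaces_to_tabs(vcf_text: str) -> str:
    """Rebuild tab-delimited VCF lines from whitespace-separated ones.

    Only safe when no field (e.g. INFO) contains spaces of its own.
    """
    out = []
    for line in vcf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("##"):
            # Blank lines and meta lines are passed through unchanged.
            out.append(stripped)
        else:
            # Column header (#CHROM ...) and records: one tab between fields.
            out.append("\t".join(stripped.split()))
    return "\n".join(out) + "\n"


records = """#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 16903882 . T C . . .
chr1 148932885 . C T . . .
"""

# Assumption: a ##fileformat line is prepended so the file has a VCF header.
vcf = "##fileformat=VCFv4.2\n" + spaces_to_tabs(records)
print(vcf, end="")
```

The printed text can be saved as snvs.vcf and passed to --input_file.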

Kind regards,  
Luke  

On 21/11/2017, 14:32, "Dev on behalf of Luke Goodsell" <dev-bounces at ensembl.org on behalf of l.goodsell at achillestx.com> wrote:  

Hi,  

I have a couple of variants (listed below) that are missing output when run through vep v90.7 for GRCh37 RefSeq transcripts with the latest cache.  

#CHROMPOSIDREFALTQUALFILTERINFO  
chr116903882.TC...  
chr1148932885.CT...  

Example command:  

vep --input_file snvs.vcf --output_file snvs_annotated.vcf --dir_cache [PATH] --fasta [PATH] --cache --offline --assembly "GRCh37" --refseq --vcf  

The output file is exactly the same as the input but with vep’s header lines added. I would expect at least something like “CSQ=C|intergenic_variant|MODIFIER||||||||||||||||||||||||||||” to be added to the INFO field to show that the variant was passed through vep.  
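
A quick way to spot records that came through without annotation is to scan the output VCF for lines whose INFO column has no CSQ key. The Python sketch below illustrates the idea; it is not part of VEP, the helper name records_missing_csq is hypothetical, and the CSQ value in the sample data is shortened for readability.

```python
def records_missing_csq(vcf_lines):
    """Return (chrom, pos, ref, alt) for each record whose INFO has no CSQ key."""
    missing = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record; ignored in this sketch
        # INFO is the 8th column: semicolon-separated KEY=VALUE entries or flags.
        keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        if "CSQ" not in keys:
            missing.append((fields[0], int(fields[1]), fields[3], fields[4]))
    return missing


# Illustrative data only: the first record mimics the unannotated output
# described above, the second a record carrying a (shortened) CSQ entry.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t16903882\t.\tT\tC\t.\t.\t.",
    "chr1\t148932885\t.\tC\tT\t.\t.\tCSQ=T|intergenic_variant|MODIFIER",
]
unannotated = records_missing_csq(example)
print(unannotated)
```

On a real file the same function can be given an open file handle, for example open("snvs_annotated.vcf").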

I have reproduced this with the online VEP interface (http://grch37.ensembl.org/Homo_sapiens/Tools/VEP ; select “RefSeq transcripts” and set “Get regulatory region consequences” to “No”).  

Comparing the positions in EnsEMBL’s genome browser and UCSC’s, these regions appear to be intergenic in EnsEMBL, while in UCSC they hit RefSeq transcripts: a pseudogene (LOC645166) and a protein-coding gene (NBPF1) respectively. If I add the “--use_given_ref” flag, vep reports the same transcripts as UCSC. However, this raises two questions:  

1. Is adding the “--use_given_ref” flag the right thing to do? VEP reports mismatches (“rseq_mrna_nonmatch&rseq_5p_mismatch&rseq_cds_mismatch&rseq_3p_mismatch&rseq_ens_no_match”), which suggests that this is not the best mapping for the transcript, which would explain why EnsEMBL’s alignment is different.  

2. If I don’t add the “--use_given_ref” flag, why isn’t VEP reporting these variants as intergenic?  

Kind regards,  
Luke  

This e-mail message contains confidential information intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, please do not disseminate, distribute or copy this communication, by e-mail or otherwise. Instead, please notify us immediately by return e-mail and then delete and discard all copies of the e-mail. We have taken all reasonable precautions to check this e-mail and any attachments for viruses, but we cannot accept any liability for any damage sustained as a result of any virus, worm or other malicious software. Achilles Therapeutics Limited (10167668) is registered in England and Wales. The registered office is at 215 Euston Road, London, NW1 2BE, UK.  
_______________________________________________  
Dev mailing list Dev at ensembl.org  
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev  
Ensembl Blog: http://www.ensembl.info/  


