[ensembl-dev] VEP 84 - question about output from flag_pick/pick_order

Black-Ziegelbein, Elizabeth A elizabeth-black at uiowa.edu
Mon Apr 4 15:56:47 BST 2016


Thanks so much Will!  That helps explain it & I will try your suggestions.

Take care,

Ann
Ann Black-Ziegelbein
Senior Application Developer
Molecular Otolaryngology and Renal Research Laboratories
University of Iowa

From: <wmclaren at gmail.com> on behalf of Will McLaren <wm2 at ebi.ac.uk>
Date: Monday, April 4, 2016 at 6:02 AM
To: Ann Black-Ziegelbein <elizabeth-black at uiowa.edu>
Cc: "dev at ensembl.org" <dev at ensembl.org>
Subject: Re: VEP 84 - question about output from flag_pick/pick_order

Hi Ann,

I notice you are using the merged cache. This contains a merge of two gene sets, the one from Ensembl and the one from RefSeq. Both of these sets have, per gene, a canonical transcript assigned.

The VEP has no way to determine which of these you would prefer to see annotated, so the canonical transcripts from the two sets are considered equal. This means the next comparator (rank, in your case) is used to split them, and since these will likely be equal too, a random one is chosen.

I'd suggest either using a non-merged cache (choose either Ensembl or RefSeq, or perhaps run both independently?), or adding some other comparators to your --pick_order flag to help distinguish between them.
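
To illustrate the tie described above, here is a minimal Python sketch of comparator-by-comparator narrowing. It is not the VEP's actual implementation: the function name, the canonical flags, the rank values and the lengths are all hypothetical, and "length" stands in for whichever extra comparator you might add.

```python
from typing import Any, Callable, Dict, List

Transcript = Dict[str, Any]

# Each comparator maps a transcript record to a score; lower is better.
# These scoring rules are illustrative assumptions, not VEP internals.
COMPARATORS: Dict[str, Callable[[Transcript], int]] = {
    "canonical": lambda t: 0 if t["canonical"] else 1,
    "rank": lambda t: t["rank"],        # lower = more severe consequence
    "length": lambda t: -t["length"],   # longer transcript preferred
}


def pick(transcripts: List[Transcript], pick_order: List[str]) -> List[Transcript]:
    """Narrow the candidates one comparator at a time; return all survivors."""
    remaining = list(transcripts)
    for name in pick_order:
        score = COMPARATORS[name]
        best = min(score(t) for t in remaining)
        remaining = [t for t in remaining if score(t) == best]
        if len(remaining) == 1:
            break
    return remaining


# Hypothetical records: one canonical transcript from each gene set.
candidates = [
    {"id": "NM_022124.5", "canonical": True, "rank": 1, "length": 11000},
    {"id": "ENST00000398788", "canonical": True, "rank": 1, "length": 4000},
    {"id": "NM_001171933.1", "canonical": False, "rank": 1, "length": 4000},
]

tied = pick(candidates, ["canonical", "rank"])
resolved = pick(candidates, ["canonical", "rank", "length"])
print([t["id"] for t in tied])      # two survivors: the choice between them is arbitrary
print([t["id"] for t in resolved])  # one survivor once an extra comparator is added
```

With only canonical and rank, both canonical transcripts survive, so whichever one gets reported is arbitrary; a further comparator that actually differs between them removes the ambiguity.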

Hope that helps.

Will McLaren
Ensembl Variation

On 1 April 2016 at 16:41, Black-Ziegelbein, Elizabeth A <elizabeth-black at uiowa.edu> wrote:
Good morning,

I am using a local install of VEP 84.  We are leveraging the --flag_pick_allele and --pick_order options.

This is an example of how we are running VEP:


perl variant_effect_predictor.pl --offline --flag_pick_allele --pick_order canonical,rank --merged --dir_cache variant_effect_predictor/cache-dir -i CDH23.1kg.phase3.v5a.EUR.NO-GT.SPLIT-LFT_ALGN.vcf.gz --plugin CADD,whole_genome_SNVs.tsv.gz,InDels.tsv.gz --vcf -o CDH23.1kg.phase3.v5a.EUR.NO-GT.SPLIT-LFT_ALGN.VEP-CADD.vcf --stats_file CDH23.1kg.phase3.v5a.EUR.NO-GT.SPLIT-LFT_ALGN.VEP-CADD.html --force_overwrite



I noticed that in annotating some of the variants, it does not seem to select the transcript using my pick order as I would expect.  I am assuming that the canonical transcript is defined by: http://www.ensembl.org/Help/Glossary?id=346

Example Variants:


10 73558128 rs41281334 G A . PASS AC=34;AN=1006

10 73558886 rs4747194 G A . PASS AC=280;AN=1006



The annotation provided for 10:73558128 (rs41281334) is as follows.  The picked transcript is NM_022124.5 (which is what I expected, since it is the canonical transcript according to the UCSC table query and had a high rank).


A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_022124.5|protein_coding|50/70||||7237|6847|2283|V/I|Gtc/Atc|||1||1||||4.949|0.225802

A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_001171934.1|protein_coding|3/22||||444|127|43|V/I|Gtc/Atc|||1||||||4.949|0.225802

A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_001171933.1|protein_coding|3/23||||444|127|43|V/I|Gtc/Atc|||1||||||4.949|0.225802

A|missense_variant|MODERATE|CDH23|ENSG00000107736|Transcript|ENST00000224721|protein_coding|49/69||||6867|6862|2288|V/I|Gtc/Atc|||1|||HGNC|13733||4.949|0.225802

A|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|CDH23|ENSG00000107736|Transcript|ENST00000475158|processed_transcript|2/21||||383|||||||1|||HGNC|13733||4.949|0.225802

A|missense_variant|MODERATE|CDH23|ENSG00000107736|Transcript|ENST00000398788|protein_coding|3/23||||444|127|43|V/I|Gtc/Atc|||1|||HGNC|13733||4.949|0.225802

The annotation provided for 10:73558886 (rs4747194) is as follows.  The picked transcript is ENST00000398788.  QUESTION: Why was it not the canonical transcript NM_022124.5, which has the same rank?


A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_001171934.1|protein_coding|4/22||||670|353|118|R/Q|cGg/cAg|||1||||||21.7|2.866040

A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_001171933.1|protein_coding|4/23||||670|353|118|R/Q|cGg/cAg|||1||||||21.7|2.866040

A|missense_variant|MODERATE|CDH23|ENSG00000107736|Transcript|ENST00000224721|protein_coding|50/69||||7093|7088|2363|R/Q|cGg/cAg|||1|||HGNC|13733||21.7|2.866040

A|missense_variant|MODERATE|CDH23|ENSG00000107736|Transcript|ENST00000398788|protein_coding|4/23||||670|353|118|R/Q|cGg/cAg|||1||1|HGNC|13733||21.7|2.866040

A|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|CDH23|ENSG00000107736|Transcript|ENST00000475158|processed_transcript|3/21||||609|||||||1|||HGNC|13733||21.7|2.866040

A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_022124.5|protein_coding|51/70||||7463|7073|2358|R/Q|cGg/cAg|||1||||||21.7|2.866040
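
The picked entry in blocks like the one above can also be found programmatically. The following is a minimal Python sketch, not part of the VEP: the function name is hypothetical, and the column positions (feature ID at index 6, pick flag at index 21, zero-based) are assumptions inferred from the example lines. The authoritative field order is the "Format:" description in the CSQ header line of the output VCF.

```python
from typing import List

FEATURE_INDEX = 6   # assumption: transcript ID column, inferred from the examples
PICK_INDEX = 21     # assumption: pick flag column, inferred from the examples


def picked_features(entries: List[str],
                    pick_index: int = PICK_INDEX,
                    feature_index: int = FEATURE_INDEX) -> List[str]:
    """Return the feature IDs of the entries whose pick flag is set to 1."""
    picked = []
    for entry in entries:
        fields = entry.split("|")
        if len(fields) > pick_index and fields[pick_index] == "1":
            picked.append(fields[feature_index])
    return picked


# Two of the entries reported for 10:73558886, copied from the output above.
entries = [
    "A|missense_variant|MODERATE|CDH23|ENSG00000107736|Transcript|ENST00000398788|protein_coding|4/23||||670|353|118|R/Q|cGg/cAg|||1||1|HGNC|13733||21.7|2.866040",
    "A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_022124.5|protein_coding|51/70||||7463|7073|2358|R/Q|cGg/cAg|||1||||||21.7|2.866040",
]

print(picked_features(entries))  # ['ENST00000398788']
```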



Thanks so much for your help.  Please let me know if I need to post to an alternate forum.


Ann



Ann Black-Ziegelbein
Senior Application Developer
Molecular Otolaryngology and Renal Research Laboratories
University of Iowa
