[ensembl-dev] VEP 84 - question about output from flag_pick/pick_order

Will McLaren wm2 at ebi.ac.uk
Mon Apr 4 12:02:35 BST 2016


Hi Ann,

I notice you are using the merged cache. This contains a merge of two gene
sets, the one from Ensembl and the one from RefSeq. Both of these sets
have, per gene, a canonical transcript assigned.

The VEP has no way to determine which of these you would prefer to see
annotated, so the canonical transcripts from the two sets are considered
equal. The next comparator in your pick order (rank in your case) is then
used to split them, and since the ranks will likely be equal too, one is
chosen at random.
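
To make the tie concrete, here is a rough Python sketch of a comparator
chain. It is a simplified model for illustration only, not the VEP's actual
Perl code. The Candidate class, the SCORERS table and the
surviving_candidates function are hypothetical names, the numeric rank is a
placeholder (equal because every candidate here is a missense_variant), and
the canonical flags are assumptions consistent with the picks reported in
your message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    transcript_id: str
    canonical: bool        # assumed flag, not read from a real cache
    consequence_rank: int  # placeholder value; lower is preferred


# One scoring function per comparator name; a lower score is preferred.
SCORERS = {
    "canonical": lambda c: 0 if c.canonical else 1,
    "rank": lambda c: c.consequence_rank,
}


def surviving_candidates(candidates, pick_order):
    """Apply each comparator in turn, keeping only the best-scoring candidates."""
    remaining = list(candidates)
    for name in pick_order:
        if len(remaining) == 1:
            break
        best = min(SCORERS[name](c) for c in remaining)
        remaining = [c for c in remaining if SCORERS[name](c) == best]
    return remaining


# Merged cache: one canonical transcript from each gene set (assumed flags).
candidates = [
    Candidate("NM_022124.5", canonical=True, consequence_rank=1),
    Candidate("NM_001171934.1", canonical=False, consequence_rank=1),
    Candidate("ENST00000224721", canonical=False, consequence_rank=1),
    Candidate("ENST00000398788", canonical=True, consequence_rank=1),
]

tied = surviving_candidates(candidates, ["canonical", "rank"])
print([c.transcript_id for c in tied])
# → ['NM_022124.5', 'ENST00000398788']
```

Two candidates survive both comparators, so nothing in the pick order
separates them and the final choice between them is arbitrary.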

I'd suggest either using a non-merged cache (choose Ensembl or RefSeq, or
perhaps run both independently), or adding other comparators to your
--pick_order flag to help distinguish them.
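
As a rough illustration of why an extra comparator helps, the Python sketch
below models the pick order as a tuple sort key. It is not the VEP's own
code; sort_key and best_candidates are hypothetical helpers, the two
candidate IDs are placeholders, and the length values are made up. It
assumes a "length" comparator that prefers longer transcripts, so please
check the documentation for your release for the comparator names that
--pick_order accepts.

```python
# Two canonical transcripts, one per gene set, with the same consequence rank.
# The IDs and the "length" values are made-up placeholders, not real data.
tied = [
    {"id": "refseq_canonical", "canonical": True, "rank": 1, "length": 1200},
    {"id": "ensembl_canonical", "canonical": True, "rank": 1, "length": 1000},
]


def sort_key(candidate, pick_order):
    """Build a tuple key from the comparators; smaller tuples are preferred."""
    key = []
    for name in pick_order:
        if name == "canonical":
            key.append(0 if candidate["canonical"] else 1)
        elif name == "rank":
            key.append(candidate["rank"])
        elif name == "length":
            key.append(-candidate["length"])  # assumption: longer is preferred
        else:
            raise ValueError(f"unknown comparator: {name}")
    return tuple(key)


def best_candidates(candidates, pick_order):
    """Return the IDs of every candidate that shares the best key."""
    keys = [sort_key(c, pick_order) for c in candidates]
    best = min(keys)
    return [c["id"] for c, k in zip(candidates, keys) if k == best]


print(best_candidates(tied, ["canonical", "rank"]))
# → ['refseq_canonical', 'ensembl_canonical']
print(best_candidates(tied, ["canonical", "rank", "length"]))
# → ['refseq_canonical']
```

With only canonical and rank, both candidates share the best key; once a
third comparator that differs between them is added, a single candidate
remains and the pick becomes deterministic.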

Hope that helps.

Will McLaren
Ensembl Variation

On 1 April 2016 at 16:41, Black-Ziegelbein, Elizabeth A <
elizabeth-black at uiowa.edu> wrote:

> Good morning,
>
> I am using a local install of VEP 84.  We are leveraging the
> --flag_pick_allele and --pick_order options.
>
> This is an example of how we are running VEP:
>
> perl variant_effect_predictor.pl --offline --flag_pick_allele -pick_order
> canonical,rank  --merged --dir_cache variant_effect_predictor/cache-dir
> -i CDH23.1kg.phase3.v5a.EUR.NO-GT.SPLIT-LFT_ALGN.vcf.gz --plugin
> CADD,whole_genome_SNVs.tsv.gz,InDels.tsv.gz --vcf -o
> CDH23.1kg.phase3.v5a.EUR.NO-GT.SPLIT-LFT_ALGN.VEP-CADD.vcf --stats_file
> CDH23.1kg.phase3.v5a.EUR.NO-GT.SPLIT-LFT_ALGN.VEP-CADD.html --force_overwrite
>
>
>
> I noticed that in annotating some of the variants, it does not seem to
> select the transcript using my pick order as I would expect.  I am
> assuming that the canonical transcript is defined by:
> http://www.ensembl.org/Help/Glossary?id=346
>
> Example Variants:
>
> 10 73558128 rs41281334 G A . PASS AC=34;AN=1006
>
> 10 73558886 rs4747194 G A . PASS AC=280;AN=1006
>
>
> The annotation provided for 10:73558128 (rs41281334) is as follows.  The
> picked transcript is NM_022124.5 (which is what I expected since it is
> the canonical transcript according to the UCSC table query, and had high
> rank)
>
>
>
> A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_022124.5|protein_coding|50/70||||7237|6847|2283|V/I|Gtc/Atc|||1||1||||4.949|0.225802
>
>
> A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_001171934.1|protein_coding|3/22||||444|127|43|V/I|Gtc/Atc|||1||||||4.949|0.225802
>
>
> A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_001171933.1|protein_coding|3/23||||444|127|43|V/I|Gtc/Atc|||1||||||4.949|0.225802
>
>
> A|missense_variant|MODERATE|CDH23|ENSG00000107736|Transcript|ENST00000224721|protein_coding|49/69||||6867|6862|2288|V/I|Gtc/Atc|||1|||HGNC|13733||4.949|0.225802
>
>
> A|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|CDH23|ENSG00000107736|Transcript|ENST00000475158|processed_transcript|2/21||||383|||||||1|||HGNC|13733||4.949|0.225802
>
>
> A|missense_variant|MODERATE|CDH23|ENSG00000107736|Transcript|ENST00000398788|protein_coding|3/23||||444|127|43|V/I|Gtc/Atc|||1|||HGNC|13733||4.949|0.225802
>
> The annotation provided for 10:73558886 (rs4747194) is as follows.  The
> picked transcript is ENST00000398788.  QUESTION: Why was it not the
> canonical transcript NM_022124.5, which has the same rank?
>
>
> A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_001171934.1|protein_coding|4/22||||670|353|118|R/Q|cGg/cAg|||1||||||21.7|2.866040
>
>
> A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_001171933.1|protein_coding|4/23||||670|353|118|R/Q|cGg/cAg|||1||||||21.7|2.866040
>
>
> A|missense_variant|MODERATE|CDH23|ENSG00000107736|Transcript|ENST00000224721|protein_coding|50/69||||7093|7088|2363|R/Q|cGg/cAg|||1|||HGNC|13733||21.7|2.866040
>
>
> A|missense_variant|MODERATE|CDH23|ENSG00000107736|Transcript|ENST00000398788|protein_coding|4/23||||670|353|118|R/Q|cGg/cAg|||1||1|HGNC|13733||21.7|2.866040
>
>
> A|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|CDH23|ENSG00000107736|Transcript|ENST00000475158|processed_transcript|3/21||||609|||||||1|||HGNC|13733||21.7|2.866040
>
>
> A|missense_variant|MODERATE|CDH23|64072|Transcript|NM_022124.5|protein_coding|51/70||||7463|7073|2358|R/Q|cGg/cAg|||1||||||21.7|2.866040
>
>
> Thanks so much for your help.  Please let me know if I need to post to an
> alternate forum.
>
>
> Ann
>
>
>
> Ann Black-Ziegelbein
> Senior Application Developer
> Molecular Otolaryngology and Renal Research Laboratories
> University of Iowa
>