[ensembl-dev] VEP "--pick_order" issue

Will McLaren wm2 at ebi.ac.uk
Fri Aug 12 16:40:43 BST 2016


Thanks so much for this thorough investigation Anthony, you are correct and
the regex pattern is wrong.
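
For anyone following along, here is a minimal sketch of why the original
pattern misbehaves. It is not the VEP code itself: appris_rank is a
hypothetical helper that mirrors the logic of the VEP.pm block quoted below,
with the regex passed in so that both patterns can be compared on the values
"principal3" and "alternative2".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of VEP): turn an APPRIS attribute value such
# as "principal3" or "alternative2" into a rank, lower being better, using
# the same steps as the quoted VEP.pm block but a caller-supplied regex.
sub appris_rank {
    my ($value, $pattern) = @_;
    my $rank = 100;    # default value reported in the thread
    if ($value =~ $pattern) {
        my ($type, $grade) = ($1, $2);
        # alternative isoforms should rank below every principal isoform
        $grade += 10 if substr($type, 0, 1) eq 'a';
        $rank = $grade if $grade;
    }
    return $rank;
}

my $old = qr/([A-Za-z])(\d+)/;     # original pattern: captures a single letter
my $new = qr/([A-Za-z]+)(\d+)/;    # pattern proposed in the thread

for my $value (qw(principal3 alternative2)) {
    printf "%-13s old=%d new=%d\n", $value,
        appris_rank($value, $old), appris_rank($value, $new);
}
```

With the old pattern "alternative2" gets rank 2 and so beats "principal3"
(rank 3); with the new pattern it gets rank 12 and the principal isoform wins.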

I've patched a fix to the ensembl-variation GitHub repo, you can re-run
INSTALL.pl to pick up the latest code, or git pull in ensembl-variation if
you used git to set up your API.

Regards

Will

On 12 August 2016 at 09:29, FERRARI Anthony <
anthony.ferrari at lyon.unicancer.fr> wrote:

>
>
> OK, thank you Will. I have now run the same small example with GRCh38.
> For the first position, here is what I obtain (I removed some columns for
> readability):
>
> cache = homo_sapiens/85_GRCh38
> position (GRCh38) = 3:167266887
> --pick_order = appris,tsl,ccds,biotype
>
> Allele | SYMBOL | Gene | Feature | HGVSc | PICK | TSL | APPRIS | RefSeq
> A|ZBBX|ENSG00000169064|ENST00000307529|ENST00000307529.9:c.2254+15351C>T|1|1|A2|
> A|ZBBX|ENSG00000169064|ENST00000392764|ENST00000392764.5:c.2050+15351C>T||5|A2|NM_001199202.1
> A|ZBBX|ENSG00000169064|ENST00000392766|ENST00000392766.6:c.2137+15351C>T||2|P3|NM_024687.3||||||
> A|ZBBX|ENSG00000169064|ENST00000392767|ENST00000392767.6:c.2050+15351C>T||1|A2|
> A|ZBBX|ENSG00000169064|ENST00000455345|ENST00000455345.6:c.2254+15351C>T||1|A2|NM_001199201.1
> A|ZBBX|ENSG00000169064|ENST00000464922|ENST00000464922.5:c.85-14676C>T||3||
> A|ZBBX|ENSG00000169064|ENST00000465071|ENST00000465071.1:n.335+15351C>T||2||
> A|ZBBX|ENSG00000169064|ENST00000492642|ENST00000492642.5:c.222+15351C>T||5||
> A|ZBBX|ENSG00000169064|ENST00000494898|ENST00000494898.5:c.*70-14676C>T||2||
>
> So the chosen transcript is the first one, which is an APPRIS
> "alternative2". The third transcript is a "principal3"
> and (if I have correctly understood the APPRIS system) should be chosen.
>
> In VEP.pm, there is this block :
>
>       if(my ($appris) = @{$tr->get_all_Attributes('appris')}) {
>         if($appris->value =~ m/([A-Za-z])(\d+)/) {
>           my ($type, $grade) = ($1, $2);
>           # values are principal1, principal2, ..., alternative1, alternative2
>           # so add 10 to grade if alternate
>           $grade += 10 if substr($type, 0, 1) eq 'a';
>           $info->{appris} = $grade if $grade;
>         }
>       }
>     }
>
> The regex pattern $appris->value =~ m/([A-Za-z])(\d+)/ should be changed
> to $appris->value =~ m/([A-Za-z]+)(\d+)/.
> As written, it captures only the last letter of the word 'alternative' or
> 'principal', so $type is never equal to 'a' and 10 is never added to
> $grade for 'alternative' values. As a result, the selected transcript is
> the one with the smallest number, regardless of whether it is 'principal'
> or 'alternative'.
>
>
> Best wishes,
> Anthony
>
>
>
>
>
>
> On 11 Aug 2016, at 19:45, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
> Apologies, I should have been clearer, Appris is available only for
> Ensembl transcripts on GRCh38. It is not available for RefSeq transcripts
> on any assembly.
>
> Regards
>
> Will
>
> On 11 Aug 2016 17:28, "FERRARI Anthony" <anthony.ferrari at lyon.unicancer.fr>
> wrote:
>
>>
>> I am afraid this might not be the only problem. I have now installed
>> "homo_sapiens_refseq/85_GRCh38"
>> and run :
>>
>>
>> /data-ddn/software/VEP/ensembl-tools-release-85/scripts/variant_effect_predictor/variant_effect_predictor.pl \
>> --force_overwrite \
>> --refseq \
>> --fork 4 \
>> --buffer_size 50000 \
>> --dir ensembl-tools-release-85/scripts/variant_effect_predictor/cache \
>> --cache \
>> --offline \
>> --no_stats \
>> --species homo_sapiens \
>> --assembly GRCh38 \
>> --fasta /references/human_g1k_v38.fasta \
>> --variant_class \
>> --canonical \
>> --polyphen b --sift b \
>> --total_length \
>> --numbers \
>> --hgvs \
>> --appris \
>> --protein \
>> --symbol \
>> --biotype \
>> --check_existing \
>> --pick_order refseq,appris,tsl,ccds,biotype \
>> --flag_pick \
>> --format vcf \
>> --input_file input.vcf \
>> --vcf \
>> --output_file out.vcf
>>
>>
>> The APPRIS data is still missing/not used.
>> I have attached the sample VCFs to reproduce the test. There are only 3
>> lines.
>>
>> For instance in the first line (gene ZBBX), the annotation block selected
>> is the one for NM_001199201.1
>> whereas this should be the one for NM_024687.3 if we refer to this
>> webpage : http://appris.bioinfo.cnio.es/#/database/id/homo_sapiens/79740?as=hg38&sc=refseq
>>
>> Moreover, the --appris flag produces no data in the VCF.
>>
>>
>> Best regards,
>> Anthony
>>
>>
>>
>> On 11 Aug 2016, at 17:33, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>
>> Hi Anthony,
>>
>> APPRIS is not available on GRCh37, I'm afraid, only GRCh38 for human.
>>
>> Regards
>>
>> Will McLaren
>> Ensembl Variation
>>
>> On 11 August 2016 at 16:20, FERRARI Anthony <
>> anthony.ferrari at lyon.unicancer.fr> wrote:
>>
>>
>> Hi,
>>
>> I am using the VEP script to annotate whole-genome SNVs. I have just
>> installed the version 85 with
>> the INSTALL.pl script and built the cache for refseq/GRCh37
>> (homo_sapiens_refseq/85_GRCh37).
>>
>> I use the following command to launch the analysis :
>>
>> variant_effect_predictor/variant_effect_predictor.pl \
>> --refseq \
>> --fork 4 \
>> --buffer_size 50000 \
>> --dir variant_effect_predictor/cache \
>> --cache \
>> --offline \
>> --no_stats \
>> --fasta /references/human_g1k_v37.fasta \
>> --variant_class \
>> --canonical \
>> --polyphen b --sift b \
>> --total_length \
>> --numbers \
>> --hgvs \
>> --appris \
>> --protein \
>> --symbol \
>> --biotype \
>> --check_existing \
>> --pick_order refseq,appris,tsl,ccds,biotype \
>> --flag_pick \
>> --format vcf \
>> --input_file ${INPUT} \
>> --vcf \
>> --output_file ${OUTPUT}
>>
>>
>> So basically I would like to flag, whenever possible, the annotation
>> block with the APPRIS principal
>> isoform. In the VEP.pm module, I have inserted a few "print" statements in
>> the "pick_worst_vfoa"
>> function. It looks like I never get into this if block (line 2291) :
>>
>>      if(my ($appris) = @{$tr->get_all_Attributes('appris')}) {
>> ...
>>      }
>>
>> and the $info->{appris} is always “100”, its default value.
>>
>> APPRIS does not have any influence on the flag_pick process.
>> Do I need to install some other DB/file for VEP to be able to get to the
>> APPRIS info ?
>> (Is it an --offline or GRCh37 thing ?)
>>
>>
>> Many thanks for your help,
>> Anthony
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/