[ensembl-dev] VEP "--pick_order" issue

FERRARI Anthony anthony.ferrari at lyon.unicancer.fr
Fri Aug 12 09:29:53 BST 2016



OK, thank you Will. I have now run the same small example with GRCh38.
For the first position, here is what I obtain (I removed some columns for readability):

cache = homo_sapiens/85_GRCh38
position (GRCh38) = 3:167266887
--pick_order = appris,tsl,ccds,biotype

Allele | SYMBOL | Gene | Feature | HGVSc | PICK | TSL | APPRIS | RefSeq
A|ZBBX|ENSG00000169064|ENST00000307529|ENST00000307529.9:c.2254+15351C>T|1|1|A2|
A|ZBBX|ENSG00000169064|ENST00000392764|ENST00000392764.5:c.2050+15351C>T||5|A2|NM_001199202.1
A|ZBBX|ENSG00000169064|ENST00000392766|ENST00000392766.6:c.2137+15351C>T||2|P3|NM_024687.3
A|ZBBX|ENSG00000169064|ENST00000392767|ENST00000392767.6:c.2050+15351C>T||1|A2|
A|ZBBX|ENSG00000169064|ENST00000455345|ENST00000455345.6:c.2254+15351C>T||1|A2|NM_001199201.1
A|ZBBX|ENSG00000169064|ENST00000464922|ENST00000464922.5:c.85-14676C>T||3||
A|ZBBX|ENSG00000169064|ENST00000465071|ENST00000465071.1:n.335+15351C>T||2||
A|ZBBX|ENSG00000169064|ENST00000492642|ENST00000492642.5:c.222+15351C>T||5||
A|ZBBX|ENSG00000169064|ENST00000494898|ENST00000494898.5:c.*70-14676C>T||2||

So the chosen transcript is the first one, which is an APPRIS "alternative2". The third transcript is a "principal3"
and (if I have understood the APPRIS system correctly) should be chosen instead.

In VEP.pm, there is this block:

      if(my ($appris) = @{$tr->get_all_Attributes('appris')}) {
        if($appris->value =~ m/([A-Za-z])(\d+)/) {
          my ($type, $grade) = ($1, $2);
          # values are principal1, principal2, ..., alternative1, alternative2
          # so add 10 to grade if alternate
          $grade += 10 if substr($type, 0, 1) eq 'a';
          $info->{appris} = $grade if $grade;
        }
      }
    }

The regex pattern $appris->value =~ m/([A-Za-z])(\d+)/ should be changed to $appris->value =~ m/([A-Za-z]+)(\d+)/.
As written, it captures only the last letter of the word 'alternative' or 'principal' ('e' or 'l'), so substr($type, 0, 1)
is never equal to 'a' and $grade is never increased for 'alternative' values. In the end, the selected transcript is the one
with the smallest number, regardless of whether it is 'principal' or 'alternative'.
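
To make the effect concrete, here is a minimal sketch in Python rather than Perl (this pattern behaves the same way in both). It is not VEP's code: appris_grade and pick are hypothetical helpers, the attribute values "alternative2" and "principal3" are assumed from the A2/P3 codes in the table above, and the selection is reduced to "lowest (appris, tsl) wins, first entry on ties", which is only a simplified stand-in for --pick_order appris,tsl.

```python
import re

BUGGY = re.compile(r"([A-Za-z])(\d+)")   # captures only the letter next to the digits
FIXED = re.compile(r"([A-Za-z]+)(\d+)")  # captures the whole word before the digits

DEFAULT_GRADE = 100  # default value reported for $info->{appris} in this thread


def appris_grade(value, pattern):
    """Mirror the quoted VEP.pm block: lower is better, alternatives get +10."""
    match = pattern.search(value)
    if not match:
        return DEFAULT_GRADE
    kind, grade = match.group(1), int(match.group(2))
    if kind[0] == "a":  # same test as substr($type, 0, 1) eq 'a'
        grade += 10
    return grade if grade else DEFAULT_GRADE


# (transcript, assumed APPRIS attribute value, TSL) from the table above
transcripts = [
    ("ENST00000307529", "alternative2", 1),
    ("ENST00000392764", "alternative2", 5),
    ("ENST00000392766", "principal3", 2),
    ("ENST00000392767", "alternative2", 1),
    ("ENST00000455345", "alternative2", 1),
    ("ENST00000464922", "", 3),
]


def pick(pattern):
    # Simplified stand-in for the pick step: lowest (appris, tsl) tuple wins.
    return min(transcripts, key=lambda t: (appris_grade(t[1], pattern), t[2]))[0]


print(pick(BUGGY))  # ENST00000307529 (alternative2 graded 2, beats principal3 graded 3)
print(pick(FIXED))  # ENST00000392766 (alternative2 graded 12, principal3 graded 3)
```

With the one-letter capture, "alternative2" is graded 2 and outranks "principal3"; with the corrected pattern it is graded 12 and the principal isoform wins, which matches the behaviour described above.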


Best wishes,
Anthony






On 11 Aug 2016, at 19:45, Will McLaren <wm2 at ebi.ac.uk> wrote:


Apologies, I should have been clearer: APPRIS is available only for Ensembl transcripts on GRCh38. It is not available for RefSeq transcripts on any assembly.

Regards

Will

On 11 Aug 2016 17:28, "FERRARI Anthony" <anthony.ferrari at lyon.unicancer.fr> wrote:

I am afraid this might not be the only problem. I have now installed "homo_sapiens_refseq/85_GRCh38"
and run:


/data-ddn/software/VEP/ensembl-tools-release-85/scripts/variant_effect_predictor/variant_effect_predictor.pl \
--force_overwrite \
--refseq \
--fork 4 \
--buffer_size 50000 \
--dir ensembl-tools-release-85/scripts/variant_effect_predictor/cache \
--cache \
--offline \
--no_stats \
--species homo_sapiens \
--assembly GRCh38 \
--fasta /references/human_g1k_v38.fasta \
--variant_class \
--canonical \
--polyphen b --sift b \
--total_length \
--numbers \
--hgvs \
--appris \
--protein \
--symbol \
--biotype \
--check_existing \
--pick_order refseq,appris,tsl,ccds,biotype \
--flag_pick \
--format vcf \
--input_file input.vcf \
--vcf \
--output_file out.vcf


The APPRIS data is still missing or not used.
I have attached the sample VCFs to reproduce the test. There are only 3 lines.

For instance, in the first line (gene ZBBX), the annotation block selected is the one for NM_001199201.1,
whereas it should be the one for NM_024687.3 according to this webpage: http://appris.bioinfo.cnio.es/#/database/id/homo_sapiens/79740?as=hg38&sc=refseq

Moreover, the --appris flag produces no data in the VCF.


Best regards,
Anthony



On 11 Aug 2016, at 17:33, Will McLaren <wm2 at ebi.ac.uk> wrote:

Hi Anthony,

APPRIS is not available on GRCh37, I'm afraid, only GRCh38 for human.

Regards

Will McLaren
Ensembl Variation

On 11 August 2016 at 16:20, FERRARI Anthony <anthony.ferrari at lyon.unicancer.fr> wrote:

Hi,

I am using the VEP script to annotate whole-genome SNVs. I have just installed version 85 with
the INSTALL.pl script and built the cache for refseq/GRCh37 (homo_sapiens_refseq/85_GRCh37).

I use the following command to launch the analysis:

variant_effect_predictor/variant_effect_predictor.pl \
--refseq \
--fork 4 \
--buffer_size 50000 \
--dir variant_effect_predictor/cache \
--cache \
--offline \
--no_stats \
--fasta /references/human_g1k_v37.fasta \
--variant_class \
--canonical \
--polyphen b --sift b \
--total_length \
--numbers \
--hgvs \
--appris \
--protein \
--symbol \
--biotype \
--check_existing \
--pick_order refseq,appris,tsl,ccds,biotype \
--flag_pick \
--format vcf \
--input_file ${INPUT} \
--vcf \
--output_file ${OUTPUT}


So basically I would like to flag, whenever possible, the annotation block with the APPRIS principal
isoform. In the VEP.pm module, I have inserted a few print statements in the "pick_worst_vfoa"
function. It looks like I never get into this if block (line 2291):

     if(my ($appris) = @{$tr->get_all_Attributes('appris')}) {
...
     }

and $info->{appris} is always "100", its default value.

So APPRIS does not have any influence on the flag_pick process.
Do I need to install some other DB/file for VEP to be able to access the APPRIS info?
(Is it an --offline or GRCh37 thing?)


Many thanks for your help,
Anthony
_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/
