[ensembl-dev] VEP, all columns from dbNSFP are empty

Will McLaren wm2 at ebi.ac.uk
Tue Jan 12 16:48:22 GMT 2016


Hi Jean-Philippe,

The dbNSFP plugin only reports data from a dbNSFP line if the
Ensembl_transcriptid field on that line matches the ID of the transcript
whose consequence it is analysing. You are using the RefSeq transcript cache
(--refseq), so the transcript IDs never match those on the dbNSFP lines, and
you get no output.
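
To illustrate, here is a rough shell sketch of the same kind of test. The
plugin itself does this in Perl; the function name and the transcript IDs
below are made up for the example, and only the pattern mirrors the plugin's
regular expression.

```shell
# Sketch only: matches_transcript is a hypothetical helper, not part of VEP.
# $1: Ensembl_transcriptid value from a dbNSFP line (IDs separated by ';')
# $2: transcript ID of the consequence being analysed
matches_transcript() {
  # Same idea as /$tr_id($|;)/ in the plugin: the ID must be followed
  # by ';' or by the end of the field.
  printf '%s\n' "$1" | grep -Eq "$2(\$|;)"
}

# Made-up IDs, for illustration only
field="ENST00000111111;ENST00000222222"

matches_transcript "$field" "ENST00000222222" && echo "Ensembl ID: match"
matches_transcript "$field" "NM_000000" || echo "RefSeq ID: no match"
```

With a RefSeq-style ID the test can never succeed against a field that only
holds Ensembl IDs, which is why every dbNSFP line is skipped.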

You can try modifying the plugin to skip the check on transcript ID; change
the following lines (267-271) of ${VEP_PLUGINS_DIR}/dbNSFP.pm from:

    next unless
      defined($tmp_data->{alt}) &&
      $tmp_data->{alt} eq $allele &&
      defined($tmp_data->{Ensembl_transcriptid}) &&
      $tmp_data->{Ensembl_transcriptid} =~ /$tr_id($|;)/;

to:

    next unless
      defined($tmp_data->{alt}) &&
      $tmp_data->{alt} eq $allele;
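
The line numbers may be different in your copy of the plugin. Assuming the
check still refers to Ensembl_transcriptid, something like the following
should show where it is (find_transcript_check is just a name I made up for
the example):

```shell
# Hypothetical helper: print the lines of a plugin file that mention the
# Ensembl_transcriptid field, each prefixed with its line number.
find_transcript_check() {
  grep -n 'Ensembl_transcriptid' "$1"
}

# Usage, assuming VEP_PLUGINS_DIR is set as in your command:
# find_transcript_check "${VEP_PLUGINS_DIR}/dbNSFP.pm"
```

It is probably worth keeping a copy of the original dbNSFP.pm before editing
it.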

I will look into whether this transcript check is required at all; if not,
I will remove it in future versions of the plugin.

Regards

Will McLaren
Ensembl Variation

On 12 January 2016 at 15:34, Jean-Philippe Villemin <jpvillemin at gmail.com>
wrote:

> Hi,
>
> I'm working with VEP 83 on Mac OS X.
>
> I'm having trouble with VEP using dbNSFP 2.9 (for hg19; I downloaded it
> from this page: http://snpeff.sourceforge.net/SnpSift.html#dbNSFP).
> I'm also using the MaxEntScan and dbscSNV plugins, and those work fine.
>
> My script produces both VCF and txt output. In both, the data from dbNSFP
> are empty.
> The dbNSFP column names are present in the outputs, but the columns
> contain no values.
> VEP reports no errors when running with dbNSFP 2.9.
>
> This is roughly how I run VEP to get the txt output:
>
> -i ${INPUT_PATH}${SAMPLE_FILE}
> -o ${OUTPUT_PATH}${SAMPLE_FILE}.vep.txt
> --verbose
> --no_progress
> --cache
> --refseq
> --offline
> --fork 6
> --no_stats
> --buffer_size 10000
> --dir_plugins ${VEP_PLUGINS_DIR}
> --plugin dbNSFP,${DBNSFP_PATH_29},PROVEAN_score,SIFT_score,ExAC_NFE_AC,genename
> --plugin dbscSNV,${DBSCSNV_PATH}
> --plugin MaxEntScan,${MAXENTSCAN_DIR}
> --fasta ${FASTA_PATH}
> --assembly ${VEP_DB_VERSION}
> --dir_cache ${VEP_DB_PATH}
> --fields Uploaded_variation,Allele,Location,Consequence,IMPACT,PROVEAN_score,SIFT_score,ExAC_NFE_AC,SYMBOL,genename,Extra,MaxEntScan_ref,MaxEntScan_alt,MaxEntScan_diff,ada_score,rf_score
> --no_escape
> --force_overwrite
>
> I also checked with tabix that my dbNSFP29.gz has values for genename, for
> example.
>
> So, with:
>
> tabix dbNSFP29.gz 1:109446750-109446750
>
> I retrieve "GPSM2" for the genename column.
>
> But when I look at the VEP outputs (VCF or txt file), there is no value in
> the "Extra" column or the "genename" column.
>
> Is this a bug, or am I doing something wrong?
>
> I have attached 4 files in a zip:
>
> - COMMAND_VEP.txt : how I execute VEP.
> - LOG_VEP.txt : VEP output
> - OUTPUT_TABIX.txt : tabix output on my dbNSFP29.gz for
> 1:109446750-109446750
> - SU4184.final.vcf.vep & SU4184.final.vcf.vep.txt : output from vep.
>
> Thanks for your help,
>
>
> --
> *Jean-Philippe Villemin   *- Bioinformatics, Software Engineer -
>
> IURC (Institut Universitaire de Recherche Clinique)
> 641 avenue du Doyen Gaston Giraud
> 34093 Montpellier Cedex 5, France
>
> *jpvillemin at gmail.com <jpvillemin at gmail.com>*
>