[ensembl-dev] Loftee and VCFCols Difficulties via port 3337

Will McLaren wm2 at ebi.ac.uk
Thu Oct 22 09:44:24 BST 2015


Hi Alex,

Are you using the latest version of VEP? We have in version 82 an
alternative FASTA indexer that is more robust to index corruption issues;
it is also faster and allows you to use bgzipped FASTA files which take up
much less disk space.

http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html

Regards

Will McLaren
Ensembl Variation

On 22 October 2015 at 07:39, Alex Beesley <Alex.Beesley at telethonkids.org.au>
wrote:

> Hi Konrad
>
> Yes, I completely deleted the fasta index and let VEP recreate it (running
> to completion), but the problem persists. I even scoured my file system
> for other possible copies of the fasta and index, and specifically pointed
> the VEP script at the explicit fasta file, but the issue persists. The
> ONLY way I can generate the apparently correct LoF calls for these sites
> is to use VEP in online mode. Even weirder, I noticed that when running in
> offline mode, a small number of the sites flip between LC and HC from run
> to run!! (though the majority still remain LC and thus incorrect).
>
>
> These incorrect LC calls are essentially what I get if I try running VEP
> offline WITHOUT any fasta file. Hence, it seems as though, although VEP is
> correctly using the local fasta file (i.e. It looks for the file, detects
> it, creates the index), the LoF plugin is not recognising the local fasta
> file (i.e. The LoF output is the same as if I didn't even use a fasta
> file). To refresh, I am using a GRCh37 homo_sapiens cache (NOT
> homo_sapiens_merged and NOT homo_sapiens_refseq). And I have tried with
> both compressed and uncompressed fasta files.
>
> All ideas welcome :)
>
> Cheers
> alex
>
> On 16/10/2015 11:58 pm, "dev-bounces at ensembl.org on behalf of
> dev-request at ensembl.org" <dev-bounces at ensembl.org on behalf of
> dev-request at ensembl.org> wrote:
>
> >------------------------------
> >
> >Message: 2
> >Date: Fri, 16 Oct 2015 11:57:43 -0400
> >From: Konrad Karczewski <konradk at broadinstitute.org>
> >Subject: Re: [ensembl-dev] Loftee and VCFCols Difficulties via port
> >       3337
> >To: Ensembl developers list <dev at ensembl.org>
> >Message-ID: <D18E3FB8-4CF2-4C7E-BF6D-3328E969C72E at broadinstitute.org>
> >Content-Type: text/plain; charset="windows-1252"
> >
> >Hello!
> >
> >Right, I understand fasta is not required offline for VEP, but it is for
> >LOFTEE (we dig into it to check splice site sequence). Just let me
> >confirm: are you deleting the index, letting VEP recreate it AND letting
> >VEP complete its run? (Not just the index creation but going all the way
> >to "Finished!") This tripped me up for the longest time. One hacky
> >workaround I've found for this is to delete the index and then remove
> >write permissions on the directory the FASTA lives in, which will force
> >VEP to recreate the index every time. Not the most efficient, but I do
> >that on my end for various reasons.
> >
> >-Konrad
> >
> >>On Oct 12, 2015, at 2:03 AM, Alex Beesley
> >><Alex.Beesley at telethonkids.org.au> wrote:
> >>Hi Konrad & Will
> >>Thanks for both your comments/suggestions re the loftee plugin.
> >>Unfortunately I have been UNABLE to fix the problem by reinstalling or
> >>recreating the fasta index.
> >>In fact the LoF call problem persists if you run offline without using a
> >>fasta file at all (a fasta file is not required offline).
> >>FYI I actually had considerable trouble using VEP at all with the GRCh37
> >>fasta downloaded for ensembl-82 via the API, and in the end had to use
> >>the --NO_HTSLIB flag. With that in place
> >>VEP will run OK offline using the local fasta file but the LoF calls are
> >>still wrong (LC instead of HC). Same deal if I delete the index and let
> >>VEP recreate it.
> >>Clearly there is some difference between the databases that are
> >>downloaded
> >>via the API for GRCh37 vs what is online.
> >>This seems like it could have more serious ramifications for other uses
> >>as
> >>well (i.e. It may not just be a problem with the LoF plugin??).
> >>I've confirmed the error using both VEP78 and VEP81 with loftee too.
> >>Any suggestions welcome!
> >>Cheers
> >>Alex
> >>>------------------------------
> >>>Message: 2
> >>>Date: Fri, 09 Oct 2015 12:44:30 -0700 (PDT)
> >>>From: "Konrad Karczewski" <konradk at broadinstitute.org>
> >>>Subject: Re: [ensembl-dev] FW: Loftee and VCFCols Difficulties via
> >>>     port 3337
> >>>To: "Ensembl developers list" <dev at ensembl.org>
> >>>Message-ID: <1444419869904.9698efa2 at Nodemailer>
> >>>Content-Type: text/plain; charset="utf-8"
> >>>Hi all,
> >>>Developer of LOFTEE here - I've seen this kind of thing before (Issue
> >>>#2). The issue is actually with the FASTA index file created by
> >>>VEP/BioPerl. When you're in online mode, it's getting the right sequence
> >>>of the splice site, but when offline with a malformed index, it always
> >>>returns NN resulting in many NON_CAN_SPLICE and NON_CAN_SPLICE_SURR
> >>>annotations.
> >>>I suggest deleting
> >>>the Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.index file and
> >>>recreating it: to do this, just run VEP on a small test file. Important
> >>>note: you must let VEP run to completion, even though Checking/Creating
> >>>FASTA Index is near the beginning and it starts writing one at that
> >>>time,
> >>>it can be a corrupt index if you cancel it at that point. I typically
> >>>just annotate a single variant so it finishes quickly. Don't ask how I
> >>>figured all this out...
> >>>Hope that helps!
> >>>-Konrad
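
The index reset described above can be sketched as a small shell helper. This is only a minimal sketch, not part of VEP or LOFTEE: the function name `reset_fasta_index` and the input file `one_variant.vcf` are hypothetical, and the `.index` suffix is assumed from the file name quoted in this message.

```shell
# Hypothetical helper: delete the index that sits next to a FASTA file so
# that VEP has to rebuild it on its next run. Assumes the index is named
# "<fasta>.index", as in the file name quoted in this thread.
reset_fasta_index() {
    fasta=$1
    if [ ! -f "$fasta" ]; then
        echo "FASTA not found: $fasta" >&2
        return 1
    fi
    # -f: do not complain if the index is already gone.
    rm -f -- "${fasta}.index"
}

# Usage (not run here; one_variant.vcf is a hypothetical single-variant file).
# Let VEP run all the way to "Finished!" so the new index is written in full:
#
#   reset_fasta_index "$FASTA" &&
#   perl "${VEP}/variant_effect_predictor.pl" -i one_variant.vcf -o one_variant.vep \
#       --cache --offline --assembly GRCh37 --fasta "$FASTA" --force_overwrite
```

The helper only removes the old index. Whether the rebuilt index is sound still has to be judged from the LoF calls on a variant with a known answer.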
> >>>On Fri, Oct 9, 2015 at 4:40 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
> >>>>Hi Alex,
> >>>>Regarding issue 1, have you considered using VCF output instead of the
> >>>>default tab-delimited output?
> >>>>http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcfout
> >>>>Have you tried contacting the VAX authors? Michael Yourshaw is usually
> >>>>very
> >>>>responsive when I have communicated with him in the past.
> >>>>I'm sure you can appreciate we have to prioritise debugging and fixing
> >>>>our
> >>>>own code, but please do get back to us if you still have any
> >>>>outstanding
> >>>>issues.
> >>>>You may also like to try another available LoF plugin, LOFTEE from
> >>>>Daniel
> >>>>MacArthur's lab: https://github.com/konradjk/loftee
> >>>>Regards
> >>>>Will McLaren
> >>>>Ensembl Variation
> >>>>On 9 October 2015 at 03:22, Alex Beesley
> >>>><Alex.Beesley at telethonkids.org.au>
> >>>>wrote:
> >>>>>Dear Team
> >>>>>I am experiencing significant difficulties with both the LoF.pm and
> >>>>>VCFCols.pm plugins with VEP (FYI I am using a GRCh37 cache downloaded
> >>>>>using
> >>>>>the installer script and default settings (ensembl-tools release-82)).
> >>>>># Issue 1
> >>>>>I want to use VCFCols.pm in order to obtain the original REF and ALT
> >>>>>alleles from the VCF (to aid with interpretation of complex variants).
> >>>>>However it seems that the only way to run VCFCols.pm plugin is in the
> >>>>>online mode - if one tries to run it in offline mode (see first code
> >>>>>example below), VEP returns an error relating to
> >>>>>"$config->{ga}->fetch_by_transcript_stable_id($transcript_id)".
> >>>>>However,
> >>>>>when running online (see second code example), it is extremely slow.
> >>>>>This
> >>>>>is incredibly frustrating because I do not wish to use any of the VAX
> >>>>>functionality or its related databases, I simply wish to grab the
> >>>>>original
> >>>>>REF, ALT and other VCF column headers (including the genotypes and
> >>>>>FORMAT
> >>>>>fields) in my VEP output. Is there another way to grab the original
> >>>>>VCF
> >>>>>columns in the VEP output other than using VCFCols.pm? Or a way to
> >>>>>modify
> >>>>>the plugin such that it can work offline?
> >>>>>perl ${VEP}/variant_effect_predictor.pl -i ${INPUT_VCF} -o ${INPUT_VCF%*.vcf}.vep \
> >>>>>        --cache --assembly GRCh37 --offline \
> >>>>>        --force_overwrite --check_existing --fork 24 \
> >>>>>        --everything --flag_pick \
> >>>>>        --plugin CADD,${CADD_SNV},${CADD_INDEL} \
> >>>>>        --plugin ExAC,${EXAC} \
> >>>>>        --plugin VCFCols \
> >>>>>        --plugin LoF,human_ancestor_fa:/home/san/alex/.vep/Plugins/loftee-master/human_ancestor.fa.gz \
> >>>>>        --fields Uploaded_variation,Location,REF,ALT,INFO,FORMAT,LoF,LoF_filter,LoF_flags,CADD_RAW,CADD_PHRED,ExAC_AF
> >>>>>perl ${VEP}/variant_effect_predictor.pl -i ${INPUT_VCF} -o ${INPUT_VCF%*.vcf}.ONLINE.vep \
> >>>>>        --cache --assembly GRCh37 --port 3337 \
> >>>>>        --force_overwrite --check_existing --fork 24 \
> >>>>>        --everything --flag_pick \
> >>>>>        --plugin CADD,${CADD_SNV},${CADD_INDEL} \
> >>>>>        --plugin ExAC,${EXAC} \
> >>>>>        --plugin VCFCols \
> >>>>>        --plugin LoF,human_ancestor_fa:/home/san/alex/.vep/Plugins/loftee-master/human_ancestor.fa.gz \
> >>>>>        --fields Uploaded_variation,Location,REF,ALT,INFO,FORMAT,LoF,LoF_filter,LoF_flags,CADD_RAW,CADD_PHRED,ExAC_AF
> >>>>># Issue 2
> >>>>>When running VEP in either of the two modes shown above, I obtain
> >>>>>different confidence calls from the LoF.pm in regards to frameshift
> >>>>>mutations. Specifically, for the example shown below, the LoF.pm
> >>>>>plugin
> >>>>>will call the variant HC (high confidence) in ONLINE mode, but LC (low
> >>>>>confidence) when running offline. The particular flag thrown up for
> >>>>>the LC
> >>>>>call relates to non-canonical intron splice sites, however I have
> >>>>>checked
> >>>>>this particular variant on UCSC and the splice sites appear to be canonical,
> >>>>>thus
> >>>>>the ONLINE vep output is correct, and the offline appears to be
> >>>>>incorrect.
> >>>>>Since I am using a local cache (and I have also tried using a local
> >>>>>fasta
> >>>>>file), I am at a loss to explain why I would get completely different
> >>>>>results by these two approaches for a LoF call. As mentioned above, my
> >>>>>cache was downloaded using the installer script and default settings
> >>>>>(ensembl-tools release-82).
> >>>>># Running Offline
> >>>>>#Uploaded_variation               Consequence        IMPACT  LoF
> >>>>>10_126691951_C/- - 10:126691951 - frameshift_variant  HIGH   LC  NON_CAN_SPLICE_SURR
> >>>>>10_126692023_G/- - 10:126692023 - frameshift_variant  HIGH   LC  NON_CAN_SPLICE_SURR
> >>>>># Running Online
> >>>>>#Uploaded_variation               Consequence        IMPACT  LoF
> >>>>>10_126691951_C/- - 10:126691951 - frameshift_variant  HIGH   HC
> >>>>>10_126692023_G/- - 10:126692023 - frameshift_variant  HIGH   HC
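
A comparison like the one above could be scripted rather than read off by eye. What follows is a rough sketch under stated assumptions: the function name `compare_lof` is hypothetical, both files are assumed to be tab-delimited VEP output with a `#Uploaded_variation` header line that contains a column named `LoF`, and only the last LoF value seen per variant in the first file is kept.

```shell
# Hypothetical helper: print variants whose LoF call differs between two
# tab-delimited VEP outputs (first argument: offline run, second: online run).
# Output columns: variant, LoF call in the first file, LoF call in the second.
compare_lof() {
    awk -F'\t' '
        FNR == 1 { col = 0 }    # forget the column index when a new file starts
        /^##/    { next }       # skip VEP meta lines
        /^#/     {              # header line: locate the "LoF" column by name
            for (i = 1; i <= NF; i++) if ($i == "LoF") col = i
            next
        }
        !col     { next }       # no LoF column found in this file
        NR == FNR { offline[$1] = $col; next }   # first file: remember each call
        ($1 in offline) && offline[$1] != $col {
            print $1 "\t" offline[$1] "\t" $col
        }
    ' "$1" "$2"
}
```

Called as `compare_lof offline.vep online.vep` (hypothetical file names), it prints one line per variant where the two runs disagree. Variants annotated against several transcripts would need a key of variant plus transcript instead of the variant alone.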
> >>>>>I appreciate that neither VCFCols.pm nor LoF.pm were developed by your
> >>>>>team, but I would be very grateful if you could help me on these
> >>>>>issues
> >>>>>as I have been struggling to get VEP customised for my needs for some
> >>>>>time
> >>>>>now. In regards to issue 1, I believe a lot of your users would
> >>>>>benefit
> >>>>>from a tool that could grab the original VCF headers in the VEP
> >>>>>output, and
> >>>>>in regards to the second issue, there must be something strange going
> >>>>>on in
> >>>>>regards to compatibility with the downloaded caches and the online
> >>>>>databases but I am at a loss to explain it.
> >>>>>Many thanks in advance
> >>>>>Alex  Beesley
> >>>>>Telethon Kids Institute
> >>>>>Perth, Western Australia
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>