[ensembl-dev] Loftee and VCFCols Difficulties via port 3337

Alex Beesley Alex.Beesley at telethonkids.org.au
Thu Oct 22 07:39:33 BST 2015


Hi Konrad

Yes, I completely deleted the fasta index and let VEP recreate it (running
to completion), but the problem persists. I even scoured my file system
for other copies of the fasta and index, and pointed the VEP script
explicitly at the fasta file, but the issue still persists. The ONLY way
I can generate the apparently correct LoF calls for these sites is to run
VEP in online mode. Even weirder, I noticed that when running in offline
mode, a small number of the sites flip between LC and HC from run to
run!! (though the majority still remain LC, and thus incorrect).


These incorrect LC calls are essentially what I get if I run VEP offline
WITHOUT any fasta file. Hence, it seems that although VEP is correctly
using the local fasta file (i.e. it looks for the file, detects it and
creates the index), the LoF plugin is not recognising it (i.e. the LoF
output is the same as if I didn't use a fasta file at all). To recap, I
am using a GRCh37 homo_sapiens cache (NOT homo_sapiens_merged and NOT
homo_sapiens_refseq), and I have tried both compressed and uncompressed
fasta files.
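One way to test that hypothesis independently of VEP and its index is to read the bases straight out of the local fasta. The sketch below assumes an uncompressed fasta whose header lines begin with the sequence name (as Ensembl's `>10 ...` headers do); `fasta_region` is a hypothetical helper, not part of VEP or LOFTEE.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of VEP or LOFTEE): print bases START..END
# (1-based, inclusive) of sequence CHROM from an uncompressed fasta file.
# The file is streamed line by line, so a whole chromosome is never held
# in memory; no index of any kind is used.
fasta_region() {
  local fasta=$1 chrom=$2 start=$3 end=$4
  awk -v chr="$chrom" -v s="$start" -v e="$end" '
    /^>/ {
      if (inchr) exit                 # past the end of the requested sequence
      inchr = (substr($1, 2) == chr)  # sequence name = first word after ">"
      pos = 0                         # bases seen so far in this sequence
      next
    }
    inchr {
      lo = pos + 1; hi = pos + length($0)
      if (hi >= s && lo <= e) {       # this line overlaps the region
        from = (s > lo) ? s : lo
        to = (e < hi) ? e : hi
        out = out substr($0, from - pos, to - from + 1)
      }
      pos = hi
      if (pos >= e) exit              # region complete; END still runs
    }
    END { print out }
  ' "$fasta"
}
```

For example, `fasta_region Homo_sapiens.GRCh37.75.dna.primary_assembly.fa 10 126691941 126691961` prints the bases around the first affected variant (the splice sites LOFTEE inspects sit at the intron boundaries, whose coordinates would have to come from the transcript). Real bases here, combined with LC calls offline, would point at the index rather than the fasta contents.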

All ideas welcome :)

Cheers
alex

On 16/10/2015 11:58 pm, "dev-bounces at ensembl.org on behalf of
dev-request at ensembl.org" <dev-bounces at ensembl.org on behalf of
dev-request at ensembl.org> wrote:

>------------------------------
>
>Message: 2
>Date: Fri, 16 Oct 2015 11:57:43 -0400
>From: Konrad Karczewski <konradk at broadinstitute.org>
>Subject: Re: [ensembl-dev] Loftee and VCFCols Difficulties via port 3337
>To: Ensembl developers list <dev at ensembl.org>
>
>Hello!
>
>Right, I understand fasta is not required offline for VEP, but it is for
>LOFTEE (we dig into it to check splice site sequence). Just let me
>confirm: are you deleting the index, letting VEP recreate it AND letting
>VEP complete its run? (Not just the index creation but going all the way
>to "Finished!") This tripped me up for the longest time. One hacky
>workaround I've found for this is to delete the index and then remove
>write permissions on the directory the FASTA lives in, which will force
>VEP to recreate the index every time. Not the most efficient, but I do
>that on my end for various reasons.
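A minimal sketch of the workaround Konrad describes, assuming the index sits next to the fasta as `<fasta>.index` (the naming used for `Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.index` elsewhere in this thread). The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete the fasta index, then remove write permission
# on the directory holding the fasta. Per the description above, VEP then
# has to rebuild the index on every run.
# Undo later with: chmod u+w "$(dirname "$fasta")"
lock_fasta_dir() {
  local fasta=$1 dir
  dir=$(dirname "$fasta")
  rm -f "${fasta}.index"   # remove the (possibly corrupt) index
  chmod a-w "$dir"         # drop write permission on the directory
}
```

As Konrad notes, this trades speed for safety: the index is rebuilt on each run instead of being reused.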
>
>-Konrad
>
>>On Oct 12, 2015, at 2:03 AM, Alex Beesley
>><Alex.Beesley at telethonkids.org.au> wrote:
>>Hi Konrad & Will
>>Thanks for both your comments/suggestions re the loftee plugin.
>>Unfortunately I have been UNABLE to fix the problem by reinstalling or
>>recreating the fasta index.
>>In fact the LoF call problem persists if you run offline without using a
>>fasta file at all (a fasta file is not required offline).
>>FYI I actually had considerable trouble using VEP at all with the GRCh37
>>fasta downloaded for Ensembl 82 via the API, and in the end had to use
>>the NO_HTSLIB flag. With that in place, VEP runs OK offline using the
>>local fasta file, but the LoF calls are still wrong (LC instead of HC).
>>Same deal if I delete the index and let VEP recreate it.
>>Clearly there is some difference between the databases downloaded via
>>the API for GRCh37 and what is online.
>>This seems like it could have more serious ramifications for other uses
>>as well (i.e. it may not just be a problem with the LoF plugin??).
>>I've confirmed the error using both VEP78 and VEP81 with loftee too.
>>Any suggestions welcome!
>>Cheers
>>Alex
>>>------------------------------
>>>Message: 2
>>>Date: Fri, 09 Oct 2015 12:44:30 -0700 (PDT)
>>>From: "Konrad Karczewski" <konradk at broadinstitute.org>
>>>Subject: Re: [ensembl-dev] FW: Loftee and VCFCols Difficulties via port 3337
>>>To: "Ensembl developers list" <dev at ensembl.org>
>>>Hi all,
>>>Developer of LOFTEE here - I've seen this kind of thing before (Issue
>>>#2). The issue is actually with the FASTA index file created by
>>>VEP/BioPerl. In online mode, VEP gets the right sequence for the
>>>splice site, but offline with a malformed index it always returns NN,
>>>resulting in many NON_CAN_SPLICE and NON_CAN_SPLICE_SURR annotations.
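To illustrate why an index that only yields NN produces these filters, here is a simplified sketch of an intron-motif check. It is NOT LOFTEE's actual code: as an assumption for illustration, it treats only a GT donor with an AG acceptor as canonical, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Simplified illustration, NOT LOFTEE's implementation: classify an intron
# from its donor (first two intronic bases) and acceptor (last two
# intronic bases), treating only GT...AG as canonical.
intron_motif_status() {
  local donor acceptor
  donor=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  acceptor=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
  if [ "$donor" = "GT" ] && [ "$acceptor" = "AG" ]; then
    echo "CANONICAL"
  else
    echo "NON_CANONICAL"   # e.g. a lookup that only ever returns NN
  fi
}
```

`intron_motif_status GT AG` prints `CANONICAL`, while `intron_motif_status NN NN` prints `NON_CANONICAL`: every intron looks non-canonical when the sequence lookup is broken, which is consistent with the flood of splice-related filters described above.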
>>>I suggest deleting the
>>>Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.index file and
>>>recreating it: to do this, just run VEP on a small test file. Important
>>>note: you must let VEP run to completion. Checking/Creating FASTA Index
>>>happens near the beginning and VEP starts writing the index at that
>>>point, so if you cancel the run there, the index can be left corrupt. I
>>>typically just annotate a single variant so it finishes quickly. Don't
>>>ask how I figured all this out...
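A sketch of this recipe, under stated assumptions: `VEP_SCRIPT` is a hypothetical variable pointing at variant_effect_predictor.pl, an offline GRCh37 cache is installed, and the single variant (a deletion from this thread) is written in VEP's default input format (chrom, start, end, allele, strand). Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Write a one-variant input file in VEP's default format.
make_test_input() {
  printf '10\t126691951\t126691951\tC/-\t+\n' > "$1"
}

# Delete the old index, then annotate one variant so VEP rebuilds the index
# AND runs all the way to completion.
rebuild_fasta_index() {
  local fasta=$1 input status=0
  : "${VEP_SCRIPT:?set VEP_SCRIPT to the path of variant_effect_predictor.pl}"
  input=$(mktemp)
  make_test_input "$input"
  rm -f "${fasta}.index"   # discard the possibly corrupt index
  # Do not interrupt this run: the index is written near the start, and
  # cancelling at that point can leave it corrupt.
  perl "$VEP_SCRIPT" -i "$input" -o "${input}.out" --cache --offline \
    --assembly GRCh37 --fasta "$fasta" --no_stats --force_overwrite \
    || status=$?
  rm -f "$input" "${input}.out"
  if [ "$status" -ne 0 ]; then
    echo "VEP did not finish; delete ${fasta}.index and try again" >&2
    return "$status"
  fi
  echo "VEP finished; index rebuilt next to $fasta"
}
```

Usage, with a hypothetical path: `VEP_SCRIPT=${VEP}/variant_effect_predictor.pl rebuild_fasta_index /path/to/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa`. Checking the exit status, rather than watching the log, is one way to confirm the run actually completed.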
>>>Hope that helps!
>>>-Konrad
>>>On Fri, Oct 9, 2015 at 4:40 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>Hi Alex,
>>>>Regarding issue 1, have you considered using VCF output instead of the
>>>>default tab-delimited output?
>>>>http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcfout
>>>>Have you tried contacting the VAX authors? Michael Yourshaw is usually
>>>>very responsive when I have communicated with him in the past.
>>>>I'm sure you can appreciate we have to prioritise debugging and fixing
>>>>our own code, but please do get back to us if you still have any
>>>>outstanding issues.
>>>>You may also like to try another available LoF plugin, LOFTEE from
>>>>Daniel MacArthur's lab: https://github.com/konradjk/loftee
>>>>Regards
>>>>Will McLaren
>>>>Ensembl Variation
>>>>On 9 October 2015 at 03:22, Alex Beesley
>>>><Alex.Beesley at telethonkids.org.au> wrote:
>>>>>Dear Team
>>>>>I am experiencing significant difficulties with both the LoF.pm and
>>>>>VCFCols.pm plugins with VEP (FYI I am using a GRCh37 cache downloaded
>>>>>using the installer script and default settings (ensembl-tools
>>>>>release-82)).
>>>>># Issue 1
>>>>>I want to use VCFCols.pm in order to obtain the original REF and ALT
>>>>>alleles from the VCF (to aid with interpretation of complex variants).
>>>>>However, it seems that the only way to run the VCFCols.pm plugin is in
>>>>>online mode: if one tries to run it in offline mode (see first code
>>>>>example below), VEP returns an error relating to
>>>>>"$config->{ga}->fetch_by_transcript_stable_id($transcript_id)".
>>>>>However, when running online (see second code example), it is
>>>>>extremely slow. This is incredibly frustrating because I do not wish
>>>>>to use any of the VAX functionality or its related databases; I simply
>>>>>wish to grab the original REF, ALT and other VCF columns (including
>>>>>the genotypes and FORMAT fields) in my VEP output. Is there another way
>>>>>to grab the original VCF columns in the VEP output other than using
>>>>>VCFCols.pm? Or a way to modify the plugin so that it can work offline?
>>>>>perl ${VEP}/variant_effect_predictor.pl -i ${INPUT_VCF} \
>>>>>        -o ${INPUT_VCF%*.vcf}.vep --cache --assembly GRCh37 --offline \
>>>>>        --force_overwrite --check_existing --fork 24 \
>>>>>        --everything --flag_pick \
>>>>>        --plugin CADD,${CADD_SNV},${CADD_INDEL} \
>>>>>        --plugin ExAC,${EXAC} \
>>>>>        --plugin VCFCols \
>>>>>        --plugin LoF,human_ancestor_fa:/home/san/alex/.vep/Plugins/loftee-master/human_ancestor.fa.gz \
>>>>>        --fields Uploaded_variation,Location,REF,ALT,INFO,FORMAT,LoF,LoF_filter,LoF_flags,CADD_RAW,CADD_PHRED,ExAC_AF
>>>>>perl ${VEP}/variant_effect_predictor.pl -i ${INPUT_VCF} \
>>>>>        -o ${INPUT_VCF%*.vcf}.ONLINE.vep --cache --assembly GRCh37 --port 3337 \
>>>>>        --force_overwrite --check_existing --fork 24 \
>>>>>        --everything --flag_pick \
>>>>>        --plugin CADD,${CADD_SNV},${CADD_INDEL} \
>>>>>        --plugin ExAC,${EXAC} \
>>>>>        --plugin VCFCols \
>>>>>        --plugin LoF,human_ancestor_fa:/home/san/alex/.vep/Plugins/loftee-master/human_ancestor.fa.gz \
>>>>>        --fields Uploaded_variation,Location,REF,ALT,INFO,FORMAT,LoF,LoF_filter,LoF_flags,CADD_RAW,CADD_PHRED,ExAC_AF
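If the goal is only to keep the original REF, ALT, FORMAT and genotype columns, one offline-friendly option is to read them straight from the input VCF instead of going through VCFCols.pm. This is a sketch, not a replacement for the plugin, and `vcf_core_columns` is a hypothetical helper. Will's suggestion of VCF output (`--vcf`) is the other route, since VEP then keeps the original VCF lines and adds its annotations to the INFO field.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print CHROM, POS, REF, ALT, FORMAT and every sample
# column from a plain-text VCF, dropping the ## meta-information lines but
# keeping the #CHROM header row.
vcf_core_columns() {
  awk -F '\t' -v OFS='\t' '
    /^##/ { next }
    {
      out = $1 OFS $2 OFS $4 OFS $5
      for (i = 9; i <= NF; i++) out = out OFS $i   # FORMAT + samples
      print out
    }
  ' "$1"
}
```

One caveat when lining this up with tab-delimited VEP output: VEP appears to report indels with the shared leading base trimmed (e.g. `C/-`), so indel positions may not match the VCF POS exactly.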
>>>>># Issue 2
>>>>>When running VEP in either of the two modes shown above, I obtain
>>>>>different confidence calls from LoF.pm for frameshift mutations.
>>>>>Specifically, for the examples shown below, the LoF.pm plugin calls
>>>>>the variant HC (high confidence) in ONLINE mode, but LC (low
>>>>>confidence) when running offline. The flag raised for the LC call
>>>>>relates to non-canonical intron splice sites; however, I have checked
>>>>>this particular variant on UCSC and the splice sites appear to be
>>>>>canonical, so the ONLINE VEP output is correct and the offline output
>>>>>appears to be incorrect.
>>>>>Since I am using a local cache (and I have also tried using a local
>>>>>fasta file), I am at a loss to explain why I would get completely
>>>>>different LoF calls from these two approaches. As mentioned above, my
>>>>>cache was downloaded using the installer script and default settings
>>>>>(ensembl-tools release-82).
>>>>># Running Offline
>>>>>#Uploaded_variation               Consequence        IMPACT  LoF
>>>>>10_126691951_C/- - 10:126691951 - frameshift_variant  HIGH   LC  NON_CAN_SPLICE_SURR
>>>>>10_126692023_G/- - 10:126692023 - frameshift_variant  HIGH   LC  NON_CAN_SPLICE_SURR
>>>>># Running Online
>>>>>#Uploaded_variation               Consequence        IMPACT  LoF
>>>>>10_126691951_C/- - 10:126691951 - frameshift_variant  HIGH   HC
>>>>>10_126692023_G/- - 10:126692023 - frameshift_variant  HIGH   HC
>>>>>I appreciate that neither VCFCols.pm nor LoF.pm was developed by your
>>>>>team, but I would be very grateful if you could help me with these
>>>>>issues, as I have been struggling to get VEP customised for my needs
>>>>>for some time now. Regarding issue 1, I believe a lot of your users
>>>>>would benefit from a tool that could grab the original VCF columns in
>>>>>the VEP output. Regarding the second issue, there must be something
>>>>>strange going on with compatibility between the downloaded caches and
>>>>>the online databases, but I am at a loss to explain it.
>>>>>Many thanks in advance
>>>>>Alex Beesley
>>>>>Telethon Kids Institute
>>>>>Perth, Western Australia




