[ensembl-dev] Loftee and VCFCols Difficulties via port 3337

Konrad Karczewski konradk at broadinstitute.org
Fri Oct 16 16:57:43 BST 2015


Hello!

Right, I understand a FASTA file is not required for VEP in offline mode, but it is required for LOFTEE (we read from it to check the splice site sequence). Just to confirm: are you deleting the index, letting VEP recreate it AND letting VEP complete its run? (Not just the index creation, but all the way to "Finished!".) This tripped me up for the longest time. One hacky workaround I've found is to delete the index and then remove write permissions on the directory the FASTA lives in, which forces VEP to recreate the index every time. Not the most efficient, but I do that on my end for various reasons.
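A minimal shell sketch of the workaround described above. It assumes, as the file name quoted further down this thread suggests, that the BioPerl index sits next to the FASTA as `<fasta>.index`; `reset_fasta_index` is a hypothetical helper, not part of VEP or LOFTEE. Whether VEP then rebuilds the index on every run is as reported in this message, not something the sketch checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper sketching the workaround described above:
# delete the (possibly corrupt) FASTA index, then make the FASTA's
# directory read-only so that no new index file can be written there.
reset_fasta_index() {
  local fasta="$1"
  local dir
  dir=$(dirname "$fasta")
  # The index is assumed to be "<fasta>.index", e.g.
  # Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.index
  rm -f -- "${fasta}.index"
  # Remove write permission for user, group and others on the directory.
  chmod a-w "$dir"
}
```

Run it once with the path to your FASTA before annotating; to go back to a persistent on-disk index, restore write permission with `chmod u+w` on that directory.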

-Konrad

> On Oct 12, 2015, at 2:03 AM, Alex Beesley <Alex.Beesley at telethonkids.org.au> wrote:
> 
> Hi Konrad & Will
> 
> Thanks to you both for your comments and suggestions regarding the LOFTEE
> plugin. Unfortunately I have been UNABLE to fix the problem by reinstalling
> or recreating the FASTA index.
> In fact, the LoF call problem persists if you run offline without using a
> FASTA file at all (a FASTA file is not required offline).
> 
> FYI, I actually had considerable trouble using VEP at all with the GRCh37
> FASTA downloaded for Ensembl 82 via the API, and in the end had to use
> the --NO_HTSLIB flag. With that in place,
> VEP will run OK offline using the local FASTA file, but the LoF calls are
> still wrong (LC instead of HC). Same deal if I delete the index and let
> VEP recreate it.
> 
> Clearly there is some difference between the databases that are downloaded
> via the API for GRCh37 and what is online.
> This seems like it could have more serious ramifications for other uses as
> well (i.e. it may not just be a problem with the LoF plugin?).
> 
> I've confirmed the error using both VEP 78 and VEP 81 with LOFTEE too.
> 
> Any suggestions welcome!
> Cheers
> Alex
> 
> 
> 
> 
>> ------------------------------
>> 
>> Message: 2
>> Date: Fri, 09 Oct 2015 12:44:30 -0700 (PDT)
>> From: "Konrad Karczewski" <konradk at broadinstitute.org>
>> Subject: Re: [ensembl-dev] FW: Loftee and VCFCols Difficulties via
>> 	port 3337
>> To: "Ensembl developers list" <dev at ensembl.org>
>> Message-ID: <1444419869904.9698efa2 at Nodemailer>
>> Content-Type: text/plain; charset="utf-8"
>> 
>> Hi all,
>>
>> Developer of LOFTEE here - I've seen this kind of thing before (Issue
>> #2). The issue is actually with the FASTA index file created by
>> VEP/BioPerl. When you're in online mode, it's getting the right sequence
>> of the splice site, but when offline with a malformed index, it always
>> returns NN resulting in many NON_CAN_SPLICE and NON_CAN_SPLICE_SURR
>> annotations.
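To illustrate why an index that always returns `NN` produces these flags, here is a simplified shell sketch of a canonical splice-site test. It assumes the canonical intron motif is GT at the donor end and AG at the acceptor end; `is_canonical_intron` is a hypothetical helper and not LOFTEE's actual code, which performs more checks than this.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not LOFTEE's implementation: treat an intron as
# canonical only if its first two bases are GT and its last two are AG.
is_canonical_intron() {
  local donor="$1" acceptor="$2"
  # Soft-masked FASTA files use lower case; normalise before comparing.
  donor=$(printf '%s' "$donor" | tr '[:lower:]' '[:upper:]')
  acceptor=$(printf '%s' "$acceptor" | tr '[:lower:]' '[:upper:]')
  [ "$donor" = "GT" ] && [ "$acceptor" = "AG" ]
}

# A healthy FASTA lookup returns real bases; a malformed index returns NN.
# prints: "GT..AG canonical" then "NN..NN non-canonical"
for pair in "GT AG" "NN NN"; do
  set -- $pair
  if is_canonical_intron "$1" "$2"; then
    echo "$1..$2 canonical"
  else
    echo "$1..$2 non-canonical"
  fi
done
```

Under this simplification, any lookup that yields `NN` fails the test regardless of the true genomic sequence, which matches the pattern of spurious non-canonical splice flags described above.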
>>
>> I suggest deleting the Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.index
>> file and recreating it: to do this, just run VEP on a small test file.
>> Important note: you must let VEP run to completion. The "Checking/creating
>> FASTA index" step comes near the beginning and VEP starts writing the index
>> at that point, but if you cancel the run there, the index can be left
>> corrupt. I typically just annotate a single variant so it finishes
>> quickly. Don't ask how I figured all this out...
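The re-indexing step described above can be sketched as follows. This is an illustration under stated assumptions, not an official recipe: `write_single_variant` and `rebuild_index_with_vep` are hypothetical helpers; the variant is one of the deletions reported later in this thread, written in VEP's default whitespace-separated input format (chromosome, start, end, alleles, strand); and the shell variables `VEP` and `FASTA` are assumed to point at the VEP directory and the local FASTA file.

```shell
#!/usr/bin/env bash
# Write a one-variant input file in VEP's default format:
# chromosome  start  end  REF/ALT  strand
write_single_variant() {
  printf '10 126691951 126691951 C/- +\n' > "$1"
}

# Delete the old index, then annotate the single variant offline and let
# VEP run to completion so that the new index is written in full.
# Assumes VEP and FASTA are set in the calling environment.
rebuild_index_with_vep() {
  local input="$1"
  rm -f -- "${FASTA}.index"
  perl "${VEP}/variant_effect_predictor.pl" -i "$input" -o "${input}.vep" \
    --cache --offline --assembly GRCh37 --fasta "$FASTA" --force_overwrite
}
```

Usage would be `write_single_variant one_variant.txt && rebuild_index_with_vep one_variant.txt`; the point is simply not to interrupt the second command before VEP finishes.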
>>
>> Hope that helps!
>> 
>> 
>> -Konrad
>> 
>> On Fri, Oct 9, 2015 at 4:40 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>> 
>>> Hi Alex,
>>> Regarding issue 1, have you considered using VCF output instead of the
>>> default tab-delimited output?
>>> http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcfout
>>> Have you tried contacting the VAX authors? Michael Yourshaw is usually
>>> very responsive when I have communicated with him in the past.
>>> I'm sure you can appreciate we have to prioritise debugging and fixing
>>> our own code, but please do get back to us if you still have any
>>> outstanding issues.
>>> You may also like to try another available LoF plugin, LOFTEE from
>>> Daniel MacArthur's lab: https://github.com/konradjk/loftee
>>> Regards
>>> Will McLaren
>>> Ensembl Variation
>>> On 9 October 2015 at 03:22, Alex Beesley
>>> <Alex.Beesley at telethonkids.org.au> wrote:
>>>> Dear Team
>>>> 
>>>> I am experiencing significant difficulties with both the LoF.pm and
>>>> VCFCols.pm plugins with VEP (FYI, I am using a GRCh37 cache downloaded
>>>> using the installer script with default settings, ensembl-tools
>>>> release-82).
>>>> 
>>>> # Issue 1
>>>> I want to use VCFCols.pm to obtain the original REF and ALT alleles from
>>>> the VCF (to aid with interpretation of complex variants). However, it
>>>> seems that the only way to run the VCFCols.pm plugin is in online mode:
>>>> if one tries to run it in offline mode (see first code example below),
>>>> VEP returns an error relating to
>>>> "$config->{ga}->fetch_by_transcript_stable_id($transcript_id)".
>>>> However, when running online (see second code example), it is extremely
>>>> slow. This is incredibly frustrating because I do not wish to use any of
>>>> the VAX functionality or its related databases; I simply wish to grab the
>>>> original REF, ALT and other VCF columns (including the genotypes and
>>>> FORMAT fields) in my VEP output. Is there another way to grab the
>>>> original VCF columns in the VEP output other than using VCFCols.pm? Or a
>>>> way to modify the plugin so that it can work offline?
>>>> 
>>>> perl ${VEP}/variant_effect_predictor.pl -i ${INPUT_VCF} \
>>>>        -o ${INPUT_VCF%*.vcf}.vep --cache --assembly GRCh37 --offline \
>>>>        --force_overwrite --check_existing --fork 24 \
>>>>        --everything --flag_pick \
>>>>        --plugin CADD,${CADD_SNV},${CADD_INDEL} \
>>>>        --plugin ExAC,${EXAC} \
>>>>        --plugin VCFCols \
>>>>        --plugin LoF,human_ancestor_fa:/home/san/alex/.vep/Plugins/loftee-master/human_ancestor.fa.gz \
>>>>        --fields Uploaded_variation,Location,REF,ALT,INFO,FORMAT,LoF,LoF_filter,LoF_flags,CADD_RAW,CADD_PHRED,ExAC_AF
>>>> 
>>>> 
>>>> 
>>>> perl ${VEP}/variant_effect_predictor.pl -i ${INPUT_VCF} \
>>>>        -o ${INPUT_VCF%*.vcf}.ONLINE.vep --cache --assembly GRCh37 --port 3337 \
>>>>        --force_overwrite --check_existing --fork 24 \
>>>>        --everything --flag_pick \
>>>>        --plugin CADD,${CADD_SNV},${CADD_INDEL} \
>>>>        --plugin ExAC,${EXAC} \
>>>>        --plugin VCFCols \
>>>>        --plugin LoF,human_ancestor_fa:/home/san/alex/.vep/Plugins/loftee-master/human_ancestor.fa.gz \
>>>>        --fields Uploaded_variation,Location,REF,ALT,INFO,FORMAT,LoF,LoF_filter,LoF_flags,CADD_RAW,CADD_PHRED,ExAC_AF
>>>> 
>>>> 
>>>> 
>>>> # Issue 2
>>>> When running VEP in either of the two modes shown above, I obtain
>>>> different confidence calls from LoF.pm for frameshift mutations.
>>>> Specifically, for the examples shown below, the LoF.pm plugin calls the
>>>> variants HC (high confidence) in online mode, but LC (low confidence)
>>>> when running offline. The flag raised for the LC call relates to
>>>> non-canonical intron splice sites; however, I have checked these
>>>> variants on UCSC and the splice sites appear to be canonical, so the
>>>> online VEP output is correct and the offline output appears to be
>>>> incorrect. Since I am using a local cache (and I have also tried using a
>>>> local FASTA file), I am at a loss to explain why I would get completely
>>>> different LoF calls from these two approaches. As mentioned above, my
>>>> cache was downloaded using the installer script with default settings
>>>> (ensembl-tools release-82).
>>>> 
>>>> 
>>>> 
>>>> # Running offline
>>>> 
>>>> #Uploaded_variation               Consequence        IMPACT  LoF
>>>> 10_126691951_C/- - 10:126691951 - frameshift_variant  HIGH   LC  NON_CAN_SPLICE_SURR
>>>> 10_126692023_G/- - 10:126692023 - frameshift_variant  HIGH   LC  NON_CAN_SPLICE_SURR
>>>> 
>>>> # Running online
>>>> 
>>>> #Uploaded_variation               Consequence        IMPACT  LoF
>>>> 10_126691951_C/- - 10:126691951 - frameshift_variant  HIGH   HC
>>>> 10_126692023_G/- - 10:126692023 - frameshift_variant  HIGH   HC
>>>> 
>>>> 
>>>> 
>>>> I appreciate that neither VCFCols.pm nor LoF.pm was developed by your
>>>> team, but I would be very grateful if you could help me with these
>>>> issues, as I have been struggling to get VEP customised for my needs for
>>>> some time now. Regarding issue 1, I believe a lot of your users would
>>>> benefit from a tool that could grab the original VCF columns in the VEP
>>>> output; regarding the second issue, there must be something strange
>>>> going on with compatibility between the downloaded caches and the online
>>>> databases, but I am at a loss to explain it.
>>>> 
>>>> 
>>>> Many thanks in advance
>>>> 
>>>> Alex Beesley
>>>> 
>>>> Telethon Kids Institute
>>>> Perth, Western Australia
>>>> 
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/


