[ensembl-dev] FW: Loftee and VCFCols Difficulties via port 3337

Konrad Karczewski konradk at broadinstitute.org
Fri Oct 9 20:44:30 BST 2015


Hi all,

Developer of LOFTEE here. I've seen this kind of thing before (Issue #2). The problem is actually with the FASTA index file created by VEP/BioPerl. In online mode, LOFTEE gets the correct sequence for the splice site, but offline with a malformed index the lookup always returns NN, which results in many NON_CAN_SPLICE and NON_CAN_SPLICE_SURR annotations.
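
One quick way to see whether an offline run shows this symptom is to count the affected rows in the offline and online outputs and compare them. This is only a sketch: the function name is made up for illustration, and it assumes the default tab-delimited VEP output, where header lines start with "#".

```shell
#!/bin/sh
# Hypothetical helper (not part of VEP or LOFTEE): count data rows in a VEP
# output file that carry a NON_CAN_SPLICE or NON_CAN_SPLICE_SURR annotation.
count_non_can_splice() {
    # Skip header lines (starting with "#"), then count matching rows.
    # The pattern also matches NON_CAN_SPLICE_SURR as a substring.
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -v '^#' "$1" | grep -c 'NON_CAN_SPLICE' || true
}

# Example usage (file names are placeholders):
#   count_non_can_splice sample.vep
#   count_non_can_splice sample.ONLINE.vep
```

A much higher count in the offline output than in the online output for the same input would be consistent with the malformed-index problem described above.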

I suggest deleting the Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.index file and recreating it; to do this, just run VEP on a small test file. Important note: you must let VEP run to completion. The "Checking/Creating FASTA Index" step is near the beginning and VEP starts writing the index at that point, but if you cancel the run then, the index it leaves behind can be corrupt. I typically just annotate a single variant so it finishes quickly. Don't ask how I figured all this out...
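
As a rough sketch of those two steps: the helper names below are hypothetical, the index is assumed to sit next to the FASTA file as "<fasta>.index", and the VEP options are simply the offline ones from Alex's command without the plugins. Adjust the options to match your own offline run.

```shell
#!/bin/sh
# Sketch only: these helpers are illustrative and not part of VEP or LOFTEE.

# Remove the index that sits next to the given FASTA file, if present.
remove_fasta_index() {
    fasta="$1"
    index="${fasta}.index"
    if [ -e "$index" ]; then
        rm -f -- "$index"
        echo "removed ${index}"
    else
        echo "no index found at ${index}"
    fi
}

# Re-run VEP on a small test VCF so that it rebuilds the index.
# Let this run to completion; cancelling early can leave a corrupt index.
# Assumes ${VEP} points at the ensembl-tools VEP script directory.
rebuild_fasta_index() {
    test_vcf="$1"
    perl "${VEP}/variant_effect_predictor.pl" -i "$test_vcf" \
        -o "${test_vcf%.vcf}.vep" \
        --cache --assembly GRCh37 --offline --force_overwrite
}

# Example usage (paths are placeholders):
#   remove_fasta_index /path/to/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
#   rebuild_fasta_index single_variant.vcf
```

Keeping the deletion and the rebuild as separate functions makes it easy to confirm the old index is gone before starting the VEP run.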

Hope that helps!


-Konrad

On Fri, Oct 9, 2015 at 4:40 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:

> Hi Alex,
> Regarding issue 1, have you considered using VCF output instead of the
> default tab-delimited output?
> http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcfout
> Have you tried contacting the VAX authors? Michael Yourshaw is usually very
> responsive when I have communicated with him in the past.
> I'm sure you can appreciate we have to prioritise debugging and fixing our
> own code, but please do get back to us if you still have any outstanding
> issues.
> You may also like to try another available LoF plugin, LOFTEE from Daniel
> MacArthur's lab: https://github.com/konradjk/loftee
> Regards
> Will McLaren
> Ensembl Variation
> On 9 October 2015 at 03:22, Alex Beesley <Alex.Beesley at telethonkids.org.au>
> wrote:
>> Dear Team
>>
>> I am experiencing significant difficulties with both the LoF.pm and
>> VCFCols.pm plugins with VEP (FYI I am using a GRCh37 cache downloaded using
>> the installer script and default settings (ensembl-tools release-82)).
>>
>> # Issue 1
>> I want to use VCFCols.pm in order to obtain the original REF and ALT
>> alleles from the VCF (to aid with interpretation of complex variants).
>> However it seems that the only way to run VCFCols.pm plugin is in the
>> online mode – if one tries to run it in offline mode (see first code
>> example below), VEP returns an error relating to
>> "$config->{ga}->fetch_by_transcript_stable_id($transcript_id)". However,
>> when running online (see second code example), it is extremely slow. This
>> is incredibly frustrating because I do not wish to use any of the VAX
>> functionality or its related databases, I simply wish to grab the original
>> REF, ALT and other VCF column headers (including the genotypes and FORMAT
>> fields) in my VEP output. Is there another way to grab the original VCF
>> columns in the VEP output other than using VCFCols.pm? Or a way to modify
>> the plugin such that it can work offline?
>>
>> perl ${VEP}/variant_effect_predictor.pl -i ${INPUT_VCF} \
>>         -o ${INPUT_VCF%*.vcf}.vep --cache --assembly GRCh37 --offline \
>>         --force_overwrite --check_existing --fork 24 \
>>         --everything --flag_pick \
>>         --plugin CADD,${CADD_SNV},${CADD_INDEL} \
>>         --plugin ExAC,${EXAC} \
>>         --plugin VCFCols \
>>         --plugin LoF,human_ancestor_fa:/home/san/alex/.vep/Plugins/loftee-master/human_ancestor.fa.gz \
>>         --fields Uploaded_variation,Location,REF,ALT,INFO,FORMAT,LoF,LoF_filter,LoF_flags,CADD_RAW,CADD_PHRED,ExAC_AF
>>
>>
>>
>> perl ${VEP}/variant_effect_predictor.pl -i ${INPUT_VCF} \
>>         -o ${INPUT_VCF%*.vcf}.ONLINE.vep --cache --assembly GRCh37 --port 3337 \
>>         --force_overwrite --check_existing --fork 24 \
>>         --everything --flag_pick \
>>         --plugin CADD,${CADD_SNV},${CADD_INDEL} \
>>         --plugin ExAC,${EXAC} \
>>         --plugin VCFCols \
>>         --plugin LoF,human_ancestor_fa:/home/san/alex/.vep/Plugins/loftee-master/human_ancestor.fa.gz \
>>         --fields Uploaded_variation,Location,REF,ALT,INFO,FORMAT,LoF,LoF_filter,LoF_flags,CADD_RAW,CADD_PHRED,ExAC_AF
>>
>>
>>
>> # Issue 2
>> When running VEP in either of the two modes shown above, I obtain
>> different confidence calls from the LoF.pm in regards to frameshift
>> mutations. Specifically, for the example shown below, the LoF.pm plugin
>> will call the variant HC (high confidence) in ONLINE mode, but LC (low
>> confidence) when running offline. The particular flag thrown up for the LC
>> call relates to non-canonical intron splice sites, however I have checked
>> this particular variant on UCSC and the splice sites appear to be canonical, thus
>> the ONLINE vep output is correct, and the offline appears to be incorrect.
>> Since I am using a local cache (and I have also tried using a local fasta
>> file), I am at a loss to explain why I would get completely different
>> results by these two approaches for a LoF call. As mentioned above, my
>> cache was downloaded using the installer script and default settings
>> (ensembl-tools release-82).
>>
>>
>>
>> # Running Offline
>>
>> #Uploaded_variation               Consequence        IMPACT  LoF
>> 10_126691951_C/- - 10:126691951 - frameshift_variant  HIGH   LC  NON_CAN_SPLICE_SURR
>> 10_126692023_G/- - 10:126692023 - frameshift_variant  HIGH   LC  NON_CAN_SPLICE_SURR
>>
>>
>> # Running Online
>>
>> #Uploaded_variation               Consequence        IMPACT  LoF
>> 10_126691951_C/- - 10:126691951 - frameshift_variant  HIGH   HC
>> 10_126692023_G/- - 10:126692023 - frameshift_variant  HIGH   HC
>>
>>
>>
>> I appreciate that neither VCFCols.pm nor LoF.pm was developed by your
>> team, but I would be very grateful if you could help me on these issues
>> as I have been struggling to get VEP customised for my needs for some time
>> now. In regards to issue 1, I believe a lot of your users would benefit
>> from a tool that could grab the original VCF headers in the VEP output, and
>> in regards to the second issue, there must be something strange going on in
>> regards to compatibility with the downloaded caches and the online
>> databases but I am at a loss to explain it.
>>
>>
>> Many thanks in advance
>>
>> Alex  Beesley
>>
>> Telethon Kids Institute
>> Perth, Western Australia
>>
>>
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>