[ensembl-dev] VEP on 37, but Gencode 25?

Will McLaren wm2 at ebi.ac.uk
Mon Oct 3 11:42:28 BST 2016


Hi Konrad,

Here's a plugin that should give you SIFT/PolyPhen for your use case. The
field names are deliberately different from the "main" ones; hopefully
that makes sense.

It's currently only on the dev and release/86 branches, but it should work
fine with any recent VEP release.

https://github.com/Ensembl/VEP_plugins/blob/dev/PolyPhen_SIFT.pm

Will

On 30 September 2016 at 11:03, Will McLaren <wm2 at ebi.ac.uk> wrote:

> We're straying into the realm of the hack here!
>
> I'm struggling to get it working in the manner you're trying, I can get
> past the error you're seeing but I'm now encountering new ones.
>
> Is there a reason you can't use the new code? The following works nicely
> for me:
>
> perl vep.pl -gff gencode.v24lift37.annotation.gff3.gz \
>   -fa Homo_sapiens.GRCh37.dna.toplevel.fa.gz -i example_GRCh37.vcf -force \
>   -sift b -poly b -database -port 3337 -db 85 \
>   -transcript_filter "_source_cache"
>
> The final flag filters out transcripts loaded from the DB so the variants
> are not annotated against these too.
>
> I'm working on getting a plugin going to work with a separate cache of
> SIFT/PolyPhen data, but it may take a little while to get published.
>
> Will
>
> On 29 September 2016 at 15:48, Konrad Karczewski <
> konradk at broadinstitute.org> wrote:
>
>> Hmm, that's interesting. When I added info.txt, now everything failed
>> with:
>>
>> Can't call method "db" on unblessed reference at
>> /humgen/atgu1/fs03/DM-Lab/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariation.pm
>> line 324.
>>
>> -Konrad
>>
>> On September 29, 2016 at 9:34:18 AM, Will McLaren (wm2 at ebi.ac.uk) wrote:
>>
>> We might be able to write a plugin to read the data from a pair of table
>> dump files.
>>
>> Let me have a go at doing that, as you are not the only person requesting
>> similar at the moment!
>>
>> Will
>>
>> On 29 September 2016 at 14:21, Konrad Karczewski <
>> konradk at broadinstitute.org> wrote:
>>
>>> Great, thanks! Will check that out.
>>>
>>> Is that to say there's no way to get the SIFT and PolyPhen annotations
>>> locally? Happy to do some legwork if it means I can recreate the entire
>>> thing with this new annotation set!
>>>
>>> -Konrad
>>>
>>> On September 29, 2016 at 3:52:47 AM, Will McLaren (wm2 at ebi.ac.uk) wrote:
>>>
>>> You'd also need to copy over the homo_sapiens/85_GRCh37/info.txt file,
>>> this contains the column headers for the _var files, hence the warnings
>>> when it finds data that doesn't match its best guess of those headers.
>>>
>>> RE: SIFT and PolyPhen, if you use --cache instead of --offline you
>>> *might* find that it is able to retrieve SIFT and PolyPhen matrices from
>>> the database server. I've tested this with the new code but not the version
>>> you're on. You might also want to use "--host useastdb.ensembl.org",
>>> assuming you're East Coast, this will give you the fastest (public) DB
>>> connection.
>>>
>>> Will
>>>
>>> On 28 September 2016 at 21:00, Konrad Karczewski <
>>> konradk at broadinstitute.org> wrote:
>>>
>>>> Ok, I think I got that mostly working (sorted it properly and converted
>>>> transcript_type to transcript_biotype, appears to have worked). I then
>>>> pulled the _var and _reg caches over as-is from 85 (not sure if wise).
>>>>
>>>> Now when I run it, it appears to complete without error, but I'm
>>>> running into many of these warnings:
>>>>
>>>> Use of uninitialized value in list assignment at
>>>> /humgen/atgu1/fs03/DM-Lab/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm
>>>> line 5344, <DUMP> line 1.
>>>>
>>>> Also, SIFT and PolyPhen don't appear to get output alongside it. Is
>>>> that expected (or perhaps related to above warnings)? Anything I can do to
>>>> get those in there?
>>>>
>>>> -Konrad
>>>>
>>>> On September 27, 2016 at 10:58:54 AM, Will McLaren (wm2 at ebi.ac.uk)
>>>> wrote:
>>>>
>>>> You can try running it with --verbose, it will give you some error
>>>> logging.
>>>>
>>>> Will
>>>>
>>>> On 27 September 2016 at 15:56, Konrad Karczewski <
>>>> konradk at broadinstitute.org> wrote:
>>>>
>>>>> Ok good to know - I actually tried it, but I think something is being
>>>>> odd. It gets through the whole thing (going back and forth between
>>>>> chromosomes like you said, so I can try to fix that), but then appears to
>>>>> finish:
>>>>>
>>>>> 2016-09-26 16:12:30 - Processing chromosome Y
>>>>> WARNING: Could not find chromosome named M in FASTA file
>>>>> 2016-09-26 16:12:52 - All done!
>>>>>
>>>>> But the output directory (either ~/.vep or the directory I pointed to
>>>>> with --dir) is empty. Is this a related issue? Thought you might want to
>>>>> know to add a bit of error logging if so.
>>>>>
>>>>> -Konrad
>>>>>
>>>>> On September 27, 2016 at 8:30:15 AM, Will McLaren (wm2 at ebi.ac.uk)
>>>>> wrote:
>>>>>
>>>>> In theory this should work, but the gtf2vep.pl script doesn't seem to
>>>>> work too well with this particular GFF (it was designed really to work with
>>>>> GFF/GTFs as produced by Ensembl or NCBI). Probably with some tweaks it
>>>>> could be made to work - I believe the major issues are caused by features
>>>>> being out of the order that the script expects.
>>>>>
>>>>> The new code uses a much more robust system for constructing
>>>>> transcripts and has been tested with GFFs from Ensembl, NCBI and GENCODE.
>>>>>
>>>>> Will
>>>>>
>>>>> On 27 September 2016 at 13:22, Konrad Karczewski <
>>>>> konradk at broadinstitute.org> wrote:
>>>>>
>>>>>> I just also realized - would creating a cache from this gff file
>>>>>> (using gtf2vep.pl) not be recommended?
>>>>>>
>>>>>> -Konrad
>>>>>>
>>>>>> On September 27, 2016 at 5:16:42 AM, Will McLaren (wm2 at ebi.ac.uk)
>>>>>> wrote:
>>>>>>
>>>>>> Hi Konrad,
>>>>>>
>>>>>> The beta ensembl-vep code [1] supports annotation directly from a GFF
>>>>>> file, such as the one available from the GENCODE website [2].
>>>>>>
>>>>>> $ curl ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/GRCh37_mapping/gencode.v25lift37.annotation.gff3.gz \
>>>>>>     | gzip -dc | grep -v "#" | sort -k1,1 -k4,4n -k5,5n | bgzip -c \
>>>>>>     > gencode.v25lift37.annotation.gff3.gz
>>>>>> $ tabix -p gff gencode.v25lift37.annotation.gff3.gz
>>>>>> $ perl vep.pl -i variants.vcf -gff gencode.v25lift37.annotation.gff3.gz \
>>>>>>     -fasta homo_sapiens.fa
>>>>>>
>>>>>> This comes with limitations as the GFF file contains only the
>>>>>> transcript structure and not any of the additional annotations. However I
>>>>>> do know of someone successfully using LOFTEE with this exact setup.
>>>>>>
>>>>>> Of course usual beta caveats apply, so if you do use it and find bugs
>>>>>> please report on the GitHub page.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Will McLaren
>>>>>> Ensembl Variation
>>>>>>
>>>>>> [1] : https://github.com/willmclaren/ensembl-vep
>>>>>> [2] : http://www.gencodegenes.org/releases/25lift37.html
>>>>>>
>>>>>> On 26 September 2016 at 20:40, Konrad Karczewski <
>>>>>> konradk at broadinstitute.org> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> When running VEP 85 on GRCh37, I believe the process has been to
>>>>>>> annotate against Gencode 19 (the info.txt seems to confirm this). Realizing
>>>>>>> the ridiculousness of my request, is there any chance there is a cache
>>>>>>> floating around for Gencode 25lift37? Would go a long way for ExAC
>>>>>>> releases.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> -Konrad
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
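The first reply in this thread prepares the GENCODE GFF by dropping comment
lines, sorting on sequence name, start and end, then compressing with bgzip
and indexing with tabix. A minimal sketch of that sort step as reusable shell
functions follows. The function names sort_gff3 and is_sorted_gff3 are
hypothetical (they are not part of VEP or htslib), and the tab field
separator and the stable-sort flag (-s) are additions that do not appear in
the command from the thread.

```shell
#!/usr/bin/env bash
# Sketch only: sort_gff3 and is_sorted_gff3 are illustrative names,
# not part of VEP or htslib.

# Read GFF3 on stdin, drop comment/directive lines (those starting with '#'),
# and sort features by sequence name, then start, then end.
# -s keeps the input order of features that tie on all three keys.
sort_gff3() {
  grep -v '^#' | LC_ALL=C sort -s -t "$(printf '\t')" -k1,1 -k4,4n -k5,5n
}

# Exit 0 if the features on stdin are already in that order,
# non-zero otherwise. Comment lines are ignored.
is_sorted_gff3() {
  grep -v '^#' | LC_ALL=C sort -s -c -t "$(printf '\t')" -k1,1 -k4,4n -k5,5n 2>/dev/null
}

# Example use (output file name is a placeholder; bgzip and tabix come
# from htslib):
#   gzip -dc gencode.v25lift37.annotation.gff3.gz | sort_gff3 | bgzip -c > sorted.gff3.gz
#   tabix -p gff sorted.gff3.gz
```

Checking with is_sorted_gff3 before indexing is optional; it is cheaper than
re-sorting when the file may already be in order.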