[ensembl-dev] VEP on 37, but Gencode 25?

Konrad Karczewski konradk at broadinstitute.org
Thu Sep 29 15:48:34 BST 2016


Hmm, that's interesting. After I added info.txt, everything failed with:

Can't call method "db" on unblessed reference at
/humgen/atgu1/fs03/DM-Lab/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariation.pm
line 324.

-Konrad

On September 29, 2016 at 9:34:18 AM, Will McLaren (wm2 at ebi.ac.uk) wrote:

We might be able to write a plugin to read the data from a pair of table
dump files.

Let me have a go at doing that, as you are not the only person requesting
similar at the moment!

Will

On 29 September 2016 at 14:21, Konrad Karczewski
<konradk at broadinstitute.org> wrote:

> Great, thanks! Will check that out.
>
> Is that to say there's no way to get the SIFT and PolyPhen annotations
> locally? Happy to do some legwork if it means I can recreate the entire
> thing with this new annotation set!
>
> -Konrad
>
> On September 29, 2016 at 3:52:47 AM, Will McLaren (wm2 at ebi.ac.uk) wrote:
>
> You'd also need to copy over the homo_sapiens/85_GRCh37/info.txt file,
> this contains the column headers for the _var files, hence the warnings
> when it finds data that doesn't match its best guess of those headers.
>
> RE: SIFT and PolyPhen, if you use --cache instead of --offline you *might*
> find that it is able to retrieve SIFT and PolyPhen matrices from the
> database server. I've tested this with the new code but not the version
> you're on. You might also want to use "--host useastdb.ensembl.org",
> assuming you're East Coast, this will give you the fastest (public) DB
> connection.
>
> Will
>
> On 28 September 2016 at 21:00, Konrad Karczewski
> <konradk at broadinstitute.org> wrote:
>
>> Ok, I think I got that mostly working (sorted it properly and converted
>> transcript_type to transcript_biotype, appears to have worked). I then
>> pulled the _var and _reg caches over as-is from 85 (not sure if wise).
>>
>> Now when I run it, it appears to complete without error, but I'm running
>> into many of these warnings:
>>
>> Use of uninitialized value in list assignment at
>> /humgen/atgu1/fs03/DM-Lab/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm
>> line 5344, <DUMP> line 1.
>>
>> Also, SIFT and PolyPhen don't appear to get output alongside it. Is that
>> expected (or perhaps related to above warnings)? Anything I can do to get
>> those in there?
>>
>> -Konrad
>>
>> On September 27, 2016 at 10:58:54 AM, Will McLaren (wm2 at ebi.ac.uk) wrote:
>>
>> You can try running it with --verbose, it will give you some error
>> logging.
>>
>> Will
>>
>> On 27 September 2016 at 15:56, Konrad Karczewski
>> <konradk at broadinstitute.org> wrote:
>>
>>> Ok good to know - I actually tried it, but I think something is being
>>> odd. It gets through the whole thing (going back and forth between
>>> chromosomes like you said, so I can try to fix that), but then appears to
>>> finish:
>>>
>>> 2016-09-26 16:12:30 - Processing chromosome Y
>>> WARNING: Could not find chromosome named M in FASTA file
>>> 2016-09-26 16:12:52 - All done!
>>>
>>> But the output directory (either ~/.vep or the directory I pointed to
>>> with --dir) is empty. Is this a related issue? Thought you might want to
>>> know so you can add a bit of error logging if so.
>>>
>>> -Konrad
>>>
>>> On September 27, 2016 at 8:30:15 AM, Will McLaren (wm2 at ebi.ac.uk) wrote:
>>>
>>> In theory this should work, but the gtf2vep.pl script doesn't seem to
>>> work too well with this particular GFF (it was designed really to work with
>>> GFF/GTFs as produced by Ensembl or NCBI). Probably with some tweaks it
>>> could be made to work - I believe the major issues are caused by features
>>> being out of the order that the script expects.
>>>
>>> The new code uses a much more robust system for constructing transcripts
>>> and has been tested with GFFs from Ensembl, NCBI and GENCODE.
>>>
>>> Will
>>>
>>> On 27 September 2016 at 13:22, Konrad Karczewski
>>> <konradk at broadinstitute.org> wrote:
>>>
>>>> I just also realized - would creating a cache from this gff file (using
>>>> gtf2vep.pl) not be recommended?
>>>>
>>>> -Konrad
>>>>
>>>> On September 27, 2016 at 5:16:42 AM, Will McLaren (wm2 at ebi.ac.uk)
>>>> wrote:
>>>>
>>>> Hi Konrad,
>>>>
>>>> The beta ensembl-vep code [1] supports annotation directly from a GFF
>>>> file, such as the one available from the GENCODE website [2].
>>>>
>>>> $ curl ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/GRCh37_mapping/gencode.v25lift37.annotation.gff3.gz \
>>>>     | gzip -dc | grep -v "#" | sort -k1,1 -k4,4n -k5,5n \
>>>>     | bgzip -c > gencode.v25lift37.annotation.gff3.gz
>>>> $ tabix -p gff gencode.v25lift37.annotation.gff3.gz
>>>> $ perl vep.pl -i variants.vcf -gff gencode.v25lift37.annotation.gff3.gz \
>>>>     -fasta homo_sapiens.fa
>>>>
>>>> This comes with limitations as the GFF file contains only the
>>>> transcript structure and not any of the additional annotations. However I
>>>> do know of someone successfully using LOFTEE with this exact setup.
>>>>
>>>> Of course usual beta caveats apply, so if you do use it and find bugs
>>>> please report on the GitHub page.
>>>>
>>>> Regards
>>>>
>>>> Will McLaren
>>>> Ensembl Variation
>>>>
>>>> [1] : https://github.com/willmclaren/ensembl-vep
>>>> [2] : http://www.gencodegenes.org/releases/25lift37.html
>>>>
>>>> On 26 September 2016 at 20:40, Konrad Karczewski
>>>> <konradk at broadinstitute.org> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> When running VEP 85 on GRCh37, I believe the process has been to
>>>>> annotate against Gencode 19 (the info.txt seems to confirm this). Realizing
>>>>> the ridiculousness of my request, is there any chance there is a cache
>>>>> floating around for Gencode 25lift37? Would go a long way for ExAC
>>>>> releases.
>>>>>
>>>>> Thanks!
>>>>> -Konrad
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
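A minimal sketch of the sorting step from the GFF preparation quoted in the thread, wrapped as a shell function so it can be tried on a few lines before running it on the full GENCODE download. The function name sort_gff and the output filename in the usage comment are hypothetical. Two deliberate differences from the quoted command, both assumptions on my part rather than anything the thread states: comment lines are matched with '^#' (only lines that start with '#') instead of any line containing '#', and the sort is told explicitly that columns are tab-separated.

```shell
#!/usr/bin/env bash
# sort_gff (hypothetical helper): read GFF3 on stdin, write tabix-ready order to stdout.
sort_gff() {
    # Drop header/comment lines that begin with '#', then order records by
    # sequence name (column 1), start (column 4, numeric), end (column 5, numeric).
    # LC_ALL=C keeps the sequence-name ordering byte-wise and reproducible.
    grep -v '^#' | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n -k5,5n
}

# Usage, assuming bgzip and tabix (htslib) are on PATH; sorted.gff3.gz is a
# placeholder output name:
#   gzip -dc gencode.v25lift37.annotation.gff3.gz | sort_gff | bgzip -c > sorted.gff3.gz
#   tabix -p gff sorted.gff3.gz
```

Features that share a start coordinate (for example a gene and its first exon) are ordered by end coordinate, which matches the -k5,5n key in the original pipeline.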