[ensembl-dev] VEP on 37, but Gencode 25?

Konrad Karczewski konradk at broadinstitute.org
Thu Sep 29 14:21:23 BST 2016


Great, thanks! Will check that out.

Is that to say there's no way to get the SIFT and PolyPhen annotations
locally? Happy to do some legwork if it means I can recreate the entire
thing with this new annotation set!

-Konrad

On September 29, 2016 at 3:52:47 AM, Will McLaren (wm2 at ebi.ac.uk) wrote:

You'd also need to copy over the homo_sapiens/85_GRCh37/info.txt file. It
contains the column headers for the _var files, which is why VEP warns when
it finds data that doesn't match its best guess of those headers.
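
A minimal sketch of that copy step, assuming hypothetical cache locations
(the function name copy_info_file and any directory paths passed to it are
illustrative, not taken from this thread):

```shell
# Copy info.txt from a stock release 85 cache directory into a custom one.
# $1 = source cache directory (e.g. .../homo_sapiens/85_GRCh37)
# $2 = destination (custom) cache directory
copy_info_file() {
    if [ ! -f "$1/info.txt" ]; then
        echo "info.txt not found in $1" >&2
        return 1
    fi
    # Create the destination if needed, then copy the header file across.
    mkdir -p "$2" && cp "$1/info.txt" "$2/info.txt"
}
```

For example: copy_info_file ~/.vep/homo_sapiens/85_GRCh37
/path/to/custom_cache/homo_sapiens/85_GRCh37, with both paths replaced by
your own.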

RE: SIFT and PolyPhen: if you use --cache instead of --offline, you *might*
find that it is able to retrieve SIFT and PolyPhen matrices from the
database server. I've tested this with the new code but not the version
you're on. You might also want to use "--host useastdb.ensembl.org";
assuming you're on the East Coast, this will give you the fastest (public)
DB connection.
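
As a rough sketch, the command line might look like the one printed below.
The helper name vep_cache_command, the input file and the cache directory
are hypothetical; only --cache and the host come from this thread, and the
--sift b / --polyphen b switches are assumed to be how the predictions are
being requested. Port and assembly options are left out because the thread
does not specify them.

```shell
# Print (not run) a VEP command that uses --cache rather than --offline,
# so SIFT/PolyPhen matrices can be fetched from a database server if the
# installed version supports it.
# $1 = input VCF, $2 = cache directory, $3 = database host
vep_cache_command() {
    printf 'perl variant_effect_predictor.pl -i %s --cache --dir %s --host %s --sift b --polyphen b\n' "$1" "$2" "$3"
}

# Hypothetical file and directory names.
vep_cache_command variants.vcf ./vep_cache useastdb.ensembl.org
```

Printing the command first makes it easy to check the options before
starting a long annotation run.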

Will

On 28 September 2016 at 21:00, Konrad Karczewski
<konradk at broadinstitute.org> wrote:

> Ok, I think I got that mostly working (sorted it properly and converted
> transcript_type to transcript_biotype, appears to have worked). I then
> pulled the _var and _reg caches over as-is from 85 (not sure if wise).
>
> Now when I run it, it appears to complete without error, but I'm running
> into many of these warnings:
>
> Use of uninitialized value in list assignment at
> /humgen/atgu1/fs03/DM-Lab/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm
> line 5344, <DUMP> line 1.
>
> Also, SIFT and PolyPhen don't appear to get output alongside it. Is that
> expected (or perhaps related to above warnings)? Anything I can do to get
> those in there?
>
> -Konrad
>
> On September 27, 2016 at 10:58:54 AM, Will McLaren (wm2 at ebi.ac.uk) wrote:
>
> You can try running it with --verbose, it will give you some error logging.
>
> Will
>
> On 27 September 2016 at 15:56, Konrad Karczewski <
> konradk at broadinstitute.org> wrote:
>
>> Ok good to know - I actually tried it, but I think something odd is going
>> on. It gets through the whole thing (going back and forth between
>> chromosomes like you said, so I can try to fix that), but then appears to
>> finish:
>>
>> 2016-09-26 16:12:30 - Processing chromosome Y
>> WARNING: Could not find chromosome named M in FASTA file
>> 2016-09-26 16:12:52 - All done!
>>
>> But the output directory (either ~/.vep or the directory I pointed to
>> with --dir) is empty. Is this a related issue? Thought you might want to
>> know to add a bit of error logging if so.
>>
>> -Konrad
>>
>> On September 27, 2016 at 8:30:15 AM, Will McLaren (wm2 at ebi.ac.uk) wrote:
>>
>> In theory this should work, but the gtf2vep.pl script doesn't seem to
>> work too well with this particular GFF (it was designed really to work with
>> GFF/GTFs as produced by Ensembl or NCBI). Probably with some tweaks it
>> could be made to work - I believe the major issues are caused by features
>> being out of the order that the script expects.
>>
>> The new code uses a much more robust system for constructing transcripts
>> and has been tested with GFFs from Ensembl, NCBI and GENCODE.
>>
>> Will
>>
>> On 27 September 2016 at 13:22, Konrad Karczewski <
>> konradk at broadinstitute.org> wrote:
>>
>>> I just also realized - would creating a cache from this gff file (using
>>> gtf2vep.pl) not be recommended?
>>>
>>> -Konrad
>>>
>>> On September 27, 2016 at 5:16:42 AM, Will McLaren (wm2 at ebi.ac.uk) wrote:
>>>
>>> Hi Konrad,
>>>
>>> The beta ensembl-vep code [1] supports annotation directly from a GFF
>>> file, such as the one available from the GENCODE website [2].
>>>
>>> $ curl ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/GRCh37_mapping/gencode.v25lift37.annotation.gff3.gz \
>>>     | gzip -dc | grep -v "^#" | sort -k1,1 -k4,4n -k5,5n | bgzip -c \
>>>     > gencode.v25lift37.annotation.gff3.gz
>>> $ tabix -p gff gencode.v25lift37.annotation.gff3.gz
>>> $ perl vep.pl -i variants.vcf -gff gencode.v25lift37.annotation.gff3.gz \
>>>     -fasta homo_sapiens.fa
>>>
>>> This comes with limitations, as the GFF file contains only the transcript
>>> structure and not any of the additional annotations. However, I do know of
>>> someone successfully using LOFTEE with this exact setup.
>>>
>>> Of course, the usual beta caveats apply, so if you do use it and find bugs,
>>> please report them on the GitHub page.
>>>
>>> Regards
>>>
>>> Will McLaren
>>> Ensembl Variation
>>>
>>> [1] : https://github.com/willmclaren/ensembl-vep
>>> [2] : http://www.gencodegenes.org/releases/25lift37.html
>>>
>>> On 26 September 2016 at 20:40, Konrad Karczewski <
>>> konradk at broadinstitute.org> wrote:
>>>
>>>> Hi all,
>>>>
>>>> When running VEP 85 on GRCh37, I believe the process has been to
>>>> annotate against Gencode 19 (the info.txt seems to confirm this). Realizing
>>>> the ridiculousness of my request, is there any chance there is a cache
>>>> floating around for Gencode 25lift37? Would go a long way for ExAC
>>>> releases.
>>>>
>>>> Thanks!
>>>> -Konrad
>>>>
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/