[ensembl-dev] Annotating variants against custom transcripts
Shawn Yost
yostshawn at gmail.com
Mon Jan 22 13:24:31 GMT 2018
Hi,
Thank you for the help. I added 'biotype=protein_coding' to the transcript
line and it seems to work now. However, I received the following warning
multiple times:
Use of uninitialized value in hash element at
/opt/gridware/apps/vep/91/modules/Bio/EnsEMBL/VEP/Stats.pm line 365,
<__ANONIO__> line 470.
Despite the warning, I do have annotations for all 392 variants I tested.
Do you know which function this warning refers to?
Thanks,
Shawn
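
For anyone finding this thread later, the fix can be sketched in Python
roughly as follows. This is only a minimal sketch, not part of VEP: the
function name add_biotype is hypothetical, it assumes the GFF3 file is
tab-separated with nine columns, and treating both "transcript" and "mRNA"
rows as transcripts is my own assumption. It appends the biotype attribute
to transcript rows that lack one and passes every other line through.

```python
def add_biotype(gff_lines, biotype="protein_coding",
                feature_types=("transcript", "mRNA")):
    """Yield GFF3 lines, adding biotype=<biotype> to transcript rows
    that do not already carry a biotype attribute.

    Assumes tab-separated, nine-column GFF3 records (hypothetical helper).
    """
    for line in gff_lines:
        stripped = line.rstrip("\n")
        # Pass blank lines and comments/directives through untouched.
        if not stripped or stripped.startswith("#"):
            yield stripped
            continue
        cols = stripped.split("\t")
        if len(cols) == 9 and cols[2] in feature_types:
            attrs = cols[8].rstrip(";")
            # Collect the attribute keys already present in column 9.
            keys = {kv.split("=", 1)[0] for kv in attrs.split(";") if kv}
            if "biotype" not in keys:
                extra = "biotype=" + biotype
                cols[8] = attrs + ";" + extra if attrs else extra
        yield "\t".join(cols)


if __name__ == "__main__":
    example = [
        "2\t.\ttranscript\t73612886\t73837046\t.\t+\t.\t"
        "ID=CART37A24648;hgnc_id=428;gene_symbol=ALMS1\n",
        "2\t.\texon\t73612886\t73613320\t.\t+\t.\t"
        "ID=EXON37A24648.1;Parent=CART37A24648\n",
    ]
    for row in add_biotype(example):
        print(row)
```

The output still has to be sorted, bgzipped and tabix-indexed before it is
passed to --gff.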
On Mon, Jan 22, 2018 at 1:04 PM, Helen Schuilenburg <helens at ebi.ac.uk> wrote:
> Hi Shawn
>
> You are correct, there is a problem reading the GFF.
>
> When using GFF with VEP, transcripts require a Sequence Ontology biotype to
> be defined in order to be parsed by VEP.
>
> The simplest way to define this is using an attribute named "biotype" on the
> transcript entity e.g. biotype=protein_coding
>
> Documentation on using GFF with VEP is here:
>
> https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gff
>
> Helen
>
>
>
> On 22/01/2018 11:30, Shawn Yost wrote:
>>
>> Hi,
>> I was able to install the latest version of VEP. I am now getting the
>> following error when running VEP:
>>
>> WARNING: Unable to determine biotype of CART37A24648
>>
>> The command I used was:
>> vep -i testing_annout.txt.hg19_multianno.vcf --gff blah.gff.gz --cache
>> -dir vep/ --hgvs --cache_version 75 --offline --force_overwrite
>> --fasta human_g1k_v37.fasta -o tmp
>>
>> The gff v3 file "blah.gff.gz" looks like this (zcat blah.gff.gz):
>> 2 . exon 73612886 73613320 . + . ID=EXON37A24648.1;Parent=CART37A24648
>> 2 . transcript 73612886 73837046 . + . ID=CART37A24648;hgnc_id=428;gene_symbol=ALMS1
>> 2 . CDS 73612997 73613320 . + 0 ID=CDS37A24648.1;Parent=CART37A24648
>> 2 . CDS 73635750 73635875 . + 0 ID=CDS37A24648.2;Parent=CART37A24648
>> 2 . exon 73635750 73635875 . + . ID=EXON37A24648.2;Parent=CART37A24648
>> 2 . CDS 73646251 73646446 . + 0 ID=CDS37A24648.3;Parent=CART37A24648
>> 2 . exon 73646251 73646446 . + . ID=EXON37A24648.3;Parent=CART37A24648
>> 2 . CDS 73649985 73650102 . + 2 ID=CDS37A24648.4;Parent=CART37A24648
>> 2 . exon 73649985 73650102 . + . ID=EXON37A24648.4;Parent=CART37A24648
>> 2 . CDS 73651558 73652030 . + 1 ID=CDS37A24648.5;Parent=CART37A24648
>> 2 . exon 73651558 73652030 . + . ID=EXON37A24648.5;Parent=CART37A24648
>> 2 . CDS 73653581 73653681 . + 2 ID=CDS37A24648.6;Parent=CART37A24648
>> 2 . exon 73653581 73653681 . + . ID=EXON37A24648.6;Parent=CART37A24648
>> 2 . CDS 73659326 73659419 . + 0 ID=CDS37A24648.7;Parent=CART37A24648
>> 2 . exon 73659326 73659419 . + . ID=EXON37A24648.7;Parent=CART37A24648
>> 2 . CDS 73675090 73681194 . + 2 ID=CDS37A24648.8;Parent=CART37A24648
>> 2 . exon 73675090 73681194 . + . ID=EXON37A24648.8;Parent=CART37A24648
>> 2 . CDS 73682289 73682422 . + 2 ID=CDS37A24648.9;Parent=CART37A24648
>> 2 . exon 73682289 73682422 . + . ID=EXON37A24648.9;Parent=CART37A24648
>> 2 . CDS 73716761 73718625 . + 0 ID=CDS37A24648.10;Parent=CART37A24648
>> 2 . exon 73716761 73718625 . + . ID=EXON37A24648.10;Parent=CART37A24648
>> 2 . CDS 73746902 73747143 . + 1 ID=CDS37A24648.11;Parent=CART37A24648
>> 2 . exon 73746902 73747143 . + . ID=EXON37A24648.11;Parent=CART37A24648
>> 2 . CDS 73761951 73762076 . + 2 ID=CDS37A24648.12;Parent=CART37A24648
>> 2 . exon 73761951 73762076 . + . ID=EXON37A24648.12;Parent=CART37A24648
>> 2 . CDS 73777394 73777564 . + 2 ID=CDS37A24648.13;Parent=CART37A24648
>> 2 . exon 73777394 73777564 . + . ID=EXON37A24648.13;Parent=CART37A24648
>> 2 . CDS 73784347 73784481 . + 2 ID=CDS37A24648.14;Parent=CART37A24648
>> 2 . exon 73784347 73784481 . + . ID=EXON37A24648.14;Parent=CART37A24648
>> 2 . CDS 73786099 73786269 . + 2 ID=CDS37A24648.15;Parent=CART37A24648
>> 2 . exon 73786099 73786269 . + . ID=EXON37A24648.15;Parent=CART37A24648
>> 2 . CDS 73799389 73800551 . + 2 ID=CDS37A24648.16;Parent=CART37A24648
>> 2 . exon 73799389 73800551 . + . ID=EXON37A24648.16;Parent=CART37A24648
>> 2 . CDS 73826528 73826648 . + 0 ID=CDS37A24648.17;Parent=CART37A24648
>> 2 . exon 73826528 73826648 . + . ID=EXON37A24648.17;Parent=CART37A24648
>> 2 . CDS 73827805 73828008 . + 2 ID=CDS37A24648.18;Parent=CART37A24648
>> 2 . exon 73827805 73828008 . + . ID=EXON37A24648.18;Parent=CART37A24648
>> 2 . CDS 73828322 73828563 . + 2 ID=CDS37A24648.19;Parent=CART37A24648
>> 2 . exon 73828322 73828563 . + . ID=EXON37A24648.19;Parent=CART37A24648
>> 2 . CDS 73829312 73829495 . + 0 ID=CDS37A24648.20;Parent=CART37A24648
>> 2 . exon 73829312 73829495 . + . ID=EXON37A24648.20;Parent=CART37A24648
>> 2 . CDS 73830368 73830431 . + 2 ID=CDS37A24648.21;Parent=CART37A24648
>> 2 . exon 73830368 73830431 . + . ID=EXON37A24648.21;Parent=CART37A24648
>> 2 . CDS 73835602 73835701 . + 1 ID=CDS37A24648.22;Parent=CART37A24648
>> 2 . exon 73835602 73835701 . + . ID=EXON37A24648.22;Parent=CART37A24648
>> 2 . CDS 73836695 73836739 . + 0 ID=CDS37A24648.23;Parent=CART37A24648
>> 2 . exon 73836695 73837046 . + . ID=EXON37A24648.23;Parent=CART37A24648
>>
>>
>>
>> Is there a problem with the gff file I am inputting?
>>
>> Thanks,
>> Shawn
>>
>>
>> On Thu, Jan 18, 2018 at 3:11 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
>>>
>>> Hi Shawn,
>>>
>>> the Bio::DB::HTS module should have been installed by INSTALL.pl. Did you
>>> set the DYLD_LIBRARY_PATH environment variable as asked during the
>>> installation?
>>>
>>> You can also install Bio::DB::HTS manually:
>>> https://github.com/Ensembl/Bio-DB-HTS.
>>>
>>> Anja
>>>
>>> On 18 Jan 2018, at 14:46, Shawn Yost <yostshawn at gmail.com> wrote:
>>>
>>> Hi,
>>> I've installed the latest version of vep and I'm now getting a new
>>> error. This error also occurred during the install/test step of vep:
>>>
>>> -------------------- EXCEPTION --------------------
>>> MSG: ERROR: Cannot use format gff without Bio::DB::HTS::Tabix module installed
>>>
>>> STACK Bio::EnsEMBL::VEP::AnnotationSource::File::new /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm:162
>>> STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_custom /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:228
>>> STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:93
>>> STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
>>> STACK Bio::EnsEMBL::VEP::Runner::init /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:123
>>> STACK Bio::EnsEMBL::VEP::Runner::next_output_line /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:356
>>> STACK toplevel ./t/Runner.t:739
>>> Date (localtime) = Thu Jan 18 14:26:33 2018
>>> Ensembl API version = 91
>>>
>>>
>>>
>>> I'm using a conda env to install/run VEP. Inside of the
>>> 'ensembl-vep/' directory Tabix does exist:
>>>
>>> ls -la Bio/DB/HTS/Tabix*
>>>
>>> -rw-rw---- 1 syost cancgene 6721 Jan 18 14:24 Bio/DB/HTS/Tabix.pm
>>>
>>> Bio/DB/HTS/Tabix:
>>> total 12
>>> drwxrwx--- 2 syost cancgene 4096 Jan 18 14:24 .
>>> drwxrwx--- 5 syost cancgene 4096 Jan 18 14:24 ..
>>> -rw-rw---- 1 syost cancgene 2974 Jan 18 14:24 Iterator.pm
>>>
>>>
>>> I've also installed Tabix separately and it works. Do you have any
>>> suggestions? Why isn't it recognizing the Tabix.pm?
>>>
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>>
>>>
>>> On Tue, Jan 16, 2018 at 2:21 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
>>>
>>> Hi Shawn,
>>>
>>> I noticed that you are not using the supported VEP code. You can install
>>> the new code by following the instructions here:
>>> http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html
>>>
>>> The new VEP code supports annotations against a GFF or GTF file:
>>> http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#other
>>>
>>> Anja
>>>
>>>
>>> On 16 Jan 2018, at 14:06, Shawn Yost <yostshawn at gmail.com> wrote:
>>>
>>> Hi,
>>> I would like to annotate my VCF file against a custom transcript
>>> database. I've created both a GFF v3 file and a GTF file (see below) and
>>> I have been unsuccessful in getting VEP to annotate against these
>>> transcripts. The examples below are before using sort + bgzip + tabix
>>> (so that is not the problem).
>>>
>>> I'm currently running VEP v85.
>>>
>>> The command I used was:
>>> variant_effect_predictor.pl -i IN.vcf --custom test.gff.gz,,gff -fasta
>>> test.fa --cache -o OUT -dir /path/to/cache --hgvs --cache_version 75
>>> --offline --force_overwrite
>>>
>>> In the output file I only see ENSTs and can't find the transcripts I
>>> supplied among them. The same thing occurs if I run --custom
>>> test.gtf.gz,,gtf. If I change the option to --custom
>>> test.gtf.gz,,gtf,overlap it reports whether a variant overlaps the
>>> supplied transcript, but it doesn't annotate against the transcript.
>>>
>>>
>>> Is there a problem with the command options I am using? Is there a
>>> problem with the input files? How do I get VEP to annotate the variant
>>> against my custom transcript (e.g. 412662 2:73613056 A GENE1
>>> TRANSCRIPT1 Transcript synonymous_variant ....)?
>>>
>>>
>>>
>>> Example output:
>>> 241004 2:73613032-73613049 - ENSG00000116127 ENST00000264448 Transcript inframe_deletion 147-164 36-53 12-18 LEEEEEE/L ctGGAGGAGGAGGAGGAGGAg/ctg - IMPACT=MODERATE;STRAND=1;HGVSc=ENST00000264448.6:c.36_53delNNNNNNNNNNNNNNNNNN;HGVSp=ENSP00000264448.6:p.Glu23_Glu28del
>>> 412662 2:73613056 A ENSG00000116127 ENST00000377715 Transcript synonymous_variant 171 60 20 E gaG/gaA - IMPACT=LOW;STRAND=1;HGVSc=ENST00000377715.1:c.60N>A;HGVSp=ENST00000377715.1:c.60N>A(p.%3D)
>>> 412662 2:73613056 A ENSG00000116127 ENST00000409009 Transcript synonymous_variant 171 60 20 E gaG/gaA - IMPACT=LOW;STRAND=1;HGVSc=ENST00000409009.1:c.60N>A;HGVSp=ENST00000409009.1:c.60N>A(p.%3D)
>>> 412662 2:73613056 A ENSG00000116127 ENST00000264448 Transcript synonymous_variant 171 60 20 E gaG/gaA - IMPACT=LOW;STRAND=1;HGVSc=ENST00000264448.6:c.60N>A;HGVSp=ENST00000264448.6:c.60N>A(p.%3D)
>>> 402364 2:73613066-73613071 - ENSG00000116127 ENST00000377715 Transcript inframe_deletion 181-186 70-75 24-25 EE/- GAGGAA/- - IMPACT=MODERATE;STRAND=1;HGVSc=ENST00000377715.1:c.70_75delNNNNNN;HGVSp=ENSP00000366944.1:p.Glu27_Glu28del
>>>
>>>
>>>
>>> GFF v3 file:
>>> 15 . transcript 74701625 74726300 . - . ID=TRANSCRIPT1;Alias=10741;Name=SEMA7A
>>> 15 . exon 74726082 74726300 . - . ID=EXON37A10411.1;Parent=TRANSCRIPT1
>>> 15 . exon 74711142 74711293 . - . ID=EXON37A10411.2;Parent=TRANSCRIPT1
>>> 15 . exon 74710609 74710650 . - . ID=EXON37A10411.3;Parent=TRANSCRIPT1
>>> 15 . exon 74710218 74710310 . - . ID=EXON37A10411.4;Parent=TRANSCRIPT1
>>> 15 . exon 74709932 74710016 . - . ID=EXON37A10411.5;Parent=TRANSCRIPT1
>>> 15 . exon 74709676 74709786 . - . ID=EXON37A10411.6;Parent=TRANSCRIPT1
>>> 15 . exon 74708916 74709055 . - . ID=EXON37A10411.7;Parent=TRANSCRIPT1
>>> 15 . exon 74708142 74708326 . - . ID=EXON37A10411.8;Parent=TRANSCRIPT1
>>> 15 . exon 74707179 74707287 . - . ID=EXON37A10411.9;Parent=TRANSCRIPT1
>>> 15 . exon 74706888 74707086 . - . ID=EXON37A10411.10;Parent=TRANSCRIPT1
>>> 15 . exon 74704226 74704353 . - . ID=EXON37A10411.11;Parent=TRANSCRIPT1
>>> 15 . exon 74703897 74704051 . - . ID=EXON37A10411.12;Parent=TRANSCRIPT1
>>> 15 . exon 74703636 74703697 . - . ID=EXON37A10411.13;Parent=TRANSCRIPT1
>>> 15 . exon 74701625 74703326 . - . ID=EXON37A10411.14;Parent=TRANSCRIPT1
>>> 15 . CDS 74726082 74726259 . - 0 ID=CDS37A10411.1;Parent=TRANSCRIPT1
>>> 15 . CDS 74711142 74711293 . - 2 ID=CDS37A10411.2;Parent=TRANSCRIPT1
>>> 15 . CDS 74710609 74710650 . - 0 ID=CDS37A10411.3;Parent=TRANSCRIPT1
>>> 15 . CDS 74710218 74710310 . - 0 ID=CDS37A10411.4;Parent=TRANSCRIPT1
>>> 15 . CDS 74709932 74710016 . - 0 ID=CDS37A10411.5;Parent=TRANSCRIPT1
>>> 15 . CDS 74709676 74709786 . - 2 ID=CDS37A10411.6;Parent=TRANSCRIPT1
>>> 15 . CDS 74708916 74709055 . - 2 ID=CDS37A10411.7;Parent=TRANSCRIPT1
>>> 15 . CDS 74708142 74708326 . - 0 ID=CDS37A10411.8;Parent=TRANSCRIPT1
>>> 15 . CDS 74707179 74707287 . - 1 ID=CDS37A10411.9;Parent=TRANSCRIPT1
>>> 15 . CDS 74706888 74707086 . - 0 ID=CDS37A10411.10;Parent=TRANSCRIPT1
>>> 15 . CDS 74704226 74704353 . - 2 ID=CDS37A10411.11;Parent=TRANSCRIPT1
>>> 15 . CDS 74703897 74704051 . - 0 ID=CDS37A10411.12;Parent=TRANSCRIPT1
>>> 15 . CDS 74703636 74703697 . - 1 ID=CDS37A10411.13;Parent=TRANSCRIPT1
>>> 15 . CDS 74702965 74703326 . - 2 ID=CDS37A10411.14;Parent=TRANSCRIPT1
>>>
>>>
>>> GTF file:
>>> TRANSCRIPT1 15 - 74701624 74726300 74702964 74726259 14 74701624,74703635,74703896,74704225,74706887,74707178,74708141,74708915,74709675,74709931,74710217,74710608,74711141,74726081, 74703326,74703697,74704051,74704353,74707086,74707287,74708326,74709055,74709786,74710016,74710310,74710650,74711293,74726300, 0 HGNC:10741 cmpl cmpl 1,2,0,1,0,2,0,1,1,0,0,0,1,0,
>>> TRANSCRIPT2 17 - 8108048 8113944 8108188 8113542 9 8108048,8108533,8109808,8110067,8110493,8110888,8111055,8113494,8113847, 8108362,8108708,8109957,8110206,8110685,8110943,8111158,8113567,8113944, 0 HGNC:11390 cmpl cmpl 0,2,0,2,2,1,0,0,-1,
>>> TRANSCRIPT3 11 - 73711325 73720282 73712456 73718087 7 73711325,73714871,73715528,73716774,73717213,73717961,73720022, 73712571,73715052,73715630,73716978,73717424,73718182,73720282, 0 HGNC:12519 cmpl cmpl 2,1,1,1,0,0,-1,
>>> TRANSCRIPT4 11 - 123594634 123612391 123596704 123601596 9 123594634,123598183,123598840,123599833,123600322,123601194,123610824,123611124,123612256, 123597699,123598303,123598970,123599922,123600533,123601693,123610900,123611244,123612391, 0 HGNC:12994 cmpl cmpl 1,1,0,1,0,0,-1,-1,-1,
>>> TRANSCRIPT5 19 - 51994480 52005043 51994894 52004987 8 51994480,52000133,52000602,52001271,52002423,52002691,52003173,52004560, 51995083,52000230,52000699,52001541,52002471,52002970,52003554,52005043, 0 HGNC:15482 cmpl cmpl 0,2,1,1,1,1,1,0,
>>> TRANSCRIPT6 1 + 228194722 228248972 228194829 228247166 4 228194722,228210367,228238356,228246686, 228194900,228210609,228238622,228248972, 0 HGNC:15983 cmpl cmpl 0,2,1,0,
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Thank you for your help,
>>> Shawn
>>>
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/