[ensembl-dev] Annotating variants against custom transcripts
Helen Schuilenburg
helens at ebi.ac.uk
Mon Jan 22 13:04:28 GMT 2018
Hi Shawn
You are correct, there is a problem reading the GFF.
When using GFF with VEP, transcripts require a Sequence Ontology biotype
to be defined in order to be parsed by VEP.
The simplest way to define this is using an attribute named "biotype" on
the transcript entity e.g. biotype=protein_coding
There is documentation on using GFF with VEP here:
https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gff
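
As an illustration only, here is a minimal Python sketch of one way to add the
attribute to an existing GFF3 file. The function name add_biotype and the
choice of which feature types count as transcripts ("transcript" and "mRNA")
are assumptions made for this example, not part of VEP; protein_coding is
used as the default because it is the value mentioned above.

```python
import sys


def add_biotype(gff_line, biotype="protein_coding",
                transcript_types=("transcript", "mRNA")):
    """Return gff_line with a biotype attribute appended to transcript rows.

    Hypothetical helper for illustration; transcript_types is an assumption
    about which feature types should be treated as transcripts.
    """
    # Leave comments, directives and blank lines untouched.
    if gff_line.startswith("#") or not gff_line.strip():
        return gff_line

    ending = "\n" if gff_line.endswith("\n") else ""
    cols = gff_line.rstrip("\n").split("\t")

    # GFF3 rows have nine tab-separated columns; column 3 is the feature type.
    if len(cols) != 9 or cols[2] not in transcript_types:
        return gff_line

    attrs = [a for a in cols[8].split(";") if a]
    keys = [a.split("=", 1)[0] for a in attrs]

    # Only add the attribute if the row does not already define one.
    if "biotype" not in keys:
        attrs.append("biotype=" + biotype)

    cols[8] = ";".join(attrs)
    return "\t".join(cols) + ending


if __name__ == "__main__":
    # Filter usage: read GFF3 on stdin, write the modified GFF3 to stdout.
    for line in sys.stdin:
        sys.stdout.write(add_biotype(line))
```

The output is still plain GFF3, so it needs to be sorted, compressed with
bgzip and indexed with tabix again before it is passed to VEP.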
Helen
On 22/01/2018 11:30, Shawn Yost wrote:
> Hi,
> I was able to install the latest version of VEP. I am now getting the
> following error when running VEP:
>
> WARNING: Unable to determine biotype of CART37A24648
>
> The command I used was:
> vep -i testing_annout.txt.hg19_multianno.vcf --gff blah.gff.gz --cache
> -dir vep/ --hgvs --cache_version 75 --offline --force_overwrite
> --fasta human_g1k_v37.fasta -o tmp
>
> The gff v3 file "blah.gff.gz" looks like this (zcat blah.gff.gz):
> 2 . exon 73612886 73613320 . +
> . ID=EXON37A24648.1;Parent=CART37A24648
> 2 . transcript 73612886 73837046 .
> + . ID=CART37A24648;hgnc_id=428;gene_symbol=ALMS1
> 2 . CDS 73612997 73613320 . +
> 0 ID=CDS37A24648.1;Parent=CART37A24648
> 2 . CDS 73635750 73635875 . +
> 0 ID=CDS37A24648.2;Parent=CART37A24648
> 2 . exon 73635750 73635875 . +
> . ID=EXON37A24648.2;Parent=CART37A24648
> 2 . CDS 73646251 73646446 . +
> 0 ID=CDS37A24648.3;Parent=CART37A24648
> 2 . exon 73646251 73646446 . +
> . ID=EXON37A24648.3;Parent=CART37A24648
> 2 . CDS 73649985 73650102 . +
> 2 ID=CDS37A24648.4;Parent=CART37A24648
> 2 . exon 73649985 73650102 . +
> . ID=EXON37A24648.4;Parent=CART37A24648
> 2 . CDS 73651558 73652030 . +
> 1 ID=CDS37A24648.5;Parent=CART37A24648
> 2 . exon 73651558 73652030 . +
> . ID=EXON37A24648.5;Parent=CART37A24648
> 2 . CDS 73653581 73653681 . +
> 2 ID=CDS37A24648.6;Parent=CART37A24648
> 2 . exon 73653581 73653681 . +
> . ID=EXON37A24648.6;Parent=CART37A24648
> 2 . CDS 73659326 73659419 . +
> 0 ID=CDS37A24648.7;Parent=CART37A24648
> 2 . exon 73659326 73659419 . +
> . ID=EXON37A24648.7;Parent=CART37A24648
> 2 . CDS 73675090 73681194 . +
> 2 ID=CDS37A24648.8;Parent=CART37A24648
> 2 . exon 73675090 73681194 . +
> . ID=EXON37A24648.8;Parent=CART37A24648
> 2 . CDS 73682289 73682422 . +
> 2 ID=CDS37A24648.9;Parent=CART37A24648
> 2 . exon 73682289 73682422 . +
> . ID=EXON37A24648.9;Parent=CART37A24648
> 2 . CDS 73716761 73718625 . +
> 0 ID=CDS37A24648.10;Parent=CART37A24648
> 2 . exon 73716761 73718625 . +
> . ID=EXON37A24648.10;Parent=CART37A24648
> 2 . CDS 73746902 73747143 . +
> 1 ID=CDS37A24648.11;Parent=CART37A24648
> 2 . exon 73746902 73747143 . +
> . ID=EXON37A24648.11;Parent=CART37A24648
> 2 . CDS 73761951 73762076 . +
> 2 ID=CDS37A24648.12;Parent=CART37A24648
> 2 . exon 73761951 73762076 . +
> . ID=EXON37A24648.12;Parent=CART37A24648
> 2 . CDS 73777394 73777564 . +
> 2 ID=CDS37A24648.13;Parent=CART37A24648
> 2 . exon 73777394 73777564 . +
> . ID=EXON37A24648.13;Parent=CART37A24648
> 2 . CDS 73784347 73784481 . +
> 2 ID=CDS37A24648.14;Parent=CART37A24648
> 2 . exon 73784347 73784481 . +
> . ID=EXON37A24648.14;Parent=CART37A24648
> 2 . CDS 73786099 73786269 . +
> 2 ID=CDS37A24648.15;Parent=CART37A24648
> 2 . exon 73786099 73786269 . +
> . ID=EXON37A24648.15;Parent=CART37A24648
> 2 . CDS 73799389 73800551 . +
> 2 ID=CDS37A24648.16;Parent=CART37A24648
> 2 . exon 73799389 73800551 . +
> . ID=EXON37A24648.16;Parent=CART37A24648
> 2 . CDS 73826528 73826648 . +
> 0 ID=CDS37A24648.17;Parent=CART37A24648
> 2 . exon 73826528 73826648 . +
> . ID=EXON37A24648.17;Parent=CART37A24648
> 2 . CDS 73827805 73828008 . +
> 2 ID=CDS37A24648.18;Parent=CART37A24648
> 2 . exon 73827805 73828008 . +
> . ID=EXON37A24648.18;Parent=CART37A24648
> 2 . CDS 73828322 73828563 . +
> 2 ID=CDS37A24648.19;Parent=CART37A24648
> 2 . exon 73828322 73828563 . +
> . ID=EXON37A24648.19;Parent=CART37A24648
> 2 . CDS 73829312 73829495 . +
> 0 ID=CDS37A24648.20;Parent=CART37A24648
> 2 . exon 73829312 73829495 . +
> . ID=EXON37A24648.20;Parent=CART37A24648
> 2 . CDS 73830368 73830431 . +
> 2 ID=CDS37A24648.21;Parent=CART37A24648
> 2 . exon 73830368 73830431 . +
> . ID=EXON37A24648.21;Parent=CART37A24648
> 2 . CDS 73835602 73835701 . +
> 1 ID=CDS37A24648.22;Parent=CART37A24648
> 2 . exon 73835602 73835701 . +
> . ID=EXON37A24648.22;Parent=CART37A24648
> 2 . CDS 73836695 73836739 . +
> 0 ID=CDS37A24648.23;Parent=CART37A24648
> 2 . exon 73836695 73837046 . +
> . ID=EXON37A24648.23;Parent=CART37A24648
>
>
>
> Is there a problem with the gff file I am inputting?
>
> Thanks,
> Shawn
>
>
> On Thu, Jan 18, 2018 at 3:11 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
>> Hi Shawn,
>>
>> the Bio::DB::HTS module should have been installed by INSTALL.pl. Did you
>> set the DYLD_LIBRARY_PATH environment variable as asked during the
>> installation?
>>
>> You can also install Bio::DB::HTS manually:
>> https://github.com/Ensembl/Bio-DB-HTS.
>>
>> Anja
>>
>> On 18 Jan 2018, at 14:46, Shawn Yost <yostshawn at gmail.com> wrote:
>>
>> Hi,
>> I've installed the latest version of vep and I'm now getting a new
>> error. This error also occurred during the install/test step of vep:
>>
>> -------------------- EXCEPTION --------------------
>> MSG: ERROR: Cannot use format gff without Bio::DB::HTS::Tabix module
>> installed
>>
>> STACK Bio::EnsEMBL::VEP::AnnotationSource::File::new
>> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm:162
>> STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_custom
>> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:228
>> STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all
>> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:93
>> STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources
>> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
>> STACK Bio::EnsEMBL::VEP::Runner::init
>> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:123
>> STACK Bio::EnsEMBL::VEP::Runner::next_output_line
>> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:356
>> STACK toplevel ./t/Runner.t:739
>> Date (localtime) = Thu Jan 18 14:26:33 2018
>> Ensembl API version = 91
>>
>>
>>
>> I'm using a conda env to install/run VEP. Inside of the
>> 'ensembl-vep/' directory Tabix does exist:
>>
>> ls -la Bio/DB/HTS/Tabix*
>>
>> -rw-rw---- 1 syost cancgene 6721 Jan 18 14:24 Bio/DB/HTS/Tabix.pm
>>
>> Bio/DB/HTS/Tabix:
>> total 12
>> drwxrwx--- 2 syost cancgene 4096 Jan 18 14:24 .
>> drwxrwx--- 5 syost cancgene 4096 Jan 18 14:24 ..
>> -rw-rw---- 1 syost cancgene 2974 Jan 18 14:24 Iterator.pm
>>
>>
>> I've also installed Tabix separately and it works. Do you have any
>> suggestions? Why isn't it recognizing the Tabix.pm?
>>
>>
>> Thanks,
>> Shawn
>>
>>
>>
>>
>> On Tue, Jan 16, 2018 at 2:21 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
>>
>> Hi Shawn,
>>
>> I noticed that you are not using the supported VEP code. You can install the
>> new code by following the instructions here:
>> http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html
>>
>> The new VEP code supports annotations against a GFF or GTF file:
>> http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#other
>>
>> Anja
>>
>>
>> On 16 Jan 2018, at 14:06, Shawn Yost <yostshawn at gmail.com> wrote:
>>
>> Hi,
>> I would like to annotate my VCF file against a custom transcript database.
>> I've created both a GFF v3 file and a GTF file (see below) and I have been
>> unsuccessful in getting VEP to annotate against these transcripts. The
>> examples below are before using sort + bgzip + tabix (so that is not the
>> problem).
>>
>> I'm currently running VEP v85.
>>
>> The command I used was:
>> variant_effect_predictor.pl -i IN.vcf --custom test.gff.gz,,gff -fasta
>> test.fa --cache -o OUT -dir /path/to/cache --hgvs --cache_version 75
>> --offline --force_overwrite
>>
>> In the outputted file I only see ENSTs and can't find the transcripts I
>> inputted along with them. The same thing occurs if I run --custom
>> test.gtf.gz,,gtf. If I change the command to --custom
>> test.gtf.gz,,gtf,overlap it will tell me if it overlaps the inputted
>> transcript but it doesn't annotate against the transcript.
>>
>>
>> Is there a problem with the command options I am using? Is there a problem
>> with the inputted files? How do I get VEP to annotate the variant against
>> my custom transcript (i.e. 412662 2:73613056 A GENE1
>> TRANSCRIPT1 Transcript synonymous_variant ....)?
>>
>>
>>
>> Example output:
>> 241004 2:73613032-73613049 - ENSG00000116127 ENST00000264448
>> Transcript inframe_deletion 147-164 36-53 12-18
>> LEEEEEE/L ctGGAGGAGGAGGAGGAGGAg/ctg -
>> IMPACT=MODERATE;STRAND=1;HGVSc=ENST00000264448.6:c.36_53delNNNNNNNNNNNNNNNNNN;HGVSp=ENSP00000264448.6:p.Glu23_Glu28del
>> 412662 2:73613056 A ENSG00000116127 ENST00000377715 Transcript
>> synonymous_variant 171 60 20 E gaG/gaA -
>> IMPACT=LOW;STRAND=1;HGVSc=ENST00000377715.1:c.60N>A;HGVSp=ENST00000377715.1:c.60N>A(p.%3D)
>> 412662 2:73613056 A ENSG00000116127 ENST00000409009 Transcript
>> synonymous_variant 171 60 20 E gaG/gaA -
>> IMPACT=LOW;STRAND=1;HGVSc=ENST00000409009.1:c.60N>A;HGVSp=ENST00000409009.1:c.60N>A(p.%3D)
>> 412662 2:73613056 A ENSG00000116127 ENST00000264448 Transcript
>> synonymous_variant 171 60 20 E gaG/gaA -
>> IMPACT=LOW;STRAND=1;HGVSc=ENST00000264448.6:c.60N>A;HGVSp=ENST00000264448.6:c.60N>A(p.%3D)
>> 402364 2:73613066-73613071 - ENSG00000116127 ENST00000377715
>> Transcript inframe_deletion 181-186 70-75 24-25 EE/-
>> GAGGAA/- -
>> IMPACT=MODERATE;STRAND=1;HGVSc=ENST00000377715.1:c.70_75delNNNNNN;HGVSp=ENSP00000366944.1:p.Glu27_Glu28del
>>
>>
>>
>> GFF v3 file:
>> 15 . transcript 74701625 74726300 . -
>> . ID=TRANSCRIPT1;Alias=10741;Name=SEMA7A
>> 15 . exon 74726082 74726300 . - .
>> ID=EXON37A10411.1;Parent=TRANSCRIPT1
>> 15 . exon 74711142 74711293 . - .
>> ID=EXON37A10411.2;Parent=TRANSCRIPT1
>> 15 . exon 74710609 74710650 . - .
>> ID=EXON37A10411.3;Parent=TRANSCRIPT1
>> 15 . exon 74710218 74710310 . - .
>> ID=EXON37A10411.4;Parent=TRANSCRIPT1
>> 15 . exon 74709932 74710016 . - .
>> ID=EXON37A10411.5;Parent=TRANSCRIPT1
>> 15 . exon 74709676 74709786 . - .
>> ID=EXON37A10411.6;Parent=TRANSCRIPT1
>> 15 . exon 74708916 74709055 . - .
>> ID=EXON37A10411.7;Parent=TRANSCRIPT1
>> 15 . exon 74708142 74708326 . - .
>> ID=EXON37A10411.8;Parent=TRANSCRIPT1
>> 15 . exon 74707179 74707287 . - .
>> ID=EXON37A10411.9;Parent=TRANSCRIPT1
>> 15 . exon 74706888 74707086 . - .
>> ID=EXON37A10411.10;Parent=TRANSCRIPT1
>> 15 . exon 74704226 74704353 . - .
>> ID=EXON37A10411.11;Parent=TRANSCRIPT1
>> 15 . exon 74703897 74704051 . - .
>> ID=EXON37A10411.12;Parent=TRANSCRIPT1
>> 15 . exon 74703636 74703697 . - .
>> ID=EXON37A10411.13;Parent=TRANSCRIPT1
>> 15 . exon 74701625 74703326 . - .
>> ID=EXON37A10411.14;Parent=TRANSCRIPT1
>> 15 . CDS 74726082 74726259 . - 0
>> ID=CDS37A10411.1;Parent=TRANSCRIPT1
>> 15 . CDS 74711142 74711293 . - 2
>> ID=CDS37A10411.2;Parent=TRANSCRIPT1
>> 15 . CDS 74710609 74710650 . - 0
>> ID=CDS37A10411.3;Parent=TRANSCRIPT1
>> 15 . CDS 74710218 74710310 . - 0
>> ID=CDS37A10411.4;Parent=TRANSCRIPT1
>> 15 . CDS 74709932 74710016 . - 0
>> ID=CDS37A10411.5;Parent=TRANSCRIPT1
>> 15 . CDS 74709676 74709786 . - 2
>> ID=CDS37A10411.6;Parent=TRANSCRIPT1
>> 15 . CDS 74708916 74709055 . - 2
>> ID=CDS37A10411.7;Parent=TRANSCRIPT1
>> 15 . CDS 74708142 74708326 . - 0
>> ID=CDS37A10411.8;Parent=TRANSCRIPT1
>> 15 . CDS 74707179 74707287 . - 1
>> ID=CDS37A10411.9;Parent=TRANSCRIPT1
>> 15 . CDS 74706888 74707086 . - 0
>> ID=CDS37A10411.10;Parent=TRANSCRIPT1
>> 15 . CDS 74704226 74704353 . - 2
>> ID=CDS37A10411.11;Parent=TRANSCRIPT1
>> 15 . CDS 74703897 74704051 . - 0
>> ID=CDS37A10411.12;Parent=TRANSCRIPT1
>> 15 . CDS 74703636 74703697 . - 1
>> ID=CDS37A10411.13;Parent=TRANSCRIPT1
>> 15 . CDS 74702965 74703326 . - 2
>> ID=CDS37A10411.14;Parent=TRANSCRIPT1
>>
>>
>> GTF file:
>> TRANSCRIPT1 15 - 74701624 74726300 74702964
>> 74726259 14
>> 74701624,74703635,74703896,74704225,74706887,74707178,74708141,74708915,74709675,74709931,74710217,74710608,74711141,74726081,
>> 74703326,74703697,74704051,74704353,74707086,74707287,74708326,74709055,74709786,74710016,74710310,74710650,74711293,74726300,
>> 0 HGNC:10741 cmpl cmpl 1,2,0,1,0,2,0,1,1,0,0,0,1,0,
>> TRANSCRIPT2 17 - 8108048 8113944 8108188 8113542 9
>> 8108048,8108533,8109808,8110067,8110493,8110888,8111055,8113494,8113847,
>> 8108362,8108708,8109957,8110206,8110685,8110943,8111158,8113567,8113944,
>> 0 HGNC:11390 cmpl cmpl 0,2,0,2,2,1,0,0,-1,
>> TRANSCRIPT3 11 - 73711325 73720282 73712456
>> 73718087 7
>> 73711325,73714871,73715528,73716774,73717213,73717961,73720022,
>> 73712571,73715052,73715630,73716978,73717424,73718182,73720282, 0
>> HGNC:12519 cmpl cmpl 2,1,1,1,0,0,-1,
>> TRANSCRIPT4 11 - 123594634 123612391 123596704
>> 123601596 9
>> 123594634,123598183,123598840,123599833,123600322,123601194,123610824,123611124,123612256,
>> 123597699,123598303,123598970,123599922,123600533,123601693,123610900,123611244,123612391,
>> 0 HGNC:12994 cmpl cmpl 1,1,0,1,0,0,-1,-1,-1,
>> TRANSCRIPT5 19 - 51994480 52005043 51994894
>> 52004987 8
>> 51994480,52000133,52000602,52001271,52002423,52002691,52003173,52004560,
>> 51995083,52000230,52000699,52001541,52002471,52002970,52003554,52005043,
>> 0 HGNC:15482 cmpl cmpl 0,2,1,1,1,1,1,0,
>> TRANSCRIPT6 1 + 228194722 228248972 228194829
>> 228247166 4 228194722,228210367,228238356,228246686,
>> 228194900,228210609,228238622,228248972, 0 HGNC:15983 cmpl
>> cmpl 0,2,1,0,
>>
>>
>>
>>
>>
>>
>>
>> Thank you for your help,
>> Shawn
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/