[ensembl-dev] Annotating variants against custom transcripts

Helen Schuilenburg helens at ebi.ac.uk
Mon Jan 22 13:04:28 GMT 2018


Hi Shawn

You are correct: there is a problem reading the GFF file.

When VEP reads a GFF file, each transcript must have a Sequence Ontology
biotype defined; otherwise VEP cannot parse it.

The simplest way to define this is with an attribute named "biotype" on
the transcript feature, e.g. biotype=protein_coding
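A minimal sketch of that fix, assuming the file is tab-separated GFF3 and that every transcript in it is protein-coding (other biotypes would need their own value). The function name add_biotype is hypothetical; it appends biotype=protein_coding to "transcript" lines (such as the CART37A24648 line above) that do not already carry a biotype attribute, and passes every other line through unchanged:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append biotype=protein_coding to GFF3 "transcript"
# lines whose attribute column (9) has no biotype yet.
# Assumption: every transcript in the input is protein-coding.
add_biotype() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }   # comment/directive lines pass through unchanged
       $3 == "transcript" && $9 !~ /(^|;)biotype=/ { $9 = $9 ";biotype=protein_coding" }
       { print }'
}

# Usage (file names taken from the thread):
# zcat blah.gff.gz | add_biotype > blah.biotype.gff
```

The edited file still has to be sorted, bgzipped and tabix-indexed before VEP can use it.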

There is documentation on using GFF with VEP here:

https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gff
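The preparation described there (sort by chromosome and position, compress with bgzip, index with tabix) could be scripted roughly as follows. This is a sketch: the function names sort_gff and prepare_gff are hypothetical, lines starting with # are dropped before sorting, and bgzip and tabix are assumed to be on PATH:

```shell
#!/usr/bin/env bash
# Sort a GFF by chromosome (col 1), start (col 4) and end (col 5),
# dropping comment/directive lines that start with "#".
sort_gff() {
  grep -v '^#' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n -k5,5n
}

# Sort, bgzip-compress and tabix-index a GFF (hypothetical helper).
prepare_gff() {
  in="$1"
  out="$2"
  if command -v bgzip >/dev/null && command -v tabix >/dev/null; then
    sort_gff "$in" | bgzip -c > "$out"
    tabix -p gff "$out"
  else
    echo "bgzip/tabix not found on PATH" >&2
    return 1
  fi
}

# Usage (hypothetical file names):
# prepare_gff blah.biotype.gff blah.gff.gz
```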

Helen


On 22/01/2018 11:30, Shawn Yost wrote:
> Hi,
> I was able to install the latest version of VEP. I am now getting the
> following warning when running VEP:
>
> WARNING: Unable to determine biotype of CART37A24648
>
> The command I used was:
> vep -i testing_annout.txt.hg19_multianno.vcf --gff blah.gff.gz --cache \
>   -dir vep/ --hgvs --cache_version 75 --offline --force_overwrite \
>   --fasta human_g1k_v37.fasta -o tmp
>
> The GFF3 file "blah.gff.gz" looks like this (output of zcat blah.gff.gz):
> 2	.	exon	73612886	73613320	.	+	.	ID=EXON37A24648.1;Parent=CART37A24648
> 2	.	transcript	73612886	73837046	.	+	.	ID=CART37A24648;hgnc_id=428;gene_symbol=ALMS1
> 2	.	CDS	73612997	73613320	.	+	0	ID=CDS37A24648.1;Parent=CART37A24648
> 2	.	CDS	73635750	73635875	.	+	0	ID=CDS37A24648.2;Parent=CART37A24648
> 2	.	exon	73635750	73635875	.	+	.	ID=EXON37A24648.2;Parent=CART37A24648
> 2	.	CDS	73646251	73646446	.	+	0	ID=CDS37A24648.3;Parent=CART37A24648
> 2	.	exon	73646251	73646446	.	+	.	ID=EXON37A24648.3;Parent=CART37A24648
> 2	.	CDS	73649985	73650102	.	+	2	ID=CDS37A24648.4;Parent=CART37A24648
> 2	.	exon	73649985	73650102	.	+	.	ID=EXON37A24648.4;Parent=CART37A24648
> 2	.	CDS	73651558	73652030	.	+	1	ID=CDS37A24648.5;Parent=CART37A24648
> 2	.	exon	73651558	73652030	.	+	.	ID=EXON37A24648.5;Parent=CART37A24648
> 2	.	CDS	73653581	73653681	.	+	2	ID=CDS37A24648.6;Parent=CART37A24648
> 2	.	exon	73653581	73653681	.	+	.	ID=EXON37A24648.6;Parent=CART37A24648
> 2	.	CDS	73659326	73659419	.	+	0	ID=CDS37A24648.7;Parent=CART37A24648
> 2	.	exon	73659326	73659419	.	+	.	ID=EXON37A24648.7;Parent=CART37A24648
> 2	.	CDS	73675090	73681194	.	+	2	ID=CDS37A24648.8;Parent=CART37A24648
> 2	.	exon	73675090	73681194	.	+	.	ID=EXON37A24648.8;Parent=CART37A24648
> 2	.	CDS	73682289	73682422	.	+	2	ID=CDS37A24648.9;Parent=CART37A24648
> 2	.	exon	73682289	73682422	.	+	.	ID=EXON37A24648.9;Parent=CART37A24648
> 2	.	CDS	73716761	73718625	.	+	0	ID=CDS37A24648.10;Parent=CART37A24648
> 2	.	exon	73716761	73718625	.	+	.	ID=EXON37A24648.10;Parent=CART37A24648
> 2	.	CDS	73746902	73747143	.	+	1	ID=CDS37A24648.11;Parent=CART37A24648
> 2	.	exon	73746902	73747143	.	+	.	ID=EXON37A24648.11;Parent=CART37A24648
> 2	.	CDS	73761951	73762076	.	+	2	ID=CDS37A24648.12;Parent=CART37A24648
> 2	.	exon	73761951	73762076	.	+	.	ID=EXON37A24648.12;Parent=CART37A24648
> 2	.	CDS	73777394	73777564	.	+	2	ID=CDS37A24648.13;Parent=CART37A24648
> 2	.	exon	73777394	73777564	.	+	.	ID=EXON37A24648.13;Parent=CART37A24648
> 2	.	CDS	73784347	73784481	.	+	2	ID=CDS37A24648.14;Parent=CART37A24648
> 2	.	exon	73784347	73784481	.	+	.	ID=EXON37A24648.14;Parent=CART37A24648
> 2	.	CDS	73786099	73786269	.	+	2	ID=CDS37A24648.15;Parent=CART37A24648
> 2	.	exon	73786099	73786269	.	+	.	ID=EXON37A24648.15;Parent=CART37A24648
> 2	.	CDS	73799389	73800551	.	+	2	ID=CDS37A24648.16;Parent=CART37A24648
> 2	.	exon	73799389	73800551	.	+	.	ID=EXON37A24648.16;Parent=CART37A24648
> 2	.	CDS	73826528	73826648	.	+	0	ID=CDS37A24648.17;Parent=CART37A24648
> 2	.	exon	73826528	73826648	.	+	.	ID=EXON37A24648.17;Parent=CART37A24648
> 2	.	CDS	73827805	73828008	.	+	2	ID=CDS37A24648.18;Parent=CART37A24648
> 2	.	exon	73827805	73828008	.	+	.	ID=EXON37A24648.18;Parent=CART37A24648
> 2	.	CDS	73828322	73828563	.	+	2	ID=CDS37A24648.19;Parent=CART37A24648
> 2	.	exon	73828322	73828563	.	+	.	ID=EXON37A24648.19;Parent=CART37A24648
> 2	.	CDS	73829312	73829495	.	+	0	ID=CDS37A24648.20;Parent=CART37A24648
> 2	.	exon	73829312	73829495	.	+	.	ID=EXON37A24648.20;Parent=CART37A24648
> 2	.	CDS	73830368	73830431	.	+	2	ID=CDS37A24648.21;Parent=CART37A24648
> 2	.	exon	73830368	73830431	.	+	.	ID=EXON37A24648.21;Parent=CART37A24648
> 2	.	CDS	73835602	73835701	.	+	1	ID=CDS37A24648.22;Parent=CART37A24648
> 2	.	exon	73835602	73835701	.	+	.	ID=EXON37A24648.22;Parent=CART37A24648
> 2	.	CDS	73836695	73836739	.	+	0	ID=CDS37A24648.23;Parent=CART37A24648
> 2	.	exon	73836695	73837046	.	+	.	ID=EXON37A24648.23;Parent=CART37A24648
>
>
>
> Is there a problem with the GFF file I am using as input?
>
> Thanks,
>    Shawn
>
>
> On Thu, Jan 18, 2018 at 3:11 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
>> Hi Shawn,
>>
>> The Bio::DB::HTS module should have been installed by INSTALL.pl. Did you
>> set the DYLD_LIBRARY_PATH environment variable as asked during the
>> installation?
>>
>> You can also install Bio::DB::HTS manually:
>> https://github.com/Ensembl/Bio-DB-HTS.
>>
>> Anja
>>
>> On 18 Jan 2018, at 14:46, Shawn Yost <yostshawn at gmail.com> wrote:
>>
>> Hi,
>> I've installed the latest version of VEP and I'm now getting a new
>> error. This error also occurred during the install/test step of VEP:
>>
>> -------------------- EXCEPTION --------------------
>> MSG: ERROR: Cannot use format gff without Bio::DB::HTS::Tabix module
>> installed
>>
>> STACK Bio::EnsEMBL::VEP::AnnotationSource::File::new
>> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm:162
>> STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_custom
>> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:228
>> STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all
>> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:93
>> STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources
>> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
>> STACK Bio::EnsEMBL::VEP::Runner::init
>> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:123
>> STACK Bio::EnsEMBL::VEP::Runner::next_output_line
>> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:356
>> STACK toplevel ./t/Runner.t:739
>> Date (localtime)    = Thu Jan 18 14:26:33 2018
>> Ensembl API version = 91
>>
>>
>>
>> I'm using a conda environment to install and run VEP. Inside the
>> 'ensembl-vep/' directory, Tabix.pm does exist:
>>
>> ls -la Bio/DB/HTS/Tabix*
>>
>> -rw-rw---- 1 syost cancgene 6721 Jan 18 14:24 Bio/DB/HTS/Tabix.pm
>>
>> Bio/DB/HTS/Tabix:
>> total 12
>> drwxrwx--- 2 syost cancgene 4096 Jan 18 14:24 .
>> drwxrwx--- 5 syost cancgene 4096 Jan 18 14:24 ..
>> -rw-rw---- 1 syost cancgene 2974 Jan 18 14:24 Iterator.pm
>>
>>
>> I've also installed tabix separately and it works. Do you have any
>> suggestions? Why isn't VEP recognizing Tabix.pm?
>>
>>
>> Thanks,
>>   Shawn
>>
>>
>>
>>
>> On Tue, Jan 16, 2018 at 2:21 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
>>
>> Hi Shawn,
>>
>> I noticed that you are not using the supported VEP code. You can install the
>> new code by following the instructions here:
>> http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html
>>
>> The new VEP code supports annotating against a GFF or GTF file:
>> http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#other
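A minimal sketch of such a command, assuming the current VEP takes the transcript file through --gff or --gtf (instead of the older --custom FILE,,gff form used below, which only reported overlaps) together with --fasta, and that the file is already sorted, bgzipped and tabix-indexed. The function name vep_custom_transcripts_cmd is hypothetical, and it only prints the command so it can be checked before running:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a VEP command that annotates against a custom
# transcript file, choosing --gff or --gtf from the file extension.
vep_custom_transcripts_cmd() {
  input="$1" annot="$2" fasta="$3" cachedir="$4"
  out="${5:-OUT}"   # output file name, defaults to OUT
  case "$annot" in
    *.gff.gz|*.gff3.gz) flag="--gff" ;;
    *.gtf.gz)           flag="--gtf" ;;
    *) echo "expected a bgzipped .gff/.gff3/.gtf file: $annot" >&2; return 1 ;;
  esac
  echo vep -i "$input" "$flag" "$annot" --fasta "$fasta" \
    --cache --dir "$cachedir" --offline --hgvs -o "$out"
}

# Usage (file names taken from the thread):
# vep_custom_transcripts_cmd IN.vcf test.gff.gz test.fa /path/to/cache
```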
>>
>> Anja
>>
>>
>> On 16 Jan 2018, at 14:06, Shawn Yost <yostshawn at gmail.com> wrote:
>>
>> Hi,
>> I would like to annotate my VCF file against a custom transcript database.
>> I've created both a GFF3 file and a GTF file (see below), but I have been
>> unable to get VEP to annotate against these transcripts. The examples
>> below show the files before sorting, bgzip compression and tabix
>> indexing (so that step is not the problem).
>>
>> I'm currently running VEP v85.
>>
>> The command I used was:
>> variant_effect_predictor.pl -i IN.vcf --custom test.gff.gz,,gff \
>>   -fasta test.fa --cache -o OUT -dir /path/to/cache --hgvs \
>>   --cache_version 75 --offline --force_overwrite
>>
>> In the output file I only see ENSTs; the transcripts I supplied do not
>> appear alongside them. The same thing happens if I run --custom
>> test.gtf.gz,,gtf. If I change the option to --custom
>> test.gtf.gz,,gtf,overlap, VEP reports whether a variant overlaps my
>> transcript, but it doesn't annotate the variant against it.
>>
>>
>> Is there a problem with the command options I am using, or with the
>> input files? How do I get VEP to annotate the variant against my custom
>> transcript (e.g. 412662  2:73613056      A       GENE1
>> TRANSCRIPT1 Transcript      synonymous_variant ....)?
>>
>>
>>
>> Example output:
>> 241004  2:73613032-73613049     -       ENSG00000116127 ENST00000264448
>> Transcript      inframe_deletion        147-164 36-53   12-18
>> LEEEEEE/L       ctGGAGGAGGAGGAGGAGGAg/ctg       -
>> IMPACT=MODERATE;STRAND=1;HGVSc=ENST00000264448.6:c.36_53delNNNNNNNNNNNNNNNNNN;HGVSp=ENSP00000264448.6:p.Glu23_Glu28del
>> 412662  2:73613056      A       ENSG00000116127 ENST00000377715 Transcript
>> synonymous_variant      171     60      20      E  gaG/gaA  -
>> IMPACT=LOW;STRAND=1;HGVSc=ENST00000377715.1:c.60N>A;HGVSp=ENST00000377715.1:c.60N>A(p.%3D)
>> 412662  2:73613056      A       ENSG00000116127 ENST00000409009 Transcript
>> synonymous_variant      171     60      20      E  gaG/gaA  -
>> IMPACT=LOW;STRAND=1;HGVSc=ENST00000409009.1:c.60N>A;HGVSp=ENST00000409009.1:c.60N>A(p.%3D)
>> 412662  2:73613056      A       ENSG00000116127 ENST00000264448 Transcript
>> synonymous_variant      171     60      20      E  gaG/gaA  -
>> IMPACT=LOW;STRAND=1;HGVSc=ENST00000264448.6:c.60N>A;HGVSp=ENST00000264448.6:c.60N>A(p.%3D)
>> 402364  2:73613066-73613071     -       ENSG00000116127 ENST00000377715
>> Transcript      inframe_deletion        181-186 70-75   24-25       EE/-
>> GAGGAA/-        -
>> IMPACT=MODERATE;STRAND=1;HGVSc=ENST00000377715.1:c.70_75delNNNNNN;HGVSp=ENSP00000366944.1:p.Glu27_Glu28del
>>
>>
>>
>> GFF v3 file:
>> 15	.	transcript	74701625	74726300	.	-	.	ID=TRANSCRIPT1;Alias=10741;Name=SEMA7A
>> 15	.	exon	74726082	74726300	.	-	.	ID=EXON37A10411.1;Parent=TRANSCRIPT1
>> 15	.	exon	74711142	74711293	.	-	.	ID=EXON37A10411.2;Parent=TRANSCRIPT1
>> 15	.	exon	74710609	74710650	.	-	.	ID=EXON37A10411.3;Parent=TRANSCRIPT1
>> 15	.	exon	74710218	74710310	.	-	.	ID=EXON37A10411.4;Parent=TRANSCRIPT1
>> 15	.	exon	74709932	74710016	.	-	.	ID=EXON37A10411.5;Parent=TRANSCRIPT1
>> 15	.	exon	74709676	74709786	.	-	.	ID=EXON37A10411.6;Parent=TRANSCRIPT1
>> 15	.	exon	74708916	74709055	.	-	.	ID=EXON37A10411.7;Parent=TRANSCRIPT1
>> 15	.	exon	74708142	74708326	.	-	.	ID=EXON37A10411.8;Parent=TRANSCRIPT1
>> 15	.	exon	74707179	74707287	.	-	.	ID=EXON37A10411.9;Parent=TRANSCRIPT1
>> 15	.	exon	74706888	74707086	.	-	.	ID=EXON37A10411.10;Parent=TRANSCRIPT1
>> 15	.	exon	74704226	74704353	.	-	.	ID=EXON37A10411.11;Parent=TRANSCRIPT1
>> 15	.	exon	74703897	74704051	.	-	.	ID=EXON37A10411.12;Parent=TRANSCRIPT1
>> 15	.	exon	74703636	74703697	.	-	.	ID=EXON37A10411.13;Parent=TRANSCRIPT1
>> 15	.	exon	74701625	74703326	.	-	.	ID=EXON37A10411.14;Parent=TRANSCRIPT1
>> 15	.	CDS	74726082	74726259	.	-	0	ID=CDS37A10411.1;Parent=TRANSCRIPT1
>> 15	.	CDS	74711142	74711293	.	-	2	ID=CDS37A10411.2;Parent=TRANSCRIPT1
>> 15	.	CDS	74710609	74710650	.	-	0	ID=CDS37A10411.3;Parent=TRANSCRIPT1
>> 15	.	CDS	74710218	74710310	.	-	0	ID=CDS37A10411.4;Parent=TRANSCRIPT1
>> 15	.	CDS	74709932	74710016	.	-	0	ID=CDS37A10411.5;Parent=TRANSCRIPT1
>> 15	.	CDS	74709676	74709786	.	-	2	ID=CDS37A10411.6;Parent=TRANSCRIPT1
>> 15	.	CDS	74708916	74709055	.	-	2	ID=CDS37A10411.7;Parent=TRANSCRIPT1
>> 15	.	CDS	74708142	74708326	.	-	0	ID=CDS37A10411.8;Parent=TRANSCRIPT1
>> 15	.	CDS	74707179	74707287	.	-	1	ID=CDS37A10411.9;Parent=TRANSCRIPT1
>> 15	.	CDS	74706888	74707086	.	-	0	ID=CDS37A10411.10;Parent=TRANSCRIPT1
>> 15	.	CDS	74704226	74704353	.	-	2	ID=CDS37A10411.11;Parent=TRANSCRIPT1
>> 15	.	CDS	74703897	74704051	.	-	0	ID=CDS37A10411.12;Parent=TRANSCRIPT1
>> 15	.	CDS	74703636	74703697	.	-	1	ID=CDS37A10411.13;Parent=TRANSCRIPT1
>> 15	.	CDS	74702965	74703326	.	-	2	ID=CDS37A10411.14;Parent=TRANSCRIPT1
>>
>>
>> GTF file:
>> TRANSCRIPT1    15   -       74701624        74726300        74702964
>> 74726259        14
>> 74701624,74703635,74703896,74704225,74706887,74707178,74708141,74708915,74709675,74709931,74710217,74710608,74711141,74726081,
>> 74703326,74703697,74704051,74704353,74707086,74707287,74708326,74709055,74709786,74710016,74710310,74710650,74711293,74726300,
>> 0       HGNC:10741      cmpl    cmpl    1,2,0,1,0,2,0,1,1,0,0,0,1,0,
>> TRANSCRIPT2    17   -       8108048 8113944 8108188 8113542 9
>> 8108048,8108533,8109808,8110067,8110493,8110888,8111055,8113494,8113847,
>> 8108362,8108708,8109957,8110206,8110685,8110943,8111158,8113567,8113944,
>> 0       HGNC:11390      cmpl    cmpl    0,2,0,2,2,1,0,0,-1,
>> TRANSCRIPT3    11   -       73711325        73720282        73712456
>> 73718087        7
>> 73711325,73714871,73715528,73716774,73717213,73717961,73720022,
>> 73712571,73715052,73715630,73716978,73717424,73718182,73720282, 0
>> HGNC:12519      cmpl    cmpl    2,1,1,1,0,0,-1,
>> TRANSCRIPT4    11   -       123594634       123612391       123596704
>> 123601596       9
>> 123594634,123598183,123598840,123599833,123600322,123601194,123610824,123611124,123612256,
>> 123597699,123598303,123598970,123599922,123600533,123601693,123610900,123611244,123612391,
>> 0       HGNC:12994      cmpl    cmpl    1,1,0,1,0,0,-1,-1,-1,
>> TRANSCRIPT5    19   -       51994480        52005043        51994894
>> 52004987        8
>> 51994480,52000133,52000602,52001271,52002423,52002691,52003173,52004560,
>> 51995083,52000230,52000699,52001541,52002471,52002970,52003554,52005043,
>> 0  HGNC:15482       cmpl    cmpl    0,2,1,1,1,1,1,0,
>> TRANSCRIPT6    1    +       228194722       228248972       228194829
>> 228247166       4       228194722,228210367,228238356,228246686,
>> 228194900,228210609,228238622,228248972,        0       HGNC:15983      cmpl
>> cmpl    0,2,1,0,
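The rows listed above as a "GTF file" do not look like GTF: their 15 columns match the UCSC genePredExt layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, cdsStartStat, cdsEndStat, exonFrames), with 0-based exon starts (TRANSCRIPT1's first exonStart, 74701624, corresponds to the GFF3 exon starting at 74701625). UCSC's genePredToGtf utility is one way to do a full conversion. As a minimal sketch of just the coordinate shift, assuming tab-separated input and using name2 as the gene_id (both assumptions; the function name genepred_exons_to_gtf is hypothetical), GTF exon lines could be produced like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn genePredExt rows (0-based, half-open starts)
# into GTF exon lines with 1-based inclusive coordinates.
# Only exons are emitted; CDS/start/stop features are NOT generated.
genepred_exons_to_gtf() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
  {
    name = $1; chrom = $2; strand = $3; gene = $12
    split($9, starts, ",")   # trailing comma gives an empty last element,
    split($10, ends, ",")    # so the loop below stops at exonCount ($8)
    for (i = 1; i <= $8; i++) {
      print chrom, "genePred", "exon", starts[i] + 1, ends[i], ".", strand, ".",
            "gene_id \"" gene "\"; transcript_id \"" name "\";"
    }
  }'
}

# Usage (hypothetical file names):
# genepred_exons_to_gtf < transcripts.genePred > transcripts.exons.gtf
```

CDS features with frames would presumably still be needed for VEP to report coding consequences, which is why a dedicated converter is the safer route for real data.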
>>
>>
>>
>>
>>
>>
>>
>> Thank you for your help,
>>   Shawn
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>



