[ensembl-dev] Annotating variants against custom transcripts

Shawn Yost yostshawn at gmail.com
Mon Jan 22 11:30:42 GMT 2018


Hi,
I was able to install the latest version of VEP. I am now getting the
following warning when running VEP:

WARNING: Unable to determine biotype of CART37A24648

The command I used was:
vep -i testing_annout.txt.hg19_multianno.vcf --gff blah.gff.gz --cache
-dir vep/ --hgvs --cache_version 75 --offline --force_overwrite
--fasta human_g1k_v37.fasta -o tmp

The gff v3 file "blah.gff.gz" looks like this (zcat blah.gff.gz):
2       .       exon    73612886        73613320        .       +       .       ID=EXON37A24648.1;Parent=CART37A24648
2       .       transcript      73612886        73837046        .       +       .       ID=CART37A24648;hgnc_id=428;gene_symbol=ALMS1
2       .       CDS     73612997        73613320        .       +       0       ID=CDS37A24648.1;Parent=CART37A24648
2       .       CDS     73635750        73635875        .       +       0       ID=CDS37A24648.2;Parent=CART37A24648
2       .       exon    73635750        73635875        .       +       .       ID=EXON37A24648.2;Parent=CART37A24648
2       .       CDS     73646251        73646446        .       +       0       ID=CDS37A24648.3;Parent=CART37A24648
2       .       exon    73646251        73646446        .       +       .       ID=EXON37A24648.3;Parent=CART37A24648
2       .       CDS     73649985        73650102        .       +       2       ID=CDS37A24648.4;Parent=CART37A24648
2       .       exon    73649985        73650102        .       +       .       ID=EXON37A24648.4;Parent=CART37A24648
2       .       CDS     73651558        73652030        .       +       1       ID=CDS37A24648.5;Parent=CART37A24648
2       .       exon    73651558        73652030        .       +       .       ID=EXON37A24648.5;Parent=CART37A24648
2       .       CDS     73653581        73653681        .       +       2       ID=CDS37A24648.6;Parent=CART37A24648
2       .       exon    73653581        73653681        .       +       .       ID=EXON37A24648.6;Parent=CART37A24648
2       .       CDS     73659326        73659419        .       +       0       ID=CDS37A24648.7;Parent=CART37A24648
2       .       exon    73659326        73659419        .       +       .       ID=EXON37A24648.7;Parent=CART37A24648
2       .       CDS     73675090        73681194        .       +       2       ID=CDS37A24648.8;Parent=CART37A24648
2       .       exon    73675090        73681194        .       +       .       ID=EXON37A24648.8;Parent=CART37A24648
2       .       CDS     73682289        73682422        .       +       2       ID=CDS37A24648.9;Parent=CART37A24648
2       .       exon    73682289        73682422        .       +       .       ID=EXON37A24648.9;Parent=CART37A24648
2       .       CDS     73716761        73718625        .       +       0       ID=CDS37A24648.10;Parent=CART37A24648
2       .       exon    73716761        73718625        .       +       .       ID=EXON37A24648.10;Parent=CART37A24648
2       .       CDS     73746902        73747143        .       +       1       ID=CDS37A24648.11;Parent=CART37A24648
2       .       exon    73746902        73747143        .       +       .       ID=EXON37A24648.11;Parent=CART37A24648
2       .       CDS     73761951        73762076        .       +       2       ID=CDS37A24648.12;Parent=CART37A24648
2       .       exon    73761951        73762076        .       +       .       ID=EXON37A24648.12;Parent=CART37A24648
2       .       CDS     73777394        73777564        .       +       2       ID=CDS37A24648.13;Parent=CART37A24648
2       .       exon    73777394        73777564        .       +       .       ID=EXON37A24648.13;Parent=CART37A24648
2       .       CDS     73784347        73784481        .       +       2       ID=CDS37A24648.14;Parent=CART37A24648
2       .       exon    73784347        73784481        .       +       .       ID=EXON37A24648.14;Parent=CART37A24648
2       .       CDS     73786099        73786269        .       +       2       ID=CDS37A24648.15;Parent=CART37A24648
2       .       exon    73786099        73786269        .       +       .       ID=EXON37A24648.15;Parent=CART37A24648
2       .       CDS     73799389        73800551        .       +       2       ID=CDS37A24648.16;Parent=CART37A24648
2       .       exon    73799389        73800551        .       +       .       ID=EXON37A24648.16;Parent=CART37A24648
2       .       CDS     73826528        73826648        .       +       0       ID=CDS37A24648.17;Parent=CART37A24648
2       .       exon    73826528        73826648        .       +       .       ID=EXON37A24648.17;Parent=CART37A24648
2       .       CDS     73827805        73828008        .       +       2       ID=CDS37A24648.18;Parent=CART37A24648
2       .       exon    73827805        73828008        .       +       .       ID=EXON37A24648.18;Parent=CART37A24648
2       .       CDS     73828322        73828563        .       +       2       ID=CDS37A24648.19;Parent=CART37A24648
2       .       exon    73828322        73828563        .       +       .       ID=EXON37A24648.19;Parent=CART37A24648
2       .       CDS     73829312        73829495        .       +       0       ID=CDS37A24648.20;Parent=CART37A24648
2       .       exon    73829312        73829495        .       +       .       ID=EXON37A24648.20;Parent=CART37A24648
2       .       CDS     73830368        73830431        .       +       2       ID=CDS37A24648.21;Parent=CART37A24648
2       .       exon    73830368        73830431        .       +       .       ID=EXON37A24648.21;Parent=CART37A24648
2       .       CDS     73835602        73835701        .       +       1       ID=CDS37A24648.22;Parent=CART37A24648
2       .       exon    73835602        73835701        .       +       .       ID=EXON37A24648.22;Parent=CART37A24648
2       .       CDS     73836695        73836739        .       +       0       ID=CDS37A24648.23;Parent=CART37A24648
2       .       exon    73836695        73837046        .       +       .       ID=EXON37A24648.23;Parent=CART37A24648



Is there a problem with the gff file I am inputting?
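
One thing that may be worth checking: the transcript line carries no
biotype attribute. Below is a minimal Python sketch that appends one to
transcript rows that lack it. It assumes VEP reads the transcript biotype
from a "biotype" key in column 9 of the transcript line; that key name, the
"protein_coding" value, and the helper name add_biotype are assumptions for
illustration and are not confirmed anywhere in this thread.

```python
def add_biotype(gff_lines, biotype="protein_coding",
                transcript_types=("transcript", "mRNA")):
    """Return GFF3 lines with a biotype attribute added to transcript rows lacking one.

    Assumption: the annotation tool looks for a 'biotype' key in column 9.
    """
    out = []
    for line in gff_lines:
        stripped = line.rstrip("\n")
        cols = stripped.split("\t")
        # Leave comments, blank lines and rows without 9 columns untouched.
        if stripped.startswith("#") or len(cols) != 9:
            out.append(stripped)
            continue
        attrs = cols[8]
        keys = {field.split("=", 1)[0] for field in attrs.split(";") if field}
        if cols[2] in transcript_types and "biotype" not in keys:
            if attrs in ("", "."):
                cols[8] = "biotype=" + biotype
            else:
                cols[8] = attrs.rstrip(";") + ";biotype=" + biotype
        out.append("\t".join(cols))
    return out


# Two rows taken from the file above, with real tab separators.
example = [
    "2\t.\ttranscript\t73612886\t73837046\t.\t+\t.\t"
    "ID=CART37A24648;hgnc_id=428;gene_symbol=ALMS1",
    "2\t.\texon\t73612886\t73613320\t.\t+\t.\t"
    "ID=EXON37A24648.1;Parent=CART37A24648",
]
fixed = add_biotype(example)
print(fixed[0])
```

Only the transcript row is changed; exon and CDS rows pass through as they
are. The result would still need to be sorted, bgzipped and tabix-indexed
again before it is passed to --gff.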

Thanks,
  Shawn


On Thu, Jan 18, 2018 at 3:11 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
> Hi Shawn,
>
> the Bio::DB::HTS module should have been installed by INSTALL.pl. Did you
> set the DYLD_LIBRARY_PATH environment variable as asked during the
> installation?
>
> You can also install Bio::DB::HTS manually:
> https://github.com/Ensembl/Bio-DB-HTS.
>
> Anja
>
> On 18 Jan 2018, at 14:46, Shawn Yost <yostshawn at gmail.com> wrote:
>
> Hi,
> I've installed the latest version of vep and I'm now getting a new
> error. This error also occurred during the install/test step of vep:
>
> -------------------- EXCEPTION --------------------
> MSG: ERROR: Cannot use format gff without Bio::DB::HTS::Tabix module
> installed
>
> STACK Bio::EnsEMBL::VEP::AnnotationSource::File::new
> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm:162
> STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_custom
> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:228
> STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all
> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:93
> STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources
> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
> STACK Bio::EnsEMBL::VEP::Runner::init
> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:123
> STACK Bio::EnsEMBL::VEP::Runner::next_output_line
> /mnt/scratch/DGE/GENSUSC/syost/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:356
> STACK toplevel ./t/Runner.t:739
> Date (localtime)    = Thu Jan 18 14:26:33 2018
> Ensembl API version = 91
>
>
>
> I'm using a conda env to install/run VEP.  Inside of the
> 'ensembl-vep/' directory Tabix does exist:
>
> ls -la Bio/DB/HTS/Tabix*
>
> -rw-rw---- 1 syost cancgene 6721 Jan 18 14:24 Bio/DB/HTS/Tabix.pm
>
> Bio/DB/HTS/Tabix:
> total 12
> drwxrwx--- 2 syost cancgene 4096 Jan 18 14:24 .
> drwxrwx--- 5 syost cancgene 4096 Jan 18 14:24 ..
> -rw-rw---- 1 syost cancgene 2974 Jan 18 14:24 Iterator.pm
>
>
> I've also installed Tabix separately and it works.  Do you have any
> suggestions?  Why isn't it recognizing the Tabix.pm?
>
>
> Thanks,
>  Shawn
>
>
>
>
> On Tue, Jan 16, 2018 at 2:21 PM, Anja Thormann <anja at ebi.ac.uk> wrote:
>
> Hi Shawn,
>
> I noticed that you are not using the supported VEP code. You can install the
> new code by following the instructions here:
> http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html
>
> The new VEP code supports annotations against a GFF or GTF file:
> http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#other
>
> Anja
>
>
> On 16 Jan 2018, at 14:06, Shawn Yost <yostshawn at gmail.com> wrote:
>
> Hi,
> I would like to annotate my VCF file against a custom transcript database.
> I've created both a GFF v3 file and a GTF file (see below) and I have been
> unsuccessful in getting VEP to annotate against these transcripts. The
> examples below are before using sort + bgzip + tabix (so that is not the
> problem).
>
> I'm currently running VEP v85.
>
> The command I used was:
> variant_effect_predictor.pl -i IN.vcf --custom test.gff.gz,,gff -fasta
> test.fa --cache -o OUT -dir /path/to/cache --hgvs --cache_version 75
> --offline --force_overwrite
>
> In the output file I only see ENSTs and can't find the transcripts I
> supplied.  The same thing occurs if I run --custom
> test.gtf.gz,,gtf.  If I change the command to --custom
> test.gtf.gz,,gtf,overlap it reports whether a variant overlaps the supplied
> transcript, but it doesn't annotate against the transcript.
>
>
> Is there a problem with the command options I am using? Is there a problem
> with the input files?  How do I get VEP to annotate the variant against
> my custom transcript (e.g. 412662  2:73613056      A       GENE1
> TRANSCRIPT1 Transcript      synonymous_variant ....)?
>
>
>
> Example output:
> 241004  2:73613032-73613049     -       ENSG00000116127 ENST00000264448 Transcript      inframe_deletion        147-164 36-53   12-18   LEEEEEE/L       ctGGAGGAGGAGGAGGAGGAg/ctg       -       IMPACT=MODERATE;STRAND=1;HGVSc=ENST00000264448.6:c.36_53delNNNNNNNNNNNNNNNNNN;HGVSp=ENSP00000264448.6:p.Glu23_Glu28del
> 412662  2:73613056      A       ENSG00000116127 ENST00000377715 Transcript      synonymous_variant      171     60      20      E       gaG/gaA -       IMPACT=LOW;STRAND=1;HGVSc=ENST00000377715.1:c.60N>A;HGVSp=ENST00000377715.1:c.60N>A(p.%3D)
> 412662  2:73613056      A       ENSG00000116127 ENST00000409009 Transcript      synonymous_variant      171     60      20      E       gaG/gaA -       IMPACT=LOW;STRAND=1;HGVSc=ENST00000409009.1:c.60N>A;HGVSp=ENST00000409009.1:c.60N>A(p.%3D)
> 412662  2:73613056      A       ENSG00000116127 ENST00000264448 Transcript      synonymous_variant      171     60      20      E       gaG/gaA -       IMPACT=LOW;STRAND=1;HGVSc=ENST00000264448.6:c.60N>A;HGVSp=ENST00000264448.6:c.60N>A(p.%3D)
> 402364  2:73613066-73613071     -       ENSG00000116127 ENST00000377715 Transcript      inframe_deletion        181-186 70-75   24-25   EE/-    GAGGAA/-        -       IMPACT=MODERATE;STRAND=1;HGVSc=ENST00000377715.1:c.70_75delNNNNNN;HGVSp=ENSP00000366944.1:p.Glu27_Glu28del
>
>
>
> GFF v3 file:
> 15      .       transcript      74701625        74726300        .       -       .       ID=TRANSCRIPT1;Alias=10741;Name=SEMA7A
> 15      .       exon    74726082        74726300        .       -       .       ID=EXON37A10411.1;Parent=TRANSCRIPT1
> 15      .       exon    74711142        74711293        .       -       .       ID=EXON37A10411.2;Parent=TRANSCRIPT1
> 15      .       exon    74710609        74710650        .       -       .       ID=EXON37A10411.3;Parent=TRANSCRIPT1
> 15      .       exon    74710218        74710310        .       -       .       ID=EXON37A10411.4;Parent=TRANSCRIPT1
> 15      .       exon    74709932        74710016        .       -       .       ID=EXON37A10411.5;Parent=TRANSCRIPT1
> 15      .       exon    74709676        74709786        .       -       .       ID=EXON37A10411.6;Parent=TRANSCRIPT1
> 15      .       exon    74708916        74709055        .       -       .       ID=EXON37A10411.7;Parent=TRANSCRIPT1
> 15      .       exon    74708142        74708326        .       -       .       ID=EXON37A10411.8;Parent=TRANSCRIPT1
> 15      .       exon    74707179        74707287        .       -       .       ID=EXON37A10411.9;Parent=TRANSCRIPT1
> 15      .       exon    74706888        74707086        .       -       .       ID=EXON37A10411.10;Parent=TRANSCRIPT1
> 15      .       exon    74704226        74704353        .       -       .       ID=EXON37A10411.11;Parent=TRANSCRIPT1
> 15      .       exon    74703897        74704051        .       -       .       ID=EXON37A10411.12;Parent=TRANSCRIPT1
> 15      .       exon    74703636        74703697        .       -       .       ID=EXON37A10411.13;Parent=TRANSCRIPT1
> 15      .       exon    74701625        74703326        .       -       .       ID=EXON37A10411.14;Parent=TRANSCRIPT1
> 15      .       CDS     74726082        74726259        .       -       0       ID=CDS37A10411.1;Parent=TRANSCRIPT1
> 15      .       CDS     74711142        74711293        .       -       2       ID=CDS37A10411.2;Parent=TRANSCRIPT1
> 15      .       CDS     74710609        74710650        .       -       0       ID=CDS37A10411.3;Parent=TRANSCRIPT1
> 15      .       CDS     74710218        74710310        .       -       0       ID=CDS37A10411.4;Parent=TRANSCRIPT1
> 15      .       CDS     74709932        74710016        .       -       0       ID=CDS37A10411.5;Parent=TRANSCRIPT1
> 15      .       CDS     74709676        74709786        .       -       2       ID=CDS37A10411.6;Parent=TRANSCRIPT1
> 15      .       CDS     74708916        74709055        .       -       2       ID=CDS37A10411.7;Parent=TRANSCRIPT1
> 15      .       CDS     74708142        74708326        .       -       0       ID=CDS37A10411.8;Parent=TRANSCRIPT1
> 15      .       CDS     74707179        74707287        .       -       1       ID=CDS37A10411.9;Parent=TRANSCRIPT1
> 15      .       CDS     74706888        74707086        .       -       0       ID=CDS37A10411.10;Parent=TRANSCRIPT1
> 15      .       CDS     74704226        74704353        .       -       2       ID=CDS37A10411.11;Parent=TRANSCRIPT1
> 15      .       CDS     74703897        74704051        .       -       0       ID=CDS37A10411.12;Parent=TRANSCRIPT1
> 15      .       CDS     74703636        74703697        .       -       1       ID=CDS37A10411.13;Parent=TRANSCRIPT1
> 15      .       CDS     74702965        74703326        .       -       2       ID=CDS37A10411.14;Parent=TRANSCRIPT1
>
>
> GTF file:
> TRANSCRIPT1    15    -    74701624    74726300    74702964    74726259    14    74701624,74703635,74703896,74704225,74706887,74707178,74708141,74708915,74709675,74709931,74710217,74710608,74711141,74726081,    74703326,74703697,74704051,74704353,74707086,74707287,74708326,74709055,74709786,74710016,74710310,74710650,74711293,74726300,    0    HGNC:10741    cmpl    cmpl    1,2,0,1,0,2,0,1,1,0,0,0,1,0,
> TRANSCRIPT2    17    -    8108048    8113944    8108188    8113542    9    8108048,8108533,8109808,8110067,8110493,8110888,8111055,8113494,8113847,    8108362,8108708,8109957,8110206,8110685,8110943,8111158,8113567,8113944,    0    HGNC:11390    cmpl    cmpl    0,2,0,2,2,1,0,0,-1,
> TRANSCRIPT3    11    -    73711325    73720282    73712456    73718087    7    73711325,73714871,73715528,73716774,73717213,73717961,73720022,    73712571,73715052,73715630,73716978,73717424,73718182,73720282,    0    HGNC:12519    cmpl    cmpl    2,1,1,1,0,0,-1,
> TRANSCRIPT4    11    -    123594634    123612391    123596704    123601596    9    123594634,123598183,123598840,123599833,123600322,123601194,123610824,123611124,123612256,    123597699,123598303,123598970,123599922,123600533,123601693,123610900,123611244,123612391,    0    HGNC:12994    cmpl    cmpl    1,1,0,1,0,0,-1,-1,-1,
> TRANSCRIPT5    19    -    51994480    52005043    51994894    52004987    8    51994480,52000133,52000602,52001271,52002423,52002691,52003173,52004560,    51995083,52000230,52000699,52001541,52002471,52002970,52003554,52005043,    0    HGNC:15482    cmpl    cmpl    0,2,1,1,1,1,1,0,
> TRANSCRIPT6    1    +    228194722    228248972    228194829    228247166    4    228194722,228210367,228238356,228246686,    228194900,228210609,228238622,228248972,    0    HGNC:15983    cmpl    cmpl    0,2,1,0,
>
>
>
>
>
>
>
> Thank you for your help,
>  Shawn
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/


