[ensembl-dev] Fwd: Custom annotation
Laurent Gil
lgil at ebi.ac.uk
Tue Feb 27 11:01:26 GMT 2018
Dear Lucas,
It looks like your GFF file is missing the entries with the following
IDs (these IDs appear in the "Parent" attribute of other, child
entries in your GFF file):
PF3D7_0108400.1, PF3D7_0108400.2, PF3D7_0105400.1, PF3D7_0105400.2,
PF3D7_0205700.2, PF3D7_0205700.1, PF3D7_0216700.1, PF3D7_0216700.2,
PF3D7_0210100.1, PF3D7_0210100.2, PF3D7_0208700.1, PF3D7_0208700.2,
PF3D7_0219400.2, PF3D7_0219400.1, PF3D7_0206900.1, PF3D7_0206900.2,
PF3D7_0202600.1, PF3D7_0202600.2
or these entries have a "type" (gene, CDS, mRNA, exon, etc.) that VEP
does not support.
The list of types supported by VEP is here:
http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gfftypes
(click on the link "[Show supported types]").
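
To narrow down which of the two cases applies, a small script along these lines may help. It is only a rough sketch, not part of VEP: the function names are made up for illustration, it assumes a tab-separated, 9-column GFF3 file, and the set of supported types has to be filled in by hand from the page linked above.

```python
def parse_attributes(column9):
    """Turn 'ID=x;Parent=y,z' into {'ID': 'x', 'Parent': 'y,z'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_unresolved_parents(lines, supported_types):
    """Return (missing, unsupported) for the Parent IDs used in a GFF3 file.

    missing     -- sorted list of Parent IDs that no entry defines via ID=
    unsupported -- dict mapping Parent ID -> its type, for parents whose type
                   is not in supported_types (a set supplied by the caller)
    """
    id_types = {}        # ID -> feature type (column 3)
    referenced = set()   # every ID named in a Parent attribute
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line; skipped in this sketch
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            id_types[attrs["ID"]] = cols[2]
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                referenced.add(parent)
    missing = sorted(p for p in referenced if p not in id_types)
    unsupported = {
        p: id_types[p]
        for p in referenced
        if p in id_types and id_types[p] not in supported_types
    }
    return missing, unsupported
```

It could be called on the compressed file with something like find_unresolved_parents(gzip.open("custom.gff.gz", "rt"), {"gene", "mRNA", "exon", "CDS"}), where the set shown is just the types mentioned in this email and not the full supported list. IDs returned in "missing" have no defining entry at all; IDs in "unsupported" are defined, but with a type outside the set you supplied.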
Best regards,
Laurent
Ensembl Variation
On 26/02/2018 12:18, Lucas Michel wrote:
>
>
> I am working on the annotation of some variants in Plasmodium
> falciparum with VEP. Since there is no reference for it on the
> Ensembl page (or I haven't found it), I am using a custom GFF and
> FASTA file, but I cannot get it to work properly.
>
> VEP version:
> Versions:
> ensembl : 91.18ee742
> ensembl-funcgen : 91.4681d69
> ensembl-io : 91.923d668
> ensembl-variation : 91.c78d8b4
> ensembl-vep : 91.3
>
>
>
> Example fragments:
>
> VCF file:
>
> CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
> 1.2B_sample 10G_sample A7K9_sample C2_sample E5K9_sample
> Pf3D7_01_v3 12 . A C 371.78 .
> AC=5;AF=0.500;AN=10;BaseQRankSum=-2.571;DP=166;Dels=0.00;ExcessHet=11.9728;FS=0.000;HaplotypeScore=13.0379;MLEAC=5;MLEAF=0.500;MQ=12.68;MQ0=0;MQRankSum=-5.041;QD=2.24;ReadPosRankSum=0.232;SOR=2.263
> GT:AD:DP:GQ:PL 0/1:19,5:24:16:16,0,280 0/1:16,11:27:90:90,0,187
> 0/1:20,10:30:63:63,0,248 0/1:39,18:57:99:137,0,486
> 0/1:16,12:28:99:100,0,193
> Pf3D7_01_v3 30 . A G 39.92 .
> AC=3;AF=0.300;AN=10;BaseQRankSum=-7.495;DP=591;Dels=0.00;ExcessHet=4.7712;FS=0.000;HaplotypeScore=76.2682;MLEAC=3;MLEAF=0.300;MQ=12.22;MQ0=0;MQRankSum=-4.844;QD=0.13;ReadPosRankSum=-1.706;SOR=0.405
> GT:AD:DP:GQ:PL 0/1:56,13:69:25:25,0,749 0/0:74,14:91:7:0,7,948
> 0/1:89,24:114:33:33,0,1035 0/0:167,27:197:94:0,94,2087
> 0/1:93,24:119:18:18,0,1089
> Pf3D7_01_v3 37 . G A 10.87 .
> AC=2;AF=0.200;AN=10;BaseQRankSum=7.016;DP=745;Dels=0.03;ExcessHet=3.5218;FS=0.000;HaplotypeScore=95.8754;MLEAC=2;MLEAF=0.200;MQ=12.26;MQ0=0;MQRankSum=-4.419;QD=0.03;ReadPosRankSum=1.936;SOR=0.962
> GT:AD:DP:GQ:PL 0/0:76,10:87:72:0,72,1134
> 0/0:102,8:111:99:0,151,1577 0/0:110,24:138:89:0,89,1381
> 0/1:191,50:243:39:39,0,2375 0/1:118,25:146:5:5,0,1533
> Pf3D7_01_v3 58 . A G 162.69 .
> AC=1;AF=0.100;AN=10;BaseQRankSum=-7.121;DP=928;Dels=0.01;ExcessHet=3.0103;FS=0.000;HaplotypeScore=147.7849;MLEAC=1;MLEAF=0.100;MQ=12.57;MQ0=0;MQRankSum=-5.615;QD=0.68;ReadPosRankSum=0.938;SOR=0.445
> GT:AD:DP:GQ:PL 0/0:102,11:114:99:0,115,1407
> 0/0:141,17:159:99:0,160,1896 0/0:162,32:198:66:0,66,1942
> 0/1:185,56:247:99:195,0,1990 0/0:170,30:201:51:0,51,2099
> Pf3D7_01_v3 72 . G A 447.03 .
> AC=3;AF=0.300;AN=10;BaseQRankSum=7.927;DP=995;Dels=0.03;ExcessHet=4.7712;FS=0.000;HaplotypeScore=152.4765;MLEAC=3;MLEAF=0.300;MQ=12.70;MQ0=0;MQRankSum=-5.905;QD=0.67;ReadPosRankSum=4.279;SOR=1.157
> GT:AD:DP:GQ:PL 0/0:112,15:129:99:0,121,1642
> 0/0:144,21:168:99:0,155,2064 0/1:168,42:210:17:17,0,2079
> 0/1:186,53:243:99:107,0,2331 0/1:158,60:219:99:360,0,1938
>
> GFF file:
>
> Pf3D7_01_v3 EuPathDB gene 29510 37126 . + . ID=PF3D7_0100100;description=erythrocyte membrane protein 1%2C PfEMP1;biotype=protein_coding
> Pf3D7_01_v3 EuPathDB exon 29510 34762 . + . ID=exon_PF3D7_0100100-E1;Parent=PF3D7_0100100.1;biotype=protein_coding
> Pf3D7_01_v3 EuPathDB CDS 29510 34762 . + 0 ID=PF3D7_0100100.1-p1-CDS1;Parent=PF3D7_0100100.1;protein_source_id=PF3D7_0100100.1-p1;biotype=protein_coding
> Pf3D7_01_v3 EuPathDB mRNA 29510 37126 . + . ID=PF3D7_0100100.1;Parent=PF3D7_0100100;Ontology_term=GO:0050839,GO:0004872,GO:0020002,GO:0020030,GO:0016021,GO:0020033,GO:0020035,GO:0020013,GO:0009405,GO:0016337;description=erythrocyte membrane protein 1%2C PfEMP1;biotype=protein_coding
> Pf3D7_01_v3 EuPathDB CDS 35888 37126 . + 0 ID=PF3D7_0100100.1-p1-CDS2;Parent=PF3D7_0100100.1;protein_source_id=PF3D7_0100100.1-p1;biotype=protein_coding
> Pf3D7_01_v3 EuPathDB exon 35888 37126 . + . ID=exon_PF3D7_0100100-E2;Parent=PF3D7_0100100.1;biotype=protein_coding
> Pf3D7_01_v3 EuPathDB gene 38982 40207 . - . ID=PF3D7_0100200;description=rifin;biotype=protein_coding
> Pf3D7_01_v3 EuPathDB mRNA 38982 40207 . - . ID=PF3D7_0100200.1;Parent=PF3D7_0100200;Ontology_term=GO:0020036,GO:0020002,GO:0020003,GO:0020033,GO:0020035,GO:0020013;description=rifin;biotype=protein_coding
> Pf3D7_01_v3 EuPathDB exon 38982 39923 . - . ID=exon_PF3D7_0100200-E2;Parent=PF3D7_0100200.1;biotype=protein_coding
> Pf3D7_01_v3 EuPathDB CDS 38982 39923 . - 1 ID=PF3D7_0100200.1-p1-CDS2;Parent=PF3D7_0100200.1;protein_source_id=PF3D7_0100200.1-p1;biotype=protein_coding
> Pf3D7_01_v3 EuPathDB exon 40154 40207 . - . ID=exon_PF3D7_0100200-E1;Parent=PF3D7_0100200.1;biotype=protein_coding
> Pf3D7_01_v3 EuPathDB CDS 40154 40207 . - 0 ID=PF3D7_0100200.1-p1-CDS1;Parent=PF3D7_0100200.1;protein_source_id=PF3D7_0100200.1-p1;biotype=protein_coding
> Pf3D7_01_v3 EuPathDB mRNA 42367 46507 . - . ID=PF3D7_0100300.1;Parent=PF3D7_0100300;Ontology_term=GO:0004872,GO:0016021,GO:0009405;description=erythrocyte membrane protein 1%2C PfEMP1;biotype=protein_coding
>
> FASTA file:
>
> >Pf3D7_01_v3 | organism=Plasmodium_falciparum_3D7 | version=2015-06-18
> | length=640851 | SO=chromosome
> TGAACCCTAAAACCTAAACCCTAAACCCTAAACCCTGAACCCTAAACCCTGAACCCTAAA
> CCCTAAACCCTGAACCCTAAACCCTAAACCCTGAACCCTAAACCCTGAAACCTAAACCCT
> GAACCCTAAACCCTGAACCCTGAACCCTAACCCTAAACCCTAAACCTAAAACCCTGAACC
> CTAAACCCTGAACCCTGAACCCTAAACCCTGAACCCTAAACCCTAAACCCTGAACCCTAA
> ACCCTGAACCCTAAACCCTAAACCCTGAACCCTGAACCCTAAA...
>
> --
>
> I use VEP with the following command:
>
> vep -i All_SNP.vcf -gff custom.gff.gz -fasta ref.fasta.gz -o
> anofile.csv --force_overwrite
>
> And I consistently get the following error:
>
> WARNING: Parent entries with the following IDs were not found or
> skipped due to invalid types: PF3D7_0108400.1,PF3D7_0108400.2,
> PF3D7_0105400.1,PF3D7_0105400.2
> WARNING: Parent entries with the following IDs were not found or
> skipped due to invalid types: PF3D7_0205700.2,PF3D7_0205700.1,
> PF3D7_0216700.1,PF3D7_0216700.2, PF3D7_0210100.1,PF3D7_0210100.2,
> PF3D7_0208700.1,PF3D7_0208700.2, PF3D7_0219400.2,PF3D7_0219400.1,
> PF3D7_0206900.1,PF3D7_0206900.2, PF3D7_0202600.1,PF3D7_0202600.2
> Can't use an undefined value as an ARRAY reference at
> /home/lucas/Programs/ensembl-vep/Bio/EnsEMBL/Transcript.pm line 1617,
> <__ANONIO__> line 9772.
>
> I added the biotype attribute manually to the GFF since I thought the
> problem might lie there, but I have had no success.
>
> Any help would be much appreciated.
>
> Thank you very much,
>
> Lucas
>
>
>
> Lucas Michel Todó
>
> Bioinformatics Researcher
>
> ISGlobal
>
> Carrer Roselló 149, 1ª Planta
> Barcelona 08036
>
> T.: +34 93 2275400 ext. 4080
>
> E.: lucas.michel at isglobal.org
>
>
>
>
>
>