[ensembl-dev] Fwd: Custom annotation
Lucas Michel
lucas.michel at isglobal.org
Mon Feb 26 12:18:34 GMT 2018
I am working on annotating some variants in Plasmodium falciparum with
VEP. Since I have not found a reference for it on the Ensembl site, I am
using a custom GFF and FASTA file, but I cannot get it to work properly.
VEP version:
Versions:
ensembl : 91.18ee742
ensembl-funcgen : 91.4681d69
ensembl-io : 91.923d668
ensembl-variation : 91.c78d8b4
ensembl-vep : 91.3
Example fragments:
VCF file:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1.2B_sample 10G_sample A7K9_sample C2_sample E5K9_sample
Pf3D7_01_v3 12 . A C 371.78 . AC=5;AF=0.500;AN=10;BaseQRankSum=-2.571;DP=166;Dels=0.00;ExcessHet=11.9728;FS=0.000;HaplotypeScore=13.0379;MLEAC=5;MLEAF=0.500;MQ=12.68;MQ0=0;MQRankSum=-5.041;QD=2.24;ReadPosRankSum=0.232;SOR=2.263 GT:AD:DP:GQ:PL 0/1:19,5:24:16:16,0,280 0/1:16,11:27:90:90,0,187 0/1:20,10:30:63:63,0,248 0/1:39,18:57:99:137,0,486 0/1:16,12:28:99:100,0,193
Pf3D7_01_v3 30 . A G 39.92 . AC=3;AF=0.300;AN=10;BaseQRankSum=-7.495;DP=591;Dels=0.00;ExcessHet=4.7712;FS=0.000;HaplotypeScore=76.2682;MLEAC=3;MLEAF=0.300;MQ=12.22;MQ0=0;MQRankSum=-4.844;QD=0.13;ReadPosRankSum=-1.706;SOR=0.405 GT:AD:DP:GQ:PL 0/1:56,13:69:25:25,0,749 0/0:74,14:91:7:0,7,948 0/1:89,24:114:33:33,0,1035 0/0:167,27:197:94:0,94,2087 0/1:93,24:119:18:18,0,1089
Pf3D7_01_v3 37 . G A 10.87 . AC=2;AF=0.200;AN=10;BaseQRankSum=7.016;DP=745;Dels=0.03;ExcessHet=3.5218;FS=0.000;HaplotypeScore=95.8754;MLEAC=2;MLEAF=0.200;MQ=12.26;MQ0=0;MQRankSum=-4.419;QD=0.03;ReadPosRankSum=1.936;SOR=0.962 GT:AD:DP:GQ:PL 0/0:76,10:87:72:0,72,1134 0/0:102,8:111:99:0,151,1577 0/0:110,24:138:89:0,89,1381 0/1:191,50:243:39:39,0,2375 0/1:118,25:146:5:5,0,1533
Pf3D7_01_v3 58 . A G 162.69 . AC=1;AF=0.100;AN=10;BaseQRankSum=-7.121;DP=928;Dels=0.01;ExcessHet=3.0103;FS=0.000;HaplotypeScore=147.7849;MLEAC=1;MLEAF=0.100;MQ=12.57;MQ0=0;MQRankSum=-5.615;QD=0.68;ReadPosRankSum=0.938;SOR=0.445 GT:AD:DP:GQ:PL 0/0:102,11:114:99:0,115,1407 0/0:141,17:159:99:0,160,1896 0/0:162,32:198:66:0,66,1942 0/1:185,56:247:99:195,0,1990 0/0:170,30:201:51:0,51,2099
Pf3D7_01_v3 72 . G A 447.03 . AC=3;AF=0.300;AN=10;BaseQRankSum=7.927;DP=995;Dels=0.03;ExcessHet=4.7712;FS=0.000;HaplotypeScore=152.4765;MLEAC=3;MLEAF=0.300;MQ=12.70;MQ0=0;MQRankSum=-5.905;QD=0.67;ReadPosRankSum=4.279;SOR=1.157 GT:AD:DP:GQ:PL 0/0:112,15:129:99:0,121,1642 0/0:144,21:168:99:0,155,2064 0/1:168,42:210:17:17,0,2079 0/1:186,53:243:99:107,0,2331 0/1:158,60:219:99:360,0,1938
GFF file:
Pf3D7_01_v3 EuPathDB gene 29510 37126 . + . ID=PF3D7_0100100;description=erythrocyte membrane protein 1%2C PfEMP1;biotype=protein_coding
Pf3D7_01_v3 EuPathDB exon 29510 34762 . + . ID=exon_PF3D7_0100100-E1;Parent=PF3D7_0100100.1;biotype=protein_coding
Pf3D7_01_v3 EuPathDB CDS 29510 34762 . + 0 ID=PF3D7_0100100.1-p1-CDS1;Parent=PF3D7_0100100.1;protein_source_id=PF3D7_0100100.1-p1;biotype=protein_coding
Pf3D7_01_v3 EuPathDB mRNA 29510 37126 . + . ID=PF3D7_0100100.1;Parent=PF3D7_0100100;Ontology_term=GO:0050839,GO:0004872,GO:0020002,GO:0020030,GO:0016021,GO:0020033,GO:0020035,GO:0020013,GO:0009405,GO:0016337;description=erythrocyte membrane protein 1%2C PfEMP1;biotype=protein_coding
Pf3D7_01_v3 EuPathDB CDS 35888 37126 . + 0 ID=PF3D7_0100100.1-p1-CDS2;Parent=PF3D7_0100100.1;protein_source_id=PF3D7_0100100.1-p1;biotype=protein_coding
Pf3D7_01_v3 EuPathDB exon 35888 37126 . + . ID=exon_PF3D7_0100100-E2;Parent=PF3D7_0100100.1;biotype=protein_coding
Pf3D7_01_v3 EuPathDB gene 38982 40207 . - . ID=PF3D7_0100200;description=rifin;biotype=protein_coding
Pf3D7_01_v3 EuPathDB mRNA 38982 40207 . - . ID=PF3D7_0100200.1;Parent=PF3D7_0100200;Ontology_term=GO:0020036,GO:0020002,GO:0020003,GO:0020033,GO:0020035,GO:0020013;description=rifin;biotype=protein_coding
Pf3D7_01_v3 EuPathDB exon 38982 39923 . - . ID=exon_PF3D7_0100200-E2;Parent=PF3D7_0100200.1;biotype=protein_coding
Pf3D7_01_v3 EuPathDB CDS 38982 39923 . - 1 ID=PF3D7_0100200.1-p1-CDS2;Parent=PF3D7_0100200.1;protein_source_id=PF3D7_0100200.1-p1;biotype=protein_coding
Pf3D7_01_v3 EuPathDB exon 40154 40207 . - . ID=exon_PF3D7_0100200-E1;Parent=PF3D7_0100200.1;biotype=protein_coding
Pf3D7_01_v3 EuPathDB CDS 40154 40207 . - 0 ID=PF3D7_0100200.1-p1-CDS1;Parent=PF3D7_0100200.1;protein_source_id=PF3D7_0100200.1-p1;biotype=protein_coding
Pf3D7_01_v3 EuPathDB mRNA 42367 46507 . - . ID=PF3D7_0100300.1;Parent=PF3D7_0100300;Ontology_term=GO:0004872,GO:0016021,GO:0009405;description=erythrocyte membrane protein 1%2C PfEMP1;biotype=protein_coding
FASTA file:
>Pf3D7_01_v3 | organism=Plasmodium_falciparum_3D7 | version=2015-06-18 | length=640851 | SO=chromosome
TGAACCCTAAAACCTAAACCCTAAACCCTAAACCCTGAACCCTAAACCCTGAACCCTAAA
CCCTAAACCCTGAACCCTAAACCCTAAACCCTGAACCCTAAACCCTGAAACCTAAACCCT
GAACCCTAAACCCTGAACCCTGAACCCTAACCCTAAACCCTAAACCTAAAACCCTGAACC
CTAAACCCTGAACCCTGAACCCTAAACCCTGAACCCTAAACCCTAAACCCTGAACCCTAA
ACCCTGAACCCTAAACCCTAAACCCTGAACCCTGAACCCTAAA...
--
I run VEP with the following command:
vep -i All_SNP.vcf -gff custom.gff.gz -fasta ref.fasta.gz -o anofile.csv --force_overwrite
And I consistently get the following error:
WARNING: Parent entries with the following IDs were not found or skipped
due to invalid types: PF3D7_0108400.1,PF3D7_0108400.2,
PF3D7_0105400.1,PF3D7_0105400.2
WARNING: Parent entries with the following IDs were not found or skipped
due to invalid types: PF3D7_0205700.2,PF3D7_0205700.1,
PF3D7_0216700.1,PF3D7_0216700.2, PF3D7_0210100.1,PF3D7_0210100.2,
PF3D7_0208700.1,PF3D7_0208700.2, PF3D7_0219400.2,PF3D7_0219400.1,
PF3D7_0206900.1,PF3D7_0206900.2, PF3D7_0202600.1,PF3D7_0202600.2
Can't use an undefined value as an ARRAY reference at
/home/lucas/Programs/ensembl-vep/Bio/EnsEMBL/Transcript.pm line 1617,
<__ANONIO__> line 9772.
I added the biotype attribute to the GFF manually, since I thought the
problem might lie there, but I have had no success.
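
To narrow down the "Parent entries ... not found or skipped due to invalid
types" warnings, one option is to scan the GFF for Parent values that match
no ID attribute, and to see which feature types are actually being used as
parents. The following is only a rough diagnostic sketch: the function names
records and check_parents are made up for illustration, it assumes the real
file is tab-separated GFF3 with nine columns, and it does not reproduce
VEP's own parsing or explain the Transcript.pm error.

```python
from collections import Counter, defaultdict


def records(lines):
    """Yield (feature_type, attributes) for each 9-column GFF3 record."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature record
        attrs = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key.strip()] = value.strip()
        yield cols[2], attrs


def check_parents(lines):
    """Return (unresolved, parent_types).

    unresolved: Parent ID -> sorted child feature types, for Parent values
    that match no ID attribute anywhere in the input.
    parent_types: Counter of the feature types of records used as a Parent.
    """
    id_type = {}
    children = defaultdict(set)
    for ftype, attrs in records(lines):
        if "ID" in attrs:
            id_type[attrs["ID"]] = ftype
        # A Parent attribute may list several IDs separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children[parent].add(ftype)
    unresolved = {p: sorted(t) for p, t in children.items() if p not in id_type}
    parent_types = Counter(id_type[p] for p in children if p in id_type)
    return unresolved, parent_types
```

The input can be any iterable of lines, for example a handle from
gzip.open("custom.gff.gz", "rt"). If the IDs named in the warnings show up in
unresolved, the parent records are missing from the file; if they resolve,
parent_types shows which feature types are acting as parents and may be
worth comparing against the types VEP accepts.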
Any help would be much appreciated.
Thank you very much,
Lucas
Lucas Michel Todó
Bioinformatics Researcher
ISGlobal
Carrer Roselló 149, 1ª Planta
Barcelona 08036
T.: +34 93 2275400 ext. 4080
E.: lucas.michel at isglobal.org