[ensembl-dev] Fwd: Custom annotation

Lucas Michel lucas.michel at isglobal.org
Mon Feb 26 12:18:34 GMT 2018


I am working on annotating some variants in Plasmodium falciparum
with VEP. Since there is no reference for it on the Ensembl site (or I
haven't found it), I am using a custom GFF and FASTA file, but I cannot get
it to work properly.

VEP version:
Versions:
  ensembl              : 91.18ee742
  ensembl-funcgen      : 91.4681d69
  ensembl-io           : 91.923d668
  ensembl-variation    : 91.c78d8b4
  ensembl-vep          : 91.3



Example fragments:

VCF file:

CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    1.2B_sample    10G_sample    A7K9_sample    C2_sample    E5K9_sample
Pf3D7_01_v3    12    .    A    C    371.78    .    AC=5;AF=0.500;AN=10;BaseQRankSum=-2.571;DP=166;Dels=0.00;ExcessHet=11.9728;FS=0.000;HaplotypeScore=13.0379;MLEAC=5;MLEAF=0.500;MQ=12.68;MQ0=0;MQRankSum=-5.041;QD=2.24;ReadPosRankSum=0.232;SOR=2.263    GT:AD:DP:GQ:PL    0/1:19,5:24:16:16,0,280    0/1:16,11:27:90:90,0,187    0/1:20,10:30:63:63,0,248    0/1:39,18:57:99:137,0,486    0/1:16,12:28:99:100,0,193
Pf3D7_01_v3    30    .    A    G    39.92    .    AC=3;AF=0.300;AN=10;BaseQRankSum=-7.495;DP=591;Dels=0.00;ExcessHet=4.7712;FS=0.000;HaplotypeScore=76.2682;MLEAC=3;MLEAF=0.300;MQ=12.22;MQ0=0;MQRankSum=-4.844;QD=0.13;ReadPosRankSum=-1.706;SOR=0.405    GT:AD:DP:GQ:PL    0/1:56,13:69:25:25,0,749    0/0:74,14:91:7:0,7,948    0/1:89,24:114:33:33,0,1035    0/0:167,27:197:94:0,94,2087    0/1:93,24:119:18:18,0,1089
Pf3D7_01_v3    37    .    G    A    10.87    .    AC=2;AF=0.200;AN=10;BaseQRankSum=7.016;DP=745;Dels=0.03;ExcessHet=3.5218;FS=0.000;HaplotypeScore=95.8754;MLEAC=2;MLEAF=0.200;MQ=12.26;MQ0=0;MQRankSum=-4.419;QD=0.03;ReadPosRankSum=1.936;SOR=0.962    GT:AD:DP:GQ:PL    0/0:76,10:87:72:0,72,1134    0/0:102,8:111:99:0,151,1577    0/0:110,24:138:89:0,89,1381    0/1:191,50:243:39:39,0,2375    0/1:118,25:146:5:5,0,1533
Pf3D7_01_v3    58    .    A    G    162.69    .    AC=1;AF=0.100;AN=10;BaseQRankSum=-7.121;DP=928;Dels=0.01;ExcessHet=3.0103;FS=0.000;HaplotypeScore=147.7849;MLEAC=1;MLEAF=0.100;MQ=12.57;MQ0=0;MQRankSum=-5.615;QD=0.68;ReadPosRankSum=0.938;SOR=0.445    GT:AD:DP:GQ:PL    0/0:102,11:114:99:0,115,1407    0/0:141,17:159:99:0,160,1896    0/0:162,32:198:66:0,66,1942    0/1:185,56:247:99:195,0,1990    0/0:170,30:201:51:0,51,2099
Pf3D7_01_v3    72    .    G    A    447.03    .    AC=3;AF=0.300;AN=10;BaseQRankSum=7.927;DP=995;Dels=0.03;ExcessHet=4.7712;FS=0.000;HaplotypeScore=152.4765;MLEAC=3;MLEAF=0.300;MQ=12.70;MQ0=0;MQRankSum=-5.905;QD=0.67;ReadPosRankSum=4.279;SOR=1.157    GT:AD:DP:GQ:PL    0/0:112,15:129:99:0,121,1642    0/0:144,21:168:99:0,155,2064    0/1:168,42:210:17:17,0,2079    0/1:186,53:243:99:107,0,2331    0/1:158,60:219:99:360,0,1938

GFF file:

Pf3D7_01_v3    EuPathDB    gene    29510    37126    .    +    .    ID=PF3D7_0100100;description=erythrocyte membrane protein 1%2C PfEMP1;biotype=protein_coding
Pf3D7_01_v3    EuPathDB    exon    29510    34762    .    +    .    ID=exon_PF3D7_0100100-E1;Parent=PF3D7_0100100.1;biotype=protein_coding
Pf3D7_01_v3    EuPathDB    CDS    29510    34762    .    +    0    ID=PF3D7_0100100.1-p1-CDS1;Parent=PF3D7_0100100.1;protein_source_id=PF3D7_0100100.1-p1;biotype=protein_coding
Pf3D7_01_v3    EuPathDB    mRNA    29510    37126    .    +    .    ID=PF3D7_0100100.1;Parent=PF3D7_0100100;Ontology_term=GO:0050839,GO:0004872,GO:0020002,GO:0020030,GO:0016021,GO:0020033,GO:0020035,GO:0020013,GO:0009405,GO:0016337;description=erythrocyte membrane protein 1%2C PfEMP1;biotype=protein_coding
Pf3D7_01_v3    EuPathDB    CDS    35888    37126    .    +    0    ID=PF3D7_0100100.1-p1-CDS2;Parent=PF3D7_0100100.1;protein_source_id=PF3D7_0100100.1-p1;biotype=protein_coding
Pf3D7_01_v3    EuPathDB    exon    35888    37126    .    +    .    ID=exon_PF3D7_0100100-E2;Parent=PF3D7_0100100.1;biotype=protein_coding
Pf3D7_01_v3    EuPathDB    gene    38982    40207    .    -    .    ID=PF3D7_0100200;description=rifin;biotype=protein_coding
Pf3D7_01_v3    EuPathDB    mRNA    38982    40207    .    -    .    ID=PF3D7_0100200.1;Parent=PF3D7_0100200;Ontology_term=GO:0020036,GO:0020002,GO:0020003,GO:0020033,GO:0020035,GO:0020013;description=rifin;biotype=protein_coding
Pf3D7_01_v3    EuPathDB    exon    38982    39923    .    -    .    ID=exon_PF3D7_0100200-E2;Parent=PF3D7_0100200.1;biotype=protein_coding
Pf3D7_01_v3    EuPathDB    CDS    38982    39923    .    -    1    ID=PF3D7_0100200.1-p1-CDS2;Parent=PF3D7_0100200.1;protein_source_id=PF3D7_0100200.1-p1;biotype=protein_coding
Pf3D7_01_v3    EuPathDB    exon    40154    40207    .    -    .    ID=exon_PF3D7_0100200-E1;Parent=PF3D7_0100200.1;biotype=protein_coding
Pf3D7_01_v3    EuPathDB    CDS    40154    40207    .    -    0    ID=PF3D7_0100200.1-p1-CDS1;Parent=PF3D7_0100200.1;protein_source_id=PF3D7_0100200.1-p1;biotype=protein_coding
Pf3D7_01_v3    EuPathDB    mRNA    42367    46507    .    -    .    ID=PF3D7_0100300.1;Parent=PF3D7_0100300;Ontology_term=GO:0004872,GO:0016021,GO:0009405;description=erythrocyte membrane protein 1%2C PfEMP1;biotype=protein_coding

FASTA file:

>Pf3D7_01_v3 | organism=Plasmodium_falciparum_3D7 | version=2015-06-18 | length=640851 | SO=chromosome
TGAACCCTAAAACCTAAACCCTAAACCCTAAACCCTGAACCCTAAACCCTGAACCCTAAA
CCCTAAACCCTGAACCCTAAACCCTAAACCCTGAACCCTAAACCCTGAAACCTAAACCCT
GAACCCTAAACCCTGAACCCTGAACCCTAACCCTAAACCCTAAACCTAAAACCCTGAACC
CTAAACCCTGAACCCTGAACCCTAAACCCTGAACCCTAAACCCTAAACCCTGAACCCTAA
ACCCTGAACCCTAAACCCTAAACCCTGAACCCTGAACCCTAAA...

-- 

I use VEP with the following command:

vep -i All_SNP.vcf -gff custom.gff.gz -fasta ref.fasta.gz -o anofile.csv --force_overwrite

And I consistently get the following error:

WARNING: Parent entries with the following IDs were not found or skipped
due to invalid types: PF3D7_0108400.1,PF3D7_0108400.2,
PF3D7_0105400.1,PF3D7_0105400.2
WARNING: Parent entries with the following IDs were not found or skipped
due to invalid types: PF3D7_0205700.2,PF3D7_0205700.1,
PF3D7_0216700.1,PF3D7_0216700.2, PF3D7_0210100.1,PF3D7_0210100.2,
PF3D7_0208700.1,PF3D7_0208700.2, PF3D7_0219400.2,PF3D7_0219400.1,
PF3D7_0206900.1,PF3D7_0206900.2, PF3D7_0202600.1,PF3D7_0202600.2
Can't use an undefined value as an ARRAY reference at
/home/lucas/Programs/ensembl-vep/Bio/EnsEMBL/Transcript.pm line 1617,
<__ANONIO__> line 9772.
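
One way to narrow down the first warning might be to list every Parent value
in the GFF that has no matching ID line. The following is only a rough sketch
and not part of VEP: the function names are made up for illustration, it
assumes the real file is tab-separated (the fragments above show spaces only
because of mail formatting), and it covers just the "not found" case, not
parents that VEP skips because of an invalid type.

```python
import gzip
from collections import defaultdict


def parse_attributes(field):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_missing_parents(lines):
    """Return {parent_id: [feature types that reference it]} for every
    Parent value that never appears as an ID anywhere in the input."""
    defined = set()
    referenced = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete nine-column feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            defined.add(attrs["ID"])
        # A feature may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                referenced[parent].append(cols[2])
    return {p: types for p, types in referenced.items() if p not in defined}


def open_gff(path):
    """Open a plain or gzip/bgzip-compressed GFF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Hypothetical usage with the file name from the command above:
# with open_gff("custom.gff.gz") as handle:
#     for parent, types in sorted(find_missing_parents(handle).items()):
#         print(parent, ",".join(sorted(set(types))))
```

If this prints nothing for the IDs named in the warning, the parent lines do
exist in the file, and the "invalid types" half of the message would be the
next thing to look at.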

I added the biotype attribute manually to the GFF since I thought the
problem might lie there, but I have had no success.

Any help would be much appreciated.

Thank you very much,

Lucas



Lucas Michel Todó

Bioinformatics Researcher

ISGlobal

Carrer Roselló 149, 1ª Planta
Barcelona 08036



T.: +34 93 2275400 ext. 4080
E.: lucas.michel at isglobal.org


