[ensembl-dev] Fwd: Custom annotation

Laurent Gil lgil at ebi.ac.uk
Tue Feb 27 11:01:26 GMT 2018


Dear Lucas,

It looks like your GFF file is missing some entries with the following 
IDs (these IDs are also present in the attribute "Parent" of other child 
entries in your GFF file):

PF3D7_0108400.1,PF3D7_0108400.2, PF3D7_0105400.1,PF3D7_0105400.2, 
PF3D7_0205700.2,PF3D7_0205700.1, PF3D7_0216700.1,PF3D7_0216700.2, 
PF3D7_0210100.1,PF3D7_0210100.2, PF3D7_0208700.1,PF3D7_0208700.2, 
PF3D7_0219400.2,PF3D7_0219400.1, PF3D7_0206900.1,PF3D7_0206900.2, 
PF3D7_0202600.1,PF3D7_0202600.2

or these entries have a "type" (gene, CDS, mRNA, exon, etc.) that is not 
supported by VEP.
Here is the list of types supported by VEP: 
http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gfftypes 
(click on the link "[Show supported types]").
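
Both cases can be checked with a short script. The following is only a 
sketch in Python, not part of VEP: the function names are made up for 
illustration, and it assumes a tab-separated GFF3 file with "ID" and 
"Parent" attributes in column 9, as in the example fragment. It lists the 
Parent IDs that have no matching ID entry and counts the types of the 
parent entries that do exist, so that they can be compared by hand with 
the supported types list:

```python
import gzip
import sys
from collections import Counter


def parse_attributes(column9):
    """Turn 'ID=x;Parent=y,z' into {'ID': 'x', 'Parent': 'y,z'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_parents(lines):
    """Return (missing, parent_types) for an iterable of GFF3 lines.

    missing: sorted list of IDs used in Parent= but never defined as ID=.
    parent_types: Counter of the feature types (column 3) of the parents
    that are defined in the file.
    """
    id_to_type = {}
    referenced = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            id_to_type[attrs["ID"]] = cols[2]
        if "Parent" in attrs:
            # a feature may list several parents separated by commas
            referenced.update(p for p in attrs["Parent"].split(",") if p)
    missing = sorted(referenced - set(id_to_type))
    parent_types = Counter(
        id_to_type[p] for p in referenced if p in id_to_type
    )
    return missing, parent_types


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        missing, parent_types = check_parents(handle)
    print("Parent IDs with no matching ID entry:", ",".join(missing) or "none")
    for feature_type, count in sorted(parent_types.items()):
        print(f"{feature_type}\t{count}")
```

Saved under a name such as check_gff_parents.py (hypothetical), it could 
be run as "python check_gff_parents.py custom.gff.gz".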

Best regards,

Laurent
Ensembl Variation

On 26/02/2018 12:18, Lucas Michel wrote:
>
>
> I am working on the annotation of some variants in Plasmodium 
> falciparum with VEP. Since there is no reference for it on the 
> Ensembl page (or I haven't found it), I am using a custom GFF and 
> FASTA file, but I cannot get it to work properly.
>
> VEP version:
> Versions:
>   ensembl              : 91.18ee742
>   ensembl-funcgen      : 91.4681d69
>   ensembl-io           : 91.923d668
>   ensembl-variation    : 91.c78d8b4
>   ensembl-vep          : 91.3
>
>
>
> Example fragments:
>
> VCF file:
>
> CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    1.2B_sample    10G_sample    A7K9_sample    C2_sample    E5K9_sample
> Pf3D7_01_v3    12    .    A    C    371.78    .    AC=5;AF=0.500;AN=10;BaseQRankSum=-2.571;DP=166;Dels=0.00;ExcessHet=11.9728;FS=0.000;HaplotypeScore=13.0379;MLEAC=5;MLEAF=0.500;MQ=12.68;MQ0=0;MQRankSum=-5.041;QD=2.24;ReadPosRankSum=0.232;SOR=2.263    GT:AD:DP:GQ:PL    0/1:19,5:24:16:16,0,280    0/1:16,11:27:90:90,0,187    0/1:20,10:30:63:63,0,248    0/1:39,18:57:99:137,0,486    0/1:16,12:28:99:100,0,193
> Pf3D7_01_v3    30    .    A    G    39.92    .    AC=3;AF=0.300;AN=10;BaseQRankSum=-7.495;DP=591;Dels=0.00;ExcessHet=4.7712;FS=0.000;HaplotypeScore=76.2682;MLEAC=3;MLEAF=0.300;MQ=12.22;MQ0=0;MQRankSum=-4.844;QD=0.13;ReadPosRankSum=-1.706;SOR=0.405    GT:AD:DP:GQ:PL    0/1:56,13:69:25:25,0,749    0/0:74,14:91:7:0,7,948    0/1:89,24:114:33:33,0,1035    0/0:167,27:197:94:0,94,2087    0/1:93,24:119:18:18,0,1089
> Pf3D7_01_v3    37    .    G    A    10.87    .    AC=2;AF=0.200;AN=10;BaseQRankSum=7.016;DP=745;Dels=0.03;ExcessHet=3.5218;FS=0.000;HaplotypeScore=95.8754;MLEAC=2;MLEAF=0.200;MQ=12.26;MQ0=0;MQRankSum=-4.419;QD=0.03;ReadPosRankSum=1.936;SOR=0.962    GT:AD:DP:GQ:PL    0/0:76,10:87:72:0,72,1134    0/0:102,8:111:99:0,151,1577    0/0:110,24:138:89:0,89,1381    0/1:191,50:243:39:39,0,2375    0/1:118,25:146:5:5,0,1533
> Pf3D7_01_v3    58    .    A    G    162.69    .    AC=1;AF=0.100;AN=10;BaseQRankSum=-7.121;DP=928;Dels=0.01;ExcessHet=3.0103;FS=0.000;HaplotypeScore=147.7849;MLEAC=1;MLEAF=0.100;MQ=12.57;MQ0=0;MQRankSum=-5.615;QD=0.68;ReadPosRankSum=0.938;SOR=0.445    GT:AD:DP:GQ:PL    0/0:102,11:114:99:0,115,1407    0/0:141,17:159:99:0,160,1896    0/0:162,32:198:66:0,66,1942    0/1:185,56:247:99:195,0,1990    0/0:170,30:201:51:0,51,2099
> Pf3D7_01_v3    72    .    G    A    447.03    .    AC=3;AF=0.300;AN=10;BaseQRankSum=7.927;DP=995;Dels=0.03;ExcessHet=4.7712;FS=0.000;HaplotypeScore=152.4765;MLEAC=3;MLEAF=0.300;MQ=12.70;MQ0=0;MQRankSum=-5.905;QD=0.67;ReadPosRankSum=4.279;SOR=1.157    GT:AD:DP:GQ:PL    0/0:112,15:129:99:0,121,1642    0/0:144,21:168:99:0,155,2064    0/1:168,42:210:17:17,0,2079    0/1:186,53:243:99:107,0,2331    0/1:158,60:219:99:360,0,1938
>
> GFF file:
>
> Pf3D7_01_v3    EuPathDB    gene    29510    37126    .    +    .    ID=PF3D7_0100100;description=erythrocyte membrane protein 1%2C PfEMP1;biotype=protein_coding
> Pf3D7_01_v3    EuPathDB    exon    29510    34762    .    +    .    ID=exon_PF3D7_0100100-E1;Parent=PF3D7_0100100.1;biotype=protein_coding
> Pf3D7_01_v3    EuPathDB    CDS    29510    34762    .    +    0    ID=PF3D7_0100100.1-p1-CDS1;Parent=PF3D7_0100100.1;protein_source_id=PF3D7_0100100.1-p1;biotype=protein_coding
> Pf3D7_01_v3    EuPathDB    mRNA    29510    37126    .    +    .    ID=PF3D7_0100100.1;Parent=PF3D7_0100100;Ontology_term=GO:0050839,GO:0004872,GO:0020002,GO:0020030,GO:0016021,GO:0020033,GO:0020035,GO:0020013,GO:0009405,GO:0016337;description=erythrocyte membrane protein 1%2C PfEMP1;biotype=protein_coding
> Pf3D7_01_v3    EuPathDB    CDS    35888    37126    .    +    0    ID=PF3D7_0100100.1-p1-CDS2;Parent=PF3D7_0100100.1;protein_source_id=PF3D7_0100100.1-p1;biotype=protein_coding
> Pf3D7_01_v3    EuPathDB    exon    35888    37126    .    +    .    ID=exon_PF3D7_0100100-E2;Parent=PF3D7_0100100.1;biotype=protein_coding
> Pf3D7_01_v3    EuPathDB    gene    38982    40207    .    -    .    ID=PF3D7_0100200;description=rifin;biotype=protein_coding
> Pf3D7_01_v3    EuPathDB    mRNA    38982    40207    .    -    .    ID=PF3D7_0100200.1;Parent=PF3D7_0100200;Ontology_term=GO:0020036,GO:0020002,GO:0020003,GO:0020033,GO:0020035,GO:0020013;description=rifin;biotype=protein_coding
> Pf3D7_01_v3    EuPathDB    exon    38982    39923    .    -    .    ID=exon_PF3D7_0100200-E2;Parent=PF3D7_0100200.1;biotype=protein_coding
> Pf3D7_01_v3    EuPathDB    CDS    38982    39923    .    -    1    ID=PF3D7_0100200.1-p1-CDS2;Parent=PF3D7_0100200.1;protein_source_id=PF3D7_0100200.1-p1;biotype=protein_coding
> Pf3D7_01_v3    EuPathDB    exon    40154    40207    .    -    .    ID=exon_PF3D7_0100200-E1;Parent=PF3D7_0100200.1;biotype=protein_coding
> Pf3D7_01_v3    EuPathDB    CDS    40154    40207    .    -    0    ID=PF3D7_0100200.1-p1-CDS1;Parent=PF3D7_0100200.1;protein_source_id=PF3D7_0100200.1-p1;biotype=protein_coding
> Pf3D7_01_v3    EuPathDB    mRNA    42367    46507    .    -    .    ID=PF3D7_0100300.1;Parent=PF3D7_0100300;Ontology_term=GO:0004872,GO:0016021,GO:0009405;description=erythrocyte membrane protein 1%2C PfEMP1;biotype=protein_coding
>
> FASTA file:
>
> >Pf3D7_01_v3 | organism=Plasmodium_falciparum_3D7 | version=2015-06-18 | length=640851 | SO=chromosome
> TGAACCCTAAAACCTAAACCCTAAACCCTAAACCCTGAACCCTAAACCCTGAACCCTAAA
> CCCTAAACCCTGAACCCTAAACCCTAAACCCTGAACCCTAAACCCTGAAACCTAAACCCT
> GAACCCTAAACCCTGAACCCTGAACCCTAACCCTAAACCCTAAACCTAAAACCCTGAACC
> CTAAACCCTGAACCCTGAACCCTAAACCCTGAACCCTAAACCCTAAACCCTGAACCCTAA
> ACCCTGAACCCTAAACCCTAAACCCTGAACCCTGAACCCTAAA...
>
> -- 
>
> I use VEP with the following command:
>
> vep -i All_SNP.vcf -gff custom.gff.gz -fasta ref.fasta.gz -o anofile.csv --force_overwrite
>
> And I consistently get the following error:
>
> WARNING: Parent entries with the following IDs were not found or 
> skipped due to invalid types: PF3D7_0108400.1,PF3D7_0108400.2, 
> PF3D7_0105400.1,PF3D7_0105400.2
> WARNING: Parent entries with the following IDs were not found or 
> skipped due to invalid types: PF3D7_0205700.2,PF3D7_0205700.1, 
> PF3D7_0216700.1,PF3D7_0216700.2, PF3D7_0210100.1,PF3D7_0210100.2, 
> PF3D7_0208700.1,PF3D7_0208700.2, PF3D7_0219400.2,PF3D7_0219400.1, 
> PF3D7_0206900.1,PF3D7_0206900.2, PF3D7_0202600.1,PF3D7_0202600.2
> Can't use an undefined value as an ARRAY reference at 
> /home/lucas/Programs/ensembl-vep/Bio/EnsEMBL/Transcript.pm line 1617, 
> <__ANONIO__> line 9772.
>
> I added the biotype attribute manually to the GFF since I thought the 
> problem might lie there, but I have had no success.
>
> Any help would be much appreciated.
>
> Thank you very much,
>
> Lucas
>
>
>
> Lucas Michel Todó
>
> Bioinformatics Researcher
>
> ISGlobal
>
> Carrer Roselló 149, 1ª Planta
> Barcelona 08036
>
> T.: +34 93 2275400 ext. 4080
>
> E.: lucas.michel at isglobal.org
>
>
>
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
