[ensembl-dev] VEP variants missing on output

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Tue May 21 16:02:57 BST 2013


Hello Will,

I'm getting more than 3000 lines in the output file. This seems really weird.

wc -l variant_effect_output.txt

3936
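
A raw line count can overstate the number of variants. Header lines start with '#', and, as far as I understand the default (non-VCF) output, VEP writes one line per variant/feature pair, so a variant overlapping several transcripts shows up several times. Below is a minimal sketch for counting distinct variants instead. It assumes a tab-delimited file whose first column identifies the input variant; the function name summarise_output is hypothetical and not part of VEP.

```python
from collections import Counter


def summarise_output(lines):
    """Count header lines, data lines and distinct first-column identifiers.

    Assumption: header lines start with '#', and the first tab-separated
    column identifies the input variant.
    """
    headers = 0
    per_variant = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            headers += 1
            continue
        # Use the first column as the variant identifier
        per_variant[line.split("\t", 1)[0]] += 1
    return {
        "header_lines": headers,
        "data_lines": sum(per_variant.values()),
        "distinct_variants": len(per_variant),
    }
```

Calling summarise_output(open("variant_effect_output.txt")) would then show whether the distinct-variant count, rather than the line count, matches the input.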

Here's the way I'm proceeding:

./variant_effect_predictor.pl -i /home/likewise-open/SGNET/gmarco/vep_71_annotation_check/input.vcf -force -fork 4 --database --config vep_71.test
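
To pin down exactly which input variants are absent after a run like this, the input and output can be compared by key rather than by line count. The sketch below assumes both files are VCF (i.e. the run writes VCF output) and that CHROM, POS, REF and ALT are carried over unchanged. The helper names vcf_keys and missing_variants are hypothetical. It also collects non-variant lines (ALT empty or "."), since those are the ones --allow_non_variant is meant for.

```python
def vcf_keys(lines):
    """Return (keys, non_variant_keys) for the data lines of a VCF.

    A key is (CHROM, POS, REF, ALT). Records whose ALT is empty or '.'
    are also collected separately as non-variant lines.
    """
    keys, non_variant = set(), set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        key = (fields[0], fields[1], fields[3], fields[4])
        keys.add(key)
        if fields[4] in ("", "."):
            non_variant.add(key)
    return keys, non_variant


def missing_variants(input_lines, output_lines):
    """Keys present in the input but absent from the output, sorted."""
    in_keys, _ = vcf_keys(input_lines)
    out_keys, _ = vcf_keys(output_lines)
    return sorted(in_keys - out_keys)
```

Running missing_variants on the same input after several runs would also show whether the same variants go missing each time.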


Here's the content of vep_71.test:

dir                /home/likewise-open/SGNET/gmarco/.vep
toplevel_dir       /home/likewise-open/SGNET/gmarco/.vep
force_overwrite    1
format             vcf
html               1
host               192.19.x.xx
port               3306
user               myuser
password           mypassword
buffer_size        5000
hgvs               1
canonical          1
ccds               1
check_svs          1
domains            1
gmaf               1
hgnc               1
maf_1kg            1
numbers            1
polyphen           b
regulatory         1
sift               b
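
For comparing this file with the options used in another run, the config can be loaded as plain whitespace-separated "option value" pairs. This is a minimal sketch only: the name parse_vep_config is hypothetical, it is not VEP's own config reader, and skipping lines that start with '#' is an assumption.

```python
def parse_vep_config(lines):
    """Parse 'option value' lines into a dict of strings.

    Assumptions: blank lines and lines starting with '#' are ignored,
    and an option without a value is stored as an empty string.
    """
    options = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options
```

With the result in hand, a check such as "vcf" in options tells whether VCF output was requested in the file. As far as I understand the options, that is separate from format, which describes the input file.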

Best regards,
Guillermo.

On 05/21/2013 02:30 PM, Will McLaren wrote:
> Hi Guillermo,
>
> I'm unable to recreate this, sorry!
>
> I get 406 going in, 406 coming out every time, whichever combination 
> of those options above I use, and whether I use VCF or standard output.
>
> Here's my run (minus -check_sv):
>
> > perl variant_effect_predictor.pl -i guill.vcf -vcf -cache -force 
> -fork 4 -hgvs -canon -ccds -domains -gmaf -hgnc -maf_1kg -numbers 
> -poly b -regu -sift b -fasta 
> ~/NFS/Fasta/Homo_sapiens.GRCh37.69.dna.primary_assembly.fa
> 2013-05-21 13:24:26 - Checking/creating FASTA index
> 2013-05-21 13:24:26 - Read existing cache info
> 2013-05-21 13:24:26 - Starting...
> 2013-05-21 13:24:26 - Detected format of input file as vcf
> 2013-05-21 13:24:26 - Read 406 variants into buffer
> 2013-05-21 13:24:26 - Reading transcript data from cache and/or database
> [================================================================]  [ 100% ]
> 2013-05-21 13:24:30 - Retrieved 10891 transcripts (0 mem, 10919 
> cached, 0 DB, 28 duplicates)
> 2013-05-21 13:24:30 - Reading regulatory data from cache and/or database
> [================================================================]  [ 100% ]
> 2013-05-21 13:24:35 - Retrieved 36955 regulatory features (0 mem, 
> 36955 cached, 0 DB, 0 duplicates)
> 2013-05-21 13:24:35 - Calculating consequences
> [================================================================]  [ 100% ]
> 2013-05-21 13:24:56 - Writing output
> 2013-05-21 13:24:56 - Processed 406 total variants (14 vars/sec, 14 
> vars/sec total)
> 2013-05-21 13:24:56 - Wrote stats summary to 
> variant_effect_output.txt_summary.html
> 2013-05-21 13:24:56 - Finished!
> > wc -l variant_effect_output.txt
> 408
>
> It's 408 as it's adding two header lines to the VCF output.
>
> Which 16 are missing from your output, and is it the same 16 each time?
>
> Try writing to a different output file, or on a different disk if you 
> can (perhaps disk space is an issue?)
>
> Will
>
>
> On 21 May 2013 13:15, Guillermo Marco Puche 
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
>     Hello Will,
>
>     Here's the input:
>     https://github.com/guillermomarco/vep_plugins_71/blob/master/missing_variants/missing_output_variants.vcf
>
>     As you said, it's not about the options or plugins. Launching VEP
>     without specifying any options still returns output with
>     missing variants.
>
>     Regards,
>     Guillermo.
>
>
>
>     On 05/21/2013 01:49 PM, Will McLaren wrote:
>>     Hi Guillermo,
>>
>>     None of those options should filter out variants.
>>
>>     Are you able to provide any of the files that recreate the problem?
>>
>>     Is there any chance that you are using VCF input and it contains
>>     non-variant lines - this would be where the ALT column is empty
>>     or "."? If so, this may be your problem. To force these to be
>>     included in the output, you should add --allow_non_variant.
>>
>>     Regards
>>
>>     Will
>>
>>
>>     On 21 May 2013 09:40, Guillermo Marco Puche
>>     <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>         Hello,
>>
>>         I've been checking VEP results, and I've noticed that some
>>         input variants are missing from the output.
>>
>>         I think this may be due to some of the options I'm using
>>         to launch VEP:
>>
>>         hgvs               1
>>         canonical          1
>>         ccds               1
>>         check_svs          1
>>         domains            1
>>         gmaf               1
>>         hgnc               1
>>         maf_1kg            1
>>         numbers            1
>>         polyphen           b
>>         regulatory         1
>>         sift               b
>>
>>         Should any of these options be filtering the output? I've
>>         disabled all plugins for this test to be sure that it's
>>         not a plugin issue.
>>
>>           * With a 406-variant input VCF file, 16 variants were
>>             missing from the output.
>>           * I then ran VEP with only those 16 missing variants, and 3
>>             were missing from the output.
>>           * I ran it again with only those 3 variants, and this time
>>             not a single one was missing.
>>
>>         I would like to know what's behind that weird behaviour.
>>
>>         Thank you.
>>
>>         Best regards,
>>         Guillermo.
>>
>>
>>
>>
>>         _______________________________________________
>>         Dev mailing list Dev at ensembl.org
>>         Posting guidelines and subscribe/unsubscribe info:
>>         http://lists.ensembl.org/mailman/listinfo/dev
>>         Ensembl Blog: http://www.ensembl.info/
>>
>