[ensembl-dev] VEP variants missing on output

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Wed May 22 08:22:48 BST 2013


Hello Will,

You were right. I'm now getting all 406 variants.
I dropped "html" from the config, just in case.

As always, flawless Ensembl support. Thank you!

Best regards,
Guillermo.

On 05/21/2013 05:13 PM, Will McLaren wrote:
> You get one line of output for each variant/feature overlap, so you 
> will almost always see more output lines than input if you use the 
> default output format. If you use VCF output, you only get one line 
> per variant.
>
> You can check how many unique variants there are in the output with e.g.:
>
> grep -v '^#' variant_effect_output.txt | cut -f 1 | sort -u | wc -l
>
> assuming your variants have unique names.
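As a rough sketch of the check above, the two counts can be printed side by side. This assumes VEP's default tab-delimited output, where header lines start with "#" and the first column holds the variant name; count_vep_lines is only an illustrative helper name, not part of VEP.

```shell
# Illustrative helper (not part of VEP): compare output lines with unique variants.
# Assumes default VEP output: header lines start with '#', column 1 is the variant name.
count_vep_lines() {
    total=$(grep -v '^#' "$1" | wc -l | tr -d ' ')
    unique=$(grep -v '^#' "$1" | cut -f 1 | sort -u | wc -l | tr -d ' ')
    echo "$total output lines, $unique unique variants"
}

# Usage: count_vep_lines variant_effect_output.txt
```

If the second number matches the number of input variants, nothing is missing; the extra lines are just one line per variant/feature overlap.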
>
> Try dropping "html" from your config, see if that makes any difference 
> - as the newest feature there, it's got a higher chance of causing 
> problems!
>
> Will
>
>
>
>
> On 21 May 2013 16:02, Guillermo Marco Puche 
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
>     Hello Will,
>
>     I'm getting more than 3000 lines in the output file. This seems
>     really weird.
>
>     wc -l variant_effect_output.txt
>
>     3936
>
>     Here's the way I'm proceeding:
>
>     ./variant_effect_predictor.pl -i /home/likewise-open/SGNET/gmarco/vep_71_annotation_check/input.vcf -force -fork 4 --database --config vep_71.test
>
>
>     Here's the content of vep_71.test:
>
>     dir                /home/likewise-open/SGNET/gmarco/.vep
>     toplevel_dir       /home/likewise-open/SGNET/gmarco/.vep
>     force_overwrite    1
>     format             vcf
>     html               1
>     host               192.19.x.xx
>     port               3306
>     user               myuser
>     password           mypassword
>     buffer_size        5000
>
>     hgvs               1
>     canonical          1
>     ccds               1
>     check_svs          1
>     domains            1
>     gmaf               1
>     hgnc               1
>     maf_1kg            1
>     numbers            1
>     polyphen           b
>     regulatory         1
>     sift               b
>
>     Best regards,
>     Guillermo.
>
>
>     On 05/21/2013 02:30 PM, Will McLaren wrote:
>>     Hi Guillermo,
>>
>>     I'm unable to recreate this, sorry!
>>
>>     I get 406 going in, 406 coming out every time, whichever
>>     combination of those options above I use, and whether I use VCF
>>     or standard output.
>>
>>     Here's my run (minus -check_sv):
>>
>>     > perl variant_effect_predictor.pl -i guill.vcf -vcf -cache
>>     -force -fork 4 -hgvs -canon -ccds -domains -gmaf -hgnc -maf_1kg
>>     -numbers -poly b -regu -sift b -fasta
>>     ~/NFS/Fasta/Homo_sapiens.GRCh37.69.dna.primary_assembly.fa
>>     2013-05-21 13:24:26 - Checking/creating FASTA index
>>     2013-05-21 13:24:26 - Read existing cache info
>>     2013-05-21 13:24:26 - Starting...
>>     2013-05-21 13:24:26 - Detected format of input file as vcf
>>     2013-05-21 13:24:26 - Read 406 variants into buffer
>>     2013-05-21 13:24:26 - Reading transcript data from cache and/or database
>>     [================================================================]  [ 100% ]
>>     2013-05-21 13:24:30 - Retrieved 10891 transcripts (0 mem, 10919 cached, 0 DB, 28 duplicates)
>>     2013-05-21 13:24:30 - Reading regulatory data from cache and/or database
>>     [================================================================]  [ 100% ]
>>     2013-05-21 13:24:35 - Retrieved 36955 regulatory features (0 mem, 36955 cached, 0 DB, 0 duplicates)
>>     2013-05-21 13:24:35 - Calculating consequences
>>     [================================================================]  [ 100% ]
>>     2013-05-21 13:24:56 - Writing output
>>     2013-05-21 13:24:56 - Processed 406 total variants (14 vars/sec, 14 vars/sec total)
>>     2013-05-21 13:24:56 - Wrote stats summary to variant_effect_output.txt_summary.html
>>     2013-05-21 13:24:56 - Finished!
>>     > wc -l variant_effect_output.txt
>>     408
>>
>>     It's 408 as it's adding two header lines to the VCF output.
>>
>>     Which 16 are missing from your output, and is it the same 16 each
>>     time?
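To see which variants are missing, one option is to compare input and output on CHROM, POS, REF and ALT. A minimal sketch, assuming VEP was run with VCF output so that both files carry those values in columns 1, 2, 4 and 5; vcf_keys and missing_variants are hypothetical helper names, not part of VEP.

```shell
# vcf_keys FILE: print one sorted, unique "CHROM POS REF ALT" key per data line
# (tab-separated; header lines starting with '#' are skipped)
vcf_keys() {
    grep -v '^#' "$1" | cut -f 1,2,4,5 | sort -u
}

# missing_variants INPUT OUTPUT: print keys present in INPUT but absent from OUTPUT
missing_variants() {
    in_keys=$(mktemp)
    out_keys=$(mktemp)
    vcf_keys "$1" > "$in_keys"
    vcf_keys "$2" > "$out_keys"
    comm -23 "$in_keys" "$out_keys"   # lines found only in the first file
    rm -f "$in_keys" "$out_keys"
}

# Usage: missing_variants input.vcf variant_effect_output.txt
```

Running it against the output of several runs on the same input shows whether the same records go missing each time.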
>>
>>     Try writing to a different output file, or on a different disk if
>>     you can (perhaps disk space is an issue?)
>>
>>     Will
>>
>>
>>     On 21 May 2013 13:15, Guillermo Marco Puche
>>     <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>         Hello Will,
>>
>>         Here's the input:
>>         https://github.com/guillermomarco/vep_plugins_71/blob/master/missing_variants/missing_output_variants.vcf
>>
>>         As you said, it's not about the options or plugins. Launching
>>         VEP without specifying any options still returns output
>>         with missing variants.
>>
>>         Regards,
>>         Guillermo.
>>
>>
>>
>>         On 05/21/2013 01:49 PM, Will McLaren wrote:
>>>         Hi Guillermo,
>>>
>>>         None of those options should filter out variants.
>>>
>>>         Are you able to provide any of the files that recreate the
>>>         problem?
>>>
>>>         Is there any chance that you are using VCF input and it
>>>         contains non-variant lines - this would be where the ALT
>>>         column is empty or "."? If so, this may be your problem. To
>>>         force these to be included in the output, you should add
>>>         --allow_non_variant.
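A quick way to check for such lines before running VEP, as a sketch: count the data lines whose ALT column (column 5 of a tab-delimited VCF) is empty or ".". The helper name count_non_variant is hypothetical.

```shell
# count_non_variant FILE: count VCF data lines whose ALT column is empty or "."
# Assumes a tab-delimited VCF with ALT in column 5; header lines start with '#'.
count_non_variant() {
    awk -F '\t' '!/^#/ && ($5 == "" || $5 == ".") { n++ } END { print n + 0 }' "$1"
}

# Usage: count_non_variant input.vcf
```

A non-zero count suggests that --allow_non_variant is worth trying.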
>>>
>>>         Regards
>>>
>>>         Will
>>>
>>>
>>>         On 21 May 2013 09:40, Guillermo Marco Puche
>>>         <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>>             Hello,
>>>
>>>             I've been checking VEP results, and I've noticed that
>>>             some input variants are missing from the output.
>>>
>>>             I think this may be due to some of the options I'm
>>>             using to launch VEP:
>>>
>>>             hgvs               1
>>>             canonical          1
>>>             ccds               1
>>>             check_svs          1
>>>             domains            1
>>>             gmaf               1
>>>             hgnc               1
>>>             maf_1kg            1
>>>             numbers            1
>>>             polyphen           b
>>>             regulatory         1
>>>             sift               b
>>>
>>>             Should any of these options be filtering the output? I've
>>>             disabled all plugins for this test to be sure that
>>>             it's not a plugin issue.
>>>
>>>               * With a 406-variant input VCF file, 16 variants were
>>>                 missing from the output.
>>>               * I then ran VEP with only those 16 missing variants,
>>>                 and 3 were missing from the output.
>>>               * I ran it again with only those 3 variants, and this
>>>                 time not a single one was missing.
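The narrowing-down steps above can be scripted. A sketch, assuming a tab-delimited VCF input and a keys file holding one CHROM, POS, REF, ALT line (tab-separated) per missing variant; subset_vcf is a hypothetical helper, not part of VEP.

```shell
# subset_vcf INPUT KEYS: print the header lines of INPUT plus the records whose
# CHROM, POS, REF, ALT appear in KEYS (one tab-separated key per line)
subset_vcf() {
    awk -F '\t' '
        FILENAME == ARGV[1] { keep[$1 FS $2 FS $3 FS $4] = 1; next }  # load keys
        /^#/ { print; next }                                           # keep VCF header
        ($1 FS $2 FS $4 FS $5) in keep                                 # matching records
    ' "$2" "$1"
}

# Usage: subset_vcf input.vcf missing_keys.txt > missing_only.vcf
```

The resulting file can be fed back to VEP to see whether the same records are dropped again.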
>>>
>>>             I would like to know what's behind that weird behaviour.
>>>
>>>             Thank you.
>>>
>>>             Best regards,
>>>             Guillermo.
>>>
>>>
>>>
>>>
>>>             _______________________________________________
>>>             Dev mailing list Dev at ensembl.org
>>>             Posting guidelines and subscribe/unsubscribe info:
>>>             http://lists.ensembl.org/mailman/listinfo/dev
>>>             Ensembl Blog: http://www.ensembl.info
>>>
>>
>
>
>

