[ensembl-dev] VEP variants missing on output

Duarte Molha Duarte.Molha at ogt.com
Wed May 22 09:07:24 BST 2013


So using the html option causes variants to go missing?


From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Guillermo Marco Puche
Sent: 22 May 2013 08:23
To: dev at ensembl.org
Subject: Re: [ensembl-dev] VEP variants missing on output

Hello Will,

You were right. I'm getting the 406 variants.
I dropped html, just in case.

As always, flawless Ensembl support. Thank you!

Best regards,
Guillermo.

On 05/21/2013 05:13 PM, Will McLaren wrote:
You get one line of output for each variant/feature overlap, so you will almost always see more output lines than input if you use the default output format. If you use VCF output, you only get one line per variant.

You can check how many unique variants there are in the output with e.g.:

grep -v '^#' variant_effect_output.txt | cut -f 1 | sort -u | wc -l

assuming your variants have unique names.
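
To see why the default output has more lines than the input, a rough sketch along these lines counts output lines per variant. It assumes the default tab-delimited output, with the variant name in column 1 and header lines starting with '#'; the function name lines_per_variant is only illustrative and is not part of VEP.

```shell
# Count output lines per variant in default (tab-delimited) VEP output.
# Assumption: column 1 holds the variant name; header lines start with '#'.
lines_per_variant() {
    grep -v '^#' "$1" | cut -f 1 | sort | uniq -c | sort -k1,1nr
}

# Usage: show the five variants with the most output lines
# lines_per_variant variant_effect_output.txt | head -n 5
```

A variant that overlaps many transcripts or regulatory features will sit at the top of this list with a correspondingly high count.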

Try dropping "html" from your config, see if that makes any difference - as the newest feature there, it's got a higher chance of causing problems!

Will



On 21 May 2013 16:02, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
Hello Will,

I'm getting more than 3000 lines of output... this seems really weird.

wc -l variant_effect_output.txt
3936

Here's the way I'm proceeding:

./variant_effect_predictor.pl -i /home/likewise-open/SGNET/gmarco/vep_71_annotation_check/input.vcf -force -fork 4 --database --config vep_71.test

Here's the content of vep_71.test:

dir                /home/likewise-open/SGNET/gmarco/.vep
toplevel_dir       /home/likewise-open/SGNET/gmarco/.vep
force_overwrite    1
format             vcf
html               1
host               192.19.x.xx
port               3306
user               myuser
password           mypassword
buffer_size        5000

hgvs               1
canonical          1
ccds               1
check_svs          1
domains            1
gmaf               1
hgnc               1
maf_1kg            1
numbers            1
polyphen           b
regulatory         1
sift               b

Best regards,
Guillermo.


On 05/21/2013 02:30 PM, Will McLaren wrote:
Hi Guillermo,

I'm unable to recreate this, sorry!

I get 406 going in, 406 coming out every time, whichever combination of those options above I use, and whether I use VCF or standard output.

Here's my run (minus -check_sv):

> perl variant_effect_predictor.pl -i guill.vcf -vcf -cache -force -fork 4 -hgvs -canon -ccds -domains -gmaf -hgnc -maf_1kg -numbers -poly b -regu -sift b -fasta ~/NFS/Fasta/Homo_sapiens.GRCh37.69.dna.primary_assembly.fa
2013-05-21 13:24:26 - Checking/creating FASTA index
2013-05-21 13:24:26 - Read existing cache info
2013-05-21 13:24:26 - Starting...
2013-05-21 13:24:26 - Detected format of input file as vcf
2013-05-21 13:24:26 - Read 406 variants into buffer
2013-05-21 13:24:26 - Reading transcript data from cache and/or database
[================================================================]  [ 100% ]
2013-05-21 13:24:30 - Retrieved 10891 transcripts (0 mem, 10919 cached, 0 DB, 28 duplicates)
2013-05-21 13:24:30 - Reading regulatory data from cache and/or database
[================================================================]  [ 100% ]
2013-05-21 13:24:35 - Retrieved 36955 regulatory features (0 mem, 36955 cached, 0 DB, 0 duplicates)
2013-05-21 13:24:35 - Calculating consequences
[================================================================]  [ 100% ]
2013-05-21 13:24:56 - Writing output
2013-05-21 13:24:56 - Processed 406 total variants (14 vars/sec, 14 vars/sec total)
2013-05-21 13:24:56 - Wrote stats summary to variant_effect_output.txt_summary.html
2013-05-21 13:24:56 - Finished!
> wc -l variant_effect_output.txt
408

It's 408 as it's adding two header lines to the VCF output.

Which 16 are missing from your output, and is it the same 16 each time?
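
One way to answer that is to compare positions between the input and the output. The following is only a sketch: it assumes VCF input and VCF output (-vcf), both tab-delimited with '#' header lines, and it compares CHROM/POS pairs only. The function names vcf_positions and missing_variants are made up for illustration.

```shell
# List CHROM/POS pairs present in the input VCF but absent from the VCF output.
# Assumption: both files are tab-delimited VCF with header lines starting '#'.
vcf_positions() {
    grep -v '^#' "$1" | cut -f 1,2 | LC_ALL=C sort -u
}

missing_variants() {
    in_pos=$(mktemp)
    out_pos=$(mktemp)
    vcf_positions "$1" > "$in_pos"
    vcf_positions "$2" > "$out_pos"
    # comm -23 prints lines found only in the first (sorted) file
    LC_ALL=C comm -23 "$in_pos" "$out_pos"
    rm -f "$in_pos" "$out_pos"
}

# Usage:
# missing_variants input.vcf variant_effect_output.txt
```

Saving the result of two separate runs and comparing them with diff would show whether the same variants go missing each time.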

Try writing to a different output file, or on a different disk if you can (perhaps disk space is an issue?)

Will

On 21 May 2013 13:15, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
Hello Will,

Here's the input: https://github.com/guillermomarco/vep_plugins_71/blob/master/missing_variants/missing_output_variants.vcf

As you said, it's not about the options or plugins. Launching VEP without specifying any options still returns output with missing variants.

Regards,
Guillermo.



On 05/21/2013 01:49 PM, Will McLaren wrote:
Hi Guillermo,

None of those options should filter out variants.

Are you able to provide any of the files that recreate the problem?

Is there any chance that you are using VCF input and it contains non-variant lines - this would be where the ALT column is empty or "."? If so, this may be your problem. To force these to be included in the output, you should add --allow_non_variant.
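
A quick way to check for such lines, as a sketch only: it assumes a tab-delimited VCF with ALT in column 5, and the function name non_variant_lines is hypothetical.

```shell
# Print VCF records whose ALT column (column 5) is empty or "."
# Assumption: tab-delimited VCF; header lines start with '#'.
non_variant_lines() {
    awk -F '\t' 'NF > 0 && !/^#/ && ($5 == "" || $5 == ".")' "$1"
}

# Usage: count the non-variant records in the input
# non_variant_lines input.vcf | wc -l
```

If this prints anything, those records are candidates for the ones being dropped.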

Regards

Will

On 21 May 2013 09:40, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
Hello,

I've been checking VEP results, and I've noticed that some input variants are missing from the output.

I think this may be due to some of the options I'm using to launch VEP:

hgvs               1
canonical          1
ccds               1
check_svs          1
domains            1
gmaf               1
hgnc               1
maf_1kg            1
numbers            1
polyphen           b
regulatory         1
sift               b

Should any of these options be filtering the output? I've disabled all plugins for this test to be sure that it's not a plugin issue.

  *   With a 406-variant input VCF file, 16 variants were missing from the output.
  *   I then ran VEP with only those 16 missing variants, and 3 were missing from the output.
  *   I ran it again with only those 3 missing variants, and this time not a single one was missing.

I would like to know what's behind that weird behaviour.

Thank you.

Best regards,
Guillermo.


_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/
