[ensembl-dev] VEP variants missing on output
Will McLaren
wm2 at ebi.ac.uk
Wed May 22 09:20:15 BST 2013
It doesn't cause any problems that I can see, but of course as always
please report if you do see any problems.
Thanks
Will
On 22 May 2013 09:09, Guillermo Marco Puche <
guillermo.marco at sistemasgenomicos.com> wrote:
> It doesn't. I just dropped it because Will said it could be buggy.
>
>
>
> On 05/22/2013 10:07 AM, Duarte Molha wrote:
>
> So does using the html option cause variants to be missed?
>
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org]
> On Behalf Of Guillermo Marco Puche
> Sent: 22 May 2013 08:23
> To: dev at ensembl.org
> Subject: Re: [ensembl-dev] VEP variants missing on output
>
> Hello Will,
>
> You were right. I'm getting the 406 variants.
> I dropped html just in case.
>
> As always, flawless Ensembl support. Thank you!
>
> Best regards,
> Guillermo.
>
> On 05/21/2013 05:13 PM, Will McLaren wrote:
>
> You get one line of output for each variant/feature overlap, so you will
> almost always see more output lines than input if you use the default
> output format. If you use VCF output, you only get one line per variant.
>
> You can check how many unique variants there are in the output with e.g.:
>
> grep -v '^#' variant_effect_output.txt | cut -f 1 | sort -u | wc -l
>
> assuming your variants have unique names.
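
The "more output lines than input" effect can be made visible by counting output lines per variant. This is only a rough sketch: it assumes the default tab-delimited output with the variant name in the first column and no whitespace inside names, and `lines_per_variant` is a hypothetical helper name, not something VEP provides.

```shell
# Print "variant_name<TAB>line_count" for each variant in a default-format
# VEP output file, most frequent first. Header lines (starting with '#')
# are skipped. Assumes column 1 holds the variant name.
lines_per_variant() {
    grep -v '^#' "$1" \
        | cut -f 1 \
        | sort \
        | uniq -c \
        | sort -k1,1nr \
        | awk '{print $2 "\t" $1}'
}
```

For example, `lines_per_variant variant_effect_output.txt | head` would list the variants that overlap the most features first.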
>
> Try dropping "html" from your config, see if that makes any difference -
> as the newest feature there, it's got a higher chance of causing problems!
>
> Will
>
> On 21 May 2013 16:02, Guillermo Marco Puche <
> guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hello Will,
>
> I'm getting more than 3000 lines of output, which seems really weird:
>
> wc -l variant_effect_output.txt
>
> 3936
>
> Here's how I'm running it:
>
> ./variant_effect_predictor.pl -i /home/likewise-open/SGNET/gmarco/vep_71_annotation_check/input.vcf -force -fork 4 --database --config vep_71.test
>
>
> Here's the content of vep_71.test:
>
> dir /home/likewise-open/SGNET/gmarco/.vep
> toplevel_dir /home/likewise-open/SGNET/gmarco/.vep
> force_overwrite 1
> format vcf
> html 1
> host 192.19.x.xx
> port 3306
> user myuser
> password mypassword
> buffer_size 5000
>
> hgvs 1
> canonical 1
> ccds 1
> check_svs 1
> domains 1
> gmaf 1
> hgnc 1
> maf_1kg 1
> numbers 1
> polyphen b
> regulatory 1
> sift b
>
> Best regards,
> Guillermo.
>
>
>
> On 05/21/2013 02:30 PM, Will McLaren wrote:
>
> Hi Guillermo,
>
> I'm unable to recreate this, sorry!
>
> I get 406 going in, 406 coming out every time, whichever combination of
> those options above I use, and whether I use VCF or standard output.
>
> Here's my run (minus -check_sv):
>
> $ perl variant_effect_predictor.pl -i guill.vcf -vcf -cache -force -fork 4 -hgvs -canon -ccds -domains -gmaf -hgnc -maf_1kg -numbers -poly b -regu -sift b -fasta ~/NFS/Fasta/Homo_sapiens.GRCh37.69.dna.primary_assembly.fa
> 2013-05-21 13:24:26 - Checking/creating FASTA index
> 2013-05-21 13:24:26 - Read existing cache info
> 2013-05-21 13:24:26 - Starting...
> 2013-05-21 13:24:26 - Detected format of input file as vcf
> 2013-05-21 13:24:26 - Read 406 variants into buffer
> 2013-05-21 13:24:26 - Reading transcript data from cache and/or database
> [================================================================] [ 100% ]
> 2013-05-21 13:24:30 - Retrieved 10891 transcripts (0 mem, 10919 cached, 0 DB, 28 duplicates)
> 2013-05-21 13:24:30 - Reading regulatory data from cache and/or database
> [================================================================] [ 100% ]
> 2013-05-21 13:24:35 - Retrieved 36955 regulatory features (0 mem, 36955 cached, 0 DB, 0 duplicates)
> 2013-05-21 13:24:35 - Calculating consequences
> [================================================================] [ 100% ]
> 2013-05-21 13:24:56 - Writing output
> 2013-05-21 13:24:56 - Processed 406 total variants (14 vars/sec, 14 vars/sec total)
> 2013-05-21 13:24:56 - Wrote stats summary to variant_effect_output.txt_summary.html
> 2013-05-21 13:24:56 - Finished!
> $ wc -l variant_effect_output.txt
> 408
>
> It's 408 as it's adding two header lines to the VCF output.
>
> Which 16 are missing from your output, and is it the same 16 each time?
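
One way to answer "which ones are missing" is to compare the variant records in the input and output files directly. The sketch below assumes VCF-format output (as in the run above) in which the first five columns (CHROM, POS, ID, REF, ALT) are carried through unchanged from the input; `missing_variants` is a hypothetical helper name, not a VEP feature.

```shell
# Print the records (CHROM, POS, ID, REF, ALT) that appear in the input VCF
# ($1) but not in the VCF-format output ($2). Header lines are ignored.
missing_variants() {
    in_keys=$(mktemp)
    out_keys=$(mktemp)
    grep -v '^#' "$1" | cut -f 1-5 | sort -u > "$in_keys"
    grep -v '^#' "$2" | cut -f 1-5 | sort -u > "$out_keys"
    # comm -23 keeps lines unique to the first (input) list
    comm -23 "$in_keys" "$out_keys"
    rm -f "$in_keys" "$out_keys"
}
```

Running `missing_variants input.vcf variant_effect_output.txt` after each of several runs and comparing the lists would show whether the same variants go missing every time.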
>
> Try writing to a different output file, or on a different disk if you can
> (perhaps disk space is an issue?)
>
> Will
>
> On 21 May 2013 13:15, Guillermo Marco Puche <
> guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hello Will,
>
> Here's the input:
> https://github.com/guillermomarco/vep_plugins_71/blob/master/missing_variants/missing_output_variants.vcf
>
> As you said, it's not about the options or plugins. Launching VEP without
> specifying any options still returns an output with missing variants.
>
> Regards,
> Guillermo.
>
>
>
>
> On 05/21/2013 01:49 PM, Will McLaren wrote:
>
> Hi Guillermo,
>
> None of those options should filter out variants.
>
> Are you able to provide any of the files that recreate the problem?
>
> Is there any chance that you are using VCF input and it contains
> non-variant lines - this would be where the ALT column is empty or "."? If
> so, this may be your problem. To force these to be included in the output,
> you should add --allow_non_variant.
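
A quick way to check an input VCF for such non-variant lines might look like the following. It is a minimal sketch that assumes a standard tab-delimited VCF with ALT in the fifth column; `non_variant_lines` is a hypothetical helper name.

```shell
# Print the data lines of a VCF ($1) whose ALT column (field 5) is empty
# or ".". Header lines (starting with '#') are skipped.
non_variant_lines() {
    awk -F '\t' '!/^#/ && ($5 == "." || $5 == "")' "$1"
}
```

If `non_variant_lines input.vcf | wc -l` reports a non-zero count, those lines would be the ones to expect missing from the output unless --allow_non_variant is given.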
>
> Regards
>
> Will
>
> On 21 May 2013 09:40, Guillermo Marco Puche <
> guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hello,
>
> I've been checking VEP results, and I've noticed that I'm missing some
> input variants in the output.
>
> I think this may be due to some of the options I'm using to launch VEP:
>
> hgvs 1
> canonical 1
> ccds 1
> check_svs 1
> domains 1
> gmaf 1
> hgnc 1
> maf_1kg 1
> numbers 1
> polyphen b
> regulatory 1
> sift b
>
> Should any of these options be filtering the output? I've disabled all
> plugins to run this test to be sure that it's not a plugin issue.
>
> - With a 406-variant input VCF file, 16 variants were missing from the
>   output.
> - I then ran VEP with only those 16 missing variants, and 3 were missing
>   from the output.
> - I reran with only those 3 missing variants, and this time not a single
>   one was missing.
>
> I would like to know what's behind that weird behaviour.
>
> Thank you.
>
> Best regards,
> Guillermo.
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info