[ensembl-dev] VEP variants missing on output
Guillermo Marco Puche
guillermo.marco at sistemasgenomicos.com
Wed May 22 08:22:48 BST 2013
Hello Will,
You were right. I'm now getting all 406 variants.
I also dropped html from the config, just in case.
As always, flawless Ensembl support. Thank you!
Best regards,
Guillermo.
On 05/21/2013 05:13 PM, Will McLaren wrote:
> You get one line of output for each variant/feature overlap, so you
> will almost always see more output lines than input if you use the
> default output format. If you use VCF output, you only get one line
> per variant.
>
> You can check how many unique variants there are in the output with e.g.:
>
> grep -v '^#' variant_effect_output.txt | cut -f 1 | sort -u | wc -l
>
> assuming your variants have unique names.
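The counting step above can be wrapped as a small function. This is only a sketch: the function name count_unique_variants is made up for illustration, and it assumes the default tab-delimited VEP output, with the variant name in column 1 and header lines starting with "#". The pattern is quoted as '^#' because an unquoted # starts a comment in the shell.

```shell
# Count distinct variant names in a default-format (tab-delimited) VEP output file.
# Assumption: column 1 is the variant name and header lines start with '#'.
count_unique_variants() {
    grep -v '^#' "$1" | cut -f 1 | sort -u | wc -l | tr -d ' '
}

# Usage (file name taken from the thread):
#   count_unique_variants variant_effect_output.txt
```

If this number matches the number of input variants, the extra output lines are just the additional variant/feature overlaps.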
>
> Try dropping "html" from your config, see if that makes any difference
> - as the newest feature there, it's got a higher chance of causing
> problems!
>
> Will
>
>
>
>
> On 21 May 2013 16:02, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hello Will,
>
> I'm getting more than 3000 lines in the output file, which seems
> really weird.
>
> wc -l variant_effect_output.txt
>
> 3936
>
> Here's the way I'm proceeding:
>
> ./variant_effect_predictor.pl -i /home/likewise-open/SGNET/gmarco/vep_71_annotation_check/input.vcf -force -fork 4 --database --config vep_71.test
>
>
> Here's the content of vep_71.test:
>
> dir /home/likewise-open/SGNET/gmarco/.vep
> toplevel_dir /home/likewise-open/SGNET/gmarco/.vep
> force_overwrite 1
> format vcf
> html 1
> host 192.19.x.xx
> port 3306
> user myuser
> password mypassword
> buffer_size 5000
>
> hgvs 1
> canonical 1
> ccds 1
> check_svs 1
> domains 1
> gmaf 1
> hgnc 1
> maf_1kg 1
> numbers 1
> polyphen b
> regulatory 1
> sift b
>
> Best regards,
> Guillermo.
>
>
> On 05/21/2013 02:30 PM, Will McLaren wrote:
>> Hi Guillermo,
>>
>> I'm unable to recreate this, sorry!
>>
>> I get 406 going in, 406 coming out every time, whichever
>> combination of those options above I use, and whether I use VCF
>> or standard output.
>>
>> Here's my run (minus -check_sv):
>>
>> > perl variant_effect_predictor.pl -i guill.vcf -vcf -cache
>> -force -fork 4 -hgvs -canon -ccds -domains -gmaf -hgnc -maf_1kg
>> -numbers -poly b -regu -sift b -fasta
>> ~/NFS/Fasta/Homo_sapiens.GRCh37.69.dna.primary_assembly.fa
>> 2013-05-21 13:24:26 - Checking/creating FASTA index
>> 2013-05-21 13:24:26 - Read existing cache info
>> 2013-05-21 13:24:26 - Starting...
>> 2013-05-21 13:24:26 - Detected format of input file as vcf
>> 2013-05-21 13:24:26 - Read 406 variants into buffer
>> 2013-05-21 13:24:26 - Reading transcript data from cache and/or database
>> [================================================================] [ 100% ]
>> 2013-05-21 13:24:30 - Retrieved 10891 transcripts (0 mem, 10919 cached, 0 DB, 28 duplicates)
>> 2013-05-21 13:24:30 - Reading regulatory data from cache and/or database
>> [================================================================] [ 100% ]
>> 2013-05-21 13:24:35 - Retrieved 36955 regulatory features (0 mem, 36955 cached, 0 DB, 0 duplicates)
>> 2013-05-21 13:24:35 - Calculating consequences
>> [================================================================] [ 100% ]
>> 2013-05-21 13:24:56 - Writing output
>> 2013-05-21 13:24:56 - Processed 406 total variants (14 vars/sec, 14 vars/sec total)
>> 2013-05-21 13:24:56 - Wrote stats summary to variant_effect_output.txt_summary.html
>> 2013-05-21 13:24:56 - Finished!
>> > wc -l variant_effect_output.txt
>> 408
>>
>> It's 408 as it's adding two header lines to the VCF output.
>>
>> Which 16 are missing from your output, and is it the same 16 each
>> time?
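One way to answer that question is to compare the variant keys of the input and output files. The sketch below assumes both files are VCF and that the first five columns (CHROM, POS, ID, REF, ALT) are carried through unchanged to the VCF output; the function names vcf_keys and missing_variants are made up for illustration.

```shell
# Print one sorted, de-duplicated key (CHROM, POS, ID, REF, ALT) per non-header VCF line.
vcf_keys() {
    grep -v '^#' "$1" | cut -f 1-5 | sort -u
}

# List variants that appear in the input VCF ($1) but not in the output VCF ($2).
# Assumption: the output keeps the first five columns of each input line unchanged.
missing_variants() {
    in_keys=$(mktemp)
    out_keys=$(mktemp)
    vcf_keys "$1" > "$in_keys"
    vcf_keys "$2" > "$out_keys"
    # comm -23 keeps only the lines unique to the first (input) file
    comm -23 "$in_keys" "$out_keys"
    rm -f "$in_keys" "$out_keys"
}

# Usage (file names taken from the thread):
#   missing_variants input.vcf variant_effect_output.txt
```

Saving the list from two separate runs and comparing them with diff would show whether the same variants go missing each time.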
>>
>> Try writing to a different output file, or on a different disk if
>> you can (perhaps disk space is an issue?)
>>
>> Will
>>
>>
>> On 21 May 2013 13:15, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>> Hello Will,
>>
>> Here's the input:
>> https://github.com/guillermomarco/vep_plugins_71/blob/master/missing_variants/missing_output_variants.vcf
>>
>> As you said, it's not about the options or plugins. Launching
>> VEP without specifying any options still returns output
>> with missing variants.
>>
>> Regards,
>> Guillermo.
>>
>>
>>
>> On 05/21/2013 01:49 PM, Will McLaren wrote:
>>> Hi Guillermo,
>>>
>>> None of those options should filter out variants.
>>>
>>> Are you able to provide any of the files that recreate the
>>> problem?
>>>
>>> Is there any chance that you are using VCF input and it
>>> contains non-variant lines - this would be where the ALT
>>> column is empty or "."? If so, this may be your problem. To
>>> force these to be included in the output, you should add
>>> --allow_non_variant.
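A quick way to check for such lines is to count the records whose ALT column (column 5 in VCF) is empty or ".". This is a sketch only; the function name count_non_variant_lines is made up for illustration, and it assumes a tab-delimited VCF.

```shell
# Count VCF records whose ALT column (column 5) is empty or ".".
# Header lines (starting with '#') are skipped.
count_non_variant_lines() {
    awk -F '\t' '!/^#/ && ($5 == "" || $5 == ".") { n++ } END { print n + 0 }' "$1"
}

# Usage (file name taken from the thread):
#   count_non_variant_lines input.vcf
```

A count of 0 would suggest non-variant lines are not the cause of the missing output.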
>>>
>>> Regards
>>>
>>> Will
>>>
>>>
>>> On 21 May 2013 09:40, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>> Hello,
>>>
>>> I've been checking VEP results, and I've noticed that
>>> some input variants are missing from the output.
>>>
>>> I think this may be due to some of the options I'm
>>> using to launch VEP:
>>>
>>> hgvs 1
>>> canonical 1
>>> ccds 1
>>> check_svs 1
>>> domains 1
>>> gmaf 1
>>> hgnc 1
>>> maf_1kg 1
>>> numbers 1
>>> polyphen b
>>> regulatory 1
>>> sift b
>>>
>>> Should any of these options be filtering the output? I've
>>> disabled all plugins for this test to be sure that
>>> it's not a plugin issue.
>>>
>>> * With a 406-variant input VCF file, 16 variants were
>>> missing from the output.
>>> * I then ran VEP with only those 16 missing variants,
>>> and 3 were missing from the output.
>>> * I reran it with only those 3 missing variants, and this
>>> time none were missing.
>>>
>>> I would like to know what's behind that weird behaviour.
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info
>>> <http://www.ensembl.info/>
>>>