[ensembl-dev] Empty GMAF field in VEP (possible bug?)

Duarte Molha duartemolha at gmail.com
Wed May 22 11:42:34 BST 2013


I believe that if your output is in VCF format, all fields are output even
if they are empty, because the fields are separated by pipes "|" and their
order has to be retained.
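
To illustrate why the order matters, here is a minimal Python sketch of how a
downstream script might read one pipe-separated annotation entry. The field
list and the sample entry are hypothetical and only loosely based on the
variant discussed in this thread; a real script would take the field order
from the header of the annotated VCF.

```python
# Hypothetical field order, assumed for illustration only.
FIELDS = ["Allele", "Gene", "Feature", "Consequence", "Protein_position", "GMAF"]


def parse_entry(entry, fields=FIELDS):
    """Map each pipe-separated value to its field name by position."""
    values = entry.split("|")
    if len(values) != len(fields):
        raise ValueError(
            "expected %d fields, got %d" % (len(fields), len(values))
        )
    return dict(zip(fields, values))


# Made-up entry: the last field (GMAF) is empty but still occupies its slot.
entry = "A|ENSG00000178104|ENST00000530740|missense_variant|30|"
record = parse_entry(entry)
```

If empty fields were dropped instead of being left as empty slots, the
remaining values would shift position and could no longer be matched to their
names.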

I believe what you asked previously was for the dev team to give you an
example of an annotated VCF with all fields filled in, and they pointed out
correctly that this is not possible, since the presence of certain fields
will exclude others from being filled. As an example, an intergenic variant
means that all the CDS and protein info fields are necessarily empty.

However, if your output is the tab-delimited version, fields that are stored
in the Extra column should not be output at all if they contain no
information.

That is why I believe this to be a bug.
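
For anyone post-processing the tab-delimited output in the meantime, a rough
Python sketch of a workaround follows. It assumes the Extra column is a
semicolon-separated list of key=value pairs, as in the output line quoted
below, and it assumes that a value of "" or ":" should be treated as empty.
That second assumption is a guess based only on the GMAF=: case reported
here. The function name parse_extra is hypothetical.

```python
def parse_extra(extra, empty_values=("", ":")):
    """Parse a semicolon-separated key=value string, skipping empty values."""
    result = {}
    for item in extra.split(";"):
        if not item:
            continue  # ignore stray separators
        key, sep, value = item.partition("=")
        if not sep:
            result[key] = True  # bare key with no "=": keep it as a flag
            continue
        if value in empty_values:
            continue  # drop fields such as GMAF=: that carry no information
        result[key] = value
    return result


# Shortened version of the Extra column from the reported output line.
extra = "Condel=deleterious(0.715);GMAF=:;PL=153,0,6524;ZYG=HET;BLOSUM62=-1"
fields = parse_extra(extra)
```

With this input, GMAF is absent from the result while the other keys are
kept with their values unchanged.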

Cheers
Duarte

=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================


On Wed, May 22, 2013 at 11:32 AM, Guillermo Marco Puche <
guillermo.marco at sistemasgenomicos.com> wrote:

>  Hello Duarte,
>
> To my understanding it shouldn't be printed if it's empty like you said.
>
> I'm telling you this because when I started to use VEP I asked if it was
> possible to print all the fields even if they are empty. I was told you
> cannot if you use VCF as the output format.
>
> I wish they would add an option to force empty fields to be printed in VCF
> output. It would make things a lot easier if you want to run other scripts
> or parsers over the VCF output.
>
> Regards,
> Guillermo.
>
>
> On 05/22/2013 12:01 PM, Duarte Molha wrote:
>
>  Hi Will
>
>  I came across another unexpected behavior that I would like to report
> to you...
> After running one of my VCF files, a few variations output an empty GMAF
> field like
>
>  GMAF=:
>
>  Here is one of the output lines in question :
>
>  1_145075775_G/A 1:145075775    A        ENSG00000178104 ENST00000530740
> Transcript      missense_variant    127 88      30      P/S     Cct/Tct
> rs76199660,COSM329521
> alt_alleles=A;ENSP=ENSP00000435654;PolyPhen=possibly_damaging(0.739);Grantham=74;SIFT=deleterious(0);quality_score=113.16;AD=220,25;GT=0/1;CAROL=Deleterious(0.999);DP=247;GQ=99;CLIN_SIG=other;Condel=deleterious(0.715);GMAF=:;PL=153,0,6524;ZYG=HET;BLOSUM62=-1;EXON=1/46;IND=S100807505;HGNC=PDE4DIP;GERP++_RS=3.25
>
>
>  here is the VCF line that produces it:
>
>  1   145075775   . G   A 113.16 SNPhardFilter
> AC=1;AF=0.056;AN=18;BaseQRankSum=-5.651;DP=1883;DS;Dels=0.00;FS=0.784;HRun=0;HaplotypeScore=26.7686;MQ=57.87;MQ0=0;MQRankSum=-0.030;QD=0.46;ReadPosRankSum=0.964;SB=-64.14;set=FilteredInAll
>   GT:AD:DP:GQ:PL    0/0:247,1:248:99:0,617,7876 0/0:249,0:250:99:0,644,8180
>  0/1:220,25:247:99:153,0,6524     0/0:95,0:95:99:0,247,3222
> 0/0:247,1:248:99:0,656,8350     0/0:249,1:250:99:0,598,8015
> 0/0:215,0:215:99:0,560,7164 0/0:164,0:164:99:0,415,5416
> 0/0:166,0:166:99:0,433,5729
>
>  I was expecting that if you do not have GMAF information for any given
> variation the field simply would not be output. Is this correct?
>
>  Best regards
>
>  Duarte
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/