[ensembl-dev] Question regarding the variant_effect_predictor

Will McLaren wm2 at ebi.ac.uk
Wed Oct 19 16:20:00 BST 2011


Hi Duarte,

Currently the VEP does not take into account any of the sample
information in the VCF file; VCF was not the original input format for
the script, so everything to do with VCF has kind of been "bolted on"
(not to say that you shouldn't use VCF of course!!!).

I think we should be able to put in some code such that you could
supply a flag that would mean the script would only consider alleles
observed in the individuals in the VCF file - I will look at doing
this for the next release of the VEP (v2.3, due along with e!65 at the
end of November).

In the short term, there might be a way to pre-process your VCF file
using VCFtools or a similar tool so that only the observed alleles
remain in the file.
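
To illustrate the kind of pre-processing meant here, below is a rough
Python sketch. It is not part of the VEP or VCFtools, and the function
name trim_unobserved_alleles is made up for this example. It assumes
tab-separated VCF records with GT present in the FORMAT column, drops
any ALT allele that no sample genotype refers to, and renumbers the GT
indices to match. It does not adjust per-allele INFO or FORMAT values
(AC, AF, AD, PL and so on), so those may no longer agree with the
trimmed ALT column.

```python
import re


def trim_unobserved_alleles(record):
    """Drop ALT alleles that no sample genotype refers to (illustrative sketch).

    Assumes a tab-separated VCF data line. Per-allele INFO/FORMAT values
    (AC, AF, AD, PL, ...) are left untouched and may become inconsistent.
    """
    fields = record.split("\t")
    # Header lines and records without genotype columns pass through unchanged.
    if record.startswith("#") or len(fields) < 10:
        return record

    alts = fields[4].split(",")
    fmt_keys = fields[8].split(":")
    if "GT" not in fmt_keys:
        return record
    gt_pos = fmt_keys.index("GT")

    # Collect the ALT allele indices (1-based) seen in any sample genotype.
    observed = set()
    for sample in fields[9:]:
        gt = sample.split(":")[gt_pos]
        for allele in re.split(r"[/|]", gt):
            if allele.isdigit() and 0 < int(allele) <= len(alts):
                observed.add(int(allele))

    # Nothing to trim if no ALT is observed or all ALTs are observed.
    if not observed or len(observed) == len(alts):
        return record

    kept = sorted(observed)
    remap = {0: 0}
    remap.update({old: new for new, old in enumerate(kept, start=1)})
    fields[4] = ",".join(alts[i - 1] for i in kept)

    # Renumber allele indices in each sample's GT; missing calls (".") are kept.
    for col in range(9, len(fields)):
        parts = fields[col].split(":")
        parts[gt_pos] = re.sub(
            r"\d+",
            lambda m: str(remap.get(int(m.group()), m.group())),
            parts[gt_pos],
        )
        fields[col] = ":".join(parts)
    return "\t".join(fields)


# Example based on the record in the original question (INFO shortened).
example = "\t".join([
    "1", "13273", ".", "G", "C,T", "57.59", ".", "AC=2;AF=1.00;AN=2",
    "GT:AD:DP:GQ:PL", "1/1:33,6:4:6.02:89,6,0", "0/1:33,6:4:6.02:89,6,0",
])
print(trim_unobserved_alleles(example))  # ALT column becomes "C"
```

Applied line by line to a VCF file, this would leave only the alleles
actually carried by the samples, so the VEP would report consequences
for those alleles only.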

Hope this helps

Will McLaren
Ensembl Variation


On 19 October 2011 15:56, Duarte Molha <duartemolha at gmail.com> wrote:
> Hi guys
> I was wondering if anyone could tell me if there is an easy way of providing
> sample ID correspondence between the annotation and the input VCF file.
> Here is an example of what I am referring to.
> Let's say I have a VCF entry like this:
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A13 A15
> 1 13273 . G C,T 57.59 .
> AC=2;AF=1.00;AN=2;DP=39;Dels=0.00;HRun=0;HaplotypeScore=38.7838;MQ=8.42;MQ0=33;QD=1.48;SB=-0.01;sumGLbyD=2.30
> GT:AD:DP:GQ:PL 1/1:33,6:4:6.02:89,6,0 0/1:33,6:4:6.02:89,6,0
> When I run this VCF file I get all possible effects of the 2
> alternative alleles at the position:
> #Uploaded_variation Location Allele Gene Feature Feature_type Consequence
> cDNA_position CDS_position Protein_position Amino_acids Codons
> Existing_variation Extra
> 1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000430492 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000430492 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 T ENSG00000223972 ENST00000456328 Transcript
> WITHIN_NON_CODING_GENE 521 - - - - - -
> 1_13273_G/C/T 1:13273 C ENSG00000223972 ENST00000456328 Transcript
> WITHIN_NON_CODING_GENE 521 - - - - - -
> 1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000488147 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000488147 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000541675 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000541675 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 T ENSG00000223972 ENST00000450305 Transcript
> WITHIN_NON_CODING_GENE 313 - - - - - -
> 1_13273_G/C/T 1:13273 C ENSG00000223972 ENST00000450305 Transcript
> WITHIN_NON_CODING_GENE 313 - - - - - -
> 1_13273_G/C/T 1:13273 T ENSG00000223972 ENST00000515242 Transcript
> WITHIN_NON_CODING_GENE 514 - - - - - -
> 1_13273_G/C/T 1:13273 C ENSG00000223972 ENST00000515242 Transcript
> WITHIN_NON_CODING_GENE 514 - - - - - -
> 1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000538476 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000538476 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000537342 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000537342 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 T ENSG00000223972 ENST00000518655 Transcript
> WITHIN_NON_CODING_GENE,INTRONIC - - - - - - -
> 1_13273_G/C/T 1:13273 C ENSG00000223972 ENST00000518655 Transcript
> WITHIN_NON_CODING_GENE,INTRONIC - - - - - - -
> 1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000438504 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000438504 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000423562 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000423562 Transcript
> DOWNSTREAM - - - - - - -
> However, if we look at the sample genotypes we see that both individual
> samples feature only one of the alternative alleles (allele 1, C).
> It would be very useful if the script took this into consideration, for
> 2 reasons:
> 1) Not all variations would have to be considered, since in both individuals
> only one of the 2 possible alternative alleles is present (I guess this could
> speed up the script considerably).
> 2) We could also improve the annotation by referring only to the samples that
> have the variation.
> As an example if the previous variation was instead:
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A13 A15
> 1 13273 . G C,T 57.59 .
> AC=2;AF=1.00;AN=2;DP=39;Dels=0.00;HRun=0;HaplotypeScore=38.7838;MQ=8.42;MQ0=33;QD=1.48;SB=-0.01;sumGLbyD=2.30
> GT:AD:DP:GQ:PL 1/1:33,6:4:6.02:89,6,0 0/2:33,6:4:6.02:89,6,0
> We could output only what is relevant for each sample:
> 1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000430492 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A13 1:13273 C ENSG00000223972 ENST00000456328 Transcript
> WITHIN_NON_CODING_GENE 521 - - - - - -
> 1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000488147 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000541675 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A13 1:13273 C ENSG00000223972 ENST00000450305 Transcript
> WITHIN_NON_CODING_GENE 313 - - - - - -
> 1_13273_G/C/T A13 1:13273 C ENSG00000223972 ENST00000515242 Transcript
> WITHIN_NON_CODING_GENE 514 - - - - - -
> 1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000538476 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000537342 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A13 1:13273 C ENSG00000223972 ENST00000518655 Transcript
> WITHIN_NON_CODING_GENE,INTRONIC - - - - - - -
> 1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000438504 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000423562 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000430492 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A15 1:13273 T ENSG00000223972 ENST00000456328 Transcript
> WITHIN_NON_CODING_GENE 521 - - - - - -
> 1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000488147 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000541675 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A15 1:13273 T ENSG00000223972 ENST00000450305 Transcript
> WITHIN_NON_CODING_GENE 313 - - - - - -
> 1_13273_G/C/T A15 1:13273 T ENSG00000223972 ENST00000515242 Transcript
> WITHIN_NON_CODING_GENE 514 - - - - - -
> 1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000538476 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000537342 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A15 1:13273 T ENSG00000223972 ENST00000518655 Transcript
> WITHIN_NON_CODING_GENE,INTRONIC - - - - - - -
> 1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000438504 Transcript
> DOWNSTREAM - - - - - - -
> 1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000423562 Transcript
> DOWNSTREAM - - - - - - -
>
> Best regards
> Duarte



