[ensembl-dev] Question regarding the variant_effect_predictor

Duarte Molha duartemolha at gmail.com
Wed Oct 19 15:56:00 BST 2011


Hi guys

I was wondering if anyone could tell me whether there is an easy way to
provide sample ID correspondence between the annotation and the input VCF file.

Here is an example of what I am referring to:

Let's say I have a VCF entry like this:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A13 A15
1 13273 . G C,T 57.59 . AC=2;AF=1.00;AN=2;DP=39;Dels=0.00;HRun=0;HaplotypeScore=38.7838;MQ=8.42;MQ0=33;QD=1.48;SB=-0.01;sumGLbyD=2.30 GT:AD:DP:GQ:PL 1/1:33,6:4:6.02:89,6,0 0/1:33,6:4:6.02:89,6,0
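For reference, the per-sample information is already in the GT field: each index points into the allele list, with 0 being REF (G) and 1, 2, ... the ALT alleles in order (C, T). A minimal Python sketch of that lookup for one tab-separated VCF data line follows. The function name is hypothetical and not part of the VEP script; it assumes GT is present in FORMAT and skips header parsing and other edge cases.

```python
# Hypothetical helper, not part of the VEP script: map each sample's GT
# indices to allele strings for one tab-separated VCF data line.
def sample_alleles(vcf_line, sample_names):
    cols = vcf_line.rstrip("\n").split("\t")
    ref, alts = cols[3], cols[4].split(",")
    alleles = [ref] + alts            # index 0 = REF, 1, 2, ... = ALT alleles
    gt_pos = cols[8].split(":").index("GT")
    result = {}
    for name, field in zip(sample_names, cols[9:]):
        gt = field.split(":")[gt_pos]
        # "/" = unphased, "|" = phased; "." = missing call
        calls = gt.replace("|", "/").split("/")
        result[name] = [None if c == "." else alleles[int(c)] for c in calls]
    return result


line = "\t".join([
    "1", "13273", ".", "G", "C,T", "57.59", ".",
    "AC=2;AF=1.00;AN=2;DP=39;Dels=0.00;HRun=0;HaplotypeScore=38.7838;"
    "MQ=8.42;MQ0=33;QD=1.48;SB=-0.01;sumGLbyD=2.30",
    "GT:AD:DP:GQ:PL",
    "1/1:33,6:4:6.02:89,6,0",
    "0/1:33,6:4:6.02:89,6,0",
])
print(sample_alleles(line, ["A13", "A15"]))
# {'A13': ['C', 'C'], 'A15': ['G', 'C']}
```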

When I run this VCF file through the script, I get all possible effects of
the 2 alternative alleles at this position:

#Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000430492 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000430492 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 T ENSG00000223972 ENST00000456328 Transcript WITHIN_NON_CODING_GENE 521 - - - - - -
1_13273_G/C/T 1:13273 C ENSG00000223972 ENST00000456328 Transcript WITHIN_NON_CODING_GENE 521 - - - - - -
1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000488147 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000488147 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000541675 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000541675 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 T ENSG00000223972 ENST00000450305 Transcript WITHIN_NON_CODING_GENE 313 - - - - - -
1_13273_G/C/T 1:13273 C ENSG00000223972 ENST00000450305 Transcript WITHIN_NON_CODING_GENE 313 - - - - - -
1_13273_G/C/T 1:13273 T ENSG00000223972 ENST00000515242 Transcript WITHIN_NON_CODING_GENE 514 - - - - - -
1_13273_G/C/T 1:13273 C ENSG00000223972 ENST00000515242 Transcript WITHIN_NON_CODING_GENE 514 - - - - - -
1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000538476 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000538476 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000537342 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000537342 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 T ENSG00000223972 ENST00000518655 Transcript WITHIN_NON_CODING_GENE,INTRONIC - - - - - - -
1_13273_G/C/T 1:13273 C ENSG00000223972 ENST00000518655 Transcript WITHIN_NON_CODING_GENE,INTRONIC - - - - - - -
1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000438504 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000438504 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 T ENSG00000227232 ENST00000423562 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T 1:13273 C ENSG00000227232 ENST00000423562 Transcript DOWNSTREAM - - - - - - -

However, if we look at the sample genotypes, we see that both samples carry
only the first alternative allele (C).
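The set of ALT alleles that any sample actually carries can be worked out from the GT strings alone. A hypothetical helper, sketched in Python and assuming GT strings that use "/" or "|" as separators and "." for missing calls:

```python
# Hypothetical helper: which ALT alleles are carried by at least one sample?
def observed_alts(ref, alt_field, genotypes):
    alleles = [ref] + alt_field.split(",")
    seen = set()
    for gt in genotypes:
        for c in gt.replace("|", "/").split("/"):
            if c not in (".", "0"):   # skip missing calls and REF
                seen.add(alleles[int(c)])
    return seen


alts = "C,T"
seen = observed_alts("G", alts, ["1/1", "0/1"])   # A13, A15
unused = set(alts.split(",")) - seen
print(sorted(seen), sorted(unused))   # ['C'] ['T']
```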
It would be very useful if the script took this into account, for 2 reasons:

1) Not all alleles would have to be considered, since only one of the 2
possible alternative alleles is present in the two individuals (I guess this
could speed up the script considerably).
2) We could also improve the annotation by reporting only the samples that
carry the variant.
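Part of this can be approximated today by filtering the tab-separated output afterwards. That only trims the output; it does not give the speed-up of point 1, because the consequences for the unused allele have already been computed by then. A minimal Python sketch, assuming the default column order shown above (Allele is the third column); the function name is hypothetical:

```python
# Hypothetical post-processing filter: keep header lines, and keep data lines
# whose Allele column (third column in the default output) is in keep.
def filter_vep_rows(vep_lines, keep):
    for line in vep_lines:
        if line.startswith("#"):
            yield line
        elif line.rstrip("\n").split("\t")[2] in keep:
            yield line


def row(allele):
    return "\t".join(["1_13273_G/C/T", "1:13273", allele, "ENSG00000227232",
                      "ENST00000430492", "Transcript", "DOWNSTREAM"] + ["-"] * 7)


kept = list(filter_vep_rows([row("T"), row("C")], keep={"C"}))
print(len(kept), kept[0].split("\t")[2])   # 1 C
```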

For example, if the previous variant were instead:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A13 A15
1 13273 . G C,T 57.59 . AC=2;AF=1.00;AN=2;DP=39;Dels=0.00;HRun=0;HaplotypeScore=38.7838;MQ=8.42;MQ0=33;QD=1.48;SB=-0.01;sumGLbyD=2.30 GT:AD:DP:GQ:PL 1/1:33,6:4:6.02:89,6,0 0/2:33,6:4:6.02:89,6,0

We could then output only what is relevant for each sample:

1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000430492 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T A13 1:13273 C ENSG00000223972 ENST00000456328 Transcript WITHIN_NON_CODING_GENE 521 - - - - - -
1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000488147 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000541675 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T A13 1:13273 C ENSG00000223972 ENST00000450305 Transcript WITHIN_NON_CODING_GENE 313 - - - - - -
1_13273_G/C/T A13 1:13273 C ENSG00000223972 ENST00000515242 Transcript WITHIN_NON_CODING_GENE 514 - - - - - -
1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000538476 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000537342 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T A13 1:13273 C ENSG00000223972 ENST00000518655 Transcript WITHIN_NON_CODING_GENE,INTRONIC - - - - - - -
1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000438504 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T A13 1:13273 C ENSG00000227232 ENST00000423562 Transcript DOWNSTREAM - - - - - - -

1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000430492 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T A15 1:13273 T ENSG00000223972 ENST00000456328 Transcript WITHIN_NON_CODING_GENE 521 - - - - - -
1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000488147 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000541675 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T A15 1:13273 T ENSG00000223972 ENST00000450305 Transcript WITHIN_NON_CODING_GENE 313 - - - - - -
1_13273_G/C/T A15 1:13273 T ENSG00000223972 ENST00000515242 Transcript WITHIN_NON_CODING_GENE 514 - - - - - -
1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000538476 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000537342 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T A15 1:13273 T ENSG00000223972 ENST00000518655 Transcript WITHIN_NON_CODING_GENE,INTRONIC - - - - - - -
1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000438504 Transcript DOWNSTREAM - - - - - - -
1_13273_G/C/T A15 1:13273 T ENSG00000227232 ENST00000423562 Transcript DOWNSTREAM - - - - - - -
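A per-sample layout like this could be produced from the existing output plus the sample genotypes. A rough Python sketch, assuming all input rows belong to the same variant, that each sample's set of carried ALT alleles has already been derived from GT (1/1 gives C for A13, 0/2 gives T for A15), and that the sample ID goes in as a new second column as in the example; the function name is hypothetical:

```python
# Hypothetical sketch: expand the annotation rows of ONE variant into
# per-sample rows, inserting the sample ID as the second column.
def per_sample_rows(vep_lines, carried):
    rows = [line.rstrip("\n").split("\t")
            for line in vep_lines if not line.startswith("#")]
    out = []
    for sample, alleles in carried.items():   # samples in insertion order
        for cols in rows:
            if cols[2] in alleles:             # Allele column
                out.append("\t".join([cols[0], sample] + cols[1:]))
    return out


def row(allele):
    return "\t".join(["1_13273_G/C/T", "1:13273", allele, "ENSG00000227232",
                      "ENST00000430492", "Transcript", "DOWNSTREAM"] + ["-"] * 7)


# ALT alleles carried per sample, from GT: A13 = 1/1 -> C, A15 = 0/2 -> T
carried = {"A13": {"C"}, "A15": {"T"}}
for r in per_sample_rows([row("T"), row("C")], carried):
    print(r.split("\t")[:4])
# ['1_13273_G/C/T', 'A13', '1:13273', 'C']
# ['1_13273_G/C/T', 'A15', '1:13273', 'T']
```

Grouping the loop by sample first reproduces the ordering above (all A13 rows, then all A15 rows). Handling a whole file would additionally need the carried alleles keyed by Uploaded_variation.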


Best regards

Duarte