[ensembl-dev] Variant effect predictor: regulatory information and MATRIX/HIGH_INF_POS

Will McLaren wm2 at ebi.ac.uk
Thu Oct 6 09:41:31 BST 2011


Hello Adam,

Answers inline below:

On 5 October 2011 21:49, A. P. Levine <a.levine at ucl.ac.uk> wrote:
> I have two questions regarding the Variant Effect Predictor (VEP).
>
> 1. Regulatory information
>
> When I use the VEP online it gives me the regulatory feature information
> without a problem. However, when I use the Perl script it does not report
> this information, e.g.:
>
> Input (VCF format):
> 18   10304   .       TACCC   TAACCC
> 18   10333   .       T       C
> 18   10334   .       T       A
> 18   10405   .       C       T
> 18   10411   .       TA      TAA
>
> Output from web version:
> Uploaded Variation    Location    Allele    Gene    Feature    Feature type    Consequence    Position in cDNA    Position in CDS    Position in protein    Amino acid change    Codon change    Co-located Variation    Extra
> 18_10305_ACCC/AACCC    18:10305-10308    -    -    -    -    INTERGENIC    -    -    -    -    -    -    -
> 18_10305_ACCC/AACCC    18:10305-10308    AACCC    -    ENSR00000667451    RegulatoryFeature    REGULATORY_REGION    -    -    -    -    -    -    -
> 18_10333_T/C    18:10333    C    -    ENSR00000667451    RegulatoryFeature    REGULATORY_REGION    -    -    -    -    -    -    -
> 18_10333_T/C    18:10333    -    -    -    -    INTERGENIC    -    -    -    -    -    -    -
> 18_10334_T/A    18:10334    A    -    ENSR00000667451    RegulatoryFeature    REGULATORY_REGION    -    -    -    -    -    -    -
> 18_10334_T/A    18:10334    -    -    -    -    INTERGENIC    -    -    -    -    -    -    -
> 18_10405_C/T    18:10405    -    -    -    -    INTERGENIC    -    -    -    -    -    -    -
> 18_10405_C/T    18:10405    T    -    ENSR00000667451    RegulatoryFeature    REGULATORY_REGION    -    -    -    -    -    -    -
> 18_10441_C/T    18:10441    T    -    ENSR00000667451    RegulatoryFeature    REGULATORY_REGION    -    -    -    -    -    rs56928311    -
>
> Output from Perl script, run as:
>
>     perl variant_effect_predictor.pl -i test -o test.out --sift b --polyphen b --condel b --regulatory --hgvs --gene --hgnc --check_existing
>
> #Uploaded_variation    Location    Allele    Gene    Feature    Feature_type    Consequence    cDNA_position    CDS_position    Protein_position    Amino_acids    Codons    Existing_variation    Extra
> 18_10305_ACCC/AACCC    18:10305-10308    -    -    -    -    INTERGENIC    -    -    -    -    -    -    -
> 18_10333_T/C    18:10333    -    -    -    -    INTERGENIC    -    -    -    -    -    -    -
> 18_10334_T/A    18:10334    -    -    -    -    INTERGENIC    -    -    -    -    -    -    -
> 18_10405_C/T    18:10405    -    -    -    -    INTERGENIC    -    -    -    -    -    -    -
> 18_10441_C/T    18:10441    -    -    -    -    INTERGENIC    -    -    -    -    -    rs56928311    -
>
> Any thoughts on why it might not be working would be appreciated.
>

It looks like you might be missing the Ensembl Funcgen API (ensembl-funcgen),
which the --regulatory option relies on; can you check that it is installed?
When I run the same command with that input I get the same output as the
web version.
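If it helps to check quickly, here is a small Python sketch that asks Perl
whether it can load a Funcgen module with your current PERL5LIB. It is
equivalent to running perl -M<module> -e 1 by hand. The module name
Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor is an assumption (any module from
the ensembl-funcgen modules directory should do), and perl_module_loads is
a hypothetical helper name.

```python
import subprocess

# Assumption: if this module loads, the Funcgen API is installed and on
# PERL5LIB. Any other module from ensembl-funcgen/modules would do as well.
FUNCGEN_MODULE = "Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor"


def perl_module_loads(module: str) -> bool:
    """Return True if `perl -M<module> -e 1` exits successfully."""
    try:
        result = subprocess.run(
            ["perl", f"-M{module}", "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # No perl executable on PATH at all.
        return False
    # perl exits non-zero with "Can't locate ..." when the module is missing.
    return result.returncode == 0


if __name__ == "__main__":
    if perl_module_loads(FUNCGEN_MODULE):
        print("Funcgen API found")
    else:
        print("Funcgen API not found; check PERL5LIB")
```

If the module is reported missing, the usual fix is to add
ensembl-funcgen/modules to PERL5LIB alongside the core and variation API
module directories.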

> 2. MATRIX/HIGH_INF_POS
>
> The header from running the perl script is as follows:
>     ## ENSEMBL VARIANT EFFECT PREDICTOR v2.1
>     ## Output produced at 2011-10-05 21:44:03
>     ## Connected to homo_sapiens_core_63_37 on ensembldb.ensembl.org
>     ## Using API version 63, DB version 63
>     ## Extra column keys:
>     ## HGNC         : HGNC gene identifier
>     ## ENSP         : Ensembl protein identifer
>     ## HGVSc        : HGVS coding sequence name
>     ## HGVSp        : HGVS protein sequence name
>     ## SIFT         : SIFT prediction
>     ## PolyPhen     : PolyPhen prediction
>     ## Condel       : Condel SIFT/PolyPhen consensus prediction
>     ## MATRIX       : The source and identifier of a transcription factor binding profile aligned at this position
>     ## HIGH_INF_POS : A flag indicating if the variant falls in a high information position of a transcription factor binding profile
>
> Are MATRIX and HIGH_INF_POS operational?

Yes, they are operational. They appear when a variant overlaps a
transcription factor binding site that has an associated binding matrix.
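As a rough illustration of picking these keys out of the script's output,
the Python sketch below reads the tab-separated output, takes the last
(Extra) column and splits it into key/value pairs. It assumes the Extra
column holds semicolon-separated KEY=VALUE entries, with "-" when empty;
parse_extra and tfbs_hits are hypothetical names, not part of VEP.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_extra(extra: str) -> Dict[str, str]:
    """Split an Extra field such as 'KEY=VALUE;KEY2=VALUE2' into a dict.

    Assumption: entries are separated by ';' and keys from values by '='.
    A key without '=' is kept with an empty value; '-' means no data.
    """
    if extra in ("", "-"):
        return {}
    fields: Dict[str, str] = {}
    for item in extra.split(";"):
        if not item:
            continue  # tolerate a trailing ';'
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def tfbs_hits(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (uploaded_variation, MATRIX, HIGH_INF_POS) for rows with a MATRIX."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip '##' header lines and the '#Uploaded_variation' row
        cols = line.rstrip("\n").split("\t")
        extra = parse_extra(cols[-1])  # Extra is the last column
        if "MATRIX" in extra:
            yield cols[0], extra["MATRIX"], extra.get("HIGH_INF_POS", "")
```

Passing it the lines of test.out (the -o file in the command above) would
list each variant that overlaps a binding matrix, together with its
HIGH_INF_POS flag.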

Here is an example input line that should show MATRIX and HIGH_INF_POS
in the output:

17  46622288    46622288    G/A +
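The line above is in VEP's default input format (chromosome, start, end,
allele, strand) rather than VCF. To compare it with the VCF records in your
question, here is a Python sketch of the conversion as it appears in the
web output: TACCC/TAACCC at 10304 shows up as ACCC/AACCC at
18:10305-10308, i.e. the shared leading (padding) base of an indel is
dropped and the start shifts by one. It handles a single ALT allele only
and is inferred from that output, not taken from VEP's own parsing code;
vcf_to_default is a hypothetical name.

```python
from typing import Tuple


def vcf_to_default(line: str) -> Tuple[str, int, int, str, str]:
    """Convert one VCF data line (single ALT) to default-format fields.

    Mirrors the web output in the question: for an indel whose REF and ALT
    share a first (padding) base, that base is dropped and start moves +1.
    """
    chrom, pos, _vid, ref, alt = line.split()[:5]
    start = int(pos)
    if len(ref) != len(alt) and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        start += 1
    # For a pure insertion ref is now empty, giving end = start - 1,
    # the usual Ensembl way of marking an insertion point.
    end = start + len(ref) - 1
    allele = f"{ref or '-'}/{alt or '-'}"
    return chrom, start, end, allele, "+"


if __name__ == "__main__":
    # Two of the VCF records from the question.
    for record in ("18 10304 . TACCC TAACCC", "18 10333 . T C"):
        chrom, start, end, allele, strand = vcf_to_default(record)
        # Prints (tab-separated) 18 10305 10308 ACCC/AACCC +, then 18 10333 10333 T/C +
        print(chrom, start, end, allele, strand, sep="\t")
```

Joining chromosome, start and allele string with underscores
(18_10305_ACCC/AACCC) reproduces the Uploaded Variation IDs seen in both
outputs.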

Hope this helps

Will McLaren
Ensembl Variation

>
> Thank you,
>
> Adam
>
> Adam P. Levine
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>

