[ensembl-dev] Variant effect predictor: regulatory information and MATRIX/HIGH_INF_POS
Will McLaren
wm2 at ebi.ac.uk
Thu Oct 6 09:41:31 BST 2011
Hello Adam,
Answers inline below:
On 5 October 2011 21:49, A. P. Levine <a.levine at ucl.ac.uk> wrote:
> I have two questions regarding the Variant Effect Predictor (VEP).
>
> 1. Regulatory information
>
> When I use the VEP online it gives me the regulatory feature information
> without a problem. However, when I use the Perl script it does not report
> this information, e.g.:
>
> Input (VCF format):
> 18 10304 . TACCC TAACCC
> 18 10333 . T C
> 18 10334 . T A
> 18 10405 . C T
> 18 10411 . TA TAA
>
> Output from web version:
> Uploaded Variation	Location	Allele	Gene	Feature	Feature type	Consequence	Position in cDNA	Position in CDS	Position in protein	Amino acid change	Codon change	Co-located Variation	Extra
> 18_10305_ACCC/AACCC	18:10305-10308	-	-	-	-	INTERGENIC	-	-	-	-	-	-	-
> 18_10305_ACCC/AACCC	18:10305-10308	AACCC	-	ENSR00000667451	RegulatoryFeature	REGULATORY_REGION	-	-	-	-	-	-	-
> 18_10333_T/C	18:10333	C	-	ENSR00000667451	RegulatoryFeature	REGULATORY_REGION	-	-	-	-	-	-	-
> 18_10333_T/C	18:10333	-	-	-	-	INTERGENIC	-	-	-	-	-	-	-
> 18_10334_T/A	18:10334	A	-	ENSR00000667451	RegulatoryFeature	REGULATORY_REGION	-	-	-	-	-	-	-
> 18_10334_T/A	18:10334	-	-	-	-	INTERGENIC	-	-	-	-	-	-	-
> 18_10405_C/T	18:10405	-	-	-	-	INTERGENIC	-	-	-	-	-	-	-
> 18_10405_C/T	18:10405	T	-	ENSR00000667451	RegulatoryFeature	REGULATORY_REGION	-	-	-	-	-	-	-
> 18_10441_C/T	18:10441	T	-	ENSR00000667451	RegulatoryFeature	REGULATORY_REGION	-	-	-	-	-	rs56928311	-
>
> Output from Perl script (perl variant_effect_predictor.pl -i test -o
> test.out --sift b --polyphen b --condel b --regulatory --hgvs --gene --hgnc
> --check_existing):
> #Uploaded_variation	Location	Allele	Gene	Feature	Feature_type	Consequence	cDNA_position	CDS_position	Protein_position	Amino_acids	Codons	Existing_variation	Extra
> 18_10305_ACCC/AACCC	18:10305-10308	-	-	-	-	INTERGENIC	-	-	-	-	-	-	-
> 18_10333_T/C	18:10333	-	-	-	-	INTERGENIC	-	-	-	-	-	-	-
> 18_10334_T/A	18:10334	-	-	-	-	INTERGENIC	-	-	-	-	-	-	-
> 18_10405_C/T	18:10405	-	-	-	-	INTERGENIC	-	-	-	-	-	-	-
> 18_10441_C/T	18:10441	-	-	-	-	INTERGENIC	-	-	-	-	-	rs56928311	-
>
> Any thoughts on why it might not be working would be appreciated.
>
It looks like the Ensembl Funcgen API may be missing from your
installation - can you check that it is installed? When I run the same
command with that input, I get the same output as the web version.
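
Once the Funcgen API is in place, one quick way to confirm that the
script is reporting regulatory features is to scan its output for
RegulatoryFeature lines. The following is only a rough Python sketch,
assuming the output is the tab-delimited file written by the script,
with header lines starting with "#" and the 14 columns shown above.
The function names are illustrative and not part of the VEP.

```python
def regulatory_features(vep_lines):
    """Map each uploaded variant to the regulatory feature IDs reported for it."""
    hits = {}
    for line in vep_lines:
        line = line.rstrip("\n")
        # Skip blank lines and "#"-prefixed header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # not a data row in the assumed layout
        variant, feature, feature_type = fields[0], fields[4], fields[5]
        hits.setdefault(variant, [])
        if feature_type == "RegulatoryFeature":
            hits[variant].append(feature)
    return hits


def missing_regulatory(vep_lines):
    """Return the variants that have no RegulatoryFeature line at all."""
    found = regulatory_features(vep_lines)
    return sorted(v for v, feats in found.items() if not feats)


def make_row(*first_fields):
    """Helper for the example: pad a row with '-' up to 14 columns."""
    return "\t".join(first_fields + ("-",) * (14 - len(first_fields)))


# Two rows taken from the web output quoted above.
web_rows = [
    make_row("18_10333_T/C", "18:10333", "C", "-", "ENSR00000667451",
             "RegulatoryFeature", "REGULATORY_REGION"),
    make_row("18_10333_T/C", "18:10333", "-", "-", "-", "-", "INTERGENIC"),
]
# The matching row from the script output quoted above.
script_rows = [
    make_row("18_10333_T/C", "18:10333", "-", "-", "-", "-", "INTERGENIC"),
]

print(missing_regulatory(web_rows))
print(missing_regulatory(script_rows))
```

With a real output file, passing open("test.out") to
missing_regulatory() should list the variants that only received an
INTERGENIC line.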
> 2. MATRIX/HIGH_INF_POS
>
> The header from running the perl script is as follows:
> ## ENSEMBL VARIANT EFFECT PREDICTOR v2.1
> ## Output produced at 2011-10-05 21:44:03
> ## Connected to homo_sapiens_core_63_37 on ensembldb.ensembl.org
> ## Using API version 63, DB version 63
> ## Extra column keys:
> ## HGNC : HGNC gene identifier
> ## ENSP : Ensembl protein identifer
> ## HGVSc : HGVS coding sequence name
> ## HGVSp : HGVS protein sequence name
> ## SIFT : SIFT prediction
> ## PolyPhen : PolyPhen prediction
> ## Condel : Condel SIFT/PolyPhen consensus prediction
> ## MATRIX : The source and identifier of a transcription factor
> binding profile aligned at this position
> ## HIGH_INF_POS : A flag indicating if the variant falls in a high
> information position of a transcription factor binding profile
>
> Are MATRIX and HIGH_INF_POS operational?
Yes, they are operational. They appear when a variant overlaps a
transcription factor binding site that has an associated binding
matrix.
This is an example line of input that should show MATRIX and
HIGH_INF_POS in the output:
17 46622288 46622288 G/A +
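
To pick these two keys out of the output, the Extra column can be split
into key/value pairs. This is a minimal Python sketch, assuming the
Extra column is the 14th tab-delimited field and holds
semicolon-separated KEY=value entries, or "-" when empty. The sample
row below is invented purely for illustration and is not real VEP
output; the function names are likewise illustrative.

```python
def parse_extra(extra):
    """Split an Extra field such as 'A=1;B=2' into a dict ('-' means empty)."""
    if extra in ("", "-"):
        return {}
    result = {}
    for item in extra.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        result[key] = value
    return result


def motif_annotations(vep_lines):
    """Return (variant, MATRIX, HIGH_INF_POS) for each row carrying a MATRIX key."""
    found = []
    for line in vep_lines:
        line = line.rstrip("\n")
        # Skip blank lines and "#"-prefixed header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 14:
            continue  # not a full data row in the assumed layout
        extra = parse_extra(fields[13])
        if "MATRIX" in extra:
            found.append((fields[0], extra["MATRIX"], extra.get("HIGH_INF_POS")))
    return found


# Hypothetical row: the Extra values here are placeholders, not real identifiers.
example_row = "\t".join(
    ["17_46622288_G/A", "17:46622288", "A"] + ["-"] * 10
    + ["MATRIX=EXAMPLE_MATRIX_ID;HIGH_INF_POS=Y"]
)

for variant, matrix, high_inf in motif_annotations([example_row]):
    print(variant, matrix, high_inf)
```

Rows without a MATRIX key are simply skipped, so an empty result on a
real file suggests that none of the variants overlapped a binding site
with an associated matrix.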
Hope this helps.
Will McLaren
Ensembl Variation
>
> Thank you,
>
> Adam
>
> Adam P. Levine
>