[ensembl-dev] FW: VEP polyphen predictions

Tue Jun 7 10:33:35 BST 2011

Hi Marc,

We use the Ensembl translation sequences directly as input to PolyPhen; we don't map them to UniProt protein sequences or anything else beforehand. So our input files look like:

ENST00000397678_2_C ENST00000397678 2   A   C

and we also supply PolyPhen with a FASTA file containing the translation of ENST00000397678, using the -s command line flag of run_pph.pl.
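
As a rough illustration of the input described above, here is a minimal Python sketch that builds one substitution line and a matching FASTA record, and checks that the reference residue really occurs at the stated position in the sequence being supplied. This is not the code Ensembl runs: the function names, the tab delimiter, and the toy sequence "MASTK" are all assumptions made for the example, and the real translation would have to be fetched from Ensembl.

```python
def pph_input_line(transcript_id, pos, ref_aa, alt_aa):
    """Build one substitution line in the column layout shown above.

    Assumption: columns are tab-separated; the original message does not
    say which whitespace delimiter is used.
    """
    snp_id = f"{transcript_id}_{pos}_{alt_aa}"
    return "\t".join([snp_id, transcript_id, str(pos), ref_aa, alt_aa])


def fasta_record(seq_id, sequence, width=60):
    """Format a protein sequence as a FASTA record wrapped at `width`."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{seq_id}\n" + "\n".join(chunks) + "\n"


def reference_matches(sequence, pos, ref_aa):
    """Return True if the residue at 1-based position `pos` equals `ref_aa`."""
    return 1 <= pos <= len(sequence) and sequence[pos - 1] == ref_aa


if __name__ == "__main__":
    # Toy sequence, NOT the real translation of ENST00000397678.
    toy_translation = "MASTK"
    if reference_matches(toy_translation, 2, "A"):
        print(pph_input_line("ENST00000397678", 2, "A", "C"))
        print(fasta_record("ENST00000397678", toy_translation), end="")
```

Running the same residue check against whichever sequence is actually passed to PolyPhen would show, before submission, whether the amino acid position reported by VEP lines up with that sequence.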

Regards,

Graham Ritchie

Ensembl variation

On 7 Jun 2011, at 10:22, Marc Jan Bonder wrote:

> Hi,
>  
> I have a question. I'm using VEP 2.0 to translate SNP positions to amino acid positions in transcripts, and then predicting whether each substitution is damaging using PolyPhen. I know that you also include PolyPhen annotations in the latest version of VEP, and this is where my problem arises: when I try to annotate the amino acid substitutions using my own PolyPhen setup, I sometimes get errors saying that the amino acids returned by VEP are not in the transcript. When I look in the VEP output for the same variants, it does return a PolyPhen annotation. I use a conversion table from UCSC to translate the Ensembl transcript IDs to Swiss-Prot IDs.
>  
> An example of this error:
> Input for VEP:
> 6             152453291          152453291          G/A       +
>  
> Output by VEP
> 6_152453291_G/A          6:152453291       A             ENSG00000131018          ENST00000367257                NON_SYNONYMOUS_CODING                3998      3998      1333      T/I          aCa/aTa               rs35591210                SIFT=tolerated(0.06);PolyPhen=benign(0.004);Condel=deleterious(0.885)
>  
> Input to polyphen:
> Q8NF91-4           T             I
>  
> Output by Polyphen
> ERROR: Neither AA1 (T) nor AA2 (I) in input matches Q8NF91-4 query sequence residue (R) at position (1333)
>  
> There are more of these errors for my PolyPhen input. Is this because you used a database other than the standard one (UniRef100) for your PolyPhen annotations? Or is it because my conversions from Ensembl transcript IDs to Swiss-Prot IDs aren't correct? Or do you just convert the T into an R and then rerun the prediction? Or is it something completely different?
>  
> Regards,
>  
> Marc Jan Bonder
>  
> Bioinformatician at the University Medical Centre Groningen