[ensembl-dev] FW: VEP polyphen predictions

Tue Jun 7 14:12:31 BST 2011

Hi Marc,

We did indeed use UniRef100 as the BLAST database, as advised by the PolyPhen authors.

You can download a FASTA file with all the translations from Ensembl 62 here:

ftp://ftp.ensembl.org/pub/release-62/fasta/homo_sapiens/pep/

Note that the first part of the header for each protein sequence is the ENSP identifier; the associated ENST transcript identifier is at the end of the header line.
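
If it helps, here is a minimal Python sketch (not what we run internally) of keying the translations by ENST identifier, so that you can check the reference amino acid against the Ensembl translation before running PolyPhen. It assumes the transcript appears in the header as a `transcript:ENST...` field; the function names `read_translations` and `residue_matches` are made up for illustration.

```python
import re

# Assumption: each header carries a "transcript:ENST..." field, e.g.
# >ENSP00000000001 pep:known ... gene:ENSG00000000001 transcript:ENST00000000001
# (the identifiers in this comment are placeholders, not real entries).
TRANSCRIPT_RE = re.compile(r"transcript:(ENST\d+)")


def read_translations(lines):
    """Return {ENST id: protein sequence} from the lines of a pep FASTA file."""
    chunks = {}
    enst = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # New record: pick up the transcript id, or skip the record
            # if the header does not contain one.
            match = TRANSCRIPT_RE.search(line)
            enst = match.group(1) if match else None
            if enst is not None:
                chunks[enst] = []
        elif line and enst is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            chunks[enst].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def residue_matches(translations, enst, position, aa):
    """True if the 1-based position in the translation of enst holds aa."""
    seq = translations.get(enst, "")
    return 1 <= position <= len(seq) and seq[position - 1] == aa
```

You would pass an open file handle for the downloaded pep FASTA file to `read_translations`, then call `residue_matches` with the transcript, protein position and reference amino acid from the VEP output. A `False` result suggests the sequence you are looking at is not the one the position was computed on.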

Cheers,

Graham

On 7 Jun 2011, at 14:02, Marc Jan Bonder wrote:

> Hi Graham,
> 
> Do you use the UniRef100 BLAST database, or also your own BLAST database
> created from the Ensembl transcripts?
> Also, is it possible to download a FASTA file with all the proteins
> from the Ensembl transcripts?
> 
> Regards,
> Marc Jan Bonder
> 
> -----Original message-----
> From: Graham Ritchie [mailto:grsr at ebi.ac.uk]
> Sent: Tuesday 7 June 2011 11:34
> To: Marc Jan Bonder
> CC: dev at ensembl.org
> Subject: Re: [ensembl-dev] FW: VEP polyphen predictions
> 
> Hi Marc,
> 
> We use the Ensembl translation sequences directly as input to PolyPhen; we
> don't map them to UniProt protein sequences or anything beforehand. So our
> input files look like:
> 
> ENST00000397678_2_C ENST00000397678 2   A   C
> 
> and we also supply a FASTA file with the translation of ENST00000397678 to
> PolyPhen with the -s command-line flag to run_pph.pl.
> 
> Regards,
> 
> Graham Ritchie
> 
> Ensembl variation
> 
> 
> On 7 Jun 2011, at 10:22, Marc Jan Bonder wrote:
> 
>> Hi,
>> 
>> I have a question. I'm using VEP 2.0 to translate SNP positions to amino
>> acid positions in transcripts, and then predicting whether each substitution
>> is damaging using PolyPhen. I know that you also include PolyPhen
>> annotations in the latest version of VEP, which is where my problem comes
>> from: when I try to annotate the amino acid substitutions using my own
>> PolyPhen setup, I sometimes get errors saying that the returned amino acids
>> are not in the transcript. When I look in the VEP output, it does return a
>> PolyPhen annotation. I use a conversion table from UCSC to translate the
>> Ensembl transcript IDs to Swiss-Prot IDs.
>> 
>> An example of this error:
>> Input for VEP:
>> 6             152453291          152453291          G/A       +
>> 
>> Output by VEP:
>> 6_152453291_G/A  6:152453291  A  ENSG00000131018  ENST00000367257
>> NON_SYNONYMOUS_CODING  3998  3998  1333  T/I  aCa/aTa  rs35591210
>> SIFT=tolerated(0.06);PolyPhen=benign(0.004);Condel=deleterious(0.885)
>> 
>> Input to PolyPhen:
>> Q8NF91-4           T             I
>> 
>> Output by PolyPhen:
>> ERROR: Neither AA1 (T) nor AA2 (I) in input matches Q8NF91-4 query 
>> sequence residue (R) at position (1333)
>> 
>> There are more of these errors in my PolyPhen input. Is this because you
>> used a database other than the standard one (UniRef100) for your PolyPhen
>> annotations? Or is it because my conversions from Ensembl transcript IDs to
>> Swiss-Prot IDs aren't correct? Or do you just convert the T into an R and
>> then rerun the prediction? Or is it something completely different?
>> 
>> Regards,
>> 
>> Marc Jan Bonder
>> 
>> Bioinformatician at the University Medical Centre Groningen 