[ensembl-dev] Question about variant effect predictor/Condel

Mark Aquino aquinom85 at me.com
Tue Nov 29 17:50:31 GMT 2011


Hey Graham,

Thank you. I think that's the way I'll have to do it. I did originally create such an input file, with each codon sequence at the position of interest, and tried running it through the Variant Effect Predictor like so:

#CHR    POS     ID      REF     ALT
17      41276034        TEST1   CAG     GAT

but the output was like this:
TEST1   17:41276034-41276036    GAT     ENSG00000012048 ENST00000352993 Transcript      NON_SYNONYMOUS_CODING,SPLICE_SITE       310-312 78-80   26-27   IC/IS   atCTGt/atATCt   -       -

I overwrote the first codon in the first coding exon of BRCA1 here as an example, but the output does not make sense to me and, of course, there are no scores given. Was I just doing this incorrectly to begin with, or is there any chance it's a bug in the code?
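
One way to read the output line above is to redo the coordinate arithmetic by hand. The sketch below is only an illustration in Python, not VEP's own logic. It assumes that the three numeric columns are cDNA, CDS and protein position (in that order), that BRCA1 is annotated on the reverse strand, and that the standard genetic code applies. The helper names (reverse_complement, peptide_position, translate) are made up for this example and are not part of VEP or the Ensembl API.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def peptide_position(cds_pos):
    """Map a 1-based CDS position to the 1-based peptide position it falls in."""
    return (cds_pos - 1) // 3 + 1


def translate(cds_seq):
    """Translate an in-frame coding sequence whose length is a multiple of 3."""
    seq = cds_seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Values copied from the input and output lines in this message.
ref_fwd, alt_fwd = "CAG", "GAT"              # REF and ALT as given in the input
cds_start, cds_end = 78, 80                  # assumed to be the CDS position column
ref_codons, alt_codons = "atCTGt", "atATCt"  # the codons column

# Alleles as seen on the transcript strand (assuming a reverse-strand gene).
print(reverse_complement(ref_fwd), reverse_complement(alt_fwd))  # CTG ATC

# Peptide positions touched by the first and last changed base.
print(peptide_position(cds_start), peptide_position(cds_end))    # 26 27

# Amino acids encoded by the two affected codons, before and after.
print(translate(ref_codons), translate(alt_codons))              # IC IS
```

Under those assumptions, CDS positions 78-80 cover the last base of codon 26 and the first two bases of codon 27, so the three replaced bases straddle two codons rather than replacing a single one, which is consistent with the IC/IS change reported. A whole-codon swap for residue n would need to cover CDS positions 3n-2 to 3n exactly.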


On Nov 29, 2011, at 11:20 AM, Graham Ritchie wrote:

> Hi Mark,
> 
> If you want to fetch all possible amino acid substitution predictions from SIFT or PolyPhen for a particular codon, you could create an input file with every possible codon sequence at the position of interest, but it would be much more efficient to use the API to retrieve this set of substitution scores directly. We store all possible scores for every protein in the variation database as a matrix in a compressed form. The format is described in detail here:
> 
> http://www.ensembl.org/info/docs/variation/index.html#nsSNP_data_format
> 
> But the simplest way to access these matrices is to use an API script to fetch a ProteinFunctionPredictionMatrix for your protein of interest and then call its 'get_prediction' method to get the score for a particular position and amino acid, looping over all possible amino acids for your position. There is some detailed documentation on this class in the API documentation here:
> 
> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1ProteinFunctionPredictionMatrix.html
> 
> You would need to work out which peptide position your codon maps to, but there are methods in the TranscriptVariation class that should help you (probably translation_start and translation_end):
> 
> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1TranscriptVariation.html
> 
> We don't store a matrix for the Condel scores because they are very quick to compute given a SIFT and a PolyPhen score, but you can compute this score using the get_condel_prediction subroutine in the Condel module.
> 
> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1Utils_1_1Condel.html
> 
> If you need any help gluing this all together let me know!
> 
> Cheers,
> 
> Graham
> 
> 
> On 28 Nov 2011, at 15:53, Mark Aquino wrote:
> 
>> Hi,
>> 
>> I was wondering if it is possible to get functional predictions for entire codon swaps (to easily get a score for any of the 19 non-reference AAs at any codon) using the Variant Effect Predictor? My assumption, backed up by trying to insert a whole codon into a VCF and running VEP on that site, is that the answer is no, but I wanted to double check, and perhaps see whether it would be possible in the future or whether it's simply infeasible.
>> 
>> Best,
>> Mark
