[ensembl-dev] Question about variant effect predictor/Condel

Graham Ritchie grsr at ebi.ac.uk
Mon Dec 5 13:02:06 GMT 2011


Hi Mark,

This warning means that you tried to fetch a SIFT or PolyPhen prediction for an invalid peptide position or amino acid. Under what circumstances are you seeing it? Are you running the VEP, or are you using a ProteinFunctionPredictionMatrix from the API directly? In either case, if you can tell me how to reproduce the warning, I will look into it further.

Cheers,

Graham


On 2 Dec 2011, at 22:31, Mark Aquino wrote:

> Hey Graham,
> 
> Do you know what this error means:
> -------------------- WARNING ----------------------
> MSG: Offset outside of prediction matrix for position 62 and amino acid S?
> FILE: EnsEMBL/Variation/ProteinFunctionPredictionMatrix.pm LINE: 626
> CALLED BY: EnsEMBL/Variation/ProteinFunctionPredictionMatrix.pm  LINE: 271
> Ensembl API version = 64
> ---------------------------------------------------
> 
> On Nov 30, 2011, at 5:54 AM, Graham Ritchie wrote:
> 
>> Hi Mark,
>> 
>> Actually, on reflection, your example really constitutes only a single amino acid substitution: ATC and ATA both translate to the same amino acid (isoleucine), so only the second codon change results in a substitution. We could probably recognise such cases and provide a score for the one real substitution.
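
The point above can be checked with a small standalone sketch. It is not part of the Ensembl API: the function names `translate` and `real_substitutions` are made up for illustration, and the codon table is the standard genetic code. The input codons are the ones reported in the VEP output quoted further down the thread (atCTGt/atATCt, peptide positions 26-27).

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a coding sequence, ignoring any trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def real_substitutions(ref_codons, alt_codons, first_peptide_pos):
    """Return (peptide position, ref aa, alt aa) for codons whose amino acid changes."""
    ref_pep = translate(ref_codons)
    alt_pep = translate(alt_codons)
    return [
        (first_peptide_pos + i, ref_aa, alt_aa)
        for i, (ref_aa, alt_aa) in enumerate(zip(ref_pep, alt_pep))
        if ref_aa != alt_aa
    ]


# Codons taken from the quoted VEP output: atCTGt -> atATCt, peptides 26-27.
print(real_substitutions("ATCTGT", "ATATCT", 26))  # [(27, 'C', 'S')]
```

Only the second codon (C to S at position 27) survives the filter, which is the single substitution a score could be reported for.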
>> 
>> I will think about fixing this in future releases.
>> 
>> Cheers,
>> 
>> Graham
>> 
>> 
>> On 30 Nov 2011, at 10:42, Graham Ritchie wrote:
>> 
>>> Hi Mark,
>>> 
>>> There are a couple of issues here. Firstly, the coordinates and sequence you have specified do not correspond to the first codon of this transcript; they correspond to a location overlapping the last two codons of exon 2 (the final codon is in fact split by the intron, so its end is on exon 3). Secondly, SIFT and PolyPhen (and therefore Condel) only provide predictions for single amino acid substitutions, so you can't retrieve scores for a multiple substitution such as this one.
>>> 
>>> If you adjust your coordinates so that you do hit the first codon, you get the expected results, and you could systematically try every possible codon to retrieve every score for this position. As I mentioned before, though, using the API to do this directly will be much more efficient; let me know if you need more pointers on how to achieve what you're after. Here's a version of your example that seems to do what you expect:
>>> 
>>> #CHR    POS     ID      REF     ALT
>>> 17      41276111        TEST1   CAT     GAT
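
For anyone who does want to try every possible codon through the VEP, the input lines could be generated with something like the sketch below. The helper name `codon_swap_lines` and the ID numbering are hypothetical. The column layout mirrors the example above, with tab separators as an assumption. Enumerating all 64 genomic triplets covers every codon whichever strand the transcript is on, although synonymous and stop codons will not yield scores.

```python
from itertools import product


def codon_swap_lines(chrom, pos, ref_codon, id_prefix="TEST"):
    """Build one input line per alternative triplet at a codon-aligned position.

    ref_codon is the reference sequence on the forward genomic strand,
    as in the REF column of the example above.
    """
    ref_codon = ref_codon.upper()
    lines = ["#CHR\tPOS\tID\tREF\tALT"]
    all_triplets = ("".join(t) for t in product("ACGT", repeat=3))
    alts = (alt for alt in all_triplets if alt != ref_codon)
    for n, alt in enumerate(alts, start=1):
        lines.append(f"{chrom}\t{pos}\t{id_prefix}{n}\t{ref_codon}\t{alt}")
    return lines


if __name__ == "__main__":
    # Coordinates and reference triplet taken from the example above.
    for line in codon_swap_lines("17", 41276111, "CAT"):
        print(line)
```

This produces a header plus 63 variant lines, one for each triplet other than the reference.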
>>> 
>>> Cheers,
>>> 
>>> Graham
>>> 
>>> 
>>> On 29 Nov 2011, at 17:50, Mark Aquino wrote:
>>> 
>>>> Hey Graham,
>>>> 
>>>> Thank you. I think that's going to be the way I'll have to do it. I originally did create such an input file, with each codon sequence at the position of interest, and tried running it through the variant effect predictor like so:
>>>> 
>>>> #CHR    POS     ID      REF     ALT
>>>> 17      41276034        TEST1   CAG     GAT
>>>> 
>>>> but the output was like this:
>>>> TEST1   17:41276034-41276036    GAT     ENSG00000012048 ENST00000352993 Transcript      NON_SYNONYMOUS_CODING,SPLICE_SITE       310-312 78-80   26-27   IC/IS   atCTGt/atATCt   -       -
>>>> 
>>>> I overwrote the first codon in the first coding exon of BRCA1 here as an example, but the output does not make sense to me and, of course, there are no scores given. Was I just doing this incorrectly to begin with, or is there any chance it's a bug in the code?
>>>> 
>>>> 
>>>> On Nov 29, 2011, at 11:20 AM, Graham Ritchie wrote:
>>>> 
>>>>> Hi Mark,
>>>>> 
>>>>> If you want to fetch all possible amino acid substitution predictions from SIFT or PolyPhen for a particular codon, you could create an input file with every possible codon sequence at the position of interest, but it would be much more efficient to use the API to retrieve this set of substitution scores directly. We store all possible scores for every protein in the variation database as a matrix in compressed form. The format is described in detail here:
>>>>> 
>>>>> http://www.ensembl.org/info/docs/variation/index.html#nsSNP_data_format
>>>>> 
>>>>> But the simplest way to access these matrices is to use an API script to fetch a ProteinFunctionPredictionMatrix for your protein of interest and then call its 'get_prediction' method to get the score for a particular position and amino acid, looping over all possible amino acids for your position. There is some detailed documentation on this class in the API documentation here:
>>>>> 
>>>>> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1ProteinFunctionPredictionMatrix.html
>>>>> 
>>>>> You would need to work out which peptide position your codon maps to, but there are methods in the TranscriptVariation class that should help you (probably translation_start and translation_end):
>>>>> 
>>>>> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1TranscriptVariation.html
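
If the 1-based CDS coordinate of the codon is already known, the peptide position follows from simple arithmetic, without the API. The sketch below uses hypothetical helper names and assumes 1-based CDS and peptide numbering, as in the VEP output quoted elsewhere in this thread (CDS 78-80, protein 26-27).

```python
def peptide_position(cds_pos):
    """Map a 1-based CDS position to the 1-based peptide position of its codon."""
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    return (cds_pos - 1) // 3 + 1


def codon_offset(cds_pos):
    """Return 0, 1 or 2: where the base sits within its codon."""
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    return (cds_pos - 1) % 3


# CDS 78-80 from the quoted VEP output spans peptide positions 26-27.
print(peptide_position(78), peptide_position(80))  # 26 27
```

A three-base change only stays within one amino acid when its first base has codon offset 0. CDS position 78 has offset 2, which is why that example spilled over two codons.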
>>>>> 
>>>>> We don't store a matrix for the Condel scores because they are very quick to compute given a SIFT and a PolyPhen score; you can compute this score using the get_condel_prediction subroutine in the Condel module:
>>>>> 
>>>>> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1Utils_1_1Condel.html
>>>>> 
>>>>> If you need any help gluing this all together let me know!
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Graham
>>>>> 
>>>>> 
>>>>> On 28 Nov 2011, at 15:53, Mark Aquino wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I was wondering if it is possible to get functional predictions for entire codon swaps (to easily get a score for any of the 19 non-reference AAs at any codon) using the variant effect predictor? My assumption, based on trying to insert a whole codon into a VCF and running VEP on that site, is that the answer is no, but I wanted to double-check and perhaps see whether it would be possible in the future or is simply infeasible.
>>>>>> 
>>>>>> Best,
>>>>>> Mark
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>> 
>>>> 
>>> 
>>> 
>> 