[ensembl-dev] mutation consequences
Graham Ritchie
grsr at ebi.ac.uk
Tue Aug 16 10:06:49 BST 2011
Hi Venu,
For the current release (63) the predictions are stored in the human variation database in the protein_info, protein_position, sift_prediction and polyphen_prediction tables. You can download MySQL dumps of these tables from the Ensembl FTP server here:
ftp://ftp.ensembl.org/pub/release-63/mysql/homo_sapiens_variation_63_37/
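If you want to script the download, here is a minimal sketch that just builds the URLs for the four tables named above. It assumes each table is dumped as a file called <table>.txt.gz in that directory; the suffix and the dump_urls helper are my assumptions for illustration, so please check the directory listing before relying on them.

```python
# Directory given above, and the four tables that hold the predictions.
BASE_URL = "ftp://ftp.ensembl.org/pub/release-63/mysql/homo_sapiens_variation_63_37/"
TABLES = ["protein_info", "protein_position", "sift_prediction", "polyphen_prediction"]


def dump_urls(base_url=BASE_URL, tables=TABLES, suffix=".txt.gz"):
    """Return one candidate dump URL per table.

    The ".txt.gz" suffix is an assumption about how the dump files are
    named; verify it against the actual FTP directory listing.
    """
    return [f"{base_url}{table}{suffix}" for table in tables]


if __name__ == "__main__":
    for url in dump_urls():
        print(url)
```

The resulting URLs can be passed to whichever download tool you prefer.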
You do not need any other tables to retrieve these predictions. The protein_info table contains a row for each protein-coding transcript in Ensembl, the protein_position table contains a row for each position in every translation, and the sift_prediction and polyphen_prediction tables contain a row for each possible amino acid substitution at each position in every translation. The tables are joined on their primary keys (protein_info_id and protein_position_id). So, for example, to fetch the SIFT prediction for a substitution to amino acid K at position 82 in the translation of transcript ENST00000308731 you could use an SQL query like:
SELECT pred.prediction, pred.score
FROM sift_prediction pred, protein_position pp, protein_info pi
WHERE pred.protein_position_id = pp.protein_position_id
AND pp.protein_info_id = pi.protein_info_id
AND pi.transcript_stable_id = 'ENST00000308731'
AND pp.position = 82
AND pred.amino_acid = 'K';
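To illustrate how the three tables chain together, here is a small self-contained sketch that runs the same join against an in-memory SQLite database. It is not the real variation database: only the columns used in the query above are modelled, the column types are guesses, every row is made up, and the build_toy_db and fetch_sift helpers are hypothetical names of my own.

```python
import sqlite3

# Toy stand-in for the release 63 tables. Only the columns used in the
# query above are modelled; the column types are assumptions.
SCHEMA = """
CREATE TABLE protein_info (
    protein_info_id INTEGER PRIMARY KEY,
    transcript_stable_id TEXT
);
CREATE TABLE protein_position (
    protein_position_id INTEGER PRIMARY KEY,
    protein_info_id INTEGER,
    position INTEGER
);
CREATE TABLE sift_prediction (
    protein_position_id INTEGER,
    amino_acid TEXT,
    prediction TEXT,
    score REAL
);
"""

# Same join as the query above, written with explicit JOINs and placeholders.
QUERY = """
SELECT pred.prediction, pred.score
FROM sift_prediction pred
JOIN protein_position pp ON pred.protein_position_id = pp.protein_position_id
JOIN protein_info pi ON pp.protein_info_id = pi.protein_info_id
WHERE pi.transcript_stable_id = ?
  AND pp.position = ?
  AND pred.amino_acid = ?
"""


def build_toy_db():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO protein_info VALUES (1, 'ENST00000308731')")
    conn.executemany(
        "INSERT INTO protein_position VALUES (?, ?, ?)",
        [(10, 1, 81), (11, 1, 82)],
    )
    # Predictions and scores here are invented, not real SIFT output.
    conn.executemany(
        "INSERT INTO sift_prediction VALUES (?, ?, ?, ?)",
        [
            (10, "K", "tolerated", 0.5),
            (11, "K", "deleterious", 0.25),
            (11, "R", "tolerated", 0.75),
        ],
    )
    conn.commit()
    return conn


def fetch_sift(conn, transcript_stable_id, position, amino_acid):
    """Return (prediction, score) for one substitution, or None if absent."""
    row = conn.execute(QUERY, (transcript_stable_id, position, amino_acid))
    return row.fetchone()


if __name__ == "__main__":
    conn = build_toy_db()
    print(fetch_sift(conn, "ENST00000308731", 82, "K"))
```

Swapping sift_prediction for polyphen_prediction in the query should give the PolyPhen result in the same way, provided that table has the same columns.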
Hopefully that should let you fetch the data you want. Note that the Condel score is a (very fast to compute) function of the SIFT and PolyPhen scores, so we don't precompute it, but you can use the get_condel_prediction subroutine from the Bio::EnsEMBL::Variation::Utils::Condel module to calculate it (or just run the Condel script itself).
The schema for these predictions will change significantly for the forthcoming release 64, as we now store these predictions in a much more efficient format in a single table in the variation database, but we will provide documentation on how to retrieve these scores for the release.
Let me know if you have any further questions.
Cheers,
Graham
On 15 Aug 2011, at 23:34, Venugopal Valmeekam wrote:
> Hi,
> I am interested in computing the effects of all possible substitutions in the human proteome. I have been trying to run SIFT and PolyPhen and then run Condel. But i believe that you already have this data set in Ensembl variation database.
>
> could you please point me to this resource? I would like to download the entire data set for the human proteome.
>
> Thanks,
> Venu
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/