[ensembl-dev] Mapping between Swiss-Prot/TrEMBL and Ensembl

tjaart at ebi.ac.uk tjaart at ebi.ac.uk
Wed Aug 11 14:04:19 BST 2010


Hi all

Today I have a question regarding the mapping of transcripts to UniProt. I
have posted it here but it might also be suitable to post to the UniProt
help desk.

What I would like is to be able to map all the different nsSNPs for a
specific gene to ONE UniProt id. If I understand everything correctly
from the UniProt documentation, all the splice variants for a specific
protein-coding gene are collected under one UniProt/Swiss-Prot id. What
I have found in Ensembl is that for some genes some of the transcripts map
to a Swiss-Prot id but other transcripts from the same gene map to a
TrEMBL id. From my understanding, all the transcripts for a gene should
map to the same Swiss-Prot id but as different splice variants.

Here is an example:
One of the chains in the PDB entry 2bnq is encoded by
ENSG00000166710.

When retrieving the transcripts for this gene (using v54 of Ensembl for
compatibility with the 1000 Genomes database) I get the following:

ENST00000349264 maps to P61769 (Swiss-Prot reviewed, what I want)

but

ENST00000396754 maps to Q9UM88, Q9UD48 and B4E0X1 (all TrEMBL)

How do I go about mapping nsSNPs to a UniProt id if one transcript maps
to three different TrEMBL entries?

I did some preliminary counts and for my list of 2890 genes (identified
from the UniProt data for 2890 protein chains) I find 7706 transcripts of
which 3331 map to Swiss-Prot and 2659 map only to TrEMBL. I would have
expected that if a gene is linked to a protein in Swiss-Prot, all the
transcripts for that gene would also be linked to the same Swiss-Prot id. And
since my list originates from UniProt all the transcripts for the genes
should be mapped to the same UniProt ids.
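
To make the counting rule explicit, here is a minimal Python sketch of how
each transcript could be put into one of three groups: has a Swiss-Prot
xref, has only TrEMBL xrefs, or has no UniProt xref. The input dictionary
and the function name classify are assumptions for illustration. The
first two entries are from the example above and the third is an invented
placeholder for an unmapped transcript.

```python
from collections import Counter

# Hypothetical input: transcript ID -> list of (accession, database) xrefs.
# The first two entries come from the ENSG00000166710 example; the third
# is an invented placeholder for a transcript with no UniProt xref.
xrefs = {
    "ENST00000349264": [("P61769", "Swiss-Prot")],
    "ENST00000396754": [
        ("Q9UM88", "TrEMBL"),
        ("Q9UD48", "TrEMBL"),
        ("B4E0X1", "TrEMBL"),
    ],
    "ENST_PLACEHOLDER": [],
}


def classify(entries):
    """Return the mapping category for one transcript's xref list."""
    databases = {db for _, db in entries}
    if "Swiss-Prot" in databases:
        return "swissprot"
    if "TrEMBL" in databases:
        return "trembl_only"
    return "unmapped"


# Tally how many transcripts fall into each category.
counts = Counter(classify(entries) for entries in xrefs.values())
print(dict(counts))
```

A transcript with both Swiss-Prot and TrEMBL xrefs is counted as
Swiss-Prot here, which matches the "map only to TrEMBL" wording above.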

I already have a Perl script (thanks to Ian Longden) which returns the
UniProt positions of each nsSNP but only for those nsSNPs located on a
transcript which maps to a Swiss-Prot id. How would I go about mapping
nsSNPs in TrEMBL-mapped transcripts to the Swiss-Prot id assigned for the
gene?
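
One possible fallback, sketched here in Python purely as an illustration,
is to take the Swiss-Prot id from whichever transcript of the gene has
one and assign it to every transcript of that gene. The data structure
and the function names swissprot_for_gene and assign_accessions are
assumptions, not part of the Ensembl API. Note that this only picks the
id: the residue position of an nsSNP on a TrEMBL-mapped transcript would
still have to be aligned to the Swiss-Prot sequence separately.

```python
# Hypothetical per-gene input: transcript ID -> list of (accession, database).
# Accessions are from the ENSG00000166710 example; the structure itself is
# an assumption, not the output format of any Ensembl tool.
gene_xrefs = {
    "ENST00000349264": [("P61769", "Swiss-Prot")],
    "ENST00000396754": [
        ("Q9UM88", "TrEMBL"),
        ("Q9UD48", "TrEMBL"),
        ("B4E0X1", "TrEMBL"),
    ],
}


def swissprot_for_gene(transcript_xrefs):
    """Return the gene's Swiss-Prot accession if exactly one exists, else None."""
    accessions = {
        acc
        for entries in transcript_xrefs.values()
        for acc, db in entries
        if db == "Swiss-Prot"
    }
    if len(accessions) == 1:
        return next(iter(accessions))
    # Zero or several Swiss-Prot accessions: refuse to guess.
    return None


def assign_accessions(transcript_xrefs):
    """Give every transcript of the gene the gene-level Swiss-Prot accession."""
    gene_accession = swissprot_for_gene(transcript_xrefs)
    return {tid: gene_accession for tid in transcript_xrefs}


assigned = assign_accessions(gene_xrefs)
print(assigned)
```

Genes with no Swiss-Prot id, or with more than one, come back as None so
that they can be checked by hand rather than assigned silently.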

Any other suggestions are also welcome.

Thanks!
Tjaart de Beer







