[ensembl-dev] Transmembrane feature in Ensembl protein summary

Andy Yates ayates at ebi.ac.uk
Wed Apr 30 15:10:23 BST 2014


Hi,

Comparing Ensembl protein domain features against UniProt is possible because Ensembl proteins are now mapped to UniParc records, which makes it much easier to identify 100% sequence matches between the two resources [1]. Once you have identified the corresponding UniProt protein, its predicted domains are comparable to ours. The case you've identified is interesting because TMHMM has been unable to find a sufficiently strong signal for the 3rd TMD [2].
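
As a rough illustration of that kind of comparison, here is a minimal Python sketch that looks for UniProt transmembrane segments with no overlapping Ensembl prediction. The coordinates are copied from the table in the quoted message below; the function names (overlaps, unmatched) are made up for this example and are not part of any Ensembl or UniProt API.

```python
# Transmembrane segments (1-based, inclusive) copied from the quoted message
# for UniProt Q14832 and Ensembl ENSP00000355316.
uniprot_tm = [(577, 599), (614, 634), (646, 664), (689, 709),
              (735, 756), (770, 792), (803, 828)]
ensembl_tm = [(577, 599), (612, 634), (687, 709),
              (737, 756), (771, 793), (806, 828)]


def overlaps(a, b):
    """Return True if two closed intervals share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def unmatched(reference, predicted):
    """Return the reference segments that no predicted segment overlaps."""
    return [seg for seg in reference
            if not any(overlaps(seg, p) for p in predicted)]


missing = unmatched(uniprot_tm, ensembl_tm)
print(missing)  # [(646, 664)]
```

Only the third UniProt helix (646 – 664) is left without an Ensembl counterpart, which matches the missing TMHMM call described above. A simple overlap test is used here on the assumption that boundary differences of a few residues should still count as the same helix.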

At the moment our internal infrastructure makes incorporating this information directly into Ensembl non-trivial. We are actively working on a solution that will ease the import of external data sources, so tasks like this should become easier. We can review the import of UniProt annotation once that work is completed.

All the best,

Andy

[1] - UniProt cross references come in two forms: DIRECT and SEQUENCE_MATCH. The former is a call declared by UniProt and the latter is one derived from alignment. As far as I remember, UniProt can declare two proteins to be equivalent even though their protein sequences are not identical (such as in the case of isoforms, or where the genome has resulted in small 1-2 residue differences). Going via UniParc means an MD5 checksum has been calculated on the protein sequence, so we are assured that the two proteins are indeed the same sequence.
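
The checksum idea can be sketched in a few lines of Python. This is only an illustration: it assumes the checksum is taken over the upper-case, whitespace-free amino acid string, and the helper names (sequence_md5, same_protein) and the toy sequences are hypothetical rather than anything taken from the Ensembl or UniParc code.

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """Return the MD5 hex digest of a protein sequence.

    Assumption: whitespace is stripped and the sequence is upper-cased
    before hashing, so formatting differences do not change the digest.
    """
    normalised = "".join(sequence.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


def same_protein(seq_a: str, seq_b: str) -> bool:
    """True only if the two sequences are identical after normalisation."""
    return sequence_md5(seq_a) == sequence_md5(seq_b)


# Toy sequences, not real proteins.
print(same_protein("MKT LLV\n", "mktllv"))  # identical once normalised
print(same_protein("MKTLLV", "MKTLLA"))     # one residue differs
```

Matching on a digest rather than on an alignment means a single-residue difference gives a different checksum, which is why a UniParc-based match can be treated as an exact sequence match.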

[2] - http://www.cbs.dtu.dk/cgi-bin/webface2.fcgi?jobid=5360FFF200007D9ED6A2C011&wait=20

------------
Andrew Yates - Ensembl Support Coordinator
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
http://www.ensembl.org/

On 30 Apr 2014, at 14:45, Genomeo Dev <genomeodev at gmail.com> wrote:

> Hi,
> 
> Transmembrane regions in the Ensembl protein summary are based on TMHMM:
> 
> http://www.ensembl.org/Help/View?id=177
> 
> It is interesting to find that, for some proteins with an established number of transmembrane helices such as GPCRs, Ensembl predicts fewer. For example, Ensembl shows 6 predicted TMs for the merged ensembl-havana transcript protein ENSP00000355316. The corresponding UniProt ID (Q14832) shows the correct number of 7 TM helices:
> 
> TMH  UniProt (start – end)  Length  Ensembl (start – end)
> 1    577 – 599              23      577 – 599
> 2    614 – 634              21      612 – 634
> 3    646 – 664              19      (none)
> 4    689 – 709              21      687 – 709
> 5    735 – 756              22      737 – 756
> 6    770 – 792              23      771 – 793
> 7    803 – 828              26      806 – 828
> 
> All UniProt segments are annotated "Helical; Potential".
> 
> http://www.uniprot.org/uniprot/Q14832
> 
> It would be useful if protein features from sources such as UniProt could be checked against, or incorporated into, Ensembl.
> 
> Best regards
> 
> -- 
> G.
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




