[ensembl-dev] Transmembrane feature in Ensembl protein summary
Andy Yates
ayates at ebi.ac.uk
Wed Apr 30 15:10:23 BST 2014
Hi,
Comparing Ensembl protein domain features against UniProt is feasible, since Ensembl proteins are now mapped to UniParc records, which makes it much easier to identify 100% matches between the two resources [1]. Once you have identified the corresponding UniProt protein, its predicted domains are comparable to ours. The case you've identified is interesting, as TMHMM has been unable to find a sufficiently strong signal for the 3rd TMD [2].
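As a rough illustration of that comparison, the sketch below takes the coordinates quoted in the original message further down and reports which UniProt helices have no overlapping Ensembl prediction. The function names are made up for this example, and the overlap rule (at least one shared residue) is an assumption, not how Ensembl or UniProt compare features.

```python
# Coordinates copied from the quoted message (1-based, inclusive residues).
uniprot = {
    "TMH1": (577, 599),
    "TMH2": (614, 634),
    "TMH3": (646, 664),
    "TMH4": (689, 709),
    "TMH5": (735, 756),
    "TMH6": (770, 792),
    "TMH7": (803, 828),
}
ensembl = [(577, 599), (612, 634), (687, 709), (737, 756), (771, 793), (806, 828)]


def overlaps(a, b):
    """Return True if two inclusive intervals share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def unmatched(reference, predictions):
    """Return names of reference helices with no overlapping prediction."""
    return [
        name
        for name, span in reference.items()
        if not any(overlaps(span, p) for p in predictions)
    ]


missing = unmatched(uniprot, ensembl)
print(missing)
```

With the numbers from this thread, the only UniProt helix left without an Ensembl counterpart is the third one.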
At the moment our internal infrastructure means incorporating this information directly into Ensembl is non-trivial. We are actively working on a solution that will ease the import of external data sources, so tasks like this should become easier. We can review the import of UniProt annotation once that work is completed.
All the best,
Andy
[1] - UniProt cross references come in two forms: DIRECT and SEQUENCE_MATCH. The former is a call declared by UniProt and the latter is one derived from alignment. As far as I remember, UniProt can declare two proteins to be equivalent even though their protein sequences are not identical (such as in the case of isoforms, or where the genome has resulted in small 1-2 residue differences). Going via UniParc means an MD5 checksum has been calculated on the protein sequence, so we are 100% assured that the two proteins are indeed the same thing.
[2] - http://www.cbs.dtu.dk/cgi-bin/webface2.fcgi?jobid=5360FFF200007D9ED6A2C011&wait=20
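To make the checksum point in [1] concrete, here is a minimal sketch of comparing two protein sequences by MD5 digest. The helper names and the toy sequences are invented for illustration, and the normalisation step (stripping whitespace and upper-casing) is an assumption; the exact convention UniParc applies before hashing is not stated in this message.

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """MD5 hex digest of a protein sequence.

    Assumption: whitespace is removed and residues are upper-cased
    before hashing, so formatting differences do not change the digest.
    """
    normalised = "".join(sequence.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


def same_protein(seq_a: str, seq_b: str) -> bool:
    """True only when both sequences hash to the same digest."""
    return sequence_md5(seq_a) == sequence_md5(seq_b)


# Toy sequences, not real proteins.
print(same_protein("MKTAYIAK", "mktay\niak"))  # same residues, different formatting
print(same_protein("MKTAYIAK", "MKTAYIAR"))   # one residue differs
```

A single residue change gives a different digest, which is why a checksum match is a stricter test than a DIRECT cross reference that may tolerate small differences.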
------------
Andrew Yates - Ensembl Support Coordinator
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
http://www.ensembl.org/
On 30 Apr 2014, at 14:45, Genomeo Dev <genomeodev at gmail.com> wrote:
> Hi,
>
> While transmembrane regions in Ensembl protein summary are based on TMHMM:
>
> http://www.ensembl.org/Help/View?id=177
>
> It is interesting to find that for some proteins with an established number of transmembrane helices, such as GPCRs, Ensembl shows 6 predicted TMs for the merged ensembl-havana transcript protein ENSP00000355316. The corresponding UniProt ID (Q14832) shows the correct number of 7 TM helices:
>
> TMH1
> Uniprot Transmembrane 577 – 599 23 Helical; Name=1; Potential
> Ensembl Transmembrane 577 599
> TMH2
> Uniprot Transmembrane 614 – 634 21 Helical; Name=2; Potential
> Ensembl Transmembrane 612 634
> TMH3
> Uniprot Transmembrane 646 – 664 19 Helical; Name=3; Potential
> TMH4
> Uniprot Transmembrane 689 – 709 21 Helical; Name=4; Potential
> Ensembl Transmembrane 687 709
> TMH5
> Uniprot Transmembrane 735 – 756 22 Helical; Name=5; Potential
> Ensembl Transmembrane 737 756
> TMH6
> Uniprot Transmembrane 770 – 792 23 Helical; Name=6; Potential
> Ensembl Transmembrane 771 793
> TMH7
> Uniprot Transmembrane 803 – 828 26 Helical; Name=7; Potential
> Ensembl Transmembrane 806 828
>
> http://www.uniprot.org/uniprot/Q14832
>
> It would be useful if protein features from sources such as UniProt could be checked against or incorporated into Ensembl.
>
> Best regards
>
> --
> G.
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/