[ensembl-dev] Oddity with ENSP identifiers: same identifier for two unrelated sequences?
ayates at ebi.ac.uk
Tue Jun 12 09:53:33 BST 2012
I can see where your confusion lies here, as you have to investigate the archived transcripts of these proteins to discover what has occurred. Please bear in mind that protein stable IDs are derived from the transcript, and transcript stable IDs are derived from the transcript's exons. The only biological unit which is physically mapped is the exon.
If we check the archive sites for the transcript (ENST00000447506) behind the protein identifier ENSP00000400005, we can see a distinct point in time when the protein sequence changed, which was between releases 64 and 65:
* 64 - http://sep2011.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_Protein?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
* 65 - http://dec2011.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_Protein?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
A quick check of the exon structure reveals that we still have the same exons in the transcript, but that we've had a 44bp contraction of the coding sequence at the 3' end. The actual exon sequence is still identical, therefore this is still the same transcript. Also note the change of phase in the 1st exon, as this will be important in a second:
* 64 - http://sep2011.archive.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
* 65 - http://dec2011.archive.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
A final check of the cDNA shows that this change of phase has resulted in a coding frameshift which, when combined with the truncated CDS, has resulted in two proteins which seem to be unrelated:
* 64 - http://sep2011.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_cDNA?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
* 65 - http://dec2011.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_cDNA?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
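To illustrate why a change of phase alone can give an apparently unrelated protein, here is a minimal Python sketch. It is not the Ensembl API and does not use the real cDNA of ENST00000447506; the toy sequence, the `translate` helper and its `offset` argument are all assumptions made up for the example. It only shows that reading the same nucleotides one base later yields different residues.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cdna, offset=0):
    """Translate cdna starting at offset; stop at the first stop codon.

    A trailing incomplete codon is ignored.
    """
    protein = []
    for i in range(offset, len(cdna) - 2, 3):
        residue = CODON_TABLE[cdna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy sequence, NOT the real transcript.
toy_cdna = "ATGGCCAAGCTGTGA"

in_frame = translate(toy_cdna, offset=0)  # reads ATG GCC AAG CTG TGA
shifted = translate(toy_cdna, offset=1)   # reads TGG CCA AGC TGT (GA dropped)

print(in_frame)
print(shifted)
```

The two strings share no residues even though the underlying nucleotides are identical, which is the same effect seen between the release 64 and release 65 proteins.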
In this situation the mapping pipeline will have incremented the versions on the transcript and protein to flag that there has been a change in the underlying spliced/translated sequence, so you should proceed with caution if you use this mapping.
A further complication is that in release 67 this Havana transcript has been flagged as a processed transcript (a non-coding transcript without an ORF) and no longer as a protein-coding transcript. The protein should therefore no longer be considered active, as indicated by the protein history interface.
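As a rough sketch of how you might act on that version flag in your own code, the Python below compares two versioned stable IDs written in the usual `ID.version` form. The helper names and the version numbers 1 and 2 are assumptions for illustration only; check the archive sites for the actual versions in each release.

```python
def split_versioned_id(versioned_id):
    """Split 'ENSP00000400005.2' into ('ENSP00000400005', 2).

    The version is None when the ID carries no '.version' suffix.
    """
    stable_id, _, version = versioned_id.partition(".")
    return stable_id, int(version) if version else None


def sequence_may_differ(old, new):
    """Return True if two records share a stable ID but differ in version."""
    old_id, old_version = split_versioned_id(old)
    new_id, new_version = split_versioned_id(new)
    if old_id != new_id:
        raise ValueError("records refer to different stable IDs")
    return old_version != new_version


# Hypothetical versions, used only to exercise the check.
print(sequence_may_differ("ENSP00000400005.1", "ENSP00000400005.2"))
```

A `True` result means the stable ID alone is not enough to assume the two records hold the same sequence.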
Andrew Yates Ensembl Core Software Project Leader
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensembl.org/
On 11 Jun 2012, at 20:26, Healy, Matthew wrote:
> The first URL below appears to show two different sequences with the same accession number.
> The second URL below shows one possible explanation: perhaps this is the correct accession number for one of the above sequences and there is a bug in the web interface?