[ensembl-dev] Oddity with ENSP identifiers: same identifier for two unrelated sequences?

Liu, Mingyi Mingyi.Liu at bms.com
Tue Jun 12 11:53:27 BST 2012


Hi, Andy,

Thanks for explaining the reason for the change, that'll be helpful when we have to explain the results to our internal customers, if needed.

However, our main confusion was that we assumed any sequence change to ENSP00000400005 between v64 and v65 would have resulted in either a new stable ID or a new version of the same stable ID.  But both v64 and v65 list this protein's latest version as ENSP00000400005.1, meaning that despite the sequence change, the ID and version stayed the same.  This causes issues in our internal sequence storage, analysis, and ID-based linking of results (we did notice that in this particular case, the ID disappeared in v67).  A colleague of ours believes this issue affects more than this one ID, although we haven't had time to track it down yet.
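
To illustrate the kind of check we have in mind on our side, here is a minimal sketch that compares two releases and reports versioned IDs whose sequence changed while the ID and version did not. The function names and the toy sequences are hypothetical (they are not part of any Ensembl API and not the real v64/v65 peptides), and it assumes each release has already been loaded into a dictionary mapping "ID.version" to peptide sequence.

```python
import hashlib


def seq_digest(sequence: str) -> str:
    """Return a checksum of the sequence, ignoring case and whitespace."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def find_silent_changes(old_release: dict, new_release: dict) -> list:
    """List versioned IDs present in both releases whose sequence differs."""
    changed = []
    for versioned_id, old_seq in old_release.items():
        new_seq = new_release.get(versioned_id)
        if new_seq is not None and seq_digest(old_seq) != seq_digest(new_seq):
            changed.append(versioned_id)
    return sorted(changed)


# Toy data only: placeholder sequences, not the real Ensembl peptides.
release_64 = {
    "ENSP00000400005.1": "MKTAYIAKQR",
    "ENSP00000402579.1": "MSLLTEVETP",
}
release_65 = {
    "ENSP00000400005.1": "MGSSHHHHHH",
    "ENSP00000402579.1": "MSLLTEVETP",
}

print(find_silent_changes(release_64, release_65))
```

Storing the checksum alongside the versioned ID would let a pipeline flag such cases at load time instead of silently linking old results to a new sequence.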

Thanks,

Mingyi

>-----Original Message-----
>From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf
>Of Andy Yates
>Sent: Tuesday, June 12, 2012 4:54 AM
>To: Ensembl developers list
>Subject: Re: [ensembl-dev] Oddity with ENSP identifiers: same identifier
>for two unrelated sequences?
>
>Hi Matthew,
>
>I can see where your confusion lies here as you have to investigate the
>archived transcripts of these proteins to discover what has occurred.
>Please bear in mind that protein stable IDs are derived from the
>transcript & transcript stable IDs are derived from its exons. The only
>biological unit which is physically mapped is the exon.
>
>If we check the archive sites for the protein identifier
>ENSP00000400005 we can see a distinct point in time when the protein
>sequence changed, between releases 64 and 65:
>
>* 64 -
>http://sep2011.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_Protein?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
>* 65 -
>http://dec2011.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_Protein?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
>
>A quick check of the exon structure reveals that we still have the same
>exons in the transcript but that we've had a 44bp contraction of the
>coding sequence at the 3' end. The actual sequence of the exons is still
>identical; therefore this is still the same transcript. Also note the
>change of phase in the 1st exon, as this will be important in a second.
>
>* 64 -
>http://sep2011.archive.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
>* 65 -
>http://dec2011.archive.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
>
>A final check of the cDNA shows that this change of phase has resulted
>in a coding frameshift which, when combined with the truncated CDS, has
>resulted in two proteins which seem to be unrelated.
>
>* 64 -
>http://sep2011.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_cDNA?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
>* 65 -
>http://dec2011.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_cDNA?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
>
>In this situation the mapping pipeline will have incremented the
>versions on the transcript and protein to flag that there has been a
>change in the underlying spliced/translated sequence and you should
>proceed with caution if you were to use this mapping.
>
>
>A further complication seems to be that in release 67 this Havana
>transcript has been flagged as a processed transcript (non-coding
>transcript without an ORF) and no longer as a protein coding transcript.
>The protein should no longer be considered active, as indicated by the
>protein history interface.
>
>
>Best regards,
>
>Andy
>
>Andrew Yates                   Ensembl Core Software Project Leader
>EMBL-EBI                       Tel: +44-(0)1223-492538
>Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>Cambridge CB10 1SD, UK         http://www.ensembl.org/
>
>On 11 Jun 2012, at 20:26, Healy, Matthew wrote:
>
>>
>> The first URL below appears to show two different sequences with the
>> same accession number.
>>
>> http://useast.ensembl.org/Homo_sapiens/Transcript/Idhistory/Protein?p=ENSP00000400005;t=ENSP00000400005
>>
>> The second URL below shows one possible explanation: perhaps this is
>> the correct accession number for one of the above sequences and there is
>> a bug in the web interface?
>>
>> http://useast.ensembl.org/Homo_sapiens/Transcript/Idhistory/Protein?db=core;t=ENSP00000402579
>>
>> This message (including any attachments) may contain confidential,
>> proprietary, privileged and/or private information.  The information is
>> intended to be for the use of the individual or entity designated above.
>> If you are not the intended recipient of this message, please notify the
>> sender immediately, and delete the message and any attachments.  Any
>> disclosure, reproduction, distribution or other use of this message or
>> any attachments by an individual or entity other than the intended
>> recipient is prohibited.
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe):
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
>




