[ensembl-dev] Oddity with ENSP identifiers: same identifier for two unrelated sequences?

Liu, Mingyi Mingyi.Liu at bms.com
Wed Jun 13 17:20:06 BST 2012


Thanks!

BTW, has anyone else seen the error I got when installing v67 locally? The message basically says there is no /Multi path on the server. There is no such physical path in v66 either, yet v66 resolves /Multi just fine, so I assume some rewrite rules are defined for that path that work in v66 but not in v67. Has anyone seen this, and does anyone know how to fix it for v67? Otherwise, any tips on how and where the rewrite rules for the /Multi path are defined would be very helpful too.

Thanks again,

Mingyi

>-----Original Message-----
>From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf
>Of Andy Yates
>Sent: Wednesday, June 13, 2012 11:25 AM
>To: Ensembl developers list
>Subject: Re: [ensembl-dev] Oddity with ENSP identifiers: same identifier
>for two unrelated sequences?
>
>Hi there,
>
>We've gone through our code and found a pair of issues which caused the
>unclear and confusing history of these translations. Firstly it seems
>that the peptide sequence was not used when deciding to increment the
>translation stable id; only the transcript spliced sequence was used.
>The effect of this was to increment the translation stable id version at
>the same rate as the transcript stable id. We have changed this logic to
>increment translation stable id if there is a difference in the
>resulting peptide sequence.
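
The change of rule described above can be sketched roughly as follows. This is only an illustrative Python sketch, not the Ensembl stable ID mapping code; both function names and their arguments are hypothetical and exist only to contrast the old and new behaviour.

```python
def old_translation_version(old_version, old_spliced, new_spliced):
    """Old behaviour as described above (hypothetical helper).

    The translation version was bumped whenever the transcript's
    spliced sequence changed, so it tracked the transcript version.
    """
    if old_spliced != new_spliced:
        return old_version + 1
    return old_version


def new_translation_version(old_version, old_peptide, new_peptide):
    """Revised behaviour as described above (hypothetical helper).

    The translation version is bumped only when the resulting
    peptide sequence differs between releases.
    """
    if old_peptide != new_peptide:
        return old_version + 1
    return old_version
```

Under the old rule, a peptide change caused purely by a different coding start or phase (with an unchanged spliced sequence) would leave the version untouched; under the revised rule it would be incremented.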
>
>The second issue was related to the splicing of an exon between releases.
>This resulted in a penalty meaning that even though the exons had a
>perfect location match we still attempted to use exonerate for the
>matches. This penalty has been removed.
>
>All the best,
>
>Andy
>
>Andrew Yates                   Ensembl Core Software Project Leader
>EMBL-EBI                       Tel: +44-(0)1223-492538
>Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>Cambridge CB10 1SD, UK         http://www.ensembl.org/
>
>On 12 Jun 2012, at 11:53, Liu, Mingyi wrote:
>
>> Hi, Andy,
>>
>> Thanks for explaining the reason for the change, that'll be helpful
>> when we have to explain the results to our internal customers, if needed.
>>
>> However, our main confusion was that we assumed any sequence change
>> of ENSP00000400005 from v64 to v65 should have resulted in either a
>> new stable ID or a new version of the same stable ID. But both v64
>> and v65 give this protein's latest version as ENSP00000400005.1,
>> meaning that despite the sequence change, the ID/version stayed the
>> same, which causes issues in our internal sequence storage, analysis
>> and ID-based results linking (we did notice that in this particular
>> case, the ID disappeared in v67). A colleague of ours believes this
>> issue affects more than this one ID, although we haven't had time to
>> track it down yet.
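
One way to look for other affected IDs is to check that each versioned ID maps to a single sequence across releases. The following is a minimal Python sketch under stated assumptions: the `(release, versioned_id, sequence)` record layout is hypothetical, the sequences are placeholders rather than the real peptides, and `ENSP00000000001.1` is a made-up ID used only as a control.

```python
import hashlib
from collections import defaultdict


def find_version_collisions(records):
    """Find versioned IDs that are attached to more than one sequence.

    `records` is an iterable of (release, versioned_id, sequence)
    tuples (an assumed layout for this sketch). Returns a dict of
    {versioned_id: {sequence_digest: [releases]}} containing only
    the IDs seen with at least two distinct sequences.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for release, versioned_id, sequence in records:
        digest = hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()
        seen[versioned_id][digest].append(release)
    return {
        vid: dict(by_digest)
        for vid, by_digest in seen.items()
        if len(by_digest) > 1
    }


# Placeholder data for illustration only.
records = [
    (64, "ENSP00000400005.1", "MAAAK"),
    (65, "ENSP00000400005.1", "MGGRT"),
    (64, "ENSP00000000001.1", "MKKL"),
    (65, "ENSP00000000001.1", "MKKL"),
]
collisions = find_version_collisions(records)
```

Hashing the sequence keeps the comparison cheap when the records come from full peptide FASTA dumps of each release.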
>>
>> Thanks,
>>
>> Mingyi
>>
>>> -----Original Message-----
>>> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf
>>> Of Andy Yates
>>> Sent: Tuesday, June 12, 2012 4:54 AM
>>> To: Ensembl developers list
>>> Subject: Re: [ensembl-dev] Oddity with ENSP identifiers: same identifier
>>> for two unrelated sequences?
>>>
>>> Hi Matthew,
>>>
>>> I can see where your confusion lies here, as you have to investigate the
>>> archived transcripts of these proteins to discover what has occurred.
>>> Please bear in mind that protein stable IDs are derived from the
>>> transcript, and transcript stable IDs are derived from its exons. The only
>>> biological unit which is physically mapped is the exon.
>>>
>>> If we check the archive sites for the protein identifier
>>> ENSP00000400005, we can see a distinct point in time when the protein
>>> sequence changed, which was between releases 64 and 65:
>>>
>>> * 64 - http://sep2011.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_Protein?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
>>> * 65 - http://dec2011.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_Protein?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
>>>
>>> A quick check of the exon structure reveals that we still have the same
>>> exons in the transcript but that we've had a 44bp contraction of the
>>> coding sequence at the 3' end. The actual sequence of the exons is still
>>> identical, therefore this is still the same transcript. Also note the
>>> change of phase in the 1st exon, as this will be important in a second:
>>>
>>> * 64 - http://sep2011.archive.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
>>> * 65 - http://dec2011.archive.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
>>>
>>> A final check of the cDNA shows that this change of phase has resulted
>>> in a coding frameshift which, when combined with the truncated CDS, has
>>> resulted in two proteins which seem to be unrelated.
>>>
>>> * 64 - http://sep2011.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_cDNA?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
>>> * 65 - http://dec2011.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_cDNA?db=core;g=ENSG00000236022;r=17:19109483-19110124;t=ENST00000447506
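
To illustrate why a frameshift alone can make two peptides from the same cDNA look unrelated, here is a small Python sketch. It assumes the standard genetic code and uses a made-up sequence, not the real ENST00000447506 cDNA; the `offset` argument is a simplified stand-in for the change of phase, not how Ensembl models exon phase.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna, offset=0):
    """Translate `cdna` from `offset`, stopping at the first stop codon."""
    peptide = []
    for i in range(offset, len(cdna) - 2, 3):
        aa = CODON_TABLE[cdna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Made-up sequence for illustration only.
cdna = "ATGGCCAAGCTGACCTGA"
frame0 = translate(cdna, 0)  # reading frame starting at the first base
frame1 = translate(cdna, 1)  # same bases, reading frame shifted by one
```

The nucleotide sequence is identical in both calls, yet the two peptides share no residues and differ in length, which matches the pattern described for releases 64 and 65.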
>>>
>>> In this situation the mapping pipeline will have incremented the
>>> versions on the transcript and protein to flag that there has been a
>>> change in the underlying spliced/translated sequence and you should
>>> proceed with caution if you were to use this mapping.
>>>
>>>
>>> A further complication is that in release 67 this Havana transcript
>>> has been flagged as a processed transcript (non-coding transcript
>>> without an ORF) and no longer as a protein coding transcript. The protein
>>> should no longer be considered active, as indicated by the protein
>>> history interface.
>>>
>>>
>>> Best regards,
>>>
>>> Andy
>>>
>>> Andrew Yates                   Ensembl Core Software Project Leader
>>> EMBL-EBI                       Tel: +44-(0)1223-492538
>>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>>> Cambridge CB10 1SD, UK         http://www.ensembl.org/
>>>
>>> On 11 Jun 2012, at 20:26, Healy, Matthew wrote:
>>>
>>>>
>>>> The first URL below appears to show two different sequences with the
>>>> same accession number.
>>>>
>>>> http://useast.ensembl.org/Homo_sapiens/Transcript/Idhistory/Protein?p=ENSP00000400005;t=ENSP00000400005
>>>>
>>>> The second URL below shows one possible explanation: perhaps this is
>>>> the correct accession number for one of the above sequences and there is
>>>> a bug in the web interface?
>>>>
>>>> http://useast.ensembl.org/Homo_sapiens/Transcript/Idhistory/Protein?db=core;t=ENSP00000402579
>>>>
>>>> This message (including any attachments) may contain confidential,
>>> proprietary, privileged and/or private information.  The information
>is
>>> intended to be for the use of the individual or entity designated
>above.
>>> If you are not the intended recipient of this message, please notify
>the
>>> sender immediately, and delete the message and any attachments.  Any
>>> disclosure, reproduction, distribution or other use of this message
>or
>>> any attachments by an individual or entity other than the intended
>>> recipient is prohibited.
>>>>
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> List admin (including subscribe/unsubscribe):
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>
>
>
