[ensembl-dev] C.elegans protein/transcripts share same IDs

Thu Sep 2 16:29:06 BST 2010

  On 02/09/10 16:17, Liu, Mingyi wrote:
> Hi, Michael,
>
> Thanks for the quick response!  That makes sense, but is it possible, or is there any plan, to attach a 'p' to the end of the transcript ID to make unique protein IDs?  That way the transcript IDs would differ from the protein IDs, so a mixed storage facility would not be confused (especially since Ensembl traditionally gives transcripts and proteins different IDs, e.g. it does not use ENST000... for both protein and DNA).  We would probably do that internally, but it would be harder to cross-check with ensembl.org if our internal protein IDs differ.
>
> Thanks,
>
> Mingyi

The problem with that is that we use the stable_ids to create the links back to the WormBase database, and if I append
letters, the links would break.

But you could use the connected xrefs as IDs for your storage, as they carry the protein IDs. Sadly, I connected them to
translations, transcripts and genes, so it needs some logic that depends on the object type (table) the xref is attached to.

And to be fair, if you add logic to separate the xrefs, you could just as well add a 'p' to the IDs of the molecule type of your choice.
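
A minimal sketch of that last option, in Python: build a storage key from the stable_id plus a molecule-type suffix, and keep the unmodified stable_id alongside it so links back to WormBase or ensembl.org still work. The names (`SUFFIX`, `StorageKey`, `make_key`) and the example ID are hypothetical and not part of Ensembl or WormBase.

```python
from typing import NamedTuple

# Hypothetical suffix scheme: transcripts keep the stable_id as-is,
# proteins get a trailing "p" as suggested in this thread.
SUFFIX = {"transcript": "", "protein": "p"}


class StorageKey(NamedTuple):
    key: str        # unique key for the mixed sequence store
    stable_id: str  # unmodified stable_id, used for external links
    molecule: str   # "transcript" or "protein"


def make_key(stable_id: str, molecule: str) -> StorageKey:
    """Return a key that differs between molecule types for the same stable_id."""
    if molecule not in SUFFIX:
        raise ValueError(f"unknown molecule type: {molecule!r}")
    return StorageKey(stable_id + SUFFIX[molecule], stable_id, molecule)


# "ABC1.1a" is a made-up placeholder, not a real WormBase identifier.
transcript = make_key("ABC1.1a", "transcript")
protein = make_key("ABC1.1a", "protein")
print(transcript.key, protein.key, protein.stable_id)
```

Because the original stable_id is stored next to the derived key, cross-checking against ensembl.org only needs the `stable_id` field. If identifiers can themselves end in a letter, a delimiter (for example a hypothetical "protein:" prefix) may be a safer choice than a bare suffix.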

M

>> -----Original Message-----
>> From: Michael Paulini [mailto:mh6 at sanger.ac.uk]
>> Sent: Thursday, September 02, 2010 11:10 AM
>> To: Liu, Mingyi
>> Cc: dev at ensembl.org
>> Subject: Re: [ensembl-dev] C.elegans protein/transcripts share same IDs
>>
>>   On 02/09/10 15:52, Liu, Mingyi wrote:
>>> Hi,
>>>
>>> Sorry if this has been answered before (I googled and didn't find an
>>> answer). We just noticed that in Ensembl, C. elegans transcripts share the
>>> same IDs as proteins, while the WormBase protein IDs are annotated as an
>>> xref.  Is there any particular reason why it was done this way?  The
>>> shared IDs present an issue for our sequence storage.  We could work
>>> around it, but it would be messy.  It seems best if Ensembl could use
>>> WormBase's protein IDs in addition to the transcript IDs too?
>>>
>>> Thanks,
>>>
>>> Mingyi
>>>
>> Hi Mingyi,
>>
>> I used the WormBase-TranscriptIDs as stable_ids for the translations
>> (knowing that people expect unique translation
>> IDs), therefore they should be unique.
>>
>> The protein xrefs in contrast are not unique to a single transcript, as
>> WormBase-ProteinIDs are unique to the protein
>> sequence, so more than one transcript/translation/gene can share the
>> same WormBase-proteinID as xref.
>>
>> Michael