[ensembl-dev] Changes in EMBL file format?

Matthew Laird lairdm at ebi.ac.uk
Tue Jun 20 08:06:16 BST 2017


Hi Fin,

Such changes should go in the release notes [1]; unfortunately, this one
didn't. As the person who made the actual code changes, that's
completely on me for not ensuring it made it into those notes. I
apologize for any issues it causes in your pipelines, but please do keep
an eye on the release notes in future releases; that's where we'll
notify users of such changes.

[1] http://www.ensembl.org/info/website/news.html

On 20/06/17 05:00, Fin Swimmer wrote:
> Hey Anne and Matthew,
> Thank you for your detailed answer. Is there a place where I can follow
> these discussions, or how can I be notified if something in the file
> format has changed?
>
> fin swimmer
> On 15.06.2017 at 10:37, Matthew Laird wrote:
>> Hi Fin,
>>
>> The discussion about this change began in the context of GenBank files;
>> however, since our GenBank and EMBL file dumpers share a common code
>> base, the change rippled through to the EMBL files as well.
>>
>> As Anne indicated, this stemmed from wanting to be more INSDC-compliant:
>> /note just didn't seem like the appropriate place to put the primary
>> identifier for a record, since it's more than just a "note."
>> Unfortunately, the INSDC standards for GenBank files have no
>> /transcript_id qualifier, despite some other sources using it.
>> /standard_name seemed like the most appropriate choice among those allowed.
>>
>> But yes: for CDS records, without a /transcript_id, how do we point to
>> that record's parent? The parent transcript isn't the CDS's
>> /standard_name, so we made the difficult choice to stick with the /note
>> field in this context. This does feel like an inconsistency, but we
>> believe the benefit of a transcript having its primary, stable
>> identifier more prominently in the record outweighs this drawback.
>>
>> If you have any other questions or concerns, please do let us know.
>>
>> On 15/06/17 05:25, Fin Swimmer wrote:
>>> Hello,
>>>
>>> I often export gene information from Ensembl in the EMBL file format. I
>>> noticed that, since the last Ensembl update, transcript IDs now use the
>>> qualifier /standard_name in the mRNA or misc_RNA features, whereas the
>>> CDS features still use /note="transcript_id=...".
>>>
>>> Is this change a bug or a feature?
>>>
>>> fin swimmer
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>
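For pipelines that read these dumps, one way to cope with the change discussed above is to accept both conventions when extracting transcript IDs: /standard_name on mRNA/misc_RNA features, and /note="transcript_id=..." on CDS features (and on RNA features in older dumps). The following is a minimal sketch, not Ensembl code. It assumes the standard EMBL feature-table layout (feature key starting at column 6, qualifiers starting at column 22) and assumes each qualifier value fits on a single line, which ignores wrapped values. The function name `transcript_ids` and the sample IDs are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Feature line, e.g. "FT   mRNA            join(100..200,300..400)":
# the key starts right after "FT" plus three spaces.
FEATURE_RE = re.compile(r"^FT   (\S+)\s+\S")
# Qualifier line, e.g. 'FT                   /standard_name="ENST..."'.
# Assumption: the value is on one line (wrapped values are not joined).
QUALIFIER_RE = re.compile(r'^FT\s+/(\w+)="?([^"]*)"?\s*$')
# Transcript ID embedded in a /note value, e.g. "transcript_id=ENST...".
NOTE_TRANSCRIPT_RE = re.compile(r"transcript_id=([\w.]+)")

RNA_KEYS = ("mRNA", "misc_RNA")


def transcript_ids(ft_lines: List[str]) -> List[Tuple[str, str]]:
    """Return (feature_key, transcript_id) pairs, taking the first ID found
    per feature from either /standard_name (RNA features) or
    /note="transcript_id=..." (any feature)."""
    results: List[Tuple[str, str]] = []
    key: Optional[str] = None  # key of the feature currently being read
    have_id = False            # True once an ID was taken for this feature
    for line in ft_lines:
        feature = FEATURE_RE.match(line)
        if feature:
            key, have_id = feature.group(1), False
            continue
        qualifier = QUALIFIER_RE.match(line)
        if key is None or have_id or qualifier is None:
            continue
        name, value = qualifier.groups()
        tid: Optional[str] = None
        if name == "standard_name" and key in RNA_KEYS:
            tid = value
        elif name == "note":
            note = NOTE_TRANSCRIPT_RE.search(value)
            if note:
                tid = note.group(1)
        if tid:
            results.append((key, tid))
            have_id = True
    return results


# Hypothetical feature-table excerpt illustrating both conventions.
sample = [
    "FT   mRNA            join(100..200,300..400)",
    'FT                   /gene="ENSG00000000001"',
    'FT                   /standard_name="ENST00000000001"',
    "FT   CDS             join(120..200,300..380)",
    'FT                   /gene="ENSG00000000001"',
    'FT                   /note="transcript_id=ENST00000000001"',
]

if __name__ == "__main__":
    print(transcript_ids(sample))
```

Run on the sample, this yields one pair per feature, linking the CDS back to its parent mRNA through the shared transcript ID. For production use, a full EMBL parser (for example Biopython's `SeqIO` with the "embl" format) handles wrapped qualifier values more robustly than this line-based sketch.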



