[ensembl-dev] pseudogenes in ensembl bacteria

Dan Staines dstaines at ebi.ac.uk
Wed Aug 21 16:47:01 BST 2013

On 08/21/2013 04:40 PM, Adam Witney wrote:
> Yes, sorry, I meant that if you look at the end of the description field
> for rd_kw20, then these pseudogenes have the locus_tag within that
> description field, although it is truncated in some cases.

Ah, sorry, my misunderstanding. Yes, that makes some kind of sense. The
description here comes from the free-text "note" qualifier for the INSDC
gene feature, though I suspect it is truncated to 255 chars on storing
to Ensembl, as I'm misusing the display_label to store this information.

Looking at L42023, it seems that for at least some of these the
submitter added a form of the locus_tag to the end of the note, though
I wouldn't rely on it as a general source, e.g.
FT   gene            928097..929402
FT                   /pseudo
FT                   /locus_tag="HI_0875"
FT                   /note="This region contains an authentic frame shift and is
FT                   not the result of a sequencing artifact; similar to
FT                   PID:1438114 percent identity: 54.31; identified by 
FT                   similarity; putative;peptidase B (pepB), authentic
FT                   frameshift; HI0875"
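
If you want to check how often the note really does end in the
locus_tag, something along these lines might do as a rough sketch. It
is Python rather than anything from the Ensembl API, and the helper
names (parse_qualifiers, trailing_note_tag, normalise) are made up for
illustration. It assumes the feature is available as "FT" lines, that
qualifier values are double-quoted, and that the tag, where present, is
the last ';'-separated item of the note:

```python
import re

# The gene feature quoted above, as feature-table ("FT") lines.
FT_LINES = [
    'FT   gene            928097..929402',
    'FT                   /pseudo',
    'FT                   /locus_tag="HI_0875"',
    'FT                   /note="This region contains an authentic frame shift and is',
    'FT                   not the result of a sequencing artifact; similar to',
    'FT                   PID:1438114 percent identity: 54.31; identified by',
    'FT                   similarity; putative;peptidase B (pepB), authentic',
    'FT                   frameshift; HI0875"',
]


def parse_qualifiers(ft_lines):
    """Collect /key="value" qualifiers from feature-table lines.

    Valueless qualifiers such as /pseudo are stored as True.
    """
    # Drop the 'FT' prefix and indentation, then rejoin wrapped values.
    body = " ".join(line[2:].strip() for line in ft_lines if line.startswith("FT"))
    quals = {}
    for match in re.finditer(r'/(\w+)(?:="([^"]*)")?', body):
        key, value = match.group(1), match.group(2)
        quals[key] = True if value is None else value
    return quals


def trailing_note_tag(note):
    """Return the last ';'-separated item of a note, or None if there is none."""
    if not note or note is True:
        return None
    token = note.rsplit(";", 1)[-1].strip()
    return token or None


def normalise(tag):
    """Strip punctuation and case so that HI_0875 and HI0875 compare equal."""
    return re.sub(r"[^A-Za-z0-9]", "", tag).upper()


quals = parse_qualifiers(FT_LINES)
note_tag = trailing_note_tag(quals.get("note"))
matches = note_tag is not None and normalise(note_tag) == normalise(quals["locus_tag"])
print(quals["locus_tag"], note_tag, matches)
```

A description that has been cut at 255 chars will typically lose the
tag or carry only part of it, so a failed match here says nothing
about the underlying INSDC record itself.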

As I've said, I'll see if I can get the revised pipeline running again,
so I may be able to provide a mapping file in due course if it's a
problem for you not having the labels reliably present.


