[ensembl-dev] pseudogenes in ensembl bacteria
Dan Staines
dstaines at ebi.ac.uk
Wed Aug 21 16:47:01 BST 2013
On 08/21/2013 04:40 PM, Adam Witney wrote:
>
> Yes, sorry, I meant that if you look at the end of the description field
> for rd_kw20, then these pseudogenes have the locus_tag within that
> description field, although it is truncated in some cases.
Ah, sorry, my misunderstanding. Yes, that makes some kind of sense. The
description here comes from the free text "note" qualifier for the INSDC
gene feature, though I suspect this is truncated to 255 chars on storing
to Ensembl as I'm misusing the display_label to store this information
here.
Looking at L42023, it seems that for at least some of these the
submitter added a form of the locus_tag to the end of the note, though I
wouldn't rely on it as a general source, e.g.
FT gene 928097..929402
FT /pseudo
FT /locus_tag="HI_0875"
FT /note="This region contains an authentic frame shift and is
FT not the result of a sequencing artifact; similar to
FT PID:1438114 percent identity: 54.31; identified by sequence
FT similarity; putative;peptidase B (pepB), authentic
FT frameshift; HI0875"
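To make the point concrete, here is a minimal Python sketch of reading the qualifiers from a record like the one above. It is not part of the Ensembl pipeline; the helper names parse_qualifiers and note_tail are hypothetical, and the parsing is deliberately simplistic (it assumes one feature and that no continuation line starts with "/"). It shows that the trailing token of the note only loosely matches the real /locus_tag (no underscore) and is lost as soon as the note is cut short.

```python
import re

# Feature-table lines copied from the example above.
FT_LINES = """\
FT gene 928097..929402
FT /pseudo
FT /locus_tag="HI_0875"
FT /note="This region contains an authentic frame shift and is
FT not the result of a sequencing artifact; similar to
FT PID:1438114 percent identity: 54.31; identified by sequence
FT similarity; putative;peptidase B (pepB), authentic
FT frameshift; HI0875"
""".splitlines()


def parse_qualifiers(ft_lines):
    """Collect /key or /key="value" qualifiers for a single feature.

    Hypothetical helper: assumes continuation lines never start with '/'.
    """
    qualifiers = {}
    key = None
    for line in ft_lines:
        text = line[2:].strip()  # drop the leading "FT" line code
        match = re.match(r'/(\w+)(?:=(.*))?$', text)
        if match:
            key = match.group(1)
            qualifiers[key] = match.group(2) or ""
        elif key is not None:
            # Continuation of the previous qualifier's value.
            qualifiers[key] += " " + text
    return {k: v.strip('"') for k, v in qualifiers.items()}


def note_tail(note):
    """Return the last semicolon-separated token of a note (hypothetical helper)."""
    return note.rsplit(";", 1)[-1].strip()


quals = parse_qualifiers(FT_LINES)
locus_tag = quals["locus_tag"]       # the reliable source
tail = note_tail(quals["note"])      # the submitter's informal copy

# The note carries a variant spelling, so an exact comparison fails
# unless the underscore is stripped first.
loosely_equal = tail == locus_tag.replace("_", "")

# Simulate a note that was cut short on storage: the tail no longer matches.
truncated_tail = note_tail(quals["note"][:-3])
```

In this sketch the /locus_tag qualifier is always taken as the authoritative value; the note tail is only compared against it, never used in its place.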
As I've said, I'll see if I can get the revised pipeline running again,
so I may be able to provide a mapping file in due course if it's a problem
for you not having the labels reliably present.
Dan.
--
Dan Staines, PhD
Technical Coordinator, Ensembl Genomes
European Bioinformatics Institute (EMBL-EBI)
http://www.ebi.ac.uk/
http://www.ensemblgenomes.org/