[ensembl-dev] pseudogenes in ensembl bacteria

Dan Staines dstaines at ebi.ac.uk
Wed Aug 21 15:57:46 BST 2013

On 08/21/2013 03:42 PM, Adam Witney wrote:
> Actually it looks like they are generally within the
> $simple_feature->display_label() field, but in some cases this seems to
> have been truncated? e.g. gene_feature:1633

If you look at the FASTA header generated by this script, it's of this form:
 ><id> <location string> <description>
id is the gene stable ID for genes loaded as "proper" Ensembl genes;
this is the same as the locus_tag from the INSDC entry. For the subset
of pseudogenes that are loaded not as full genes but as
simple_features, we don't have the locus_tags stored anywhere in the
database. In those cases, id is gene_feature:<dbid>, where <dbid> is the
internal surrogate key (simple_feature.simple_feature_id) used by
Ensembl. As I said, you can modify this format easily enough.
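
If it helps when post-processing the output, here is a rough sketch (in Python rather than Perl, and not part of the script itself) of splitting such a header and telling the two kinds of id apart. The function name parse_header and the FastaHeader type are made up for illustration, and it assumes the location string contains no whitespace, which may not hold if you change the format.

```python
import re
from typing import NamedTuple, Optional


class FastaHeader(NamedTuple):
    """Fields of a '><id> <location string> <description>' header line."""
    identifier: str
    location: str
    description: str
    # Set only when the id has the form gene_feature:<dbid>
    simple_feature_id: Optional[int]


# Ids of pseudogenes stored as simple_features look like gene_feature:<dbid>
_GENE_FEATURE_RE = re.compile(r"^gene_feature:(\d+)$")


def parse_header(line: str) -> FastaHeader:
    """Split a header line into id, location and description.

    Assumption: the location string contains no whitespace, so the
    first two whitespace-separated tokens are id and location and
    everything after them is the description.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line: %r" % line)
    parts = line[1:].strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected at least an id and a location: %r" % line)
    identifier, location = parts[0], parts[1]
    description = parts[2] if len(parts) > 2 else ""
    match = _GENE_FEATURE_RE.match(identifier)
    dbid = int(match.group(1)) if match else None
    return FastaHeader(identifier, location, description, dbid)
```

A header whose id is gene_feature:1633 would come back with simple_feature_id set to 1633, which you could then look up in the simple_feature table; for any other id, simple_feature_id is None and the id can be treated as the gene stable ID.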


Dan Staines, PhD
Technical Coordinator, Ensembl Genomes
European Bioinformatics Institute (EMBL-EBI)
