[ensembl-dev] pseudogenes in ensembl bacteria

Wed Aug 21 08:57:51 BST 2013

>> Are details about pseudogenes stored anywhere? For example, HI_0006
>> from haemophilus_influenzae_rd_kw20 seems to be missing from any of
>> the downloaded files (fasta, gff3, gtf) and is not searchable on the
>> browser. Can it be accessed (i.e. coordinates and DNA sequence) from
>> any of the database tables?
> 
> For some reason, the pseudogenes from that record do not appear to
> have been loaded into the core database (though they have been for
> other genomes in the same database). I'll investigate and let you know
> when I have a more detailed answer.

Looking at this some more, it seems that pseudogenes are annotated
inconsistently across INSDC records, which has led to pseudogenes being
missed when some records were loaded. This will be corrected in a future
release of Ensembl Bacteria (it is not yet clear whether the fix will
make the upcoming release in September). For the moment, the currently
unprocessed features are stored as simple_feature entries in the core
database (visible in the browser as the "ENA features" track), so they
can still be retrieved. I can help with the retrieval if needed.
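
As a rough sketch of that retrieval (not run against the live database),
something like the following should pull the coordinates out of the core
database. It assumes the standard core schema tables simple_feature,
seq_region and analysis, and it assumes the locus tag (e.g. HI_0006)
turns up in simple_feature.display_label, which would need checking. The
function name is just for illustration, and the %s placeholder assumes a
MySQL DB-API driver.

```python
# Sketch only: builds a query against the standard Ensembl core schema.
# Assumption: the locus tag (e.g. HI_0006) appears in
# simple_feature.display_label; check this against the actual rows.


def build_simple_feature_query(label_pattern):
    """Return (sql, params) for a DB-API driver that uses %s placeholders."""
    sql = (
        "SELECT sr.name, sf.seq_region_start, sf.seq_region_end, "
        "sf.seq_region_strand, sf.display_label, a.logic_name "
        "FROM simple_feature sf "
        "JOIN seq_region sr ON sr.seq_region_id = sf.seq_region_id "
        "JOIN analysis a ON a.analysis_id = sf.analysis_id "
        "WHERE sf.display_label LIKE %s "
        "ORDER BY sr.name, sf.seq_region_start"
    )
    # Substring match. Note that "_" is itself a single-character
    # wildcard in LIKE, which only makes the match slightly looser here.
    return sql, ("%" + label_pattern + "%",)


if __name__ == "__main__":
    sql, params = build_simple_feature_query("HI_0006")
    print(sql)
    print(params)
```

Pass the returned pair to cursor.execute(sql, params) on a connection to
the relevant core database. The start and end are 1-based and inclusive,
so the DNA can be cut from the matching sequence in the downloaded
genome FASTA, reverse-complementing when the strand is -1.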

Dan.

-- 
Dan Staines, PhD               Ensembl Genomes Technical Coordinator
EMBL-EBI                       Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/