[ensembl-dev] pseudogenes in ensembl bacteria
Dan Staines
dstaines at ebi.ac.uk
Wed Aug 21 08:57:51 BST 2013
>> Are details about pseudogenes stored anywhere? For example, HI_0006 from
>> haemophilus_influenzae_rd_kw20 seems to be missing from any of the
>> downloaded files (fasta, gff3, gtf) and is not searchable on the
>> browser. Can it be accessed (i.e. coordinates and DNA sequence) from
>> any of the database tables?
>
> For some reason, the pseudogenes from that record do not appear to
> have been loaded into the core database (though they have been for
> other genomes in the same database). I'll investigate and let you know
> when I have a more detailed answer.
Looking at this some more, it seems there is variability in how
pseudogenes are annotated in INSDC records, which has led to the
pseudogenes from some records not being loaded. This will be corrected
in a future release of Ensembl Bacteria (it is unclear right now whether
the fix will make the upcoming September release). For the moment, the
currently unprocessed features are stored as simple_feature entries in
the core database (visible in the browser as the "ENA features" track),
so they can be retrieved if needed. I can help with the retrieval if
required.
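As a rough sketch of what retrieval from the simple_feature table might look like, the Python below models a heavily simplified subset of the core schema (only the simple_feature, analysis and seq_region columns used here) in an in-memory SQLite database, looks features up by display label, and cuts the matching region out of a genome string using Ensembl-style 1-based, inclusive coordinates, reverse-complementing features on the minus strand. The rows, the "chr_demo" region, the "ena_features_demo" analysis name and the assumption that the locus tag appears in display_label are all hypothetical. The real core databases are MySQL and have more columns, and in the multi-species Ensembl Bacteria databases you would likely also need to restrict the query to one species.

```python
import sqlite3

# Simplified, hypothetical subset of the core schema: only the columns
# needed for this example are modelled.
SCHEMA = """
CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, logic_name TEXT);
CREATE TABLE simple_feature (
    simple_feature_id INTEGER PRIMARY KEY,
    seq_region_id INTEGER, seq_region_start INTEGER, seq_region_end INTEGER,
    seq_region_strand INTEGER, display_label TEXT, analysis_id INTEGER);
"""

# Join each simple feature to its region name and analysis logic_name.
QUERY = """
SELECT sr.name, sf.seq_region_start, sf.seq_region_end,
       sf.seq_region_strand, sf.display_label, a.logic_name
FROM simple_feature sf
JOIN seq_region sr ON sr.seq_region_id = sf.seq_region_id
JOIN analysis a ON a.analysis_id = sf.analysis_id
WHERE sf.display_label LIKE ?
ORDER BY sf.seq_region_start
"""


def find_features(conn, label_pattern):
    """Return (region, start, end, strand, label, logic_name) rows.

    Note: '_' is itself a single-character wildcard in SQL LIKE patterns.
    """
    return conn.execute(QUERY, (label_pattern,)).fetchall()


COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract(seq, start, end, strand):
    """Slice seq using 1-based, inclusive coordinates; strand is 1 or -1."""
    sub = seq[start - 1:end]
    if strand == -1:
        # Minus-strand features: reverse complement the slice.
        sub = sub.translate(COMPLEMENT)[::-1]
    return sub


# Usage with hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO seq_region VALUES (1, 'chr_demo')")
conn.execute("INSERT INTO analysis VALUES (1, 'ena_features_demo')")
conn.execute(
    "INSERT INTO simple_feature VALUES (1, 1, 3, 8, -1, 'HI_0006 pseudo', 1)")
genome = "AACCGGTTAA"  # hypothetical sequence for chr_demo
for name, start, end, strand, label, logic in find_features(conn, "HI_0006%"):
    print(name, start, end, strand, label, extract(genome, start, end, strand))
```

With a real core database the same join would be run through a MySQL client, and the genomic sequence would come from the database or the downloaded FASTA rather than a literal string.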
Dan.
--
Dan Staines, PhD Ensembl Genomes Technical Coordinator
EMBL-EBI Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/