[ensembl-dev] Coordinate System anomalies in EnsemblGenomes
Dan Staines
dstaines at ebi.ac.uk
Fri Mar 1 17:21:50 GMT 2013
Hi Trevor,
The quick (Friday afternoon!) answer is that, according to my colleagues
in core, rank doesn't have to be an exact sequence, but you should
always have toplevel sequences. In the case of the bacterial load, these
may come from the coord_systems chromosome or supercontig, or a mixture
of the two. You should use the top_level seq_region attribute to
identify top level sequences, rather than relying on rank (maybe someone
from core can comment about the schema doc you reference).
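
To make that concrete, here is a rough sketch of picking out toplevel
sequences by attribute instead of by coord_system rank. It assumes the
usual core-schema tables (seq_region, coord_system, seq_region_attrib,
attrib_type) and that the attribute code is 'toplevel'; those names are
from memory, so please check them against the schema for your release.
The toy SQLite database and the helper names (toplevel_regions,
build_toy_db) are purely illustrative and not part of any Ensembl API.

```python
import sqlite3

# Assumed query against a core-style schema. Table/column names and the
# 'toplevel' attribute code are assumptions to verify for your release.
TOPLEVEL_SQL = """
SELECT sr.name, cs.name, cs.rank
FROM seq_region sr
JOIN coord_system cs ON cs.coord_system_id = sr.coord_system_id
JOIN seq_region_attrib sra ON sra.seq_region_id = sr.seq_region_id
JOIN attrib_type att ON att.attrib_type_id = sra.attrib_type_id
WHERE att.code = 'toplevel'
ORDER BY cs.rank, sr.name
"""


def toplevel_regions(conn):
    """Return (seq_region name, coord_system name, rank) for toplevel regions."""
    cur = conn.cursor()
    cur.execute(TOPLEVEL_SQL)
    return cur.fetchall()


def build_toy_db():
    """Hypothetical in-memory stand-in for a core database, for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE coord_system (coord_system_id INTEGER, name TEXT, rank INTEGER);
        CREATE TABLE seq_region (seq_region_id INTEGER, name TEXT, coord_system_id INTEGER);
        CREATE TABLE attrib_type (attrib_type_id INTEGER, code TEXT);
        CREATE TABLE seq_region_attrib (seq_region_id INTEGER, attrib_type_id INTEGER, value TEXT);

        -- Ranks deliberately skip 2: rank need not be an exact sequence.
        INSERT INTO coord_system VALUES (1, 'chromosome', 1);
        INSERT INTO coord_system VALUES (2, 'supercontig', 3);
        INSERT INTO coord_system VALUES (3, 'contig', 4);

        INSERT INTO seq_region VALUES (10, 'Chromosome', 1);
        INSERT INTO seq_region VALUES (20, 'scaffold_1', 2);
        INSERT INTO seq_region VALUES (30, 'contig_1', 3);

        INSERT INTO attrib_type VALUES (6, 'toplevel');
        INSERT INTO attrib_type VALUES (7, 'some_other_code');

        -- A mixture of chromosome and supercontig regions is toplevel.
        INSERT INTO seq_region_attrib VALUES (10, 6, '1');
        INSERT INTO seq_region_attrib VALUES (20, 6, '1');
        INSERT INTO seq_region_attrib VALUES (30, 7, '1');
    """)
    return conn


if __name__ == "__main__":
    for name, coord_system, rank in toplevel_regions(build_toy_db()):
        print(name, coord_system, rank)
```

Against a real database the same query should work through any DB-API
connection; only the toy setup is SQLite-specific.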
As you point out, there are a significant number without chromosomal
assemblies. There are also a handful of cases where a chromosome is
incorrectly labelled as a supercontig due to a description line (which
can be quite variable...) not matching the set of expected values - I'm
looking into fixing these for the next release.
Having said that, there does seem to be something else up with the
specific example in your mail, where the underlying WGS sequences have
been retrieved as toplevel rather than the single assembled chromosome,
which may be to do with the logic of how the assembly is retrieved
from the INSDC assembly database, or the state of the assembly database
at load time. I'll look into it and let you know.
Dan.
--
Dan Staines, PhD Ensembl Genomes Technical Coordinator
EMBL-EBI Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/