[ensembl-dev] Coordinate System anomalies in EnsemblGenomes

Dan Staines dstaines at ebi.ac.uk
Fri Mar 1 17:21:50 GMT 2013


Hi Trevor,

The quick (Friday afternoon!) answer is that (according to my colleagues 
in core) rank values don't have to form an unbroken sequence, but you 
should always have toplevel sequences, which may (in the case of the 
bacterial load) come from the chromosome or supercontig coord_systems, 
or a mixture of the two. You should use the top_level seq_region 
attribute to identify toplevel sequences (maybe someone from core can 
comment on the schema doc you reference).
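
As a rough illustration of selecting on the attribute rather than on 
rank, here is a minimal sketch in Python. It runs against a toy 
in-memory SQLite database, not a real core database: the tables are cut 
down to a few columns, every row is invented, and the helper names 
(build_toy_db, toplevel_regions) are hypothetical. It also assumes the 
attribute is stored with the attrib_type code "toplevel", so the code is 
passed as a parameter in case yours differs.

```python
import sqlite3

# Toy stand-in for four core tables, trimmed to the columns used here.
SCHEMA = """
CREATE TABLE coord_system (coord_system_id INTEGER PRIMARY KEY,
                           name TEXT, "rank" INTEGER);
CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY,
                         name TEXT, coord_system_id INTEGER);
CREATE TABLE attrib_type (attrib_type_id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE seq_region_attrib (seq_region_id INTEGER,
                                attrib_type_id INTEGER, value TEXT);
"""

# Select regions by attribute; coord_system rank is never consulted.
TOPLEVEL_QUERY = """
SELECT sr.name, cs.name
FROM seq_region sr
JOIN coord_system cs ON cs.coord_system_id = sr.coord_system_id
JOIN seq_region_attrib sra ON sra.seq_region_id = sr.seq_region_id
JOIN attrib_type aty ON aty.attrib_type_id = sra.attrib_type_id
WHERE aty.code = ?
ORDER BY sr.name
"""


def build_toy_db():
    """Create an in-memory database with invented example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Ranks 1, 2, 4: deliberately not an unbroken sequence.
    conn.executemany(
        "INSERT INTO coord_system VALUES (?, ?, ?)",
        [(1, "chromosome", 1), (2, "supercontig", 2), (3, "contig", 4)],
    )
    conn.executemany(
        "INSERT INTO seq_region VALUES (?, ?, ?)",
        [(1, "chr1", 1), (2, "scaffold_1", 2), (3, "contig_0001", 3)],
    )
    # Assumed attribute code; adjust if your database uses another.
    conn.execute("INSERT INTO attrib_type VALUES (6, 'toplevel')")
    # Only the chromosome and the supercontig are flagged.
    conn.executemany(
        "INSERT INTO seq_region_attrib VALUES (?, ?, ?)",
        [(1, 6, "1"), (2, 6, "1")],
    )
    return conn


def toplevel_regions(conn, code="toplevel"):
    """Return (seq_region name, coord_system name) for flagged regions."""
    return conn.execute(TOPLEVEL_QUERY, (code,)).fetchall()


if __name__ == "__main__":
    for region, coord_system in toplevel_regions(build_toy_db()):
        print(region, coord_system)
```

In the toy data the flagged regions come from two different 
coord_systems and the ranks have a gap, and the query returns the right 
regions either way because it never looks at rank.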

As you point out, there are a significant number without chromosomal 
assemblies. There are also a handful of cases where a chromosome is 
incorrectly labelled as a supercontig due to a description line (which 
can be quite variable...) not matching the set of expected values - I'm 
looking into fixing these for the next release.

Having said that, there does seem to be something else up with the 
specific example in your mail, where the underlying WGS sequences have 
been retrieved as toplevel rather than the single assembled chromosome, 
which may be to do with the logic of how the assembly is retrieved 
from the INSDC assembly database, or the state of the assembly database 
at load time. I'll look into it and let you know.

Dan.

-- 
Dan Staines, PhD               Ensembl Genomes Technical Coordinator
EMBL-EBI                       Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/



