[ensembl-dev] Coordinate System anomalies in EnsemblGenomes

Andy Yates ayates at ebi.ac.uk
Sat Mar 2 18:46:21 GMT 2013


Yes, the toplevel flag is held in seq_region_attrib, which joins to attrib_type, the controlled-vocabulary (dictionary) table. What I think has confused things a little is that toplevel sequences must be attached to the default assembly, whose version can normally be found in the coord system with the lowest rank value. As you'll see in a number of species, toplevel spans multiple coordinate systems but never more than one version. 
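To make that join concrete, here is a minimal Python sketch using the standard-library sqlite3 module against a hypothetical, heavily simplified subset of the core schema. Real core databases are MySQL and these tables carry more columns. The query selects seq_regions flagged through the attrib_type row with code 'toplevel' and collects their coord_system versions. The demo rows, the IDs and the version string ASM1 are invented for illustration. Under the rule described above, the version set should hold a single entry even though the toplevel regions come from two coord systems.

```python
import sqlite3

# Hypothetical, simplified subset of the Ensembl core schema (illustration only).
SCHEMA = """
CREATE TABLE coord_system (coord_system_id INTEGER PRIMARY KEY,
                           name TEXT, version TEXT, rank INTEGER);
CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY,
                         name TEXT, coord_system_id INTEGER, length INTEGER);
CREATE TABLE attrib_type (attrib_type_id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE seq_region_attrib (seq_region_id INTEGER,
                                attrib_type_id INTEGER, value TEXT);
"""

# The flag lives in seq_region_attrib; its meaning comes from the
# attrib_type dictionary row whose code is 'toplevel'.
TOPLEVEL_SQL = """
SELECT sr.name, cs.name, cs.version
FROM seq_region sr
JOIN seq_region_attrib sra ON sra.seq_region_id = sr.seq_region_id
JOIN attrib_type att ON att.attrib_type_id = sra.attrib_type_id
JOIN coord_system cs ON cs.coord_system_id = sr.coord_system_id
WHERE att.code = 'toplevel'
ORDER BY sr.name
"""


def build_demo_db():
    """Populate an in-memory database with invented example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO coord_system VALUES (?, ?, ?, ?)",
        [(1, "chromosome", "ASM1", 1),
         (2, "supercontig", "ASM1", 2),
         (3, "contig", None, 3)],  # contig level carries no version in this demo
    )
    conn.executemany(
        "INSERT INTO seq_region VALUES (?, ?, ?, ?)",
        [(10, "chr1", 1, 5000000),
         (11, "plasmid_scaf", 2, 80000),
         (12, "contig_001", 3, 40000)],
    )
    # attrib_type_id 6 is arbitrary here; look the real ID up by code.
    conn.execute("INSERT INTO attrib_type VALUES (6, 'toplevel')")
    conn.executemany(
        "INSERT INTO seq_region_attrib VALUES (?, ?, ?)",
        [(10, 6, "1"), (11, 6, "1")],  # the contig is not flagged toplevel
    )
    return conn


def toplevel_regions(conn):
    """(seq_region name, coord_system name, version) for every toplevel region."""
    return conn.execute(TOPLEVEL_SQL).fetchall()


def toplevel_versions(conn):
    """Distinct assembly versions among toplevel regions; expected to be one."""
    return {version for _, _, version in toplevel_regions(conn)}


conn = build_demo_db()
print(toplevel_regions(conn))
# [('chr1', 'chromosome', 'ASM1'), ('plasmid_scaf', 'supercontig', 'ASM1')]
print(toplevel_versions(conn))  # {'ASM1'}
```

Toplevel here spans the chromosome and supercontig coord systems, yet only one version appears, which is the pattern described above.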

Hope this clears up any confusion. 




On 1 Mar 2013, at 17:21, Dan Staines <dstaines at ebi.ac.uk> wrote:

> Hi Trevor,
> The quick (Friday afternoon!) answer is that, according to my colleagues in core, rank doesn't have to be an exact sequence, but you should always have toplevel sequences. In the case of the bacterial load these may come from the chromosome or supercontig coord_systems, or a mixture of the two. You should use the toplevel seq_region attribute to identify toplevel sequences (maybe someone from core can comment on the schema doc you reference).
> As you point out, there are a significant number of genomes without chromosomal assemblies. There are also a handful of cases where a chromosome is incorrectly labelled as a supercontig because its description line (which can be quite variable...) doesn't match the set of expected values; I'm looking into fixing these for the next release.
> Having said that, there does seem to be something else wrong with the specific example in your mail: the underlying WGS sequences have been retrieved as toplevel rather than the single assembled chromosome. This may be due to the logic used to retrieve the assembly from the INSDC assembly database, or to the state of the assembly database at load time. I'll look into it and let you know.
> Dan.
> -- 
> Dan Staines, PhD               Ensembl Genomes Technical Coordinator
> EMBL-EBI                       Tel: +44-(0)1223-492507
> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
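The practical upshot of Dan's advice is that picking sequences by coord_system rank is fragile, while the toplevel attribute is authoritative. The pure-Python sketch below contrasts the two on a hypothetical bacterial genome whose ranks skip a value and whose toplevel set mixes a chromosome with a supercontig. The SeqRegion class, the region names and the rank values are illustrative assumptions, not real Ensembl Genomes data, and "rank doesn't have to be an exact sequence" is read here as "rank values need not be contiguous".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeqRegion:
    name: str
    coord_system: str
    rank: int       # coord_system.rank; 1 is the highest level
    toplevel: bool  # True if the region carries the toplevel attribute


def demo_regions():
    """Invented bacterial example: ranks skip 2, toplevel mixes two coord systems."""
    return [
        SeqRegion("chr", "chromosome", 1, True),
        SeqRegion("pX", "supercontig", 3, True),
        SeqRegion("ctg1", "contig", 4, False),
    ]


def toplevel_by_attribute(regions):
    """Trust the toplevel attribute, as recommended in the thread."""
    return sorted(r.name for r in regions if r.toplevel)


def toplevel_by_rank(regions):
    """Fragile heuristic: treat only the highest-level coord system as toplevel."""
    top_rank = min(r.rank for r in regions)
    return sorted(r.name for r in regions if r.rank == top_rank)


def ranks_are_contiguous(regions):
    """True if the distinct rank values form an unbroken run such as 1, 2, 3."""
    ranks = sorted({r.rank for r in regions})
    return ranks == list(range(ranks[0], ranks[0] + len(ranks)))


regions = demo_regions()
print(toplevel_by_attribute(regions))  # ['chr', 'pX']
print(toplevel_by_rank(regions))       # ['chr']
print(ranks_are_contiguous(regions))   # False
```

In this example the rank heuristic silently drops the supercontig pX, which the attribute-based selection keeps, and the gap in rank values does not affect the attribute-based result at all.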
