[ensembl-dev] Ensembl Core database SQL query

Susan Fairley sf7 at sanger.ac.uk
Mon Jun 20 09:25:50 BST 2011

Hi Jonathan,

I'm assuming this is a local copy of a core Ensembl database (rather 
than one you've populated yourself).

You should query the seq_region and coord_system tables. Something along 
the lines of:

select seq_region.* from seq_region, coord_system where 
seq_region.coord_system_id=coord_system.coord_system_id and 

would typically return the seq_regions corresponding to the chromosomes. 
You will then be able to identify the genes stored on those regions by 
using seq_region.seq_region_id = gene.seq_region_id.

It is worth noting, however, that the above is the simplest case. In 
some databases, more than one version of the assembly is stored, in that 
case you'll need to use the coord_system version as well.

Further, in some assemblies, not all of the sequence is on the 
chromosomes and the highest level of some assemblies is scaffolds. To 
identify the sequences that make up the toplevel of the assembly, you 
can use the toplevel seq_region_attrib.

A query like this should find the seq_regions in the toplevel:

select seq_region.* from seq_region, seq_region_attrib, attrib_type 
where attrib_type.code='toplevel' and attrib_type.attrib_type_id 
=seq_region_attrib.attrib_type_id and seq_region_attrib.seq_region_id 
=seq_region.seq_region_id ;

I hope this helps.


jonathan wrote:
> Hi, everyone,
> I have implemented the Enesmbl Core database in my own PC. However I 
> cannot find a table including the chromosome information.  I wanna know 
> how to get the chromosome ID for each gene using SQL select code? 
> Cheers,
> Jonathan
> ------------------------------------------------------------------------
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

More information about the Dev mailing list