[ensembl-dev] Ensembl Core database SQL query
Susan Fairley
sf7 at sanger.ac.uk
Mon Jun 20 09:25:50 BST 2011
Hi Jonathan,
I'm assuming this is a local copy of a core Ensembl database (rather
than one you've populated yourself).
You should query the seq_region and coord_system tables. Something along
the lines of:
select seq_region.* from seq_region, coord_system where
seq_region.coord_system_id=coord_system.coord_system_id and
coord_system.name='chromosome';
would typically return the seq_regions corresponding to the chromosomes.
You will then be able to identify the genes stored on those regions by
using seq_region.seq_region_id = gene.seq_region_id.
It is worth noting, however, that the above is the simplest case. In
some databases, more than one version of the assembly is stored, in that
case you'll need to use the coord_system version as well.
Further, in some assemblies, not all of the sequence is on the
chromosomes and the highest level of some assemblies is scaffolds. To
identify the sequences that make up the toplevel of the assembly, you
can use the toplevel seq_region_attrib.
A query like this should find the seq_regions in the toplevel:
select seq_region.* from seq_region, seq_region_attrib, attrib_type
where attrib_type.code='toplevel' and attrib_type.attrib_type_id
=seq_region_attrib.attrib_type_id and seq_region_attrib.seq_region_id
=seq_region.seq_region_id ;
I hope this helps.
Regards,
Susan.
jonathan wrote:
> Hi, everyone,
> I have implemented the Enesmbl Core database in my own PC. However I
> cannot find a table including the chromosome information. I wanna know
> how to get the chromosome ID for each gene using SQL select code?
> Cheers,
> Jonathan
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
More information about the Dev
mailing list