[ensembl-dev] unique species

Matthieu Muffato muffato at ebi.ac.uk
Thu Feb 6 11:02:36 GMT 2020


Dear Joseph


You have probably noticed that your query returns two rows for
taxon_id 644223. There are a few more things to consider:

1. A number of genomes belong to the Fungi site. They exist as Core
databases on the server, and you can identify them by checking the
return value of get_division() via the MetaContainer adaptor of every
genome registered in the Registry. If you use SQL directly, that's the
species.division meta key.

2. Some of those genomes will make it into the genome_db table of the
Compara database.

3. Some of those genomes will make it into the gene_tree / species_tree
tables of the Compara database, i.e. the genomes that are used in the
protein-tree / orthology builds.
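Point 1 can be checked with plain SQL on each Core database's meta
table. The sketch below is a minimal illustration, not Ensembl code: it
uses Python's sqlite3 with made-up, in-memory stand-ins for a few Core
databases so that it runs offline. The database names and rows are
hypothetical; on the real MySQL server you would run the same SELECT
in each Core database (or call get_division() from the Perl API).

```python
import sqlite3

# Hypothetical stand-ins for Core databases: name -> meta rows (meta_key, meta_value).
# Real Core databases live on the Ensembl MySQL server; these names are made up.
MOCK_CORES = {
    "genome_a_core_46_99_1": [("species.division", "EnsemblFungi")],
    "genome_b_core_46_99_1": [("species.division", "EnsemblFungi")],
    "genome_c_core_99_1": [("species.division", "EnsemblVertebrates")],
}


def make_core_db(meta_rows):
    """Build an in-memory database with a minimal Core-style meta table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE meta (meta_id INTEGER PRIMARY KEY, species_id INTEGER,"
        " meta_key TEXT, meta_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO meta (species_id, meta_key, meta_value) VALUES (1, ?, ?)",
        meta_rows,
    )
    return conn


def division_of(conn):
    """Return the species.division meta value, or None if it is missing."""
    row = conn.execute(
        "SELECT meta_value FROM meta WHERE meta_key = 'species.division'"
    ).fetchone()
    return row[0] if row else None


def fungi_cores(cores):
    """Sorted names of the Core databases whose division is EnsemblFungi."""
    names = []
    for name, rows in cores.items():
        conn = make_core_db(rows)
        try:
            if division_of(conn) == "EnsemblFungi":
                names.append(name)
        finally:
            conn.close()
    return sorted(names)


if __name__ == "__main__":
    print(fungi_cores(MOCK_CORES))
```

Points 2 and 3 are then subsets of this list, so a count taken from
genome_db or species_tree_node alone will generally be smaller.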

I believe you can have multiple strains / sub-species of the same
species (and perhaps multiple assemblies of the same organism?) at each
of these three levels.

- If you deal with Core databases, the species.species_taxonomy_id meta
key holds the taxon_id of the species, whereas the species.taxonomy_id
meta key holds the taxon_id of the particular strain / sub-species (if
one has been assigned).

- If you use the Compara database, the taxon_id we store in the
genome_db and species_tree_node tables is species.taxonomy_id, so to get
the species' taxon_id instead, you need to traverse the taxonomy
(ncbi_taxa_node) upwards until you find a node at the species rank.
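For Core databases no traversal is needed: collecting the distinct
values of the species.species_taxonomy_id meta key across the Fungi
Core databases gives the species directly. For Compara, the upward
walk can be sketched as below. This is a minimal illustration under
stated assumptions: it uses an in-memory sqlite3 copy of only the
columns it needs (ncbi_taxa_node.taxon_id / parent_id / rank and
genome_db.taxon_id), and the taxon IDs and genome names are made up.

```python
import sqlite3


def build_mock_compara():
    """In-memory stand-in for the two Compara tables used below (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ncbi_taxa_node (
            taxon_id INTEGER PRIMARY KEY, parent_id INTEGER, rank TEXT);
        CREATE TABLE genome_db (
            genome_db_id INTEGER PRIMARY KEY, taxon_id INTEGER, name TEXT);
        """
    )
    # Made-up taxonomy: root 1 > genus 10 > species 100 > strains 1001, 1002,
    # and genus 10 > species 200 (a genome registered at species level).
    conn.executemany(
        "INSERT INTO ncbi_taxa_node VALUES (?, ?, ?)",
        [(1, 1, "no rank"), (10, 1, "genus"), (100, 10, "species"),
         (1001, 100, "no rank"), (1002, 100, "no rank"), (200, 10, "species")],
    )
    conn.executemany(
        "INSERT INTO genome_db VALUES (?, ?, ?)",
        [(1, 1001, "strain_x"), (2, 1002, "strain_y"), (3, 200, "species_z")],
    )
    return conn


def species_taxon_id(conn, taxon_id):
    """Walk up ncbi_taxa_node until a node of rank 'species' is found."""
    seen = set()
    while taxon_id is not None and taxon_id not in seen:
        seen.add(taxon_id)
        row = conn.execute(
            "SELECT parent_id, rank FROM ncbi_taxa_node WHERE taxon_id = ?",
            (taxon_id,),
        ).fetchone()
        if row is None:
            return None  # taxon_id not present in the table
        parent_id, rank = row
        if rank == "species":
            return taxon_id
        taxon_id = parent_id
    return None  # reached the root (a self-parented node) without a species


def count_species(conn):
    """Number of distinct species among the genome_db taxon_ids."""
    taxon_ids = [t for (t,) in conn.execute("SELECT taxon_id FROM genome_db")]
    species = {species_taxon_id(conn, t) for t in taxon_ids}
    species.discard(None)
    return len(species)


if __name__ == "__main__":
    conn = build_mock_compara()
    print(count_species(conn))  # 3 genomes, 2 species
```

The loop is kept for readability. I believe ncbi_taxa_node also carries
left_index / right_index columns, which should allow the same lookup as
a single join, but that variant is not shown here.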


Note that on the same server there is another database
(ensembl_metadata_99) which holds the list of genomes and their
taxon_ids (both species and sub-species), and whether each genome is in
the Compara database. It might be easier to query that than the Core /
Compara databases.
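A query against the metadata database might look like the sketch
below. The table and column names (division, organism, genome,
species_taxonomy_id, has_peptide_compara) are assumptions modelled on
the description above, not a confirmed schema, so check them against
the real ensembl_metadata_99 first (and filter on the release too if it
holds more than one). The sketch runs on a tiny in-memory sqlite3
database with hypothetical rows.

```python
import sqlite3

# ASSUMED, simplified schema modelled on the description above; the real
# ensembl_metadata_99 tables and column names may differ.
SCHEMA = """
CREATE TABLE division (division_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, name TEXT,
                       taxonomy_id INTEGER, species_taxonomy_id INTEGER);
CREATE TABLE genome (genome_id INTEGER PRIMARY KEY, organism_id INTEGER,
                     division_id INTEGER, has_peptide_compara INTEGER);
"""

BASE_QUERY = """
SELECT COUNT(DISTINCT o.species_taxonomy_id)
FROM genome g
JOIN organism o ON o.organism_id = g.organism_id
JOIN division d ON d.division_id = g.division_id
WHERE d.name = ?
"""


def build_mock_metadata():
    """Tiny in-memory database filled with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO division VALUES (?, ?)",
                     [(1, "EnsemblFungi"), (2, "EnsemblVertebrates")])
    # Two strains of species 100, one genome at species level (200), one vertebrate.
    conn.executemany("INSERT INTO organism VALUES (?, ?, ?, ?)",
                     [(1, "strain_x", 1001, 100), (2, "strain_y", 1002, 100),
                      (3, "species_z", 200, 200), (4, "vert_w", 300, 300)])
    conn.executemany("INSERT INTO genome VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 1), (2, 2, 1, 0), (3, 3, 1, 0), (4, 4, 2, 1)])
    return conn


def count_species(conn, division, compara_only=False):
    """Distinct species taxon_ids in a division, optionally Compara genomes only."""
    sql = BASE_QUERY
    if compara_only:
        sql += "  AND g.has_peptide_compara = 1\n"
    return conn.execute(sql, (division,)).fetchone()[0]


if __name__ == "__main__":
    conn = build_mock_metadata()
    print(count_species(conn, "EnsemblFungi"))
    print(count_species(conn, "EnsemblFungi", compara_only=True))
```

Counting DISTINCT species_taxonomy_id (rather than rows) is what folds
several strains of the same species into one.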


Hope this helps,

Matthieu


On 06/02/2020 05:58, Joseph Steinberger wrote:
> Dear Community,
>
> I would like to know the number of unique species in the Ensembl
> Fungi database. I believe there are 488.
>
> I ran the following query and got 488 rows:
>
>     SELECT taxon_id,
>            node_name
>     FROM ensembl_compara_fungi_46_99.species_tree_node
>     WHERE genome_db_id IS NOT NULL;
>
>
> Am I correct in my interpretation?
>
> Sincerely,
> Yossi
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/

-- 
Matthieu Muffato, Ph.D.
Ensembl Compara Principal Developer
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room  A3-123
Phone + 44 (0) 1223 49 4631
Fax   + 44 (0) 1223 49 4468
