[ensembl-dev] unique species
Matthieu Muffato
muffato at ebi.ac.uk
Thu Feb 6 11:02:36 GMT 2020
Dear Joseph
You have probably noticed that your query returns two rows for the
taxon_id 644223. There are a few more things to consider:
1. There are a number of genomes that belong to the Fungi site. They
exist as Core databases on the server and you can identify them by
checking the return value of get_division() via the MetaContainer
adaptor of every genome registered on the Registry. If you use SQL,
that's the species.division meta key.
2. Some of those genomes will make it to the genome_db table of the
Compara database
3. Some of those genomes will make it to the gene_tree / species_tree
tables of the Compara database, i.e. the genomes that are used in the
protein-tree / orthology builds
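For the SQL route in point 1, the lookup could look roughly like the sketch below. It runs against an in-memory SQLite stand-in for the meta table of a single Core database, assuming the columns species_id, meta_key and meta_value. The rows are made-up illustration values and get_meta is a hypothetical helper. The real servers are MySQL, so treat this as an outline rather than a tested query.

```python
import sqlite3

# In-memory stand-in for the `meta` table of one Core database.
# Rows are made-up illustration values, not real Ensembl data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta (species_id INTEGER, meta_key TEXT, meta_value TEXT)"
)
conn.executemany(
    "INSERT INTO meta VALUES (?, ?, ?)",
    [
        (1, "species.division", "EnsemblFungi"),  # assumed division label
        (1, "species.taxonomy_id", "1002"),       # hypothetical taxon_id
    ],
)

def get_meta(conn, key):
    """Return every value stored under one meta key (hypothetical helper)."""
    rows = conn.execute(
        "SELECT meta_value FROM meta WHERE meta_key = ?", (key,)
    )
    return [value for (value,) in rows]

division = get_meta(conn, "species.division")
print(division)
```

Running the same SELECT against each Core database on the server would give the division of each genome.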
I believe you can have multiple strains / sub-species of the same
species (and perhaps multiple assemblies of the same organism?) at each
of these three levels.
- If you deal with Core databases, the species.species_taxonomy_id meta
key holds the taxon_id of the species, whereas the species.taxonomy_id
meta key holds the taxon_id of the particular strain / sub-species (if
one has been assigned).
- If you use the Compara database, the taxon_id we store in the
genome_db and species_tree_node tables is species.taxonomy_id, so to get
the species' taxon_id instead you need to traverse the taxonomy upwards
until you find a node at the species rank.
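The upward traversal for the Compara case can be sketched as follows. This is a minimal illustration on a toy taxonomy held in a dictionary. All taxon_ids and ranks below are made up, and species_taxon_id is a hypothetical helper. In practice the parent links and ranks would come from the taxonomy table of the Compara database (ncbi_taxa_node, I believe, with taxon_id, parent_id and rank columns).

```python
# Toy taxonomy: taxon_id -> (parent_id, rank). All IDs are made up.
TAXONOMY = {
    1: (None, "no rank"),    # root
    100: (1, "genus"),
    1001: (100, "species"),
    1002: (1001, "strain"),  # strain of species 1001
    1003: (1001, "strain"),  # another strain of species 1001
    2001: (100, "species"),
}

def species_taxon_id(taxon_id, taxonomy):
    """Walk up the parent links until a node at the species rank is found."""
    current = taxon_id
    while current is not None:
        parent_id, rank = taxonomy[current]
        if rank == "species":
            return current
        current = parent_id
    return None  # reached the root without meeting a species-rank node

# taxon_ids as they might appear in genome_db (strain or species level)
genome_taxon_ids = [1002, 1003, 2001]
unique_species = {species_taxon_id(t, TAXONOMY) for t in genome_taxon_ids}
print(len(unique_species))
```

Counting the distinct values returned this way, rather than the rows of species_tree_node, avoids counting two strains of the same species twice.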
Note that on the same server there is another database
(ensembl_metadata_99) which holds the list of genomes and taxon_ids
(both species and sub-species), and whether the genome is in the Compara
database. It might be easier to query than the Core / Compara databases.
Hope this helps,
Matthieu
On 06/02/2020 05:58, Joseph Steinberger wrote:
> Dear Community,
>
> I would like to know the number of unique species in the Ensembl
> Fungi database - I believe there are 488.
>
> I run the following command, and get 488 rows -
>
> SELECT taxon_id,
> node_name
> FROM ensembl_compara_fungi_46_99.species_tree_node
> WHERE genome_db_id != 'NaN'
>
>
> Am I correct in my interpretation?
>
> Sincerely,
> Yossi
--
Matthieu Muffato, Ph.D.
Ensembl Compara Principal Developer
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room A3-123
Phone + 44 (0) 1223 49 4631
Fax + 44 (0) 1223 49 4468