[ensembl-dev] Ensembl versions diffing tool

mag mr6 at ebi.ac.uk
Fri Jan 24 11:28:58 GMT 2014


Hi Kiran,

If you are familiar with the Ensembl healthchecks, you are probably
aware that they mostly use SQL calls.
Hence, MySQL queries or API calls should be able to give you all the
numbers you need.

The key factor here is to have access to two sets of databases, the 
current and the previous release.
Once you have that, you should be able to run all the comparisons you want.
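
As a rough illustration of such a comparison, here is a minimal Python
sketch that counts genes per biotype in two databases and reports what
changed. It assumes a core-style gene table with a biotype column. The
in-memory SQLite databases and their rows are toy stand-ins for the two
MySQL release databases, and the helper names are made up for this
example.

```python
import sqlite3

# Assumes a core-style "gene" table with a "biotype" column.
BIOTYPE_SQL = "SELECT biotype, COUNT(*) FROM gene GROUP BY biotype"


def count_by_biotype(conn):
    """Return {biotype: gene count} for one release database."""
    return dict(conn.execute(BIOTYPE_SQL).fetchall())


def diff_counts(previous, current):
    """Return {key: (previous, current, delta)} for every count that changed."""
    changes = {}
    for key in sorted(set(previous) | set(current)):
        before, after = previous.get(key, 0), current.get(key, 0)
        if before != after:
            changes[key] = (before, after, after - before)
    return changes


def make_toy_release(biotypes):
    """Build an in-memory stand-in for a release database (toy data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, biotype TEXT)")
    conn.executemany("INSERT INTO gene (biotype) VALUES (?)",
                     [(b,) for b in biotypes])
    return conn


# Invented rows, not real Ensembl numbers.
previous_db = make_toy_release(["protein_coding", "protein_coding", "lincRNA"])
current_db = make_toy_release(
    ["protein_coding", "protein_coding", "protein_coding", "pseudogene"])

changes = diff_counts(count_by_biotype(previous_db),
                      count_by_biotype(current_db))
for biotype, (before, after, delta) in changes.items():
    print(f"{biotype}: {before} -> {after} ({delta:+d})")
```

Against real databases, the two connections would come from a MySQL
client library instead, and the same diff_counts helper could be reused
for any other query that returns (label, count) pairs.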

Another solution would be to process the databases each release and 
store the results somewhere.
Then you would be able to compare a release with any prior one.
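
A minimal sketch of that second approach, assuming one JSON file of
summary numbers is written per release. The file layout, function
names, release numbers and statistics below are all invented for
illustration.

```python
import json
import tempfile
from pathlib import Path


def save_snapshot(directory, release, stats):
    """Write the summary numbers for one release to its own JSON file."""
    path = Path(directory) / f"release_{release}.json"
    path.write_text(json.dumps(stats, sort_keys=True, indent=2))
    return path


def load_snapshot(directory, release):
    """Read back the summary numbers stored for one release."""
    return json.loads((Path(directory) / f"release_{release}.json").read_text())


def compare_releases(directory, old_release, new_release):
    """Return {statistic: new - old}, treating a missing statistic as 0."""
    old = load_snapshot(directory, old_release)
    new = load_snapshot(directory, new_release)
    return {key: new.get(key, 0) - old.get(key, 0)
            for key in sorted(set(old) | set(new))}


with tempfile.TemporaryDirectory() as store:
    # Hypothetical releases 1 to 3 with made-up numbers.
    save_snapshot(store, 1, {"genes": 100, "protein_coding": 40})
    save_snapshot(store, 2, {"genes": 104, "protein_coding": 41})
    save_snapshot(store, 3, {"genes": 110, "protein_coding": 41,
                             "pseudogenes": 5})
    # Any stored release can be compared with any other, not only neighbours.
    deltas = compare_releases(store, 1, 3)
    print(deltas)
```

The point of storing the numbers is that the older databases no longer
need to be available once their snapshot has been written.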

Keeping track of how many gene models have changed, especially when
working with human, is a relatively tricky task.
The stable ID mapping is probably the best way to go.
From the gene_archive table, you can get a list of all genes that have
changed since the previous release.
This includes version changes and complete retirements.
For example, the query

    SELECT COUNT(DISTINCT gene_stable_id) FROM gene_archive
    WHERE mapping_session_id = 395;

indicates that 693 genes have changed from release 73 to 74.
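
If you would rather not hard-code the mapping_session_id, a sketch
along the following lines could look it up by release number. It
assumes that the mapping_session table carries old_release and
new_release columns; please check this against the schema of the
release you are using. The SQLite tables, rows and the function name
are toy stand-ins.

```python
import sqlite3

# Assumption: mapping_session has old_release / new_release columns.
# sqlite3 uses "?" placeholders; MySQL client libraries typically use "%s".
CHANGED_GENES_SQL = """
SELECT COUNT(DISTINCT ga.gene_stable_id)
FROM gene_archive ga
JOIN mapping_session ms ON ms.mapping_session_id = ga.mapping_session_id
WHERE ms.old_release = ? AND ms.new_release = ?
"""


def count_changed_genes(conn, old_release, new_release):
    """Count distinct archived gene stable IDs for one release transition."""
    params = (str(old_release), str(new_release))
    return conn.execute(CHANGED_GENES_SQL, params).fetchone()[0]


# Toy stand-in database with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mapping_session (
    mapping_session_id INTEGER PRIMARY KEY,
    old_release TEXT,
    new_release TEXT
);
CREATE TABLE gene_archive (
    gene_stable_id TEXT,
    transcript_stable_id TEXT,
    mapping_session_id INTEGER
);
INSERT INTO mapping_session VALUES (1, '1', '2'), (2, '2', '3');
-- G1 appears twice (two transcripts), hence the DISTINCT in the query.
INSERT INTO gene_archive VALUES
    ('G1', 'T1', 1), ('G1', 'T2', 1), ('G2', 'T3', 1), ('G3', 'T4', 2);
""")

changed = count_changed_genes(conn, 1, 2)
print(f"genes changed between toy releases 1 and 2: {changed}")
```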

For other statistics, we try to include them on our annotation page:
http://www.ensembl.org/Homo_sapiens/Info/Annotation#assembly
This page displays the number of genes by biotype group, the total
number of variations and the assembly version.
If this does not cover all the numbers you are looking for, we will
happily take suggestions into consideration.

From release 75 onwards, these statistics will also be available
directly from the database, stored in the genome_statistics table.


Regards,
Magali

On 23/01/2014 22:51, Kiran Mukhyala wrote:
> Hello,
>
> I am looking for a way to summarize the differences between two 
> versions of Ensembl databases for a given species.
> Specifically things like the total number of genes, how many gene 
> models have changed, number of genes with PFAM domains, number of 
> protein coding genes, number of variations from various sources, 
> number of homologs in species X etc.
>
> I am aware of two ways to do this:
>
> 1. By reading the release details page for each version that I am 
> interested in, which doesn't really give me the numbers I am looking for.
> 2. Using Ensembl healthcheck which I assume is hard to customize.
>
> Are there any other tools for accomplishing this? If not, would a tool 
> like that be useful to anyone else?
>
> Thanks,
> -Kiran
