[ensembl-dev] Ensembl versions diffing tool

Kiran Mukhyala mukhyala.kiran at gene.com
Fri Jan 24 19:59:07 GMT 2014


Hi Magali,
The annotation page is very informative; I wasn't aware of it.
I think processing the data and crunching the numbers is the best bet, but
we'll take a look at the healthcheck SQLs, the gene_archive table and the
genome_statistics table when it comes out.
This is more than I expected. Thanks!

-Kiran
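
A minimal sketch of the "process each release, store the results, compare later" approach discussed in this thread. It assumes the per-release counts have already been collected into plain dictionaries; the function name diff_stats, the metric names and all the numbers are hypothetical and are not real Ensembl figures.

```python
from typing import Dict, Tuple


def diff_stats(old: Dict[str, int], new: Dict[str, int]) -> Dict[str, Tuple[int, int, int]]:
    """Return {metric: (old, new, new - old)} for every metric that differs.

    A metric present in only one release is treated as 0 in the other.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        o = old.get(key, 0)
        n = new.get(key, 0)
        if o != n:
            changes[key] = (o, n, n - o)
    return changes


# Hypothetical stored summaries for two releases (illustration only).
release_a = {"genes_total": 1000, "protein_coding": 400, "pseudogenes": 150}
release_b = {"genes_total": 1010, "protein_coding": 400, "lincRNA": 20}

for metric, (o, n, delta) in diff_stats(release_a, release_b).items():
    print(f"{metric}: {o} -> {n} ({delta:+d})")
```

Because each summary is just a small mapping, one could be saved per release (for example as JSON) and any two releases compared later without needing both databases online.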


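The gene_archive query quoted below hard-codes a mapping_session_id. A sketch of the same count keyed by release numbers instead, assuming the core schema's mapping_session table carries old_release and new_release columns (worth checking against the schema of the release in use). An in-memory SQLite database with made-up rows stands in for the real MySQL server so that the example is self-contained; count_changed_genes is a hypothetical helper name.

```python
import sqlite3

# Toy stand-in for an Ensembl core database; only the columns used here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mapping_session (
    mapping_session_id INTEGER PRIMARY KEY,
    old_release TEXT,
    new_release TEXT
);
CREATE TABLE gene_archive (
    gene_stable_id TEXT,
    gene_version INTEGER,
    mapping_session_id INTEGER
);
""")
conn.executemany(
    "INSERT INTO mapping_session VALUES (?, ?, ?)",
    [(1, "72", "73"), (2, "73", "74")],
)
# Made-up identifiers; one gene can appear on several rows of a session.
conn.executemany(
    "INSERT INTO gene_archive VALUES (?, ?, ?)",
    [("GENE_A", 1, 1), ("GENE_A", 2, 2), ("GENE_A", 2, 2), ("GENE_B", 1, 2)],
)


def count_changed_genes(db, old_release, new_release):
    """Count distinct archived gene stable IDs between two releases."""
    row = db.execute(
        """
        SELECT COUNT(DISTINCT ga.gene_stable_id)
        FROM gene_archive ga
        JOIN mapping_session ms
          ON ms.mapping_session_id = ga.mapping_session_id
        WHERE ms.old_release = ? AND ms.new_release = ?
        """,
        (old_release, new_release),
    ).fetchone()
    return row[0]


print(count_changed_genes(conn, "73", "74"))
```

Against a real server the SQL should carry over largely unchanged, although MySQL client libraries typically use %s rather than ? as the parameter placeholder.
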
On Fri, Jan 24, 2014 at 3:28 AM, mag <mr6 at ebi.ac.uk> wrote:

>  Hi Kiran,
>
> If you are familiar with the Ensembl healthchecks, you are probably aware
> that they mostly use SQL calls.
> Hence, MySQL queries or API calls should be able to give you all the
> numbers you need.
>
> The key factor here is to have access to two sets of databases, the
> current and the previous release.
> Once you have that, you should be able to run all the comparisons you want.
>
> Another solution would be to process the databases each release and store
> the results somewhere.
> Then you would be able to compare a release with any prior one.
>
> Keeping track of how many gene models have changed, especially if working
> with human, is a relatively tricky task.
> The stable id mapping would probably be the best way to go.
> In the gene_archive table, you can get a list of all genes which have
> changed from the previous release.
> This includes version changes or complete retirement.
> For example,
>   select count(distinct gene_stable_id) from gene_archive where mapping_session_id = 395;
> indicates that 693 genes have changed from release 73 to 74.
>
> For other statistics, we do try and include them on our annotation page:
> http://www.ensembl.org/Homo_sapiens/Info/Annotation#assembly
> This displays number of genes by biotype groups, total number of
> variations and assembly version.
> If this does not cover all the numbers you are looking for, we will
> happily take suggestions into consideration.
>
> Also, from release 75 onwards, these statistics will be available
> directly from the database, stored in the genome_statistics table.
>
>
> Regards,
> Magali
>
>
> On 23/01/2014 22:51, Kiran Mukhyala wrote:
>
> Hello,
>
>  I am looking for a way to summarize the differences between two versions
> of Ensembl databases for a given species.
> Specifically things like the total number of genes, how many gene models
> have changed, number of genes with PFAM domains, number of protein coding
> genes, number of variations from various sources, number of homologs in
> species X etc.
>
>  I am aware of two ways to do this:
>
>  1. By reading the release details page for each version that I am
> interested in, which doesn't really give me the numbers I am looking for.
>  2. Using the Ensembl healthchecks, which I assume are hard to customize.
>
>  Are there any other tools for accomplishing this? If not, would a tool
> like that be useful to anyone else?
>
>  Thanks,
> -Kiran
>
>
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>