[ensembl-dev] database versions and genome builds

Andrea Edwards edwardsa at cs.man.ac.uk
Thu Nov 18 14:05:46 GMT 2010


Hello

I was hoping to understand how the Ensembl database ties its data to 
genome builds.

At the moment I am using cow. I can see that the current release of 
Ensembl for cow uses the Btau_4.0 assembly. I presume this means that 
all of the gene locations etc. are given relative to this genome 
assembly. I know this may well be stating the obvious, but could someone 
confirm this for me?

How often does Ensembl release a new build? Am I right in saying that, 
if the new build uses the same genome assembly, then gene positions etc. 
should be fairly stable between database releases?

But what happens if a new cow assembly is made in between Ensembl 
builds? I presume that on the next build the database will be mapped to 
the new cow assembly and gene positions could well move (especially if 
it is a newly sequenced organism whose genome hasn't settled down). How 
is this documented, if at all? Is there some information in a gene 
record that says something like

'the position of this gene is bases 100,000 - 200,000 on chromosome 7 
but in the last database build it was actually at bases 75,000 - 175,000 
because it was mapped to a different genome build'

or do you not say anything about it and rely on people to use things 
like liftovers to map from an old genome build to a new one?
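
To make concrete what I mean by mapping a position from an old build to 
a new one, here is a toy sketch in Python. It is purely illustrative: 
the Segment table, the lift_interval function and the 25,000 base offset 
are all made up to match my chromosome 7 example, and none of it is 
taken from Ensembl or from any real liftover tool.

```python
# Toy illustration only: move an interval from an old genome build to a
# new one using a hypothetical table of aligned segments.
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    """A stretch of the old build and where it starts in the new build."""
    chrom: str
    old_start: int
    old_end: int
    new_start: int


# Made-up mapping: a block of chromosome 7 that sits 25,000 bases further
# along in the new build than it did in the old one.
SEGMENTS: List[Segment] = [Segment("7", 50_000, 250_000, 75_000)]


def lift_interval(
    chrom: str,
    start: int,
    end: int,
    segments: List[Segment] = SEGMENTS,
) -> Optional[Tuple[str, int, int]]:
    """Return the interval in new-build coordinates, or None if unmapped.

    Only intervals lying entirely inside one segment are handled; a real
    tool would also have to deal with splits, gaps and strand changes.
    """
    for seg in segments:
        if seg.chrom == chrom and seg.old_start <= start and end <= seg.old_end:
            offset = seg.new_start - seg.old_start
            return (chrom, start + offset, end + offset)
    return None


old_position = ("7", 75_000, 175_000)
new_position = lift_interval(*old_position)
print(old_position, "->", new_position)
```

What I would like to know is whether this kind of old-to-new 
relationship is recorded anywhere in the database, or whether I would 
have to work it out myself.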

I've seen documentation for how database records handle sequence changes 
but I don't know how they handle the fact that the position of a gene 
may move between genome builds.

Thanks a lot



