[ensembl-dev] database versions and genome builds
Andrea Edwards
edwardsa at cs.man.ac.uk
Thu Nov 18 14:05:46 GMT 2010
Hello
I was hoping to understand how the ensembl database links its data into
genome builds.
At the moment I am using cow. I can see that the current release of
ensembl for cow uses the Bta4.0 assembly. I presume this means that all
of the gene locations etc. are given relative to this genome assembly. I
know this may well be stating the obvious but could someone confirm this
for me?
How often does ensembl release a new build? Am I right in saying that,
if the new build uses the same genome assembly, then gene positions etc.
should be fairly stable between database releases?
But what happens if there is a new cow assembly made in between ensembl
builds? I presume that on the next build the database will be mapped to
the new cow assembly and gene positions could well move (especially if
it is a newly sequenced organism whose genome hasn't settled down). How
is this documented, if at all? Is there some information in a gene record
that says something like
'the position of this gene is bases 100,000 - 200,000 on chromosome 7 but
in the last database build it was actually at bases 75,000 - 175,000
because it was mapped to a different genome build'
or do you not say anything about it and rely on people to use things
like liftovers to map from an old genome build to a new one?
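To make the example concrete, here is a minimal sketch of the kind of
old-build to new-build coordinate mapping I have in mind. It assumes the
two builds are related by a simple table of aligned blocks on the same
strand. The block table, the numbers in it and the function names are
all made up for illustration; this is not the ensembl API and not the
real liftover file format.

```python
from typing import List, Optional, Tuple

# Hypothetical aligned blocks between two builds of one chromosome:
# (old_start, old_end, new_start), 1-based inclusive, same strand.
# The numbers are invented so that they match the example in the text.
BLOCKS: List[Tuple[int, int, int]] = [
    (1, 50_000, 1),             # region unchanged between builds
    (50_001, 300_000, 75_001),  # region shifted by +25,000 bases
]


def map_position(pos: int,
                 blocks: List[Tuple[int, int, int]] = BLOCKS) -> Optional[int]:
    """Map one old-build position to the new build, or None if unaligned."""
    for old_start, old_end, new_start in blocks:
        if old_start <= pos <= old_end:
            return new_start + (pos - old_start)
    return None


def map_interval(start: int, end: int,
                 blocks: List[Tuple[int, int, int]] = BLOCKS
                 ) -> Optional[Tuple[int, int]]:
    """Map both ends of an old-build interval independently.

    Returns None if either end falls outside every block. This sketch does
    not check whether the interval is split across several blocks.
    """
    new_start = map_position(start, blocks)
    new_end = map_position(end, blocks)
    if new_start is None or new_end is None:
        return None
    return new_start, new_end


if __name__ == "__main__":
    # Old-build gene at 75,000 - 175,000 on the hypothetical chromosome.
    print(map_interval(75_000, 175_000))
```

With this invented block table, the gene at 75,000 - 175,000 in the old
build lands at 100,000 - 200,000 in the new one. What I would like to
know is whether the database records that relationship anywhere, or
whether users are expected to compute it themselves.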
I've seen documentation for how database records handle sequence changes
but I don't know how they handle the fact that the position of a gene
may move between genome builds.
Thanks a lot