[ensembl-dev] database versions and genome builds

Bronwen Aken ba1 at sanger.ac.uk
Thu Nov 18 16:33:58 GMT 2010


Hi Andrea,

The cow core database holds the Btau4.0 assembly ** and all features within the database, including gene locations, are given relative to this assembly. 

As a rough guideline, we try to release a new genebuild when a new assembly is released or when significant new sequence data (protein, cDNA) become available for a species. For most species this happens only every few years. The last genebuild on cow was in 2008 (see http://www.ensembl.org/Bos_taurus/Info/StatsTable?db=core), but there may have been small patches to the gene set since then.

A new genebuild includes remapping all sequence data to the genome, whether the genebuild is on a new assembly or on an existing one. The genebuild process takes several months to complete. You can find the dates on the 'Assembly and Genebuild' page and some documentation on the main species page. From release 60, we will start to introduce more detailed documentation on our new genebuilds, e.g. http://www.ensembl.org/info/docs/genebuild/201006_panda_genebuild.pdf

We expect gene positions to be fairly stable between builds on the same genome assembly, although some new genes may be added and some other genes may be deleted depending on what sequence data is available. 

Gene IDs (ENSBTAG*) are stable and are versioned: 
http://www.ensembl.org/Bos_taurus/Gene/Idhistory?g=ENSBTAG00000003925;r=9:107457744-107460348;t=ENSBTAT00000005121
As far as I know, if a gene moves position then its stable_id will change. We don't explicitly document, for each gene, where it was in the last build compared to the current build.

Hope this helps,
Bronwen


**  If you're working directly with our database bos_taurus_core_60_4i, then you can tell the assembly by looking at the assembly.default entry in the meta table or even by looking directly at the database name. The database is named <species>_<database_type>_<release_number>_<assembly_version><data_changes>
where 
<species> = bos_taurus
<database_type> = core
<release_number> = 60
<assembly_version> = 4
<data_changes> = i (number of data changes since first build on this assembly)
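
If you want to pull these fields out of a database name programmatically, something like the following Python sketch may help. It only assumes the naming scheme described above; the names parse_db_name and EnsemblDbName are made up for this example and are not part of any Ensembl API.

```python
import re
from typing import NamedTuple


class EnsemblDbName(NamedTuple):
    """Fields of a database name (hypothetical helper type, not an Ensembl API)."""
    species: str
    database_type: str
    release_number: int
    assembly_version: int
    data_changes: str


def parse_db_name(name: str) -> EnsemblDbName:
    """Split a name such as 'bos_taurus_core_60_4i' into its parts.

    Assumes the scheme from the note above:
    <species>_<database_type>_<release_number>_<assembly_version><data_changes>
    """
    # The species itself contains underscores, so split from the right.
    parts = name.rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected database name: {name!r}")
    species, database_type, release, build = parts

    # The last field is the assembly version followed by an optional letter.
    match = re.fullmatch(r"(\d+)([a-z]*)", build)
    if match is None or not release.isdigit():
        raise ValueError(f"unexpected database name: {name!r}")

    return EnsemblDbName(
        species=species,
        database_type=database_type,
        release_number=int(release),
        assembly_version=int(match.group(1)),
        data_changes=match.group(2),
    )


if __name__ == "__main__":
    print(parse_db_name("bos_taurus_core_60_4i"))
```

Splitting from the right means the species name can contain any number of underscores. Names that do not follow the scheme raise a ValueError rather than being guessed at.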



On 18 Nov 2010, at 14:05, Andrea Edwards wrote:

> Hello
> 
> I was hoping to understand how the ensembl database links its data into genome builds.
> 
> At the moment I am using cow. I can see that the current release of ensembl for cow uses the Bta4.0 assembly. I presume this means that all of the gene locations etc are given relative to this genome assembly. I know this may well be stating the obvious but could someone confirm this for me?
> 
> How often does ensembl release a new build? Am i right in saying that, if the new build uses the same genome assembly, then gene positions etc should be fairly stable between database releases.
> 
> But what happens if there is a new cow assembly made in between ensembl builds? I presume that on the next build the database will be mapped to the new cow assembly and gene positions could well move (especially if it is a newly sequenced organism whose genome hasn't settled down). How is this documented if at all? Is there some information a gene record that says something like
> 
> 'the position of this gene is bases 100,000- 200,000 on chromosome 7 but in the last database build it was actually at base 75,000 - 175,000 because it was mapped to a different genome build'
> 
> or do you not say anything about it and rely on people to use things like liftovers to map from an old genome build to a new one.
> 
> I've seen documentation for how database records handle sequence changes but I don't know how they handle the fact that the position of a gene may move between genome builds.
> 
> thanks a lot
