[ensembl-announce] Intentions for Ensembl release 61
st3 at sanger.ac.uk
Thu Nov 25 17:12:31 GMT 2010
Please see below a summary of the intentions declared for Ensembl release 61
(scheduled for 19th January). Note that these are intentions and are not
guaranteed to be in the release.
Updated MCL families, including all Ensembl transcript isoforms and the
newest UniProt Metazoa sequences.
* Clustering by MCL
* Multiple Sequence Alignments with MAFFT
* Family stable ID mapping
GeneTrees with new/updated genebuilds and assemblies
* Updated build of ncRNA trees
* Clustering using hcluster_sg
* Multiple Sequence Alignments using the consistency-based MCoffee meta-aligner
* Homology inference, including the recent 'possible_ortholog' type and the
'putative gene split' and 'contiguous gene split' exceptions
* Pairwise gene-based dN/dS calculations for high coverage species
* GeneTree stable ID mapping
Human - Lizard tBlat-net
Human - Turkey tBlat-net
Turkey - Chicken Lastz
Lizard - Chicken Lastz
Dog - Horse Lastz
Removing Chicken - Zebra Finch tBlat
Chicken - Turkey - Zebra Finch EPO multiple alignment
seq region synonyms-
A new table, seq_region_synonym, will be added to allow a sequence region
to have multiple names.
Species: all species
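As a rough illustration of what a synonym table provides, the sketch below models a many-to-one mapping from alternative names to sequence regions in Python. The class and method names are hypothetical and say nothing about the actual columns or API of seq_region_synonym; the example name "chrX" is likewise only illustrative.

```python
from typing import Dict, List


class SeqRegionSynonyms:
    """Hypothetical in-memory model: many names resolve to one seq region."""

    def __init__(self) -> None:
        self._to_region: Dict[str, str] = {}        # synonym -> region name
        self._by_region: Dict[str, List[str]] = {}  # region name -> synonyms

    def add(self, region: str, synonym: str) -> None:
        # Register one more alternative name for an existing region.
        self._to_region[synonym] = region
        self._by_region.setdefault(region, []).append(synonym)

    def resolve(self, name: str) -> str:
        # A name that is not a registered synonym is returned unchanged,
        # on the assumption that it is already the primary region name.
        return self._to_region.get(name, name)

    def synonyms(self, region: str) -> List[str]:
        # Return a copy so callers cannot mutate the internal list.
        return list(self._by_region.get(region, []))


syn = SeqRegionSynonyms()
syn.add("X", "chrX")
print(syn.resolve("chrX"))  # X
print(syn.resolve("X"))     # X
```

A dictionary in each direction keeps both lookups (name to region, region to names) constant-time, which is the main thing a separate synonym table buys over a single name column.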
external database references-
Human, mouse, rat and tree shrew will be updated.
GO term and gene name projections-
Gene display names and GO terms will be projected from high-coverage
species to those with lower coverage.
The Ensembl Ontology database will, as usual, be populated with the latest
available versions of:
* Gene Ontology (GO)
* Sequence Ontology (SO)
EMBL and GenBank dumps-
Only the reference sequence will be dumped in the main directory for
EMBL and GenBank. Unique non-reference regions (haplotype/PAR regions)
will now be dumped in a subdirectory, and these files will contain only
the unique regions.
Array mapping was updated on all species which have had an update to
their genome assemblies or gene builds. The probe/set to transcript
xrefs were recalculated across all species.
Mouse Regulatory Build-
The mouse RegulatoryBuild was re-run to re-introduce some data which had
been omitted in the previous build.
Human cDNA update-
Updated set of cDNA alignments to the human genome.
Correction of an error that added one extra N to the end of the
alternative versions of the chromosomes for five of the haplotypes. The
altered alternative chromosomes are: HSCHR6_MHC_MANN, HSCHR6_MHC_MCF,
HSCHR6_MHC_SSTO, HSCHR4_1 and HSCHR17_1.
Zebrafish Havana merge-
A merge of the zebrafish core gene set with Havana manual annotation.
The core gene set has been altered to include missing genes that were
lost in e60 due to a problem in gene clustering.
GENCODE gene set update (release 6)-
Update to the Ensembl/Havana GENCODE gene set using the latest Vega gene
annotation.
Updates to mouse and human Vega annotation-
The Vega annotation for both human and mouse has been updated. This
matches the annotation presented in Vega release 41.
new RNA-seq database-
I will provide a new database, consisting of the core tables, containing
RNA-seq data from the Human BodyMap project. This database has not been
released before; it was originally planned for e60.
mouse cDNA update-
Zebrafish Vega annotation-
Manual annotation of zebrafish from Havana is now present in Ensembl.
This represents the annotation presented in Vega release 40.
Mouse gene set update-
A merge of the Ensembl core gene set and Vega manual annotation.
The core gene set has been improved by incorporating new data resources
which had become available since the last NCBIM37 genebuild (April
2007), resulting in the correction of existing gene models and the
recovery of new mouse genes with human orthologues.
A new otherfeatures database is also available.
New assembly for lizard-
A new assembly for lizard.
The first genebuild for turkey.
New Canonical Transcript definition-
For previous releases, the canonical transcript of a gene has been set
to the transcript with the longest translation (for coding genes) or to
the transcript with the longest mRNA (for noncoding genes). From release
61, the canonical transcript for human and mouse will now be set to the
longest CCDS transcript. Where no CCDS transcript exists for the gene,
the longest Ensembl-HAVANA merge transcript will be used.
Species: Human, Mouse
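The old and new selection rules can be sketched in Python to make the difference concrete. This is a minimal illustration, not Ensembl's code: the Transcript record and its fields are hypothetical, "longest" is assumed to mean translation length for coding genes and mRNA length for non-coding genes, and the fallback used when a gene has neither a CCDS nor an Ensembl-HAVANA merge transcript is an assumption, since the announcement does not cover that case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    # Hypothetical, simplified transcript record; field names are
    # illustrative only and are not the Ensembl API.
    stable_id: str
    translation_length: int  # 0 for non-coding transcripts
    mrna_length: int
    is_ccds: bool = False
    is_ensembl_havana_merge: bool = False


def _length(t: Transcript, coding_gene: bool) -> int:
    # Assumption: "longest" means longest translation for coding genes
    # and longest mRNA for non-coding genes (the pre-61 measure).
    return t.translation_length if coding_gene else t.mrna_length


def canonical_pre61(transcripts: List[Transcript]) -> Transcript:
    """Pre-61 rule: longest translation, or longest mRNA if non-coding."""
    coding_gene = any(t.translation_length > 0 for t in transcripts)
    return max(transcripts, key=lambda t: _length(t, coding_gene))


def canonical_61(transcripts: List[Transcript]) -> Transcript:
    """Release 61 rule for human and mouse, as described in the announcement."""
    coding_gene = any(t.translation_length > 0 for t in transcripts)
    # Prefer the longest CCDS transcript, then the longest merge transcript.
    for subset in ([t for t in transcripts if t.is_ccds],
                   [t for t in transcripts if t.is_ensembl_havana_merge]):
        if subset:
            return max(subset, key=lambda t: _length(t, coding_gene))
    # Assumption: with neither CCDS nor merge transcripts, use the old rule.
    return canonical_pre61(transcripts)


ts = [
    Transcript("T1", translation_length=500, mrna_length=2500),
    Transcript("T2", translation_length=300, mrna_length=1800, is_ccds=True),
    Transcript("T3", translation_length=350, mrna_length=2000,
               is_ensembl_havana_merge=True),
]
print(canonical_pre61(ts).stable_id)  # T1
print(canonical_61(ts).stable_id)     # T2
```

In this toy gene the canonical transcript changes from T1 to T2: the new rule prefers a shorter transcript because it carries CCDS support.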
Removal of ambiguous bases from human DNA sequence-
Ambiguous bases have been replaced with 'N' in the following two human
contigs:
* contig::AF152363.1:1:185763:1. This contig held 28 ambiguous bases:
S(4), W(6), M(5), K(4), R(5), Y(4).
* contig::AF152364.1:1:170452:1. This contig held 4 ambiguous bases:
S(1), W(1), Y(1), K(1).
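Replacing ambiguity codes with 'N' amounts to a per-base substitution. The sketch below assumes the standard IUPAC nucleotide ambiguity codes (R, Y, S, W, K, M, B, D, H, V) and also returns per-code counts in the same spirit as the tallies listed above; it is an illustration only and does not describe how the change was actually made in Ensembl.

```python
from collections import Counter
from typing import Tuple

# IUPAC nucleotide ambiguity codes other than N itself (assumed set).
AMBIGUOUS = set("RYSWKMBDHV")


def mask_ambiguous(seq: str) -> Tuple[str, Counter]:
    """Replace ambiguity codes with 'N' and count which codes were replaced."""
    # Count each ambiguity code, ignoring case.
    counts = Counter(base for base in seq.upper() if base in AMBIGUOUS)
    # Substitute 'N' for ambiguity codes; all other characters are kept as-is.
    masked = "".join("N" if base.upper() in AMBIGUOUS else base for base in seq)
    return masked, counts


masked, counts = mask_ambiguous("ACGSTWAN")
print(masked)        # ACGNTNAN
print(dict(counts))  # {'S': 1, 'W': 1}
```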
Updated CCDS databases for human and mouse. These populate other_features
with new gene models and serve data for the CCDS Public Note DAS track.
Ensembl Marts for release 61-
Full build of all 7 marts for all species.
- import dbSNP 132 (human)
- import dbSNP for further species if available in time (mouse, rat,
zebrafish, cat, opossum)
- import new release of HGMD database
- corrections to Affymetrix CNV probe data
- import PorcineSNP60 BeadChip
- update of zebrafish variation consequences for new gene build
- variations will now be flagged and retained instead of failed and
deleted for species with a new import of dbSNP
- produce GVF file dumps of all variants and their consequence by species
A new version of the C.elegans database based on the official frozen
WS220 WormBase release.