[ensembl-announce] Intentions for Ensembl Release 64

Magali Ruffier mr6 at sanger.ac.uk
Fri Jul 15 09:05:21 BST 2011


Hi,

Please see below a list of intentions declared for Ensembl release 64 
(scheduled for September).
Please note that these are intentions and are not guaranteed to be in the 
release.


Thanks,
Magali



=======================================
Declarations of Intentions - Ensembl 64
=======================================


Compara
=======

Pairwise alignments (All Species)
---------------------------------
 ~ human vs cow lastz alignments
 ~ human vs tasmanian devil lastz alignments
 ~ human haplotype alignments for high coverage blastz-net alignments
 ~ human vs lamprey tblat alignments
 ~ lamprey vs Ciona intestinalis tblat alignments
 ~ lamprey vs Danio rerio tblat alignments
 ~ lamprey vs Gasterosteus aculeatus tblat alignments

Multiple alignments (All Species)
---------------------------------
 ~ 12way-mammal EPO alignments to incorporate new cow
 ~ 19way-amniota Pecan alignments to incorporate new cow
 ~ 35way-mammal low-coverage-EPO alignments (new cow)

Syntenies (All Species)
-----------------------
 ~ human-cow synteny

ProteinTrees and homologies (All Species)
-----------------------------------------
GeneTrees (protein-coding) with new/updated genebuilds and assemblies
 ~ Clustering using hcluster_sg
 ~ Multiple sequence alignments using MCoffee without the 
exon-disaligner module, or Mafft
 ~ Phylogenetic reconstruction using TreeBeST
 ~ Homology inference including the recent 'possible_ortholog', 'putative 
gene split' and 'contiguous gene split' exceptions
 ~ Pairwise gene-based dN/dS scores for high coverage species pairs only 
(both on orthologues and paralogues)
 ~ GeneTree stable ID mapping

ncRNAtrees and homologies (All Species)
---------------------------------------
 ~ Classification based on Rfam model
 ~ Multiple sequence alignments with Infernal
 ~ Phylogenetic reconstruction using RAxML
 ~ Phylogenetic reconstruction using FastTree2 and RAxML-light for very 
big families
 ~ Additional multiple sequence alignments with Prank (w/ genomic flanks)
 ~ Additional phylogenetic reconstruction using PhyML and NJ
 ~ Phylogenetic tree merging using TreeBeST
 ~ Homology inference

Protein Families (All Species)
------------------------------
Updated MCL families including all Ensembl transcript isoforms and 
newest Uniprot Metazoa.
 ~ Clustering by MCL
 ~ Multiple Sequence Alignments with MAFFT
 ~ Family stable ID mapping

Compara dumps (All Species)
---------------------------
 ~ EMF dumps for 20 way PECAN multiple alignments
 ~ BED files for 20 way GERP constrained elements
 ~ EMF dumps for 12 way EPO multiple alignments
 ~ EMF dumps for 35 way low-coverage alignments
 ~ BED files for 35 way low-coverage alignments
 ~ Data dumps for ProteinTrees
 ~ Data dumps for ncRNAtrees

API/schema changes (All Species)
--------------------------------
 ~ changes in the class hierarchy (NestedSet-Member-AlignedMember) to 
achieve better flexibility and fight code redundancy
 ~ possible changes in the schema: adding a new "header" table for a 
tree root, moving general properties of the tree into that table
 ~ possible changes in the schema: adding tables for tree's properties 
and node's properties to make tag storage and extraction more efficient


Core
====

Xref projections (All Species)
------------------------------
Project GO IDs and gene names to species.

Xrefs (All Species)
-------------------
Update xrefs for human, mouse, sea squirts (Ciona intestinalis and Ciona 
savignyi), Madagascar hedgehog, western European hedgehog, mouse lemur, 
platypus, bushbaby, shrew and squirrel.

schema change (All Species)
---------------------------
is_ref will be added to the alt_allele table to show which is the 
reference gene.

Embl/ Genbank flat file dumps (All Species)
-------------------------------------------

Schema version update (All Species)
-----------------------------------
patch_63_64_a.sql
update schema version to 64

LRG import (Human)
------------------
Import of new LRG sequences


Funcgen
=======

New Regulatory Data (Human and Mouse)
-------------------------------------
 ~ New Mouse MEL cell-line regulatory build, including DNase-Seq, and 
ChIP-Seq for p300, cMyb, USF2, Rad21, NELFe and Max. All data is from 
ENCODE, following their data policies.
 ~ New Human CD4 ChIP-Seq data for CBP, p300, MOF, PCAF, Tip60, HDAC1, 
HDAC2, HDAC3 and HDAC6 (Wang et al, 2009). A new regulatory build was 
made to incorporate this data.

MotifFeatures: Scores Rounded (All Species)
-------------------------------------------
MotifFeature scores were rounded to 3 decimal places.

MicroArray Mapping (Human, Mouse and Rat)
-----------------------------------------
Microarray mapping has been performed for those species with new 
assemblies or updated gene builds.

DNA methylation DAS tracks (Human)
----------------------------------
We have updated the set of DNA methylation DAS tracks using data 
available from the ENCODE project.

patch_63_64_a - Schema version (All Species)
--------------------------------------------
The schema_version entry in the meta table has been patched to version 64.
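
As an illustration only, the kind of update this patch performs can be
sketched as below. The sketch uses Python's built-in sqlite3 module as a
stand-in for the real database server, and the meta table layout
(meta_id, species_id, meta_key, meta_value) is an assumption made for the
example; it is not the content of the actual patch file.

```python
import sqlite3


def patch_schema_version(conn, new_version):
    """Set the schema_version entry in the meta table to new_version.

    Assumes a key/value meta table with meta_key and meta_value columns.
    """
    conn.execute(
        "UPDATE meta SET meta_value = ? WHERE meta_key = 'schema_version'",
        (str(new_version),),
    )
    conn.commit()


def demo():
    """Build a throwaway in-memory meta table at version 63 and patch it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE meta ("
        " meta_id INTEGER PRIMARY KEY,"
        " species_id INTEGER,"
        " meta_key TEXT,"
        " meta_value TEXT)"
    )
    conn.execute(
        "INSERT INTO meta (species_id, meta_key, meta_value) "
        "VALUES (NULL, 'schema_version', '63')"
    )
    patch_schema_version(conn, 64)
    row = conn.execute(
        "SELECT meta_value FROM meta WHERE meta_key = 'schema_version'"
    ).fetchone()
    conn.close()
    return row[0]
```

Calling demo() returns the patched value as a string, since the assumed
meta_value column is stored as text.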

patch_63_64_b - Cell type experimental factor ontology ID (All Species)
-----------------------------------------------------------------------
The cell_type table has had an efo_id field added to represent links to 
the Experimental Factor Ontology.

patch_63_64_c - Experimental meta data (All Species)
----------------------------------------------------
A patch has been applied to add fields to capture experimental metadata, 
e.g. archive and PubMed IDs.


Genebuild
=========

Lamprey genome (Lamprey)
------------------------
A full annotated gene set for the lamprey genome

Human cDNA update (Human)
-------------------------
New cDNA db for human.

GRCh37.p5 (Human)
-----------------
Adding the fifth patch release for the human assembly. This alters the 
assembly information in all human databases.

GRCh37.p5 annotation (Human)
----------------------------
Annotation of the patches in the otherfeatures db, including 
projection of annotation from the primary assembly.

Cow genebuild (Cow)
-------------------
A new genebuild has been done on the cow UMD 3.1 assembly.
A new otherfeatures database has also been prepared.

Mouse gene set update (Mouse)
-----------------------------
The merged gene set has been updated to incorporate the latest Vega 
manual annotation.

Tasmanian Devil Genome (Tasmanian devil)
----------------------------------------
A genebuild on the Tasmanian Devil 7.0 assembly has been created along 
with an otherfeatures database containing RNASeq models.

Update to Ensembl-Havana GENCODE gene set (release 9) (Human)
-------------------------------------------------------------
Update to Ensembl-Havana GENCODE gene set (release 9) based on Ensembl 
gene set and latest Havana gene annotation.

GENCODE RNA-Seq (Human)
-----------------------

Human Vega annotation (Human)
-----------------------------
Manual annotation of human from Havana has been updated. This 
represents the annotation presented in Vega release 44. Annotation by 
Havana of chromosome 14 has been completed.

Mouse Vega annotation (Mouse)
-----------------------------
Manual annotation of mouse from Havana has been updated, including 
annotation of the MHC region on chromosome 17. The data represents the 
annotation presented in Vega release 44.

Mouse cDNA update (Mouse)
-------------------------
The latest set of cDNAs for mouse (as of dd/mon/yyyy) from the
European Nucleotide Archive and NCBI RefSeq were aligned to the
current genome using Exonerate. There are nnnn new cDNAs for Ensembl 64.

Flagging obsolete Uniprot proteins (All Species)
------------------------------------------------
Flag the obsolete proteins in Uniprot used as supporting evidence

Flagging obsolete Ensembl proteins (All Species)
------------------------------------------------
Flag obsolete human Ensembl proteins used as supporting evidence

Gorilla 3.1 projection (Gorilla)
--------------------------------
A new AGP file has been provided for gorilla, using the same contigs but 
with different gap sizes. Genes from the e63 release will be projected 
onto the updated assembly.

Removal of ambiguous bases from Takifugu rubripes (Fugu)
--------------------------------------------------------
Removed the 'R' and 'Y' ambiguous bases from scaffold_20 in 
takifugu_rubripes_core_64_4.


Mart
====

Ensembl 64 mart databases (All Species)
---------------------------------------
Full build of all seven marts for release 64


Variation
=========

dbSNP 133 import (Cow)
----------------------
dbSNP Build 133 for cow based on the UMD_3.1 assembly.


LRG variant import (Human)
--------------------------
variants on LRG sequences

Phenotype annotations (Human)
-----------------------------
 ~ COSMIC import update. Minor changes in COSMIC sample names
 ~ OMIM, NHGRI GWAS catalog, UniProt and EGA updates

Schema changes (All Species)
----------------------------
 ~ Schema changes for structural variations
   * Add a structural_variation_feature table: store the coordinates
   * Modification of the structural_variation table: remove the coordinates
 ~ Additional enum in variation source table for LSDBs
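
The structural variation change above amounts to moving coordinates out
of structural_variation into a separate feature table. A minimal sketch
of such a migration follows, using Python's sqlite3 module as a stand-in
database. Apart from the two table names, everything here is assumed for
illustration: the column names (seq_region_id, seq_region_start,
seq_region_end, seq_region_strand, variation_name) and the migration
steps are hypothetical and do not reproduce the real patch.

```python
import sqlite3

# Hypothetical coordinate columns; the real schema may differ.
COORD_COLS = "seq_region_id, seq_region_start, seq_region_end, seq_region_strand"


def split_coordinates(conn):
    """Copy coordinates into structural_variation_feature, then rebuild
    structural_variation without the coordinate columns."""
    conn.executescript(f"""
        CREATE TABLE structural_variation_feature (
            structural_variation_feature_id INTEGER PRIMARY KEY,
            structural_variation_id INTEGER NOT NULL,
            seq_region_id INTEGER NOT NULL,
            seq_region_start INTEGER NOT NULL,
            seq_region_end INTEGER NOT NULL,
            seq_region_strand INTEGER NOT NULL
        );
        INSERT INTO structural_variation_feature
            (structural_variation_id, {COORD_COLS})
        SELECT structural_variation_id, {COORD_COLS}
        FROM structural_variation;

        -- Rebuild the parent table with the coordinates removed.
        CREATE TABLE structural_variation_new (
            structural_variation_id INTEGER PRIMARY KEY,
            variation_name TEXT
        );
        INSERT INTO structural_variation_new
        SELECT structural_variation_id, variation_name
        FROM structural_variation;
        DROP TABLE structural_variation;
        ALTER TABLE structural_variation_new RENAME TO structural_variation;
    """)


def demo():
    """Run the migration on a one-row toy table and report the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE structural_variation ("
        " structural_variation_id INTEGER PRIMARY KEY,"
        " variation_name TEXT,"
        " seq_region_id INTEGER, seq_region_start INTEGER,"
        " seq_region_end INTEGER, seq_region_strand INTEGER)"
    )
    conn.execute(
        "INSERT INTO structural_variation VALUES (1, 'sv_example', 7, 100, 200, 1)"
    )
    split_coordinates(conn)
    columns = [r[1] for r in conn.execute("PRAGMA table_info(structural_variation)")]
    features = conn.execute(
        f"SELECT structural_variation_id, {COORD_COLS} "
        "FROM structural_variation_feature"
    ).fetchall()
    conn.close()
    return columns, features
```

One reason to keep coordinates in their own table is that a single
structural variation can then be given more than one location without
duplicating its descriptive fields; whether that is the motivation here
is not stated in the declaration.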

Web display updates (All Species)
---------------------------------
 ~ For the variation displays, we will add more detail to the 
consequence table, including displays showing the DNA sequence change 
and amino acid changes.
 ~ Add a variation annotation panel in the Gene section

update consequences for transcript alleles (Human and Mouse)
------------------------------------------------------------
update human and mouse variation transcript alleles due to new gene builds

Structural variation (Dog, Human, Mouse and Pig)
------------------------------------------------
 ~ Update structural variation data from DGVa for Human, Mouse, Dog and Pig.
 ~ Add COSMIC structural variation data (Human).




