[ensembl-announce] Intentions for Ensembl Release 64
Magali Ruffier
mr6 at sanger.ac.uk
Fri Jul 15 09:05:21 BST 2011
Hi,
Please see below a list of intentions declared for Ensembl release 64
(scheduled for September).
Please note that these are intentions and are not guaranteed to be in the
release.
Thanks,
Magali
=======================================
Declarations of Intentions - Ensembl 64
=======================================
Compara
=======
Pairwise alignments (All Species)
---------------------------------
~ human vs cow lastz alignments
~ human vs tasmanian devil lastz alignments
~ human haplotype alignments for high coverage blastz-net alignments
~ human vs lamprey tblat alignments
~ lamprey vs Ciona intestinalis tblat alignments
~ lamprey vs Danio rerio tblat alignments
~ lamprey vs Gasterosteus aculeatus tblat alignments
Multiple alignments (All Species)
---------------------------------
~ 12way-mammal EPO alignments to incorporate new cow
~ 19way-amniota Pecan alignments to incorporate new cow
~ 35way-mammal low-coverage-EPO alignments (new cow)
Syntenies (All Species)
-----------------------
~ human-cow synteny
ProteinTrees and homologies (All Species)
-----------------------------------------
GeneTrees (protein-coding) with new/updated genebuilds and assemblies
~ Clustering using hcluster_sg
~ Multiple sequence alignments using MCoffee without the
exon-disaligner module, or Mafft
~ Phylogenetic reconstruction using TreeBeST
~ Homology inference including the recent 'possible_ortholog', 'putative
gene split' and 'contiguous gene split' exceptions
~ Pairwise gene-based dN/dS scores for high coverage species pairs only
(both on orthologues and paralogues)
~ GeneTree stable ID mapping
ncRNAtrees and homologies (All Species)
---------------------------------------
~ Classification based on Rfam model
~ Multiple sequence alignments with infernal
~ Phylogenetic reconstruction using RaxML
~ Phylogenetic reconstruction using FastTree2 and RaxML-light for very
big families
~ Additional multiple sequence alignments with Prank (w/ genomic flanks)
~ Additional phylogenetic reconstruction using PhyML and NJ
~ Phylogenetic tree merging using TreeBeST
~ Homology inference
Protein Families (All Species)
------------------------------
Updated MCL families including all Ensembl transcript isoforms and
newest Uniprot Metazoa.
~ Clustering by MCL
~ Multiple Sequence Alignments with MAFFT
~ Family stable ID mapping
Compara dumps (All Species)
---------------------------
~ EMF dumps for 20 way PECAN multiple alignments
~ BED files for 20 way GERP constrained elements
~ EMF dumps for 12 way EPO multiple alignments
~ EMF dumps for 35 way low-coverage alignments
~ BED files for 35 way low-coverage alignments
~ Data dumps for ProteinTrees
~ Data dumps for ncRNAtrees
API/schema changes (All Species)
--------------------------------
~ changes in the class hierarchy (NestedSet-Member-AlignedMember) to
achieve better flexibility and fight code redundancy
~ possible changes in the schema: adding a new "header" table for a
tree root, moving general properties of the tree into that table
~ possible changes in the schema: adding tables for tree's properties
and node's properties to make tag storage and extraction more efficient
Core
====
Xref projections (All Species)
------------------------------
Project GO IDs and gene names to species.
Xrefs (All Species)
-------------------
Update xrefs for human, mouse, sea squirts (Ciona intestinalis and Ciona
savignyi), Madagascar hedgehog, western European hedgehog, mouse lemur,
platypus, bushbaby, shrew and squirrel.
Schema change (All Species)
---------------------------
is_ref will be added to the alt_allele table to show which is the
reference gene.
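
To illustrate how an is_ref flag could be used once it is present, here is a
minimal sketch. It runs against an in-memory SQLite table rather than the real
MySQL core database, and it assumes that alt_allele carries the columns
alt_allele_id, gene_id and is_ref, with is_ref = 1 marking the reference gene.
The column layout, the 0/1 encoding, the sample rows and the helper name
reference_gene are assumptions made for this example only.

```python
import sqlite3

# In-memory stand-in for a core database (assumption: the real one is MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE alt_allele (
    alt_allele_id INTEGER NOT NULL,           -- shared by genes that are alleles of one another
    gene_id       INTEGER NOT NULL,
    is_ref        INTEGER NOT NULL DEFAULT 0  -- 1 = reference gene (assumed encoding)
);
-- Made-up rows: two allele groups, each with exactly one reference gene.
INSERT INTO alt_allele VALUES (1, 101, 1), (1, 102, 0), (1, 103, 0),
                              (2, 201, 1), (2, 202, 0);
""")


def reference_gene(conn, alt_allele_id):
    """Return the gene_id flagged as reference in an allele group, or None."""
    row = conn.execute(
        "SELECT gene_id FROM alt_allele WHERE alt_allele_id = ? AND is_ref = 1",
        (alt_allele_id,),
    ).fetchone()
    return row[0] if row else None


print(reference_gene(conn, 1))
```

Without such a flag, a client would have to work out the reference gene by
other means, for example from the sequence region each gene sits on.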
Embl/ Genbank flat file dumps (All Species)
-------------------------------------------
Schema version update (All Species)
-----------------------------------
patch_63_64_a.sql
update schema version to 64
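
As a rough picture of what a schema version patch does, the sketch below bumps
a schema_version entry in a meta table from 63 to 64. It uses an in-memory
SQLite database instead of MySQL, and the simplified table layout and the
helper names schema_version and apply_version_patch are assumptions for this
example, not the contents of patch_63_64_a.sql.

```python
import sqlite3

# In-memory stand-in for a database still at schema version 63.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meta (
    meta_id    INTEGER PRIMARY KEY,
    meta_key   TEXT NOT NULL,
    meta_value TEXT NOT NULL
);
INSERT INTO meta (meta_key, meta_value) VALUES ('schema_version', '63');
""")


def schema_version(conn):
    """Read the schema version recorded in the meta table, or None if absent."""
    row = conn.execute(
        "SELECT meta_value FROM meta WHERE meta_key = 'schema_version'"
    ).fetchone()
    return int(row[0]) if row else None


def apply_version_patch(conn, old, new):
    """Bump schema_version from old to new; refuse if the database is not at old."""
    current = schema_version(conn)
    if current != old:
        raise RuntimeError(f"expected schema version {old}, found {current}")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE meta SET meta_value = ? WHERE meta_key = 'schema_version'",
            (str(new),),
        )


apply_version_patch(conn, 63, 64)
print(schema_version(conn))
```

Checking the current version before updating is a design choice of this
sketch: it keeps a patch from being applied twice or to the wrong release.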
LRG import (Human)
------------------
Import of new LRG sequences
Funcgen
=======
New Regulatory Data (Human and Mouse)
-------------------------------------
~ New Mouse MEL cell-line regulatory build, including Dnase-Seq, and
ChIP-Seq for p300, cMyb, USF2, Rad21, NELFe and Max. All data is from
ENCODE, following their data policies.
~ New Human CD4 ChIP-Seq data for CBP, p300, MOF, PCAF, Tip60, HDAC1,
HDAC2, HDAC3 and HDAC6 (Wang et al, 2009). A new regulatory build was
made to incorporate this data.
MotifFeatures: Scores Rounded (All Species)
-------------------------------------------
MotifFeature scores were rounded to 3 decimal places.
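
A small sketch of what rounding to 3 decimal places means for downstream
comparisons. The score values and the helper name round_score are made up for
illustration; how the rounding was actually applied to the database is not
stated in this announcement.

```python
def round_score(score, places=3):
    """Round a score to the given number of decimal places."""
    return round(score, places)


# Made-up example scores.
scores = [12.34567, -3.00049, 7.5]
rounded = [round_score(s) for s in scores]
print(rounded)

# Two scores that differ only beyond the third decimal place become equal.
print(round_score(4.12341) == round_score(4.12339))
```

Code that compares scores against stored thresholds, or that checks scores for
exact equality between releases, may therefore see small differences.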
MicroArray Mapping (Human, Mouse and Rat)
-----------------------------------------
Microarray mapping has been performed for those species with new
assemblies or updated gene builds.
DNA methylation DAS tracks (Human)
----------------------------------
We have updated the set of DNA methylation DAS tracks using data
available from the ENCODE project.
patch_63_64_a - Schema version (All Species)
--------------------------------------------
The schema_version entry in the meta table has been patched to version 64.
patch_63_64_b - Cell type experimental factor ontology ID (All Species)
-----------------------------------------------------------------------
The cell_type table has had an efo_id field added to represent links to
the Experimental Factor Ontology.
patch_63_64_c - Experimental meta data (All Species)
----------------------------------------------------
A patch has been applied to add fields to capture experimental metadata,
e.g. archive and PubMed IDs.
Genebuild
=========
Lamprey genome (Lamprey)
------------------------
A full annotated gene set for the lamprey genome
Human cDNA update (Human)
-------------------------
New cDNA db for human.
GRCh37.p5 (Human)
-----------------
Adding the fifth patch release for the human assembly. This alters the
assembly information in all human databases.
GRCh37.p5 annotation (Human)
----------------------------
Annotation of the patches in the other features db, including
projection of annotation from the primary assembly.
Cow genebuild (Cow)
-------------------
A new genebuild has been done on the cow UMD 3.1 assembly.
A new otherfeatures database has also been prepared.
Mouse gene set update (Mouse)
-----------------------------
The merged gene set has been updated to incorporate the latest Vega
manual annotation.
Tasmanian Devil Genome (Tasmanian devil)
----------------------------------------
A genebuild on the Tasmanian Devil 7.0 assembly has been created along
with an otherfeatures database containing RNASeq models.
Update to Ensembl-Havana GENCODE gene set (release 9) (Human)
-------------------------------------------------------------
Update to Ensembl-Havana GENCODE gene set (release 9) based on Ensembl
gene set and latest Havana gene annotation.
GENCODE RNA-Seq (Human)
-----------------------
Human Vega annotation (Human)
-----------------------------
Manual annotation of human from Havana has been updated. This
represents the annotation presented in Vega release 44. Annotation by
Havana of chromosome 14 has been completed.
Mouse Vega annotation (Mouse)
-----------------------------
Manual annotation of mouse from Havana has been updated, including
annotation of the MHC region on chromosome 17. The data represents the
annotation presented in Vega release 44.
Mouse cDNA update (Mouse)
-------------------------
The latest set of cDNAs for mouse (as of dd/mon/yyyy) from the
European Nucleotide Archive and NCBI RefSeq were aligned to the
current genome using Exonerate. There are nnnn new cDNAs for Ensembl 64.
Flagging obsolete Uniprot proteins (All Species)
------------------------------------------------
Flag the obsolete proteins in Uniprot used as supporting evidence
Flagging obsolete Ensembl proteins (All Species)
------------------------------------------------
Flag obsolete human Ensembl proteins used as supporting evidence
Gorilla 3.1 projection (Gorilla)
--------------------------------
A new AGP file has been provided for gorilla, using the same contigs but
with different gap sizes. Genes from the e63 release will be projected
onto the updated assembly.
Removal of ambiguous bases from Takifugu rubripes (Fugu)
--------------------------------------------------------
Removed the 'R' and 'Y' ambiguous bases from scaffold_20 in
takifugu_rubripes_core_64_4.
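
For context, 'R' and 'Y' are IUPAC ambiguity codes: R stands for a purine
(A or G) and Y for a pyrimidine (C or T). The sketch below shows one way to
locate such codes in a sequence. The function name find_ambiguous and the toy
sequence are assumptions for this example; the announcement does not say how
the bases in scaffold_20 were found or what replaced them.

```python
def find_ambiguous(seq, codes=("R", "Y")):
    """Return (0-based position, base) for every occurrence of the given codes."""
    wanted = {c.upper() for c in codes}
    return [(i, base) for i, base in enumerate(seq.upper()) if base in wanted]


# Made-up sequence; 'N' is deliberately not reported because only R and Y are requested.
toy_sequence = "ACGTRACGGYTTAN"
hits = find_ambiguous(toy_sequence)
print(hits)
```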
Mart
====
Ensembl 64 mart databases (All Species)
---------------------------------------
Full build of all seven marts for release 64
Variation
=========
dbSNP 133 import (Cow)
----------------------
dbSNP Build 133 for cow based on the UMD_3.1 assembly.
LRG variant import (Human)
--------------------------
Import of variants on LRG sequences.
Phenotype annotations (Human)
-----------------------------
~ COSMIC import update. Minor changes in COSMIC sample names
~ OMIM, NHGRI GWAS catalog, UniProt and EGA updates
Schema changes (All Species)
----------------------------
~ Schema changes for structural variations
* Add a structural_variation_feature table: store the coordinates
* Modification of the structural_variation table: remove the coordinates
~ Additional enum in variation source table for LSDBs
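
The structural variation change above moves coordinates out of the
structural_variation table and into a separate structural_variation_feature
table. A minimal sketch of that kind of split follows, using an in-memory
SQLite database. Apart from the two table names, everything here is assumed
for illustration: the column names, the temporary structural_variation_old
table, the sample rows and the migration statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Old layout (simplified): coordinates stored on the variation itself.
CREATE TABLE structural_variation_old (
    structural_variation_id INTEGER PRIMARY KEY,
    variation_name   TEXT NOT NULL,
    seq_region_id    INTEGER,
    seq_region_start INTEGER,
    seq_region_end   INTEGER
);
INSERT INTO structural_variation_old VALUES
    (1, 'sv_example_1', 1, 1000, 5000),
    (2, 'sv_example_2', 2, 300, 900);

-- New layout: the variation keeps its identity, features hold the coordinates.
CREATE TABLE structural_variation (
    structural_variation_id INTEGER PRIMARY KEY,
    variation_name TEXT NOT NULL
);
CREATE TABLE structural_variation_feature (
    structural_variation_feature_id INTEGER PRIMARY KEY,
    structural_variation_id INTEGER NOT NULL
        REFERENCES structural_variation (structural_variation_id),
    seq_region_id    INTEGER NOT NULL,
    seq_region_start INTEGER NOT NULL,
    seq_region_end   INTEGER NOT NULL
);

-- Copy identities and coordinates into their new homes.
INSERT INTO structural_variation
    SELECT structural_variation_id, variation_name
    FROM structural_variation_old
    ORDER BY structural_variation_id;
INSERT INTO structural_variation_feature
    (structural_variation_id, seq_region_id, seq_region_start, seq_region_end)
    SELECT structural_variation_id, seq_region_id, seq_region_start, seq_region_end
    FROM structural_variation_old
    ORDER BY structural_variation_id;

-- With the split, one variation can carry a second location.
INSERT INTO structural_variation_feature
    (structural_variation_id, seq_region_id, seq_region_start, seq_region_end)
    VALUES (1, 3, 1200, 5200);
""")

rows = conn.execute("""
    SELECT sv.variation_name, f.seq_region_id, f.seq_region_start, f.seq_region_end
    FROM structural_variation sv
    JOIN structural_variation_feature f USING (structural_variation_id)
    ORDER BY sv.structural_variation_id, f.structural_variation_feature_id
""").fetchall()
for row in rows:
    print(row)
```

Queries that previously read coordinates straight from structural_variation
would need a join of this kind after the change.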
Web display updates (All Species)
---------------------------------
~ For the variation displays, we will add more detail to the
consequence table, including displays showing the DNA sequence change
and amino acid changes.
~ Add a variation annotation panel in the Gene section
Update consequences for transcript alleles (Human and Mouse)
------------------------------------------------------------
Update human and mouse variation transcript alleles due to new gene builds.
Structural variation (Dog, Human, Mouse and Pig)
------------------------------------------------
~ Update structural variation data from DGVa for Human, Mouse, Dog and Pig.
~ Add COSMIC structural variation data (Human).