[ensembl-announce] Intentions for Ensembl release 63
Rhoda Kinsella
rhoda at ebi.ac.uk
Mon May 9 13:55:09 BST 2011
Hi,
Please see below a list of intentions declared for Ensembl release 63
(scheduled for the end of June). Please note that these are intentions
and are not guaranteed to be in the release.
Regards,
Rhoda Kinsella
=======================================
Declarations of Intentions - Ensembl 63
=======================================
Compara
=======
pairwise alignments (All Species)
---------------------------------
human vs panda lastz
human vs marmoset lastz
human vs microbat lastz
human vs cow lastz
human haplotype alignments for high coverage blastz-net alignments
multiple alignments (All Species)
---------------------------------
6way-primate-epo alignments to incorporate new marmoset
12way-mammal-epo alignments to incorporate new marmoset and cow
19way-amniota-pecan alignments to incorporate new marmoset and cow
35way-mammal low-coverage-epo alignments (new marmoset, panda, cow
and microbat)
5way-fish (new mappings with HMM derived anchors)
syntenies (All Species)
-----------------------
human marmoset synteny
human cow synteny
ProteinTrees and homologies (All Species)
-----------------------------------------
GeneTrees (protein-coding) with new/updated genebuilds and assemblies
Clustering using hcluster_sg
Multiple sequence alignments using MCoffee, without the exon-
disaligner module (AKA decaf)
Phylogenetic reconstruction using TreeBeST
Homology inference including the recent 'possible_ortholog', 'putative
gene split' and 'contiguous gene split' exceptions
Pairwise gene-based dN/dS scores for high coverage species pairs only
GeneTree stable ID mapping
ncRNAtrees and homologies (All Species)
---------------------------------------
Classification based on RFAM model
Multiple sequence alignments with infernal
Phylogenetic reconstruction using RAxML
Additional multiple sequence alignments with Prank (w/ genomic flanks)
Additional phylogenetic reconstruction using PhyML and NJ
Phylogenetic tree merging using TreeBeST
Homology inference
families (All Species)
----------------------
Clustering by MCL
Multiple Sequence Alignments with MAFFT
Family stable ID mapping
data dumps (All Species)
------------------------
EMF dumps for 19 way PECAN multiple alignments
BED files for 19 way GERP constrained elements
EMF dumps for 12 way EPO multiple alignments
EMF dumps for 35 way low-coverage alignments
BED files for 35 way low-coverage alignments
EMF dumps for 6 way EPO primate multiple alignments
Data dumps for ProteinTrees
Data dumps for ncRNAtrees
schema changes (All Species)
----------------------------
The linking table 'species_set' is renamed to 'species_set_genome_db'.
There will be a new "header" table called 'species_set', for which
'species_set_id' will be the (unique) primary key.
species_set_genome_db.species_set_id will become a foreign key
referencing species_set.species_set_id.
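The table split described above can be sketched roughly as follows. This is a minimal illustration using Python's sqlite3 module on an in-memory database, not the actual Compara patch: the 'genome_db_id' column, the temporary table name 'species_set_old' and the demo rows are assumptions made for the example.

```python
import sqlite3

def split_species_set(conn):
    """Sketch: turn the old linking table into header + linking tables."""
    # Move the old linking table out of the way (temporary name is hypothetical).
    conn.execute("ALTER TABLE species_set RENAME TO species_set_old")
    # New header table: species_set_id is the primary key.
    conn.execute("CREATE TABLE species_set (species_set_id INTEGER PRIMARY KEY)")
    conn.execute(
        "INSERT INTO species_set (species_set_id) "
        "SELECT DISTINCT species_set_id FROM species_set_old"
    )
    # Renamed linking table, with a foreign key pointing at the header table.
    conn.execute(
        "CREATE TABLE species_set_genome_db ("
        " species_set_id INTEGER NOT NULL REFERENCES species_set (species_set_id),"
        " genome_db_id INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO species_set_genome_db (species_set_id, genome_db_id) "
        "SELECT species_set_id, genome_db_id FROM species_set_old"
    )
    conn.execute("DROP TABLE species_set_old")
    conn.commit()

def build_demo():
    """Build a tiny pre-63 style table with made-up rows, then apply the split."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute(
        "CREATE TABLE species_set (species_set_id INTEGER, genome_db_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO species_set VALUES (?, ?)", [(1, 10), (1, 11), (2, 10)]
    )
    split_species_set(conn)
    return conn
```

After the split, a row in 'species_set_genome_db' can only refer to a set that exists in the header table, which is the point of introducing the key.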
Core
====
xref projection (All Species)
-----------------------------
Project GO ids and gene names to species. Make alterations to
zebrafish projections.
EMBL & Genbank dumps (All Species)
----------------------------------
EMBL & Genbank dumps for all species
Update schema version (All Species)
-----------------------------------
# Description:
# Update schema_version in meta table to 63.
UPDATE meta SET meta_value='63' WHERE meta_key='schema_version';
# Patch identifier
INSERT INTO meta (species_id, meta_key, meta_value)
VALUES (NULL, 'patch', 'patch_62_63_a.sql|schema_version');
Indexing changes for core database. (All Species)
-------------------------------------------------
Change stable ID version to not null, default 1 in exon_stable_id,
gene_stable_id, transcript_stable_id, translation_stable_id,
gene_archive.
Create unique index on stable_id and version for tables
exon_stable_id, gene_stable_id, transcript_stable_id,
translation_stable_id, gene_archive.
Create a unique index for table unmapped_object.
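As a rough illustration of what the version default and the unique index imply, here is a minimal sketch using Python's sqlite3 module. Only gene_stable_id is shown, the table is reduced to three columns, and the index name 'stable_id_version_idx' is an assumption for the example rather than the name used in the real patch.

```python
import sqlite3

def make_gene_stable_id(conn):
    """Sketch of gene_stable_id with a NOT NULL version defaulting to 1."""
    conn.execute(
        "CREATE TABLE gene_stable_id ("
        " gene_id INTEGER PRIMARY KEY,"
        " stable_id TEXT NOT NULL,"
        " version INTEGER NOT NULL DEFAULT 1)"
    )
    # Hypothetical index name; enforces one row per (stable_id, version) pair.
    conn.execute(
        "CREATE UNIQUE INDEX stable_id_version_idx "
        "ON gene_stable_id (stable_id, version)"
    )

def try_insert(conn, gene_id, stable_id, version=None):
    """Insert a row; return False if the unique index rejects it."""
    try:
        if version is None:
            # Omitting version lets the column default (1) apply.
            conn.execute(
                "INSERT INTO gene_stable_id (gene_id, stable_id) VALUES (?, ?)",
                (gene_id, stable_id),
            )
        else:
            conn.execute(
                "INSERT INTO gene_stable_id (gene_id, stable_id, version) "
                "VALUES (?, ?, ?)",
                (gene_id, stable_id, version),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The same stable ID may appear with different versions, but a repeated (stable_id, version) pair is rejected.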
Remove field dbprimary_acc_linkable from external_db table. (All
Species)
-------------------------------------------------------------------------
Remove field dbprimary_acc_linkable from external_db table.
Import of new LRGs (Human)
--------------------------
Removal of lowercase letter at the end of all database names (All
Species)
--------------------------------------------------------------------------
For release 63 it has been decided to remove the lowercase letter at
the end of all database names as it is confusing for users and
provides little meaning about actual data changes.
Update xrefs for core databases (All Species)
---------------------------------------------
Update xrefs for Human, Mouse, Rat, Pig, Macaque, Chimp, Orangutan,
Fugu and Stickleback.
API web documentation overhaul (All Species)
--------------------------------------------
Replace PDoc system with Doxygen + Perl Filter to produce API
reference web pages.
xref sources to be moved to gene level (All Species)
----------------------------------------------------
The following external database sources have been moved up to the gene
level:
DBASS3, DBASS5, EntrezGene, miRBase, RFAM, UniGene, Uniprot_genename,
WikiGene, MIM_GENE and MIM_MORBID.
Funcgen
=======
Updated Regulation API Tutorial (All Species)
---------------------------------------------
patch_62_63_a - Schema Version (All Species)
--------------------------------------------
This patch updates the schema version
patch_62_63_b - binding_matrix.analysis_id (All Species)
--------------------------------------------------------
This patch updates the analysis_id field of the binding_matrix
table to a smallint
MicroArray Mapping (All Species)
--------------------------------
Mapping of expression arrays to Ensembl Transcripts has been updated
for all relevant species i.e. those with new assemblies or gene builds.
RegulatoryFeatureAdaptor::fetch_all (All Species)
-------------------------------------------------
The base fetch_all method has been overridden for the
RegulatoryFeatureAdaptor. It now defaults to returning only the
MultiCell RegulatoryFeatures, as the other generic methods do.
ResultFeatureAdaptor method over-rides (All Species)
----------------------------------------------------
Where appropriate, some of the base feature adaptor methods have been
overridden. This prevents some API errors caused by the nature of
ResultFeature storage.
Added Motif Features for Missing Jaspar Matrices (Human)
--------------------------------------------------------
Motif Features were added for the following Jaspar Matrices:
E2F1: MA0024.1
NFKB: MA0105.1
BHLHE40: PB0111.1;PB0007.1
Nrsf: MA0138.1
Changed Motif Features score to [0-1] relative affinity scale (Human
and Mouse)
-------------------------------------------------------------------------------
Instead of showing the absolute score from the MOODS software, we now
display a [0-1] linear relative value between the maximum (1) and
minimum (0) score. This makes it consistent with the API
BindingMatrix::relative_affinity function and makes it easier for
the user to interpret the score.
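The linear rescaling described above amounts to a min-max normalisation. The sketch below shows the arithmetic only; the function name and arguments are hypothetical, and it is not the BindingMatrix::relative_affinity implementation from the Perl API.

```python
def relative_score(score, min_score, max_score):
    """Map an absolute score linearly onto [0, 1].

    min_score maps to 0 and max_score maps to 1. The names are
    illustrative; the real minimum and maximum depend on the matrix.
    """
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    return (score - min_score) / (max_score - min_score)
```

For example, an absolute score halfway between the minimum and the maximum is displayed as 0.5.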
Added a threshold field to the BindingMatrix table (All Species)
----------------------------------------------------------------
A new threshold float field was added to the Binding Matrix to store
the minimum score for Motif Features from each matrix (patch_62_63_c).
Added species-specific thresholds to Binding Matrices (Human and Mouse)
-----------------------------------------------------------------------
Added to each Binding Matrix the lowest score for Motif Features
belonging to that matrix and that species. This will make it easier
for people using the API to know if the potential binding affinity for
a given sequence goes above the currently used threshold (i.e. would be
classified as a binding site).
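A minimal sketch of how such per-matrix thresholds could be derived and used, assuming motif features for one species are available as (matrix name, score) pairs. The function names, matrix names and scores are made up for illustration, and treating a score equal to the threshold as passing is also an assumption.

```python
def thresholds_by_matrix(motif_features):
    """Return {matrix_name: lowest motif feature score} for one species."""
    thresholds = {}
    for matrix_name, score in motif_features:
        current = thresholds.get(matrix_name)
        if current is None or score < current:
            thresholds[matrix_name] = score
    return thresholds

def passes_threshold(score, threshold):
    """True if a candidate score reaches the stored threshold."""
    return score >= threshold

# Illustrative data only: not real matrices or scores.
demo_features = [("matrix_a", 0.9), ("matrix_a", 0.82), ("matrix_b", 0.95)]
demo_thresholds = thresholds_by_matrix(demo_features)
```

An API user would then compare the affinity computed for a sequence of interest against the threshold stored for the relevant matrix and species.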
Cleaned Regulatory Regions in chromosomal boundaries (Human and Mouse)
----------------------------------------------------------------------
In some rare cases, regulatory regions can pass the boundaries of
sequence regions (like chromosomes). These cases will be removed as
they are likely to be artifactual.
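The clean-up described above can be pictured as a simple coordinate filter. This is a sketch under stated assumptions: features are represented as (name, seq_region, start, end) tuples with 1-based inclusive coordinates, and the function name, region name and lengths are hypothetical rather than taken from the Funcgen pipeline.

```python
def drop_out_of_bounds(features, seq_region_lengths):
    """Keep only features that lie entirely within their sequence region."""
    kept = []
    for name, seq_region, start, end in features:
        length = seq_region_lengths[seq_region]
        # 1-based inclusive coordinates: valid positions are 1..length.
        if 1 <= start <= end <= length:
            kept.append((name, seq_region, start, end))
    return kept

# Illustrative data only.
demo_lengths = {"chr_demo": 1000}
demo_features = [
    ("rf1", "chr_demo", 10, 200),    # fully inside the region
    ("rf2", "chr_demo", 900, 1100),  # runs past the end of the region
    ("rf3", "chr_demo", 0, 50),      # starts before position 1
]
```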
Update of Regulation Metadata (All Species)
-------------------------------------------
CTCF is now classified generically as a "Transcription Factor" instead
of "Insulator"
Genebuild
=========
New microbat assembly (Microbat)
--------------------------------
A full gene annotation on the new high coverage microbat assembly,
Myoluc2.0
Removed duplicated dna in panda (Panda)
---------------------------------------
Scaffold dna sequences removed from the dna table
Rabbit xrefs (Rabbit)
---------------------
Missing xrefs added for ncRNAs
Human Vega annotation (Human)
-----------------------------
Manual annotation of human from Havana has been updated. This
represents the annotation presented in Vega release 43
Zebrafish Vega annotation (Zebrafish)
-------------------------------------
Manual annotation of zebrafish from Havana has been updated. This
represents the annotation presented in Vega release 43
GRCh37.p4 (Human)
-----------------
GRCh37.p4 added to the human databases.
GRCh37.p4 annotation (Human)
----------------------------
Gene annotation of the patches in the otherfeatures db.
Human cDNA update (Human)
-------------------------
A new cDNA db for human.
Mouse cDNA update (Mouse)
-------------------------
A new cDNA db for mouse.
New Cow Assembly (Cow)
----------------------
The first genebuild on cow assembly UMD3.1.
Update to Ensembl-Havana GENCODE gene set (release 8) (Human)
-------------------------------------------------------------
Update to Ensembl-Havana GENCODE gene set (release 8) - this is based
on updated Ensembl gene set and latest Havana gene annotation.
Flagging obsolete Uniprot proteins (All Species)
------------------------------------------------
Flag the obsolete proteins in Uniprot used as supporting evidence
Flagging obsolete Ensembl proteins (All Species)
------------------------------------------------
Flag obsolete human Ensembl proteins used as supporting evidence
Logic name update (All Species)
-------------------------------
Whenever possible, logic names updated to be consistent across all
databases
Zebrafish Vega merge (Zebrafish)
--------------------------------
A new Vega gene set has been merged with the Ensembl geneset from
release 61.
Mart
====
BioMart 63 databases (All Species)
----------------------------------
Full build of all 7 marts.
Variation
=========
New rhesus macaque variation database (Macaque)
-----------------------------------------------
Based on dbSNP 131
Updates to human phenotype associations (Human)
-----------------------------------------------
OMIM, UniProt, NHGRI GWAS catalog, HGMD mutations, COSMIC
New mouse variation database (Mouse)
------------------------------------
Based on dbSNP 132
Add attrib_id column to variation_set (All Species)
---------------------------------------------------
An attrib_id column is added to variation_set in order to be able to
provide general and human-friendly names to variation sets without
breaking the web display.
Update structural variation data from DGVa (Dog, Human, Macaque, Mouse
and Pig)
-------------------------------------------------------------------------------
DGVa
Schema changes (All Species)
----------------------------
# structural variation schema changes:
- Change the columns name from bound_start to inner_start and
bound_end to inner_end
- Add a column for validation status
- Change the column class to class_attrib_id, using more detailed SO
terms.
# moved failed descriptions into attribute table
LRG data (Human)
----------------
import LRG variant data
add LRG consequences to the database
New individual genotypes (Human)
--------------------------------
Individual genotypes from Penn State University:
Han Chinese Individual (YanHuang Project)
Seong-Jin Kim (SJK, GUMS/KOBIC)
Anonymous Irish Male
Individual from the Extinct Palaeo-Eskimo Saqqaq (Saqqaq Genome Project)
Individual from the Extinct Palaeo-Eskimo Saqqaq, high confidence SNPs
(Saqqaq Genome Project)
Anonymous Korean individual, AK1 (Genomic Medicine Institute) :
Individual genotype
Misha Angrist (Personal Genome Project)
Henry Louis Gates Jr (Personal Genome Project)
Henry Louis Gates Sr (Personal Genome Project)
Rosalynn Gill (Personal Genome Project)
Marjolein Kriek (Leiden University Medical Centre)
Stephen Quake (Stanford)
update variation consequences (Cow, Zebrafish and Human)
--------------------------------------------------------
update variation consequences on human, zebrafish and cow due to new
gene sets
EnsemblGenomes
==============
New core database for Yeast (Yeast)
-----------------------------------
New core database for Saccharomyces cerevisiae to reflect the new
assembly and genebuild from SGD
New otherfeatures database for Yeast (Yeast)
--------------------------------------------
Rebuilt otherfeatures database with new EST alignments reflecting the
new assembly from SGD.
New funcgen database for Yeast (Yeast)
--------------------------------------
New functional genomics database for Yeast with new probe mapping to
reflect the assembly update from SGD.
New variation database for Yeast (Yeast)
----------------------------------------
New variation database for Yeast with mapped variation features to
reflect the latest assembly from SGD.
Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton
Cambridge CB10 1SD,
UK.