[ensembl-announce] Intentions for Ensembl release 63

Rhoda Kinsella rhoda at ebi.ac.uk
Mon May 9 13:55:09 BST 2011


Hi,
Please see below a list of intentions declared for Ensembl release 63  
(scheduled for the end of June). Please note that these are intentions  
and are not guaranteed to be in the release.
Regards,
Rhoda Kinsella

=======================================
Declarations of Intentions - Ensembl 63
=======================================


Compara
=======

pairwise alignments (All Species)
---------------------------------


human vs panda lastz

human vs marmoset lastz

human vs microbat lastz

human vs cow lastz

human haplotype alignments for high coverage blastz-net alignments





multiple alignments (All Species)
---------------------------------


6way-primate-epo alignments to incorporate new marmoset

12way-mammal-epo alignments to incorporate new marmoset and cow

19way-amniota-pecan alignments to incorporate new marmoset and cow

35way-mammal low-coverage-epo alignments (new marmoset, panda, cow
and microbat)

5way-fish (new mappings with HMM derived anchors)





syntenies (All Species)
-----------------------


human marmoset synteny

human cow synteny





ProteinTrees and homologies (All Species)
-----------------------------------------

GeneTrees (protein-coding) with new/updated genebuilds and assemblies


Clustering using hcluster_sg

Multiple sequence alignments using MCoffee, without the
exon-disaligner module (AKA decaf)

Phylogenetic reconstruction using TreeBeST

Homology inference including the recent 'possible_ortholog',
'putative gene split' and 'contiguous gene split' exceptions

Pairwise gene-based dN/dS scores for high coverage species pairs only

GeneTree stable ID mapping





ncRNAtrees and homologies (All Species)
---------------------------------------


Classification based on RFAM model

Multiple sequence alignments with infernal

Phylogenetic reconstruction using RAxML

Additional multiple sequence alignments with Prank (w/ genomic flanks)

Additional phylogenetic reconstruction using PhyML and NJ

Phylogenetic tree merging using TreeBeST

Homology inference





families (All Species)
----------------------


Clustering by MCL

Multiple Sequence Alignments with MAFFT

Family stable ID mapping





data dumps (All Species)
------------------------


EMF dumps for 19 way PECAN multiple alignments

BED files for 19 way GERP constrained elements

EMF dumps for 12 way EPO multiple alignments

EMF dumps for 35 way low-coverage alignments

BED files for 35 way low-coverage alignments

EMF dumps for 6 way EPO primate multiple alignments

Data dumps for ProteinTrees

Data dumps for ncRNAtrees





schema changes (All Species)
----------------------------

The linking table 'species_set' will be renamed to 'species_set_genome_db'.

There will be a new "header" table called 'species_set', for which
'species_set_id' will be the (unique) primary key.

species_set_genome_db.species_set_id will become a foreign key
referencing species_set.species_set_id.
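
The header/linking relationship described above could be sketched as follows. This is only an illustration using Python's built-in sqlite3 module, not the MySQL schema itself; the genome_db_id column, the uniqueness constraint on the linking table and the sample IDs are assumptions that are not stated in this announcement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- New "header" table: one row per species set.
CREATE TABLE species_set (
    species_set_id INTEGER PRIMARY KEY
);
-- Former 'species_set' linking table under its new name.
-- genome_db_id is an assumed column, used here for illustration.
CREATE TABLE species_set_genome_db (
    species_set_id INTEGER NOT NULL
        REFERENCES species_set (species_set_id),
    genome_db_id   INTEGER NOT NULL,
    UNIQUE (species_set_id, genome_db_id)
);
""")

# Made-up IDs: species set 1 contains genome_dbs 90 and 91.
conn.execute("INSERT INTO species_set (species_set_id) VALUES (1)")
conn.executemany(
    "INSERT INTO species_set_genome_db VALUES (?, ?)", [(1, 90), (1, 91)]
)


def link_is_rejected(species_set_id, genome_db_id):
    """Return True if the linking row violates a constraint."""
    try:
        conn.execute(
            "INSERT INTO species_set_genome_db VALUES (?, ?)",
            (species_set_id, genome_db_id),
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

With the foreign key in place, a linking row can only refer to a species_set_id that already exists in the header table.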




Core
====

xref projection (All Species)
-----------------------------
Project GO ids and gene names to species. Make alterations to  
zebrafish projections.

EMBL & Genbank dumps (All Species)
----------------------------------
EMBL & Genbank dumps for all species

Update schema version (All Species)
-----------------------------------


# Description:
#   Update schema_version in meta table to 63.

UPDATE meta SET meta_value='63' WHERE meta_key='schema_version';

# Patch identifier
INSERT INTO meta (species_id, meta_key, meta_value)
   VALUES (NULL, 'patch', 'patch_62_63_a.sql|schema_version');
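
One way to check that a database has received this patch is to read the two meta entries back. The sketch below runs the patch statements against a throwaway SQLite database using Python's sqlite3 module; the simplified meta table definition and the helper names schema_version and applied_patches are assumptions made for illustration, not part of the Ensembl API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the core 'meta' table (assumed layout).
conn.execute("""
CREATE TABLE meta (
    meta_id    INTEGER PRIMARY KEY,
    species_id INTEGER,
    meta_key   TEXT NOT NULL,
    meta_value TEXT NOT NULL
)
""")
# State before patching: a release 62 schema.
conn.execute(
    "INSERT INTO meta (species_id, meta_key, meta_value) "
    "VALUES (NULL, 'schema_version', '62')"
)

# The two statements from the patch above.
conn.execute("UPDATE meta SET meta_value='63' WHERE meta_key='schema_version'")
conn.execute(
    "INSERT INTO meta (species_id, meta_key, meta_value) "
    "VALUES (NULL, 'patch', 'patch_62_63_a.sql|schema_version')"
)


def schema_version(connection):
    """Return the schema version as an int, or None if it is not set."""
    row = connection.execute(
        "SELECT meta_value FROM meta WHERE meta_key='schema_version'"
    ).fetchone()
    return int(row[0]) if row else None


def applied_patches(connection):
    """Return the patch file names recorded in meta, sorted by name."""
    rows = connection.execute(
        "SELECT meta_value FROM meta WHERE meta_key='patch' ORDER BY meta_value"
    )
    # Each value has the form '<file name>|<short description>'.
    return [value.split("|")[0] for (value,) in rows]
```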



Indexing changes for core database. (All Species)
-------------------------------------------------


Change the stable ID version to NOT NULL, default 1, in exon_stable_id,
gene_stable_id, transcript_stable_id, translation_stable_id and
gene_archive.

Create a unique index on stable_id and version for tables
exon_stable_id, gene_stable_id, transcript_stable_id,
translation_stable_id and gene_archive.

Create a unique index for table unmapped_object.
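
The effect of the first two changes can be sketched on a single table. The example uses Python's sqlite3 module instead of MySQL; the gene_id column, the index name and the stable ID value are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene_stable_id (
    gene_id   INTEGER PRIMARY KEY,          -- assumed column
    stable_id TEXT    NOT NULL,
    version   INTEGER NOT NULL DEFAULT 1    -- not null, default 1
);
-- Index name is hypothetical.
CREATE UNIQUE INDEX stable_id_version_idx
    ON gene_stable_id (stable_id, version);
""")

# No version given, so the default of 1 applies.
conn.execute(
    "INSERT INTO gene_stable_id (gene_id, stable_id) "
    "VALUES (1, 'ENSG00000000001')"  # made-up stable ID
)


def stored_version(gene_id):
    """Return the version stored for a gene_id."""
    row = conn.execute(
        "SELECT version FROM gene_stable_id WHERE gene_id = ?", (gene_id,)
    ).fetchone()
    return row[0]


def insert_ok(gene_id, stable_id, version):
    """Try to insert a row; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO gene_stable_id VALUES (?, ?, ?)",
            (gene_id, stable_id, version),
        )
    except sqlite3.IntegrityError:
        return False
    return True
```

The same stable_id may still appear with a different version; only an exact (stable_id, version) duplicate is rejected.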



Remove field dbprimary_acc_linkable from external_db table. (All Species)
-------------------------------------------------------------------------
Remove field dbprimary_acc_linkable from external_db table.

Import of new LRGs (Human)
--------------------------


Removal of lowercase letter at the end of all database names (All Species)
--------------------------------------------------------------------------
For release 63 it has been decided to remove the lowercase letter at  
the end of all database names as it is confusing for users and  
provides little meaning about actual data changes.
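
Scripts that build or parse database names may need to cope with both naming styles during the transition. A minimal sketch of stripping the trailing letter is shown below; the function name and the example database names are illustrative assumptions, and the pattern only treats a single lowercase letter that directly follows the final number as a suffix.

```python
import re

# A name ending in '_<digits><one lowercase letter>', e.g. '..._62_37g'.
_TRAILING_LETTER = re.compile(r"^(?P<base>.+_\d+)[a-z]$")


def strip_trailing_letter(db_name):
    """Return db_name without a trailing lowercase letter, if it has one."""
    match = _TRAILING_LETTER.match(db_name)
    return match.group("base") if match else db_name
```

Names that already lack the letter are returned unchanged, so the function can be applied to either style.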

Update xrefs for core databases (All Species)
---------------------------------------------
Update xrefs for Human, Mouse, Rat, Pig, Macaque, Chimp, Orangutan,  
Fugu and Stickleback.

API web documentation overhaul (All Species)
--------------------------------------------
Replace PDoc system with Doxygen + Perl Filter to produce API  
reference web pages.

xref sources to be moved to gene level (All Species)
----------------------------------------------------
The following external database sources have been moved up to the gene
level:

DBASS3, DBASS5, EntrezGene, miRBase, RFAM, UniGene, Uniprot_genename,
WikiGene, MIM_GENE and MIM_MORBID.


Funcgen
=======

Updated Regulation API Tutorial (All Species)
---------------------------------------------


patch_62_63_a - Schema Version (All Species)
--------------------------------------------
This patch updates the schema version

patch_62_63_b - binding_matrix.analysis_id (All Species)
--------------------------------------------------------
This patch changes the analysis_id field of the binding_matrix
table to a smallint.

MicroArray Mapping (All Species)
--------------------------------
Mapping of expression arrays to Ensembl Transcripts has been updated  
for all relevant species i.e. those with new assemblies or gene builds.

RegulatoryFeatureAdaptor::fetch_all (All Species)
-------------------------------------------------
The base fetch_all method has been overridden in the
RegulatoryFeatureAdaptor. It now defaults to returning only the
MultiCell RegulatoryFeatures, as the other generic methods do.

ResultFeatureAdaptor method over-rides (All Species)
----------------------------------------------------
Where appropriate, some of the base feature adaptor methods have been
overridden. This prevents some API errors caused by the nature of
ResultFeature storage.

Added Motif Features for Missing Jaspar Matrices (Human)
--------------------------------------------------------
Motif Features were added for the following Jaspar Matrices:



E2F1: MA0024.1

NFKB: MA0105.1

BHLHE40: PB0111.1;PB0007.1

Nrsf: MA0138.1





Changed Motif Features score to [0-1] relative affinity scale (Human and Mouse)
-------------------------------------------------------------------------------
Instead of showing the absolute score from the MOODS software, we now
display a [0-1] linear relative value between the maximum (1) and
minimum (0) score. This makes it consistent with the API
BindingMatrix::relative_affinity function and makes the score easier
for the user to interpret.
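
As a rough illustration of the rescaling described above, a linear mapping of an absolute score onto [0-1] could look like the sketch below. The function and parameter names are assumptions, and this is not the implementation of BindingMatrix::relative_affinity; it only shows the min-max arithmetic implied by the description.

```python
def relative_score(score, min_score, max_score):
    """Linearly rescale score so that min_score -> 0 and max_score -> 1."""
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    return (score - min_score) / (max_score - min_score)
```

For example, with a minimum of 0.0 and a maximum of 10.0, an absolute score of 5.0 maps to 0.5.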

Added a threshold field to the BindingMatrix table (All Species)
----------------------------------------------------------------
A new threshold float field was added to the Binding Matrix to store  
the minimum score for Motif Features from each matrix (patch_62_63_c).

Added species-specific thresholds to Binding Matrices (Human and Mouse)
-----------------------------------------------------------------------
Added to each Binding Matrix the lowest score for Motif Features
belonging to that matrix and that species. This will make it easier
for people using the API to know if the potential binding affinity for
a given sequence goes above the currently used threshold (i.e. would be
classified as a binding site).
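
The idea can be sketched as follows: the threshold for a matrix is the lowest score among its Motif Features, and a candidate affinity is compared against it. The function names, the (matrix, score) input format and the example scores are all made up for illustration, and treating a score equal to the threshold as passing is an assumption.

```python
def matrix_thresholds(motif_features):
    """Map each matrix name to the lowest score among its motif features.

    motif_features is an iterable of (matrix_name, score) pairs.
    """
    thresholds = {}
    for matrix_name, score in motif_features:
        if matrix_name not in thresholds or score < thresholds[matrix_name]:
            thresholds[matrix_name] = score
    return thresholds


def meets_threshold(thresholds, matrix_name, affinity):
    """Return True if affinity reaches the stored threshold for the matrix."""
    return affinity >= thresholds[matrix_name]


# Made-up scores for two of the Jaspar matrices listed earlier.
example_features = [("MA0024.1", 0.82), ("MA0024.1", 0.75), ("MA0105.1", 0.9)]
example_thresholds = matrix_thresholds(example_features)
```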

Cleaned Regulatory Regions in chromosomal boundaries (Human and Mouse)
----------------------------------------------------------------------
In some rare cases, regulatory regions can extend beyond the boundaries
of sequence regions (like chromosomes). These cases will be removed as
they are likely to be artifactual.
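
A check of this kind could be sketched as below. The Region type, the function names and the sequence region name and length are hypothetical, and 1-based inclusive coordinates are assumed.

```python
from typing import NamedTuple


class Region(NamedTuple):
    seq_region: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive


def within_bounds(region, seq_lengths):
    """Return True if the region lies entirely inside its sequence region."""
    return 1 <= region.start <= region.end <= seq_lengths[region.seq_region]


def drop_out_of_bounds(regions, seq_lengths):
    """Keep only the regions that do not cross a sequence region boundary."""
    return [r for r in regions if within_bounds(r, seq_lengths)]


# Hypothetical data: one sequence region of length 1000.
example_lengths = {"chrA": 1000}
example_regions = [
    Region("chrA", 1, 500),      # fine
    Region("chrA", 900, 1100),   # runs past the end
    Region("chrA", 0, 50),       # starts before position 1
]
example_kept = drop_out_of_bounds(example_regions, example_lengths)
```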

Update of Regulation Metadata (All Species)
-------------------------------------------

CTCF is now classified generically as a "Transcription Factor" instead  
of "Insulator"




Genebuild
=========

New microbat assembly (Microbat)
--------------------------------
A full gene annotation on the new high coverage microbat assembly,  
Myoluc2.0

Removed duplicated dna in panda (Panda)
---------------------------------------
Scaffold dna sequences removed from the dna table

Rabbit xrefs (Rabbit)
---------------------
Missing xrefs added for ncRNAs

Human Vega annotation (Human)
-----------------------------
Manual annotation of human from Havana has been updated. This  
represents the annotation presented in Vega release 43

Zebrafish Vega annotation (Zebrafish)
-------------------------------------
Manual annotation of zebrafish from Havana has been updated. This  
represents the annotation presented in Vega release 43

GRCh37.p4 (Human)
-----------------
GRCh37.p4 added to the human databases.

GRCh37.p4 annotation (Human)
----------------------------
Gene annotation of the patches in the otherfeatures db.

Human cDNA update (Human)
-------------------------
A new cDNA db for human.

Mouse cDNA update (Mouse)
-------------------------
A new cDNA db for mouse.

New Cow Assembly (Cow)
----------------------
The first genebuild on cow assembly UMD3.1.

Update to Ensembl-Havana GENCODE gene set (release 8) (Human)
-------------------------------------------------------------
Update to Ensembl-Havana GENCODE gene set (release 8) - this is based  
on updated Ensembl gene set and latest Havana gene annotation.

Flagging obsolete Uniprot proteins (All Species)
------------------------------------------------
Flag the obsolete proteins in Uniprot used as supporting evidence

Flagging obsolete Ensembl proteins (All Species)
------------------------------------------------
Flag obsolete human Ensembl proteins used as supporting evidence

Logic name update (All Species)
-------------------------------
Whenever possible, logic names updated to be consistent across all  
databases

Zebrafish Vega merge (Zebrafish)
--------------------------------
A new Vega gene set has been merged with the Ensembl geneset from  
release 61.


Mart
====

BioMart 63 databases (All Species)
----------------------------------
Full build of all 7 marts.


Variation
=========

New rhesus macaque variation database (Macaque)
-----------------------------------------------
Based on dbSNP 131

Updates to human phenotype associations (Human)
-----------------------------------------------
OMIM, UniProt, NHGRI GWAS catalog, HGMD mutations, COSMIC

New mouse variation database (Mouse)
------------------------------------
Based on dbSNP 132

Add attrib_id column to variation_set (All Species)
---------------------------------------------------
An attrib_id column is added to variation_set in order to be able to  
provide general and human-friendly names to variation sets without  
breaking the web display.

Update structural variation data from DGVa (Dog, Human, Macaque, Mouse and Pig)
-------------------------------------------------------------------------------
DGVa

Schema changes (All Species)
----------------------------
# structural variation schema changes:

- Rename the columns bound_start to inner_start and bound_end to
inner_end

- Add a column for validation status

- Change the column class to class_attrib_id, using more detailed SO
terms.


# moved failed descriptions into attribute table

LRG data (Human)
----------------
import LRG variant data

add LRG consequences to the database

New individual genotypes (Human)
--------------------------------
Individual genotypes from Penn State University:


Han Chinese Individual (YanHuang Project)

Seong-Jin Kim (SJK, GUMS/KOBIC)

Anonymous Irish Male

Individual from the Extinct Palaeo-Eskimo Saqqaq (Saqqaq Genome Project)

Individual from the Extinct Palaeo-Eskimo Saqqaq, high confidence SNPs  
(Saqqaq Genome Project)

Anonymous Korean individual, AK1 (Genomic Medicine Institute)

Misha Angrist (Personal Genome Project)

Henry Louis Gates Jr (Personal Genome Project)

Henry Louis Gates Sr (Personal Genome Project)

Rosalynn Gill (Personal Genome Project)

Marjolein Kriek (Leiden University Medical Centre)

Stephen Quake (Stanford)



update variation consequences (Cow, Zebrafish and Human)
--------------------------------------------------------
update variation consequences on human, zebrafish and cow due to new  
gene sets


EnsemblGenomes
==============

New core database for Yeast (Yeast)
-----------------------------------
New core database for Saccharomyces cerevisiae to reflect the new  
assembly and genebuild from SGD

New otherfeatures database for Yeast (Yeast)
--------------------------------------------
Rebuilt otherfeatures database with new EST alignments reflecting the
new assembly from SGD.

New funcgen database for Yeast (Yeast)
--------------------------------------
New functional genomics database for Yeast with new probe mapping to  
reflect the assembly update from SGD.

New variation database for Yeast (Yeast)
----------------------------------------
New variation database for Yeast with mapped variation features to  
reflect the latest assembly from SGD.



Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton
Cambridge CB10 1SD,
UK.
