[ensembl-announce] Intentions for Ensembl release 62
Daniel Sobral
sobral at ebi.ac.uk
Thu Feb 24 13:34:15 GMT 2011
Please see below the list of intentions declared for Ensembl 62 (scheduled
for mid-April).
Note that these are intentions and are not guaranteed to be in the release.
Regards,
Daniel Sobral
=======================================
Declarations of Intentions - Ensembl 62
=======================================
Compara
=======
Families (all species)
----------------------
Updated MCL families including all Ensembl transcript isoforms and the
newest UniProt Metazoa.
* Clustering by MCL
* Multiple Sequence Alignments with MAFFT
* Family stable ID mapping
Gene Homologies (all species)
-----------------------------
GeneTrees (protein-coding) with new/updated genebuilds and assemblies
* Clustering using hcluster_sg
* Multiple sequence alignments using MCoffee
* Phylogenetic reconstruction using TreeBeST
* Homology inference including the recent 'possible_ortholog', 'putative
gene split' and 'contiguous gene split' exceptions
* Pairwise gene-based dN/dS scores for high coverage species pairs only
* GeneTree stable ID mapping
GeneTrees (ncRNA) with new/updated genebuilds and assemblies (all species)
--------------------------------------------------------------------------
* Classification based on RFAM model
* Multiple sequence alignments with Infernal
* Phylogenetic reconstruction using RAxML
* Additional multiple sequence alignments with Prank (w/ genomic flanks)
* Additional phylogenetic reconstruction using PhyML and NJ
* Phylogenetic tree merging using TreeBeST
* Homology inference
Pairwise Alignments (all species)
---------------------------------
* Non-reference alignments for human vs high coverage blastz-net
* human vs gibbon lastz
* human vs marmoset lastz
* human vs rabbit lastz
* xenopus vs mouse tblat-net
* xenopus vs chicken tblat-net
* xenopus vs tetraodon tblat-net
* xenopus vs human tblat-net
* xenopus vs danio tblat-net
Multiple alignments (all species)
---------------------------------
* update 6way-primate-epo alignments to incorporate new marmoset
seq_region names
* update 12way-mammal-epo alignments to incorporate new marmoset
seq_region names
* update 19way-amniota-pecan alignments to incorporate new marmoset
seq_region names
* 35way-mammal low-coverage-epo alignments (addition of gibbon and new
marmoset seq_region names)
Schema changes (all species)
----------------------------
* meta.meta_value has been extended to TEXT (previously it was VARCHAR)
and the corresponding indexes have been fixed.
* analysis.module has been extended to VARCHAR(255); previously it was
VARCHAR(80)
* A mapping_session.prefix column has been added to allow EnsemblGenomes
to track their different types of stable IDs
Core
====
Bio::EnsEMBL::DBFile::FileAdaptor (all species)
-----------------------------------------------
A new base class for accessing data from flat files
Bio::EnsEMBL::DBFile::CollectionAdaptor (all species)
-----------------------------------------------------
A new class to access Collection Feature data stored in flat files.
patch_61_62_a: Schema version patch (all species)
-------------------------------------------------
Patch file patch_61_62_a.sql updates the schema version of a core
database to 62.
patch_61_62_b: synonym field extension (all species)
----------------------------------------------------
Patch file patch_61_62_b.sql extends the synonym field in the
external_synonym table to 100 characters.
patch_61_62_c: index for db_name (all species)
----------------------------------------------
Patch file patch_61_62_c.sql adds a unique index on the db_name field of
the external_db table.
Ontology database (all species)
-------------------------------
The ensembl_ontology_62 database will contain the latest available GO, SO
and EFO ontologies. Synonyms will now be stored in a new 'synonym' table.
Schema diagrams for online documentation (all species)
------------------------------------------------------
Schema diagrams for the online core database documentation.
Xrefs (Zebrafish)
-----------------
Update external database references.
xref projection (all species)
-----------------------------
Project GO IDs and gene names to other species. Make alterations to
zebrafish projections.
EMBL/Genbank dumps (all species)
--------------------------------
EMBL and GenBank dumps for all species.
patch_61_62_d: remove field display_label_linkable (all species)
----------------------------------------------------------------
Patch file patch_61_62_d.sql removes the display_label_linkable field from
the external_db table.
Import of LRG sequences (Human)
-------------------------------
Newly published LRG sequences will be imported
Ontology API (all species)
--------------------------
Addition of a fetch_all_by_name() method to the OntologyTermAdaptor to
fetch ontology terms by their names or synonyms, and of a synonym()
method on OntologyTerm objects to get their synonyms.
xrefs (Human)
-------------
Update human external database references.
xrefs (Mouse)
-------------
Update external database references
Funcgen
=======
patch_61_62_a Update meta schema version (all species)
------------------------------------------------------
The schema_version entry in the meta table will be updated to 62.
patch_61_62_b motif_feature.stable_id (all species)
---------------------------------------------------
A stable_id will be added to the motif_feature table. NOTE: This is not
an 'Ensembl stable ID', and will only be used internally to enable
inter-DB linking between the variation and funcgen schemas.
patch_61_62_c feature_type Sequence Ontology fields (all species)
-----------------------------------------------------------------
so_name and so_accession fields will be added to the feature_type table
to enable display of Sequence Ontology information and linking to the
ensembl_ontology database.
patch_61_62_d: Experimental Group Description (all species)
-----------------------------------------------------------
This change supports better annotation of data sources.
ResultFeature DBFile Collections (Human, Mouse)
-----------------------------------------------
Where possible, data from the result_feature table has been moved out of
the database into indexed binary '.col' files. The ResultFeatureAdaptor
now uses the new core DBFile::CollectionAdaptor and DBFile::FileAdaptor
to access these data directly.
Array Mapping (all species)
---------------------------
Genomic and transcript alignments and transcript xref annotation have
been re-run for all species with new genome assemblies or genebuilds.
Illumina Methylation Arrays (Human)
-----------------------------------
HumanMethylation27K and HumanMethylation450K have now been imported.
Update of Human functional genomics data (Human)
------------------------------------------------
New datasets from ENCODE and the Epigenomics Roadmap, covering existing
cell lines. The Regulatory Build was rerun for cell lines with new data.
Binding Matrix: simpler representation of matrix frequencies (all species)
--------------------------------------------------------------------------
This change aims to simplify the representation of matrix frequencies,
moving towards a form that can be applied to different formats.
patch_61_62_e Addition of dbfile_registry table (all species)
-------------------------------------------------------------
A dbfile_registry table has been added to store the file paths of result
feature collection (.col) files.
PolIII Transcription Associated Regulatory Features (all species)
-----------------------------------------------------------------
The Regulatory Build now also annotates Regulatory Features associated
with Pol III transcription.
Genebuild
=========
Patch for panda (Panda)
-----------------------
Transcript supporting features added for pseudogenes
Patch for rabbit (Rabbit)
-------------------------
Geneset re-clustered
Transcript supporting features added for pseudogenes
Assembly updated to match the official NCBI assembly
Patch for mouse (Mouse)
-----------------------
Patched the mouse Ensembl-Havana merged gene set to maintain its
consistency with the latest CCDS gene set (as of 9 February 2011).
Human Vega annotation (Human)
-----------------------------
Manual annotation of human from Havana has been updated. This represents
the annotation presented in Vega release 42
Patch for marmoset (Marmoset)
-----------------------------
Deprecated contig sequences removed
Raw-computes re-run
Geneset re-clustered
Mapping added
Transcript supporting features for pseudogenes added
New seq region synonyms
Human otherfeatures (Human)
---------------------------
Removed EST alignments with hcoverage <90 and perc_ident <94.
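The EST filter above can be sketched as follows. This is a minimal illustration, not the Ensembl pipeline code: the EstAlignment record and its fields are hypothetical, with hcoverage assumed to be the percentage of the EST covered by the alignment and perc_ident its percentage identity. The sketch also assumes an alignment is kept only if it meets both thresholds; the wording above could alternatively be read as removing only alignments that fall below both.

```python
from dataclasses import dataclass


# Hypothetical record type; field names mirror the thresholds named above,
# but this is not the Ensembl schema.
@dataclass
class EstAlignment:
    est_id: str
    hcoverage: float   # assumed: percent of the EST (hit) covered by the alignment
    perc_ident: float  # assumed: percent identity of the aligned region


MIN_HCOVERAGE = 90.0
MIN_PERC_IDENT = 94.0


def passes_filter(aln: EstAlignment) -> bool:
    # Assumption: keep an alignment only if it meets BOTH thresholds,
    # i.e. remove it if it falls below EITHER one.
    return aln.hcoverage >= MIN_HCOVERAGE and aln.perc_ident >= MIN_PERC_IDENT


alignments = [
    EstAlignment("est_a", 98.0, 99.1),
    EstAlignment("est_b", 85.0, 99.0),  # coverage too low
    EstAlignment("est_c", 95.0, 92.5),  # identity too low
]
kept = [a.est_id for a in alignments if passes_filter(a)]
print(kept)  # ['est_a']
```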
GENCODE gene set update (release 7) (Human)
-------------------------------------------
Update to the Ensembl/Havana GENCODE gene set based on a complete
re-annotation of the Ensembl gene set and combined with the latest Vega
gene set
Human cDNA update (Human)
-------------------------
New cDNA db for human.
GRCh37.p3 (Human)
-----------------
Adding the third patch release for the human assembly. This alters the
assembly information in all human databases.
GRCh37.p3 annotation (Human)
----------------------------
Annotation of the patches in the otherfeatures database.
Gibbon build (Gibbon)
---------------------
First release of gene build for Gibbon, Nomascus leucogenys (Northern
white-cheeked gibbon). Assembly: Nleu1.0.
Zebrafish WGS/clone assembly track (Zebrafish)
----------------------------------------------
Added a WGS/clone assembly track.
Flagging obsolete Uniprot proteins (all species)
------------------------------------------------
Transcripts whose UniProt evidence has been removed will be flagged with a
transcript attribute.
Flagging obsolete Ensembl proteins (Sloth, Armadillo, Kangaroo rat,
Tenrec, Hedgehog, Cat, Wallaby, Mouse Lemur, Pika, Bushbaby, Chimp,
Orangutan, Rock Hyrax, Megabat, Shrew, Ground Squirrel, Tarsier, Tree
Shrew, Dolphin, Alpaca)
---------------------------------------------------------------------
Transcripts whose supporting evidence from these 2x genomes has been
removed will be flagged with a transcript attribute.
Mouse RefSeq import (Mouse)
---------------------------
RefSeq annotations imported into the mouse otherfeatures database
Xenopus tropicalis new assembly 4.2 (Xenopus)
---------------------------------------------
New assembly of Xenopus tropicalis version 4.2
Human Body Map missing liver (Human)
------------------------------------
The missing liver models will be added.
Mouse cDNA update (Mouse)
-------------------------
New cDNA db for mouse.
Updated human otherfeatures db: new CCDS import (Human)
-------------------------------------------------------
Update to CCDS set for human
Updated mouse otherfeatures db: New CCDS import (Mouse)
-------------------------------------------------------
Update to CCDS set for mouse
Mart
====
Mart databases (all species)
----------------------------
Full build of all 7 marts for all species
Variation
=========
New variation consequences (all species)
----------------------------------------
New variation consequences, following a schema change that links
consequences to an allele and a transcript rather than just to a
variation and a transcript.
HGVS coordinates stored in database (all species)
-------------------------------------------------
HGVS coordinates for variant alleles will be pre-calculated and stored
in the database. These were previously calculated on the fly.
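For readers unfamiliar with HGVS notation, the sketch below builds genomic (g.) descriptions for the two simplest cases, a single-nucleotide substitution and a deletion. It is a hypothetical helper, not the Ensembl API, and it deliberately skips parts of the HGVS rules, such as shifting deletions within repeats to their most 3' position. The reference name NC_000001.10 is only an example.

```python
def hgvs_genomic(ref_name: str, start: int, ref: str, alt: str) -> str:
    """Return a simple HGVS genomic description (hypothetical helper).

    start is the 1-based position of the first reference base.
    Only single-nucleotide substitutions and plain deletions are handled.
    """
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        # Substitution, e.g. NC_000001.10:g.12345A>G
        return f"{ref_name}:g.{start}{ref}>{alt}"
    if ref and not alt:
        end = start + len(ref) - 1
        # One deleted base uses a single position; longer deletions use a range.
        span = str(start) if end == start else f"{start}_{end}"
        return f"{ref_name}:g.{span}del"
    raise ValueError("only substitutions and simple deletions are sketched here")


print(hgvs_genomic("NC_000001.10", 12345, "A", "G"))  # NC_000001.10:g.12345A>G
print(hgvs_genomic("NC_000001.10", 200, "CT", ""))    # NC_000001.10:g.200_201del
```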
New variation database (Human)
------------------------------
The human variation database will be built fresh from dbSNP release 132
due to data updates by dbSNP.
Data import/update from external sources (Human)
------------------------------------------------
Allele frequencies from the 1000 Genomes Project. Variation submissions on
LRGs from UniProt. Structural variation data from DGVa. Somatic mutation
data from COSMIC. Variation phenotype data from OMIM, NHGRI, UniProt and
EGA. Variation synonyms from UniProt.
New variation database (Mouse)
------------------------------
Fresh build from dbSNP 132.
Data import/update from external sources (Dog, Mouse, Pig)
----------------------------------------------------------
Structural variation data from DGVa.
patch_61_62_a: Meta schema version (all species)
------------------------------------------------
Meta schema version update
patch_61_62_b: Alter failed_variation (all species)
---------------------------------------------------
Drop the subsnp_id column from failed_variation
patch_61_62_c: Introduce failed_allele table (all species)
----------------------------------------------------------
Add a table to store failed alleles
patch_61_62_d: Add type column to source table (all species)
------------------------------------------------------------
Introduce a type column (enum) to indicate the type of a source
patch: Table to store study data (all species)
----------------------------------------------
A new table to store descriptions of studies will be introduced, and
foreign keys to this table will be added to the variation_annotation
and structural_variation tables.
patch: Rationalize data type for allele columns (all species)
-------------------------------------------------------------
The data type of allele columns in e.g. allele, variation and
variation_feature will be harmonized to use varchar.
patch: Table to store supporting structural variations (all species)
--------------------------------------------------------------------
A new table to store supporting structural variations will be introduced
patch: Table to store variation consequences on regulatory regions (all species)
--------------------------------------------------------------------------------
A table to support storing variation consequences on regulatory regions
will be introduced
patch: Re-design of the transcript_variation table (all species)
----------------------------------------------------------------
Variation consequences will be stored by allele instead of by variation.
The transcript_variation table will be modified to accommodate this. In
addition, HGVS coordinates will be stored as well.
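To illustrate why storing consequences per allele matters: a variation with two alternate alleles can have a different effect for each allele on the same transcript, and a single per-variation row cannot keep those apart. The sketch below is a hypothetical, simplified model of per-allele rows; the class, field names and identifiers are illustrative and are not the actual transcript_variation columns, and the consequence terms are Sequence Ontology-style names used only as examples.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TranscriptVariationAllele:
    """One row per (variation, allele, transcript); hypothetical fields."""
    variation_name: str
    transcript_id: str
    allele: str
    consequence_types: List[str]
    hgvs_transcript: Optional[str] = None  # stored, not computed on the fly


def consequences_by_allele(
    rows: List[TranscriptVariationAllele], transcript_id: str
) -> Dict[str, List[str]]:
    """Collect the consequences of each allele on a single transcript."""
    result: Dict[str, List[str]] = {}
    for row in rows:
        if row.transcript_id == transcript_id:
            result.setdefault(row.allele, []).extend(row.consequence_types)
    return result


# Two alternate alleles of the same (hypothetical) variation, each with its own effect.
rows = [
    TranscriptVariationAllele("var_example", "tx_example", "G", ["missense_variant"]),
    TranscriptVariationAllele("var_example", "tx_example", "T", ["synonymous_variant"]),
]
print(consequences_by_allele(rows, "tx_example"))
# {'G': ['missense_variant'], 'T': ['synonymous_variant']}
```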
patch: Drop somatic column from source table (all species)
----------------------------------------------------------
The somatic column will be dropped from source and instead introduced in
the variation table.
API changes (all species)
-------------------------
The API will be updated to accommodate schema patches.
SIFT and PolyPhen consequences (all species)
--------------------------------------------
Non-synonymous coding consequences will be evaluated with SIFT and
PolyPhen.
Add a variation set for variations flagged as failed (Cat, Opossum, Pig, Zebra Finch, Tetraodon)
------------------------------------------------------------------------------------------------
Variations that have been flagged as failed will be grouped in a
variation set named 'Failed variations'
Web
===
Support for BigWig format (all species)
---------------------------------------
In addition to BAM format, the Ensembl website now supports attachment
of BigWig data via URL. Click on "Manage Your Data", then select "Attach
Remote File" from the left-hand menu.
Export data on structural variation (all species)
-------------------------------------------------
Data export will be enabled on the variation page, with the same
functionality as on the location, gene and transcript pages.
Export on Karyotype (all species)
---------------------------------
We will try to make the karyotype exportable to PDF and other formats,
via an Export button just below the karyotype image.
BED Format export (all species)
-------------------------------
BED format will be added to the export functionality on the location,
gene, transcript and variation pages.
Highlighting row in feature table for variation (all species)
-------------------------------------------------------------
When a SNP is clicked on the phenotype karyotype, the corresponding row
(variation) will be highlighted in the feature table.
EnsemblGenomes
==============
Rebuild otherfeatures database for Yeast (Yeast)
------------------------------------------------
Rebuild otherfeatures database.
Rerun Xrefs pipeline for Yeast (Yeast)
--------------------------------------
Update the external_db table, and rerun the xrefs pipeline
New variation saccharomyces_cerevisiae database (Yeast)
-------------------------------------------------------
Provide the variation saccharomyces_cerevisiae database
New funcgen saccharomyces_cerevisiae database (Yeast)
-----------------------------------------------------
Provide the funcgen saccharomyces_cerevisiae database
BLAT patch (C.elegans)
----------------------
For aesthetic reasons, we will flip the strand of paired 3'-ESTs.