From st3 at sanger.ac.uk Wed Feb 2 18:19:30 2011
From: st3 at sanger.ac.uk (Stephen Trevanion)
Date: Wed, 02 Feb 2011 18:19:30 +0000
Subject: [ensembl-announce] Ensembl release 61
Message-ID: <4D49A032.3090608@sanger.ac.uk>

The Ensembl project is pleased to announce release 61 of Ensembl (http://www.ensembl.org/).

Some highlights of this release are:

* Human:
  - Update to annotation incorporating the latest geneset from Havana.
  - First analysis using RNASeq data (from the Illumina Bodymap project).
  - Updated CCDS database.
  - Import of dbSNP132, Cosmic and HGMD variation data.
* Mouse:
  - Update to annotation involving regeneration of the core geneset and incorporation of the latest Havana geneset.
  - Updated CCDS database.
  - Rerun of regulatory build.
* Other species:
  - Zebrafish - first merge of Havana and core genesets.
  - Anole lizard - new assembly and gene build.
  - C. elegans - new assembly and geneset.
  - Cat and opossum - addition of variation data.
* New species:
  - Turkey.
* Web features:
  - Favourite tracks for each view can be selected in Configuration Panels using the star button.
  - Hovering over a track name on images generates a popup allowing for configuration of that track.

For more information visit:
http://www.ensembl.org/info/website/news/index.html

For the latest news on the Ensembl project visit our blog at
http://ensembl.blogspot.com

Steve Trevanion

From jeff at ebi.ac.uk Wed Feb 9 08:29:10 2011
From: jeff at ebi.ac.uk (Jeff Almeida-King)
Date: Wed, 09 Feb 2011 08:29:10 +0000
Subject: [ensembl-announce] Ensembl Genomes Release 8
Message-ID: <4D525056.1020403@ebi.ac.uk>

The Ensembl Genomes Project is pleased to announce release 8 of Ensembl Genomes (http://www.ensemblgenomes.org/).

The main highlights of this release are:

* Software migration to Ensembl 61
* New Pan Compara database consisting of a selection of vertebrate genomes from Ensembl 61 and genomes from Ensembl Genomes 8 (incorporating 8 new species), giving a species total of 313.
* 3 oomycete genomes added to Ensembl Protists, including /Phytophthora infestans/ and /Phytophthora ramorum/, responsible for potato blight and Sudden Oak Death disease respectively.
* 5 genomes added to Ensembl Metazoa, including /Strongylocentrotus purpuratus/ (Echinodermata) (sea urchin), /Apis mellifera/ (Arthropoda) (honey bee) and /Nematostella vectensis/ (Cnidaria) (sea anemone).

For further details please visit the individual home pages:

http://bacteria.ensembl.org
http://protists.ensembl.org
http://fungi.ensembl.org
http://plants.ensembl.org
http://metazoa.ensembl.org

From dstaines at ebi.ac.uk Mon Feb 14 10:32:03 2011
From: dstaines at ebi.ac.uk (Dan Staines)
Date: Mon, 14 Feb 2011 10:32:03 +0000
Subject: [ensembl-announce] Ensembl Genomes Release 9 Intentions
Message-ID: <4D5904A3.3050905@ebi.ac.uk>

Dear all,

Please find attached a summary of our intentions for release 9 of Ensembl Genomes, due out on 19th April 2011. Please note these are intentions and are therefore not guaranteed to be completed for February. These can also be viewed online at:

http://ensemblgenomes.org/info/release9

Best regards,

Dan Staines, on behalf of the Ensembl Genomes team.

General
- release scheduled for 19th April 2011
- update to Ensembl 62 software
- new Pan Compara database with species from
-- Ensembl
--- Gasterosteus aculeatus
-- Ensembl Genomes
--- Daphnia pulex
--- Leishmania major
--- Lottia gigantea
- updated pan concern databases from Ensembl Bacteria - no significant updates planned

Fungi
- updated S. cerevisiae assembly and genebuild and associated variation
- addition of Gibberella zeae
- addition of Gibberella moniliformis
- addition of Fusarium oxysporum
- updated peptide compara
- updated DNA compara database including Fusarium LASTZ alignments
- updated marts

Protists
- addition of Leishmania major
- addition of Pythium ultimum
- addition of Hyaloperonospora arabidopsidis
- updated peptide compara
- updated marts

Metazoa
- updated core database for Drosophila melanogaster based on FlyBase FB2011_01
- updated core database for Drosophila pseudoobscura based on FlyBase FB2011_01
- updated core database for Anopheles gambiae based on VectorBase VB-2011-02 (new gene set and new datasources)
- updated cross-references for Aedes aegypti based on VectorBase VB-2011-02 (new datasources)
- updated cross-references for Strongylocentrotus purpuratus (sea urchin)
- addition of Lottia gigantea (owl limpet)
- addition of Daphnia pulex (water flea)
- addition of Capitella capitata (bristleworm)
- addition of Helobdella robusta (leech)
- patch for BLAT 3'-EST dna_align_feature orientation in Caenorhabditis elegans (WormBase)
- updated peptide compara
- updated marts

Plants
- addition of Oryza glaberrima
- new variation database for Zea mays
- updated peptide compara
- updated marts

--
Dan Staines, PhD               Ensembl Genomes Technical Coordinator
EMBL-EBI                       Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From dstaines at ebi.ac.uk Mon Feb 14 10:34:59 2011
From: dstaines at ebi.ac.uk (Dan Staines)
Date: Mon, 14 Feb 2011 10:34:59 +0000
Subject: [ensembl-announce] Ensembl Genomes Release 9 Intentions
In-Reply-To: <4D5904A3.3050905@ebi.ac.uk>
References: <4D5904A3.3050905@ebi.ac.uk>
Message-ID: <4D590553.1080202@ebi.ac.uk>

On 02/14/2011 10:32 AM, Dan Staines wrote:
> Please note these are intentions and are therefore not guaranteed to be
> completed for February.

Apologies, this should of course be April...

Dan.

--
Dan Staines, PhD               Ensembl Genomes Technical Coordinator
EMBL-EBI                       Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From sobral at ebi.ac.uk Thu Feb 24 13:34:15 2011
From: sobral at ebi.ac.uk (Daniel Sobral)
Date: Thu, 24 Feb 2011 13:34:15 +0000
Subject: [ensembl-announce] Intentions for Ensembl release 62
In-Reply-To: <4CEE98FF.9000705@sanger.ac.uk>
References: <4CEE98FF.9000705@sanger.ac.uk>
Message-ID: <4D665E57.3080007@ebi.ac.uk>

Please see below a list of intentions declared for Ensembl 62 (scheduled for mid April). Note these are intentions and are not guaranteed to be in the release.

Regards,
Daniel Sobral

=======================================
Declarations of Intentions - Ensembl 62
=======================================

Compara
=======

Families (all species)
----------------------
Updated MCL families including all Ensembl transcript isoforms and newest Uniprot Metazoa.
* Clustering by MCL
* Multiple Sequence Alignments with MAFFT
* Family stable ID mapping

Gene Homologies (all species)
-----------------------------
GeneTrees (protein-coding) with new/updated genebuilds and assemblies
* Clustering using hcluster_sg
* Multiple sequence alignments using MCoffee
* Phylogenetic reconstruction using TreeBeST
* Homology inference including the recent 'possible_ortholog', 'putative gene split' and 'contiguous gene split' exceptions
* Pairwise gene-based dN/dS scores for high coverage species pairs only
* GeneTree stable ID mapping

GeneTrees (ncRNA) with new/updated genebuilds and assemblies (all species)
--------------------------------------------------------------------------
* Classification based on RFAM model
* Multiple sequence alignments with infernal
* Phylogenetic reconstruction using RaxML
* Additional multiple sequence alignments with Prank (w/ genomic flanks)
* Additional phylogenetic reconstruction using PhyML and NJ
* Phylogenetic tree merging using TreeBeST
* Homology inference

Pairwise Alignments (all species)
---------------------------------
* Non-reference alignments for human vs high coverage blastz-net
* human vs gibbon lastz
* human vs marmoset lastz
* human vs rabbit lastz
* xenopus vs mouse tblat-net
* xenopus vs chicken tblat-net
* xenopus vs tetraodon tblat-net
* xenopus vs human tblat-net
* xenopus vs danio tblat-net

Multiple alignments (all species)
---------------------------------
* update 6way-primate-epo alignments to incorporate new marmoset seq_region names
* update 12way-mammal-epo alignments to incorporate new marmoset seq_region names
* update 19way-amniota-pecan alignments to incorporate new marmoset seq_region names
* 35way-mammal low-coverage-epo alignments (addition of gibbon and new marmoset seq_region names)

schema changes (all species)
----------------------------
* meta.meta_value has been extended to TEXT (previously it was VARCHAR) and the corresponding indexes have been fixed.
* analysis.module has been extended to VARCHAR(255) - previously it was VARCHAR(80)
* mapping_session.prefix column has been added to allow EnsEmblGenomes to track their different types of stable_ids

Core
====

Bio::EnsEMBL::DBFile::FileAdaptor (all species)
-----------------------------------------------
A new base class for accessing data from flat files.

Bio::EnsEMBL::DBFile::CollectionAdaptor (all species)
-----------------------------------------------------
A new class to access Collection Feature data stored in flat files.

patch_61_62_a: Schema version patch (all species)
-------------------------------------------------
Patch file patch_61_62_a.sql updates the schema version of a core database to 62.

patch_61_62_b: synonym field extension (all species)
----------------------------------------------------
Patch file patch_61_62_b.sql extends field synonym in external_synonym table to 100 chars.

patch_61_62_c: index for db_name (all species)
----------------------------------------------
Patch file patch_61_62_c.sql adds unique index to db_name field in external_db table.

Ontology database (all species)
-------------------------------
Database ensembl_ontology_62 with latest available GO, SO, and EFO ontologies. Synonyms will now be included in a new 'synonym' table.

Schema diagrams for online documentation (all species)
------------------------------------------------------
Schema diagrams for the online core database documentation.

Xrefs (Zebrafish)
-----------------
Update external database references.

xref projection (all species)
-----------------------------
Project GO ids and gene names to species. Make alterations to zebrafish projections.
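
As an illustration of what the unique index from patch_61_62_c enforces, here is a minimal sketch. It is not the contents of the patch file: it uses an in-memory SQLite database instead of MySQL, a simplified two-column stand-in for the external_db table, and a hypothetical index name (db_name_idx) and helper (try_insert).

```python
import sqlite3

# In-memory SQLite database used purely for illustration; Ensembl core
# databases are MySQL and the real external_db table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE external_db ("
    "external_db_id INTEGER PRIMARY KEY, "
    "db_name TEXT NOT NULL)"
)
# Hypothetical index name; the point is only the UNIQUE constraint on db_name.
conn.execute("CREATE UNIQUE INDEX db_name_idx ON external_db (db_name)")
conn.execute("INSERT INTO external_db (db_name) VALUES ('GO')")


def try_insert(name):
    """Return True if the row was inserted, False if db_name already exists."""
    try:
        conn.execute("INSERT INTO external_db (db_name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("EFO"))  # a new name is accepted
print(try_insert("GO"))   # a repeated name is rejected by the unique index
```

With such an index in place, any loading script that inserts the same db_name twice fails at insert time instead of silently creating a duplicate row.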

EMBL/Genbank dumps (all species)
--------------------------------
EMBL & Genbank dumps for all species.

patch_61_62_d: remove field display_label_linkable (all species)
----------------------------------------------------------------
Patch file patch_61_62_d.sql removes field display_label_linkable from table external_db.

Import of LRG sequences (Human)
-------------------------------
Newly published LRG sequences will be imported.

Ontology API (all species)
--------------------------
Addition of fetch_all_by_name() method to the OntologyTermAdaptor to fetch ontology terms by their names or synonyms. Additional synonym() method for OntologyTerm objects to get their synonyms.

xrefs (Human)
-------------
Update human external database references.

xrefs (Mouse)
-------------
Update external database references.

Funcgen
=======

patch_61_62_a Update meta schema version (all species)
------------------------------------------------------
meta.schema_version will be updated to 62.

patch_61_62_b motif_feature.stable_id (all species)
---------------------------------------------------
A stable_id will be added to the motif_feature table. NOTE: This is not an 'Ensembl stable ID', and will only be used internally to enable inter-DB linking between the variation and funcgen schemas.

patch_61_62_c feature_type Sequence Ontology fields (all species)
-----------------------------------------------------------------
so_name and so_accession will be added to the feature_type table to enable display of Sequence Ontology information and linking to the ensembl_ontology DB.

patch_61_62_d: Experimental Group Description (all species)
-----------------------------------------------------------
This change serves to support a better annotation of data sources.

ResultFeature DBFile Collections (Human, Mouse)
-----------------------------------------------
Where possible, data from the result_feature table has been moved outside of the database to indexed binary '.col' files.
The ResultFeatureAdaptor now uses the new core DBFile::CollectionAdaptor and DBFile::FileAdaptor to access these data directly.

Array Mapping (all species)
---------------------------
Genomic and transcript alignments and transcript xref annotation have been re-run for all species with new genome assemblies or genebuilds.

Illumina Methylation Arrays (Human)
-----------------------------------
HumanMethylation27K and HumanMethylation450K have now been imported.

Update of Human functional genomics data (Human)
------------------------------------------------
New datasets from ENCODE and the Epigenomics Roadmap, covering existing cell lines. The Regulatory Build was rerun for cell lines with new data.

Binding Matrix: simpler representation of matrix frequencies (all species)
--------------------------------------------------------------------------
This change intends to make the representation simpler, towards something that can be applied to different formats.

patch_61_62_e Addition of dbfile_registry table (all species)
-------------------------------------------------------------
A dbfile_registry table has been added to store the filepaths of result feature collection (.col) files.

PolIII Transcription Associated Regulatory Features (all species)
-----------------------------------------------------------------
The Regulatory Build now also annotates Regulatory Features associated with PolIII Transcription.

Genebuild
=========

Patch for panda (Panda)
-----------------------
Transcript supporting features added for pseudogenes

Patch for rabbit (Rabbit)
-------------------------
Geneset re-clustered
Transcript supporting features added for pseudogenes
Assembly updated to match the official NCBI one

Patch for mouse (Mouse)
-----------------------
Patched the mouse Ensembl-Havana merged gene set to maintain its consistency with the latest CCDS gene set (as of 9 February 2011).

Human Vega annotation (Human)
-----------------------------
Manual annotation of human from Havana has been updated. This represents the annotation presented in Vega release 42.

Patch for marmoset (Marmoset)
-----------------------------
Deprecated contig sequences removed
Raw-computes re-run
Geneset re-clustered
Mapping added
Transcript supporting features for pseudogenes added
New seq region synonyms

Human otherfeatures (Human)
---------------------------
Removed EST alignments with hcoverage <90 and perc_ident <94.

GENCODE gene set update (release 7) (Human)
-------------------------------------------
Update to the Ensembl/Havana GENCODE gene set based on a complete re-annotation of the Ensembl gene set and combined with the latest Vega gene set.

Human cDNA update (Human)
-------------------------
New cDNA db for human.

GRCh37.p3 (Human)
-----------------
Adding the third patch release for the human assembly. This alters the assembly information in all human databases.

GRCh37.p3 annotation (Human)
----------------------------
Annotation of the patches in the other features db.

Gibbon build (Gibbon)
---------------------
First release of gene build for Gibbon, Nomascus leucogenys (Northern white-cheeked gibbon). Assembly: Nleu1.0.

Zebrafish WGS/clone assembly track (Zebrafish)
----------------------------------------------
Added a WGS/clone assembly track.

Flagging obsolete Uniprot proteins (all species)
------------------------------------------------
Flagging Transcript attribute where the Uniprot evidence was removed

Flagging obsolete Ensembl proteins (Sloth, Armadillo, Kangaroo rat, Tenrec, Hedgehog, Cat, Wallaby, Mouse Lemur, Pika, Bushbaby, Chimp, Orangutan, Rock Hyrax, Megabat, Shrew, Ground Squirrel, Tarsier, Tree Shrew, Dolphin, Alpaca)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Flagging Transcript attribute where the evidence was removed from 2x genomes

Mouse RefSeq import (Mouse)
---------------------------
RefSeq annotations imported into the mouse otherfeatures database

Xenopus tropicalis new assembly 4.2 (Xenopus)
---------------------------------------------
New assembly of Xenopus tropicalis version 4.2

Human Body Map missing liver (Human)
------------------------------------
Add the liver models

Mouse cDNA update (Mouse)
-------------------------
New cDNA db for mouse.

Updated human otherfeatures db: new CCDS import (Human)
-------------------------------------------------------
Update to CCDS set for human

Updated mouse otherfeatures db: New CCDS import (Mouse)
-------------------------------------------------------
Update to CCDS set for mouse

Mart
====

Mart databases (all species)
----------------------------
Full build of all 7 marts for all species

Variation
=========

New variation consequences (all species)
----------------------------------------
New variation consequences due to a schema change linking consequences to allele and transcript rather than just to a variation and transcript

HGVS coordinates stored in database (all species)
-------------------------------------------------
HGVS coordinates for variant alleles will be pre-calculated and stored in the database.
These were previously calculated on the fly.

New variation database (Human)
------------------------------
The human variation database will be built fresh from dbSNP release 132 due to data updates by dbSNP.

Data import/update from external sources (Human)
------------------------------------------------
Allele frequencies from 1000 Genomes Project.
Variation submissions on LRGs from UniProt.
Structural variation data from DGVa.
Somatic mutation data from Cosmic.
Variation phenotype data from OMIM, NHGRI, UniProt and EGA.
Variation synonyms from UniProt.

New variation database (Mouse)
------------------------------
Fresh build from dbSNP 132.

Data import/update from external sources (Dog, Mouse, Pig)
----------------------------------------------------------
Structural variation data from DGVa.

patch_61_62_a: Meta schema version (all species)
------------------------------------------------
Meta schema version update

patch_61_62_b: Alter failed_variation (all species)
---------------------------------------------------
Drop the subsnp_id column from failed_variation

patch_61_62_c: Introduce failed_allele table (all species)
----------------------------------------------------------
Add a table to store failed alleles

patch_61_62_d: Add type column to source table (all species)
------------------------------------------------------------
Introduce a type column (enum) to indicate the type of a source

patch: Table to store study data (all species)
----------------------------------------------
A new table to store description of studies will be introduced and foreign keys to this table will be introduced in variation_annotation and structural_variation tables.

patch: Rationalize data type for allele columns (all species)
-------------------------------------------------------------
The data type of allele columns in e.g. allele, variation and variation_feature will be harmonized to use varchar.

patch: Table to store supporting structural variations (all species)
--------------------------------------------------------------------
A new table to store supporting structural variations will be introduced

patch: Table to store variation consequences on regulatory regions (all species)
--------------------------------------------------------------------------------
A table to support storing variation consequences on regulatory regions will be introduced

patch: Re-design of the transcript_variation table (all species)
----------------------------------------------------------------
Variation consequences will be stored by allele instead of by variation. The transcript_variation table will be modified to accommodate this. In addition, HGVS coordinates will be stored as well.

patch: Drop somatic column from source table (all species)
----------------------------------------------------------
The somatic column will be dropped from source and instead introduced in the variation table.

API changes (all species)
-------------------------
The API will be updated to accommodate schema patches.

SIFT and PolyPhen consequences (all species)
--------------------------------------------
Non-synonymous coding consequences evaluated by SIFT and PolyPhen will be calculated

Add a variation set for variations flagged as failed (Cat, Opossum, Pig, Zebra Finch, Tetraodon)
------------------------------------------------------------------------------------------------
Variations that have been flagged as failed will be grouped in a variation set named 'Failed variations'

Web
===

Support for BigWig format (all species)
---------------------------------------
In addition to BAM format, the Ensembl website now supports attachment of BigWig data via URL. Click on "Manage Your Data" then select "Attach Remote File" from the lefthand menu.
Export data on structural variation (all species) ------------------------------------------------- Enabling data to be exported for the variation page. (same functionalities as on location, gene and transcript) Export on Karyotype (all species) --------------------------------- Will try to get the karyotype exported to PDF and other formats. Export button just below the karyotype image BED Format export (all species) ------------------------------- Adding BED format to the export functionality on location, genes, transcript and variation. Highlighting row in feature table for variation (all species) ------------------------------------------------------------- When clicking on a SNP on the karyotype for phenotype, the corresponding row (variation) is highlighted in the feature table EnsemblGenomes ============== Rebuild otherfeatures database for Yeast. (Yeast) ------------------------------------------------- Rebuild otherfeatures database. Rerun Xrefs pipeline for Yeast (Yeast) -------------------------------------- Update the external_db table, and rerun the xrefs pipeline new variation saccharomyces_cerevisiae database (Yeast) ------------------------------------------------------- Provide the variation saccharomyces_cerevisiae database New funcgen saccharomyces_cerevisiae database (Yeast) ----------------------------------------------------- Provide the funcgen saccharomyces_cerevisiae database BLAT patch (C.elegans) ---------------------- for aesthetic reasons, we will flip the strand of paired 3'-ESTs From st3 at sanger.ac.uk Wed Feb 2 18:19:30 2011 From: st3 at sanger.ac.uk (Stephen Trevanion) Date: Wed, 02 Feb 2011 18:19:30 +0000 Subject: [ensembl-announce] Ensembl release 61 Message-ID: <4D49A032.3090608@sanger.ac.uk> The Ensembl project is pleased to announce release 61 of Ensembl (http://www.ensembl.org/). Some highlights of this release are: * Human: - Update to annotation incoporating the latest geneset from Havana. 
- First analysis using RNASeq data (from the Illumina Bodymap project). - Updated CCDS database. - Import of dbSNP132, Cosmic and HGMD variation data. * Mouse: - Update to annotation involving regeneration of the core geneset and incoporation of the latest Havana geneset. - Updated CCDS database. - Rerun of regulatory build. * Other species: - Zebrafish - first merge of Havana and core genesets. - Anole lizard - new assembly and gene build. - C.elegans - new assembly and geneset. - Cat and opossum -addition of variation data. * New species: - Turkey. * Web features: - Favourite tracks for each view can be selected in Configuration Panels using the star button. - Hovering over a track name on images generates a popup allowing for configuration of that track. For more information visit: http://www.ensembl.org/info/website/news/index.html For the latest news on the Ensembl project visit our blog at http://ensembl.blogspot.com Steve Trevanion From jeff at ebi.ac.uk Wed Feb 9 08:29:10 2011 From: jeff at ebi.ac.uk (Jeff Almeida-King) Date: Wed, 09 Feb 2011 08:29:10 +0000 Subject: [ensembl-announce] Ensembl Genomes Release 8 Message-ID: <4D525056.1020403@ebi.ac.uk> The Ensembl Genomes Project is pleased to announce release 8 of Ensembl Genomes (http://www.ensemblgenomes.org/). The main highlights of this release are: * Software migration to Ensembl 61 * New Pan Compara database consisting a selection of vertebrate genomes from Ensembl 61 and genomes from Ensembl Genomes 8 (incorporating 8 new species), giving a species total of 313 . * 3 oomycete genomes added to Ensembl Protists , including /Phytopthora infestans /and /Phytopthora ramorum /responsible for potato blight and Sudden Oak Death disease respectively. // * 5 genomes added to Ensembl Metazoa , including /Strongylocentrotus purpuratus/ (Echinodermata) (sea urchin), /Apis mellifera/ (Arthropoda) (honey bee) and /Nematostella vectensis/ (Cnidaria) (sea anemone). 
For further details please visit the individual home pages: http://bacteria.ensembl.org http://protists.ensembl.org http://fungi.ensembl.org http://plants.ensembl.org http://metazoa.ensembl.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: jeff.vcf Type: text/x-vcard Size: 290 bytes Desc: not available URL: From dstaines at ebi.ac.uk Mon Feb 14 10:32:03 2011 From: dstaines at ebi.ac.uk (Dan Staines) Date: Mon, 14 Feb 2011 10:32:03 +0000 Subject: [ensembl-announce] Ensembl Genomes Release 9 Intentions Message-ID: <4D5904A3.3050905@ebi.ac.uk> Dear all, Please find attached a summary of our intentions for release 9 of Ensembl Genomes, due out on 19th April 2011. Please note these are intentions and are therefore not guaranteed to be completed for February. These can also be viewed online at: http://ensemblgenomes.org/info/release9 Best regards, Dan Staines, on behalf of the Ensembl Genomes team. General - release scheduled for 19th April 2011 - update to Ensembl 62 software - new Pan Compara database with species from -- Ensembl --- Gasterosteus aculeatus -- Ensembl Genomes --- Daphnia pulex --- Leishmania major --- Lottia gigantea - updated pan concern databases from Ensembl Bacteria - no significant updates planned Fungi - updated S. 
cerevisiae assembly and genebuild and associated variation - addition of Gibberella zeae - addition of Gibberella moniliformis - addition of Fusarium oxysporum - updated peptide compara - updated DNA compara database including Fusarium LASTZ alignments - updated marts Protists - addition of Leishmania major - addition of Pythium ultimum - addition of Hyaloperonospora arabidopsidis - updated peptide compara - updated marts Metazoa - updated core database for Drosophila melanogaster based on FlyBase FB2011_01 - updated core database for Drosophila pseudoobscura based on FlyBase FB2011_01 - updated core database for Anopheles gambiae based on VectorBase VB-2011-02 (new gene set and new datasources) - updated cross-references for Aedes aegypti based on VectorBase VB-2011-02 (new datasources) - updated cross-references for Strongylocentrotus purpuratus (sea urchin) - addition of Lottia gigantea (owl limpet) - addition of Daphnia pulex (water flea) - addition of Capitella capitata (bristleworm) - addition of Helobdella robusta (leech) - patch for BLAT 3'-EST dna_align_feature orientation in Caenorhabditis elegans (WormBase) - updated peptide compara - updated marts Plants - addition of Oryza glabberina - new variation database for Zea mays - updated peptide compara - updated marts -- Dan Staines, PhD Ensembl Genomes Technical Coordinator EMBL-EBI Tel: +44-(0)1223-492507 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From dstaines at ebi.ac.uk Mon Feb 14 10:34:59 2011 From: dstaines at ebi.ac.uk (Dan Staines) Date: Mon, 14 Feb 2011 10:34:59 +0000 Subject: [ensembl-announce] Ensembl Genomes Release 9 Intentions In-Reply-To: <4D5904A3.3050905@ebi.ac.uk> References: <4D5904A3.3050905@ebi.ac.uk> Message-ID: <4D590553.1080202@ebi.ac.uk> On 02/14/2011 10:32 AM, Dan Staines wrote: > Please note these are intentions and are therefore not guaranteed to be > completed for February. 
Apologies, this should of course be April... Dan. -- Dan Staines, PhD Ensembl Genomes Technical Coordinator EMBL-EBI Tel: +44-(0)1223-492507 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From sobral at ebi.ac.uk Thu Feb 24 13:34:15 2011 From: sobral at ebi.ac.uk (Daniel Sobral) Date: Thu, 24 Feb 2011 13:34:15 +0000 Subject: [ensembl-announce] Intentions for Ensembl release 62 In-Reply-To: <4CEE98FF.9000705@sanger.ac.uk> References: <4CEE98FF.9000705@sanger.ac.uk> Message-ID: <4D665E57.3080007@ebi.ac.uk> Please see below a list of intentions declared for Ensembl 62 (scheduled for mid April). Note these are intentions and are not guaranteed to be in the release. Regards, Daniel Sobral ======================================= Declarations of Intentions - Ensembl 62 ======================================= Compara ======= Families (all species) ---------------------- Updated MCL families including all Ensembl transcript isoforms and newest Uniprot Metazoa. 
* Clustering by MCL * Multiple Sequence Alignments with MAFFT * Family stable ID mapping Gene Homologies (all species) ----------------------------- GeneTrees (protein-coding) with new/updated genebuilds and assemblies * Clustering using hcluster_sg * Multiple sequence alignments using MCoffee * Phylogenetic reconstruction using TreeBeST * Homology inference including the recent 'possible_ortholog', 'putative gene split' and 'contiguous gene split' exceptions * Pairwise gene-based dN/dS scores for high coverage species pairs only * GeneTree stable ID mapping GeneTrees (ncRNA) with new/updated genebuilds and assemblies (all species) -------------------------------------------------------------------------- * Classification based on RFAM model * Multiple sequence alignments with infernal * Phylogenetic reconstruction using RaxML * Additional multiple sequence alignments with Prank (w/ genomic flanks) * Additional phylogenetic reconstruction using PhyML and NJ * Phylogenetic tree merging using TreeBeST * Homology inference Pairwise Alignments (all species) --------------------------------- * Non-reference alignments for human vs high coverage blastz-net * human vs gibbon lastz. * human vs marmoset lastz * human vs rabbit lastz * xenopus vs mouse tblat-net * xenopus vs chicken tblat-net * xenopus vs tetraodon tblat-net * xenopus vs human tblat-net * xenopus vs danio tblat-net Multiple alignments (all species) --------------------------------- * update 6way-primate-epo alignments to incorporate new marmoset seq_region names * update 12way-mammal-epo alignments to incorporate new marmoset seq_region names * update 19way-amniota-pecan alignments to incorporate new marmoset seq_region names * 35way-mammal low-coverage-epo alignments (addition of gibbon and new marmoset seq_region names) schema changes (all species) ---------------------------- * meta.meta_value has been extended to TEXT (previously it was VARCHAR) and the corresponding indexes have been fixed. 
* analysis.module has been extended to VARCHAR(255) - previously it was VARCHAR(80) * mapping_session.prefix column has been added to allow EnsEmblGenomes to track their different types of stable_ids Core ==== Bio::EnsEMBL::DBFile::FileAdaptor (all species) ----------------------------------------------- A new base class for accessing data from flat files Bio::EnsEMBL::DBFile::CollectionAdaptor (all species) ----------------------------------------------------- A new class to access Collection Feature data stored in flat files. patch_61_62_a: Schema version patch (all species) ------------------------------------------------- Patch file patch_61_62_a.sql, updates the schema version of a core database to 62. patch_61_62_b: synonym field extension (all species) ---------------------------------------------------- Patch file patch_61_62_b.sql, extends field synonym in external_synonym table to 100 chars. patch_61_62_c: index for db_name (all species) ---------------------------------------------- Patch file patch_61_62_c.sql adds unique index to db_name field in external_db table. Ontology database (all species) ------------------------------- Database ensembl_ontology_62 with latest available GO, SO, and EFO ontologies. Synonyms will now be included in a new 'synonym' table. Schema diagrams for online documentation (all species) ------------------------------------------------------ Schema diagrams for online for core database documentation. Xrefs (Zebrafish) ----------------- Update external database references. xref projection (all species) ----------------------------- Project GO ids and gene names to species. Make alterations to zebrafish projections. 

EMBL/Genbank dumps (all species)
--------------------------------
EMBL & Genbank dumps for all species.

patch_61_62_d: remove field display_label_linkable (all species)
----------------------------------------------------------------
Patch file patch_61_62_d.sql removes field display_label_linkable from table external_db.

Import of LRG sequences (Human)
-------------------------------
Newly published LRG sequences will be imported.

Ontology API (all species)
--------------------------
Addition of fetch_all_by_name() method to the OntologyTermAdaptor to fetch ontology terms by their names or synonyms. Additional synonym() method for OntologyTerm objects to get their synonyms.

xrefs (Human)
-------------
Update human external database references.

xrefs (Mouse)
-------------
Update external database references.

Funcgen
=======

patch_61_62_a Update meta schema version (all species)
------------------------------------------------------
meta.schema_version will be updated to 62.

patch_61_62_b motif_feature.stable_id (all species)
---------------------------------------------------
A stable_id will be added to the motif_feature table. NOTE: This is not an 'Ensembl stable ID', and will only be used internally to enable inter-DB linking between the variation and funcgen schemas.

patch_61_62_c feature_type Sequence Ontology fields (all species)
-----------------------------------------------------------------
so_name and so_accession will be added to the feature_type table to enable display of Sequence Ontology information and linking to the ensembl_ontology DB.

patch_61_62_d: Experimental Group Description (all species)
-----------------------------------------------------------
This change serves to support a better annotation of data sources.

ResultFeature DBFile Collections (Human, Mouse)
-----------------------------------------------
Where possible data from the result_feature table has been moved outside of the database to indexed binary '.col' files.
The ResultFeatureAdaptor now uses the new core DBFile::CollectionAdaptor and DBFile::FileAdaptor to access these data directly.

Array Mapping (all species)
---------------------------
Genomic and transcript alignments and transcript xref annotation have been re-run for all species with new genome assemblies or genebuilds.

Illumina Methylation Arrays (Human)
-----------------------------------
HumanMethylation27K and HumanMethylation450K have now been imported.

Update of Human functional genomics data (Human)
------------------------------------------------
New datasets from ENCODE and the Epigenomics Roadmap, covering existing cell lines. The Regulatory Build was rerun for cell lines with new data.

Binding Matrix: simpler representation of matrix frequencies (all species)
--------------------------------------------------------------------------
This change is intended to simplify the representation so that it can be applied to different formats.

patch_61_62_e Addition of dbfile_registry table (all species)
-------------------------------------------------------------
A dbfile_registry table has been added to store the filepaths of result feature collection (.col) files.

PolIII Transcription Associated Regulatory Features (all species)
-----------------------------------------------------------------
The Regulatory Build now also annotates Regulatory Features associated with PolIII Transcription.

Genebuild
=========

Patch for panda (Panda)
-----------------------
Transcript supporting features added for pseudogenes

Patch for rabbit (Rabbit)
-------------------------
Geneset re-clustered
Transcript supporting features added for pseudogenes
Assembly updated to match the official NCBI one

Patch for mouse (Mouse)
-----------------------
Patched the mouse Ensembl-Havana merged gene set to maintain its consistency with the latest CCDS gene set (as of 9 February 2011).

Human Vega annotation (Human)
-----------------------------
Manual annotation of human from Havana has been updated. This represents the annotation presented in Vega release 42

Patch for marmoset (Marmoset)
-----------------------------
Deprecated contig sequences removed
Raw-computes re-run
Geneset re-clustered
Mapping added
Transcript supporting features for pseudogenes added
New seq region synonyms

Human otherfeatures (Human)
---------------------------
Removed EST alignments with hcoverage <90 and perc_ident <94.

GENCODE gene set update (release 7) (Human)
-------------------------------------------
Update to the Ensembl/Havana GENCODE gene set based on a complete re-annotation of the Ensembl gene set and combined with the latest Vega gene set

Human cDNA update (Human)
-------------------------
New cDNA db for human.

GRCh37.p3 (Human)
-----------------
Adding the third patch release for the human assembly. This alters the assembly information in all human databases.

GRCh37.p3 annotation (Human)
----------------------------
Annotation of the patches in the other features db.

Gibbon build (Gibbon)
---------------------
First release of gene build for Gibbon, Nomascus leucogenys (Northern white-cheeked gibbon). Assembly: Nleu1.0.

Zebrafish WGS/clone assembly track (Zebrafish)
----------------------------------------------
Added a WGS/clone assembly track.
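
As an illustration of the "Human otherfeatures" change above, here is a minimal Python sketch of the EST alignment filter. The EstAlignment class and the function names are hypothetical and are not part of the Ensembl API; only the two thresholds come from the note. The note does not say whether an alignment is removed when both values are low or when either one is, so the sketch follows the literal wording (both) and exposes a flag for the stricter reading.

```python
from dataclasses import dataclass

# Thresholds quoted in the release note.
MIN_HCOVERAGE = 90.0
MIN_PERC_IDENT = 94.0


@dataclass
class EstAlignment:
    """Hypothetical stand-in for an EST alignment record."""
    name: str
    hcoverage: float   # percentage of the EST covered by the alignment
    perc_ident: float  # percentage identity of the alignment


def is_removed(aln: EstAlignment, either: bool = False) -> bool:
    """Return True if the alignment would be filtered out.

    either=False: removed only when BOTH values are below threshold
                  (literal reading of the note).
    either=True:  removed when EITHER value is below threshold
                  (assumed alternative reading).
    """
    low_coverage = aln.hcoverage < MIN_HCOVERAGE
    low_identity = aln.perc_ident < MIN_PERC_IDENT
    if either:
        return low_coverage or low_identity
    return low_coverage and low_identity


def filter_alignments(alignments, either=False):
    """Keep only the alignments that pass the filter."""
    return [a for a in alignments if not is_removed(a, either=either)]


if __name__ == "__main__":
    examples = [
        EstAlignment("est_good", 95.0, 99.0),
        EstAlignment("est_poor", 80.0, 90.0),
        EstAlignment("est_short", 80.0, 99.0),
    ]
    print([a.name for a in filter_alignments(examples)])
    print([a.name for a in filter_alignments(examples, either=True)])
```

The two readings differ only for alignments that fail one threshold, such as the "est_short" example.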

Flagging obsolete UniProt proteins (all species)
------------------------------------------------
Flagging Transcript attribute where the UniProt evidence was removed

Flagging obsolete Ensembl proteins (Sloth, Armadillo, Kangaroo rat, Tenrec, Hedgehog, Cat, Wallaby, Mouse Lemur, Pika, Bushbaby, Chimp, Orangutan, Rock Hyrax, Megabat, Shrew, Ground Squirrel, Tarsier, Tree Shrew, Dolphin, Alpaca)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Flagging Transcript attribute where the evidence was removed from 2x genomes

Mouse RefSeq import (Mouse)
---------------------------
RefSeq annotations imported into the mouse otherfeatures database

Xenopus tropicalis new assembly 4.2 (Xenopus)
---------------------------------------------
New assembly of Xenopus tropicalis version 4.2

Human Body Map missing liver (Human)
------------------------------------
Add the liver models

Mouse cDNA update (Mouse)
-------------------------
New cDNA db for mouse.

Updated human otherfeatures db: new CCDS import (Human)
-------------------------------------------------------
Update to CCDS set for human

Updated mouse otherfeatures db: New CCDS import (Mouse)
-------------------------------------------------------
Update to CCDS set for mouse

Mart
====

Mart databases (all species)
----------------------------
Full build of all 7 marts for all species

Variation
=========

New variation consequences (all species)
----------------------------------------
New variation consequences due to a schema change linking consequences to allele and transcript rather than just to a variation and transcript

HGVS coordinates stored in database (all species)
-------------------------------------------------
HGVS coordinates for variant alleles will be pre-calculated and stored in the database.
These were previously calculated on the fly.

New variation database (Human)
------------------------------
The human variation database will be built fresh from dbSNP release 132 due to data updates by dbSNP.

Data import/update from external sources (Human)
------------------------------------------------
Allele frequencies from 1000 Genomes Project.
Variation submissions on LRGs from UniProt.
Structural variation data from DGVa.
Somatic mutation data from Cosmic.
Variation phenotype data from OMIM, NHGRI, UniProt and EGA.
Variation synonyms from UniProt.

New variation database (Mouse)
------------------------------
Fresh build from dbSNP 132.

Data import/update from external sources (Dog, Mouse, Pig)
----------------------------------------------------------
Structural variation data from DGVa.

patch_61_62_a: Meta schema version (all species)
------------------------------------------------
Meta schema version update

patch_61_62_b: Alter failed_variation (all species)
---------------------------------------------------
Drop the subsnp_id column from failed_variation

patch_61_62_c: Introduce failed_allele table (all species)
----------------------------------------------------------
Add a table to store failed alleles

patch_61_62_d: Add type column to source table (all species)
------------------------------------------------------------
Introduce a type column (enum) to indicate the type of a source

patch: Table to store study data (all species)
----------------------------------------------
A new table to store description of studies will be introduced and foreign keys to this table will be introduced in variation_annotation and structural_variation tables.

patch: Rationalize data type for allele columns (all species)
-------------------------------------------------------------
The data type of allele columns in e.g. allele, variation and variation_feature will be harmonized to use varchar.

patch: Table to store supporting structural variations (all species)
--------------------------------------------------------------------
A new table to store supporting structural variations will be introduced

patch: Table to store variation consequences on regulatory regions (all species)
--------------------------------------------------------------------------------
A table to support storing variation consequences on regulatory regions will be introduced

patch: Re-design of the transcript_variation table (all species)
----------------------------------------------------------------
Variation consequences will be stored by allele instead of by variation. The transcript_variation table will be modified to accommodate this. In addition, HGVS coordinates will be stored as well.

patch: Drop somatic column from source table (all species)
----------------------------------------------------------
The somatic column will be dropped from source and instead introduced in the variation table.

API changes (all species)
-------------------------
The API will be updated to accommodate schema patches.

SIFT and PolyPhen consequences (all species)
--------------------------------------------
Non-synonymous coding consequences evaluated by SIFT and PolyPhen will be calculated

Add a variation set for variations flagged as failed (Cat, Opossum, Pig, Zebra Finch, Tetraodon)
------------------------------------------------------------------------------------------------
Variations that have been flagged as failed will be grouped in a variation set named 'Failed variations'

Web
===

Support for BigWig format (all species)
---------------------------------------
In addition to BAM format, the Ensembl website now supports attachment of BigWig data via URL. Click on "Manage Your Data" then select "Attach Remote File" from the left-hand menu.

Export data on structural variation (all species)
-------------------------------------------------
Data can now be exported from the variation page, with the same functionality as on the location, gene and transcript pages.

Export on Karyotype (all species)
---------------------------------
We will try to enable export of the karyotype to PDF and other formats, via an Export button just below the karyotype image.

BED Format export (all species)
-------------------------------
BED format will be added to the export functionality on the location, gene, transcript and variation pages.

Highlighting row in feature table for variation (all species)
-------------------------------------------------------------
When clicking on a SNP on the karyotype for phenotype, the corresponding row (variation) is highlighted in the feature table.

EnsemblGenomes
==============

Rebuild otherfeatures database for Yeast. (Yeast)
-------------------------------------------------
Rebuild otherfeatures database.

Rerun Xrefs pipeline for Yeast (Yeast)
--------------------------------------
Update the external_db table, and rerun the xrefs pipeline.

New variation saccharomyces_cerevisiae database (Yeast)
-------------------------------------------------------
Provide the variation saccharomyces_cerevisiae database.

New funcgen saccharomyces_cerevisiae database (Yeast)
-----------------------------------------------------
Provide the funcgen saccharomyces_cerevisiae database.

BLAT patch (C.elegans)
----------------------
For aesthetic reasons, we will flip the strand of paired 3'-ESTs.
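
For readers unfamiliar with the BED format mentioned under "BED Format export" above, the following Python sketch shows how a feature with Ensembl-style coordinates (1-based, inclusive, strand given as 1 or -1) could be written as a six-column BED line (0-based start, exclusive end, strand as + or -). The function name and argument names are hypothetical; which columns the website exporter writes, and whether it adds a "chr" prefix, is not stated in this announcement.

```python
def to_bed_line(seq_region, start, end, name, strand=1, score=0):
    """Format one feature as a six-column BED line.

    start and end are assumed to be 1-based inclusive coordinates;
    BED uses a 0-based start and an exclusive end, so only the start
    needs to be shifted by one.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Map numeric strand to the BED symbol; anything else is 'unknown'.
    bed_strand = {1: "+", -1: "-"}.get(strand, ".")
    fields = (seq_region, start - 1, end, name, score, bed_strand)
    return "\t".join(str(field) for field in fields)


if __name__ == "__main__":
    # Made-up feature used purely for illustration.
    print(to_bed_line("1", 1000, 2000, "feature_1", strand=-1))
```

The feature length is unchanged by the conversion: end minus the BED start equals the number of bases covered.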