From jeff at ebi.ac.uk Wed Sep 1 08:42:30 2010
From: jeff at ebi.ac.uk (Jeff Almeida-King)
Date: Wed, 01 Sep 2010 08:42:30 +0100
Subject: [ensembl-announce] Ensembl Genomes Release 6
Message-ID: <4C7E03E6.8000507@ebi.ac.uk>

The Ensembl Genomes Project is pleased to announce release 6 of Ensembl Genomes (http://www.ensemblgenomes.org/). The main highlights of this release are:

* Software migration to Ensembl 59
* 2 new rodent malarial genomes, Plasmodium berghei and Plasmodium chabaudi, and an update to the Plasmodium falciparum gene-set in Ensembl Protists (http://protists.ensembl.org/index.html).
* Variation BioMarts added for Plasmodium falciparum, Saccharomyces cerevisiae (http://fungi.ensembl.org/index.html), Arabidopsis thaliana, Oryza sativa indica, Oryza sativa japonica, Vitis vinifera (http://plants.ensembl.org/index.html), Anopheles gambiae and Drosophila melanogaster (http://metazoa.ensembl.org/index.html).

See the individual homepages for Bacteria, Protists, Fungi, Plants and Metazoa for more information.

From wm2 at ebi.ac.uk Thu Sep 9 14:47:48 2010
From: wm2 at ebi.ac.uk (Will McLaren)
Date: Thu, 9 Sep 2010 14:47:48 +0100
Subject: [ensembl-announce] Ensembl Release 60 - summary of declarations of intentions
Message-ID:

Below is the summary of declarations of intentions for Ensembl release 60. Please note these are intentions and are not guaranteed to be in the release, which is currently scheduled for the 26th of October.

Regards,
William McLaren

====================================================
Summary of declarations of intentions for Ensembl 60
====================================================

### Compara

# Families
- Updated MCL families including all Ensembl transcript isoforms and newest Uniprot Metazoa
- Clustering by MCL
- Multiple Sequence Alignments with MAFFT
- Family stable ID mapping

# Gene Homologies
- GeneTrees with new/updated genebuilds and assemblies
- Updated build of ncRNA trees
- Clustering using hcluster_sg
- Multiple Sequence Alignments using consistency-based MCoffee meta-aligner (mafftgins+muscle+kalign+probcons) and exon-skipping aware "skipper" algorithm
- Homology inference including the recent 'possible_ortholog' type and 'putative gene split' and 'contiguous gene split' exceptions
- Pairwise gene-based dN/dS calculations for high coverage species pairs only
- GeneTree stable ID mapping

# Pairwise Alignments
-- Lastz-net alignments
- H.sap-A.mel
- H.sap-O.cun
- C.fam-A.mel
-- Blat-alignments
- H.sap-D.rer
- M.mus-D.rer
- R.nor-D.rer
- G.gal-D.rer
- T.rub-D.rer
- D.rer-X.tro
- C.int-D.rer
- C.sav-D.rer
- G.acu-D.rer
- O.lat-D.rer
- D.rer-T.nig
-- Non-reference alignments for human vs high coverage blastz-net alignments
- H.sap-P.tro
- H.sap-G.gor
- H.sap-P.pyg
- H.sap-M.mul
- H.sap-M.mus
- H.sap-R.nor
- H.sap-C.fam
- H.sap-B.tau
- H.sap-S.scr
- H.sap-E.cab
- H.sap-O.ana
- H.sap-M.dom
- H.sap-G.gal

# Multiple alignments
- 34 way epo low coverage
- 14 way epo eutherian mammals
- 5 way epo fish

# Synteny
- H.sap-C.jac
- H.sap-O.cun

### Core

# Ontology database
- A new ontology database ("ensembl_ontology_60") will be built using the latest data from GO and SO.

# Gene name and GO term projections
- Gene names and GO xrefs will be projected from species where there is high coverage to species where there is lower coverage. Panda will be included as a target for these projections.

# External database references
- Update external database references for human, mouse and Xenopus

# GO Xrefs are now Ontology Xrefs
- The go_xref table is renamed to ontology_xref. The Bio::EnsEMBL::GoXref Perl module is renamed to Bio::EnsEMBL::OntologyXref.

### Funcgen

# Array Mapping
- The array mapping pipeline will be run for those species which have new assemblies, gene build or new array designs. This includes an update to the latest version of the Phalanx OneArray for human.

# BindingMatrix
- A new BindingMatrix class will represent position weight matrices (PWMs) loaded from Jaspar or inferred directly from ChIP-Seq data. This will ultimately be able to identify the consequence of a sequence change at a given location, with respect to the PWM score. patch_59_60_c.sql contains the relevant changes to update the schema to support this data.

# MotifFeature
- A new MotifFeature class has been added to represent the genomic mapping of a position weight matrix (BindingMatrix). patch_59_60_c.sql contains the relevant schema updates.

# Schema patch: Schema version
- patch_59_60_a.sql updates the meta table, changing the schema_version meta_value to 60.

# Schema patch: associated_feature_type
- patch_59_60_b.sql updates the associated_feature_type table to support feature_type to feature_type associations. The relevant adaptors have also been updated to reflect the new table fields and values.

# RegulatoryBuild update
- The human RegulatoryBuild has been updated and re-annotated based on the new ChIP-Seq data sets.

# Position Weight Matrix (PWM) mapping and visualisation
- PWM mappings which used to be associated with the RegulatoryFeatures are now associated with the AnnotatedFeatures representing the specific Transcription Factor Binding Site predictions. This utilises the new MotifFeature and BindingMatrix classes. These new data are available as new tracks in the Regulation panel as well as Region in Detail.
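
To make the BindingMatrix item above more concrete, here is a rough idea of what "the consequence of a sequence change with respect to the PWM score" could mean in practice. This is only a minimal Python sketch and not the Ensembl Funcgen API (which is Perl): the count matrix, the pseudocount, the uniform background and every function name below are made up for illustration.

```python
import math

# Hypothetical 4-position count matrix (one list per base); not real Jaspar data.
COUNTS = {
    "A": [8, 1, 0, 7],
    "C": [1, 0, 9, 1],
    "G": [0, 8, 0, 1],
    "T": [1, 1, 1, 1],
}
BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores (assumed uniform background)."""
    length = len(counts["A"])
    matrix = []
    for pos in range(length):
        total = sum(counts[base][pos] for base in BASES) + len(BASES) * pseudocount
        column = {
            base: math.log2(((counts[base][pos] + pseudocount) / total) / background)
            for base in BASES
        }
        matrix.append(column)
    return matrix


def score(matrix, seq):
    """Sum the per-position log-odds scores for a sequence of the same length as the matrix."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match matrix length")
    return sum(column[base] for column, base in zip(matrix, seq.upper()))


def score_change(matrix, seq, offset, alt_base):
    """Score difference (variant minus reference) when the base at `offset` becomes `alt_base`."""
    variant = seq[:offset] + alt_base + seq[offset + 1:]
    return score(matrix, variant) - score(matrix, seq)


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    reference = "AGCA"  # the consensus of the toy matrix
    print("reference score:", round(score(pwm, reference), 3))
    print("G->T at offset 1:", round(score_change(pwm, reference, 1, "T"), 3))
```

A negative difference means the change moves the site away from the motif; how such a difference would be thresholded or reported is not covered by this announcement.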

# New ChIP-Seq datasets from ENCODE
- 93 new ENCODE ChIP-Seq datasets for existing cell lines will be added.

# probe_feature.cigar_line patch
- patch_59_60_d.sql: the probe_feature table has been patched to change the cigar_line field from a free text field to a varchar.
Species: Anole lizard, Cow, C.elegans, Marmoset, Dog, Guinea Pig, Sloth, C.intestinalis, C.savignyi, Zebrafish, Armadillo, Kangaroo rat, Fly, Tenrec, Horse, Hedgehog, Cat, Chicken, Stickleback, Gorilla, Human, Elephant, Macaque, Wallaby, Mouse Lemur, Opossum, Mouse, Microbat, Pika, Platypus, Rabbit, Medaka, Bushbaby, Chimp, Orangutan, Rock Hyrax, Megabat, Rat, Yeast, Shrew, Ground Squirrel, Pig, Zebra Finch, Fugu, Tarsier, Tetraodon, Tree Shrew, Dolphin, Alpaca, Xenopus, Panda

### Genebuild

# Update to human vega annotation
- An update to Vega human annotation

# Gencode gene set update
- Update to the Ensembl/Havana Gencode gene set using the latest Vega gene set.

# Human cDNA update
- Updated set of cDNA alignments to the human genome.

# Rabbit chromosomes
- Chromosome mapping added for the rabbit genome. Coordinates updated accordingly.

# Human (GRCh37) assembly patch release 2
- Addition of the GRCh37 patch release 2 patches. These are toplevel, non-reference regions of the assembly.

# Updated human otherfeatures db: EST alignments
- Human ESTs were realigned. New EST-based genes were produced from these EST alignments.

# Panda genebuild
- The first genebuild for the panda genome

# Update human otherfeatures db: new CCDS import
- Update to CCDS set for human

# Updated mouse otherfeatures db: New CCDS import
- Update to CCDS set for mouse

# cDNA based gene annotation of human assembly patches
- Annotate the human assembly patches using Exonerate's cDNA2genome model, which aligns cDNAs to the genome using annotation identifying the coding regions of the cDNAs.

# Zebrafish genebuild
- Full genebuild on the new Zv9 assembly

# Mouse cDNA update
- Updated set of cDNA alignments to the mouse genome

# Flagging Translation attribute where the evidence was removed
- Add a flag to the translation where a human Ensembl translation used as evidence was removed from the current human database.
Species: Sloth, Armadillo, Kangaroo rat, Tenrec, Hedgehog, Cat, Wallaby, Mouse Lemur, Microbat, Pika, Bushbaby, Chimp, Rock Hyrax, Megabat, Shrew, Ground Squirrel, Tarsier, Tree Shrew, Dolphin, Alpaca

# Flagging Translation attribute where the Uniprot evidence was removed
- Add a flag to the translation where supporting evidence from Uniprot was removed from the Uniprot database.
Species: Anole lizard, Cow, C.elegans, Marmoset, Dog, Guinea Pig, Sloth, C.intestinalis, C.savignyi, Zebrafish, Armadillo, Kangaroo rat, Fly, Tenrec, Horse, Hedgehog, Cat, Chicken, Stickleback, Gorilla, Human, Elephant, Macaque, Wallaby, Mouse Lemur, Opossum, Mouse, Microbat, Pika, Platypus, Rabbit, Medaka, Bushbaby, Chimp, Orangutan, Rock Hyrax, Megabat, Rat, Yeast, Shrew, Ground Squirrel, Pig, Zebra Finch, Fugu, Tarsier, Tetraodon, Tree Shrew, Dolphin, Alpaca, Xenopus, Panda

# Updating the ENCODE excluded regions
- Update of the ENCODE excluded regions

# Fix duplicate transcript attributes
- Duplicate transcript attributes removed
Species: Anole lizard, Armadillo, Chicken, Human, Mouse, Platypus, Zebra Finch

# homo_sapiens rnaseq data
- RNA-seq data from transcriptome sequencing done by Illumina on human tissues will be provided in a stand-alone database, i.e. no mart / compara relationships.

### Mart

# Ensembl marts for release 60
- Full build of the seven marts: Ensembl Mart, SNP Mart, Functional Genomics Mart, Genomic Features Mart, Ontology Mart, Vega Mart, Sequence Mart

### Variation

# Data
- update of UniProt identifier links including phenotype information
- import of new information from NHGRI and EGA Genome Wide Association Studies
- import of new data sets for structural variants from DGVa
- import of an expanded data set for all short somatic sequence variants from COSMIC
- GVF (Genome Variation Format) dumps for all variants
- update of variant consequences for new human gene set
- update of variant consequences for new zebrafish assembly and gene set
- import new set of 150,000 Zebrafish variants

# API and schema change
- schema change for Ensembl Genomes to store the population size for each frequency calculation

From dstaines at ebi.ac.uk Tue Sep 21 10:30:10 2010
From: dstaines at ebi.ac.uk (Dan Staines)
Date: Tue, 21 Sep 2010 10:30:10 +0100
Subject: [ensembl-announce] Ensembl Genomes Release 7 Intentions
Message-ID: <4C987B22.3060404@ebi.ac.uk>

Dear all,

Please find attached a summary of our intentions for release 7 of Ensembl Genomes, due out on November 9th 2010. Please note these are intentions and are therefore not guaranteed to be completed for November.

These can also be viewed online at: http://ensemblgenomes.org/releases/release7

Best regards,
Dan Staines, on behalf of the Ensembl Genomes team.

------------------------------------
Ensembl Genomes 7 Release Intentions
------------------------------------

General
- release scheduled for 9th November 2010
- update to Ensembl 60 software
- new Pan Compara database to include e60 vertebrate genomes and wheat rust
- updated pan concern databases from Ensembl Bacteria
- updates to core databases for all collections to include latest data from ENA and UniProtKB (including over 60 new genomes across 5 collections)
- updated funcgen databases for Escherichia/Shigella and Staphylococcus clades
- updated DNA and peptide compara databases
- updated gene and sequence biomarts

Protists
- standardisation of seq_region names and analysis types
- updated tracks for P. falciparum RNASeq data in web interface

Fungi
- standardisation of seq_region names and analysis types
- new core database for Puccinia graminis f. sp. tritici (wheat rust)
- updated peptide compara database
- updated gene and sequence biomarts

Metazoa
- new core database for Acyrthosiphon pisum (pea aphid)
- updated A. gambiae variation database
- updated DNA compara
- updated peptide compara
- updated gene, sequence and variation biomarts

Plants
- updated core database for A. thaliana based on TAIR 10
- new core db for Physcomitrella patens
- updated core db for new gene set for O. sativa indica
- updated funcgen database for A. thaliana, O. sativa indica, O. sativa japonica
- updated variation database for A. thaliana including new data from Nordborg 3.04 and WTCHG
- updated variation database for O.
  sativa indica variation based on new gene set
- updated compara databases
- updated gene, sequence and variation biomarts

-- 
Dan Staines, PhD               Ensembl Genomes Technical Coordinator
EMBL-EBI                       Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From bert at ebi.ac.uk Tue Sep 21 13:10:50 2010
From: bert at ebi.ac.uk (bert at ebi.ac.uk)
Date: Tue, 21 Sep 2010 13:10:50 +0100 (BST)
Subject: [ensembl-announce] Ensembl API workshop Cambridge 1-3 December 2010
Message-ID: <51431.172.22.68.250.1285071050.squirrel@webmail.ebi.ac.uk>

Hello all,

From Wednesday December 1st till Friday December 3rd 2010 we will give another Ensembl Developers workshop at the Genetics Department of the University of Cambridge in the UK.

This 3-day workshop will cover the Ensembl Core API as well as the Functional Genomics, Variation and Compara APIs and will be given by experts of the respective Ensembl teams. For the workshop some experience with coding in Perl is required.

There are no costs for the workshop (and organiser David Judge will even provide lots of free coffee, tea, orange juice, water, cookies, fruit etc. to keep you going ....).

After coding the whole day, we will also be happy to show you some of the pubs in Cambridge, e.g.
the famous (but touristy) "Eagle", the place where Francis Crick interrupted patrons' lunchtime on 28 February 1953 to announce that he and James Watson had "discovered the secret of life" after they had come up with their proposal for the structure of DNA.

To register for this workshop, please go to:

http://www.biomed.cam.ac.uk/gradschool/skills/bioinformatics.html

(Note that the description of the workshop at the moment still says that three of the four APIs will be covered. This is incorrect: all four will be covered!)

If you have any questions about the workshop you can mail me at bert at ebi.ac.uk.

Cheers from sunny Hinxton,
Bert

Bert Overduin, Ph.D.
PANDA Coordination & Outreach
EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom
http://www.ebi.ac.uk/~bert

From glenn at ebi.ac.uk Tue Sep 21 14:18:14 2010
From: glenn at ebi.ac.uk (Glenn Proctor)
Date: Tue, 21 Sep 2010 14:18:14 +0100
Subject: [ensembl-announce] Old Ensembl mailing list aliases to be removed
Message-ID:

As most of you will know, we recently changed to a new infrastructure for the Ensembl mailing lists. Full details are here: http://www.ensembl.org/info/about/contact/mailing.html - in brief, the mailing list addresses are:

 dev at ensembl.org for general discussion, allows users to post

 announce at ensembl.org for Ensembl announcements. Posting restricted to Ensembl staff

The old addresses (ensembl-dev at ebi.ac.uk and ensembl-announce at ebi.ac.uk) have been kept active up to now but will be switched off on Friday 24th September.

You won't have to do anything, and the only difference you'll see is if you accidentally try to post to the old address - it will now bounce immediately.

Thanks

Glenn.
From glenn at ebi.ac.uk Fri Sep 24 09:23:40 2010
From: glenn at ebi.ac.uk (Glenn Proctor)
Date: Fri, 24 Sep 2010 09:23:40 +0100
Subject: [ensembl-announce] Old Ensembl mailing list aliases to be removed
In-Reply-To:
References:
Message-ID:

As mentioned in the email I sent earlier in the week, the old mailing list aliases have now been removed.

Regards

Glenn.

On Tue, Sep 21, 2010 at 2:18 PM, Glenn Proctor wrote:
> As most of you will know, we recently changed to a new infrastructure
> for the Ensembl mailing lists. Full details are here:
> http://www.ensembl.org/info/about/contact/mailing.html - in brief, the
> mailing list addresses are:
>
>  dev at ensembl.org for general discussion, allows users to post
>
>  announce at ensembl.org for Ensembl announcements. Posting restricted
> to Ensembl staff
>
> The old addresses (ensembl-dev at ebi.ac.uk and
> ensembl-announce at ebi.ac.uk) have been kept active up to now but will
> be switched off on Friday 24th September.
>
> You won't have to do anything, and the only difference you'll see is
> if you accidentally try to post to the old address - it will now
> bounce immediately.
>
> Thanks
>
> Glenn.

From dstaines at ebi.ac.uk Tue Sep 21 10:30:10 2010
From: dstaines at ebi.ac.uk (Dan Staines)
Date: Tue, 21 Sep 2010 10:30:10 +0100
Subject: [ensembl-announce] Ensembl Genomes Release 7 Intentions
Message-ID: <4C987B22.3060404@ebi.ac.uk>

Dear all,

Please find attached a summary of our intentions for release 7 of Ensembl Genomes, due out on November 9th 2010. Please note these are intentions and are therefore not guaranteed to be completed for November.

These can also be viewed online at: http://ensemblgenomes.org/releases/release7

Best regards,

Dan Staines, on behalf of the Ensembl Genomes team.

------------------------------------
Ensembl Genomes 7 Release Intentions
------------------------------------

General
- release scheduled for 9th November 2010
- update to Ensembl 60 software
- new Pan Compara database to include e60 vertebrate genomes and wheat rust
- updated pan concern databases from Ensembl Bacteria
- updates to core databases for all collections to include latest data from ENA and UniProtKB (including over 60 new genomes across 5 collections)
- updated funcgen databases for Escherichia/Shigella and Staphylococcus clades
- updated DNA and peptide compara databases
- updated gene and sequence biomarts

Protists
- standardisation of seq_region names and analysis types
- updated tracks for P. falciparum RNASeq data in web interface

Fungi
- standardisation of seq_region names and analysis types
- new core database for Puccinia graminis f. sp. tritici (wheat rust)
- updated peptide compara database
- updated gene and sequence biomarts

Metazoa
- new core database for Acyrthosiphon pisum (pea aphid)
- updated A. gambiae variation database
- updated DNA compara
- updated peptide compara
- updated gene, sequence and variation biomarts

Plants
- updated core database for A. thaliana based on TAIR 10
- new core db for Physcomitrella patens
- updated core db for new gene set for O. sativa indica
- updated funcgen database for A. thaliana, O. sativa indica, O. sativa japonica
- updated variation database for A. thaliana including new data from Nordborg 3.04 and WTCHG
- updated variation database for O. sativa indica based on new gene set
- updated compara databases
- updated gene, sequence and variation biomarts

--
Dan Staines, PhD               Ensembl Genomes Technical Coordinator
EMBL-EBI                       Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From wm2 at ebi.ac.uk Mon Sep 20 16:17:45 2010
From: wm2 at ebi.ac.uk (wm2 at ebi.ac.uk)
Date: Mon, 20 Sep 2010 16:17:45 +0100 (BST)
Subject: [ensembl-announce] Ensembl Release 60 - summary of declarations of intentions
Message-ID: <46333.172.22.68.209.1284995865.squirrel@webmail.ebi.ac.uk>

Below is the summary of declarations of intentions for Ensembl release 60. Please note these are intentions and are not guaranteed to be in the release, which is currently scheduled for the 26th of October.

Regards,
William McLaren

====================================================
Summary of declarations of intentions for Ensembl 60
====================================================

### Compara

# Families
- Updated MCL families including all Ensembl transcript isoforms and newest UniProt Metazoa
- Clustering by MCL
- Multiple Sequence Alignments with MAFFT
- Family stable ID mapping

# Gene Homologies
- GeneTrees with new/updated genebuilds and assemblies
- Updated build of ncRNA trees
- Clustering using hcluster_sg
- Multiple Sequence Alignments using consistency-based MCoffee meta-aligner (mafftgins+muscle+kalign+probcons) and exon-skipping aware "skipper" algorithm
- Homology inference including the recent 'possible_ortholog' type and 'putative gene split' and 'contiguous gene split' exceptions
- Pairwise gene-based dN/dS calculations for high coverage species pairs only
- GeneTree stable ID mapping

# Pairwise Alignments
-- Lastz-net alignments
- H.sap-A.mel
- H.sap-O.cun
- C.fam-A.mel
-- Blat-alignments
- H.sap-D.rer
- M.mus-D.rer
- R.nor-D.rer
- G.gal-D.rer
- T.rub-D.rer
- D.rer-X.tro
- C.int-D.rer
- C.sav-D.rer
- G.acu-D.rer
- O.lat-D.rer
- D.rer-T.nig
-- Non-reference alignments for human vs high coverage blastz-net alignments
- H.sap-P.tro
- H.sap-G.gor
- H.sap-P.pyg
- H.sap-M.mul
- H.sap-M.mus
- H.sap-R.nor
- H.sap-C.fam
- H.sap-B.tau
- H.sap-S.scr
- H.sap-E.cab
- H.sap-O.ana
- H.sap-M.dom
- H.sap-G.gal

# Multiple alignments
- 34 way epo low coverage
- 14 way epo eutherian mammals
- 5 way epo fish

# Synteny
- H.sap-C.jac
- H.sap-O.cun

### Core

# Ontology database
- A new ontology database ("ensembl_ontology_60") will be built using the latest data from GO and SO.

# Gene name and GO term projections
- Gene names and GO xrefs will be projected from species where there is high coverage to species where there is lower coverage. Panda will be included as a target for these projections.

# External database references
- Update external database references for human, mouse and Xenopus

# GO Xrefs are now Ontology Xrefs
- The go_xref table is renamed to ontology_xref. The Bio::EnsEMBL::GoXref Perl module is renamed to Bio::EnsEMBL::OntologyXref.

### Funcgen

# Array Mapping
- The array mapping pipeline will be run for those species which have new assemblies, gene builds or new array designs. This includes an update to the latest version of the Phalanx OneArray for human.

# BindingMatrix
- A new BindingMatrix class will represent position weight matrices (PWMs) loaded from Jaspar or inferred directly from ChIP-Seq data. This will ultimately be able to identify the consequence of a sequence change at a given location, with respect to the PWM score. patch_59_60_c.sql contains the relevant changes to update the schema to support this data.

# MotifFeature
- A new MotifFeature class has been added to represent the genomic mapping of a position weight matrix (BindingMatrix). patch_59_60_c.sql contains the relevant schema updates.

# Schema patch: Schema version
- patch_59_60_a.sql updates the meta table, changing the schema_version meta_value to 60.

# Schema patch: associated_feature_type
- patch_59_60_b.sql updates the associated_feature_type table to support feature_type to feature_type associations. The relevant adaptors have also been updated to reflect the new table fields and values.

# RegulatoryBuild update
- The human RegulatoryBuild has been updated and re-annotated based on the new ChIP-Seq data sets.

# Position Weight Matrix (PWM) mapping and visualisation
- PWM mappings, which used to be associated with the RegulatoryFeatures, are now associated with the AnnotatedFeatures representing the specific Transcription Factor Binding Site predictions. This utilises the new MotifFeature and BindingMatrix classes. These new data are available as new tracks in the Regulation panel as well as Region in Detail.

# New ChIP-Seq datasets from ENCODE
- 93 new ENCODE ChIP-Seq datasets for existing cell lines will be added.

# probe_feature.cigar_line patch
- patch_59_60_d.sql: the probe_feature table has been patched to change the cigar_line field from a free text field to a varchar.
Species: Anole lizard, Cow, C.elegans, Marmoset, Dog, Guinea Pig, Sloth, C.intestinalis, C.savignyi, Zebrafish, Armadillo, Kangaroo rat, Fly, Tenrec, Horse, Hedgehog, Cat, Chicken, Stickleback, Gorilla, Human, Elephant, Macaque, Wallaby, Mouse Lemur, Opossum, Mouse, Microbat, Pika, Platypus, Rabbit, Medaka, Bushbaby, Chimp, Orangutan, Rock Hyrax, Megabat, Rat, Yeast, Shrew, Ground Squirrel, Pig, Zebra Finch, Fugu, Tarsier, Tetraodon, Tree Shrew, Dolphin, Alpaca, Xenopus, Panda

### Genebuild

# Update to human vega annotation
- An update to Vega human annotation

# Gencode gene set update
- Update to the Ensembl/Havana Gencode gene set using the latest Vega gene set.

# Human cDNA update
- Updated set of cDNA alignments to the human genome.

# Rabbit chromosomes
- Chromosome mapping added for the rabbit genome. Coordinates updated accordingly.

# Human (GRCh37) assembly patch release 2
- Addition of the GRCh37 patch release 2 patches. These are toplevel, non-reference regions of the assembly.

# Updated human otherfeatures db: EST alignments
- Human ESTs were realigned. New EST-based genes were produced from these EST alignments.

# Panda genebuild
- The first genebuild for the panda genome

# Update human otherfeatures db: new CCDS import
- Update to CCDS set for human

# Updated mouse otherfeatures db: New CCDS import
- Update to CCDS set for mouse

# cDNA based gene annotation of human assembly patches
- Annotate the human assembly patches using Exonerate's cDNA2genome model, which aligns cDNAs to the genome using annotation identifying the coding regions of the cDNAs.

# Zebrafish genebuild
- Full genebuild on the new Zv9 assembly

# Mouse cDNA update
- Updated set of cDNA alignments to the mouse genome

# Flagging Translation attribute where the evidence was removed
- Add a flag to the translation where a human Ensembl translation used as evidence was removed from the current human database.
Species: Sloth, Armadillo, Kangaroo rat, Tenrec, Hedgehog, Cat, Wallaby, Mouse Lemur, Microbat, Pika, Bushbaby, Chimp, Rock Hyrax, Megabat, Shrew, Ground Squirrel, Tarsier, Tree Shrew, Dolphin, Alpaca

# Flagging Translation attribute where the UniProt evidence was removed
- Add a flag to the translation where supporting evidence from UniProt was removed from the UniProt database.
Species: Anole lizard, Cow, C.elegans, Marmoset, Dog, Guinea Pig, Sloth, C.intestinalis, C.savignyi, Zebrafish, Armadillo, Kangaroo rat, Fly, Tenrec, Horse, Hedgehog, Cat, Chicken, Stickleback, Gorilla, Human, Elephant, Macaque, Wallaby, Mouse Lemur, Opossum, Mouse, Microbat, Pika, Platypus, Rabbit, Medaka, Bushbaby, Chimp, Orangutan, Rock Hyrax, Megabat, Rat, Yeast, Shrew, Ground Squirrel, Pig, Zebra Finch, Fugu, Tarsier, Tetraodon, Tree Shrew, Dolphin, Alpaca, Xenopus, Panda

# Updating the ENCODE excluded regions
- Update of the ENCODE excluded regions

# Fix duplicate transcript attributes
- Duplicate transcript attributes removed
Species: Anole lizard, Armadillo, Chicken, Human, Mouse, Platypus, Zebra Finch

# homo_sapiens rnaseq data
- RNA-seq data from transcriptome sequencing done by Illumina on human tissues will be provided in a stand-alone database, i.e. no mart / compara relationships.

### Mart

# Ensembl marts for release 60
- Full build of the seven marts: Ensembl Mart, SNP Mart, Functional Genomics Mart, Genomic Features Mart, Ontology Mart, Vega Mart, Sequence Mart

### Variation

# Data
- update of UniProt identifier links including phenotype information
- import of new information from NHGRI and EGA Genome Wide Association Studies
- import of new data sets for structural variants from DGVa
- import of an expanded data set for all short somatic sequence variants from COSMIC
- GVF (Genome Variation Format) dumps for all variants
- update of variant consequences for new human gene set
- update of variant consequences for new zebrafish assembly and gene set
- import of a new set of 150,000 zebrafish variants

# API and schema change
- schema change for Ensembl Genomes to store the population size for each frequency calculation

From wm2 at ebi.ac.uk Mon Sep 20 16:30:02 2010
From: wm2 at ebi.ac.uk (wm2 at ebi.ac.uk)
Date: Mon, 20 Sep 2010 16:30:02 +0100 (BST)
Subject: [ensembl-announce] Ensembl Release 60 - summary of declarations of intentions
Message-ID: <57949.172.22.68.209.1284996602.squirrel@webmail.ebi.ac.uk>

Below is the summary of declarations of intentions for Ensembl release 60. Please note these are intentions and are not guaranteed to be in the release, which is currently scheduled for the 26th of October.

Regards,
William McLaren

====================================================
Summary of declarations of intentions for Ensembl 60
====================================================

### Compara

# Families
- Updated MCL families including all Ensembl transcript isoforms and newest UniProt Metazoa
- Clustering by MCL
- Multiple Sequence Alignments with MAFFT
- Family stable ID mapping

# Gene Homologies
- GeneTrees with new/updated genebuilds and assemblies
- Updated build of ncRNA trees
- Clustering using hcluster_sg
- Multiple Sequence Alignments using consistency-based MCoffee meta-aligner (mafftgins+muscle+kalign+probcons) and exon-skipping aware "skipper" algorithm
- Homology inference including the recent 'possible_ortholog' type and 'putative gene split' and 'contiguous gene split' exceptions
- Pairwise gene-based dN/dS calculations for high coverage species pairs only
- GeneTree stable ID mapping

# Pairwise Alignments
-- Lastz-net alignments
- H.sap-A.mel
- H.sap-O.cun
- C.fam-A.mel
-- Blat-alignments
- H.sap-D.rer
- M.mus-D.rer
- R.nor-D.rer
- G.gal-D.rer
- T.rub-D.rer
- D.rer-X.tro
- C.int-D.rer
- C.sav-D.rer
- G.acu-D.rer
- O.lat-D.rer
- D.rer-T.nig
-- Non-reference alignments for human vs high coverage blastz-net alignments
- H.sap-P.tro
- H.sap-G.gor
- H.sap-P.pyg
- H.sap-M.mul
- H.sap-M.mus
- H.sap-R.nor
- H.sap-C.fam
- H.sap-B.tau
- H.sap-S.scr
- H.sap-E.cab
- H.sap-O.ana
- H.sap-M.dom
- H.sap-G.gal

# Multiple alignments
- 34 way epo low coverage
- 14 way epo eutherian mammals
- 5 way epo fish

# Synteny
- H.sap-C.jac
- H.sap-O.cun

### Core

# Ontology database
- A new ontology database ("ensembl_ontology_60") will be built using the latest data from GO and SO.

# Gene name and GO term projections
- Gene names and GO xrefs will be projected from species where there is high coverage to species where there is lower coverage. Panda will be included as a target for these projections.

# External database references
- Update external database references for human, mouse and Xenopus

# GO Xrefs are now Ontology Xrefs
- The go_xref table is renamed to ontology_xref. The Bio::EnsEMBL::GoXref Perl module is renamed to Bio::EnsEMBL::OntologyXref.

### Funcgen

# Array Mapping
- The array mapping pipeline will be run for those species which have new assemblies, gene builds or new array designs. This includes an update to the latest version of the Phalanx OneArray for human.

# BindingMatrix
- A new BindingMatrix class will represent position weight matrices (PWMs) loaded from Jaspar or inferred directly from ChIP-Seq data. This will ultimately be able to identify the consequence of a sequence change at a given location, with respect to the PWM score. patch_59_60_c.sql contains the relevant changes to update the schema to support this data.

# MotifFeature
- A new MotifFeature class has been added to represent the genomic mapping of a position weight matrix (BindingMatrix). patch_59_60_c.sql contains the relevant schema updates.

# Schema patch: Schema version
- patch_59_60_a.sql updates the meta table, changing the schema_version meta_value to 60.

# Schema patch: associated_feature_type
- patch_59_60_b.sql updates the associated_feature_type table to support feature_type to feature_type associations. The relevant adaptors have also been updated to reflect the new table fields and values.

# RegulatoryBuild update
- The human RegulatoryBuild has been updated and re-annotated based on the new ChIP-Seq data sets.

# Position Weight Matrix (PWM) mapping and visualisation
- PWM mappings, which used to be associated with the RegulatoryFeatures, are now associated with the AnnotatedFeatures representing the specific Transcription Factor Binding Site predictions. This utilises the new MotifFeature and BindingMatrix classes. These new data are available as new tracks in the Regulation panel as well as Region in Detail.

# New ChIP-Seq datasets from ENCODE
- 93 new ENCODE ChIP-Seq datasets for existing cell lines will be added.

# probe_feature.cigar_line patch
- patch_59_60_d.sql: the probe_feature table has been patched to change the cigar_line field from a free text field to a varchar.
Species: Anole lizard, Cow, C.elegans, Marmoset, Dog, Guinea Pig, Sloth, C.intestinalis, C.savignyi, Zebrafish, Armadillo, Kangaroo rat, Fly, Tenrec, Horse, Hedgehog, Cat, Chicken, Stickleback, Gorilla, Human, Elephant, Macaque, Wallaby, Mouse Lemur, Opossum, Mouse, Microbat, Pika, Platypus, Rabbit, Medaka, Bushbaby, Chimp, Orangutan, Rock Hyrax, Megabat, Rat, Yeast, Shrew, Ground Squirrel, Pig, Zebra Finch, Fugu, Tarsier, Tetraodon, Tree Shrew, Dolphin, Alpaca, Xenopus, Panda

### Genebuild

# Update to human vega annotation
- An update to Vega human annotation

# Gencode gene set update
- Update to the Ensembl/Havana Gencode gene set using the latest Vega gene set.

# Human cDNA update
- Updated set of cDNA alignments to the human genome.

# Rabbit chromosomes
- Chromosome mapping added for the rabbit genome. Coordinates updated accordingly.

# Human (GRCh37) assembly patch release 2
- Addition of the GRCh37 patch release 2 patches. These are toplevel, non-reference regions of the assembly.

# Updated human otherfeatures db: EST alignments
- Human ESTs were realigned. New EST-based genes were produced from these EST alignments.

# Panda genebuild
- The first genebuild for the panda genome

# Update human otherfeatures db: new CCDS import
- Update to CCDS set for human

# Updated mouse otherfeatures db: New CCDS import
- Update to CCDS set for mouse

# cDNA based gene annotation of human assembly patches
- Annotate the human assembly patches using Exonerate's cDNA2genome model, which aligns cDNAs to the genome using annotation identifying the coding regions of the cDNAs.

# Zebrafish genebuild
- Full genebuild on the new Zv9 assembly

# Mouse cDNA update
- Updated set of cDNA alignments to the mouse genome

# Flagging Translation attribute where the evidence was removed
- Add a flag to the translation where a human Ensembl translation used as evidence was removed from the current human database.
Species: Sloth, Armadillo, Kangaroo rat, Tenrec, Hedgehog, Cat, Wallaby, Mouse Lemur, Microbat, Pika, Bushbaby, Chimp, Rock Hyrax, Megabat, Shrew, Ground Squirrel, Tarsier, Tree Shrew, Dolphin, Alpaca

# Flagging Translation attribute where the UniProt evidence was removed
- Add a flag to the translation where supporting evidence from UniProt was removed from the UniProt database.
Species: Anole lizard, Cow, C.elegans, Marmoset, Dog, Guinea Pig, Sloth, C.intestinalis, C.savignyi, Zebrafish, Armadillo, Kangaroo rat, Fly, Tenrec, Horse, Hedgehog, Cat, Chicken, Stickleback, Gorilla, Human, Elephant, Macaque, Wallaby, Mouse Lemur, Opossum, Mouse, Microbat, Pika, Platypus, Rabbit, Medaka, Bushbaby, Chimp, Orangutan, Rock Hyrax, Megabat, Rat, Yeast, Shrew, Ground Squirrel, Pig, Zebra Finch, Fugu, Tarsier, Tetraodon, Tree Shrew, Dolphin, Alpaca, Xenopus, Panda

# Updating the ENCODE excluded regions
- Update of the ENCODE excluded regions

# Fix duplicate transcript attributes
- Duplicate transcript attributes removed
Species: Anole lizard, Armadillo, Chicken, Human, Mouse, Platypus, Zebra Finch

# homo_sapiens rnaseq data
- RNA-seq data from transcriptome sequencing done by Illumina on human tissues will be provided in a stand-alone database, i.e. no mart / compara relationships.

### Mart

# Ensembl marts for release 60
- Full build of the seven marts: Ensembl Mart, SNP Mart, Functional Genomics Mart, Genomic Features Mart, Ontology Mart, Vega Mart, Sequence Mart

### Variation

# Data
- update of UniProt identifier links including phenotype information
- import of new information from NHGRI and EGA Genome Wide Association Studies
- import of new data sets for structural variants from DGVa
- import of an expanded data set for all short somatic sequence variants from COSMIC
- GVF (Genome Variation Format) dumps for all variants
- update of variant consequences for new human gene set
- update of variant consequences for new zebrafish assembly and gene set
- import of a new set of 150,000 zebrafish variants

# API and schema change
- schema change for Ensembl Genomes to store the population size for each frequency calculation