[ensembl-dev] Gorilla assembly coverage depth

Amonida Zadissa amonida at sanger.ac.uk
Tue Sep 20 14:39:31 BST 2011


Dear Greg,

I have some further information about the coverage depth for gorilla.
The original value of 'low' is actually correct because the assembly
behaves more like a low-coverage genome than a high-coverage one.

The decision for this setting was based on the fact that we were
unable to produce an annotation of the same high quality as from an
assembly entirely based on WGS sequencing. Therefore, we will retain
this setting as it is in upcoming releases too.

Hopefully this clarifies the situation.

Please contact us if you have any further questions.

Cheers,
Amonida

On Sat, Sep 17, 2011 at 04:52:23PM +0100, amonida wrote:
> Hi Greg,
> 
> Thanks for pointing out this error. The gorilla annotation is a
> high-coverage genebuild, so the assembly coverage should have been set to
> 'high' instead of its current value. We will rectify this in the next
> Ensembl release. Meanwhile, you can set the value of
> assembly.coverage_depth to 'high' and use the gorilla annotation in your work.
> 
> Sorry about this oversight.
> 
> Cheers,
> Amonida
> 
> --
> Amonida Zadissa
> Ensembl Genebuild Team
> Wellcome Trust Sanger Institute
> 
> On Sat, 17 Sep 2011 12:46:52 +0100, Gregory Jordan <greg at ebi.ac.uk> wrote:
> > Hi Matthieu,
> > 
> > Thanks for the tips. I was just about to run some analyses on v64 (the
> > addition of lamprey looks pretty cool!), but maybe I'll hold off till
> > the next release if there's a problem with gorilla.
> > 
> > Regarding dN/dS estimates -- I've got some ideas on how one might
> > sensibly incorporate more accurate phylogenetic (rather than pairwise)
> > dN/dS calculations into the Compara pipeline, with most of the code
> > already written. Perhaps we should chat sometime.
> > 
> > --greg
> > 
> > On Fri, Sep 16, 2011 at 1:04 PM, Matthieu Muffato <muffato at ebi.ac.uk>
> > wrote:
> > 
> >> Hi Will and Greg
> >>
> >> For the protein tree pipeline, this tag is used to select the genomes
> >> for the dN/dS calculation. Because gorilla was considered a
> >> low-coverage genome, we don't have any dN/dS values for gorilla vs *
> >> homologues.
> >>
> >> By the way, some of the gorilla CDS sequences stored in the Compara
> >> database are erroneous
> >> (http://www.ensembl.info/contact-us/known-bugs/), so any comparative
> >> analysis using gorilla should fetch the CDS sequences from the core
> >> database (the protein sequences are unaffected).
> >>
> >> Hope this helps,
> >> Matthieu
> >>
> >>
> >> On 15/09/11 22:56, William Spooner wrote:
> >>
> >>> Thanks for the heads-up Greg,
> >>>
> >>> This meta_key is certainly used by the Compara ProteinTrees pipeline
> >>> (Bio::EnsEMBL::Compara::RunnableDB::ProteinTrees::GroupGenomesUnderTaxa),
> >>> although I don't know what the downstream ramifications of the 'low'
> >>> (basically not 'high' or '6X') setting are. I tend to set everything
> >>> to 'high' to be on the safe side.
> >>>
> >>> Will
> >>>
> >>> On 15 Sep 2011, at 18:36, Gregory Jordan wrote:
> >>>
> >>>> I understand that things in the 'meta' table tend to be for internal
> >>>> use only. But the assembly coverage depth information is only
> >>>> accessible from there, and surely this can't be accurate anymore:
> >>>>
> >>>> mysql -uensro -hens-livemirror -e "select * from
> >>>> gorilla_gorilla_core_64_31.meta where
> >>>> meta_key='assembly.coverage_depth'\G"
> >>>>
> >>>> *************************** 1. row ***************************
> >>>>    meta_id: 81
> >>>> species_id: 1
> >>>>   meta_key: assembly.coverage_depth
> >>>> meta_value: low
> >>>>
> >>>> I doubt many people are actually using this undocumented
> >>>> information... but it caught me off guard, and it would be a shame
> >>>> for someone attempting to filter out low-coverage genomes to end up
> >>>> throwing the baby out with the bathwater, so to speak!
> >>>>
> >>>> Cheers,
> >>>>  greg
> >>>>
> >>>
> >>> --
> >>> William Spooner
> >>> whs at eaglegenomics.com
> >>> http://www.eaglegenomics.com
> >>>
> >>>
> >> --
> >> Matthieu Muffato, Ph.D.
> >> Ensembl Developer - Comparative Genomics
> >> European Bioinformatics Institute (EMBL-EBI)
> >> Wellcome Trust Genome Campus, Hinxton
> >> Cambridge, CB10 1SD, United Kingdom
> >>
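
For anyone filtering genomes on this tag in their own scripts, the
behaviour discussed in this thread can be sketched roughly as below. This
is only an illustration in Python, not the Compara pipeline's code: the
function name, the placeholder species, and the force_include override are
all hypothetical. The only details taken from the thread are the gorilla
value ('low') and the note that anything other than 'high' or '6X' is
treated as low coverage.

```python
# Hypothetical sketch: choose genomes for an analysis based on the
# assembly.coverage_depth meta value, with an explicit user override.

# Per the thread: any value other than 'high' or '6X' counts as low coverage.
HIGH_COVERAGE_VALUES = {"high", "6X"}


def select_high_coverage(coverage_by_species, force_include=()):
    """Return a sorted list of species whose coverage depth counts as high.

    coverage_by_species: dict mapping species name -> meta_value string
    force_include: species to keep regardless of their tag (a user-side
                   choice, not something the Ensembl pipeline is known to do)
    """
    forced = set(force_include)
    return sorted(
        species
        for species, depth in coverage_by_species.items()
        if depth in HIGH_COVERAGE_VALUES or species in forced
    )


# Example values; only the gorilla entry ('low') comes from the thread.
example = {
    "gorilla_gorilla": "low",
    "species_a": "high",  # made-up placeholder
    "species_b": "6X",    # made-up placeholder
    "species_c": "2X",    # made-up placeholder
}

# Default behaviour drops gorilla along with other low-coverage genomes.
print(select_high_coverage(example))

# Explicitly keeping gorilla despite its 'low' tag.
print(select_high_coverage(example, force_include=["gorilla_gorilla"]))
```

Keeping the override separate from the tag itself means the stored
meta_value can stay 'low', as in the release, while a particular analysis
still opts in to the gorilla annotation.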



