[ensembl-dev] Gorilla assembly coverage depth

amonida amonida at sanger.ac.uk
Sat Sep 17 16:52:23 BST 2011


Hi Greg,

Thanks for pointing out this error. The gorilla annotation is a high-coverage
genebuild, so assembly.coverage_depth should have been set to 'high' instead
of its current value ('low'). We will rectify this in the next Ensembl
release. In the meantime, you can set assembly.coverage_depth to 'high'
yourself and use the gorilla annotation in your work.
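A minimal sketch of that workaround follows. It assumes you have a local, writable copy of the gorilla core database. Python's built-in sqlite3 module stands in for MySQL here. The table layout mirrors the meta columns shown later in this thread (meta_id, species_id, meta_key, meta_value). The helper names set_coverage_depth and coverage_depth are hypothetical. Against a real MySQL copy, the equivalent UPDATE statement with a literal 'high' would be used instead.

```python
import sqlite3


def set_coverage_depth(conn, value="high"):
    """Overwrite assembly.coverage_depth in a local copy of a core meta table."""
    conn.execute(
        "UPDATE meta SET meta_value = ? WHERE meta_key = 'assembly.coverage_depth'",
        (value,),
    )
    conn.commit()


def coverage_depth(conn):
    """Return the current assembly.coverage_depth value, or None if absent."""
    row = conn.execute(
        "SELECT meta_value FROM meta WHERE meta_key = 'assembly.coverage_depth'"
    ).fetchone()
    return row[0] if row else None


# In-memory stand-in for gorilla_gorilla_core_64_31.meta, seeded with the
# row quoted in this thread (meta_id 81, species_id 1, value 'low').
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta (meta_id INTEGER PRIMARY KEY, species_id INTEGER, "
    "meta_key TEXT, meta_value TEXT)"
)
conn.execute("INSERT INTO meta VALUES (81, 1, 'assembly.coverage_depth', 'low')")

print(coverage_depth(conn))  # low
set_coverage_depth(conn)
print(coverage_depth(conn))  # high
```

Only the one meta row is touched, so the rest of the core database is left as released.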

Sorry about this oversight.

Cheers,
Amonida

--
Amonida Zadissa
Ensembl Genebuild Team
Wellcome Trust Sanger Institute

On Sat, 17 Sep 2011 12:46:52 +0100, Gregory Jordan <greg at ebi.ac.uk> wrote:
> Hi Matthieu,
> 
> Thanks for the tips. I was just about to run some analyses on v64 (the
> addition of lamprey looks pretty cool!), but maybe I'll hold off till the
> next release if there's a problem with gorilla.
> 
> Regarding dN/dS estimates -- I've got some ideas on how one might sensibly
> incorporate more accurate phylogenetic (rather than pairwise) dN/dS
> calculations into the Compara pipeline, with most of the code already
> written. Perhaps we should chat sometime.
> 
> --greg
> 
> On Fri, Sep 16, 2011 at 1:04 PM, Matthieu Muffato <muffato at ebi.ac.uk> wrote:
> 
>> Hi Will and Greg
>>
>> For the protein tree pipeline, this tag is used to select the genomes for
>> the dN/dS calculation. Because gorilla was flagged as a low-coverage
>> genome, we don't have any dN/dS values for gorilla vs * homologues.
>>
>> By the way, some of the gorilla CDS sequences stored in the Compara
>> database are erroneous (http://www.ensembl.info/contact-us/known-bugs/),
>> so any comparative analysis involving gorilla should fetch the CDS
>> sequences from the core database instead (the protein sequences are
>> unaffected).
>>
>> Hope this helps,
>> Matthieu
>>
>>
>> On 15/09/11 22:56, William Spooner wrote:
>>
>>> Thanks for the heads-up Greg,
>>>
>>> This meta_key is certainly used by the Compara ProteinTrees pipeline
>>> (Bio::EnsEMBL::Compara::RunnableDB::ProteinTrees::GroupGenomesUnderTaxa),
>>> although I don't know what the downstream ramifications of the 'low'
>>> (basically not 'high' or '6X') setting are. I tend to set everything to
>>> 'high' to be on the safe side.
>>>
>>> Will
>>>
>>> On 15 Sep 2011, at 18:36, Gregory Jordan wrote:
>>>
>>>> I understand that things in the 'meta' table tend to be for internal use
>>>> only. But the assembly coverage depth information is only accessible from
>>>> there, and surely this can't be accurate anymore:
>>>>
>>>>  mysql -uensro -hens-livemirror -e "select * from gorilla_gorilla_core_64_31.meta where meta_key='assembly.coverage_depth'\G"
>>>>>
>>>> *************************** 1. row ***************************
>>>>    meta_id: 81
>>>> species_id: 1
>>>>   meta_key: assembly.coverage_depth
>>>> meta_value: low
>>>>
>>>> I doubt many people are actually using this undocumented information...
>>>> but it caught me off guard, and it would be a shame for someone attempting
>>>> to filter out low-coverage genomes to end up throwing the baby out with the
>>>> bathwater, so to speak!
>>>>
>>>> Cheers,
>>>>  greg
>>>>
>>>
>>> --
>>> William Spooner
>>> whs at eaglegenomics.com
>>> http://www.eaglegenomics.com
>>>
>>>
>> --
>> Matthieu Muffato, Ph.D.
>> Ensembl Developer - Comparative Genomics
>> European Bioinformatics Institute (EMBL-EBI)
>> Wellcome Trust Genome Campus, Hinxton
>> Cambridge, CB10 1SD, United Kingdom
>>