[ensembl-dev] Gorilla assembly coverage depth

Gregory Jordan greg at ebi.ac.uk
Sat Sep 17 12:46:52 BST 2011


Hi Matthieu,

Thanks for the tips. I was just about to run some analyses on v64 (the
addition of lamprey looks pretty cool!), but maybe I'll hold off till the
next release if there's a problem with gorilla.

Regarding dN/dS estimates -- I've got some ideas on how one might sensibly
incorporate more accurate phylogenetic (rather than pairwise) dN/dS
calculations into the Compara pipeline, with most of the code already
written. Perhaps we should chat sometime.

--greg

On Fri, Sep 16, 2011 at 1:04 PM, Matthieu Muffato <muffato at ebi.ac.uk> wrote:

> Hi Will and Greg
>
> For the protein tree pipeline, this tag is used to select the genomes for
> the dn/ds calculation. Because gorilla was flagged as a low-coverage genome,
> we don't have any dn/ds values for gorilla vs * homologues.
>
> By the way, some of the gorilla CDS sequences stored in the Compara
> database are erroneous (http://www.ensembl.info/contact-us/known-bugs/),
> so any comparative analysis using gorilla should fetch the CDS sequences
> from the core database instead (the protein sequences are unaffected).
>
> Hope this helps,
> Matthieu
>
>
> On 15/09/11 22:56, William Spooner wrote:
>
>> Thanks for the heads-up Greg,
>>
>> This meta_key is certainly used by the Compara ProteinTrees pipeline
>> (Bio::EnsEMBL::Compara::RunnableDB::ProteinTrees::GroupGenomesUnderTaxa),
>> although I don't know what the downstream ramifications of the 'low'
>> (basically not 'high' or '6X') setting are. I tend to set everything to
>> 'high' to be on the safe side.
>>
>> Will
>>
>> On 15 Sep 2011, at 18:36, Gregory Jordan wrote:
>>
>>> I understand that things in the 'meta' table tend to be for internal use
>>> only. But the assembly coverage depth information is only accessible from
>>> there, and surely this can't be accurate anymore:
>>>
>>>  mysql -uensro -hens-livemirror -e "select * from
>>>  gorilla_gorilla_core_64_31.meta where meta_key='assembly.coverage_depth'\G"
>>>
>>> *************************** 1. row ***************************
>>>    meta_id: 81
>>> species_id: 1
>>>   meta_key: assembly.coverage_depth
>>> meta_value: low
>>>
>>> I doubt many people are actually using this undocumented information...
>>> but it caught me off guard, and it would be a shame for someone attempting
>>> to filter out low-coverage genomes to end up throwing the baby out with the
>>> bathwater, so to speak!
>>>
>>> Cheers,
>>>  greg
>>>
>>
>> --
>> William Spooner
>> whs at eaglegenomics.com
>> http://www.eaglegenomics.com
>>
>>
> --
> Matthieu Muffato, Ph.D.
> Ensembl Developer - Comparative Genomics
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus, Hinxton
> Cambridge, CB10 1SD, United Kingdom
>