[ensembl-dev] Gorilla assembly coverage depth

Matthieu Muffato muffato at ebi.ac.uk
Fri Sep 16 13:04:08 BST 2011


Hi Will and Greg

For the protein tree pipeline, this tag is used to select the genomes 
for the dn/ds calculation. Because gorilla was flagged as a low-coverage 
genome, we don't have any dn/ds values for gorilla vs * homologues.
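
As a rough illustration of that selection (a minimal sketch, not the actual 
pipeline code): it assumes that only the values 'high' and '6X' count as high 
coverage, as Will describes below, and that a dn/ds pair needs both genomes 
to pass. The function names and the species_a / species_b entries are 
hypothetical; only the gorilla value comes from the meta table.

```python
from itertools import combinations

# Assumption: only these assembly.coverage_depth values count as high
# coverage; anything else (including 'low') is treated as low coverage.
HIGH_COVERAGE_VALUES = {"high", "6X"}


def is_high_coverage(depth):
    """Return True if a coverage_depth value counts as high coverage."""
    return depth in HIGH_COVERAGE_VALUES


def dnds_genome_pairs(coverage_by_genome):
    """List the genome pairs eligible for dn/ds under the assumptions above.

    coverage_by_genome maps a genome name to its coverage_depth value.
    A pair is kept only if both genomes are high coverage.
    """
    selected = sorted(
        name for name, depth in coverage_by_genome.items()
        if is_high_coverage(depth)
    )
    return list(combinations(selected, 2))


coverage = {
    "gorilla_gorilla": "low",  # value currently in the core meta table
    "species_a": "high",       # hypothetical entry, for illustration only
    "species_b": "6X",         # hypothetical entry, for illustration only
}

# Gorilla is dropped before pairing, so no gorilla vs * pair is produced.
print(dnds_genome_pairs(coverage))
```

With the tag set to 'low', gorilla never reaches the pairing step, which is 
why no gorilla pairs get a dn/ds value.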

By the way, some of the gorilla CDS sequences stored in the Compara 
database are erroneous (http://www.ensembl.info/contact-us/known-bugs/), 
so any comparative analysis using gorilla should fetch the CDS sequences 
from the core database instead (the protein sequences are unaffected).

Hope this helps,
Matthieu

On 15/09/11 22:56, William Spooner wrote:
> Thanks for the heads-up Greg,
>
> This meta_key is certainly used by the Compara ProteinTrees pipeline (Bio::EnsEMBL::Compara::RunnableDB::ProteinTrees::GroupGenomesUnderTaxa), although I don't know what the downstream ramifications of the 'low' (basically not 'high' or '6X') setting are. I tend to set everything to 'high' to be on the safe side.
>
> Will
>
> On 15 Sep 2011, at 18:36, Gregory Jordan wrote:
>
>> I understand that things in the 'meta' table tend to be for internal use only. But the assembly coverage depth information is only accessible from there, and surely this can't be accurate anymore:
>>
>>> mysql -uensro -hens-livemirror -e "select * from gorilla_gorilla_core_64_31.meta where meta_key='assembly.coverage_depth'\G"
>> *************************** 1. row ***************************
>>     meta_id: 81
>> species_id: 1
>>    meta_key: assembly.coverage_depth
>> meta_value: low
>>
>> I doubt many people are actually using this undocumented information... but it caught me off guard, and it would be a shame for someone attempting to filter out low-coverage genomes to end up throwing the baby out with the bathwater, so to speak!
>>
>> Cheers,
>>   greg
>
> --
> William Spooner
> whs at eaglegenomics.com
> http://www.eaglegenomics.com
>

-- 
Matthieu Muffato, Ph.D.
Ensembl Developer - Comparative Genomics
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom



