[ensembl-dev] a question about dN and dS

Yuan Chen yuan at sanger.ac.uk
Thu Aug 30 18:32:21 BST 2012


Thanks for the clear explanation.

Yuan



On 30 Aug 2012, at 17:16, "Matthieu Muffato" <muffato at ebi.ac.uk> wrote:

> Hi Yuan and Michael
> 
> In Compara, homologies always contain exactly 2 genes. For each homology, the pairwise protein alignment is sent to PAML (BioPerl module: Bio::Tools::Run::Phylo::PAML::Codeml), which returns the values of n, s, dn, and ds.
> However, I am not familiar with PAML and I cannot tell why n and s are decimal values.
> 
> Regards,
> Matthieu
> 
> On 30/08/12 16:59, Yuan Chen wrote:
>> So does Compara calculate the number of synonymous changes for the gene in
>> paired species? E.g. if human is ACG and chimp is AGG, you would count that
>> as one non-synonymous change; do you store this in the database as n?
>> 
>> If a homology consists of more than 2 species, do you calculate over multiple
>> species pairs and then average? I ask because I get n and s as decimal
>> numbers, e.g. for gene ENSG00000251258, n=543.5 and s=245.5.
>> 
>> Thanks
>> 
>> yuan
>> On 30 Aug 2012, at 15:46, Michael Paulini wrote:
>> 
>>> I have to take that back: it is per pair, and therefore the maximum N
>>> should be 2*alignment size and is exact, whereas dN and dS are
>>> estimated through PAML.
>>> 
>>> M
>>> 
>>> 
>>> On 30/08/12 15:07, Michael Paulini wrote:
>>>> As far as I interpret it, it is the total number of nonsynonymous bases
>>>> in the alignment, so you can get a maximum of
>>>> size-of-alignment-block*number-of-sequences.
>>>> It can't be the average, as it returns integers... unless it
>>>> rounds them.
>>>> 
>>>> But there is a $homology->dn method that returns the nonsynonymous
>>>> substitution rate, i.e. the average rate per bp.
>>>> 
>>>> 
>>>> M
>>>> 
>>>> On 30/08/12 14:51, Yuan Chen wrote:
>>>>> Dear Michael,
>>>>> The documentation says "number of nonsynonymous positions for the homology".
>>>>> 
>>>>> Suppose the homology consists of 10 species: is it the average number (i.e. the total number of nonsynonymous changes divided by the number of species) or just the total number of nonsynonymous positions?
>>>>> 
>>>>> As n and s are not integers but decimal numbers, I thought they would be some kind of average?
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> yuan
>>>>> On 30 Aug 2012, at 14:17, Michael Paulini wrote:
>>>>> 
>>>>>> Have a look here: http://www.ensembl.org/info/docs/Doxygen/compara-api/classBio_1_1EnsEMBL_1_1Compara_1_1Homology.html
>>>>>> 
>>>>>> There you can find the documentation for the methods.
>>>>>> 
>>>>>> M
>>>>>> 
>>>>>> 
>>>>>> On 30/08/12 14:05, Yuan Chen wrote:
>>>>>>> On the same subject, can anyone explain what n and s are, as obtained by:
>>>>>>> 
>>>>>>> $homology->n; $homology->s;
>>>>>>> 
>>>>>>> Are these the numbers of non-synonymous and synonymous changes for the gene?
>>>>>>> 
>>>>>>> yuan
>>>>>>> On 30 Aug 2012, at 09:16, Matthieu Muffato wrote:
>>>>>>> 
>>>>>>>> Dear Mei
>>>>>>>> 
>>>>>>>> It seems that you are querying a fruit-fly gene. Unfortunately, the dN/dS values are only computed for sufficiently closely related species: mammals, reptiles, and tetraodontiformes.
>>>>>>>> 
>>>>>>>> Nevertheless, your script is correct and would print some values if you used a human gene as the query.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Matthieu
>>>>>>>> 
>>>>>>>> On 30/08/12 01:49, JiangMei wrote:
>>>>>>>>> Hi All.
>>>>>>>>> 
>>>>>>>>> Sorry to bother you. I'm trying to use ensembl-compara (database version
>>>>>>>>> 67) to extract the homologues. I also want to get the dN, dS and dN/dS.
>>>>>>>>> However, ENSEMBL can't output these values. Can anyone help me?
>>>>>>>>> 
>>>>>>>>> The following is the script I used:
>>>>>>>>> 
>>>>>>>>> use strict;
>>>>>>>>> use warnings;
>>>>>>>>> use Bio::EnsEMBL::Registry;
>>>>>>>>> 
>>>>>>>>> # Connect to the public Ensembl server (release 67).
>>>>>>>>> my $registry = 'Bio::EnsEMBL::Registry';
>>>>>>>>> $registry->load_registry_from_db(
>>>>>>>>>     -host       => 'ensembldb.ensembl.org',
>>>>>>>>>     -user       => 'anonymous',
>>>>>>>>>     -db_version => '67',
>>>>>>>>> );
>>>>>>>>> 
>>>>>>>>> # Fetch the query gene and every homology it takes part in.
>>>>>>>>> my $member_adaptor   = $registry->get_adaptor('Multi', 'compara', 'Member');
>>>>>>>>> my $member           = $member_adaptor->fetch_by_source_stable_id('ENSEMBLGENE', 'FBgn0002780');
>>>>>>>>> my $homology_adaptor = $registry->get_adaptor('Multi', 'compara', 'Homology');
>>>>>>>>> my $homologies       = $homology_adaptor->fetch_all_by_Member($member);
>>>>>>>>> 
>>>>>>>>> for my $homology (@{$homologies}) {
>>>>>>>>>     # Print ID and taxon details for each gene in the homology.
>>>>>>>>>     for my $mem (@{ $homology->get_all_Members }) {
>>>>>>>>>         my $taxon = $mem->taxon;    # see Bio::EnsEMBL::Compara::NCBITaxon for methods
>>>>>>>>>         my $id    = $mem->stable_id;
>>>>>>>>>         print "$id\t", $taxon->taxon_id, "\t", $taxon->genus, " ", $taxon->species, "\t";
>>>>>>>>>     }
>>>>>>>>>     print $homology->description, "\t", $homology->subtype, "\t";
>>>>>>>>>     my $dn   = $homology->dn;
>>>>>>>>>     my $ds   = $homology->ds;
>>>>>>>>>     my $dnds = $homology->dnds_ratio;
>>>>>>>>>     my $lnl  = $homology->lnl;
>>>>>>>>>     # Values that were not computed come back undefined; print NA for those.
>>>>>>>>>     # (The original wrote NA to an unopened OUT filehandle; STDOUT is used here.)
>>>>>>>>>     print join("\t", map { defined $_ ? $_ : 'NA' } ($dn, $ds, $dnds, $lnl)), "\n";
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hoping for your help. Thanks very much in advance!
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best, Mei
>>>>>>>>> 
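
On the question in the thread of why n and s are decimal values: in PAML's
codeml output, N and S are the numbers of nonsynonymous and synonymous
*sites*, not counts of observed changes. A single codon position can be
partly synonymous (for example, only one of the three possible changes at the
third base of TTA keeps leucine), so the totals are usually fractional, and
N + S equals the aligned coding length in nucleotides. The values quoted
above fit this: 543.5 + 245.5 = 789 = 3 x 263 codons.

The sketch below illustrates the idea only. It assumes ungapped ACGT
sequences whose length is a multiple of 3, uses simple Nei-Gojobori style
counting, and treats changes to stop codons as nonsynonymous. codeml itself
estimates sites by maximum likelihood (taking transition/transversion bias
and codon frequencies into account), so its numbers will differ. The helper
names (syn_sites_in_codon, count_sites, pairwise_sites) are made up for this
example and are not part of the Ensembl or BioPerl APIs.

```perl
use strict;
use warnings;

# Standard genetic code, with codons enumerated in TCAG order.
my @bases = qw(T C A G);
my $aa    = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %code;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $code{"$b1$b2$b3"} = substr($aa, $i++, 1);
        }
    }
}

# Synonymous sites in one codon: of the 9 possible single-base changes,
# count those that keep the amino acid, then divide by 3 (changes per position).
sub syn_sites_in_codon {
    my ($codon) = @_;
    my $syn = 0;
    for my $pos (0 .. 2) {
        for my $base (@bases) {
            next if $base eq substr($codon, $pos, 1);
            my $mutant = $codon;
            substr($mutant, $pos, 1) = $base;
            $syn++ if $code{$mutant} eq $code{$codon};
        }
    }
    return $syn / 3;
}

# Returns (N, S) for one coding sequence; N + S = 3 * number of codons.
sub count_sites {
    my ($seq) = @_;
    my @codons = uc($seq) =~ /(...)/g;
    my $s = 0;
    $s += syn_sites_in_codon($_) for @codons;
    my $n = 3 * @codons - $s;
    return ($n, $s);
}

# For a pair of sequences, average the site counts of the two sequences.
sub pairwise_sites {
    my ($seq1, $seq2) = @_;
    my ($n1, $s1) = count_sites($seq1);
    my ($n2, $s2) = count_sites($seq2);
    return (($n1 + $n2) / 2, ($s1 + $s2) / 2);
}

# The codon pair from the thread: human ACG versus chimp AGG.
my ($n, $s) = pairwise_sites('ACG', 'AGG');
printf "N = %.3f  S = %.3f  N+S = %.1f\n", $n, $s, $n + $s;
```

For that single codon pair the script prints N = 2.167, S = 0.833 and
N+S = 3.0, so even one aligned codon gives non-integer site counts. dN and dS
are then the estimated numbers of substitutions per nonsynonymous and per
synonymous site.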



