[ensembl-dev] ortholog vs paralog?

Thomason, James thomason at cshl.edu
Thu Jul 5 19:47:06 BST 2012


Could somebody point me towards the proper way to differentiate between orthologous and paralogous genes?

I've got a simple script that needs to dump out all orthologs and paralogs, simplified here:

my @genes = ();    # fill with the list of genes somehow

while (my $gene = shift @genes) {
    my $orthologues = $gene->get_all_homologous_Genes;
    while (my $o = shift @$orthologues) {
        # print out the pair
    }
}

Inside that inner while loop, I was just looking at the description ($o->[1]->description, in this case), checking it against the regexes /ortholog/ and /paralog/, and printing to the appropriate file that way. But I know that's not quite right, because I found a paralog pair with a description of "putative_gene_split", which fails my handy little regex check.

So what I wanted to know is: what's the proper way to differentiate between the two? My biology knowledge is incredibly light, but a little poking around suggests it might be as simple as comparing the species of the two genes (the one I originally called get_all_homologous_Genes on and the one I just shifted off the @$orthologues array). If they're the same species, the pair is paralogous; if different, orthologous.
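
To make the two approaches concrete, here is a minimal, self-contained sketch that compares them side by side. It uses plain hashes as hypothetical stand-ins for the real objects, so the field names (description, query, target), the species labels, and both helper subs are assumptions for illustration only, not Ensembl API calls. Whether the species comparison is actually sufficient is exactly what I'm asking.

```perl
use strict;
use warnings;

# Regex approach: classify on the homology description string.
# Returns nothing (undef in scalar context) when neither pattern matches.
sub classify_by_description {
    my ($description) = @_;
    return 'ortholog' if $description =~ /ortholog/;
    return 'paralog'  if $description =~ /paralog/;
    return;    # e.g. "putative_gene_split" falls through here
}

# Species approach (the heuristic in question, not a confirmed rule):
# same species => paralog, different species => ortholog.
sub classify_by_species {
    my ($query_species, $target_species) = @_;
    return $query_species eq $target_species ? 'paralog' : 'ortholog';
}

# Hypothetical sample pairs; the species labels are placeholders.
my @pairs = (
    { description => 'ortholog_one2one',       query => 'species_a', target => 'species_b' },
    { description => 'within_species_paralog', query => 'species_a', target => 'species_a' },
    { description => 'putative_gene_split',    query => 'species_a', target => 'species_a' },
);

for my $pair (@pairs) {
    my $by_description = classify_by_description($pair->{description}) // 'unclassified';
    my $by_species     = classify_by_species($pair->{query}, $pair->{target});
    print join("\t", $pair->{description}, $by_description, $by_species), "\n";
}
```

In this sketch the regex check leaves the "putative_gene_split" pair unclassified, while the species comparison files it as a paralog.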

Am I on the right track and is that sufficient? If not, what's the proper way to go about it?

Many thanks,

-Jim Thomason...

Scientific Informatics Developer @ The Ware Lab,
a USDA-ARS Laboratory at Cold Spring Harbor Laboratory
