[ensembl-dev] genomic_align_blocks sort order

Caffrey, Daniel Daniel.Caffrey at umassmed.edu
Wed Jan 29 18:16:51 GMT 2014


Dear Ensembl developers,

I'm interested in identifying the best pairwise LASTZ alignment within a list of genomic_align_blocks (mouse v human).  I'm using a modified version of the example provided at :

http://useast.ensembl.org/info/docs/api/compara/compara_tutorial.html

1) The score() method is not particularly useful as it does not appear to take the alignment length into account. Is there another method that returns something similar to an E-value?

2) Based on crude metrics (% nucleotides aligned and eyeballing), the alignments appear to be ordered in the list from best to worst . Can anyone confirm how/if the alignments are ordered?


Thanks,
Daniel

command line options (API V73):
--seq_region X --seq_region_start 105500254 --seq_region_end 105500638 --output_file out.aln --species mouse



Relevant code snippet:

foreach my $this_genomic_align_block (@$genomic_align_blocks) {
    my $restricted_gab = $this_genomic_align_block->restrict_between_reference_positions($seq_region_start, $seq_region_end);
   my  $this_align = $restricted_gab->get_SimpleAlign;
    print $alignIO  $this_align;
    #my $pid= $this_align->percentage_identity; #non-gap columns
    my $overallPid= $this_align->overall_percentage_identity; #all columns
    my $seq1=$this_align->get_seq_by_pos(1);
    my $seq2=$this_align->get_seq_by_pos(2);
    my $seqName2=$seq2->display_id();
    my $start=$seq2->start();
    my $end=$seq2->end();

    my $seqStr1=$seq1->seq();
    my $seqStr2=$seq2->seq();
    my $length=length($seqStr1);
    my $aligned=0;
    for (my $i=0;$i<$length;$i++){
        my $nuc1=substr($seqStr1,$i,1);
        my $nuc2=substr($seqStr2,$i,1);
        if ($nuc1 ne '-' && $nuc2 ne '-'){
        $aligned++;
        }
    }
    my $score=$this_genomic_align_block->score();
    my $rscore=$restricted_gab->score();
    my $percentAligned=$aligned/$sliceLength;
    print STDERR "$seqName2 $start $end $aligned $sliceLength perceAligned:$percentAligned score:$score restricted score:$rscore\n";
}

STDERR:

homo_sapiens/X 150684357 150684616 236 385 perceAligned:0.612987012987013 score:13262 restricted score:
homo_sapiens/X 145891632 145891730 99 385 perceAligned:0.257142857142857 score:29716 restricted score:
homo_sapiens/X 145895821 145895919 99 385 perceAligned:0.257142857142857 score:29716 restricted score:
homo_sapiens/HG1459_PATCH 145896018 145896116 99 385 perceAligned:0.257142857142857 score:29716 restricted score:
homo_sapiens/HG1459_PATCH 145891829 145891927 99 385 perceAligned:0.257142857142857 score:29716 restricted score:
homo_sapiens/X 145891846 145891927 80 385 perceAligned:0.207792207792208 score:29716 restricted score:
homo_sapiens/X 145895624 145895705 80 385 perceAligned:0.207792207792208 score:29716 restricted score:
homo_sapiens/HG1459_PATCH 145895821 145895902 80 385 perceAligned:0.207792207792208 score:29716 restricted score:
homo_sapiens/HG1459_PATCH 145892043 145892124 80 385 perceAligned:0.207792207792208 score:29716 restricted score:
homo_sapiens/X 150684688 150684739 52 385 perceAligned:0.135064935064935 score:13262 restricted score:

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20140129/f457034f/attachment.html>


More information about the Dev mailing list