[ensembl-dev] Mapping transcript coordinate to toplevel (genome)

Sung Gong sung at bio.cc
Thu Nov 10 10:31:15 GMT 2011


Hi,

I wanted to convert transcript coordinates onto the top level using
Bio::EnsEMBL::TranscriptMapper.

Below is the script:

sub cds2genome{
    my ($enst_id,$cds_from,$cds_to)=@_;

    my $transcript=$tr_adaptor->fetch_by_stable_id($enst_id);
    my $tm = Bio::EnsEMBL::TranscriptMapper->new($transcript);
    my $chr=$transcript->transform('toplevel')?
$transcript->transform('toplevel')->slice->seq_region_name():'\N';
    my $t_strand=defined $transcript->seq_region_strand?
$transcript->seq_region_strand : '+1';

    my ($g_start,$g_end,$seq);

    my $cdna_start=$cds_from + $transcript->cdna_coding_start - 1;
    my $cdna_end=$cds_to + $transcript->cdna_coding_start - 1;

    my @coord=$tm->cdna2genomic($cdna_start,$cdna_end);
    foreach my $coord(@coord){
        if($coord->isa('Bio::EnsEMBL::Mapper::Gap')){
            warn
"\t[GAP]ENST:$enst_id\tCDS_START:$cds_from\tCDS_END:$cds_to\tG_START-END:",$coord->start,"-",$coord->end,"\n";
        }else{
            $g_start=$coord->start if defined $coord->start;
            $g_end=$coord->end if defined $coord->end;

            my $slice = $slice_adaptor->fetch_by_region('chromosome', $chr);

            my $feature=Bio::EnsEMBL::Feature->new(
                -start=>$g_start,
                -end=>$g_end,
                -strand=>$slice->strand, # 1 as a default
                -slice=>$slice,
            );
            $seq=defined $feature->seq? $feature->seq : '\N';
        }
    }
    return($chr,$g_start,$g_end,$seq,$t_strand);
}

The code reads very well I guess ;)

It works very well, except only one position shown below:
ENST00000155840 921 923
which returns:
chr:11  2604665  2604666

I am wondering why the length is different (921-923 & 2604665-2604666)?
Only one such a case out of more than 100 entries, which are OK.

Any thoughts?

Cheers,
Sung




More information about the Dev mailing list