[ensembl-dev] possible 'off by one error' in ensembl-functgenomics/scripts/miscellaneous/sam2bed.pl ?

Nathan Johnson njohnson at ebi.ac.uk
Tue Oct 30 11:33:35 GMT 2012


This is now fixed on the head, and hence will make it out into the wild in v70. The default is now 0-based; use -one_based for the previous behaviour.
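
As a rough illustration of the coordinate change (a sketch only, not the actual sam2bed.pl code): the helper name sam_to_bed_coords and its $one_based argument are made up for this example, and it assumes an ungapped alignment, so the read covers exactly length($read) bases on the reference.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of sam2bed.pl.
# $pos is the 1-based leftmost position taken from the SAM POS column.
# Assumption: the alignment is ungapped, so the read spans
# length($read) reference bases.
sub sam_to_bed_coords {
    my ($pos, $read, $one_based) = @_;

    # The 1-based inclusive end and the 0-based exclusive end
    # are numerically the same value.
    my $end = $pos + length($read) - 1;

    # BED starts are 0-based; a true $one_based keeps the SAM start as-is.
    my $start = $one_based ? $pos : $pos - 1;

    return ($start, $end);
}

# A 10 bp read at SAM position 100 covers bases 100..109 (1-based, inclusive).
my ($start, $end) = sam_to_bed_coords(100, 'ACGTACGTAC');
print join("\t", 'chr1', $start, $end), "\n";   # prints chr1, 99, 109 (tab-separated)
```

With a true third argument the same call returns a start of 100 and an end of 109, i.e. the start is left in SAM coordinates.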

Nathan Johnson
Senior Scientific Programmer
Ensembl Regulation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD

http://www.ensembl.info/
http://twitter.com/#!/ensembl






On 30 Aug 2012, at 14:05, Hans-Rudolf Hotz wrote:

> Hi
> 
> I am struggling with the sam2bed.pl script, and I wonder whether it has one of those famous 'off by one error' bugs?
> 
> 
> SAM files (like GFF files) use a 1-based coordinate system and are end-inclusive. BED files use a 0-based coordinate system and are end-exclusive (see the SAM spec http://samtools.sourceforge.net/SAM1.pdf or http://genome.ucsc.edu/FAQ/FAQformat.html).
> 
> 
> Now, I look at the following script:
> 
> ~/ensembl-67/ensembl-functgenomics/scripts/miscellaneous/sam2bed.pl
> 
> 
> I get the position in line 120, i.e.:
> 
> my ($name, $flag, $slice_name, $pos, $mapq, undef, undef, undef, undef, $read) = split("\t");
> 
> 
> The $pos variable is not modified and is used directly in line 130:
> 
> push @cache, join("\t", ($seq_region_name, $pos, ($pos +length($read) -1), $name, $mapq, $strand));
> 
> 
> Shouldn't this rather be written like:
> 
> push @cache, join("\t", ($seq_region_name, ($pos -1), ($pos +length($read) -1), $name, $mapq, $strand));
> 
> 
> For the end coordinate, ($pos +length($read) -1) is already correct: the 1-based inclusive end has the same value as the 0-based exclusive end (i.e. the half-closed, half-open interval used in BED files).
> 
> 
> Is this an oversight in the script?
> 
> 
> Thank you very much for any clarification
> 
> Regards, Hans
> 
> 
> 
> -- 
> 
> 
> 
> Hans-Rudolf Hotz, PhD
> Bioinformatics Support
> 
> Friedrich Miescher Institute for Biomedical Research
> Maulbeerstrasse 66
> 4058 Basel/Switzerland
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




