[ensembl-dev] possible 'off by one error' in ensembl-functgenomics/scripts/miscellaneous/sam2bed.pl ?

Nathan Johnson njohnson at ebi.ac.uk
Tue Oct 30 09:13:40 GMT 2012


Hi Hans

The issue with this script is that we wrote it for use within the Ensembl pipelines, which use 1-based loci. So it worked for our needs (and was more efficient), but yes, it did corrupt the BED format in the process. I will update the script to ensure format validity asap.
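
To illustrate the coordinate shift discussed in this thread, here is a minimal sketch in Python (not the Perl of sam2bed.pl itself). The function name sam_to_bed_interval is hypothetical, and the sketch assumes, as the quoted script lines do, that the alignment covers exactly length($read) reference bases, i.e. no indels or clipping in the CIGAR string.

```python
def sam_to_bed_interval(pos, read_length):
    """Convert a SAM start (1-based, end-inclusive) to a BED interval
    (0-based, end-exclusive).

    Assumption: the read aligns to exactly `read_length` reference bases.
    """
    if pos < 1:
        raise ValueError("SAM POS must be >= 1 for a mapped read")
    start = pos - 1              # shift the 1-based start to 0-based
    end = pos + read_length - 1  # 1-based inclusive end equals 0-based exclusive end
    return start, end


start, end = sam_to_bed_interval(100, 36)
print(start, end)  # 99 135
```

Note that only the start changes: the 1-based inclusive end and the 0-based exclusive end are the same number, so end minus start gives the read length.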

Also, apologies for the tardy response, this one slipped through the net somehow.

Nathan Johnson
Senior Scientific Programmer
Ensembl Regulation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD

http://www.ensembl.info/
http://twitter.com/#!/ensembl






On 30 Aug 2012, at 14:05, Hans-Rudolf Hotz wrote:

> Hi
> 
> I am struggling with the sam2bed.pl script, and I wonder whether it has one of those famous 'off by one error' bugs?
> 
> 
> SAM files (like GFF files) use the 1-based coordinate system and are end inclusive. BED files use the 0-based coordinate system and are end exclusive (see the SAM spec http://samtools.sourceforge.net/SAM1.pdf or http://genome.ucsc.edu/FAQ/FAQformat.html)
> 
> 
> Now, I look at the following script:
> 
> ~/ensembl-67/ensembl-functgenomics/scripts/miscellaneous/sam2bed.pl
> 
> 
> I get the position in line 120, ie:
> 
> my ($name, $flag, $slice_name, $pos, $mapq, undef, undef, undef, undef, $read) = split("\t");
> 
> 
> The $pos variable is not modified and is used directly in line 130:
> 
> push @cache, join("\t", ($seq_region_name, $pos, ($pos +length($read) -1), $name, $mapq, $strand));
> 
> 
> Shouldn't this rather be written like:
> 
> push @cache, join("\t", ($seq_region_name, ($pos -1), ($pos +length($read) -1), $name, $mapq, $strand));
> 
> 
> For the end coordinate, ($pos +length($read) -1) is correct (i.e. the half-closed, half-open interval, or end-exclusive region, used in BED files).
> 
> 
> Is this an oversight in the script?
> 
> 
> Thank you very much for any clarification
> 
> Regards, Hans
> 
> 
> 
> -- 
> 
> 
> 
> Hans-Rudolf Hotz, PhD
> Bioinformatics Support
> 
> Friedrich Miescher Institute for Biomedical Research
> Maulbeerstrasse 66
> 4058 Basel/Switzerland
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




