[ensembl-dev] possible 'off by one error' in ensembl-functgenomics/scripts/miscellaneous/sam2bed.pl ?
Nathan Johnson
njohnson at ebi.ac.uk
Tue Oct 30 11:33:35 GMT 2012
This is now fixed on the head, and hence will make it out into the wild in v70. The default is now 0-based; use -one_based for the previous behaviour.
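For anyone following the thread, here is a minimal sketch of the coordinate conversion in question. It is not the code in sam2bed.pl: the helper name sam_to_bed_coords and its $one_based argument are hypothetical, the latter only mirroring the idea of the -one_based option. Like the line quoted below, it assumes the aligned span equals the read length and ignores the CIGAR string.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only (not taken from sam2bed.pl).
# Input:  SAM POS (1-based, end inclusive) and the read length.
# Output: BED start/end (0-based, end exclusive), or the old 1-based
#         start if $one_based is true.
sub sam_to_bed_coords {
    my ($pos, $read_length, $one_based) = @_;

    # Last base covered in 1-based inclusive terms. The same number is
    # also the 0-based exclusive end, so the end needs no adjustment.
    my $end = $pos + $read_length - 1;

    # Only the start shifts by one when moving to 0-based coordinates.
    my $start = $one_based ? $pos : $pos - 1;

    return ($start, $end);
}

# A 36 bp read aligned at SAM position 100 covers bases 100..135.
my ($start, $end) = sam_to_bed_coords(100, 36);
print join("\t", 'chr1', $start, $end), "\n";    # chr1  99  135
```

The point of the sketch is that only the start changes between the two conventions; the end value is numerically identical in both.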
Nathan Johnson
Senior Scientific Programmer
Ensembl Regulation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
http://www.ensembl.info/
http://twitter.com/#!/ensembl
On 30 Aug 2012, at 14:05, Hans-Rudolf Hotz wrote:
> Hi
>
> I am struggling with the sam2bed.pl script, and I wonder whether it has one of those famous 'off by one error' bugs?
>
>
> SAM files (like GFF files) use the 1-based coordinate system and are end inclusive. BED files use the 0-based coordinate system and are end exclusive (see the SAM spec, http://samtools.sourceforge.net/SAM1.pdf, or http://genome.ucsc.edu/FAQ/FAQformat.html).
>
>
> Now, I look at the following script:
>
> ~/ensembl-67/ensembl-functgenomics/scripts/miscellaneous/sam2bed.pl
>
>
> I get the position in line 120, ie:
>
> my ($name, $flag, $slice_name, $pos, $mapq, undef, undef, undef, undef, $read) = split("\t");
>
>
> The $pos variable is not modified and directly used in line 130
>
> push @cache, join("\t", ($seq_region_name, $pos, ($pos +length($read) -1), $name, $mapq, $strand));
>
>
> Shouldn't this rather be written like:
>
> push @cache, join("\t", ($seq_region_name, ($pos -1), ($pos +length($read) -1), $name, $mapq, $strand));
>
>
> For the end coordinate, ($pos + length($read) - 1) is correct (i.e. the half-closed, half-open interval, or end-exclusive region, used in BED files).
>
>
> Is this an oversight in the script?
>
>
> Thank you very much for any clarification
>
> Regards, Hans
>
>
>
> --
>
>
>
> Hans-Rudolf Hotz, PhD
> Bioinformatics Support
>
> Friedrich Miescher Institute for Biomedical Research
> Maulbeerstrasse 66
> 4058 Basel/Switzerland
>