[ensembl-dev] Conceptual Confusion about Strands for Data mining
sumir at sanbi.ac.za
Thu Feb 9 09:21:03 GMT 2012
I am using Ensembl API version 65 to obtain a 1000bp slice for each
gene for promoter content analysis. My script takes the start
position of each gene and extends 1000bp from the 5' end of the gene,
using co-ordinates obtained from the "fetch_by_gene_stable_id()" method.
On the plus strand this gives the 1000bp upstream of the 5' end of the
gene (the first UTR / exon). My conceptual difficulties are:
1) Does this orientation hold / differ for genes on the negative strand?
2) Do I need to reverse this when obtaining a slice for a gene located
on the negative strand, i.e. instead of obtaining the 1000bp using the
start co-ordinates, should I use the end co-ordinates obtained by the
The reason I am confused is that the API documentation states:
"Note that for historical reasons the fetch_by_gene_stable_id() method
always returns a slice on the forward strand even if the gene is on the
reverse strand."
Does this mean that all 5' slices obtained for genes using co-ordinates
from this method would ideally capture the transcription start site
regardless of strand orientation?
Another portion of the API documentation states:
"Like all Ensembl features the start of an exon is always less than or
equal to the end of the exon, regardless of the strand it is on. The
start of the transcript is the start of the first exon of a transcript
on the forward strand or the end of the last exon of a transcript on the
reverse strand. The start and end of a gene are defined to be the lowest
start value of its transcripts and the highest end value respectively."
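
To make the convention quoted above concrete, here is a minimal sketch in
plain Python of the arithmetic it implies. It does not call the Ensembl
API; the function names (gene_bounds, upstream_window) and the example
co-ordinates are hypothetical, and it assumes 1-based inclusive
co-ordinates with start <= end and strand given as 1 or -1. Under those
assumptions, the 5' end of a reverse-strand gene sits at its end
co-ordinate, so the upstream window lies beyond the end rather than
before the start.

```python
def gene_bounds(transcripts):
    """Return (start, end) of a gene from its transcripts' (start, end) pairs.

    Follows the quoted rule: lowest transcript start, highest transcript end.
    """
    starts = [s for s, _ in transcripts]
    ends = [e for _, e in transcripts]
    return min(starts), max(ends)


def upstream_window(start, end, strand, flank=1000):
    """Return (window_start, window_end) of the region 5' of a gene.

    Hypothetical helper: co-ordinates are 1-based, inclusive, forward-strand
    numbering, with start <= end regardless of strand.
    """
    if start > end:
        raise ValueError("start must be <= end, whatever the strand")
    if strand == 1:
        # Forward strand: the 5' end is `start`, upstream is lower co-ordinates.
        # Clamp at 1; the window is empty (start > end) if the gene begins at 1.
        return max(1, start - flank), start - 1
    if strand == -1:
        # Reverse strand: the 5' end is `end`, upstream is higher co-ordinates.
        # Not clamped to the sequence length, which this sketch does not know.
        return end + 1, end + flank
    raise ValueError("strand must be 1 or -1")


# Made-up gene with two transcripts spanning 5000..8000.
g_start, g_end = gene_bounds([(5000, 7000), (5500, 8000)])
print(upstream_window(g_start, g_end, 1))    # window before the start
print(upstream_window(g_start, g_end, -1))   # window after the end
```

The window returned for the reverse strand is still expressed in
forward-strand co-ordinates; the sequence fetched for it would need to be
reverse-complemented (or requested on the -1 strand) to read 5' to 3'
relative to the gene.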