[ensembl-dev] Conceptual Confusion about Strands for Data mining

Thu Feb 9 09:21:03 GMT 2012

  Hello,
  I am using Ensembl API version 65 to obtain a 1000bp slice for each 
gene for promoter content analysis. Basically, my script takes the start 
position of each gene and extends 1000bp outward from the 5' end of the 
gene, using co-ordinates obtained from the "fetch_by_gene_stable_id()" method. 
On the plus strand this would be the 1000bp upstream of the 5' end of a 
gene (the first UTR / exon). My conceptual difficulties are:

1) Does this orientation hold / differ for genes on the negative strand?

2) Do I need to reverse this when obtaining a slice for a gene located 
on the negative strand, i.e. instead of taking the 1000bp from the start 
co-ordinate, should I use the end co-ordinate obtained from the 
"fetch_by_gene_stable_id()" method?

The reason I am confused is that the API documentation states :
"Note that for historical reasons the fetch_by_gene_stable_id() method 
always returns a slice on the forward strand even if the gene is on the 
reverse strand."

Does this mean that all 5' slices obtained for genes using co-ordinates 
from this method would ideally capture the transcription start site 
regardless of strand orientation?

Another portion of the API documentation states :

"Like all Ensembl features the start of an exon is always less than or 
equal to the end of the exon, regardless of the strand it is on. The 
start of the transcript is the start of the first exon of a transcript 
on the forward strand or the end of the last exon of a transcript on the 
reverse strand. The start and end of a gene are defined to be the lowest 
start value of its transcripts and the highest end value respectively."
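
If I read the quoted passage correctly, start is always less than or equal 
to end, so the 5' end of a reverse-strand gene would sit at its end 
co-ordinate. To make the question concrete, here is a minimal sketch of 
the arithmetic I have in mind. It is plain Python showing the co-ordinate 
logic only, not Ensembl API code; the function name upstream_window and 
the example co-ordinates are made up for illustration, and it assumes 
1-based inclusive forward-strand co-ordinates with strand given as 1 or -1.

```python
def upstream_window(start, end, strand, length=1000):
    """Return (window_start, window_end) for the region upstream of a gene.

    Co-ordinates are assumed to be 1-based, inclusive, and expressed on the
    forward strand, with start <= end regardless of the gene's strand.
    """
    if start > end:
        raise ValueError("start must be less than or equal to end")
    if strand == 1:
        # Forward strand: the 5' end is at start, so upstream is before it.
        # Clip at position 1 so the window never runs off the sequence.
        return max(1, start - length), start - 1
    if strand == -1:
        # Reverse strand: the 5' end is at end, so upstream lies beyond it
        # in forward co-ordinates. The sequence length is not known here,
        # so the caller would still need to clip at the end of the region.
        return end + 1, end + length
    raise ValueError("strand must be 1 or -1")


# Hypothetical gene spanning 5000..8000
print(upstream_window(5000, 8000, 1))   # (4000, 4999)
print(upstream_window(5000, 8000, -1))  # (8001, 9000)
```

For the reverse-strand case the sequence fetched from that window would 
presumably still need to be reverse-complemented to read 5' to 3' relative 
to the gene.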

  Thank you,

  Sumir
