[ensembl-dev] RNAseq counts per gene/transcript

Thibaut Hourlier thibaut at ebi.ac.uk
Wed Jul 9 14:53:56 BST 2014


Dear Hardip,
You will need to use the EnsEMBL Perl API to retrieve information about RNASeq data from our RNASeq databases. For human the database is homo_sapiens_rnaseq_75_37, available on our public MySQL server: http://www.ensembl.org/info/data/mysql.html
If you plan to work intensively with the databases, it is better to have a local copy, which you can get here (whole database column): http://www.ensembl.org/info/data/ftp/index.html

We only store the models and the evidence that supports the intron boundaries. Using the API, you can get the IntronSupportingEvidence objects from a transcript object (http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Transcript.html#a80480a144f029f090dccd52d33762887) and then use their score method to get the number of reads that overlap each intron. Only reads spanning the introns of the models are counted.

If you want to see all the intron-spanning reads, you will need to get the DnaAlignFeature objects for your region of interest. Their score method will tell you the number of reads which span the intron.

You can also use the BAM files that we generated using BWA: http://www.ensembl.org/info/data/ftp/index.html (column BAM). You will need to filter the reads using the exon boundaries, as we allow some mismatches during the alignment.
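
As a rough illustration of the filtering step, here is a minimal Python sketch. It is only one possible reading of "filter the reads using the exon boundaries": it keeps reads that lie entirely inside an exon and counts them per exon. The function name count_reads_within_exons and the plain (start, end) tuples are hypothetical; in practice the read coordinates would come from the BAM file and the exon coordinates from the gene models, neither of which is shown here.

```python
def count_reads_within_exons(reads, exons):
    """Count reads that lie entirely inside an exon.

    reads: iterable of (start, end) tuples, 1-based inclusive coordinates.
    exons: list of (start, end) tuples on the same sequence region.
    Returns a dict mapping each exon tuple to its read count.
    Assumption: a read overhanging an exon boundary is discarded.
    """
    counts = {exon: 0 for exon in exons}
    for read_start, read_end in reads:
        for exon_start, exon_end in exons:
            if exon_start <= read_start and read_end <= exon_end:
                counts[(exon_start, exon_end)] += 1
                break  # count each read once, even if exons overlap
    return counts


# Hypothetical coordinates, for illustration only.
example_exons = [(100, 200), (300, 400)]
example_reads = [(110, 150), (190, 210), (300, 350), (350, 400), (50, 99)]
example_counts = count_reads_within_exons(example_reads, example_exons)
```

Summing the per-exon counts over the exons of a transcript or gene would give a crude count per transcript or per gene; whether overhanging reads should be dropped or trimmed depends on your analysis.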

As for the source of the data, we do not really store that information in the database, but you can find it either in the summary section of a species or in ENA (http://www.ebi.ac.uk/ena/data/view/ERP000546 is the URL for the Illumina human Bodymap).
Using the API, the Bio::EnsEMBL::Analysis->logic_name method will indirectly tell you the tissue, as we name each analysis after the tissue: "human_brain_rnaseq" => gene models using only reads from brain tissue
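
Once you have the logic_name strings, the tissue can be pulled out with a few lines of code. The Python sketch below assumes that every name follows the "<species>_<tissue>_rnaseq" pattern of the "human_brain_rnaseq" example; that pattern is inferred from this single example, and the helper name tissue_from_logic_name is hypothetical, so check it against the logic names actually present in the database.

```python
def tissue_from_logic_name(logic_name, species="human", suffix="rnaseq"):
    """Return the tissue part of a logic_name such as 'human_brain_rnaseq'.

    Assumption: names look like '<species>_<tissue>_rnaseq'.
    Returns None when the name does not match that pattern.
    """
    parts = logic_name.split("_")
    if len(parts) < 3 or parts[0] != species or parts[-1] != suffix:
        return None
    # Join the middle parts so multi-word tissues keep their underscores.
    return "_".join(parts[1:-1])


print(tissue_from_logic_name("human_brain_rnaseq"))  # brain
```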

Hope this helps
Regards

Thibaut
 
On 30 Jun 2014, at 14:23, Hardip Patel <Hardip.Patel at anu.edu.au> wrote:

> Dear
> 
> I was wondering if there is a way to pull out the information for RNAseq data in the Ensembl v75. Specifically, I am interested in knowing the counts per gene and counts per transcript for the species where such RNAseq data is used in the Ensembl. Also, the source of RNA, i.e. meta data of the RNA samples would be useful too such as tissue, age, sex etc.
> 
> Any help is greatly appreciated.
> 
> Kind regards
> 
> Hardip
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




