[ensembl-dev] next gen sequencing mapping data

Simon White sw4 at sanger.ac.uk
Fri Mar 18 09:41:37 GMT 2011


Hi Hardip,
I have been storing RNASeq data in the Ensembl DnaAlignFeature table
for a while now; there is code in ensembl-analysis to do this using
Exonerate as the aligner.
This has worked for us in the past, and we have been able to store
and process fairly large datasets (approx. 800M reads in one case).
However, the tables are not really designed for this type of data, and
it can get very slow and put a large strain on the databases,
particularly when writing to very full tables.
More recently we have been using BAM files to store the data and
parsing them with Bio::DB::Sam, which integrates with the Ensembl
API and has been very useful for comparing RNASeq data to Ensembl
annotations. Additionally, we can now visualise the BAM files
directly in the web browser, which is helpful when checking your results.
I would recommend using BAM files and taking a look at Bio::DB::Sam;
it makes life a lot easier.
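
To illustrate the kind of comparison meant here (read alignments
checked against annotation coordinates), below is a minimal sketch in
Python. It is not the Bio::DB::Sam API and not code from
ensembl-analysis: it parses plain SAM text (the text form of BAM)
using only the standard library. The function names, the sample reads
and the exon coordinates are all hypothetical, made up for the example.

```python
import re

# CIGAR operations; only M, D, N, = and X consume reference bases.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) an alignment covers."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar)
                 if op in REF_CONSUMING)
    return pos, pos + length - 1


def count_overlaps(sam_lines, features):
    """Count mapped reads whose reference span overlaps each feature.

    features maps a name to (seq_region, start, end), 1-based inclusive.
    Note: the span of a spliced read includes its introns (N operations),
    so this is a coarse overlap test, not a per-exon-block one.
    """
    counts = {name: 0 for name in features}
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":    # skip unmapped reads
            continue
        start, end = reference_span(pos, cigar)
        for name, (region, f_start, f_end) in features.items():
            if rname == region and start <= f_end and end >= f_start:
                counts[name] += 1
    return counts


# Hypothetical SAM records and exon coordinates, for illustration only.
sam_lines = [
    "@SQ\tSN:1\tLN:1000",
    "read1\t0\t1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t0\t1\t140\t60\t20M100N30M\t*\t0\t0\t*\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read4\t16\t1\t500\t60\t50M\t*\t0\t0\t*\t*",
]
features = {"exon_a": ("1", 120, 160), "exon_b": ("1", 270, 300)}
counts = count_overlaps(sam_lines, features)
print(counts)
```

With real data you would read an indexed BAM file through a library
and fetch only the reads in the region of interest, rather than
scanning every record as this sketch does.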

Thanks.
Simon

On 18 Mar 2011, at 06:13, Hardip Patel wrote:

> Hi All
>
> I was wondering if anybody has tried, or has opinions about, using
> the Ensembl MySQL tables to store next-gen sequencing mapping data.
> Our lab works on a variety of projects (small RNA, transcriptome,
> methylation, genome resequencing) in human and mouse.
>
> I was interested in putting all our mapping data coming from  
> different projects in one place organized according to the  
> respective genomes.
>
> The main reasons are that we can interrogate our data in terms of
> Ensembl annotations of the genomes and compare data from these two
> species using one co-ordinate system.
>
> For example, if I have miRNA sequencing data from human and mouse,
> I would like to see whether the transcripts present in human are
> also present in mouse in my dataset at the corresponding genomic
> locations, using genomic alignments as the basis for such
> comparisons, and draw meaningful information out of it.
>
> I would also appreciate it if somebody could point me in the right
> direction in terms of which tables need to be updated and how to go
> about it.
>
> Kind regards
>
> -- 
> Hardip R. Patel
> Bioinformatician, Molecular Genetics Division,
> Victor Chang Cardiac Research Institute,
> Darlinghurst, NSW – 2010, Australia
> (W): +61 – 2 – 9295 8611
> (M): 0449 180 715
>
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev




