[ensembl-dev] next gen sequencing mapping data
Simon White
sw4 at sanger.ac.uk
Fri Mar 18 09:41:37 GMT 2011
Hi Hardip,
I have been storing RNASeq data in the Ensembl DnaAlignFeature table
for a while now; there is code in ensembl-analysis to do this using
Exonerate as an aligner.
Whilst this has worked for us in the past and we have been able to
store and process fairly large datasets (approx. 800M reads in one
case), the tables are not really designed for this type of data, and
it can get very slow and put a large strain on the databases,
particularly when writing to very full tables.
More recently we have been using BAM files to store the data, and
parsing the data using Bio::DB::Sam, which integrates with the Ensembl
API and has been very useful for comparing RNASeq data to Ensembl
annotations. Additionally, we can now visualise the BAM files
directly in the web browser which is helpful when checking your results.
I would recommend using BAM files and taking a look at Bio::DB::Sam;
it makes life a lot easier.
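As a rough illustration of the kind of comparison meant here (not the
ensembl-analysis code, and not the Bio::DB::Sam API), the sketch below
counts how many read alignments overlap each annotated feature. The
function name, the tuple layouts and the example coordinates are all
hypothetical; in practice the reads would come from a BAM reader and the
features from the Ensembl API, neither of which is shown.

```python
from collections import defaultdict


def count_reads_per_feature(reads, features):
    """Count read alignments overlapping each annotated feature.

    reads:    iterable of (seq_region, start, end), 1-based inclusive
    features: iterable of (name, seq_region, start, end), 1-based inclusive
    Both layouts are assumptions made for this sketch.
    """
    features = list(features)
    counts = {name: 0 for name, _, _, _ in features}

    # Group features by sequence region so each read is only compared
    # against features on the same chromosome or scaffold.
    by_region = defaultdict(list)
    for name, region, start, end in features:
        by_region[region].append((start, end, name))

    for region, r_start, r_end in reads:
        for f_start, f_end, name in by_region.get(region, ()):
            # Two closed intervals overlap when each starts before the
            # other one ends.
            if r_start <= f_end and f_start <= r_end:
                counts[name] += 1
    return counts


# Made-up coordinates, purely for illustration.
features = [
    ("exon_A", "1", 100, 200),
    ("exon_B", "1", 500, 600),
    ("exon_C", "2", 100, 200),
]
reads = [
    ("1", 90, 125),
    ("1", 190, 225),
    ("1", 300, 335),
    ("1", 580, 615),
    ("2", 201, 236),
]
counts = count_reads_per_feature(reads, features)
print(counts)
```

This is a plain linear scan per region, which is fine for a handful of
features; for genome-wide work a sorted or indexed lookup (which is what
an indexed BAM file gives you on the read side) would be the better fit.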
Thanks.
Simon
On 18 Mar 2011, at 06:13, Hardip Patel wrote:
> Hi All
>
> I was wondering if anybody has tried, or has opinions about, using
> the Ensembl MySQL tables to store next gen sequencing mapping data.
> Our lab works on a variety of projects (small RNA, transcriptome,
> methylation, genome resequencing) in human and mouse.
>
> I was interested in putting all our mapping data coming from
> different projects in one place organized according to the
> respective genomes.
>
> The main reason is that we could interrogate our data in terms of the
> Ensembl annotations of the genomes and compare data from these two
> species using one co-ordinate system.
>
> For example, if I have miRNA sequencing data from human and mouse,
> I would like to see whether the transcripts present in human are
> also present in mouse in my dataset at the corresponding genomic
> locations, using genomic alignments as the basis for such
> comparisons, and draw meaningful information out of it.
>
> I would also appreciate it if somebody could point me in the right
> direction in terms of which tables need to be updated, and how to go
> about it.
>
> Kind regards
>
> --
> Hardip R. Patel
> Bioinformatician, Molecular Genetics Division,
> Victor Chang Cardiac Research Institute,
> Darlinghurst, NSW – 2010, Australia
> (W): +61 – 2 – 9295 8611
> (M): 0449 180 715
>
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev