[ensembl-dev] MySql Database
Andreas Kusalananda Kähäri
ak4 at sanger.ac.uk
Tue Mar 5 10:39:37 GMT 2013
The naive SQL approach probably would find it difficult to incorporate
things like assembly exceptions and indels too...
Cheers,
Andreas
On Tue, Mar 05, 2013 at 10:30:00AM +0000, Magali Ruffier wrote:
> Hi Sunita,
>
> We only store the dna sequence for contigs, as any other sequence can be
> inferred from there.
> The assembly table stores the relationship between contigs and other
> coord systems, for example chromosomes.
>
> Something like
> select asm.* from assembly asm, gene g where g.seq_region_id =
> asm.seq_region_id and stable_id = 'ENSG0000014648'
> will give you all the relationships which exist for the chromosome your
> feature is stored on.
>
> You would then need to restrict it to the portion of chromosome where
> the gene is located, as well as to the chromosome-contig relationships.
> Eventually, linking those results back to the dna table would give you
> the genomic sequence.
>
> This is why we need an API to deal with these kind of queries.
> Storing an entire chromosome sequence in one chunk is probably not
> doable in a database.
>
>
> Hope that helps,
> Magali
>
>
> On 04/03/13 20:18, Koul, Sunita wrote:
> >
> > Magali.
> >
> > Thanks for your response. I will surely try the API. However I am
> > trying to figure out the schema and a simple query like this doesn't
> > yield anything(just trying to get the genomic sequence of EGFR gene)
> >
> >
> >
> > select*fromdna whereseq_region_id in(selectseq_region_id fromgene
> > wherestable_id ='ENSG00000146648') --EGFR
> >
> >
> >
> > I think I am still missing something here
> >
> > Also the dna doesn't seem to have the complete sequence at chromosome
> > level
> >
> > For example the seq_region_id from the following query doesn't have
> > any entry in dna.
> >
> > select*fromdna whereseq_region_id in(selectseq_region_idfrom
> > seq_region wherecoord_system_id =*2*andname notlike'H%')
> >
> >
> >
> > Essentially how Can I build the entire genomic sequence for a given
> > chromosome from this schema
> >
> >
> >
> > Thanks
> >
> > Sunita
> >
> >
> >
> >
> >
> > *From:*Magali [mailto:mr6 at ebi.ac.uk]
> > *Sent:* Monday, March 04, 2013 11:42 AM
> > *To:* Ensembl developers list
> > *Cc:* Koul, Sunita
> > *Subject:* Re: [ensembl-dev] MySql Database
> >
> >
> >
> > Hi Sunita,
> >
> > The genomic sequence of the assembly is stored in the dna table, only
> > in the core database.
> > Generally though, features are stored on chromosomes, which themselves
> > are assembled from the various bits of genomic sequence stored in the
> > dna table.
> >
> > We do not recommend querying the database directly to retrieve sequences.
> > We provide a number of tools to do so in an easier fashion.
> >
> > If you are familiar with perl programming, you can use our perl API.
> > http://www.ensembl.org/info/docs/Doxygen/core-api/index.html
> >
> > The sequences can also be downloaded via our ftp site
> > http://www.ensembl.org/info/data/ftp/index.html
> >
> > Various endpoints can be retrieved using our new rest service
> > http://beta.rest.ensembl.org/
> >
> > Or you can use our biomart server
> > http://www.ensembl.org/biomart/martview/07e32c2046773639dad947c91ff9ef7a
> >
> >
> > Hope that helps,
> > Magali
> >
> > On 04/03/13 17:34, Koul, Sunita wrote:
> >
> > Hi,
> >
> > I have locally setup the mysql databases for Homo Sapiens
> >
> > Here are the different schemas I have
> >
> >
> >
> > homo_sapiens_core_69_37
> >
> > homo_sapiens_cdna_69_37
> >
> > homo_sapiens_coreexpressionatlas_69_37
> >
> > ....
> >
> >
> >
> > I am trying to figure out where the actual sequences are stored .
> >
> > For e.g if I am interested in a genomic sequence of "EGFR" gene
> > how do I get it from these tables
> >
> >
> >
> > Thanks
> >
> > Sunita
> >
> >
> >
> > */The materials in this email are private and may contain
> > Protected Health Information. If you are not the intended
> > recipient, be advised that any unauthorized use, disclosure,
> > copying, distribution or the taking of any action in reliance on
> > the contents of this information is strictly prohibited. If you
> > have received this email in error, please immediately notify the
> > sender via telephone or return email./*
> >
> >
> >
> >
> > _______________________________________________
> >
> > Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
> >
> > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> >
> > Ensembl Blog: http://www.ensembl.info/
> >
> >
> >
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
--
Andreas Kusalananda Kähäri
Ensembl Gene Annotation Team
Tap tap tap.
More information about the Dev
mailing list