[ensembl-dev] Create local ensembl SQL database for new organism

Zhou Albert 030bug at gmail.com
Fri Aug 14 15:41:21 BST 2015


Hi all,

Thanks for all the responses.

James: your method looks great. I will try it in the next few weeks.

best wishes,
Albert


> On 14 Aug 2015, at 15:06, James Allen <jallen at ebi.ac.uk> wrote:
> 
> Hello,
> There are scripts that will allow you to load an assembly and geneset into a core db; all the code is in git repositories, and while there's documentation within the scripts, I'm not aware of anything that provides an overview. Below is the methodology I have used successfully - I make no claim that this is the best/right way to do it...
> 
> 
> Schema:
> Load the schema from the ensembl repo: https://github.com/Ensembl/ensembl/blob/master/sql/table.sql
> There are a few lookup tables which will need to be populated, but I'm not sure how easy this is outside of the EBI; if the populate_production_db_tables.pl script (in the ensembl-production repo: https://github.com/Ensembl/ensembl-production/tree/master/scripts/production_database) doesn't work, copy these tables from one of the core databases on the public MySQL server: attrib_type, external_db, misc_set, unmapped_reason.
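The schema step can be sketched as a few `mysql`/`mysqldump` invocations. This is a minimal sketch, not an Ensembl tool: it assumes the standard MySQL command-line clients are installed, that passwords come from a MySQL option file (so no `-p` is passed), and that all host, user and database names are placeholders you supply. `--no-create-info` makes `mysqldump` emit only rows, because `table.sql` has already created the four lookup tables. The source core database should ideally come from the same Ensembl release as your `table.sql` checkout, since columns can change between releases.

```python
import subprocess

# Lookup tables the thread suggests copying from a public core database
# when populate_production_db_tables.pl does not work outside the EBI.
LOOKUP_TABLES = ["attrib_type", "external_db", "misc_set", "unmapped_reason"]


def create_db_command(host, user, db):
    """mysql command that creates an empty database (names are placeholders)."""
    return ["mysql", "-h", host, "-u", user, "-e", f"CREATE DATABASE {db}"]


def load_schema_command(host, user, db):
    """mysql command that reads table.sql from stdin into the new database."""
    return ["mysql", "-h", host, "-u", user, db]


def copy_lookup_commands(src_host, src_port, src_user, src_db,
                         dst_host, dst_user, dst_db, tables=LOOKUP_TABLES):
    """Return (dump, load) argv lists that copy only the rows of `tables`."""
    dump = ["mysqldump", "-h", src_host, "-P", str(src_port), "-u", src_user,
            "--no-create-info", src_db, *tables]
    load = ["mysql", "-h", dst_host, "-u", dst_user, dst_db]
    return dump, load


def run_with_stdin_file(cmd, path):
    """Run cmd with a file (e.g. table.sql) as its standard input."""
    with open(path, "rb") as fh:
        subprocess.run(cmd, stdin=fh, check=True)


def run_pipe(dump, load):
    """Stream dump's stdout into load's stdin; raise if either side fails."""
    p1 = subprocess.Popen(dump, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(load, stdin=p1.stdout)
    p1.stdout.close()  # lets p1 receive SIGPIPE if p2 exits early
    p2.wait()
    p1.wait()
    if p1.returncode or p2.returncode:
        raise RuntimeError(f"copy failed: dump={p1.returncode}, load={p2.returncode}")
```

For example, `copy_lookup_commands("ensembldb.ensembl.org", 3306, "anonymous", <a core db name>, "localhost", <user>, <new db>)` targets the public Ensembl server with its anonymous account; check the Ensembl documentation for the port that matches the release you are using.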
> 
> 
> Assembly:
> The scripts for loading an assembly are in the ensembl-pipeline repo: https://github.com/Ensembl/ensembl-pipeline/tree/master/scripts/. You'll need the contig sequences in fasta format, and an AGP file; if you don't have AGP, it's fairly easy to generate from scaffold fasta - I've used this script in the past: http://hmpdacc.org/doc/fasta2apg.pl
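Generating AGP from scaffold FASTA amounts to splitting each scaffold at runs of N: each non-N stretch becomes a contig (a `W` row) and each N run becomes a gap (an `N` row). The sketch below illustrates that idea; it is not the fasta2apg.pl script linked above. The `<scaffold>.<n>` contig names, the `min_gap` threshold and the `unspecified` linkage evidence are illustrative choices, and scaffolds that begin or end with N are rejected because an AGP object may not start or end with a gap.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs; the name is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def scaffold_to_agp(name, seq, min_gap=1):
    """Split one scaffold at runs of >= min_gap Ns.

    Returns (rows, contigs): tab-separated AGP 2.0 rows for the scaffold and a
    dict of contig name -> sequence (contig names are hypothetical).
    """
    if not seq:
        return [], {}
    if seq[0] in "Nn" or seq[-1] in "Nn":
        raise ValueError(f"{name}: trim leading/trailing Ns before building AGP")
    pieces, pos = [], 0
    # The greedy match swallows a whole N run, so every W piece is non-empty.
    for m in re.finditer(r"[Nn]{%d,}" % min_gap, seq):
        pieces.append(("W", pos, m.start()))
        pieces.append(("N", m.start(), m.end()))
        pos = m.end()
    pieces.append(("W", pos, len(seq)))

    rows, contigs = [], {}
    for part, (kind, start, end) in enumerate(pieces, 1):
        cols = [name, str(start + 1), str(end), str(part), kind]  # 1-based, inclusive
        if kind == "W":
            contig = f"{name}.{len(contigs) + 1}"
            contigs[contig] = seq[start:end]
            cols += [contig, "1", str(end - start), "+"]
        else:
            cols += [str(end - start), "scaffold", "yes", "unspecified"]
        rows.append("\t".join(cols))
    return rows, contigs
```

Writing every scaffold's rows to one `.agp` file and every contig to one FASTA file gives the two inputs the numbered steps expect; whether load_agp.pl needs an `##agp-version 2.0` header line is worth checking in its own documentation.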
> 
> 1. run load_seq_region.pl with the -agp_file parameter to create the scaffolds
> 2. run load_seq_region.pl with the -fasta_file parameter to create the contigs
> 3. run load_agp.pl with the -agp_file parameter to create the links between scaffolds and contigs
> 4. run set_toplevel.pl to add some metadata about the scaffolds
> 5. run load_taxonomy.pl to add some metadata about the species
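The five steps above must run in order, and each one depends on the previous one having succeeded, so a small driver that prints each command and stops at the first failure can help. This is a sketch under stated assumptions: only `-agp_file` and `-fasta_file` come from the thread; the database-connection, coordinate-system and taxonomy options each script needs are deliberately not filled in and must be supplied through the hypothetical `extra` dict after checking each script's own documentation. It also assumes all five scripts sit directly in the ensembl-pipeline `scripts/` directory.

```python
import os
import subprocess


def assembly_plan(scripts_dir, agp_file, contig_fasta, extra=None):
    """Return [(label, argv)] for the five loading steps, in the thread's order.

    `extra` is a hypothetical {label: [options]} dict for the options the thread
    does not spell out (database connection, coordinate system, taxonomy).
    """
    extra = extra or {}
    steps = [
        ("scaffolds", "load_seq_region.pl", ["-agp_file", agp_file]),
        ("contigs", "load_seq_region.pl", ["-fasta_file", contig_fasta]),
        ("assembly", "load_agp.pl", ["-agp_file", agp_file]),
        ("toplevel", "set_toplevel.pl", []),
        ("taxonomy", "load_taxonomy.pl", []),
    ]
    plan = []
    for label, script, args in steps:
        argv = ["perl", os.path.join(scripts_dir, script), *extra.get(label, []), *args]
        plan.append((label, argv))
    return plan


def run_plan(plan, dry_run=True):
    """Print each step; when dry_run is False, also run it and stop on failure."""
    for label, argv in plan:
        print(f"[{label}] {' '.join(argv)}")
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure
```

Running with `dry_run=True` first makes it easy to compare the generated commands against each script's documented options before anything touches the database.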
> 
> 
> Genes:
> Loading genes from GFF can be complicated, because the GFF3 spec allows quite a lot of variation in formatting, even if the spec was religiously adhered to (which it pretty much never is). There's a git repo (https://github.com/dsth/GffDoc) with code for this; use the GffDoc.pl script, which is documented (after a fashion) here: http://www.ebi.ac.uk/~jallen/GffDoc.html.
> 
> If you struggle to get the gff import working, then please let me know, I have an alternative script (that only works if your GFF is valid) but it's undocumented and not in a public repo, so I'd need to provide more detailed guidance on using it...
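Because GFF3 files vary so much, it can be worth checking a file before attempting the import. The sketch below is not GffDoc.pl or the unpublished script mentioned above; it only reports problems that commonly break gene loading: lines without exactly nine tab-separated columns, bad coordinates or strand, malformed attributes, `Parent` values that never appear as an `ID`, and genes with no child features. It stops at a `##FASTA` section, and the specific checks chosen are assumptions about what matters most.

```python
from collections import defaultdict
from urllib.parse import unquote

VALID_STRANDS = {"+", "-", ".", "?"}


def parse_attributes(col9):
    """Parse GFF3 column 9 ('key=v1,v2;key2=v3') into {key: [values]}."""
    attrs = {}
    for field in col9.strip().split(";"):
        if not field.strip():
            continue  # tolerate trailing or doubled ';'
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"attribute without '=': {field.strip()!r}")
        # split on ',' before decoding so an escaped %2C stays inside its value
        attrs[unquote(key.strip())] = [unquote(v) for v in value.split(",")]
    return attrs


def check_gff3(lines):
    """Return (types_by_id, children_by_parent, problems) for GFF3 lines."""
    types_by_id, children, problems = {}, defaultdict(list), []
    for n, raw in enumerate(lines, 1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"line {n}: {len(cols)} tab-separated columns, expected 9")
            continue
        _seqid, _source, ftype, start, end, _score, strand, _phase, col9 = cols
        if not (start.isdigit() and end.isdigit()) or int(start) > int(end):
            problems.append(f"line {n}: bad coordinates {start}..{end}")
        if strand not in VALID_STRANDS:
            problems.append(f"line {n}: bad strand {strand!r}")
        try:
            attrs = parse_attributes(col9)
        except ValueError as err:
            problems.append(f"line {n}: {err}")
            continue
        for fid in attrs.get("ID", []):
            types_by_id.setdefault(fid, ftype)  # multi-line features share one ID
        for parent in attrs.get("Parent", []):
            children[parent].append(ftype)
    for parent in children:
        if parent not in types_by_id:
            problems.append(f"Parent {parent!r} is never defined with ID=")
    for fid, ftype in types_by_id.items():
        if ftype == "gene" and fid not in children:
            problems.append(f"gene {fid!r} has no child features")
    return types_by_id, dict(children), problems
```

An empty `problems` list does not guarantee a successful import, but a non-empty one usually points at the lines that will need fixing first.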
> 
> 
> Please let me know if you have any questions/problems...
> 
> Cheers,
> James
> 
> 
> On Fri, 14 Aug 2015 13:12:35 +0000
> Luke Goodsell <Luke.Goodsell at ogt.com> wrote:
> 
>> Hi Albert,
>> 
>> Unfortunately, there isn't any documentation (that I know of) for the creation of the tables for a new species, probably because it's quite a varied
>> process depending on the origin of the data. If you want to pursue this approach, I'd suggest studying the schema documentation
>> (http://www.ensembl.org/info/docs/api/core/core_schema.html) and trying to replicate the structure for your new species, using one of the simpler
>> species' architecture as a template. This would be a time-consuming task, though.
>> 
>> Kind regards,
>> Luke
>> 
>> -----Original Message-----
>> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Zhou Albert
>> Sent: 14 August 2015 11:11
>> To: Ensembl developers list
>> Subject: Re: [ensembl-dev] Create local ensembl SQL database for new organism
>> 
>> Hi Luke,
>> 
>> Many thanks for the response! 
>> 
>> Yes, I have read these pages. However, what I would like to do is create a new SQL database in the core schema that contains the new organism’s genome data
>> (currently in GFF and FASTA), so that it can be recognized and used by the web code. I’m looking for the proper tool for this task. 
>> 
>> best wishes,
>> Albert
>> 
>> 
>>> On 14 Aug 2015, at 10:15, Luke Goodsell <Luke.Goodsell at ogt.com> wrote:
>>> 
>>> Hi again, Albert,
>>> 
>>> This section might also be useful: http://www.ensembl.org/info/docs/webcode/custom/index.html
>>> 
>>> And more generally: http://www.ensembl.org/info/docs/webcode/index.html
>>> 
>>> Kind regards,
>>> Luke
>>> 
>>> -----Original Message-----
>>> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Luke Goodsell
>>> Sent: 14 August 2015 09:30
>>> To: Ensembl developers list
>>> Subject: Re: [ensembl-dev] Create local ensembl SQL database for new organism
>>> 
>>> Dear Albert
>>> 
>>> Have you seen this section of the EnsEMBL website: http://www.ensembl.org/info/docs/webcode/mirror/index.html
>>> 
>>> Kind regards,
>>> Luke
>>> 
>>> -----Original Message-----
>>> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Zhou Albert
>>> Sent: 12 August 2015 13:16
>>> To: dev at ensembl.org
>>> Subject: [ensembl-dev] Create local ensembl SQL database for new organism
>>> 
>>> Dear all,
>>> 
>>> I'm currently working on building a local ensembl database web server, within which we would like to include our own genome data from a new organism.
>>> However, after googling this subject, I still can't find any documentation explaining how this can be done. 
>>> 
>>> Could someone please point me to such a document or guide, or does Ensembl simply not provide this functionality?
>>> 
>>> Many thanks!
>>> 
>>> Albert
>>> 
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/




