[ensembl-dev] problem importing GenBank file into local core DB

梁薰文 a24681012142002 at gmail.com
Mon Jul 11 16:45:52 BST 2016


Hi Dan,

We are importing thirteen Klebsiella pneumoniae strains; one of them is PMK1 (ASM76461v1).
Following your note, I did find those genomes in Ensembl Bacteria.
However, I noticed that NCBI RefSeq provides updated annotation for the genomes of these strains.
The main difference between the RefSeq and GenBank assemblies lies in the feature annotation and the feature counts, such as the number of genes and proteins. Here are the predicted counts for PMK1 in the GenBank and RefSeq assemblies:
	a. GenBank version: 5,705 genes, 5,594 proteins
	b. RefSeq version:    5,879 genes, 5,672 proteins
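
The counts above can be roughly cross-checked by tallying feature keys in each .gbff file. Below is a minimal sketch using only the Python standard library; the function names `count_feature_keys` and `open_gbff` and the embedded sample record are made up for illustration. It assumes the standard GenBank flat-file layout (feature keys start in column 6). The totals may not match NCBI's summary figures exactly, since those can treat pseudogenes and RNA genes differently.

```python
import gzip
import io
from collections import Counter


def count_feature_keys(handle):
    """Count feature keys (gene, CDS, ...) in a GenBank flat file."""
    counts = Counter()
    in_features = False
    for line in handle:
        if line.startswith("FEATURES"):
            in_features = True
        elif in_features and line[:1].strip():
            # A new top-level keyword (ORIGIN, CONTIG, ...) ends the feature table.
            in_features = False
        elif in_features and line[:5] == "     " and line[5:6].strip():
            # Feature keys sit in column 6; qualifiers are indented further.
            counts[line.split()[0]] += 1
    return counts


def open_gbff(path):
    """Open a .gbff or .gbff.gz file as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Tiny made-up record, only to show the expected input layout.
SAMPLE = """\
LOCUS       TEST                     120 bp    DNA     circular BCT 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..120
                     /organism="Example"
     gene            1..60
                     /locus_tag="X_0001"
     CDS             1..60
                     /locus_tag="X_0001"
     gene            complement(70..110)
                     /locus_tag="X_0002"
ORIGIN
        1 acgtacgtac
//
"""

counts = count_feature_keys(io.StringIO(SAMPLE))
print(counts["gene"], counts["CDS"])
```

For the real files, something like `count_feature_keys(open_gbff("GCA_000764615.1_ASM76461v1_genomic.gbff.gz"))` would give the per-key totals for the GenBank version, and the same call on the GCF file for the RefSeq version.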

Detailed information on the PMK1 strain is listed below for reference. (NCBI RefSeq URL: http://www.ncbi.nlm.nih.gov/refseq/)
Strain: PMK1 (direct URL to the assembly record: http://www.ncbi.nlm.nih.gov/assembly/GCA_000764615.1)
GenBank assembly accession: GCA_000764615.1 (latest)
gb file URL: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000764615.1_ASM76461v1/GCA_000764615.1_ASM76461v1_genomic.gbff.gz
RefSeq assembly accession: GCF_000764615.1 (latest)
gb file URL: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000764615.1_ASM76461v1/GCF_000764615.1_ASM76461v1_genomic.gbff.gz

Last time you also mentioned "origin-spanning features", but the only reference I could find on Google was the thread below.
It is one of the ensembl-dev topics from 2005.
Please tell me if I have found the wrong thing.
http://dev.ensembl.narkive.com/9KWpLOT4/circular-sequences
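
To make clear what I understand by an origin-spanning feature: on a circular sequence, GenBank writes it as a join whose first segment ends at the last base and whose second segment starts at base 1. Here is a minimal sketch that detects this case. The function name `origin_spanning_coords` is made up, and returning a single (start, end) pair with start greater than end is only my assumption about how such a feature would be represented on a circular sequence region; please correct me if Ensembl expects something else. It handles only the two-segment `join(...)` and `complement(join(...))` forms.

```python
import re

_RANGE = re.compile(r"(\d+)\.\.(\d+)")


def origin_spanning_coords(location, seq_length):
    """Return (start, end) with start > end if `location` wraps the origin.

    Returns None for anything else, including joins that do not cross
    the origin. Assumption: a wrap is a two-segment join where the first
    segment ends at seq_length and the second starts at 1.
    """
    loc = location.strip()
    if loc.startswith("complement(") and loc.endswith(")"):
        # Strip the strand wrapper; the strand itself is not returned here.
        loc = loc[len("complement("):-1]
    if not (loc.startswith("join(") and loc.endswith(")")):
        return None
    parts = [(int(a), int(b)) for a, b in _RANGE.findall(loc)]
    if len(parts) == 2 and parts[0][1] == seq_length and parts[1][0] == 1:
        return parts[0][0], parts[1][1]
    return None


# Made-up coordinates on a hypothetical 5248 bp circular sequence.
wrapped = origin_spanning_coords("join(5000..5248,1..200)", 5248)
plain = origin_spanning_coords("1..200", 5248)
```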

Thank you very much for your help in improving our code.
Please find it attached.

Susan


> On Jul 7, 2016, at 4:52 PM, Dan Staines <dstaines at ebi.ac.uk> wrote:
> 
> Hi Susan,
> 
> Ensembl does support origin-spanning features - we have these in Ensembl
> Bacteria. Can you please share with me a small piece of code showing how
> you are storing the data so we can see what the problem might be?
> 
> Out of interest, which prokaryotic genomes are you importing? There are
> over 40,000 in Ensembl Bacteria which come from EMBL/GenBank so it's
> possible that the genomes you are interested in are already present.
> 
> Thanks,
> 
> Dan.
> 
> -- 
> Dan Staines, PhD
> Genomics Technology Infrastructure Coordinator
> EMBL-EBI, Wellcome Trust Genome Campus
> Cambridge CB10 1SD, UK
> Tel: +44-(0)1223-492507
