[ensembl-dev] Problem with formatdb and several pep FASTA files

Andy Yates ayates at ebi.ac.uk
Wed Jun 13 17:13:59 BST 2012


Hi Toni,

We are aware of this issue. The zero-length sequences are the result of a bug in our FASTA serialiser, which cannot handle sequences of length 1. It was not picked up during our dumping process because we do not generate NCBI BLAST indexes. The files are now being regenerated. The known affected species and their protein counts so far are:

callithrix_jacchus	1
danio_rerio	1
homo_sapiens	13
mus_musculus	5

Does this correspond to your own list?
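
If it is useful for comparing lists, here is a minimal Python sketch that reports FASTA records with no sequence. It is only an illustration and not part of our dumping code; the function name find_empty_records is hypothetical, and it assumes a plain uncompressed FASTA file where each record starts with a '>' header line.

```python
import io


def find_empty_records(handle):
    """Return the IDs of FASTA records whose sequence has zero length.

    `handle` is any iterable of text lines (an open file, io.StringIO, ...).
    The ID is taken as the first whitespace-separated token of the header.
    """
    empty = []
    current_id = None   # ID of the record being read, None before the first header
    length = 0          # number of sequence characters seen for the current record

    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record; report it if it was empty.
            if current_id is not None and length == 0:
                empty.append(current_id)
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            length = 0
        elif current_id is not None:
            length += len(line)

    # The last record has no following header, so check it separately.
    if current_id is not None and length == 0:
        empty.append(current_id)
    return empty


# Small in-memory example: records B and D have no sequence lines.
sample = io.StringIO(">A pep:known\nMK\n>B pep:known\n>C pep:known\nM\n>D pep:known\n")
empty_ids = find_empty_records(sample)
```

To run it against a real file, open the file and pass the handle, for example `with open("Mus_musculus.NCBIM37.67.pep.all.fa") as fh: find_empty_records(fh)`. Counting the returned IDs per species should give numbers comparable to the list above.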

All the best,

Andy

Andrew Yates                   Ensembl Core Software Project Leader
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensembl.org/

On 13 Jun 2012, at 15:25, Toni Hermoso Pulido wrote:

> Hello,
> 
> there seems to be a problem with a few FASTA pep files of some
> organisms when performing a formatdb (2.2.25 and 2.2.26 tested):
> 
> $ blast/blast-2.2.26/bin/formatdb -i Mus_musculus.NCBIM37.67.pep.all.fa
> [formatdb] WARNING: Cannot add sequence number 19278
> (lcl|19278_Mus_musculus.NCBIM37.67.pep.all.) because it has
> zero-length.
> 
> [formatdb] FATAL ERROR: Fatal error when adding sequence to BLAST database.
> 
> This happens with empty FASTA, in this case:
> >ENSMUSP00000118372 pep:known chromosome:NCBIM37:4:117507600:117515714:1 gene:ENSMUSG00000028542 transcript:ENSMUST00000151316 gene_biotype:protein_coding transcript_biotype:protein_coding
> 
> I haven't experienced a similar issue in the past.
> 
> Best,
> 
> Toni
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




