[ensembl-dev] Problem with formatdb and several pep FASTA files
Toni Hermoso Pulido
toni.hermoso at crg.cat
Wed Jun 13 17:37:25 BST 2012
Hi Andy,
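judging by my pipeline, I would say that more files are affected than
the four you list (I don't know how many empty sequences there are per
file, though).
Below is an excerpt of the error messages from the pipeline for FASTA
files that NCBI BLAST could not format (presumably the mv fails because
formatdb left no index files to move):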
mv: cannot stat
`/db/ensembl/release-67/callithrix_jacchus/proteome/Callithrix_jacchus.C_jacchus3.2.1.67.pep.all.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/choloepus_hoffmanni/proteome/Choloepus_hoffmanni.choHof1.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/danio_rerio/proteome/Danio_rerio.Zv9.67.pep.all.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/echinops_telfairi/proteome/Echinops_telfairi.TENREC.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/erinaceus_europaeus/proteome/Erinaceus_europaeus.HEDGEHOG.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/felis_catus/proteome/Felis_catus.CAT.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/gadus_morhua/proteome/Gadus_morhua.gadMor1.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/homo_sapiens/proteome/Homo_sapiens.GRCh37.67.pep.all.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/macropus_eugenii/proteome/Macropus_eugenii.Meug_1.0.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/mus_musculus/proteome/Mus_musculus.NCBIM37.67.pep.all.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/ochotona_princeps/proteome/Ochotona_princeps.pika.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/ornithorhynchus_anatinus/proteome/Ornithorhynchus_anatinus.OANA5.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/oryzias_latipes/proteome/Oryzias_latipes.MEDAKA1.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/procavia_capensis/proteome/Procavia_capensis.proCap1.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/sorex_araneus/proteome/Sorex_araneus.COMMON_SHREW1.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/tarsius_syrichta/proteome/Tarsius_syrichta.tarSyr1.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/tetraodon_nigroviridis/proteome/Tetraodon_nigroviridis.TETRAODON8.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/vicugna_pacos/proteome/Vicugna_pacos.vicPac1.67.pep.abinitio.fa.*':
No such file or directory
mv: cannot stat
`/db/ensembl/release-67/xenopus_tropicalis/proteome/Xenopus_tropicalis.JGI_4.2.67.pep.abinitio.fa.*':
No such file or directory
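To find out how many empty sequences each file contains, something like the following could be run over the affected files. This is only a minimal sketch and not part of my actual pipeline; the function name count_empty_records is made up for illustration, and it assumes uncompressed, plain-text FASTA where every record starts with a ">" header line.

```python
import sys


def count_empty_records(lines):
    """Return (total_records, ids_of_empty_records) for an iterable of FASTA lines.

    Hypothetical helper: a record counts as empty when no residues
    follow its '>' header line.
    """
    total = 0
    empty_ids = []
    current_id = None   # identifier of the record being read
    current_len = 0     # residues seen so far for that record
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record.
            if current_id is not None and current_len == 0:
                empty_ids.append(current_id)
            total += 1
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            current_len = 0
        elif current_id is not None:
            current_len += len(line)
    # Close the last record in the file.
    if current_id is not None and current_len == 0:
        empty_ids.append(current_id)
    return total, empty_ids


if __name__ == "__main__":
    # Usage (assumed): python count_empty.py file1.fa file2.fa ...
    for path in sys.argv[1:]:
        with open(path) as handle:
            total, empty_ids = count_empty_records(handle)
        print(f"{path}: {len(empty_ids)} empty of {total} records")
        for seq_id in empty_ids:
            print(f"  {seq_id}")
```

The per-file counts it prints could then be compared directly against a list of species and protein counts.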
So I understand you plan to replace the files on the FTP site, is that right?
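Until corrected files are available, a possible stopgap is to strip the zero-length records before running formatdb. Again, this is just a sketch under the same assumptions (plain-text FASTA, hypothetical function name drop_empty_records). Note that it simply discards the affected proteins, so the corrected files would still be needed to get their real sequences back.

```python
import sys


def drop_empty_records(lines):
    """Yield FASTA lines, skipping any record that has no sequence.

    Hypothetical helper: blank lines are dropped and line endings
    are stripped from the yielded lines.
    """
    header = None
    body = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Emit the previous record only if it had residues.
            if header is not None and body:
                yield header
                yield from body
            header, body = line, []
        elif header is not None and line.strip():
            body.append(line)
    # Emit the final record if it is non-empty.
    if header is not None and body:
        yield header
        yield from body


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (assumed): python drop_empty.py input.fa output.fa
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for out_line in drop_empty_records(src):
            dst.write(out_line + "\n")
```

The filtered output file can then be passed to formatdb -i in place of the original.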
Thanks for everything,
2012/6/13 Andy Yates <ayates at ebi.ac.uk>:
> Hi Toni,
>
> We are aware of this issue. These 0-length sequences appeared because of a bug in our FASTA serialiser, which was unable to handle sequences of length 1. This was not picked up during our dumping process as we do not generate NCBI BLAST indexes. The files are now being regenerated. The current list of known affected species and their protein counts is:
>
> callithrix_jacchus 1
> danio_rerio 1
> homo_sapiens 13
> mus_musculus 5
>
> Does this correspond to your own list?
>
> All the best,
>
> Andy
>
> Andrew Yates Ensembl Core Software Project Leader
> EMBL-EBI Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK http://www.ensembl.org/
>
> On 13 Jun 2012, at 15:25, Toni Hermoso Pulido wrote:
>
>> Hello,
>>
>> there seems to be a problem with the FASTA pep files of a few
>> organisms when running formatdb (2.2.25 and 2.2.26 tested):
>>
>> $ blast/blast-2.2.26/bin/formatdb -i Mus_musculus.NCBIM37.67.pep.all.fa
>> [formatdb] WARNING: Cannot add sequence number 19278
>> (lcl|19278_Mus_musculus.NCBIM37.67.pep.all.) because it has
>> zero-length.
>>
>> [formatdb] FATAL ERROR: Fatal error when adding sequence to BLAST database.
>>
>> This happens with empty FASTA records, in this case:
>> >ENSMUSP00000118372 pep:known chromosome:NCBIM37:4:117507600:117515714:1 gene:ENSMUSG00000028542 transcript:ENSMUST00000151316 gene_biotype:protein_coding transcript_biotype:protein_coding
>>
>> I haven't experienced a similar issue in the past.
>>