[ensembl-dev] Gene build pipeline - 2 issues

Thibaut Hourlier th3 at sanger.ac.uk
Tue Aug 27 13:02:05 BST 2013


Hi Marc,

On 26 Aug 2013, at 10:59, Marc Hoeppner <mphoeppner at gmail.com> wrote:

> Hi EnsEMBL team, 
> 
> been playing with the pipeline again, but am having problems (again). Please see below for details - am happy about any suggestions. 
> 
> Cheers, 
> 
> Marc 
> 
> ######## 
> 1) Pmatch 
> ######## 
> 
> I set up a pmatch analysis as per the documentation and it runs fine on my test dataset (small chicken chromosome) when I try it with test_RunnableDB. However, when I run the pipeline, I get this: 
> 
> TARGET  0.064u 0.008s 0+0k 0pf 0sw 
> BUILD   0.116u 0.040s 0+0k 0pf 0sw 
> SEARCH  22.949u 0.172s 0+0k 0pf 0sw 
> WARN: For multiple species use species attribute in DBAdaptor->new() 
> WRITING: Lost the will to live Error 
> Job 1198 failed: [ 
> -------------------- EXCEPTION -------------------- 
> MSG: Problems for Pmatch writing output for chromosome:vchicken_test:10:1:19911089:1 [Can't call method "version" on an undefined value at /opt/bioinformatics/ensembl-70/ensembl/modules/Bio/EnsEMBL/DBSQL/MetaContainer.pm line 218. 
> ] 
> STACK Bio::EnsEMBL::Pipeline::Job::run_module /opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/Job.pm:720
> STACK (eval) /opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/runner.pl:219
> STACK main::run_jobs_with_lsfcopy /opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/runner.pl:218
> STACK toplevel /opt/bioinformatics/ensembl-70/ensembl-pipeline/modules/Bio/EnsEMBL/Pipeline/runner.pl:128
> Date (localtime)    = Fri Aug 23 14:53:27 2013 
> Ensembl API version = 70 
> 
We would need to see how your coord_system and meta tables are populated.
The API complains that it can't find the version of your assembly. Your coord_system table should look like this one:

+-----------------+------------+------------+---------+------+--------------------------------+
| coord_system_id | species_id | name       | version | rank | attrib                         |
+-----------------+------------+------------+---------+------+--------------------------------+
|               1 |          1 | contig     | NULL    |    3 | default_version,sequence_level |
|               2 |          1 | scaffold   | oryCun2 |    2 | default_version                |
|               3 |          1 | chromosome | oryCun2 |    1 | default_version                |
+-----------------+------------+------------+---------+------+--------------------------------+
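
As a quick sanity check on a dump of your own coord_system table, something like the Python sketch below could flag rows that lack a version. It is only an illustration and not part of the Ensembl API: CoordSystem and find_unversioned are hypothetical names, and the rule applied (only the sequence-level coordinate system may have a NULL version) is an assumption read off the example table above. The failing row in the usage lines is made up.

```python
from typing import List, NamedTuple, Optional


class CoordSystem(NamedTuple):
    """One row of a coord_system table dump (hypothetical container)."""
    coord_system_id: int
    species_id: int
    name: str
    version: Optional[str]  # None stands for SQL NULL
    rank: int
    attrib: str


def find_unversioned(rows: List[CoordSystem]) -> List[str]:
    """Return names of default, non-sequence-level systems with no version.

    Assumption: a NULL version is acceptable only on the sequence-level
    coordinate system, as in the example table.
    """
    problems = []
    for row in rows:
        attribs = set(row.attrib.split(",")) if row.attrib else set()
        is_default = "default_version" in attribs
        is_seq_level = "sequence_level" in attribs
        if is_default and not is_seq_level and not row.version:
            problems.append(row.name)
    return problems


# Rows copied from the example table.
EXAMPLE_ROWS = [
    CoordSystem(1, 1, "contig", None, 3, "default_version,sequence_level"),
    CoordSystem(2, 1, "scaffold", "oryCun2", 2, "default_version"),
    CoordSystem(3, 1, "chromosome", "oryCun2", 1, "default_version"),
]

# A made-up broken table: the chromosome row has no version.
BROKEN_ROWS = [
    CoordSystem(1, 1, "contig", None, 2, "default_version,sequence_level"),
    CoordSystem(2, 1, "chromosome", None, 1, "default_version"),
]

print(find_unversioned(EXAMPLE_ROWS))
print(find_unversioned(BROKEN_ROWS))
```

An empty list means every assembly-level coordinate system carries a version; any name printed is a row worth checking against the meta table.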
> 
> ########## 
> 2) Unigene 
> ########## 
> 
> This one really bothers me. I think everything is set up correctly (downloaded the unigene file, header seems to comply with the reference formatting in Blast.pm etc.), but I cannot for the life of me get it to work. Specifically, I am trying to use NCBI blast and the command just looks off: it seems to try a mix of WU blast and NCBI blast (it works fine with Uniprot though, so perhaps something with the BlastGenscanDNA module?). 
> 
> Running job 1791 
> Module is BlastGenscanDNA 
> Input id is contig:vchicken_test:10_68:1:50000:1 
> Analysis is unigene 
> Files are /data2/projects/annotation/EnsEMBL/chicken/output//unigene/0/contig:vchicken_test:10_114:1:50000:1.unigene.55.retry2.out /data2/projects/annotation/EnsEMBL/chicken/output//unigene/0/contig:vchicken_test:10_114:1:50000:1.unige$
> 
> -------------------- WARNING ---------------------- 
> MSG: Error running Blast cmd </usr/bin/blastall -d /data2/projects/annotation/EnsEMBL/chicken/refseqs/unigene.fa -i /tmp/seq.22305.24863.fa -cpus=1 2>&1 > /tmp/unigene.fa.22305.5651.blast.out>. Returned error 256 BLAST EXIT: '1', SIGNA$ 
> FILE: Analysis/Runnable/Blast.pm LINE: 380 
> CALLED BY: EnsEMBL/Analysis/Runnable.pm  LINE: 729 
> Date (localtime)    = Fri Aug 23 14:54:47 2013 
> Ensembl API version = 70 

Have you tried running the command by itself to see if it works? The error message you have seems to come from the NCBI blast program.
Because the module dies, the temporary file containing your chicken sequence should still exist. If not, you will need to comment out a line in the run method of ensembl-analysis/modules/Bio/EnsEMBL/Analysis/Runnable.pm:

  #$self->delete_files;

You probably need to change your parameters in the analysis table of your reference database. We use WU blast at the moment.

Also, the parameters for blast should be "-cpus 1 -hitdist 40" instead of "-cpus => 1, -hitdist => 40"
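
To make the difference between the two parameter strings concrete, here is a minimal Python sketch that rewrites the Perl-hash-style form into the plain command-line form. normalize_blast_parameters is a hypothetical helper, not something the pipeline provides; in practice you would simply edit the parameters value by hand.

```python
import re


def normalize_blast_parameters(raw: str) -> str:
    """Rewrite '-flag => value, -flag => value' as '-flag value -flag value'.

    Strings already in the plain form are returned unchanged.
    """
    # Split on commas, dropping empty pieces and surrounding whitespace.
    pairs = [piece.strip() for piece in raw.split(",") if piece.strip()]
    # Replace each '=>' (with any surrounding spaces) by a single space.
    tokens = [re.sub(r"\s*=>\s*", " ", pair) for pair in pairs]
    return " ".join(tokens)


print(normalize_blast_parameters("-cpus => 1, -hitdist => 40"))
# prints: -cpus 1 -hitdist 40
```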

Regards
Thibaut

> 
> And here the config for the unigene search: 
> 
> [unigene] 
> db=unigene 
> db_file=/data2/projects/annotation/EnsEMBL/chicken/refseqs/unigene.fa 
> program=blastall 
> program_file=blastall 
> parameters=-cpus => 1, -hitdist => 40 
> module=BlastGenscanDNA 
> input_id_type=CONTIG 
> 
> (Blast.pm is configured to use 'ncbi' as default type, so unigene should inherit that, no?)
> 
> 
> -- 
> Marc P. Hoeppner, PhD 
> Department of Medical Biochemistry and Microbiology 
> Uppsala University, Sweden 
> marc.hoeppner at imbim.uu.se 
