[ensembl-dev] empty SpeciesFactory in FASTA pipeline

Lel Eory lel.eory at roslin.ed.ac.uk
Thu Dec 18 16:12:52 GMT 2014


Hi Mag,

Thank you for the detailed reply, I much appreciate it. From the query 
it is clear that I do not have changelog and changelog_species populated 
with information relevant to my species although the species table is 
populated.
Do you run a script to populate these tables?
If not, I will look into the ensembl_production db and figure out what 
information I need to add.

Thanks again,
Lel


On 12/18/2014 11:11 AM, mr6 at ebi.ac.uk wrote:
> Hi Lel,
>
> This pipeline relies heavily on the production database.
>
> As we only want to dump fasta files for species which have changed for the
> release, the ScheduleSpecies module checks in the production database if
> there have been any changes declared for this species, for this release.
> If there is no declaration, the species is skipped.
>
> To circumvent this, you should be able to use the -run_all option, as this
> is used to tell the pipeline to ignore the declarations and just run
> everything. It needs an argument though, so you would need to add
> -run_all 1 to your init_pipeline command line.
>
> The -species argument is meant to filter out a species or list of species.
> So if you specify -species homo_sapiens, it will only take into account
> human when deciding whether or not it should dump data. This will still
> check in the production database if there is anything to dump for that
> species though.
>
> The -force_species argument will allow you to skip the production database
> check for a species or list of species.
>
> So running the following command line should work for you without needing
> to add anything in the production database
> init_pipeline.pl
> Bio::EnsEMBL::Production::Pipeline::PipeConfig::FASTA_conf -user
> write_user -password ******  -host=my_local_host -no_scp 1 -base_path
> my_base_path -registry my_registry.conf -run_all 1
>
>
> If you want to try to get the expected data in the production database,
> this is the check that decides whether a species needs data dumping or
> not:
> select count(*)
> from changelog c
> join changelog_species cs using (changelog_id)
> join species s using (species_id)
> where c.release_id = 78
> and (c.assembly = 'Y' or c.repeat_masking = 'Y')
> and c.status = 'handed_over'
> and s.production_name = 'species_name'
>
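For anyone following along, here is a minimal sketch of how the check above behaves. It assumes a toy in-memory SQLite schema containing only the columns the query touches (the real ensembl_production tables have more columns and live in MySQL), and the helper name declared_changes and the inserted rows are hypothetical, for illustration only.

```python
import sqlite3

# Toy stand-in for the production database: only the columns used by the
# check query. This is an assumption, not the real ensembl_production schema.
SCHEMA = """
CREATE TABLE species (species_id INTEGER PRIMARY KEY, production_name TEXT);
CREATE TABLE changelog (
    changelog_id INTEGER PRIMARY KEY,
    release_id INTEGER,
    assembly TEXT,
    repeat_masking TEXT,
    status TEXT
);
CREATE TABLE changelog_species (changelog_id INTEGER, species_id INTEGER);
"""

# Same check as quoted above, with release and species as parameters.
CHECK = """
SELECT count(*)
FROM changelog c
JOIN changelog_species cs USING (changelog_id)
JOIN species s USING (species_id)
WHERE c.release_id = ?
  AND (c.assembly = 'Y' OR c.repeat_masking = 'Y')
  AND c.status = 'handed_over'
  AND s.production_name = ?
"""


def declared_changes(conn, release, production_name):
    """Count the changelog rows that match the check for one species."""
    return conn.execute(CHECK, (release, production_name)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO species VALUES (1, 'acanthisitta_chloris')")

# Only the species table is populated: no declaration is found.
before = declared_changes(conn, 78, "acanthisitta_chloris")

# Hypothetical declaration: an assembly change for release 78, handed over,
# linked to the species through changelog_species.
conn.execute("INSERT INTO changelog VALUES (1, 78, 'Y', 'N', 'handed_over')")
conn.execute("INSERT INTO changelog_species VALUES (1, 1)")
after = declared_changes(conn, 78, "acanthisitta_chloris")

print(before, after)
```

With only the species row present the count is 0, which matches the case where the species is skipped; once a matching changelog row is linked through changelog_species, the count becomes 1.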
>
> Let me know if that helps,
> mag
>
>> Hello Developers,
>>
>> I am trying to run the FASTA pipeline per the document
>> https://github.com/Ensembl/ensembl-production/blob/release/78/docs/fasta.textile
>> My registry file is set up as suggested.
>> I run init_pipeline either with -run_all or with -species defined, but
>> then beekeeper skips the analyses as no species are defined for the
>> pipeline.
>>
>> From beekeeper:
>> Scheduler : Discarded 17 analyses because they do not need any Workers.
>>
>> ....
>> Worker 1 [ UNSPECIALIZED ] specializing to ScheduleSpecies(1)
>>
>> -------------------- WARNING ----------------------
>> MSG: acanthisitta_chloris is not a valid species name (check DB and API
>> version)
>> FILE: Bio/EnsEMBL/Registry.pm LINE: 1200
>> CALLED BY: Production/Pipeline/SpeciesFactory.pm  LINE: 85
>> Date (localtime)    = Thu Dec 18 10:25:53 2014
>> Ensembl API version = 78
>> ---------------------------------------------------
>>
>> The problem is most likely related to my production database, as I can
>> list all the core databases using my registry file and these are present
>> for release 78. Can someone suggest a way to check what is wrong and why
>> SpeciesFactory does not generate the list of species beekeeper needs?
>>
>> Thank you.
>>
>> Best wishes,
>> Lel
>>
>>
>>
>> ---------------------------------------------
>> Lel Eory, PhD
>> The Roslin Institute
>> University of Edinburgh Easter Bush Campus
>> Midlothian EH25 9RG
>> Scotland UK
>> Phone: +44 131 6519212
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>
>


---------------------------------------------
Lel Eory, PhD
The Roslin Institute
University of Edinburgh Easter Bush Campus
Midlothian EH25 9RG
Scotland UK
Phone: +44 131 6519212


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




