[ensembl-dev] empty SpeciesFactory in FASTA pipeline

Thu Dec 18 11:11:42 GMT 2014

Hi Lel,

This pipeline relies heavily on the production database.

As we only want to dump fasta files for species which have changed for the
release, the ScheduleSpecies module checks in the production database if
there have been any changes declared for this species, for this release.
If there is no declaration, the species is skipped.

To circumvent this, you should be able to use the -run_all option, which
tells the pipeline to ignore the declarations and just run everything. It
needs an argument though, so you would need to add -run_all 1 to your
init_pipeline command line.

The -species argument is meant to restrict the run to a species or list of species.
So if you specify -species homo_sapiens, it will only take into account
human when deciding whether or not it should dump data. This will still
check in the production database if there is anything to dump for that
species though.

The -force_species argument will allow you to skip the production database
check for a species or list of species.
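
To make the interplay of the three options easier to follow, here is a
rough Python sketch of the decision as described above. It is not the
pipeline's Perl code: the function name needs_dump, its parameters, and the
order in which the options are applied (species filter first, then
run_all, then force_species) are assumptions made for illustration.

```python
def needs_dump(name, declared_changes, species=None, force_species=None,
               run_all=False):
    """Return True if `name` would be dumped under the rules in this email.

    declared_changes: set of species with a change declared for the release
    species:          optional filter list (models -species)
    force_species:    optional list that skips the check (models -force_species)
    run_all:          ignore declarations entirely (models -run_all 1)
    """
    # Assumption: the -species filter is applied before anything else.
    if species and name not in species:
        return False
    # -run_all ignores the declarations and runs everything.
    if run_all:
        return True
    # -force_species skips the production database check for listed species.
    if force_species and name in force_species:
        return True
    # Otherwise the species is only dumped if a change was declared.
    return name in declared_changes


declared = {"homo_sapiens"}
for candidate in ("homo_sapiens", "gallus_gallus"):
    print(candidate, needs_dump(candidate, declared))
```

With no options set, only species with a declared change pass, which is
why an empty or out-of-date production database leaves the pipeline with
nothing to do.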

So running the following command line should work for you without needing
to add anything in the production database:

init_pipeline.pl
Bio::EnsEMBL::Production::Pipeline::PipeConfig::FASTA_conf -user
write_user -password ******  -host=my_local_host -no_scp 1 -base_path
my_base_path -registry my_registry.conf -run_all 1

If you want to try to get the expected data in the production database,
this is the check that decides whether a species needs data dumping or
not:

select count(*)
from changelog c
join changelog_species cs using (changelog_id)
join species s using (species_id)
where c.release_id = 78
and (c.assembly = 'Y' or c.repeat_masking = 'Y')
and c.status = 'handed_over'
and s.production_name = 'species_name'
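
If it helps to see what the check returns, the sketch below runs the same
query against a toy in-memory SQLite database. Everything apart from the
query itself is an assumption for illustration: the real production
database is MySQL and its tables have more columns, the rows are made up,
and 'pending' is a hypothetical status value standing in for anything that
is not 'handed_over'.

```python
import sqlite3

# The check from this email, with release and species as bind parameters.
CHECK_SQL = """
select count(*)
from changelog c
join changelog_species cs using (changelog_id)
join species s using (species_id)
where c.release_id = ?
and (c.assembly = 'Y' or c.repeat_masking = 'Y')
and c.status = 'handed_over'
and s.production_name = ?
"""


def build_toy_db():
    """Create a minimal stand-in holding only the columns the query uses."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table species (
            species_id integer primary key, production_name text);
        create table changelog (
            changelog_id integer primary key, release_id integer,
            assembly text, repeat_masking text, status text);
        create table changelog_species (
            changelog_id integer, species_id integer);
        insert into species values (1, 'homo_sapiens');
        insert into species values (2, 'gallus_gallus');
        -- handed-over assembly change for release 78: counts
        insert into changelog values (10, 78, 'Y', 'N', 'handed_over');
        -- same kind of change but not handed over ('pending' is made up)
        insert into changelog values (11, 78, 'Y', 'N', 'pending');
        insert into changelog_species values (10, 1);
        insert into changelog_species values (11, 2);
    """)
    return conn


def declared_change_count(conn, release, production_name):
    """Number of qualifying changelog entries for one species and release."""
    return conn.execute(CHECK_SQL, (release, production_name)).fetchone()[0]


conn = build_toy_db()
for name in ("homo_sapiens", "gallus_gallus"):
    print(name, declared_change_count(conn, 78, name))
```

A count of zero means there is no qualifying declaration for that species
and release, so the species is skipped unless -run_all or -force_species
is used.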

Let me know if that helps,
mag

> Hello Developers,
>
> I try to run the FASTA pipeline per the document
> https://github.com/Ensembl/ensembl-production/blob/release/78/docs/fasta.textile
> My registry file is set-up as suggested.
> I run init_pipeline either with -run_all or with -species defined, but
> then beekeeper skips the analyses as no species defined for the pipeline.
>
> From beekeeper:
> Scheduler : Discarded 17 analyses because they do not need any Workers.
>
> ....
> Worker 1 [ UNSPECIALIZED ] specializing to ScheduleSpecies(1)
>
> -------------------- WARNING ----------------------
> MSG: acanthisitta_chloris is not a valid species name (check DB and API
> version)
> FILE: Bio/EnsEMBL/Registry.pm LINE: 1200
> CALLED BY: Production/Pipeline/SpeciesFactory.pm  LINE: 85
> Date (localtime)    = Thu Dec 18 10:25:53 2014
> Ensembl API version = 78
> ---------------------------------------------------
>
> The problem is most likely related to my production database, as I can
> list all the core databases using my registry file and these are present
> for release 78. Can someone suggest a way to check what is wrong and why
> SpeciesFactory does not generate the list of species beekeeper needs?
>
> Thank you.
>
> Best wishes,
> Lel
>
>
>
> ---------------------------------------------
> Lel Eory, PhD
> The Roslin Institute
> University of Edinburgh Easter Bush Campus
> Midlothian EH25 9RG
> Scotland UK
> Phone: +44 131 6519212