[ensembl-dev] empty SpeciesFactory in FASTA pipeline

Anne Lyle annelyle at ebi.ac.uk
Thu Dec 18 16:22:10 GMT 2014


Hi Lel

The changelog and changelog_species tables are populated “manually” by our developers via a web interface. They’re mainly needed for the website news, though we’ve hooked other processes into them to avoid duplication of effort. As Magali says, you should run your pipeline so that it ignores the changelog tables.

Cheers

Anne


On 18 Dec 2014, at 16:12, Lel Eory <lel.eory at roslin.ed.ac.uk> wrote:

> Hi Mag,
> 
> Thank you for the detailed reply, I much appreciate it. From the query it is clear that I do not have the changelog and changelog_species tables populated with information relevant to my species, although the species table is populated.
> Do you run a script to populate these tables?
> If not, I will look into the ensembl_production db and figure out what information I need to add.
> 
> Thanks again,
> Lel
> 
> 
> On 12/18/2014 11:11 AM, mr6 at ebi.ac.uk wrote:
>> Hi Lel,
>> 
>> This pipeline relies heavily on the production database.
>> 
>> As we only want to dump fasta files for species which have changed for the
>> release, the ScheduleSpecies module checks in the production database if
>> there have been any changes declared for this species, for this release.
>> If there is no declaration, the species is skipped.
>> 
>> To circumvent this, you should be able to use the -run_all option, which
>> tells the pipeline to ignore the declarations and just run everything.
>> It needs an argument though, so you would need to add -run_all 1 to your
>> init_pipeline command line.
>> 
>> The -species argument is meant to restrict the run to a species or list
>> of species. So if you specify -species homo_sapiens, it will only take
>> human into account when deciding whether or not it should dump data. This
>> will still check in the production database whether there is anything to
>> dump for that species, though.
>> 
>> The -force_species argument will allow you to skip the production database
>> check for a species or list of species.
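The selection rules described above for ScheduleSpecies can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual Perl code: the function name `schedule_species` and its parameters are hypothetical, `has_declared_changes` stands in for the production-database check, and the assumption that the -species filter is applied before -run_all and -force_species is mine, not something stated in the thread.

```python
from typing import Callable, Iterable, List, Optional, Set


def schedule_species(
    all_species: Iterable[str],
    has_declared_changes: Callable[[str], bool],
    run_all: bool = False,
    species: Optional[Set[str]] = None,
    force_species: Optional[Set[str]] = None,
) -> List[str]:
    """Hypothetical sketch: return the species that would get FASTA dumps."""
    force_species = force_species or set()
    selected = []
    for name in all_species:
        # -species: only the listed species are considered at all
        # (assumption: an empty or missing filter means "no filter").
        if species and name not in species:
            continue
        # -run_all skips the changelog check for everyone;
        # -force_species skips it for the listed species only;
        # otherwise a declared change in the production DB is required.
        if run_all or name in force_species or has_declared_changes(name):
            selected.append(name)
    return selected


# Toy data: only human has a declared (handed-over) change for this release.
declared = {"homo_sapiens"}
everything = ["homo_sapiens", "gallus_gallus", "acanthisitta_chloris"]
print(schedule_species(everything, declared.__contains__))  # ['homo_sapiens']
print(schedule_species(everything, declared.__contains__, run_all=True))
```

With no flags, a species that has rows in the species table but no changelog declaration (the situation in this thread) is silently dropped, which is why -run_all 1 or -force_species is needed for a local production database.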
>> 
>> So running the following command line should work for you without needing
>> to add anything to the production database:
>> 
>> init_pipeline.pl Bio::EnsEMBL::Production::Pipeline::PipeConfig::FASTA_conf \
>>   -user write_user -password ****** -host=my_local_host -no_scp 1 \
>>   -base_path my_base_path -registry my_registry.conf -run_all 1
>> 
>> If you want to try to get the expected data into the production database,
>> this is the check that decides whether a species needs data dumping or
>> not:
>> select count(*)
>> from changelog c
>> join changelog_species cs using (changelog_id)
>> join species s using (species_id)
>> where c.release_id = 78
>> and (c.assembly = 'Y' or c.repeat_masking = 'Y')
>> and c.status = 'handed_over'
>> and s.production_name = 'species_name'
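To see which rows the check above needs before it counts a species, the same query can be run against a toy in-memory SQLite database. This is a sketch under assumptions: the three tables below are a hypothetical, cut-down subset containing only the columns the query reads (the real production schema has many more), and `needs_dump` and `toy_production_db` are illustrative names, not part of the Ensembl code.

```python
import sqlite3

# The check quoted above, with the release and species bound as parameters.
CHECK_SQL = """
    select count(*)
    from changelog c
    join changelog_species cs using (changelog_id)
    join species s using (species_id)
    where c.release_id = ?
      and (c.assembly = 'Y' or c.repeat_masking = 'Y')
      and c.status = 'handed_over'
      and s.production_name = ?
"""


def needs_dump(conn, production_name, release_id):
    """True if at least one qualifying changelog entry is linked to the species."""
    (count,) = conn.execute(CHECK_SQL, (release_id, production_name)).fetchone()
    return count > 0


def toy_production_db():
    """In-memory stand-in holding only the columns the check reads (hypothetical subset)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table species (species_id integer primary key, production_name text);
        create table changelog (
            changelog_id integer primary key, release_id integer,
            assembly text, repeat_masking text, status text);
        create table changelog_species (changelog_id integer, species_id integer);
    """)
    return conn


conn = toy_production_db()
conn.execute("insert into species values (1, 'acanthisitta_chloris')")
# A species row on its own is not enough: prints False.
print(needs_dump(conn, "acanthisitta_chloris", 78))

# Add a handed-over changelog entry for release 78 and link it to the species.
conn.execute("insert into changelog values (10, 78, 'Y', 'N', 'handed_over')")
conn.execute("insert into changelog_species values (10, 1)")
print(needs_dump(conn, "acanthisitta_chloris", 78))  # True
```

The first call mirrors the situation described in the thread (species populated, changelog empty); the species only qualifies once a changelog row for the right release, with assembly or repeat_masking set to 'Y' and status 'handed_over', is linked to it through changelog_species.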
>> 
>> 
>> Let me know if that helps,
>> mag
>> 
>>> Hello Developers,
>>> 
>>> I try to run the FASTA pipeline per the document
>>> https://github.com/Ensembl/ensembl-production/blob/release/78/docs/fasta.textile
>>> My registry file is set-up as suggested.
>>> I ran init_pipeline with either -run_all or -species defined, but
>>> beekeeper then skips the analyses because no species are defined for the pipeline.
>>> 
>>> From beekeeper:
>>> Scheduler : Discarded 17 analyses because they do not need any Workers.
>>> 
>>> ....
>>> Worker 1 [ UNSPECIALIZED ] specializing to ScheduleSpecies(1)
>>> 
>>> -------------------- WARNING ----------------------
>>> MSG: acanthisitta_chloris is not a valid species name (check DB and API
>>> version)
>>> FILE: Bio/EnsEMBL/Registry.pm LINE: 1200
>>> CALLED BY: Production/Pipeline/SpeciesFactory.pm  LINE: 85
>>> Date (localtime)    = Thu Dec 18 10:25:53 2014
>>> Ensembl API version = 78
>>> ---------------------------------------------------
>>> 
>>> The problem is most likely related to my production database, as I can
>>> list all the core databases using my registry file and these are present
>>> for release 78. Can someone suggest a way to check what is wrong and why
>>> SpeciesFactory does not generate the list of species beekeeper needs?
>>> 
>>> Thank you.
>>> 
>>> Best wishes,
>>> Lel
>>> 
>>> 
>>> 
>>> ---------------------------------------------
>>> Lel Eory, PhD
>>> The Roslin Institute
>>> University of Edinburgh Easter Bush Campus
>>> Midlothian EH25 9RG
>>> Scotland UK
>>> Phone: +44 131 6519212
>>> 
>>> 
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>> 
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>> 
>> 
>> 
> 
> 