[ensembl-dev] Ensembl Rest Follow up

Andrew Parton aparton at ebi.ac.uk
Wed Aug 26 12:56:08 BST 2020


Hi Sage,

VEP REST is configured to require a cache file for human data - see ensembl-rest/lib/EnsEMBL/REST/Controller/VEP.pm:360. Due to the size of our human variation data (as you’re discovering with how long it’s taking to load the GRCh37 data into your local database), we recommend using a cache file when accessing human data through REST, as this provides a significant speed increase when accessing multiple variants.

Can I ask, why are you looking to set REST up with a local database connection rather than with a cache file?

If you still want to run REST VEP with human data and a database connection, you can comment out the line:

 $vep_params{database} = 0;

This line is at ensembl-rest/lib/EnsEMBL/REST/Controller/VEP.pm:366. I’ve had a quick test and it seems to work (albeit slowly), although the change may have unintended consequences.
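Once the line is commented out and the server restarted, a couple of small test calls can confirm whether requests now reach the database instead of failing with the cache error. The Perl below is a minimal sketch and is not part of ensembl-rest. It assumes the server was started locally with script/ensembl_rest_server.pl on the Catalyst default port 3000, and it uses the REST `vep/:species/region/:region/:allele` endpoint. The helper names `vep_region_url` and `fetch_vep` and the `$BASE_URL` value are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Tiny;  # core module since Perl 5.14
use JSON::PP;    # core module since Perl 5.14; exports decode_json

# Assumption (hypothetical value): REST server running locally on the
# Catalyst default port. Change this if your server listens elsewhere.
our $BASE_URL = 'http://localhost:3000';

# Build a GET URL for the VEP region endpoint:
#   <base>/vep/<species>/region/<region>/<allele>
sub vep_region_url {
    my ($base, $species, $region, $allele) = @_;
    return join '/', $base, 'vep', $species, 'region', $region, $allele;
}

# Send one VEP request and return the decoded JSON response.
# On failure, die with the HTTP status, reason and response body. For a
# 400 from ensembl-rest, the body is expected to hold the JSON error text.
sub fetch_vep {
    my ($url) = @_;
    my $http = HTTP::Tiny->new(timeout => 300);  # database mode can be slow
    my $res  = $http->get($url,
        { headers => { 'Content-Type' => 'application/json' } });
    die "HTTP $res->{status} $res->{reason}: $res->{content}\n"
        unless $res->{success};
    return decode_json($res->{content});
}
```

For example, `fetch_vep(vep_region_url($BASE_URL, 'human', '9:22125503-22125502:1', 'C'))` sends a single small insertion query. The region string is only illustrative; any region covered by your GRCh37 database would work. If the database switch took effect, the call should return results, although slowly. If it did not, the call should die with the same "The VEP can read gene data from either a local cache or local/remote databases" message.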

Kind Regards,
Andrew

> On 19 Aug 2020, at 22:03, Sage Hornung <sage.hornung at neogenomics.com> wrote:
> 
> Hi, 
>  
> Thank you for the suggestion on modifying the <Controller::VEP> section. That got me past the first error to a different one, but I should have explained more fully what I am trying to do.
> I am currently loading a MySQL database with all the homo_sapiens GRCh37 data (almost done, but this is taking a while: the big tables have taken a few days each).
> I want to point the REST instance I am currently setting up at our copy of the GRCh37 database once it's ready.
>  
> So while I am waiting for the data to load, I would like to set up ensembl-rest as close as possible to the final configuration.
> This means querying the database instead of using the cache files.
> It would be great if I could make a couple of small calls to the Ensembl database through the REST interface.
>  
> It seems as if I can add any of the options listed at https://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html to the <Controller::VEP> section of ensembl_rest.conf, and they should be passed to VEP.
>  
> There is also a <Model::Registry> section at the top of ensembl_rest.conf.
>  
> So I have changed these sections a bit: I added the database flag to <Controller::VEP> and tried adding the host and port there as well, but it is still not working.
>  
> <Model::Registry>
>   ###### Database settings. Use if you want to connect to a single database instance. Common options are given below
>   host = ensembldb.ensembl.org
>   port = 3337
>   user = anonymous
> 
>   version = 100
>   verbose = 1
> </Model::Registry>
>  
>  
> <Controller::VEP>
>   fasta             = Homo_sapiens.GRCh37.75.dna.toplevel.fa # path to Human toplevel fasta file
> # Default parameters for running vep
>   cache_region_size = 1000000
>   chunk_size        = 50000
>   whole_genome      = 1
>   compress          = gzip -dc
>   terms             = SO
>   cache             = 0
>   #merged            = 0 
>   failed            = 0
>   core_type         = core
>   quiet             = 1
>   sift              = b
>   polyphen          = b
>   symbol            = 1
>   regulatory        = 1
>   biotype           = 1
>   rest              = 1
>   check_existing    = 1 # adds some performance penalty, mitigated by tabix-converting the cache (http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#convert)
>   fork              = 3
>   max_post_size     = 1000
>   warning_file      = STDERR # controls VEP logging, not Catalyst
>   plugin_config     = # path to plugin config
>   dir_plugins       = # path to VEP_plugins checkout
>   #dir               = /home/ensembl/.vep/homo_sapiens/100_GRCh37
>   database          = 1
>   #host              = ensembldb.ensembl.org
>   #port              = 3337
>   #assembly          = GRCh37
>   #user              = anonymous
> </Controller::VEP>
> 
>  
> I still get this error:
>  
> The VEP can read gene data from either a local cache or local/remote databases.
> 
> Using a cache is the fastest and most efficient way to use the VEP. The
> included INSTALL.pl script can be used to fetch and set up cache files from the
> Ensembl FTP server. Simply run "perl INSTALL.pl" and follow the instructions, or
> see the documentation pages listed below.
> 
> If you have already set up a cache, use "--cache" or "--offline" to use it.
> 
> It is possible to use the public databases hosted at ensembldb.ensembl.org, but
> this is slower than using the cache and concurrent and/or long running VEP jobs
> can put strain on the Ensembl servers, limiting availability to other users.
> 
> To enable using databases, add the flag "--database".
> 
> Documentation
> Installer: http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html#installer
> Cache: http://www.ensembl.org/info/docs/tools/vep/script/index.html#cache
> 
>      at /home/ensembl/ensembl-api-folder/ensembl-vep/modules/Bio/EnsEMBL/VEP/Config.pm line 686.
>  
> 2020/08/19 13:40:22 (6) Serialize.pm 51> Serializing with Catalyst::Action::Serialize::JSON::XS 
> 2020/08/19 13:40:22 (1) Catalyst.pm 2726> Response Code: 400; Content-Type: application/json; Content-Length: unknown
>  
>  
> Thank you for the assistance
> Sage
>  
> Sage Hornung
> Software Engineer 
> 
> NeoGenomics Laboratories, Inc.
> 2131 Faraday Avenue, Carlsbad, CA 92008
> Phone: 760.516.5114
> Cell: 760.755.3930
> sage.hornung at neogenomics.com
> neogenomics.com
>  
>  
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/




