[ensembl-dev] Fwd: Variant_effect_predictor Refseq and HGNC annotation and exception

NextGenSeb nextgenseb at gmail.com
Wed Feb 6 21:16:22 GMT 2013


Thanks for the clarification, Will,
much appreciated!
Cheers
Seb



On 06/02/13 20:36, Will McLaren wrote:
> Hello Seb,
>
> Apologies if this is a little confusing.
>
> The --refseq flag only applies when you are using the VEP with a 
> database (public or local). It asks the VEP to use the 
> *_otherfeatures_* database instead of the *_core_* database; the
> otherfeatures database contains RefSeq transcripts rather than Ensembl
> (ENST) transcripts.
>
> We provide homo_sapiens_refseq_vep_70.tar.gz so that you can get the
> same annotations in --offline or --cache mode. To use it, you do not
> need to (and should not) use --refseq; simply point --dir at the
> directory containing the unpacked cache.
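A minimal sketch of the setup described above, assuming a POSIX shell and tar. `install_vep_cache` is a hypothetical helper, not part of VEP, and `/Data/VEP_refseq` is an illustrative location. The layout check relies on the `<dir>/<species>/<version>` path visible in the VEP log further down this thread (`/Data/VEP/test/homo_sapiens/70`). Giving the RefSeq cache its own root is a suggestion, prompted by Seb's finding that it is only picked up under `homo_sapiens`, where it would otherwise collide with the Ensembl cache.

```shell
#!/bin/sh
# Hypothetical helper (not part of VEP): unpack a VEP cache archive into its
# own root directory, then confirm the <root>/<species>/<version> layout that
# VEP reports in its log (e.g. /Data/VEP/test/homo_sapiens/70).
install_vep_cache() {
  archive=$1   # e.g. homo_sapiens_refseq_vep_70.tar.gz
  root=$2      # directory later passed to VEP via --dir
  species=$3   # e.g. homo_sapiens
  version=$4   # e.g. 70

  mkdir -p "$root" || return 1
  tar -xzf "$archive" -C "$root" || return 1

  # The archive layout is an assumption; fail loudly instead of guessing.
  if [ ! -d "$root/$species/$version" ]; then
    echo "expected $root/$species/$version; inspect with: tar -tzf $archive" >&2
    return 1
  fi
}

# Example usage (/Data/VEP_refseq is illustrative; note: no --refseq):
#   install_vep_cache homo_sapiens_refseq_vep_70.tar.gz /Data/VEP_refseq homo_sapiens 70
#   perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl \
#     -i ./test.short.vcf --format vcf -o test.vep --force_overwrite \
#     --cache --offline --dir /Data/VEP_refseq --species homo_sapiens
```

If the check fails, `tar -tzf` shows where the archive actually puts its files, and you can adjust the extraction target so that `--dir` points at the directory holding `homo_sapiens/70`.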
>
> homo_sapiens_refseq_vep_70.tar.gz has some limitations, one of which
> is that it does not contain HGNC identifiers for genes, which is why
> --hgnc does not work for you in this situation. Unfortunately, the
> mappings between RefSeq and HGNC are somewhat hard to retrieve in
> Ensembl because of the way our Xref (external reference) schema is
> put together. You can retrieve these mappings from other resources,
> for example the UCSC MySQL server, and use them to add this
> annotation to your VEP output.
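One way to follow the suggestion above, sketched with awk. It rests on several assumptions. The VEP output is taken to be the default tab-separated format, with Feature in column 5 and Extra in column 14. The mapping is a two-column tab-separated file of RefSeq transcript ID and gene symbol, which could, for example, be exported from the `name` and `name2` columns of UCSC's refGene table. `add_hgnc` is a hypothetical helper. The `HGNC=` key is chosen to mimic `--hgnc` output and should be compared against real `--hgnc` output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of VEP): append a gene symbol to the Extra
# column of default tab-separated VEP output, using a RefSeq-to-symbol map.
#   $1  mapping file: <RefSeq transcript ID> TAB <symbol>, one per line
#       (assumed source: name/name2 columns of the UCSC refGene table)
#   $2  VEP output file (assumed: Feature = column 5, Extra = column 14)
add_hgnc() {
  awk -F'\t' -v OFS='\t' '
    # First file: load the transcript -> symbol map, ignoring versions.
    FILENAME == ARGV[1] {
      if (NF >= 2) { k = $1; sub(/\.[0-9]+$/, "", k); sym[k] = $2 }
      next
    }
    # Pass "##" metadata lines and the "#Uploaded_variation" header through.
    /^#/ { print; next }
    {
      id = $5
      sub(/\.[0-9]+$/, "", id)   # drop a version suffix such as ".2"
      if (id in sym) {
        tag = "HGNC=" sym[id]    # key chosen to mimic --hgnc output
        $14 = (($14 == "" || $14 == "-") ? tag : ($14 ";" tag))
      }
      print
    }
  ' "$1" "$2"
}

# Example: add_hgnc refseq_to_symbol.tsv test.vep > test.hgnc.vep
```

Version suffixes are stripped on both sides so that, for example, `NM_0001.2` in the VEP output matches `NM_0001` in the map. Rows whose transcript is not in the map are printed unchanged.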
>
> Hope this helps!
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
>
> On 6 February 2013 04:29, NextGenSeb <nextgenseb at gmail.com> wrote:
>
>
>
>     Dear all,
>
>     I recently came across your Variant Effect Predictor, so first of all,
>     thanks for making it available; it's a great tool!
>     I have a few questions however pertaining to the --refseq and
>     --hgnc flags.
>
>     First, regarding the --refseq flag:
>
>     After downloading both the homo_sapiens_refseq_vep_70.tar.gz and
>     homo_sapiens_vep_70.tar.gz caches from your FTP site and putting them
>     into the /Data/VEP/homo_sapiens_refseq and /Data/VEP/homo_sapiens
>     folders, respectively, I tried the following two scenarios:
>
>     perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl \
>       -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite \
>       --refseq --cache --dir /Data/VEP/ --offline --everything \
>       --species homo_sapiens
>
>     perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl \
>       -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite \
>       --refseq --cache --dir /Data/VEP/ --offline --everything \
>       --species homo_sapiens_refseq
>
>     Both commands ran without error; however, neither incorporated the
>     RefSeq annotation in the output. Only after deleting both caches and
>     unpacking homo_sapiens_refseq_vep_70.tar.gz into /Data/VEP/homo_sapiens
>     did the RefSeq NM_ tags get incorporated, and then even without
>     --refseq set. This behavior did not change when I removed the
>     --offline or --everything flags.
>
>     In principle that would be fine; however, the --hgnc flag appears to
>     work only with the homo_sapiens_vep_70.tar.gz data set and not the
>     RefSeq one, again regardless of whether I run in online or offline
>     mode.
>
>     Now, after removing all cache files, running
>
>     perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl \
>       -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite \
>       --refseq --cache --dir /Data/VEP/ --write_cache --species homo_sapiens
>
>     does actually give the correct output. However, in this case adding
>     the --everything flag not only puts a huge strain on the server
>     connection when using datasets bigger than my current test one, but
>     also throws an exception (see the end of this email). Hence I would
>     rather work off an offline cache. Could you please advise me whether
>     there is a way to fix this issue?
>
>     Thanks in advance for your help,
>     Cheers
>     Seb
>
>
>
>     perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl \
>       -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite \
>       --refseq --cache --dir /Data/VEP/test --write_cache --everything \
>       --species homo_sapiens --host useastdb.ensembl.org
>
>     #----------------------------------#
>     # ENSEMBL VARIANT EFFECT PREDICTOR #
>     #----------------------------------#
>
>     version 2.8
>
>     By Will McLaren (wm2 at ebi.ac.uk)
>
>     Configuration options:
>
>     cache              1
>     canonical          1
>     ccds               1
>     core_type          otherfeatures
>     dir                /Data/VEP/test
>     domains            1
>     everything         1
>     force_overwrite    1
>     format             vcf
>     gmaf               1
>     hgnc               1
>     hgvs               1
>     host               useastdb.ensembl.org
>     input_file         ./test.short.vcf
>     numbers            1
>     output_file        test.vep
>     polyphen           b
>     port               5306
>     protein            1
>     refseq             1
>     regulatory         1
>     sift               b
>     species            homo_sapiens
>     toplevel_dir       /Data/VEP/test
>     verbose            1
>     write_cache        1
>
>     --------------------
>
>
>
>     Will only load v70 databases
>     Species 'homo_sapiens' loaded from database 'homo_sapiens_core_70_37'
>     Species 'homo_sapiens' loaded from database 'homo_sapiens_cdna_70_37'
>     Species 'homo_sapiens' loaded from database 'homo_sapiens_vega_70_37'
>     Species 'homo_sapiens' loaded from database 'homo_sapiens_otherfeatures_70_37'
>     Species 'homo_sapiens' loaded from database 'homo_sapiens_rnaseq_70_37'
>     homo_sapiens_variation_70_37 loaded
>     homo_sapiens_funcgen_70_37 loaded
>     Bio::EnsEMBL::Compara::DBSQL::DBAdaptor not found so the following compara databases will be ignored: ensembl_compara_70
>     ensembl_ancestral_70 loaded
>     ensembl_ontology_70 loaded
>     ensembl_stable_ids_70 loaded
>     2013-02-06 11:21:35 - Connected to core version 70 database and variation version 70 database
>     2013-02-06 11:21:35 - INFO: Cache directory /Data/VEP/test/homo_sapiens/70 not found - it will be created
>     2013-02-06 11:21:40 - INFO: Database will be accessed when using --hgvs
>     2013-02-06 11:21:40 - INFO: Database will be accessed when using --sift; consider using the complete cache containing sift data (see documentation for details)
>     2013-02-06 11:21:40 - INFO: Database will be accessed when using --polyphen; consider using the complete cache containing polyphen data (see documentation for details)
>     2013-02-06 11:21:40 - INFO: Database will be accessed when using --regulatory; consider using the complete cache containing regulatory data (see documentation for details)
>     2013-02-06 11:21:40 - Starting...
>     2013-02-06 11:21:42 - Read 62 variants into buffer
>     2013-02-06 11:21:42 - Reading transcript data from cache and/or database
>     [======================================================================================================]
>     [ 100% ]
>     2013-02-06 12:01:12 - Retrieved 409 transcripts (0 mem, 0 cached, 411 DB, 2 duplicates)
>     2013-02-06 12:01:12 - Reading regulatory data from cache and/or database
>     [======================================================================================================]
>     [ 100% ]
>     2013-02-06 12:05:08 - Retrieved 1738 regulatory features (0 mem, 0 cached, 1738 DB, 0 duplicates)
>     2013-02-06 12:05:08 - Checking for existing variations
>     [======================================================================================================]
>     [ 100% ]
>     2013-02-06 12:05:33 - Analyzing chromosome 1
>     2013-02-06 12:05:33 - Analyzing variants
>     [======================================================================================================]
>     [ 100% ]
>     2013-02-06 12:05:33 - Analyzing RegulatoryFeatures
>     [======================================================================================================]
>     [ 100% ]
>     2013-02-06 12:05:33 - Analyzing MotifFeatures
>     [======================================================================================================]
>     [ 100% ]
>     2013-02-06 12:05:33 - Calculating consequences
>     [=======> ]    [ 9% ]
>     -------------------- EXCEPTION --------------------
>     MSG: Got to have an Exon object, not a
>     STACK Bio::EnsEMBL::Translation::start_Exon
>     /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Translation.pm:271
>     STACK Bio::EnsEMBL::Transcript::transfer
>     /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Transcript.pm:2511
>     STACK Bio::EnsEMBL::Variation::BaseTranscriptVariation::_three_prime_utr
>     /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/BaseTranscriptVariation.pm:638
>     STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::_get_alternate_cds
>     /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:1169
>     STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::_get_fs_peptides
>     /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:1090
>     STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::_get_hgvs_peptides
>     /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:930
>     STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::hgvs_protein
>     /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:705
>     STACK Bio::EnsEMBL::Variation::Utils::VEP::tva_to_line
>     /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1639
>     STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_to_consequences
>     /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1472
>     STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_list_to_cons
>     /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1275
>     STACK Bio::EnsEMBL::Variation::Utils::VEP::get_all_consequences
>     /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1056
>     STACK main::main
>     /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl:270
>     STACK toplevel
>     /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl:116
>     Date (localtime)    = Wed Feb  6 12:05:33 2013
>     Ensembl API version = 70
>     ---------------------------------------------------
>
>
>
>
>     _______________________________________________
>     Dev mailing list    Dev at ensembl.org
>     Posting guidelines and subscribe/unsubscribe info:
>     http://lists.ensembl.org/mailman/listinfo/dev
>     Ensembl Blog: http://www.ensembl.info/
