[ensembl-dev] Fwd: Variant_effect_predictor Refseq and HGNC annotation and exception
NextGenSeb
nextgenseb at gmail.com
Wed Feb 6 21:16:22 GMT 2013
Thanks for the clarification, Will,
much appreciated!
Cheers
Seb
On 06/02/13 20:36, Will McLaren wrote:
> Hello Seb,
>
> Apologies if this is a little confusing.
>
> The --refseq flag only applies when you are using the VEP with a
> database (public or local). It asks the VEP to use the
> *_otherfeatures_* database instead of the *_core_* database; this
> contains RefSeq transcripts rather than ENST Ensembl transcripts.
>
> We provide the homo_sapiens_refseq_vep_70.tar.gz so that you can get
> these same annotations in --offline or --cache mode. To use this, you
> do not need to (and should not) use --refseq; you should merely point
> to the directory containing the unpacked cache using --dir.
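
A minimal sketch of the invocation described above, written as a small Python helper that assembles the argument list. It only uses flags that already appear in this thread; the script path and the cache directory /Data/VEP_refseq are hypothetical placeholders, and the sketch assumes the RefSeq cache was unpacked on its own under that directory.

```python
from typing import List

# Hypothetical location of the VEP script; adjust to the local install.
VEP_SCRIPT = "variant_effect_predictor.pl"


def build_offline_refseq_command(
    input_vcf: str, output_file: str, refseq_cache_dir: str
) -> List[str]:
    """Return an argument list for an offline run against the RefSeq cache.

    --refseq is deliberately absent: per the explanation above it only
    matters for database connections. The RefSeq annotation is selected
    purely by pointing --dir at the directory holding the unpacked cache.
    """
    return [
        "perl", VEP_SCRIPT,
        "-i", input_vcf,
        "--format", "vcf",
        "-o", output_file,
        "--force_overwrite",
        "--cache", "--offline",
        "--dir", refseq_cache_dir,
    ]


if __name__ == "__main__":
    # /Data/VEP_refseq is an assumed directory, not one from the thread.
    cmd = build_offline_refseq_command(
        "test.short.vcf", "test.vep", "/Data/VEP_refseq"
    )
    print(" ".join(cmd))
```

Keeping the RefSeq and Ensembl caches under separate top-level directories is one way to avoid having to delete one cache before using the other.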
>
> homo_sapiens_refseq_vep_70.tar.gz has some limitations, one of which
> is that it does not contain HGNC identifiers for genes, hence why
> --hgnc does not work for you in this situation. Unfortunately these
> mappings between RefSeq and HGNC are somewhat hard to retrieve in
> Ensembl due to the way that our Xref (external reference) schema is
> put together. It is possible to retrieve these mappings from other
> resources, for example the UCSC MySQL server, and from there you could
> add this annotation to your VEP output.
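
As a rough illustration of adding the annotation afterwards, the sketch below joins a RefSeq-to-gene-symbol table onto VEP's tab-delimited output. It rests on several assumptions: the mapping has already been exported (for example from UCSC) to a two-column tab-separated table of accession and symbol, the VEP output has a "#"-prefixed column header containing a "Feature" column with the transcript accession, and version suffixes such as ".3" can be ignored when matching. The function names and the toy rows are made up for illustration.

```python
import csv
from typing import Dict, Iterable, Iterator


def load_refseq_to_symbol(lines: Iterable[str]) -> Dict[str, str]:
    """Read 'accession<TAB>symbol' rows into a dict keyed by unversioned accession."""
    mapping: Dict[str, str] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, malformed, or comment rows
        mapping[row[0].split(".")[0]] = row[1]
    return mapping


def annotate(vep_lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield VEP output lines with an extra trailing HGNC column."""
    feature_idx = None
    for line in vep_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            yield line  # meta lines pass through unchanged
        elif line.startswith("#"):
            # Column header: locate the transcript column by name.
            feature_idx = line.lstrip("#").split("\t").index("Feature")
            yield line + "\tHGNC"
        else:
            if feature_idx is None:
                raise ValueError("data line seen before the column header")
            fields = line.split("\t")
            accession = fields[feature_idx].split(".")[0]
            yield line + "\t" + mapping.get(accession, "-")


if __name__ == "__main__":
    # Toy input for illustration only.
    mapping = load_refseq_to_symbol(["NM_000059\tBRCA2\n"])
    vep = [
        "## toy header\n",
        "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\n",
        "var1\t13:1\tC\t675\tNM_000059.3\n",
    ]
    for out in annotate(vep, mapping):
        print(out)
```

Transcripts with no entry in the mapping get "-" in the new column, so unmatched rows stay easy to spot.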
>
> Hope this helps!
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
>
> On 6 February 2013 04:29, NextGenSeb <nextgenseb at gmail.com> wrote:
>
>
>
> Dear all,
>
> I recently came across your variant effect predictor, so first of all
> thanks for making that available, great tool!
> I have a few questions however pertaining to the --refseq and
> --hgnc flags.
>
> First of all for the --refseq flag:
>
> After downloading both the homo_sapiens_refseq_vep_70.tar.gz and
> homo_sapiens_vep_70.tar.gz caches from your ftp site and putting them
> into /Data/VEP/homo_sapiens_refseq and /Data/VEP/homo_sapiens folders,
> respectively, I tried the following two scenarios:
>
> perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl \
>   -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite \
>   --refseq --cache --dir /Data/VEP/ --offline --everything \
>   --species homo_sapiens
>
> perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl \
>   -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite \
>   --refseq --cache --dir /Data/VEP/ --offline --everything \
>   --species homo_sapiens_refseq
>
> Both commands ran without error; however, neither incorporated the
> RefSeq annotation in the output. Only after deleting both caches and
> unpacking homo_sapiens_refseq_vep_70.tar.gz into /Data/VEP/homo_sapiens
> did the RefSeq NM_ tags get incorporated, and then even without
> --refseq set. This behavior did not change when removing the --offline
> or the --everything flag.
>
> In principle that would be fine; however, the --hgnc flag appears to
> work only with the homo_sapiens_vep_70.tar.gz data set and not the
> RefSeq one, again regardless of whether it is run in online or offline
> mode.
>
> Now removing all cache files and running
>
> perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl \
>   -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite \
>   --refseq --cache --dir /Data/VEP/ --write_cache --species homo_sapiens
>
> does actually give the correct output. However, in this case adding the
> --everything flag not only puts a huge strain on the server connection
> when using datasets bigger than my current test one, but also throws an
> exception (see end of email). Hence I would rather work from an offline
> cache. Could you please advise me whether there is a way to fix this
> issue?
>
> Thanks in advance for your help,
> Cheers
> Seb
>
>
>
> perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl \
>   -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite \
>   --refseq --cache --dir /Data/VEP/test --write_cache --everything \
>   --species homo_sapiens --host useastdb.ensembl.org
>
> #----------------------------------#
> # ENSEMBL VARIANT EFFECT PREDICTOR #
> #----------------------------------#
>
> version 2.8
>
> By Will McLaren (wm2 at ebi.ac.uk)
>
> Configuration options:
>
> cache 1
> canonical 1
> ccds 1
> core_type otherfeatures
> dir /Data/VEP/test
> domains 1
> everything 1
> force_overwrite 1
> format vcf
> gmaf 1
> hgnc 1
> hgvs 1
> host useastdb.ensembl.org
> input_file ./test.short.vcf
> numbers 1
> output_file test.vep
> polyphen b
> port 5306
> protein 1
> refseq 1
> regulatory 1
> sift b
> species homo_sapiens
> toplevel_dir /Data/VEP/test
> verbose 1
> write_cache 1
>
> --------------------
>
>
>
> Will only load v70 databases
> Species 'homo_sapiens' loaded from database 'homo_sapiens_core_70_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_cdna_70_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_vega_70_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_otherfeatures_70_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_rnaseq_70_37'
> homo_sapiens_variation_70_37 loaded
> homo_sapiens_funcgen_70_37 loaded
> Bio::EnsEMBL::Compara::DBSQL::DBAdaptor not found so the following compara databases will be ignored: ensembl_compara_70
> ensembl_ancestral_70 loaded
> ensembl_ontology_70 loaded
> ensembl_stable_ids_70 loaded
> 2013-02-06 11:21:35 - Connected to core version 70 database and variation version 70 database
> 2013-02-06 11:21:35 - INFO: Cache directory /Data/VEP/test/homo_sapiens/70 not found - it will be created
> 2013-02-06 11:21:40 - INFO: Database will be accessed when using --hgvs
> 2013-02-06 11:21:40 - INFO: Database will be accessed when using --sift; consider using the complete cache containing sift data (see documentation for details)
> 2013-02-06 11:21:40 - INFO: Database will be accessed when using --polyphen; consider using the complete cache containing polyphen data (see documentation for details)
> 2013-02-06 11:21:40 - INFO: Database will be accessed when using --regulatory; consider using the complete cache containing regulatory data (see documentation for details)
> 2013-02-06 11:21:40 - Starting...
> 2013-02-06 11:21:42 - Read 62 variants into buffer
> 2013-02-06 11:21:42 - Reading transcript data from cache and/or database
> [======================================================================================================]
> [ 100% ]
> 2013-02-06 12:01:12 - Retrieved 409 transcripts (0 mem, 0 cached, 411 DB, 2 duplicates)
> 2013-02-06 12:01:12 - Reading regulatory data from cache and/or database
> [======================================================================================================]
> [ 100% ]
> 2013-02-06 12:05:08 - Retrieved 1738 regulatory features (0 mem, 0 cached, 1738 DB, 0 duplicates)
> 2013-02-06 12:05:08 - Checking for existing variations
> [======================================================================================================]
> [ 100% ]
> 2013-02-06 12:05:33 - Analyzing chromosome 1
> 2013-02-06 12:05:33 - Analyzing variants
> [======================================================================================================]
> [ 100% ]
> 2013-02-06 12:05:33 - Analyzing RegulatoryFeatures
> [======================================================================================================]
> [ 100% ]
> 2013-02-06 12:05:33 - Analyzing MotifFeatures
> [======================================================================================================]
> [ 100% ]
> 2013-02-06 12:05:33 - Calculating consequences
> [=======> ] [ 9% ]
> -------------------- EXCEPTION --------------------
> MSG: Got to have an Exon object, not a
> STACK Bio::EnsEMBL::Translation::start_Exon /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Translation.pm:271
> STACK Bio::EnsEMBL::Transcript::transfer /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Transcript.pm:2511
> STACK Bio::EnsEMBL::Variation::BaseTranscriptVariation::_three_prime_utr /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/BaseTranscriptVariation.pm:638
> STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::_get_alternate_cds /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:1169
> STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::_get_fs_peptides /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:1090
> STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::_get_hgvs_peptides /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:930
> STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::hgvs_protein /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:705
> STACK Bio::EnsEMBL::Variation::Utils::VEP::tva_to_line /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1639
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_to_consequences /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1472
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_list_to_cons /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1275
> STACK Bio::EnsEMBL::Variation::Utils::VEP::get_all_consequences /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1056
> STACK main::main /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl:270
> STACK toplevel /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl:116
> Date (localtime) = Wed Feb 6 12:05:33 2013
> Ensembl API version = 70
> ---------------------------------------------------
>
>
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/