[ensembl-dev] Fwd: Variant_effect_predictor Refseq and HGNC annotation and exception

Will McLaren wm2 at ebi.ac.uk
Wed Feb 6 09:36:55 GMT 2013


Hello Seb,

Apologies if this is a little confusing.

The --refseq flag only applies when you are using the VEP with a database
(public or local). It asks the VEP to use the *_otherfeatures_* database
instead of the *_core_* database; the otherfeatures database contains RefSeq
transcripts rather than Ensembl (ENST) transcripts.

We provide homo_sapiens_refseq_vep_70.tar.gz so that you can get
these same annotations in --offline or --cache mode. To use it, you do
not need to (and should not) use --refseq; simply point --dir at the
directory containing the unpacked cache.
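
As a rough illustration of the paragraph above (a sketch, not an official
recipe), the offline invocation could be assembled like this in Python. The
script path and file names are copied from your own commands; the cache
directory /Data/VEP_refseq is a hypothetical location for the unpacked RefSeq
cache, and the helper name build_vep_command is made up for this example.

```python
def build_vep_command(script, input_vcf, output_file, cache_dir):
    """Return the argument list for an offline VEP run against an unpacked cache.

    --refseq is deliberately left out: the RefSeq annotations come from the
    cache that --dir points at, not from the flag.
    """
    return [
        "perl", script,
        "-i", input_vcf,
        "--format", "vcf",
        "-o", output_file,
        "--force_overwrite",
        "--offline",
        "--dir", cache_dir,
    ]


if __name__ == "__main__":
    cmd = build_vep_command(
        "/usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl",
        "./test.short.vcf",
        "test.vep",
        "/Data/VEP_refseq",  # hypothetical directory holding the unpacked RefSeq cache
    )
    # Print the command so it can be inspected before it is actually run.
    print(" ".join(cmd))
```

Only flags that already appear in your commands are used here; the list form
can be handed to subprocess.run once the paths are right for your system.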

homo_sapiens_refseq_vep_70.tar.gz has some limitations, one of which is
that it does not contain HGNC identifiers for genes, which is why --hgnc does
not work for you in this situation. Unfortunately, the mappings between
RefSeq and HGNC are somewhat hard to retrieve in Ensembl because of the way
our Xref (external reference) schema is put together. It is possible
to retrieve these mappings from other resources, for example the UCSC MySQL
server, and from there you could add this annotation to your VEP output.
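
A minimal sketch of that last step in Python, assuming you have already
exported a table of RefSeq accession to gene symbol from another resource.
The column positions are an assumption based on the default tab-delimited VEP
output (transcript ID in the fifth column, "Feature"; key=value pairs in the
last column, "Extra"). The function name, the HGNC key, and the accession and
symbol in the demo are illustrative only.

```python
def add_hgnc(vep_lines, refseq_to_symbol):
    """Yield VEP output lines with HGNC=<symbol> appended to the Extra column.

    Assumes default tab-delimited VEP output: column 5 (index 4) is the
    Feature (transcript ID) and the last column is Extra.
    """
    for line in vep_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            yield line  # pass header and comment lines through unchanged
            continue
        cols = line.split("\t")
        if len(cols) <= 5:
            yield line  # not a data row in the assumed layout
            continue
        feature = cols[4]
        # Try the accession as given, then without a version suffix
        # (e.g. NM_000000.1 -> NM_000000).
        symbol = (refseq_to_symbol.get(feature)
                  or refseq_to_symbol.get(feature.split(".")[0]))
        if symbol:
            extra = cols[-1]
            if extra in ("", "-"):
                cols[-1] = f"HGNC={symbol}"
            else:
                cols[-1] = f"{extra};HGNC={symbol}"
        yield "\t".join(cols)


if __name__ == "__main__":
    mapping = {"NM_000000": "GENE1"}  # made-up accession and symbol
    row = "\t".join(["var1", "1:100", "A", "-", "NM_000000.1", "Transcript",
                     "missense_variant", "-", "-", "-", "-", "-", "-", "-"])
    for out in add_hgnc([row], mapping):
        print(out)
```

Rows whose transcript is not in the mapping are passed through untouched, so
the output has the same number of lines as the input.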

Hope this helps!

Regards

Will McLaren
Ensembl Variation


On 6 February 2013 04:29, NextGenSeb <nextgenseb at gmail.com> wrote:

>
>
> Dear all,
>
> I recently came across your variant effect predictor, so first of all
> thanks for making that available, great tool!
> I have a few questions however pertaining to the --refseq and --hgnc flags.
>
> First of all for the --refseq flag:
>
> After downloading both the homo_sapiens_refseq_vep_70.tar.gz and
> homo_sapiens_vep_70.tar.gz caches from your FTP site and putting them
> into /Data/VEP/homo_sapiens_refseq and /Data/VEP/homo_sapiens folders,
> respectively, I tried the following two scenarios:
>
> perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl
> -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite
> --refseq --cache --dir /Data/VEP/ --offline --everything --species
> homo_sapiens
>
> perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl
> -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite
> --refseq --cache --dir /Data/VEP/ --offline --everything --species
> homo_sapiens_refseq
>
> Both commands ran without error, but neither incorporated the RefSeq
> annotation in the output. Only after deleting both caches and
> unpacking homo_sapiens_refseq_vep_70.tar.gz into /Data/VEP/homo_sapiens
> did the RefSeq NM_ tags get incorporated, and then even without --refseq
> set. This behavior did not change when removing the --offline or
> the --everything flag.
>
> In principle that would be fine; however, the --hgnc flag appears to only
> work with the homo_sapiens_vep_70.tar.gz data set and not the RefSeq
> one, again regardless of whether it is run in online or offline mode.
>
> Now removing all cache files and running
>
> perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl
> -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite
> --refseq --cache --dir /Data/VEP/ --write_cache --species homo_sapiens
>
> does actually give the correct output. However, in this case adding the
> --everything flag not only puts a huge strain on the server
> connection when using bigger datasets than my current test one, but also
> throws an exception (see end of email). Hence I would rather work off an
> offline cache. Could you please advise me whether there is a way to fix
> this issue?
>
> Thanks in advance for your help,
> Cheers
> Seb
>
>
>
> perl /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl
> -i ./test.short.vcf --format vcf -o test.vep --verbose --force_overwrite
> --refseq --cache --dir /Data/VEP/test --write_cache --everything
> --species homo_sapiens --host useastdb.ensembl.org
>
> #----------------------------------#
> # ENSEMBL VARIANT EFFECT PREDICTOR #
> #----------------------------------#
>
> version 2.8
>
> By Will McLaren (wm2 at ebi.ac.uk)
>
> Configuration options:
>
> cache              1
> canonical          1
> ccds               1
> core_type          otherfeatures
> dir                /Data/VEP/test
> domains            1
> everything         1
> force_overwrite    1
> format             vcf
> gmaf               1
> hgnc               1
> hgvs               1
> host               useastdb.ensembl.org
> input_file         ./test.short.vcf
> numbers            1
> output_file        test.vep
> polyphen           b
> port               5306
> protein            1
> refseq             1
> regulatory         1
> sift               b
> species            homo_sapiens
> toplevel_dir       /Data/VEP/test
> verbose            1
> write_cache        1
>
> --------------------
>
>
>
> Will only load v70 databases
> Species 'homo_sapiens' loaded from database 'homo_sapiens_core_70_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_cdna_70_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_vega_70_37'
> Species 'homo_sapiens' loaded from database
> 'homo_sapiens_otherfeatures_70_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_rnaseq_70_37'
> homo_sapiens_variation_70_37 loaded
> homo_sapiens_funcgen_70_37 loaded
> Bio::EnsEMBL::Compara::DBSQL::DBAdaptor not found so the following
> compara databases will be ignored: ensembl_compara_70
> ensembl_ancestral_70 loaded
> ensembl_ontology_70 loaded
> ensembl_stable_ids_70 loaded
> 2013-02-06 11:21:35 - Connected to core version 70 database and
> variation version 70 database
> 2013-02-06 11:21:35 - INFO: Cache directory
> /Data/VEP/test/homo_sapiens/70 not found - it will be created
> 2013-02-06 11:21:40 - INFO: Database will be accessed when using --hgvs
> 2013-02-06 11:21:40 - INFO: Database will be accessed when using --sift;
> consider using the complete cache containing sift data (see
> documentation for details)
> 2013-02-06 11:21:40 - INFO: Database will be accessed when using
> --polyphen; consider using the complete cache containing polyphen data
> (see documentation for details)
> 2013-02-06 11:21:40 - INFO: Database will be accessed when using
> --regulatory; consider using the complete cache containing regulatory
> data (see documentation for details)
> 2013-02-06 11:21:40 - Starting...
> 2013-02-06 11:21:42 - Read 62 variants into buffer
> 2013-02-06 11:21:42 - Reading transcript data from cache and/or database
> [======================================================================================================] [ 100% ]
> 2013-02-06 12:01:12 - Retrieved 409 transcripts (0 mem, 0 cached, 411
> DB, 2 duplicates)
> 2013-02-06 12:01:12 - Reading regulatory data from cache and/or database
> [======================================================================================================] [ 100% ]
> 2013-02-06 12:05:08 - Retrieved 1738 regulatory features (0 mem, 0
> cached, 1738 DB, 0 duplicates)
> 2013-02-06 12:05:08 - Checking for existing variations
> [======================================================================================================] [ 100% ]
> 2013-02-06 12:05:33 - Analyzing chromosome 1
> 2013-02-06 12:05:33 - Analyzing variants
> [======================================================================================================] [ 100% ]
> 2013-02-06 12:05:33 - Analyzing RegulatoryFeatures
> [======================================================================================================] [ 100% ]
> 2013-02-06 12:05:33 - Analyzing MotifFeatures
> [======================================================================================================] [ 100% ]
> 2013-02-06 12:05:33 - Calculating consequences
> [=======> ]    [ 9% ]
> -------------------- EXCEPTION --------------------
> MSG: Got to have an Exon object, not a
> STACK Bio::EnsEMBL::Translation::start_Exon /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Translation.pm:271
> STACK Bio::EnsEMBL::Transcript::transfer /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Transcript.pm:2511
> STACK Bio::EnsEMBL::Variation::BaseTranscriptVariation::_three_prime_utr /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/BaseTranscriptVariation.pm:638
> STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::_get_alternate_cds /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:1169
> STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::_get_fs_peptides /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:1090
> STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::_get_hgvs_peptides /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:930
> STACK Bio::EnsEMBL::Variation::TranscriptVariationAllele::hgvs_protein /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm:705
> STACK Bio::EnsEMBL::Variation::Utils::VEP::tva_to_line /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1639
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_to_consequences /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1472
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_list_to_cons /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1275
> STACK Bio::EnsEMBL::Variation::Utils::VEP::get_all_consequences /usr/bioinf/source/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1056
> STACK main::main /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl:270
> STACK toplevel /usr/bioinf/source/variant_effect_predictor/variant_effect_predictor.pl:116
> Date (localtime)    = Wed Feb  6 12:05:33 2013
> Ensembl API version = 70
> ---------------------------------------------------
>
>
>
>