[ensembl-dev] Inconsistence output result found between running by cache and remote database
Laurent Gil
lgil at ebi.ac.uk
Fri May 11 16:29:52 BST 2018
Dear Kenneth,
You are right: the VEP cache result is correct.
VEP returns annotations based on the Genomic Reference Assembly sequence.
As you might know, sometimes the RefSeq sequences don't match the
Genomic Reference Assembly sequence, which is the case for NM_000348.3:
NM_000348.3:      121 TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA 180
                      |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
2:           31805920 TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA 31805861
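
As a rough illustration, the mismatching column can be located by comparing the two aligned rows character by character. This is only a sketch: the helper name mismatch_positions is hypothetical (it is not part of VEP), and it assumes both rows have the same length and that the first column is transcript position 121, as in the alignment above.

```shell
# Aligned rows copied from the alignment above ('-' marks the base absent from the genome).
refseq="TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA"
genome="TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA"
refseq_start=121   # transcript coordinate of the first aligned column

# Hypothetical helper: print the transcript coordinate of every column
# where the two equal-length aligned rows differ.
mismatch_positions() {
  awk -v a="$1" -v b="$2" -v start="$3" 'BEGIN {
    for (i = 1; i <= length(a); i++)
      if (substr(a, i, 1) != substr(b, i, 1)) print start + i - 1
  }'
}

mismatches=$(mismatch_positions "$refseq" "$genome" "$refseq_start")
echo "Mismatching transcript position(s): $mismatches"
```

The only differing column is the one where the genome row has a gap. Because it falls inside a run of C bases, which C of the run is counted as the extra one is an artefact of how the alignment was drawn.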
We have started developing a way to correct the Genomic Reference Assembly
sequence so that it matches the RefSeq sequence (where the two differ), in
order to provide more accurate predictions/annotations of variants on RefSeq
data.
At the moment this is only available through the VEP cache, so we
strongly recommend using the VEP cache when you run VEP with the RefSeq
dataset.
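
One possible reading of the two results in the report quoted below (an interpretation, not something confirmed elsewhere in this thread): NM_000348.3 carries a base that the genome lacks, so a transcript sequence built from the uncorrected genome is one base short downstream of that point and its codon boundaries shift by one. The two reported codons are consistent with that, as this small check suggests. The variable names are illustrative only.

```shell
# Codons copied from the quoted report; the uppercase base is the variant.
cache_codon="cGa"    # reported when running with the cache
online_codon="Gag"   # reported when running against the remote database

# Drop the first base of one codon and the last base of the other.
cache_tail=${cache_codon#?}     # last two bases of the cache codon
online_head=${online_codon%?}   # first two bases of the database codon

if [ "$cache_tail" = "$online_head" ]; then
  shift_result="codons overlap with a one-base offset"
else
  shift_result="codons do not overlap"
fi
echo "$shift_result"
```

With a one-base offset the variant moves from the second base of the codon (cGa) to the first (Gag), which would change the predicted amino acid substitution while leaving the codon number, 227, unchanged.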
I hope this makes sense.
Best regards,
Laurent
On 11/05/2018 10:45, Kenneth Wong wrote:
> The scenario is as follows:
> 1. input.vcf :
> ##fileformat=VCFv4.0
> #CHROM POS ID REF ALT QUAL FILTER INFO
> 2 31754395 . C T . . .
>
> 2. Run the commands below with version 92
>
> # Run VEP by connecting to the remote Ensembl DB
> ./vep -i input.vcf -o vep-online.out \
>   --fork 4 --buffer_size 100 \
>   --refseq --assembly "GRCh37" \
>   --database --port 3337 --db 92 \
>   --hgvs --numbers --symbol --flag_pick --no_stats \
>   --species homo_sapiens \
>   --fields $VEP_FIELDS
>
> # Run VEP via the local cache
> ./vep -i input.vcf -o vep-cache.out \
>   --fork 4 --buffer_size 1000 \
>   --offline --cache --dir_cache $VEP_CACHE_DIR \
>   --refseq --fasta ~/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
>   --hgvs --numbers --symbol --flag_pick --no_stats \
>   --species homo_sapiens \
>   --fields $VEP_FIELDS \
>   --use_given_ref
>
> , where
> $VEP_FIELDS=Uploaded_variation, Location, Allele, Gene, SYMBOL,
> SYMBOL_SOURCE, HGNC_ID, Feature, Consequence, Protein_position,
> Amino_acids, Codons, HGVSc, HGVSp
>
> $VEP_CACHE_DIR=~/.vep/cache/
>
> 3. The following discrepancies are found between the output files
> "vep-cache.out" and "vep-online.out" (as attached)
>
> vep-cache.out :
> - Amino_acids = R/Q
> - Codons = cGa/cAa
> - HGVSp = NP_000339.2:p.Arg227Gln
>
> vep-online.out :
> - Amino_acids = E/K
> - Codons = Gag/Aag
> - HGVSp = NP_000339.2:p.Glu227Lys
>
> Assuming the result in vep-cache.out is correct, could you please advise
> what is wrong with the parameters when running VEP via the remote database?
>
> Many thanks for your help!
>
> Kenneth Wong
>
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/