[ensembl-dev] Inconsistence output result found between running by cache and remote database
Kenneth Wong
kenneth at l3-bioinfo.com
Sat May 12 01:26:44 BST 2018
Dear Laurent,
Thanks a lot for your prompt and detailed answer! It makes sense to handle
such a mismatch through the VEP cache.
Thanks again for your great help!
Best regards,
Kenneth
On Fri, 11 May 2018 at 23:29, Laurent Gil <lgil at ebi.ac.uk> wrote:
> Dear Kenneth,
>
> You are right: the VEP cache result is correct.
>
> VEP returns annotations based on the Genomic Reference Assembly sequence.
>
> As you might know, sometimes the RefSeq sequences don't match the Genomic
> Reference Assembly sequence, which is the case for NM_000348.3:
>
> NM_000348.3 121       TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA 180
>                       |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
> 2           31805920  TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA 31805861
>
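The mismatch in the alignment above can also be located programmatically. What follows is a minimal sketch and not part of VEP: `aln_mismatches` is a hypothetical helper name, and the two strings are copied from the alignment in this message. It assumes both aligned strings have the same length and that gaps are written as `-`.

```shell
#!/bin/sh
# Report every column where two equal-length aligned strings differ.
# Arguments: aligned sequence 1, aligned sequence 2,
#            coordinate of the first base of sequence 1.
aln_mismatches() {
    awk -v a="$1" -v b="$2" -v start="$3" 'BEGIN {
        pos = start - 1                 # coordinate of the last base consumed from sequence 1
        for (i = 1; i <= length(a); i++) {
            x = substr(a, i, 1)
            y = substr(b, i, 1)
            if (x != "-") pos++         # a gap in sequence 1 does not consume a coordinate
            if (x != y)
                printf "column %d: %s vs %s (position %d)\n", i, x, y, pos
        }
    }'
}

# Sequences copied from the alignment (RefSeq transcript vs. reference assembly)
refseq=TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA
genome=TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA

aln_mismatches "$refseq" "$genome" 121
```

With these inputs the sketch reports a single difference at column 41 (transcript position 161), where NM_000348.3 has a C and the assembly has a gap. Because the gap sits in a run of C bases, an aligner could equally have placed it one or two columns earlier.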
>
> We have started developing a way to adjust the Genomic Reference Assembly
> sequence to match the RefSeq sequence (where the two differ) in order to
> provide more accurate prediction/annotation of variants on RefSeq data.
>
> At the moment this is only available through the VEP cache, so we strongly
> recommend using the VEP cache when you run VEP with the RefSeq dataset.
>
> I hope this makes sense.
>
> Best regards,
>
> Laurent
>
> On 11/05/2018 10:45, Kenneth Wong wrote:
>
> The scenario is described below:
> 1. input.vcf:
> ##fileformat=VCFv4.0
> #CHROM POS ID REF ALT QUAL FILTER INFO
> 2 31754395 . C T . . .
>
> 2. Run the commands below with version 92:
>
> # Run VEP by connecting to the remote Ensembl database
> ./vep -i input.vcf -o vep-online.out \
>   --fork 4 --buffer_size 100 \
>   --refseq --assembly "GRCh37" \
>   --database --port 3337 --db 92 \
>   --hgvs --numbers --symbol --flag_pick --no_stats \
>   --species homo_sapiens \
>   --fields "$VEP_FIELDS"
>
> # Run VEP via the local cache
> ./vep -i input.vcf -o vep-cache.out \
>   --fork 4 --buffer_size 1000 \
>   --offline --cache --dir_cache "$VEP_CACHE_DIR" \
>   --refseq --fasta ~/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
>   --hgvs --numbers --symbol --flag_pick --no_stats \
>   --species homo_sapiens \
>   --fields "$VEP_FIELDS" \
>   --use_given_ref
>
> where
> VEP_FIELDS="Uploaded_variation,Location,Allele,Gene,SYMBOL,SYMBOL_SOURCE,HGNC_ID,Feature,Consequence,Protein_position,Amino_acids,Codons,HGVSc,HGVSp"
>
> VEP_CACHE_DIR=~/.vep/cache/
>
> 3. The following discrepancies were found between the output files
> "vep-cache.out" and "vep-online.out" (attached):
>
> vep-cache.out :
> - Amino_acids = R/Q
> - Codons = cGa/cAa
> - HGVSp = NP_000339.2:p.Arg227Gln
>
> vep-online.out :
> - Amino_acids = E/K
> - Codons = Gag/Aag
> - HGVSp = NP_000339.2:p.Glu227Lys
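The two codon changes listed above can be checked against the standard genetic code. This is a minimal sketch: `translate_codon` is a hypothetical helper, and its lookup table is deliberately partial, covering only the four codons quoted in this thread.

```shell
#!/bin/sh
# Translate one codon with a partial lookup table (standard genetic code,
# restricted to the four codons that appear in the two VEP outputs).
translate_codon() {
    codon=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case "$codon" in
        CGA) echo R ;;    # Arg
        CAA) echo Q ;;    # Gln
        GAG) echo E ;;    # Glu
        AAG) echo K ;;    # Lys
        *)   echo '?' ;;  # codon not covered by this partial table
    esac
}

# Codons fields as reported by the cache run and the database run
for pair in cGa/cAa Gag/Aag; do
    ref=${pair%/*}    # reference codon (text before the slash)
    alt=${pair#*/}    # variant codon (text after the slash)
    echo "$pair -> $(translate_codon "$ref")/$(translate_codon "$alt")"
done
```

This prints `cGa/cAa -> R/Q` and `Gag/Aag -> E/K`, matching the Amino_acids values in each file. Note that the changed base (upper case) is the second base of the codon in the cache output but the first base in the database output, which appears consistent with the reading frame being shifted by one base between the two runs.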
>
> Assuming the result in vep-cache.out is correct, could you please advise
> what is wrong with the parameters when running VEP via the remote database?
>
> Many thanks for your help!
>
> Kenneth Wong
>
>
>
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
>