[ensembl-dev] Inconsistence output result found between running by cache and remote database

Kenneth Wong kenneth at l3-bioinfo.com
Sat May 12 01:26:44 BST 2018


Dear Laurent,

Thanks a lot for your prompt and detailed answer! It makes sense to fix
such a mismatch through the VEP cache.

Thanks again for your great help!

Best regards,
Kenneth


On Fri, 11 May 2018 at 23:29, Laurent Gil <lgil at ebi.ac.uk> wrote:

> Dear Kenneth,
>
> You are right: the VEP cache result is correct.
>
> VEP returns annotations based on the Genomic Reference Assembly sequence.
>
> As you might know, sometimes the RefSeq sequences don't match the Genomic
> Reference Assembly sequence, which is the case for NM_000348.3:
>
> NM_000348.3:     121 TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA NM_000348.3:     180
>                  121 |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||                  180
>           2:31805920 TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA           2:31805861
>
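To make the mismatch in the alignment above easier to spot, here is a minimal Python sketch that scans the two aligned rows and reports the columns where they differ. The function name `find_mismatches` and the choice to report positions in the alignment's own numbering (starting at 121) are assumptions made for illustration; nothing here is part of VEP.

```python
# Aligned rows copied from the alignment above ('-' marks a gap).
REFSEQ_ROW = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA"
GENOME_ROW = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA"
ALIGNMENT_START = 121  # first position shown for NM_000348.3 in the alignment


def find_mismatches(row_a, row_b, start):
    """Return (position, char_a, char_b) for every column where the rows differ.

    Hypothetical helper: positions are counted along row_a, starting at `start`.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return [
        (start + i, a, b)
        for i, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]


mismatches = find_mismatches(REFSEQ_ROW, GENOME_ROW, ALIGNMENT_START)
for position, refseq_base, genome_base in mismatches:
    print(f"position {position}: RefSeq row has {refseq_base}, genomic row has {genome_base}")
```

Run as is, this reports a single difference at position 161, where the RefSeq row has a C and the genomic row has a gap.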
>
> We have started developing a way to adjust the Genomic Reference Assembly
> sequence to match the RefSeq sequence (where the two differ) in order to
> provide a more accurate prediction/annotation of variants on RefSeq data.
>
> At the moment this is only available through the VEP cache, so we strongly
> recommend using the VEP cache when you run VEP with the RefSeq dataset.
>
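A toy illustration of why a one-base difference between the RefSeq transcript and the genomic sequence can change the predicted codon while leaving the residue number the same, as in the R/Q versus E/K discrepancy reported further down. The sequences below are invented for illustration and are not the real NM_000348.3 sequence; the helper `codon_change` is hypothetical. The sketch also assumes that the extra base lies upstream of the variant within the coding sequence, which the thread does not state explicitly.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_change(cds, index, alt):
    """Return (codons, amino_acids) for substituting `alt` at 0-based `index`.

    The changed base is upper-cased and the rest lower-cased, mimicking the
    style of the Codons column (for example cGa/cAa).
    """
    offset = index % 3
    start = index - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]

    def fmt(codon):
        return codon[:offset].lower() + codon[offset].upper() + codon[offset + 1:].lower()

    codons = fmt(ref_codon) + "/" + fmt(alt_codon)
    amino_acids = CODON_TABLE[ref_codon] + "/" + CODON_TABLE[alt_codon]
    return codons, amino_acids


# Toy transcript-derived coding sequence: ATG GCC CGA GTT
TRANSCRIPT_CDS = "ATGGCCCGAGTT"
# Toy genome-derived sequence: the same, minus one C from the CCC run.
GENOME_CDS = TRANSCRIPT_CDS[:5] + TRANSCRIPT_CDS[6:]

# The same G>A change sits at index 7 in the transcript-derived sequence
# and at index 6 once the upstream base is missing.
print(codon_change(TRANSCRIPT_CDS, 7, "A"))
print(codon_change(GENOME_CDS, 6, "A"))
```

In this toy case the variant falls in the third codon either way, but the missing upstream base moves it from the second to the first codon position, so the first call gives cGa/cAa (R/Q) and the second gives Gag/Aag (E/K).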
> I hope this makes sense.
>
> Best regards,
>
> Laurent
>
> On 11/05/2018 10:45, Kenneth Wong wrote:
>
> The scenario is as follows:
> 1. input.vcf:
>    ##fileformat=VCFv4.0
>    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
>    2       31754395        .       C       T       .       .       .
>
> 2. Run the commands below with version 92:
>
>  # Run VEP by connecting to the remote Ensembl DB
>   ./vep -i input.vcf -o vep-online.out \
>   --fork 4 --buffer_size 100 \
>   --refseq --assembly "GRCh37" \
>   --database --port 3337 --db 92 \
>   --hgvs --numbers --symbol --flag_pick --no_stats \
>   --species homo_sapiens \
>   --fields "$VEP_FIELDS"
>
>  # Run VEP via the local cache
>   ./vep -i input.vcf -o vep-cache.out \
>   --fork 4 --buffer_size 1000 \
>   --offline --cache --dir_cache "$VEP_CACHE_DIR" \
>   --refseq --fasta ~/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
>   --hgvs --numbers --symbol --flag_pick --no_stats \
>   --species homo_sapiens \
>   --fields "$VEP_FIELDS" \
>   --use_given_ref
>
> where
> VEP_FIELDS="Uploaded_variation,Location,Allele,Gene,SYMBOL,SYMBOL_SOURCE,HGNC_ID,Feature,Consequence,Protein_position,Amino_acids,Codons,HGVSc,HGVSp"
>
> VEP_CACHE_DIR=~/.vep/cache/
>
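Since `--fields` takes a comma-separated list, stray spaces between the field names can split the value into several shell arguments. A minimal sketch of building the value programmatically, assuming Python 3.8 or later for `shlex.join`; the command list holds only a subset of the options quoted above and is printed rather than executed.

```python
import shlex

# Field names as listed in the message above.
FIELDS = [
    "Uploaded_variation", "Location", "Allele", "Gene", "SYMBOL",
    "SYMBOL_SOURCE", "HGNC_ID", "Feature", "Consequence",
    "Protein_position", "Amino_acids", "Codons", "HGVSc", "HGVSp",
]

# Join with bare commas so the value stays a single argument.
fields_arg = ",".join(FIELDS)

# Illustrative subset of the cache command; not a complete invocation.
cmd = [
    "./vep", "-i", "input.vcf", "-o", "vep-cache.out",
    "--offline", "--cache", "--refseq", "--hgvs",
    "--fields", fields_arg,
]

# Print a shell-safe rendering of the command instead of running it.
print(shlex.join(cmd))
```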
> 3. The following discrepancies were found between the output files
> "vep-cache.out" and "vep-online.out" (attached):
>
> vep-cache.out:
> - Amino_acids = R/Q
> - Codons = cGa/cAa
> - HGVSp = NP_000339.2:p.Arg227Gln
>
> vep-online.out:
> - Amino_acids = E/K
> - Codons = Gag/Aag
> - HGVSp = NP_000339.2:p.Glu227Lys
>
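A comparison like the one above can be automated. The following is a minimal sketch, assuming the default tab-delimited VEP output layout (meta lines starting with `##`, then one header line starting with `#`). The functions `parse_vep` and `diff_outputs` are hypothetical helpers, and the two sample texts are reconstructed from the values quoted above with a placeholder identifier (`var1`), an assumed Feature value (NM_000348.3), and a reduced column set; they are not the attached files.

```python
def parse_vep(text):
    """Parse tab-delimited VEP-style text into {(variation, feature): row dict}."""
    header = None
    rows = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta lines
        if line.startswith("#"):
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("data line found before the header line")
        row = dict(zip(header, line.split("\t")))
        rows[(row["Uploaded_variation"], row["Feature"])] = row
    return rows


def diff_outputs(text_a, text_b, columns):
    """List (key, column, value_a, value_b) for rows present in both outputs."""
    rows_a, rows_b = parse_vep(text_a), parse_vep(text_b)
    diffs = []
    for key in sorted(rows_a.keys() & rows_b.keys()):
        for column in columns:
            value_a, value_b = rows_a[key].get(column), rows_b[key].get(column)
            if value_a != value_b:
                diffs.append((key, column, value_a, value_b))
    return diffs


HEADER = "#Uploaded_variation\tFeature\tAmino_acids\tCodons\tHGVSp\n"
# Placeholder rows built from the values reported in this message.
CACHE_TEXT = HEADER + "var1\tNM_000348.3\tR/Q\tcGa/cAa\tNP_000339.2:p.Arg227Gln\n"
ONLINE_TEXT = HEADER + "var1\tNM_000348.3\tE/K\tGag/Aag\tNP_000339.2:p.Glu227Lys\n"

differences = diff_outputs(CACHE_TEXT, ONLINE_TEXT, ["Amino_acids", "Codons", "HGVSp"])
for key, column, cache_value, online_value in differences:
    print(f"{key[0]} {key[1]} {column}: cache={cache_value} online={online_value}")
```

For real files, the two text arguments would come from reading vep-cache.out and vep-online.out; keying rows on the variant identifier plus the Feature column is a design choice so that a variant annotated on several transcripts is compared transcript by transcript.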
> Assuming the result in vep-cache.out is correct, could you please advise
> what is wrong with the parameters when running VEP against the remote database?
>
> Many thanks for your help!
>
> Kenneth Wong