[ensembl-dev] Inconsistence output result found between running by cache and remote database

Laurent Gil lgil at ebi.ac.uk
Fri May 11 16:29:52 BST 2018


Dear Kenneth,

You are right: the VEP cache result is correct.

VEP returns annotations based on the Genomic Reference Assembly sequence.

As you might know, sometimes the RefSeq sequences don't match the 
Genomic Reference Assembly sequence, which is the case for NM_000348.3:

NM_000348.3  121       TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA  180
                       |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
2            31805920  TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA  31805861
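
The difference between the two rows can be located programmatically. The following is a minimal sketch in Python, with the two aligned rows copied from the alignment above; the function name mismatch_columns is only illustrative and is not part of VEP or the Ensembl API.

```python
# Aligned rows copied from the alignment above ('-' marks a gap).
refseq = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA"
genome = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA"
REFSEQ_START = 121  # transcript coordinate of the first alignment column


def mismatch_columns(a, b, start):
    """Return (coordinate, base_a, base_b) for every column where the rows differ."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    return [(start + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(mismatch_columns(refseq, genome, REFSEQ_START))
# prints [(161, 'C', '-')]
```

The single differing column is a base present in NM_000348.3 and absent from the genomic sequence. Because it falls in a run of identical bases, the exact column chosen for the gap is an alignment convention.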


We have started developing a way to correct the Genomic Reference Assembly 
sequence to match the RefSeq sequence (where the two differ), in order 
to provide more accurate prediction/annotation of variants on RefSeq data.

At the moment this is only available through the VEP cache, so we 
strongly recommend using the VEP cache when you run VEP with the RefSeq 
dataset.
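
To illustrate why a one-base difference changes the reported codon, here is a rough sketch in Python. It assumes (this is not stated explicitly above) that the base missing from the genomic sequence lies in the coding sequence upstream of the variant, so that the uncorrected transcript is one base shorter at that point. The CDS position is derived from the reported values (codon 227, changed base in second place); the function names are illustrative only.

```python
# Only the four codons reported in this thread; not a full genetic code table.
CODON_TO_AA = {"CGA": "R", "CAA": "Q", "GAG": "E", "AAG": "K"}


def codon_frame(cds_pos):
    """Map a 1-based CDS position to (codon number, position within the codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def consequence(ref_codon, alt_codon):
    """Format the amino acid change for a pair of codons, e.g. 'R/Q'."""
    return CODON_TO_AA[ref_codon.upper()] + "/" + CODON_TO_AA[alt_codon.upper()]


# Codon 227 with the changed base in second place (cGa) gives this CDS position.
refseq_pos = 3 * (227 - 1) + 2
# Assumption: the genome-derived transcript lacks one upstream coding base.
genome_pos = refseq_pos - 1

print(codon_frame(refseq_pos))    # (227, 2): second base of codon 227
print(codon_frame(genome_pos))    # (227, 1): first base of codon 227
print(consequence("cGa", "cAa"))  # R/Q, as in vep-cache.out
print(consequence("Gag", "Aag"))  # E/K, as in vep-online.out
```

Under that assumption the same G>A change falls on the second base of the codon in the corrected sequence and on the first base in the uncorrected one, which is consistent with the two sets of results reported in the original message.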

I hope this makes sense.

Best regards,

Laurent

On 11/05/2018 10:45, Kenneth Wong wrote:
> The scenario is as follows:
> 1. input.vcf :
>    ##fileformat=VCFv4.0
>    #CHROM  POS     ID      REF     ALT     QUAL    FILTER INFO
>    2       31754395        .       C       T       .  .       .
>
> 2. Execute the commands below with version 92:
>
>   # Run VEP by connecting to the remote Ensembl DB
>   ./vep -i input.vcf -o vep-online.out \
>   --fork 4 --buffer_size 100 \
>   --refseq --assembly "GRCh37" \
>   --database --port 3337 --db 92 \
>   --hgvs --numbers --symbol --flag_pick --no_stats \
>   --species homo_sapiens \
>   --fields "$VEP_FIELDS"
>
>   # Run VEP via the local cache
>   ./vep -i input.vcf -o vep-cache.out \
>   --fork 4 --buffer_size 1000 \
>   --offline --cache --dir_cache "$VEP_CACHE_DIR" \
>   --refseq --fasta ~/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
>   --hgvs --numbers --symbol --flag_pick --no_stats \
>   --species homo_sapiens \
>   --fields "$VEP_FIELDS" \
>   --use_given_ref
>
> where
>   VEP_FIELDS=Uploaded_variation,Location,Allele,Gene,SYMBOL,SYMBOL_SOURCE,HGNC_ID,Feature,Consequence,Protein_position,Amino_acids,Codons,HGVSc,HGVSp
>   VEP_CACHE_DIR=~/.vep/cache/
>
> 3. The following discrepancies are found between the output files 
> "vep-cache.out" and "vep-online.out" (as attached):
>
> vep-cache.out :
> - Amino_acids = R/Q
> - Codons = cGa/cAa
> - HGVSp = NP_000339.2:p.Arg227Gln
>
> vep-online.out :
> - Amino_acids = E/K
> - Codons = Gag/Aag
> - HGVSp = NP_000339.2:p.Glu227Lys
>
> Assuming the result in vep-cache.out is correct, could you please advise 
> what is wrong with the parameters when running VEP via the remote database?
>
> Many thanks for your help!
>
> Kenneth Wong
>
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
