[ensembl-dev] Inconsistence output result found between running by cache and remote database
Laurent Gil
lgil at ebi.ac.uk
Fri May 11 16:36:14 BST 2018
Dear Kenneth,
More detailed documentation about this RefSeq issue is available on the
Ensembl website:
http://grch37.ensembl.org/info/docs/tools/vep/script/vep_other.html#refseq_bam
Best regards,
Laurent
On 11/05/2018 16:29, Laurent Gil wrote:
>
> Dear Kenneth,
>
> You are right: the VEP cache result is correct.
>
> VEP returns annotations based on the Genomic Reference Assembly sequence.
>
> As you might know, sometimes the RefSeq sequences don't match the
> Genomic Reference Assembly sequence, which is the case for NM_000348.3:
>
> NM_000348.3: 121  TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA  180
>                   |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
> 2:31805920        TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA  2:31805861
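
The alignment above can be checked mechanically. The following is a minimal Python sketch, not part of VEP: the two strings are copied from the alignment, `-` is taken to mark a gap, and the function name `mismatch_columns` is a hypothetical helper introduced only for illustration. It assumes the label 121 is the transcript coordinate of the first aligned column.

```python
# Aligned sequences copied from the alignment above; "-" marks a gap.
refseq = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA"
genome = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA"
REFSEQ_START = 121  # assumed: transcript coordinate of the first aligned column


def mismatch_columns(a, b, start):
    """Return (coordinate, base_a, base_b) for every column where a and b differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(start + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = mismatch_columns(refseq, genome, REFSEQ_START)
print(diffs)  # [(161, 'C', '-')]
```

Under these assumptions the only difference in the window is a single C present in NM_000348.3 and absent from the genomic sequence.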
>
> We have started to develop a way to alter the Genomic Reference Assembly
> sequence to match the RefSeq sequence (when the two don't match), in order
> to provide more accurate prediction/annotation of variants on RefSeq
> data.
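
To see why a one-base mismatch matters for protein-level annotation, the codon pairs reported in the original question (quoted further down) can be translated. This is a minimal Python sketch, not VEP code. It assumes the standard genetic code for the four codons involved and VEP's convention of writing the changed base in upper case in the Codons field; the helper names are hypothetical.

```python
# Only the four codons that appear in the two outputs (standard genetic code).
CODON_TO_AA = {"CGA": "R", "CAA": "Q", "GAG": "E", "AAG": "K"}


def amino_acid_change(codons):
    """Translate a 'ref/alt' codon pair such as 'cGa/cAa' into 'R/Q'."""
    ref, alt = codons.upper().split("/")
    return f"{CODON_TO_AA[ref]}/{CODON_TO_AA[alt]}"


def changed_base_index(codons):
    """Return the 0-based position in the reference codon written in upper case."""
    ref = codons.split("/")[0]
    return next(i for i, base in enumerate(ref) if base.isupper())


cache_codons = "cGa/cAa"   # reported in vep-cache.out
online_codons = "Gag/Aag"  # reported in vep-online.out

print(amino_acid_change(cache_codons), changed_base_index(cache_codons))    # R/Q 1
print(amino_acid_change(online_codons), changed_base_index(online_codons))  # E/K 0
```

The same G>A change falls on the second base of the codon in the cache result and on the first base in the database result. That is consistent with the reading frame being offset by one base when the uncorrected genomic sequence is used, although the sketch itself does not prove the cause.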
>
> At the moment this is only available through the VEP cache, so we
> strongly recommend using the VEP cache when you run VEP with the RefSeq
> dataset.
>
> I hope this makes sense.
>
> Best regards,
>
> Laurent
> On 11/05/2018 10:45, Kenneth Wong wrote:
>> The scenario is as follows:
>> 1. input.vcf :
>> ##fileformat=VCFv4.0
>> #CHROM POS ID REF ALT QUAL FILTER INFO
>> 2 31754395 . C T . . .
>>
>> 2. Execute the commands below with version 92
>>
>> # Run VEP by connecting to the remote Ensembl DB
>> ./vep -i input.vcf -o vep-online.out \
>> --fork 4 --buffer_size 100 \
>> --refseq --assembly "GRCh37" \
>> --database --port 3337 --db 92 \
>> --hgvs --numbers --symbol --flag_pick --no_stats \
>> --species homo_sapiens \
>> --fields $VEP_FIELDS
>>
>> # Run VEP via the local cache
>> ./vep -i input.vcf -o vep-cache.out \
>> --fork 4 --buffer_size 1000 \
>> --offline --cache --dir_cache $VEP_CACHE_DIR \
>> --refseq --fasta ~/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
>> --hgvs --numbers --symbol --flag_pick --no_stats \
>> --species homo_sapiens \
>> --fields $VEP_FIELDS \
>> --use_given_ref
>>
>> , where
>> $VEP_FIELDS=Uploaded_variation, Location, Allele, Gene, SYMBOL,
>> SYMBOL_SOURCE, HGNC_ID, Feature, Consequence, Protein_position,
>> Amino_acids, Codons, HGVSc, HGVSp
>>
>> $VEP_CACHE_DIR=~/.vep/cache/
>>
>> 3. The following discrepancies are found between the output files
>> "vep-cache.out" and "vep-online.out" (as attached)
>>
>> vep-cache.out :
>> - Amino_acids = R/Q
>> - Codons = cGa/cAa
>> - HGVSp = NP_000339.2:p.Arg227Gln
>>
>> vep-online.out :
>> - Amino_acids = E/K
>> - Codons = Gag/Aag
>> - HGVSp = NP_000339.2:p.Glu227Lys
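
A comparison like the one listed above can be made field by field. The following is a minimal Python sketch under stated assumptions: it expects VEP's default tab-delimited output, where `##` lines are metadata and one `#`-prefixed line names the columns. The function names are hypothetical, and the two sample rows are reduced, illustrative rows built only from values quoted in this thread, not the actual attached files.

```python
import csv
import io


def read_vep_rows(text):
    """Parse tab-delimited VEP-style output into a list of dicts.

    Assumes '##' lines are metadata and the first remaining line,
    starting with '#', holds the column names.
    """
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("##")]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("no column header line found")
    lines[0] = lines[0][1:]  # drop the leading '#'
    return list(csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t"))


def differing_fields(row_a, row_b):
    """Return {field: (value_a, value_b)} for shared fields whose values differ."""
    return {k: (row_a[k], row_b[k]) for k in row_a if k in row_b and row_a[k] != row_b[k]}


header = "#" + "\t".join(["Feature", "Amino_acids", "Codons", "HGVSp"])
# Illustrative rows only, assembled from the values reported above.
cache_text = "\n".join([
    "## illustrative metadata line",
    header,
    "\t".join(["NM_000348.3", "R/Q", "cGa/cAa", "NP_000339.2:p.Arg227Gln"]),
])
online_text = "\n".join([
    "## illustrative metadata line",
    header,
    "\t".join(["NM_000348.3", "E/K", "Gag/Aag", "NP_000339.2:p.Glu227Lys"]),
])

cache_row = read_vep_rows(cache_text)[0]
online_row = read_vep_rows(online_text)[0]
for field, (cached, online) in differing_fields(cache_row, online_row).items():
    print(f"{field}: cache={cached} online={online}")
```

For real output files the rows would need to be matched on variant and transcript before comparing; the sketch compares a single pair of rows only.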
>>
>> Assuming the result in vep-cache.out is correct, could you please
>> advise what is wrong with the parameters when running VEP via the
>> remote database?
>>
>> Many thanks for your help!
>>
>> Kenneth Wong
>>
>>
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/