[ensembl-dev] Inconsistent output result found between running by cache and remote database

Kenneth Wong kenneth at l3-bioinfo.com
Mon May 14 09:00:47 BST 2018


Dear Laurent,

I ran the command below with VEP version 88 (and cache v88), and it fixes
the issue discussed earlier in this thread.

./vep -cache -refseq -i variants.vcf \
  --bam interim_GRCh37.p13_knownrefseq_alignments_2017-01-13.bam


Our project uses VEP version 80, so I tried the same command there, but it
failed because version 80 does not support the "--bam" option.
Upgrading VEP (code and cache) from version 80 to 92 would involve a
tremendous amount of testing work. To avoid that, could you please advise
on any workaround for using BAM-edited transcript models in version 80?

If there is no simple workaround, do you think it is feasible for us to
port the related scripts from version 92 to version 80?

Many thanks for your kind help and advice!

Best regards,
Kenneth

On Sat, May 12, 2018 at 8:26 AM, Kenneth Wong <kenneth at l3-bioinfo.com>
wrote:

> Dear Laurent,
>
> Thanks a lot for your prompt and detailed answer! It makes sense to fix
> such a mismatch through the VEP cache.
>
> Thanks again for your great help!
>
> Best regards,
> Kenneth
>
>
> On Fri, May 11, 2018 at 23:29, Laurent Gil <lgil at ebi.ac.uk> wrote:
>
>> Dear Kenneth,
>>
>> You are right: the VEP cache result is correct.
>>
>> VEP returns annotations based on the Genomic Reference Assembly sequence.
>>
>> As you might know, sometimes the RefSeq sequences don't match the Genomic
>> Reference Assembly sequence, which is the case for NM_000348.3:
>>
>> NM_000348.3:     121 TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA NM_000348.3:     180
>>                  121 |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||                  180
>>           2:31805920 TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA           2:31805861
>>
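The alignment above can also be checked programmatically. The following is a minimal Python sketch, not part of VEP: it walks two aligned rows column by column and reports where they differ. The function name `alignment_mismatches` is hypothetical, and the two strings are copied from the alignment shown above.

```python
def alignment_mismatches(query, subject, query_start):
    """Return (query_position, query_base, subject_base) for each differing column.

    query and subject are two rows of a pairwise alignment of equal length,
    with '-' marking a gap. query_start is the coordinate of the first
    query base. A column where the query itself is a gap has position None.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    mismatches = []
    pos = query_start
    for q, s in zip(query, subject):
        if q != s:
            mismatches.append((pos if q != "-" else None, q, s))
        if q != "-":
            pos += 1  # only real bases advance the query coordinate
    return mismatches


# Rows copied from the NM_000348.3 alignment in this message.
refseq = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA"
genome = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA"

print(alignment_mismatches(refseq, genome, 121))
```

For this alignment the sketch reports a single column, at transcript position 161, where the RefSeq row carries a C and the genomic row has a gap. Because the gap sits in a run of C bases, the aligner's choice of which C to leave unpaired is arbitrary.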
>>
>> We have started developing a way to adjust the Genomic Reference Assembly
>> sequence to match the RefSeq sequence (where the two differ), in order to
>> provide more accurate prediction/annotation of variants on RefSeq data.
>>
>> At the moment this is only available through the VEP cache, so we
>> strongly recommend using the VEP cache when you run VEP with the RefSeq
>> dataset.
>>
>> I hope this makes sense.
>>
>> Best regards,
>>
>> Laurent
>>
>> On 11/05/2018 10:45, Kenneth Wong wrote:
>>
>> The scenario is as follows:
>> 1. input.vcf :
>>    ##fileformat=VCFv4.0
>>    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
>>    2       31754395        .       C       T       .       .       .
>>
>> 2. Run the commands below with VEP version 92:
>>
>>   # Run VEP by connecting to the remote Ensembl DB
>>   ./vep -i input.vcf -o vep-online.out \
>>   --fork 4 --buffer_size 100 \
>>   --refseq --assembly "GRCh37" \
>>   --database --port 3337 --db 92 \
>>   --hgvs --numbers --symbol --flag_pick --no_stats \
>>   --species homo_sapiens \
>>   --fields $VEP_FIELDS
>>
>>   # Run VEP via the local cache
>>   ./vep -i input.vcf -o vep-cache.out \
>>   --fork 4 --buffer_size 1000 \
>>   --offline --cache --dir_cache $VEP_CACHE_DIR \
>>   --refseq --fasta ~/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
>>   --hgvs --numbers --symbol --flag_pick --no_stats \
>>   --species homo_sapiens \
>>   --fields $VEP_FIELDS \
>>   --use_given_ref
>>
>> where
>> VEP_FIELDS="Uploaded_variation,Location,Allele,Gene,SYMBOL,SYMBOL_SOURCE,HGNC_ID,Feature,Consequence,Protein_position,Amino_acids,Codons,HGVSc,HGVSp"
>>
>> $VEP_CACHE_DIR=~/.vep/cache/
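Once both runs have finished, the two output files can be compared field by field rather than by eye. Below is a minimal Python sketch under stated assumptions: it assumes VEP's default tab-delimited output, where lines starting with `##` are metadata and the column header line starts with a single `#`. The function names and the sample rows are illustrative only; in particular the variant identifier `2_31754395_C/T` is an assumed placeholder, and only the differing values are taken from this thread.

```python
def read_vep_table(text):
    """Parse tab-delimited VEP output into {(variant, feature): row_dict}."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("##")]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("no column header line found")
    header = lines[0].lstrip("#").split("\t")
    rows = {}
    for ln in lines[1:]:
        row = dict(zip(header, ln.split("\t")))
        rows[(row["Uploaded_variation"], row["Feature"])] = row
    return rows


def diff_vep_tables(cache_text, online_text):
    """Return {(variant, feature): {column: (cache_value, online_value)}}."""
    cache = read_vep_table(cache_text)
    online = read_vep_table(online_text)
    diffs = {}
    for key in sorted(cache.keys() & online.keys()):
        changed = {
            col: (cache[key][col], online[key].get(col))
            for col in cache[key]
            if cache[key][col] != online[key].get(col)
        }
        if changed:
            diffs[key] = changed
    return diffs


# Illustrative sample data (reduced column set, placeholder variant ID).
HEADER = "#" + "\t".join(
    ["Uploaded_variation", "Feature", "Amino_acids", "Codons", "HGVSp"])
CACHE_OUT = "\n".join([
    "## illustrative metadata line",
    HEADER,
    "\t".join(["2_31754395_C/T", "NM_000348.3", "R/Q", "cGa/cAa",
               "NP_000339.2:p.Arg227Gln"]),
])
ONLINE_OUT = "\n".join([
    "## illustrative metadata line",
    HEADER,
    "\t".join(["2_31754395_C/T", "NM_000348.3", "E/K", "Gag/Aag",
               "NP_000339.2:p.Glu227Lys"]),
])

for key, changed in diff_vep_tables(CACHE_OUT, ONLINE_OUT).items():
    for col, (cache_val, online_val) in changed.items():
        print(key, col, cache_val, online_val)
```

To use it on real files, read `vep-cache.out` and `vep-online.out` into strings and pass them to `diff_vep_tables`. Keying on the variant and the transcript keeps rows for different transcripts of the same variant apart.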
>>
>> 3. The following discrepancies appear between the output files
>> "vep-cache.out" and "vep-online.out" (attached):
>>
>> vep-cache.out :
>> - Amino_acids = R/Q
>> - Codons = cGa/cAa
>> - HGVSp = NP_000339.2:p.Arg227Gln
>>
>> vep-online.out :
>> - Amino_acids = E/K
>> - Codons = Gag/Aag
>> - HGVSp = NP_000339.2:p.Glu227Lys
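As a sanity check on the two sets of values listed above, the short Python sketch below translates the reported codons and records which codon position carries the change. The codon table is only the four-entry subset of the standard genetic code needed here, and `describe_codon_change` is a hypothetical helper, not a VEP function.

```python
# Subset of the standard genetic code covering the codons reported above.
CODON_TABLE = {"CGA": "R", "CAA": "Q", "GAG": "E", "AAG": "K"}


def describe_codon_change(codons):
    """Given a VEP-style 'ref/alt' codon string, return
    (list of 1-based changed codon positions, 'refAA/altAA')."""
    ref, alt = codons.split("/")
    changed = [i + 1 for i, (r, a) in enumerate(zip(ref, alt))
               if r.upper() != a.upper()]
    amino = f"{CODON_TABLE[ref.upper()]}/{CODON_TABLE[alt.upper()]}"
    return changed, amino


print(describe_codon_change("cGa/cAa"))  # cache run
print(describe_codon_change("Gag/Aag"))  # remote database run
```

Both runs report the same G-to-A change, but the cache run places it at the second base of the codon and the remote database run at the first. That one-base shift in reading frame is what one would expect if the two runs use transcript sequences that differ by a single base upstream of the variant; this is an interpretation of the outputs, not something either run reports directly.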
>>
>> Assuming the result in vep-cache.out is correct, could you please advise
>> what is wrong with the parameters used when running VEP against the
>> remote database?
>>
>> Many thanks for your help!
>>
>> Kenneth Wong
>>
>>
>>
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>>