[ensembl-dev] Inconsistence output result found between running by cache and remote database
Laurent Gil
lgil at ebi.ac.uk
Mon May 14 16:01:46 BST 2018
Dear Kenneth,
Unfortunately we significantly changed the VEP code (and its structure)
between versions 87 and 88, and we added the BAM-edit option in
version 90, so it won't be easy to find a way to make it work with VEP
version 80.
The RefSeq data has been updated a few times since the freeze of the
Ensembl geneset on GRCh37 (version 75), and we regularly update the
Variation data on this assembly.
The "new" VEP (from version 88) is more robust:
http://www.ensembl.info/2016/12/13/unwrap-the-new-vep-for-christmas/ and
Ensembl/VEP version 92 has more variant annotations, so I would
recommend upgrading to release 92.
Best regards,
Laurent
On 14/05/2018 09:00, Kenneth Wong wrote:
> Dear Laurent,
>
> I tried running the command below with VEP version 88 (and cache v88). It
> fixes the mentioned issue successfully.
>
> ./vep -cache -refseq -i variants.vcf --bam interim_GRCh37.p13_knownrefseq_alignments_2017-01-13.bam
>
> As the VEP version used in our project is version 80, I tried running
> the above command in version 80, but it failed because the option
> "--bam" is not supported.
> To avoid upgrading VEP (code & cache) from version 80 to 92, which would
> involve a tremendous amount of testing work, could you please advise on any
> workaround for using BAM-edited transcript models in version 80?
>
> If there is no simple workaround approach, do you think it is feasible
> for us to port the related scripts from version 92 to version 80?
>
> Many thanks for your kind help and advice!
>
> Best Rdgs,
> Kenneth
>
> On Sat, May 12, 2018 at 8:26 AM, Kenneth Wong <kenneth at l3-bioinfo.com
> <mailto:kenneth at l3-bioinfo.com>> wrote:
>
> Dear Laurent,
>
> Thanks a lot for your prompt and detailed answer! It makes sense
> to fix such a mismatch through VEP cache.
>
> Thanks again for your great help!
>
> Best Rgds,
> Kenneth
>
>
> On Fri, May 11, 2018 at 23:29, Laurent Gil <lgil at ebi.ac.uk
> <mailto:lgil at ebi.ac.uk>> wrote:
>
> Dear Kenneth,
>
> You are right: the VEP cache result is correct.
>
> VEP returns annotations based on the Genomic Reference
> Assembly sequence.
>
> As you might know, sometimes the RefSeq sequences don't match
> the Genomic Reference Assembly sequence, which is the case for
> NM_000348.3:
>
> NM_000348.3:      121 TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA 180
>                       |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
> 2:           31805920 TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA 31805861
>
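As an illustration of the mismatch shown in the alignment above, here is a minimal Python sketch that walks two aligned strings and reports every column where they differ. The two sequences are copied from the alignment; the function name `find_differences` and the constant `REFSEQ_START` are hypothetical names introduced only for this example, and this is not how VEP itself detects mismatches.

```python
# Aligned 60-column strings copied from the alignment above; "-" marks a gap.
refseq = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA"
genomic = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA"
REFSEQ_START = 121  # transcript coordinate of the first alignment column


def find_differences(a, b, a_start):
    """Return (a_position, a_base, b_base) for each column where a and b differ.

    a_position is the coordinate in sequence `a`, or None if `a` has a gap there.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    differences = []
    position = a_start
    for base_a, base_b in zip(a, b):
        if base_a != base_b:
            differences.append((position if base_a != "-" else None, base_a, base_b))
        if base_a != "-":
            position += 1  # gaps do not consume a coordinate in `a`
    return differences


differences = find_differences(refseq, genomic, REFSEQ_START)
print(differences)
```

With the alignment as drawn, the only difference is one base present in NM_000348.3 and absent from the genomic sequence. Because the gap falls inside a run of identical bases, its exact column within that run is a choice made by the aligner.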
>
> We have started to develop a way to alter the Genomic Reference
> Assembly sequence for RefSeq sequences (when the sequences
> don't match) in order to provide more accurate
> prediction/annotation of variants on RefSeq data.
>
> At the moment this is only available through the VEP cache, so
> we strongly recommend using the VEP cache when you use VEP with
> the RefSeq dataset.
>
> I hope this makes sense.
>
> Best regards,
>
> Laurent
>
> On 11/05/2018 10:45, Kenneth Wong wrote:
>> The scenario is as follows:
>> 1. input.vcf :
>> ##fileformat=VCFv4.0
>> #CHROM POS ID REF ALT QUAL FILTER INFO
>> 2 31754395 . C T . . .
>>
>> 2. Execute the commands below with version 92
>>
>> /* Run VEP by connecting to remote Ensembl DB */
>> ./vep -i input.vcf -o vep-online.out \
>>   --fork 4 --buffer_size 100 \
>>   --refseq --assembly "GRCh37" \
>>   --database --port 3337 --db 92 \
>>   --hgvs --numbers --symbol --flag_pick --no_stats \
>>   --species homo_sapiens \
>>   --fields $VEP_FIELDS
>>
>> /* Run VEP via local cache */
>> ./vep -i input.vcf -o vep-cache.out \
>>   --fork 4 --buffer_size 1000 \
>>   --offline --cache --dir_cache $VEP_CACHE_DIR \
>>   --refseq --fasta ~/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
>>   --hgvs --numbers --symbol --flag_pick --no_stats \
>>   --species homo_sapiens \
>>   --fields $VEP_FIELDS \
>>   --use_given_ref
>>
>> , where
>> $VEP_FIELDS=Uploaded_variation, Location, Allele, Gene,
>> SYMBOL, SYMBOL_SOURCE, HGNC_ID, Feature, Consequence,
>> Protein_position, Amino_acids, Codons, HGVSc, HGVSp
>>
>> $VEP_CACHE_DIR=~/.vep/cache/
>>
>> 3. The following discrepancies are found between the output files
>> "vep-cache.out" and "vep-online.out" (as attached)
>>
>> vep-cache.out :
>> - Amino_acids = R/Q
>> - Codons = cGa/cAa
>> - HGVSp = NP_000339.2:p.Arg227Gln
>>
>> vep-online.out :
>> - Amino_acids = E/K
>> - Codons = Gag/Aag
>> - HGVSp = NP_000339.2:p.Glu227Lys
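The two codon pairs listed above (cGa/cAa and Gag/Aag) look like the same G>A substitution read in two reading frames that are offset by one base. The Python sketch below illustrates that reading under an assumption: the four-base context "CGAG" is inferred here from the two reported codon strings and is not taken from the transcript itself. The helper `codon_change` and the partial codon table are hypothetical names introduced for this example only.

```python
# Subset of the standard genetic code, limited to the codons in this example.
CODON_TO_AA = {"CGA": "R", "CAA": "Q", "GAG": "E", "AAG": "K"}


def codon_change(context, frame_offset, variant_index, alt_base):
    """Translate the codon starting at frame_offset before and after a substitution.

    context       -- short stretch of transcript sequence (assumed, see lead-in)
    frame_offset  -- index in context where the codon of interest starts
    variant_index -- index in context of the substituted base
    alt_base      -- the alternative base
    """
    mutated = context[:variant_index] + alt_base + context[variant_index + 1:]
    ref_codon = context[frame_offset:frame_offset + 3]
    alt_codon = mutated[frame_offset:frame_offset + 3]
    return f"{CODON_TO_AA[ref_codon]}/{CODON_TO_AA[alt_codon]}"


context = "CGAG"  # ASSUMPTION: inferred from the codons cGa and Gag reported above

# Frame as reported in vep-cache.out: codon CGA, variant at its second base.
cache_frame = codon_change(context, frame_offset=0, variant_index=1, alt_base="A")
# Frame shifted by one base, as reported in vep-online.out: codon GAG.
online_frame = codon_change(context, frame_offset=1, variant_index=1, alt_base="A")

print(cache_frame, online_frame)
```

Under that assumption the first frame gives R/Q and the shifted frame gives E/K, matching the Amino_acids values in the two output files.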
>>
>> Assuming the result in vep-cache.out is correct, could you
>> please advise what is wrong with the parameters when running VEP
>> via the remote database?
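For larger inputs, discrepancies like the ones listed in step 3 could be found with a small script rather than by eye. The Python sketch below assumes the default tab-delimited VEP output layout: meta lines starting with "##", then one column-header line starting with "#". The function names `read_vep_rows` and `diff_rows`, the row key (Uploaded_variation, Feature) and the sample rows are illustrative assumptions. The sample values are taken from this thread, except "var1", which is a placeholder identifier.

```python
def read_vep_rows(text):
    """Parse tab-delimited VEP-style output into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line and not line.startswith("##")]
    header = lines[0].lstrip("#").split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


def diff_rows(cache_rows, online_rows, key_fields=("Uploaded_variation", "Feature")):
    """Return {row_key: {field: (cache_value, online_value)}} for differing fields."""
    online_by_key = {tuple(row[k] for k in key_fields): row for row in online_rows}
    diffs = {}
    for row in cache_rows:
        key = tuple(row[k] for k in key_fields)
        other = online_by_key.get(key)
        if other is None:
            continue  # row is absent from the online output; not compared here
        changed = {f: (row[f], other[f]) for f in row if f in other and row[f] != other[f]}
        if changed:
            diffs[key] = changed
    return diffs


# Illustrative in-memory stand-ins for vep-cache.out and vep-online.out.
columns = ["#Uploaded_variation", "Feature", "Amino_acids", "Codons", "HGVSp"]
cache_text = "\n".join([
    "## illustrative meta line",
    "\t".join(columns),
    "\t".join(["var1", "NM_000348.3", "R/Q", "cGa/cAa", "NP_000339.2:p.Arg227Gln"]),
])
online_text = "\n".join([
    "## illustrative meta line",
    "\t".join(columns),
    "\t".join(["var1", "NM_000348.3", "E/K", "Gag/Aag", "NP_000339.2:p.Glu227Lys"]),
])

diffs = diff_rows(read_vep_rows(cache_text), read_vep_rows(online_text))
for key, changed in diffs.items():
    print(key, changed)
```

To use it on real files, read each file with `open(path).read()` and pass the text to `read_vep_rows`. Adjust `key_fields` if the run used a different set of `--fields`.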
>>
>> Many thanks for your help!
>>
>> Kenneth Wong