[ensembl-dev] Inconsistence output result found between running by cache and remote database

Laurent Gil lgil at ebi.ac.uk
Mon May 14 16:01:46 BST 2018


Dear Kenneth,

Unfortunately, we significantly changed the VEP code (and its structure) 
between versions 87 and 88, and we added the BAM-edit option in 
version 90, so it won't be easy to find a way to make it work with VEP 
version 80.

The RefSeq data has been updated a few times since the freeze of the 
Ensembl geneset on GRCh37 (version 75), and we regularly update the 
Variation data on this assembly.

The "new" VEP (from version 88) is more robust: 
http://www.ensembl.info/2016/12/13/unwrap-the-new-vep-for-christmas/ and 
Ensembl/VEP version 92 has more variant annotations, so I would 
recommend upgrading to release 92.


Best regards,

Laurent

On 14/05/2018 09:00, Kenneth Wong wrote:
> Dear Laurent,
>
> I tried running the command below with VEP version 88 (and cache v88). It 
> fixes the issue mentioned above.
>
> ./vep -cache -refseq -i variants.vcf --bam interim_GRCh37.p13_knownrefseq_alignments_2017-01-13.bam
>
> As the VEP version used in our project is version 80, I tried running 
> the above command in version 80, but it failed because the "--bam" 
> option is not supported.
> To avoid upgrading VEP (code & cache) from version 80 to 92, which would 
> involve a tremendous amount of testing work, could you please advise on 
> any workaround for using BAM-edited transcript models in version 80?
>
> If there is no simple workaround approach, do you think it is feasible 
> for us to port the related scripts from version 92 to version 80?
>
> Many thanks for your kind help and advice!
>
> Best regards,
> Kenneth
>
> On Sat, May 12, 2018 at 8:26 AM, Kenneth Wong <kenneth at l3-bioinfo.com 
> <mailto:kenneth at l3-bioinfo.com>> wrote:
>
>     Dear Laurent,
>
>     Thanks a lot for your prompt and detailed answer! It makes sense
>     to fix such a mismatch through VEP cache.
>
>     Thanks again for your great help!
>
>     Best Rgds,
>     Kenneth
>
>
>     On Fri, 11 May 2018 at 23:29, Laurent Gil <lgil at ebi.ac.uk
>     <mailto:lgil at ebi.ac.uk>> wrote:
>
>         Dear Kenneth,
>
>         You are right: the VEP cache result is correct.
>
>         VEP returns annotations based on the Genomic Reference
>         Assembly sequence.
>
>         As you might know, sometimes the RefSeq sequences don't match
>         the Genomic Reference Assembly sequence, which is the case for
>         NM_000348.3:
>
>         NM_000348.3:121  TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA  NM_000348.3:180
>                          |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
>         2:31805920       TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA  2:31805861
>
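As a rough illustration of the mismatch shown in the alignment above, the following Python sketch walks the two aligned strings column by column and reports where they differ. The two sequences and the start coordinate 121 are copied from the alignment; the function name `find_differences` and the variable names are hypothetical and not part of VEP.

```python
# Aligned strings copied from the alignment in the message ("-" marks a gap).
refseq = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA"
genome = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA"
REFSEQ_START = 121  # transcript coordinate of the first aligned RefSeq base


def find_differences(a, b, start):
    """Return (position_in_a, base_a, base_b) for every differing column.

    Positions are counted along sequence `a`, skipping its own gap columns.
    This helper is illustrative only; it is not a VEP function.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    pos = start
    for x, y in zip(a, b):
        if x != y:
            diffs.append((pos, x, y))
        if x != "-":
            pos += 1  # advance only when `a` contributes a real base
    return diffs


differences = find_differences(refseq, genome, REFSEQ_START)
print(differences)
```

With the alignment as drawn, the only differing column is a single base present in the RefSeq transcript and absent from the genomic sequence. Because the gap sits in a run of identical bases, its exact column is a choice made by the aligner.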
>
>         We have started developing a way to edit the Genomic
>         Reference Assembly sequence to match the RefSeq sequence
>         (when the two sequences don't match) in order to provide a
>         more accurate prediction/annotation of variants on RefSeq data.
>
>         At the moment this is only available through the VEP cache, so
>         we strongly recommend using the VEP cache when you run VEP
>         with the RefSeq dataset.
>
>         I hope this makes sense.
>
>         Best regards,
>
>         Laurent
>
>         On 11/05/2018 10:45, Kenneth Wong wrote:
>>         The scenario is described below:
>>         1. input.vcf:
>>            ##fileformat=VCFv4.0
>>            #CHROM  POS       ID  REF  ALT  QUAL  FILTER  INFO
>>            2       31754395  .   C    T    .     .       .
>>
>>         2. Execute the commands below with version 92
>>
>>          # Run VEP by connecting to the remote Ensembl DB
>>           ./vep -i input.vcf -o vep-online.out \
>>           --fork 4 --buffer_size 100 \
>>           --refseq --assembly "GRCh37" \
>>           --database --port 3337 --db 92 \
>>           --hgvs --numbers --symbol --flag_pick --no_stats \
>>           --species homo_sapiens \
>>           --fields $VEP_FIELDS
>>
>>          # Run VEP via the local cache
>>           ./vep -i input.vcf -o vep-cache.out \
>>           --fork 4 --buffer_size 1000 \
>>           --offline --cache --dir_cache $VEP_CACHE_DIR \
>>           --refseq --fasta ~/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
>>           --hgvs --numbers --symbol --flag_pick --no_stats \
>>           --species homo_sapiens \
>>           --fields $VEP_FIELDS \
>>           --use_given_ref
>>
>>         , where
>>         $VEP_FIELDS=Uploaded_variation, Location, Allele, Gene,
>>         SYMBOL, SYMBOL_SOURCE, HGNC_ID, Feature, Consequence,
>>         Protein_position, Amino_acids, Codons, HGVSc, HGVSp
>>
>>         $VEP_CACHE_DIR=~/.vep/cache/
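The field list above is written with spaces after the commas. A minimal Python sketch, assuming `--fields` expects a single comma-separated value with no spaces, shows one way to assemble the cache command as an argument list so that the shell cannot split the list. The flags are copied from the command above; the function name `build_cache_command` and the example paths passed to it are hypothetical.

```python
import os
import shlex

# Field names copied from the $VEP_FIELDS definition in the message.
FIELDS = [
    "Uploaded_variation", "Location", "Allele", "Gene",
    "SYMBOL", "SYMBOL_SOURCE", "HGNC_ID", "Feature", "Consequence",
    "Protein_position", "Amino_acids", "Codons", "HGVSc", "HGVSp",
]


def build_cache_command(input_vcf, output_file, cache_dir, fasta):
    """Assemble the cache-based VEP call as an argument list (illustrative helper)."""
    return [
        "./vep", "-i", input_vcf, "-o", output_file,
        "--fork", "4", "--buffer_size", "1000",
        "--offline", "--cache", "--dir_cache", cache_dir,
        "--refseq", "--fasta", fasta,
        "--hgvs", "--numbers", "--symbol", "--flag_pick", "--no_stats",
        "--species", "homo_sapiens",
        # Assumption: one comma-separated argument, no spaces between names.
        "--fields", ",".join(FIELDS),
        "--use_given_ref",
    ]


command = build_cache_command(
    "input.vcf",
    "vep-cache.out",
    os.path.expanduser("~/.vep/cache/"),
    os.path.expanduser("~/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa"),
)
print(shlex.join(command))  # shell-quoted form, for inspection only
```

The list could be handed to `subprocess.run(command, check=True)` from the VEP directory; the sketch only prints it, so nothing is executed.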
>>
>>         3. The following discrepancies are found between the output
>>         files "vep-cache.out" and "vep-online.out" (as attached)
>>
>>         vep-cache.out :
>>         - Amino_acids = R/Q
>>         - Codons = cGa/cAa
>>         - HGVSp = NP_000339.2:p.Arg227Gln
>>
>>         vep-online.out :
>>         - Amino_acids = E/K
>>         - Codons = Gag/Aag
>>         - HGVSp = NP_000339.2:p.Glu227Lys
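As a quick sanity check on the two results listed above, the sketch below translates each reported codon pair with the standard genetic code and reports which codon position carries the changed base (shown in upper case in the Codons field). It confirms that each output is internally consistent, and that the changed base falls at codon position 2 in one output and position 1 in the other, which would be consistent with a one-base shift in the reading frame. The helper names are hypothetical and not part of VEP.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def amino_acid_change(codons):
    """Translate a 'ref/alt' codon pair such as 'cGa/cAa' into 'R/Q'."""
    ref, alt = codons.upper().split("/")
    return f"{CODON_TABLE[ref]}/{CODON_TABLE[alt]}"


def changed_positions(codons):
    """Return the 1-based codon positions written in upper case in the ref codon."""
    ref = codons.split("/")[0]
    return [i + 1 for i, base in enumerate(ref) if base.isupper()]


cache_change = amino_acid_change("cGa/cAa")    # Codons value from vep-cache.out
online_change = amino_acid_change("Gag/Aag")   # Codons value from vep-online.out
print(cache_change, changed_positions("cGa/cAa"))
print(online_change, changed_positions("Gag/Aag"))
```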
>>
>>         Assuming the result in vep-cache.out is correct, could you
>>         please advise what is wrong with the parameters when running
>>         VEP via the remote database?
>>
>>         Many thanks for your help!
>>
>>         Kenneth Wong
>>
>>
>>
>>
>>         _______________________________________________
>>         Dev mailing list    Dev at ensembl.org <mailto:Dev at ensembl.org>
>>         Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>         <http://lists.ensembl.org/mailman/listinfo/dev>
>>         Ensembl Blog: http://www.ensembl.info/
>
>
