[ensembl-dev] Inconsistence output result found between running by cache and remote database

Laurent Gil lgil at ebi.ac.uk
Fri May 11 16:36:14 BST 2018


Dear Kenneth,

This RefSeq issue is documented in more detail on the Ensembl website:
http://grch37.ensembl.org/info/docs/tools/vep/script/vep_other.html#refseq_bam


Best regards,

Laurent

On 11/05/2018 16:29, Laurent Gil wrote:
>
> Dear Kenneth,
>
> You are right: the VEP cache result is correct.
>
> VEP returns annotations based on the Genomic Reference Assembly sequence.
>
> As you might know, sometimes the RefSeq sequences don't match the 
> Genomic Reference Assembly sequence, which is the case for NM_000348.3:
>
> NM_000348.3      121 TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA 180
>                      |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
> 2           31805920 TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA 31805861
>
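
The mismatch in the alignment quoted above can be located programmatically. The following is a minimal Python sketch, not part of VEP: the function name find_mismatches and its arguments are hypothetical, and the two strings are copied from the alignment rows. Because the gap falls inside a run of C bases, its exact column is an alignment choice; the sketch reports the column as drawn.

```python
def find_mismatches(seq_a, seq_b, start_a=1):
    """Return (position_in_a, base_a, base_b) for every differing column.

    seq_a and seq_b are two rows of a pairwise alignment (same length,
    '-' marks a gap). Positions are counted along seq_a, starting at start_a.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("alignment rows must have the same length")
    mismatches = []
    pos = start_a - 1
    for a, b in zip(seq_a, seq_b):
        if a != "-":          # only real bases advance the coordinate of seq_a
            pos += 1
        if a != b:
            mismatches.append((pos, a, b))
    return mismatches


# Rows copied from the alignment in this thread (transcript positions 121-180).
refseq = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCCCTCCGGCTACGGGAAGCACA"
genome = "TGGTCGCCCTTGGGGCACTGGCCTTGTACGTCGCGAAGCC-TCCGGCTACGGGAAGCACA"

print(find_mismatches(refseq, genome, start_a=121))
# [(161, 'C', '-')]
```

The single reported column shows that NM_000348.3 carries one base that is absent from the GRCh37 reference at this location.
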
> We have started developing a way to adjust the Genomic Reference Assembly 
> sequence to match the RefSeq sequence (where the two differ), in order 
> to provide more accurate prediction/annotation of variants on RefSeq 
> data.
>
> At the moment this is only available through the VEP cache, so we 
> strongly recommend using the VEP cache when running VEP with the RefSeq 
> dataset.
>
> I hope this makes sense.
>
> Best regards,
>
> Laurent
> On 11/05/2018 10:45, Kenneth Wong wrote:
>> The scenario is as follows:
>> 1. input.vcf :
>>    ##fileformat=VCFv4.0
>>    #CHROM  POS     ID      REF     ALT     QUAL FILTER  INFO
>>    2       31754395        .       C       T       .  .       .
>>
>> 2. Run the commands below with version 92:
>>
>>   # Run VEP by connecting to the remote Ensembl DB
>>   ./vep -i  input.vcf -o vep-online.out\
>>   --fork 4 --buffer_size 100\
>>   --refseq --assembly "GRCh37"\
>>   --database --port 3337 --db 92\
>>   --hgvs --numbers --symbol --flag_pick --no_stats\
>>   --species homo_sapiens\
>>   --fields $VEP_FIELDS
>>
>>   # Run VEP via the local cache
>>   ./vep -i input.vcf -o vep-cache.out\
>>   --fork 4 --buffer_size 1000\
>>   --offline --cache --dir_cache $VEP_CACHE_DIR\
>>   --refseq --fasta ~/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa\
>>   --hgvs --numbers --symbol --flag_pick --no_stats\
>>   --species homo_sapiens\
>>   --fields $VEP_FIELDS\
>>   --use_given_ref
>>
>> where
>> $VEP_FIELDS=Uploaded_variation, Location, Allele, Gene, SYMBOL, 
>> SYMBOL_SOURCE, HGNC_ID, Feature, Consequence, Protein_position, 
>> Amino_acids, Codons, HGVSc, HGVSp
>>
>> $VEP_CACHE_DIR=~/.vep/cache/
>>
>> 3. The following discrepancies are found between the output files 
>> "vep-cache.out" and "vep-online.out" (attached):
>>
>> vep-cache.out :
>> - Amino_acids = R/Q
>> - Codons = cGa/cAa
>> - HGVSp = NP_000339.2:p.Arg227Gln
>>
>> vep-online.out :
>> - Amino_acids = E/K
>> - Codons = Gag/Aag
>> - HGVSp = NP_000339.2:p.Glu227Lys
>>
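
The two reported codon changes are what one would expect if the same variant base were read in two reading frames that differ by one position, which is consistent with the one-base difference between NM_000348.3 and the reference shown in the alignment earlier in this thread. The Python sketch below illustrates this under stated assumptions: the four-base context "CGAG" is inferred only from the two reported codons (cGa and Gag overlap on the variant base), the helper names are hypothetical, and the codon table holds just the four codons needed here.

```python
# Only the four standard-genetic-code entries needed for this example.
CODON_TABLE = {"CGA": "R", "CAA": "Q", "GAG": "E", "AAG": "K"}


def translate(codon):
    """Translate one codon using the partial table above."""
    return CODON_TABLE[codon.upper()]


def codon_change(context, frame_offset, variant_index, alt):
    """Return (ref_codon, alt_codon) for the 3-base window at frame_offset."""
    mutated = context[:variant_index] + alt + context[variant_index + 1:]
    window = slice(frame_offset, frame_offset + 3)
    return context[window], mutated[window]


context = "CGAG"    # assumption: local transcript context inferred from the codons
variant_index = 1   # the G that changes to A on the transcript strand

for label, offset in (("vep-cache.out", 0), ("vep-online.out", 1)):
    ref, alt = codon_change(context, offset, variant_index, "A")
    print(label, f"{ref}/{alt}", f"{translate(ref)}/{translate(alt)}")
# vep-cache.out CGA/CAA R/Q
# vep-online.out GAG/AAG E/K
```

Shifting the codon window by one base moves the variant from the second to the first codon position and reproduces both sets of values.
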
>> Assuming the result in vep-cache.out is correct, could you please 
>> advise what is wrong with the parameters when running VEP via the 
>> remote database?
>>
>> Many thanks for your help!
>>
>> Kenneth Wong
>>
>>
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/