[ensembl-dev] Question about vep refseq cache version annotation with dbNSFP plugin.

Will McLaren wm2 at ebi.ac.uk
Thu Apr 30 13:23:57 BST 2015


Hi Namchul,

Please take all of this with a pinch of salt; mappings between Ensembl and
RefSeq transcripts vary in quality, and there is no guarantee that if a
corresponding ID is found it will have the same transcript sequence
(typically they will differ in non-coding regions but coding differences
are also not uncommon).

Any sequence differences will therefore make values calculated against
Ensembl transcripts (such as those in dbNSFP) potentially invalid and
misleading. Also bear in mind that the current dbNSFP release may use
transcripts from an older version of Ensembl than is current for the VEP.

On 30 April 2015 at 08:46, namchul ghim <chulghim at gmail.com> wrote:

> Thanks..
>
> 1)
>  I got the GRCh38 mapping list (Ensembl transcript ID and RefSeq
> transcript ID) from your homo_sapiens_core_79_38 database.
>
> But I can't find the homo_sapiens_core_79_37 database.
> Where do I get the GRCh37 version of the mapping list?
>

You can connect to the 37 database on port 3337:

mysql -hensembldb.ensembl.org -uanonymous -P3337 -Dhomo_sapiens_core_79_37
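
As a rough sketch of how the mapping could then be pulled out in one go,
something like the following should work. It assumes the usual core schema
layout (transcript, object_xref, xref and external_db tables) and that the
RefSeq cross-references are stored under the external_db names RefSeq_mRNA
and RefSeq_ncRNA; the variable and function names are only illustrative.

```shell
# Assumption: RefSeq xrefs are attached to transcripts via object_xref and
# carry the external_db names 'RefSeq_mRNA' / 'RefSeq_ncRNA'.
REFSEQ_MAP_SQL="
SELECT t.stable_id, x.dbprimary_acc
FROM transcript t
JOIN object_xref ox ON ox.ensembl_id = t.transcript_id
                   AND ox.ensembl_object_type = 'Transcript'
JOIN xref x         ON x.xref_id = ox.xref_id
JOIN external_db e  ON e.external_db_id = x.external_db_id
WHERE e.db_name IN ('RefSeq_mRNA', 'RefSeq_ncRNA')
ORDER BY t.stable_id;"

# Hypothetical helper: prints tab-separated "Ensembl ID <TAB> RefSeq ID" rows.
fetch_refseq_map() {
  mysql -hensembldb.ensembl.org -uanonymous -P3337 -Dhomo_sapiens_core_79_37 \
    --batch --skip-column-names -e "$REFSEQ_MAP_SQL"
}
```

Running fetch_refseq_map > enst_to_refseq.tsv (the file name is just an
example) would write the mapping to a file. An Ensembl transcript can have
more than one RefSeq cross-reference, so expect repeated IDs in the first
column.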

You could also try BioMart to retrieve the mapping, e.g.:

http://grch37.ensembl.org/biomart/martview/b7985fa1a36be3b616c0b915031b9a1a


>
>
> 2)  I will insert the RefSeq ID into the dbNSFP data file.
>      What is the field name?
>

The field VEP uses in the dbNSFP file is named Ensembl_transcriptid. You
could replace this with the RefSeq ID and use the plugin unmodified, but it
would be better to add it as a new field and modify the plugin (just grep
dbNSFP.pm for Ensembl_transcriptid to see where you need to change it).
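
For the "add it as a new field" route, a minimal sketch of appending the
column could look like the one below. It is not part of the plugin, and it
rests on several assumptions: the dbNSFP file has been decompressed to plain
tab-separated text with a single header line, multiple IDs in
Ensembl_transcriptid are separated by ";", and the mapping is a two-column
file (Ensembl ID, RefSeq ID). The function name, the new column name
RefSeq_transcriptid and the use of "." for unmapped IDs are all made up for
illustration.

```shell
# Hypothetical helper. $1: mapping TSV (Ensembl ID <TAB> RefSeq ID)
#                      $2: uncompressed dbNSFP-style TSV with a header line
add_refseq_column() {
  awk -F'\t' -v OFS='\t' '
    # First file: load the mapping, keeping the first RefSeq ID per Ensembl ID.
    NR == FNR { if (!($1 in map)) map[$1] = $2; next }

    # Header of the data file: locate the Ensembl column, append the new name.
    FNR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "Ensembl_transcriptid") col = i
      if (!col) { print "Ensembl_transcriptid column not found" > "/dev/stderr"; exit 1 }
      print $0, "RefSeq_transcriptid"
      next
    }

    # Data rows: translate each ";"-separated ID in order, "." when unmapped.
    {
      n = split($col, ids, ";"); out = ""
      for (i = 1; i <= n; i++)
        out = out (i > 1 ? ";" : "") ((ids[i] in map) ? map[ids[i]] : ".")
      print $0, out
    }
  ' "$1" "$2"
}
```

Usage would be along the lines of
add_refseq_column enst_to_refseq.tsv dbNSFP.tsv > dbNSFP_refseq.tsv (file
names are examples). The new values stay in the same order as the Ensembl
IDs they were derived from. The output then needs to be bgzipped and
tabix-indexed again before the plugin can read it.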

Regards

Will


>
>
>
> On Wed, Apr 29, 2015 at 7:27 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
>> Hi Namchul,
>>
>> The current version of the dbNSFP plugin does not work with the RefSeq
>> cache; this is because data is looked up on a key constructed from the
>> position, variant allele and Ensembl transcript ID, so will not work when
>> the input key contains the RefSeq transcript ID and not the Ensembl one.
>>
>> If dbNSFP add the RefSeq transcript ID as a field in the data files, then
>> the plugin could use this as its key; until then the plugin will only work
>> with the Ensembl cache.
>>
>> Regards
>>
>> Will McLaren
>> Ensembl Variation
>>
>> On 29 April 2015 at 02:38, namchul ghim <chulghim at gmail.com> wrote:
>>
>>> My environment:
>>>  release v79
>>>  latest VEP
>>>  dbNSFP version - 2.9
>>>  reference - GRCh37
>>>
>>> The dbNSFP plugin works very well with the original VEP cache,
>>> but it does not work with the RefSeq cache.
>>>
>>> Why?
>>> My command is the following.
>>>
>>> perl /src/ensembl-tools/scripts/variant_effect_predictor/
>>> variant_effect_predictor.pl -i example_GRCh37.vcf --cache --offline
>>> --everything --force_overwrite --dir /cache/GRCh37 -o out.vcf --vcf
>>> --refseq --species homo_sapiens --xref_refseq --regulatory
>>> --flag_pick_allele  --buffer_size 5000 --fork 10  --hgvs --force --fasta
>>> /cache/GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa  --plugin
>>> dbNSFP,/cache/GRCh37/dbNSFP/dbNSFP.gz,ref,alt,aaref,aaalt,rs_dbSNP141,hg38_chr,hg38_pos,genename,Uniprot_acc,Uniprot_id,Uniprot_aapos,Interpro_domain,cds_strand,refcodon,SLR_test_statistic,codonpos,fold-degenerate,Ancestral_allele,Ensembl_geneid,Ensembl_transcriptid,aapos,aapos_SIFT,aapos_FATHMM,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,LRT_score,LRT_converted_rankscore,LRT_pred,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,MutationAssessor_score,MutationAssessor_rankscore,MutationAssessor_pred,FATHMM_score,FATHMM_rankscore,FATHMM_pred,MetaSVM_score,MetaSVM_rankscore,MetaSVM_pred,MetaLR_score,MetaLR_rankscore,MetaLR_pred,Reliability_index,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,GERP++_NR,GERP++_RS,GERP++_RS_rankscore,phyloP46way_primate,phyloP46way_primate_rankscore,phyloP46way_placental,phyloP46way_placental_rankscore,phyloP100way_vertebrate,phyloP100way_vertebrate_rankscore,phastCons46way_primate,phastCons46way_primate_rankscore,phastCons46way_placental,phastCons46way_placental_rankscore,phastCons100way_vertebrate,phastCons100way_vertebrate_rankscore,SiPhy_29way_pi,SiPhy_29way_logOdds,SiPhy_29way_logOdds_rankscore,LRT_Omega,UniSNP_ids,1000Gp1_AC,1000Gp1_AF,1000Gp1_AFR_AC,1000Gp1_AFR_AF,1000Gp1_EUR_AC,1000Gp1_EUR_AF,1000Gp1_AMR_AC,1000Gp1_AMR_AF,1000Gp1_ASN_AC,1000Gp1_ASN_AF,ESP6500_AA_AF,ESP6500_EA_AF,ARIC5606_AA_AC,ARIC5606_AA_AF,ARIC5606_EA_AC,ARIC5606_EA_AF,ExAC_AC,ExAC_AF,ExAC_Adj_AC,ExAC_Adj_AF,ExAC_AFR_AC,ExAC_AFR_AF,ExAC_AMR_AC,ExAC_AMR_AF,ExAC_EAS_AC,ExAC_EAS_AF,ExAC_FIN_AC,ExAC_FIN_AF,ExAC_NFE_AC,ExAC_NFE_AF,ExAC_SAS_AC,ExAC_SAS_AF,clinvar_rs,clinvar_clnsig,clinvar_trait,COSMIC_ID,COSMIC_CNT
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>