[ensembl-dev] CADD_RAW is SNV

Sarah Hunt seh at ebi.ac.uk
Thu Apr 9 09:18:03 BST 2020


Hi Margaret,

Thanks for the examples. I note the input variant is given at its 
location on GRCh37:

rs145295123 	4:161938 	T

while the assembly is set to GRCh38 in the command line. VEP cache files 
are available for GRCh37 and can be downloaded the same way as you 
picked up the GRCh38 data.
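
A minimal sketch of the check and the rerun, assuming the same paths as in the original command and that the GRCh37 cache has already been installed in the default cache directory. The function names and the ./project_data/GRCh37/ directory are hypothetical. The CADD files would also need to be the GRCh37 versions, since CADD scores are distributed per assembly. The first function prints any ##reference or ##contig header lines from the VCF, which often (but not always) name the assembly; the second reruns VEP with --assembly GRCh37.

```shell
# Print header lines that may name the reference assembly.
# Many VCFs carry ##reference= or ##contig=<...,assembly=...>; some carry neither,
# in which case this prints nothing.
vcf_assembly_hints() {
    grep -E '^##(reference|contig)=' "$1" | head -n 5
}

# Rerun VEP against GRCh37.
# Assumptions: the GRCh37 cache is installed, and the GRCh37 CADD files sit in
# the hypothetical ./project_data/GRCh37/ directory. Add further options as needed.
run_vep_grch37() {
    ./vep -i ./project_data/top200k.vcf --tab --assembly GRCh37 \
        --cache --offline \
        --dir_plugins /root/.vep/Plugins \
        --plugin CADD,./project_data/GRCh37/whole_genome_SNVs.tsv.gz,./project_data/GRCh37/InDels.tsv.gz \
        --everything \
        -o annotations_grch37.txt
}
```

Running vcf_assembly_hints ./project_data/top200k.vcf before choosing --assembly is a cheap first check. If no header names the assembly, comparing the position of a known rsID, as with rs145295123 above, serves the same purpose.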

As GRCh38 is our default assembly, we only update GRCh37 resources 
annually and the next new data will be available in our coming release 
at the end of the month.

Best wishes,

Sarah

On 08/04/2020 22:18, Linan, Margaret wrote:
> Hi -
>
> Thanks, also that was the top portion of the annotated VCF file. I have attached a different section of it (see attached).
> A scientist at the Icahn School of Medicine at Mount Sinai made the following comments about it:
>
> 1)  Here is an example of a line and the header as opened in excel. The wrong value is in BH but there might be others.
>       When contacting VEP please check the number/proportion of missense variants that completely lack MAF and
>       other annotation values.
>
>
> 2) Checking the missense variants, there are still issues with columns AO-BP that are mostly empty (most missense variants should be
>       present in gnomAD and have proper predictions by various methods), and also many missing CADD values. There are some values
>       that are wrong, such as "gnomAD_AMR" in column BH.
>
>
>
> Best,
> Margaret
>
> -----Original Message-----
> From: Dev <dev-bounces at ensembl.org> On Behalf Of Thomas Danhorn
> Sent: Wednesday, April 8, 2020 12:44 PM
> To: Ensembl developers list <dev at ensembl.org>
> Subject: Re: [ensembl-dev] CADD_RAW is SNV
>
> I noticed that all of the variants in your spreadsheet have "intergenic"
> in the Consequence column, and that most (all?) of the columns with missing data pertain to genes or proteins, which are not applicable for intergenic variants, so I would expect them to be empty.
>
> Hope this helps,
>
> Thomas
>
>
> On Wed, 8 Apr 2020, Linan, Margaret wrote:
>
>> Hi Souhila,
>>
>>
>> Thanks, that worked. But data is still missing for the other columns. What can I do to fix this? Please see my attached annotated file.
>>
>> Here is my command:
>> ./vep -i ./project_data/top200k.vcf --tab --assembly GRCh38 --cache
>> --offline --dir_plugins /root/.vep/Plugins --plugin
>> CADD,./project_data/whole_genome_SNVs.tsv.gz,./project_data/InDels.tsv.gz
>> -o annotations.vcf --everything --variant_class --sift b
>> --polyphen b --ccds --uniprot --hgvs --symbol --numbers --domains
>> --regulatory --canonical --protein --biotype --uniprot --tsl --appris
>> --gene_phenotype --af --af_1kg --af_esp --af_gnomad --max_af --pubmed
>> --variant_class -mane
>>
>> Best,
>> Margaret
>>
>>
>> From: Souhila Amanzougarene <souhila.amanzougarene at cnrs.fr>
>> Sent: Wednesday, April 8, 2020 12:46 AM
>> To: Ensembl developers list <dev at ensembl.org>; Linan, Margaret
>> <margaret.linan at mssm.edu>
>> Subject: Re: [ensembl-dev] CADD_RAW is SNV
>>
>>
>> Hi Margaret,
>>
>> You get SNV in the CADD_RAW column because you are using the file whole_genome_SNVs_inclAnno.tsv.gz instead of whole_genome_SNVs.tsv.gz, which contains the CADD scores.
>>
>> The CADD plugin only reports scores and does not use any of the additional annotations in a CADD file. It is therefore sufficient to use the CADD files without the additional annotations.
>>
>> Hope this helps
>>
>> Best regards
>>
>> Souhila
>> On 07/04/2020 at 22:55, Linan, Margaret wrote:
>> Hi -
>>
>> I am trying to use vep in tab delimited mode. But no matter what I do, I keep seeing SNV in the CADD_RAW column.
>>
>> Here is my command:
>> ./vep -i ./project_data/top200k.vcf --tab --assembly GRCh38 --cache
>> --offline --dir_plugins /root/.vep/Plugins --plugin
>> CADD,./project_data/whole_genome_SNVs_inclAnno.tsv.gz,./project_data/InDels_inclAnno.tsv.gz
>> -o annotations.vcf --everything --variant_class --sift b --polyphen b
>> --ccds --uniprot --hgvs --symbol --numbers --domains --regulatory
>> --canonical --protein --biotype --uniprot --tsl --appris
>> --gene_phenotype --af --af_1kg --af_esp --af_gnomad --max_af --pubmed
>> --variant_class -mane
>>
>> Best,
>> Margaret
>>
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/