[ensembl-dev] CADD_RAW is SNV

Linan, Margaret margaret.linan at mssm.edu
Wed Apr 8 22:18:12 BST 2020


Hi - 

Thanks. What I sent before was only the top portion of the annotated VCF file; I have attached a different section of it (see attached).
A scientist at the Icahn School of Medicine at Mount Sinai made the following comments about it:

1)  Here is an example of a line and the header as opened in Excel. The wrong value is in column BH, but there might be others. 
     When contacting VEP, please check the number/proportion of missense variants that completely lack MAF and 
     other annotation values.


2) Checking the missense variants, there are still issues with columns AO-BP that are mostly empty (most missense variants should be
     present in gnomAD and have proper predictions by various methods), and also many missing CADD values. There are some values 
     that are wrong, such as "gnomAD_AMR" in column BH.
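
The check asked for in (1) can be scripted rather than done by eye in Excel. Below is a minimal Python sketch, not a definitive implementation. It assumes the file is VEP --tab output, where metadata lines start with "##", the column header line starts with a single "#", and missing values are written as "-". The function names are made up for illustration, and the column names in the usage note (gnomAD_AF, CADD_RAW, CADD_PHRED) are assumptions that should be checked against the header of the actual file.

```python
from collections import Counter


def read_vep_tab(path):
    """Yield one dict per data row of a VEP --tab output file."""
    header = None
    with open(path) as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # metadata lines
            if header is None:
                # The column header line starts with a single '#'.
                header = line.lstrip("#").rstrip("\n").split("\t")
                continue
            fields = line.rstrip("\n").split("\t")
            yield dict(zip(header, fields))


def missing_among_missense(rows, columns):
    """Return (n_missense_rows, {column: n_rows_missing_that_column}).

    A value counts as missing when it is '-' or empty.
    """
    total = 0
    missing = Counter()
    for row in rows:
        if "missense_variant" not in row.get("Consequence", ""):
            continue
        total += 1
        for col in columns:
            if row.get(col, "-") in ("-", ""):
                missing[col] += 1
    return total, {col: missing[col] for col in columns}
```

For example, missing_among_missense(read_vep_tab("annotations.vcf"), ["gnomAD_AF", "CADD_RAW", "CADD_PHRED"]) returns the number of missense rows and, for each column, how many of those rows have no value. Note that VEP writes one row per variant/transcript pair, so these are row counts, not counts of unique variants.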



Best,
Margaret 

-----Original Message-----
From: Dev <dev-bounces at ensembl.org> On Behalf Of Thomas Danhorn
Sent: Wednesday, April 8, 2020 12:44 PM
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] CADD_RAW is SNV

I noticed that all of the variants in your spreadsheet have "intergenic" 
in the Consequence column, and that most (all?) of the columns with missing data pertain to genes or proteins, which are not applicable for intergenic variants, so I would expect them to be empty.
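
To see whether this holds for the whole output and not only the attached excerpt, the Consequence column can be tallied. This is a rough Python sketch under the assumption that the file is VEP --tab output ("##" metadata lines, then a header line starting with "#", then tab-separated rows with a column named Consequence). The function name consequence_counts is made up for illustration.

```python
from collections import Counter


def consequence_counts(path):
    """Count how often each consequence term appears in a VEP --tab file."""
    counts = Counter()
    col = None
    with open(path) as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # metadata lines
            fields = line.rstrip("\n").split("\t")
            if col is None:
                # First non-metadata line is the column header.
                header = [name.lstrip("#") for name in fields]
                col = header.index("Consequence")
                continue
            if len(fields) <= col:
                continue  # skip short or blank lines
            # A row can carry several comma-separated terms.
            for term in fields[col].split(","):
                counts[term] += 1
    return counts
```

If consequence_counts("annotations.vcf") shows only intergenic_variant, the empty gene and protein columns are expected. If missense_variant rows are present, those are the rows worth inspecting for missing values.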

Hope this helps,

Thomas


On Wed, 8 Apr 2020, Linan, Margaret wrote:

> Hi Souhila,
>
>
> Thanks, that worked. But data is still missing for the other columns. What can I do to fix this? Please see my attached annotated file.
>
> Here is my command:
> ./vep -i ./project_data/top200k.vcf --tab --assembly GRCh38 --cache
> --offline --dir_plugins /root/.vep/Plugins --plugin
> CADD,./project_data/whole_genome_SNVs.tsv.gz,./project_data/InDels.tsv.gz
> -o annotations.vcf --everything --variant_class --sift b
> --polyphen b --ccds --uniprot --hgvs --symbol --numbers --domains
> --regulatory --canonical --protein --biotype --uniprot --tsl --appris
> --gene_phenotype --af --af_1kg --af_esp --af_gnomad --max_af --pubmed
> --variant_class -mane
>
> Best,
> Margaret
>
>
> From: Souhila Amanzougarene <souhila.amanzougarene at cnrs.fr>
> Sent: Wednesday, April 8, 2020 12:46 AM
> To: Ensembl developers list <dev at ensembl.org>; Linan, Margaret 
> <margaret.linan at mssm.edu>
> Subject: Re: [ensembl-dev] CADD_RAW is SNV
>
>
> Hi Margaret,
>
> You get SNV in the CADD_RAW column because you are using the file whole_genome_SNVs_inclAnno.tsv.gz instead of whole_genome_SNVs.tsv.gz, which contains the CADD scores.
>
> The CADD plugin only reports scores and does not consider any additional annotations from a CADD file. It is therefore sufficient to use the CADD files without the additional annotations.
>
> Hope this helps
>
> Best regards
>
> Souhila
> On 07/04/2020 at 22:55, Linan, Margaret wrote:
> Hi -
>
> I am trying to use vep in tab delimited mode. But no matter what I do, I keep seeing SNV in the CADD_RAW column.
>
> Here is my command:
> ./vep -i ./project_data/top200k.vcf --tab --assembly GRCh38 --cache
> --offline --dir_plugins /root/.vep/Plugins --plugin
> CADD,./project_data/whole_genome_SNVs_inclAnno.tsv.gz,./project_data/InDels_inclAnno.tsv.gz
> -o annotations.vcf --everything --variant_class --sift b --polyphen b
> --ccds --uniprot --hgvs --symbol --numbers --domains --regulatory
> --canonical --protein --biotype --uniprot --tsl --appris
> --gene_phenotype --af --af_1kg --af_esp --af_gnomad --max_af --pubmed
> --variant_class -mane
>
> Best,
> Margaret
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Example.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 10122 bytes
Desc: Example.xlsx
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20200408/5ae97ea9/attachment.xlsx>

