[ensembl-dev] VEP using CADD plugin

Eva Goncalves Serra egs at sanger.ac.uk
Thu Jul 3 14:44:40 BST 2014


Thanks very much!

Eva

From: Will McLaren <wm2 at ebi.ac.uk>
Reply-To: Ensembl developers list <dev at ensembl.org>
Date: Thursday, 3 July 2014 14:38
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] VEP using CADD plugin

Hi Eva,

You should get the data file and index from the line:

All possible SNVs of GRCh37/hg19
  file (79G): http://krishna.gs.washington.edu/download/CADD/v1.0/whole_genome_SNVs.tsv.gz
  tabix index (2.7M): http://krishna.gs.washington.edu/download/CADD/v1.0/whole_genome_SNVs.tsv.gz.tbi
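
For illustration, here is a minimal Python sketch of wiring those two downloads into a VEP run. It assumes the data file and its tabix index sit in the same directory, and it reuses only the -i, --format and --plugin options that appear in the command quoted elsewhere in this thread. The helper name build_vep_command and the paths are hypothetical; the function only assembles the command and does not run VEP.

```python
from pathlib import Path

# URLs and file names copied from the download line above.
BASE = "http://krishna.gs.washington.edu/download/CADD/v1.0/"
DATA_NAME = "whole_genome_SNVs.tsv.gz"
INDEX_NAME = DATA_NAME + ".tbi"


def build_vep_command(input_vcf, cadd_dir):
    """Return the VEP argument list, or raise if the data file or index is missing.

    Hypothetical helper for illustration only.
    """
    data = Path(cadd_dir) / DATA_NAME
    index = Path(cadd_dir) / INDEX_NAME
    # The plugin needs both the bgzipped data file and its .tbi index.
    for required in (data, index):
        if not required.is_file():
            raise FileNotFoundError(
                f"missing {required.name}; fetch it from {BASE}{required.name}"
            )
    return [
        "variant_effect_predictor.pl",
        "-i", str(input_vcf),
        "--format", "vcf",
        "--plugin", f"CADD,{data}",
    ]
```

The returned list could be passed to subprocess.run once both files are in place.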

Regards

Will McLaren
Ensembl Variation


On 3 July 2014 14:19, Eva Goncalves Serra <egs at sanger.ac.uk> wrote:
Hi,

I am a bit confused as to which file from http://cadd.gs.washington.edu/download I should download to use with the CADD plugin.

Any help would be appreciated.

Thanks!

Eva

From: Will McLaren <wm2 at ebi.ac.uk>
Reply-To: Ensembl developers list <dev at ensembl.org>
Date: Wednesday, 7 May 2014 16:13
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] VEP using CADD plugin

Hello,

Correct, the plugin was intended to work with the whole_genome_SNVs.tsv file, which only contains data for SNVs.

I've modified the plugin so that it should be able to cope with indel data files such as you have; please do let me know if you have any problems as I've only sparingly tested it on made-up data!
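
As a rough illustration of the SNV/indel distinction, the Python sketch below classifies a REF/ALT pair by allele length. It is not the code in CADD.pm; the function name variant_class and the example alleles are made up, and it assumes VCF-style alleles where an indel keeps the shared leading base.

```python
def variant_class(ref, alt):
    """Classify a REF/ALT pair as 'SNV', 'insertion', 'deletion' or 'other'."""
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"
    if len(ref) < len(alt) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "other"  # e.g. multi-base substitutions or identical alleles


# Made-up alleles, for illustration only.
for ref, alt in [("A", "G"), ("T", "TAC"), ("GCA", "G")]:
    print(ref, alt, variant_class(ref, alt))
```

A lookup that only accepts the "SNV" class would skip the second and third pairs, which is the behaviour described for the SNV-only data file.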

Regards

Will McLaren
Ensembl Variation


On 7 May 2014 15:37, Genomeo Dev <genomeodev at gmail.com> wrote:
Hi,

There seems to be a discrepancy between the CADD score calculated using VEP with the CADD.pm plugin and the direct tabix output:

For example using this 1000G variant:

#CHROM POS ID REF ALT QUAL FILTER INFO
7 86214932 rs140931361 TTACTCT . PASS .

variant_effect_predictor.pl -i input.txt --format vcf --plugin CADD,/media/sf_D_DRIVE/Projects/Databases/CADD/v1.0/1000G.tsv.gz
does not return any CADD score

whereas
$ tabix -p vcf 1000G.tsv.gz 7:86214932-86214932
7 86214932 TTACTCT -0.420243 2.040

This seems to affect indels and not SNVs. I could see in the plugin that there is a rule to ignore indels. Any suggestions on how to change that safely?

Also, in the plugin, I assume there is a test to ensure the alleles are identical between the input file and the 1000G.tsv.gz file. Is this correct?
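
One way such a check could look, sketched in Python and not taken from CADD.pm: parse each tab-separated row returned for the position and accept a score only when both the reference and alternate alleles match the input variant. The column order (Chrom, Pos, Ref, Alt, RawScore, PHRED) is assumed to follow the CADD v1.0 files; the names CaddRow, parse_row and lookup are hypothetical and the rows are invented.

```python
from typing import Iterable, NamedTuple, Optional


class CaddRow(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    raw: float
    phred: float


def parse_row(line: str) -> CaddRow:
    """Split one tab-separated data row into typed fields."""
    chrom, pos, ref, alt, raw, phred = line.rstrip("\n").split("\t")
    return CaddRow(chrom, int(pos), ref, alt, float(raw), float(phred))


def lookup(lines: Iterable[str], chrom: str, pos: int,
           ref: str, alt: str) -> Optional[CaddRow]:
    """Return the row whose position and both alleles match, else None."""
    for line in lines:
        if line.startswith("#"):  # skip header lines
            continue
        row = parse_row(line)
        if (row.chrom, row.pos, row.ref, row.alt) == (chrom, pos, ref, alt):
            return row
    return None


# Invented rows, for illustration only.
rows = [
    "1\t100\tA\tC\t0.5\t7.1",
    "1\t100\tA\tG\t1.2\t12.3",
    "1\t100\tA\tAT\t-0.1\t3.0",
]
hit = lookup(rows, "1", 100, "A", "G")
```

Matching on the full (chromosome, position, REF, ALT) tuple means a row at the same position with a different alternate allele is never reported for the wrong variant.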

Thanks.

--
G.

_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/






