[ensembl-dev] How to add custom motifs and transcription factor binding sites to VEP

njohnson njohnson at ebi.ac.uk
Mon Oct 13 11:49:44 BST 2014


Hi Matt

The VEP currently relies on a funcgen API/DB call to calculate the relative binding affinity of a motif given a variation.  I suspect it's probably better for you to look at the plugin route, which is likely to be quicker to implement, more manageable and re-usable (by others!), unless of course you would have some other use for a funcgen DB. The requirements would be slightly more than those of the standard gene/transcript cache:

1 A cache of motif feature scores and access to their sequences (tabix/core API).

2 A separate cache of binding matrices. Individually lazy-loaded PWM files would likely do here.

3 A method to re-implement the ensembl-funcgen MotifFeature::get_relative_binding_affinity method using the above.
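
To make points 2 and 3 a little more concrete, here is a rough sketch in Python (an actual VEP plugin would be written in Perl). It is not the ensembl-funcgen implementation: it assumes a log-odds score against a uniform background, rescaled between the matrix's minimum and maximum possible scores. The matrix, its ID, the pseudocount and all function names are made up for illustration, and an in-memory dict stands in for per-matrix PWM files.

```python
import math
from functools import lru_cache

# Hypothetical stand-in for individual PWM files: base counts per motif position.
PWM_COUNTS = {
    "demo_matrix": {
        "A": [8, 1, 0, 9],
        "C": [1, 1, 10, 0],
        "G": [1, 7, 0, 1],
        "T": [0, 1, 0, 0],
    }
}


@lru_cache(maxsize=None)
def load_log_odds(matrix_id, pseudocount=0.1, background=0.25):
    """Lazily build (and memoise) log-odds columns for one matrix.

    Assumption: uniform background and a small pseudocount; a real
    plugin would read the counts from a per-matrix file on first use.
    """
    counts = PWM_COUNTS[matrix_id]
    columns = []
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        columns.append({
            b: math.log((counts[b][i] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return tuple(columns)


def raw_score(matrix_id, seq):
    """Sum the log-odds of each base in seq (A/C/G/T only)."""
    columns = load_log_odds(matrix_id)
    if len(seq) != len(columns):
        raise ValueError("sequence length must match matrix length")
    return sum(col[base] for col, base in zip(columns, seq.upper()))


def relative_affinity(matrix_id, seq):
    """Rescale the raw score to 0..1 between the worst and best possible sequence."""
    columns = load_log_odds(matrix_id)
    lowest = sum(min(col.values()) for col in columns)
    highest = sum(max(col.values()) for col in columns)
    return (raw_score(matrix_id, seq) - lowest) / (highest - lowest)


def affinity_change(matrix_id, ref_seq, offset, alt_base):
    """Difference in relative affinity (alt minus ref) for a single-base substitution."""
    alt_seq = ref_seq[:offset] + alt_base + ref_seq[offset + 1:]
    return relative_affinity(matrix_id, alt_seq) - relative_affinity(matrix_id, ref_seq)
```

With this toy matrix, affinity_change("demo_matrix", "AGCA", 2, "T") comes out negative, since the substitution replaces the strongly preferred C at the third position. In a plugin, ref_seq would come from the motif feature's sequence (point 1) and the offset from the variant's position within that feature.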

This would make a really neat VEP plugin.

Nathan Johnson

Ensembl Regulation
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

http://www.ensembl.info/
http://twitter.com/#!/ensembl
https://www.facebook.com/Ensembl.org

On 13 Oct 2014, at 11:26, Will McLaren <wm2 at ebi.ac.uk> wrote:

> Hi Matt,
> 
> I'm afraid your options are somewhat limited here. The custom caches, as you suspected, are just for gene and transcript data.
> 
> You can use the --custom flag to look for overlaps with your features in a gff or similar (http://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html), but this won't do any sequence-based analysis.
> 
> I could see two possible routes:
> 
> 1) Write a plugin to do the analysis. Without knowing the nature of your data, it's hard for me to guess at what you might have to do in the plugin, but I anticipate you'd probably read features from a tabix-indexed data file and then proceed with the analysis from there. The dbNSFP and CADD plugins do similar things (https://github.com/ensembl-variation/VEP_plugins)
> 
> 2) Add your data to a custom Ensembl Funcgen database, and either use this directly or build a cache from it. I have no sense of how hard or easy this might be; our Funcgen team might be able to give some insight here.
> 
> HTH
> 
> Will McLaren
> Ensembl Variation
> 
> On 8 October 2014 19:56, Matt Wood <matt.wood at codifiedgenomics.com> wrote:
> We have some custom motifs and transcription factor binding sites that we'd like to get incorporated into our VEP output. I know that we can create custom caches, but that doesn't sound right for motifs. I know plugins are also a possibility.
> 
> Do you have any suggestions about how I'd incorporate this into VEP or how I should get started?
> 
> Thank you.
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
> 
> 




