[ensembl-dev] How to add custom motifs and transcription factor binding sites to VEP
njohnson
njohnson at ebi.ac.uk
Mon Oct 13 11:49:44 BST 2014
Hi Matt
The VEP currently relies on a funcgen API/DB call to calculate the relative binding affinity of a motif given a variation. I suspect it's probably better for you to look at the plugin route, which is likely to be quicker to implement, more manageable and re-usable (by others!), unless of course you would have some other use for a funcgen DB. A plugin would need slightly more than the standard gene/transcript cache:
1 A cache of motif feature scores and access to their sequence (tabix/core API).
2 A separate cache of binding matrices. Individually lazy-loaded PWM files would likely do here.
3 A method to re-implement the ensembl-funcgen MotifFeature::get_relative_binding_affinity method using the above.
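As a rough illustration of point 3, here is a minimal sketch of a relative binding affinity calculation from a position weight matrix. It is not the ensembl-funcgen implementation: the function names, the pseudocount, the uniform background and the (score - min) / (max - min) normalisation are all assumptions made for the example, and it is written in Python for brevity, whereas a real VEP plugin would be a Perl module. It also assumes the motif sequence has already been oriented to the strand of the matrix.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.1, background=0.25):
    """Turn per-position base counts into log-odds weights.

    counts is a list of dicts (one per motif position) mapping each base
    to its count. pseudocount and background are illustrative assumptions.
    """
    weights = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        weights.append({
            b: math.log((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return weights


def relative_affinity(weights, seq):
    """Score seq against the matrix, scaled to the range 0..1.

    Returns None if seq has the wrong length or contains a non-ACGT base.
    """
    seq = seq.upper()
    if len(seq) != len(weights) or any(b not in BASES for b in seq):
        return None
    score = sum(col[b] for col, b in zip(weights, seq))
    lowest = sum(min(col.values()) for col in weights)
    highest = sum(max(col.values()) for col in weights)
    if highest == lowest:
        return None  # uninformative matrix, nothing to scale against
    return (score - lowest) / (highest - lowest)


def affinity_change(weights, motif_seq, offset, alt_base):
    """Relative affinity of the alt allele minus that of the reference.

    offset is the 0-based position of a single-base substitution
    within motif_seq.
    """
    alt_seq = motif_seq[:offset] + alt_base + motif_seq[offset + 1:]
    ref = relative_affinity(weights, motif_seq)
    alt = relative_affinity(weights, alt_seq)
    if ref is None or alt is None:
        return None
    return alt - ref
```

In a plugin, the motif sequence would come from the feature cache in point 1 and the counts from the matrix files in point 2; a negative value from affinity_change would suggest the variant weakens the match to the motif.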
This would make a really neat VEP plugin.
Nathan Johnson
Ensembl Regulation
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
http://www.ensembl.info/
http://twitter.com/#!/ensembl
https://www.facebook.com/Ensembl.org
On 13 Oct 2014, at 11:26, Will McLaren <wm2 at ebi.ac.uk> wrote:
> Hi Matt,
>
> I'm afraid your options are somewhat limited here. The custom caches, as you suspected, are just for gene and transcript data.
>
> You can use the --custom flag to look for overlaps with your features in a gff or similar (http://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html), but this won't do any sequence-based analysis.
>
> I could see two possible routes:
>
> 1) Write a plugin to do the analysis. Without knowing the nature of your data, it's hard for me to guess at what you might have to do in the plugin, but I anticipate you'd probably read features from a tabix-indexed data file and then proceed with the analysis from there. The dbNSFP and CADD plugins do similar things (https://github.com/ensembl-variation/VEP_plugins)
>
> 2) Add your data to a custom Ensembl Funcgen database, and either use this directly or build a cache from it. I have no sense of how hard or easy this might be; our Funcgen team might be able to give some insight here.
>
> HTH
>
> Will McLaren
> Ensembl Variation
>
> On 8 October 2014 19:56, Matt Wood <matt.wood at codifiedgenomics.com> wrote:
> We have some custom motifs and transcription factor binding sites that we'd like to get incorporated into our VEP output. I know that we can create custom caches, but that doesn't sound right for motifs. I know plugins are also a possibility.
>
> Do you have any suggestions about how I'd incorporate this into VEP or how I should get started?
>
> Thank you.
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/