[ensembl-dev] RegulatoryFeatures in VEP v79?

Konrad Karczewski konradk at broadinstitute.org
Thu May 21 15:17:07 BST 2015

Yeah, I was surprised as well. Removing the histone-based features makes sense and could account for most of the difference, but the count went from 1916763 to 0(!). Even in a biased set, I would have expected a few by chance.

If you'd like more examples, it's the ExAC dataset (ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.3/ExAC.r0.3.sites.vep.vcf.gz) - this one was annotated with v77 so the ones with RegulatoryFeatures (albeit 2M of them) may give you a starting point.
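
For reference, a minimal sketch of how a count like the one above could be reproduced from a VEP-annotated VCF. It assumes the annotations sit in the INFO column under CSQ and that the CSQ header line declares a Feature_type sub-field (as VEP's VCF output normally does). The function name count_regulatory_variants is hypothetical and chosen only for illustration; this is not necessarily how the figures in this thread were obtained.

```python
import re


def count_regulatory_variants(lines):
    """Count VCF records with at least one RegulatoryFeature CSQ entry.

    `lines` is any iterable of VCF text lines (header included).
    Returns (total_records, records_with_regulatory_feature).
    """
    feature_type_idx = None
    total = with_regulatory = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            # VEP lists the pipe-separated sub-field names after "Format: ".
            match = re.search(r'Format: ([^">]+)', line)
            if match is None:
                raise ValueError("CSQ header has no Format description")
            names = match.group(1).strip().split("|")
            feature_type_idx = names.index("Feature_type")
            continue
        if not line or line.startswith("#"):
            continue  # other header lines and blank lines
        if feature_type_idx is None:
            raise ValueError("no CSQ header line seen before the first record")
        total += 1
        info = line.split("\t")[7]
        csq = next((kv[4:] for kv in info.split(";") if kv.startswith("CSQ=")), "")
        # CSQ holds one comma-separated entry per allele/feature pair.
        for entry in csq.split(","):
            parts = entry.split("|")
            if len(parts) > feature_type_idx and parts[feature_type_idx] == "RegulatoryFeature":
                with_regulatory += 1
                break
    return total, with_regulatory


# Possible usage on a gzipped file (path is an assumption):
#   import gzip
#   with gzip.open("ExAC.r0.3.sites.vep.vcf.gz", "rt") as handle:
#       print(count_regulatory_variants(handle))
```

Running this on the same input annotated with v77 and with v79 would give directly comparable totals. Reading the sub-field order from the header rather than hard-coding a column position matters here, because the CSQ layout can differ between VEP versions and option sets.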

Thanks for looking into it!

On Thu, May 21, 2015 at 10:02 AM, Daniel Zerbino <zerbino at ebi.ac.uk> wrote:

> Wait a minute, *none* of them do? That's something else entirely...
> Now, ExAC is a somewhat biased set in that it is pulled down from exon 
> sequencing, and a difference between the old and the new build is that 
> regions with the histone marks associated to transcription are no longer 
> annotated as "regulatory". This is what you see at your locus 1:13372, 
> on e75, where there used to be mostly "gene associated" annotations 
> across cell types.
> However, 0 hits across 10M sounds very suspicious. For example, the 
> locus you describe happens to be on the edge of a CTCF feature (the 
> actual binding site is a bit farther on the 5'), so it should 
> technically have been reported.
> We'll investigate...
> On 5/21/15 1:56 PM, Konrad Karczewski wrote:
>> Hi Daniel,
>> Ah ok great thanks! Thing is though: now none of the 10M variants in 
>> ExAC overlap RegulatoryFeatures. Is that expected? I would have 
>> expected at least a few...
>> -Konrad
>> On Thu, May 21, 2015 at 2:17 AM, Daniel Zerbino <zerbino at ebi.ac.uk> wrote:
>>     Hello Konrad,
>>     this is because on release 79 we replaced the old regulatory build
>>     with the newer version (which we had released for GRCh38 in v76).
>>     There would definitely be some moving around of features as both
>>     builds are very different in the way they are computed.
>>     Regards,
>>     Daniel
>>     On 5/21/15 5:28 AM, Konrad Karczewski wrote:
>>>     Hi Will, everyone,
>>>     Are RegulatoryFeature annotations expected to have the same
>>>     results in VEP v79 (GRCh37) as previous versions (e.g. v77)? When
>>>     annotating the ExAC VCF, the older versions included many ENSR*
>>>     annotations, but v79's do not (same command including
>>>     --everything both times). For instance, the following variant
>>>     used to overlap ENSR00000528767 but does not seem to in my most
>>>     recent version:
>>>     1       13372   .       G       C
>>>     Any idea why this might be happening? All other annotations seem
>>>     fine.
>>>     Thanks!
>>>     -Konrad