[ensembl-dev] would ENSEMBL kindly host merged and filtered vcf files for gnomad2.1

Sergey Naumenko evolgenomicslab at gmail.com
Mon May 13 03:44:56 BST 2019

Dear Ensembl developers!

Thank you for all your great work!

Gnomad 2.1 is a major update of the Gnomad database of human genetic
variation (whole exome and whole genome sequencing).

We are using Ensembl-hosted Gnomad vcf files in cloudbiolinux and bcbio:
chapmanb/cloudbiolinux <https://github.com/chapmanb/cloudbiolinux>
bcbio/bcbio-nextgen <https://github.com/bcbio/bcbio-nextgen>

Unlike the gnomad2.0.1 files, the gnomad2.1 files are split by
chromosome:
Index of

To use gnomad2.1 in the annotation step of bcbio (we annotate with
vcfanno), we decided to merge the files and remove a number of INFO
fields to reduce the file size. See the discussion here:
Using gnomad2.1: request for opinions · Issue #2736 · bcbio/bcbio-nextgen

We created recipes in cloudbiolinux to merge gnomad2.1 vcfs for grch37,
grch38, and hg19.
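
For readers unfamiliar with what merging per-chromosome vcfs involves, here is a minimal sketch in plain Python. It is not the cloudbiolinux recipe; the function name merge_per_chromosome and the toy records are hypothetical, and it assumes the per-chromosome inputs share the same header and are passed in the desired chromosome order.

```python
from typing import Iterable, Iterator

HEADER = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]


def merge_per_chromosome(chunks: Iterable[Iterable[str]]) -> Iterator[str]:
    """Concatenate per-chromosome VCF line streams into one stream.

    Header lines (starting with '#') are taken from the first chunk only;
    data lines from every chunk are passed through in input order.
    """
    for index, lines in enumerate(chunks):
        for line in lines:
            if line.startswith("#"):
                if index == 0:
                    yield line
            else:
                yield line


# Toy inputs standing in for two per-chromosome files (made-up records).
chr1 = HEADER + ["1\t100\t.\tA\tG\t.\tPASS\tAF=0.1\n"]
chr2 = HEADER + ["2\t50\t.\tC\tT\t.\tPASS\tAF=0.2\n"]

merged = list(merge_per_chromosome([chr1, chr2]))
```

On real, compressed gnomad files this is slow, which is why a dedicated tool and a one-time merge are preferable to repeating the work in every installation.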

However, the merge takes so long that running it in every local
installation is not feasible.
We decided to generate the merged files once and then provide users with
an easy-to-install recipe.

Would you kindly agree to host the merged vcfs for gnomad exomes and
genomes, for grch37 and grch38, on the Ensembl FTP server?

We would be happy to produce the files and upload them.
The technical steps for merging the vcfs are listed in the recipe:
we sort the variants, keep only PASS variants, keep a pre-defined
subset of INFO fields, etc.
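
The filtering steps can be illustrated with a short Python sketch. This is only an illustration under assumptions, not the actual recipe: the KEEP_INFO subset (AC, AN, AF), the function names, and the toy records are all hypothetical, header lines are dropped, and the sort is a naive one by chromosome name and position.

```python
from typing import Iterable, List, Optional, Set

# Hypothetical subset of INFO keys to keep; the real list is in the recipe.
KEEP_INFO = {"AC", "AN", "AF"}


def slim_record(line: str, keep: Set[str]) -> Optional[str]:
    """Return the record with a reduced INFO column, or None if not PASS."""
    fields = line.rstrip("\n").split("\t")
    if fields[6] != "PASS":  # column 7 of a VCF record is FILTER
        return None
    kept = [e for e in fields[7].split(";") if e.split("=", 1)[0] in keep]
    fields[7] = ";".join(kept) if kept else "."
    return "\t".join(fields)


def slim_records(lines: Iterable[str], keep: Set[str] = KEEP_INFO) -> List[str]:
    """Drop header and non-PASS lines, trim INFO, sort by (CHROM, POS)."""
    out = []
    for line in lines:
        if line.startswith("#"):
            continue
        record = slim_record(line, keep)
        if record is not None:
            out.append(record)
    # Naive sort: chromosome as a string, position as an integer.
    out.sort(key=lambda r: (r.split("\t")[0], int(r.split("\t")[1])))
    return out


# Made-up records: one fails the filter, two carry extra INFO keys.
records = [
    "1\t200\t.\tA\tG\t.\tPASS\tAC=2;AN=100;AF=0.02;vep=foo",
    "1\t100\t.\tC\tT\t.\tRF\tAC=1;AN=100;AF=0.01",
    "1\t150\t.\tG\tA\t.\tPASS\tAC=5;AN=100;AF=0.05;popmax=nfe",
]
slim = slim_records(records)
```

Here the non-PASS record at position 100 is dropped, and the two remaining records come out sorted by position with only the AC, AN, and AF entries left in INFO.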

We hope that many Ensembl users would benefit
from the merged and relatively slim gnomad2.1 vcf files,
and we are happy to share our work with Ensembl.

Sergey Naumenko