[ensembl-dev] would ENSEMBL kindly host merged and filtered vcf files for gnomad2.1

Sergey Naumenko evolgenomicslab at gmail.com
Wed May 15 01:15:19 BST 2019


Thanks, Sarah!
We will give it a try.
SN

On Tue, May 14, 2019 at 9:33 AM Sarah Hunt <seh at ebi.ac.uk> wrote:

>
> Hi Sergey,
>
> Thanks for your kind words about Ensembl.
>
> Ensembl is not an archive - the data files on our FTP site are used in our
> services or created from our databases for each release - so we would not
> be best placed to host these files. Have you considered contacting the
> gnomAD team about hosting your slimmed down files?
>
> Best wishes,
>
> Sarah
>
> On 13/05/2019 03:44, Sergey Naumenko wrote:
>
> Dear Ensembl developers!
>
> Thank you for all your great work!
>
> gnomAD 2.1 is a major update of the gnomAD database of variation in the
> human population (whole-exome and whole-genome sequencing).
> https://macarthurlab.org/2018/10/17/gnomad-v2-1/
>
> We are using the Ensembl-hosted gnomAD vcf files in cloudbiolinux and bcbio.
> chapmanb/cloudbiolinux <https://github.com/chapmanb/cloudbiolinux>
> bcbio/bcbio-nextgen <https://github.com/bcbio/bcbio-nextgen>
>
> Unlike the gnomad2.0.1 files, the gnomad2.1 files are split by
> chromosome:
> Index of
> /pub/data_files/homo_sapiens/GRCh37/variation_genotype/gnomad/r2.1/exomes
> <http://ftp.ensemblorg.ebi.ac.uk/pub/data_files/homo_sapiens/GRCh37/variation_genotype/gnomad/r2.1/exomes/>
>
> To use gnomad2.1 in the annotation step of bcbio (we annotate with
> vcfanno), we decided to merge the files
> and remove a number of INFO fields to reduce the file size, see the
> discussion here:
> Using gnomad2.1: request for opinions · Issue #2736 · bcbio/bcbio-nextgen
> <https://github.com/bcbio/bcbio-nextgen/issues/2736>
>
> We created recipes in cloudbiolinux to merge gnomad2.1 vcfs for grch37,
> grch38, and hg19.
>
> https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/GRCh37/gnomad.yaml
>
> However, the long running time makes it infeasible to merge the gnomad vcf
> files in every local installation.
> We decided to generate the merged files once, and then provide users with
> an easy-to-install recipe.
>
> Would you kindly agree to host merged vcfs for gnomad exome and genome for
> grch37 and grch38 on the Ensembl FTP server?
>
> We would be happy to produce the files and upload them.
> The technical steps on how we merge the vcfs are listed in the recipe:
> we sort the variants, filter only PASS variants, keep the pre-defined
> subset of INFO fields, etc.
>
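For readers who do not want to open the recipe, the three steps named above
(sort, keep only PASS, keep a subset of INFO fields) can be sketched roughly
as follows. This is an illustration in Python on plain VCF text lines, not
the commands the recipe actually runs; the function names and the INFO
subset (AC, AN, AF) are assumptions made up for this example.

```python
# Illustrative sketch only: the real recipe works on compressed
# per-chromosome files, while this operates on in-memory VCF record lines.

KEEP_INFO = ("AC", "AN", "AF")  # assumed subset, not the recipe's actual list


def slim_record(line, keep=KEEP_INFO):
    """Return the record with a trimmed INFO column, or None if not PASS."""
    fields = line.rstrip("\n").split("\t")
    if fields[6] != "PASS":  # column 7 of a VCF record is FILTER
        return None
    kept = []
    for item in fields[7].split(";"):  # column 8 is INFO
        key = item.split("=", 1)[0]
        if key in keep:
            kept.append(item)
    fields[7] = ";".join(kept) if kept else "."
    return "\t".join(fields)


def merge_and_slim(per_chrom_lines, chrom_order):
    """Merge per-chromosome record lists into one sorted, slimmed list."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    records = []
    for lines in per_chrom_lines:
        for line in lines:
            if line.startswith("#"):
                continue  # a real merge would also reconcile the headers
            slim = slim_record(line)
            if slim is not None:
                records.append(slim)
    # Sort by chromosome order first, then by numeric position.
    records.sort(key=lambda r: (rank[r.split("\t")[0]], int(r.split("\t")[1])))
    return records


chr2 = ["2\t500\t.\tA\tG\t.\tPASS\tAC=3;AN=100;AF=0.03;vep=x"]
chr1 = [
    "1\t900\t.\tC\tT\t.\tRF\tAC=1;AN=100;AF=0.01",
    "1\t100\t.\tG\tA\t.\tPASS\tAC=2;AN=100;AF=0.02;vep=y",
]
for rec in merge_and_slim([chr2, chr1], chrom_order=["1", "2"]):
    print(rec)
```

The example inputs are invented records; with real gnomad2.1 files the
per-chromosome inputs would be streamed rather than held in memory.
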
> We hope that many Ensembl users would benefit
> from the merged and relatively slim gnomad2.1 vcf files,
> and we are happy to share our work with Ensembl.
>
> Thanks!
> Sergey Naumenko
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://mail.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/
>