[ensembl-dev] would ENSEMBL kindly host merged and filtered vcf files for gnomad2.1

Sarah Hunt seh at ebi.ac.uk
Tue May 14 14:32:41 BST 2019


Hi Sergey,

Thanks for your kind words about Ensembl.

Ensembl is not an archive - the data files on our FTP site are used in 
our services or created from our databases for each release - so we 
would not be best placed to host these files. Have you considered 
contacting the gnomAD team about hosting your slimmed down files?

Best wishes,

Sarah


On 13/05/2019 03:44, Sergey Naumenko wrote:
> Dear Ensembl developers!
>
> Thank you for all your great work!
>
> Gnomad 2.1 is a major update of the Gnomad database of variation in the 
> human population
> (whole exome and whole genome sequencing).
> https://macarthurlab.org/2018/10/17/gnomad-v2-1/
>
> We are using Ensembl-hosted Gnomad vcf files in cloudbiolinux and bcbio.
> chapmanb/cloudbiolinux <https://github.com/chapmanb/cloudbiolinux>
> bcbio/bcbio-nextgen <https://github.com/bcbio/bcbio-nextgen>
>
> There is a difference between the gnomad2.0.1 and gnomad2.1 files: the 
> gnomad2.1 files are split by chromosome:
> Index of 
> /pub/data_files/homo_sapiens/GRCh37/variation_genotype/gnomad/r2.1/exomes 
> <http://ftp.ensemblorg.ebi.ac.uk/pub/data_files/homo_sapiens/GRCh37/variation_genotype/gnomad/r2.1/exomes/>
>
> To use gnomad2.1 in the annotation step of bcbio (we annotate with 
> vcfanno), we decided to merge the files
> and remove a number of INFO fields to reduce the file size, see the 
> discussion here:
> Using gnomad2.1: request for opinions · Issue #2736 · 
> bcbio/bcbio-nextgen <https://github.com/bcbio/bcbio-nextgen/issues/2736>
>
> We created recipes in cloudbiolinux to merge gnomad2.1 vcfs for 
> grch37, grch38, and hg19.
> https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/GRCh37/gnomad.yaml
>
> However, the long running time makes merging gnomad vcf files in every 
> local installation infeasible.
> We decided to generate the merged files once, and then provide users with 
> an easy-to-install recipe.
>
> Would you kindly agree to host merged vcfs for gnomad exome and genome 
> for grch37 and grch38 on the Ensembl FTP server?
>
> We would be happy to produce the files and upload them.
> The technical steps on how we merge the vcfs are listed in the recipe:
> we sort the variants, filter only PASS variants, keep the pre-defined 
> subset of INFO fields, etc.
>
> We hope that many Ensembl users would benefit
> from the merged and relatively slim gnomad2.1 vcf files,
> and we are happy to share our work with Ensembl.
>
> Thanks!
> Sergey Naumenko
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://mail.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/
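
The slimming steps named in the quoted message (sort the variants, keep only PASS variants, keep a pre-defined subset of INFO fields) can be illustrated with a small sketch. This is not the cloudbiolinux recipe, which is at the gnomad.yaml link above. The function names and the KEEP_INFO set below are hypothetical, and the in-memory sort is only suitable for small inputs.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical subset of INFO keys to keep; the real list is defined in
# the recipe and is not reproduced here.
KEEP_INFO: Set[str] = {"AC", "AN", "AF"}


def slim_record(line: str, keep_info: Set[str]) -> Optional[str]:
    """Return the record with a reduced INFO column, or None if not PASS."""
    fields = line.rstrip("\n").split("\t")
    if fields[6] != "PASS":  # column 7 of a VCF record is FILTER
        return None
    # Column 8 is INFO: semicolon-separated KEY=VALUE pairs or bare flags.
    kept = [kv for kv in fields[7].split(";")
            if kv.split("=", 1)[0] in keep_info]
    fields[7] = ";".join(kept) if kept else "."
    return "\t".join(fields)


def slim_vcf(lines: Iterable[str],
             keep_info: Set[str] = KEEP_INFO) -> List[str]:
    """Filter to PASS, trim INFO, and sort by contig and position."""
    header: List[str] = []
    records: List[Tuple[int, int, str]] = []
    contig_rank: Dict[str, int] = {}  # contigs ordered by first appearance
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("#"):
            # Header lines pass through unchanged in this sketch; unused
            # ##INFO definitions are not pruned.
            header.append(line.rstrip("\n"))
            continue
        slim = slim_record(line, keep_info)
        if slim is None:
            continue
        chrom, pos = slim.split("\t", 2)[:2]
        rank = contig_rank.setdefault(chrom, len(contig_rank))
        records.append((rank, int(pos), slim))
    records.sort(key=lambda r: (r[0], r[1]))
    return header + [r[2] for r in records]
```

Calling slim_vcf on the lines of one per-chromosome file returns the header followed by the PASS records in position order. Concatenating per-chromosome results in chromosome order would give a merged file.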