[ensembl-dev] Downloading Human genome via ftp, do I need to patch?

Andy Yates ayates at ebi.ac.uk
Wed Sep 19 15:28:58 BST 2012


Hi Allan,

Sorry for the long wait for a reply. Any file with PATCH in its name contains a region of the GRC patch assembly, padded out to full chromosome length so that it is suitable for BLAST searches and for retrieving co-ordinates. These files are not patch files in the Unix sense, i.e. they are not input for utilities such as patch, and you do not need to apply them to the chromosome files. If you want to work with just the plain assembly, i.e. with no patch regions, you can download the individual chromosome files or:

ftp://ftp.ensembl.org/pub/release-68/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.68.dna.primary_assembly.fa.gz

This file contains all the non-patch regions concatenated into one file.
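
As a rough illustration only, the sketch below fetches that file and lists the sequence names it contains, which is one way to check that no patch regions are present. It uses only the Python standard library. The function names (primary_assembly_url, download, fasta_headers) are made up for this example and are not part of any Ensembl tool, and it assumes the release-68 URL above is still being served.

```python
import gzip
import urllib.request

# Directory and file name taken from the URL quoted above.
BASE_URL = "ftp://ftp.ensembl.org/pub/release-68/fasta/homo_sapiens/dna/"
PRIMARY_FILE = "Homo_sapiens.GRCh37.68.dna.primary_assembly.fa.gz"


def primary_assembly_url():
    """Return the full URL of the primary assembly file (hypothetical helper)."""
    return BASE_URL + PRIMARY_FILE


def download(url, dest):
    """Fetch url to the local path dest and return dest (hypothetical helper)."""
    urllib.request.urlretrieve(url, dest)
    return dest


def fasta_headers(path):
    """Yield the sequence name (first word after '>') of each record
    in a gzip-compressed FASTA file (hypothetical helper)."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                yield fields[0] if fields else ""
```

To use it, something like download(primary_assembly_url(), PRIMARY_FILE) followed by a loop over fasta_headers(PRIMARY_FILE) should print one name per sequence. The file is large, so the download will take a while.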

Regards,

Andy

Andrew Yates                   Ensembl Core Software Project Leader
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensembl.org/

On 14 Sep 2012, at 08:59, Allan Kamau wrote:

> I would like to download and use the DNA fasta sequences of the entire
> human genome.
> Looking at the ftp directory I see several files ending with
> "PATCH.fa.gz", do I need to patch the contents of any of the fasta
> files of the 24 chromosomes (1-22, X and Y) for example
> "ftp://ftp.ensembl.org/pub/release-68/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.68.dna.chromosome.1.fa.gz"?
> 
> Allan.
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




