[ensembl-dev] Ensembl 59 Haplotype Sequences

Bio X2Y bio.x2y at gmail.com
Tue Oct 5 19:16:27 BST 2010


I understand that Ensembl 59 is based on GRCh37.p1.

For haplotypes, GRCh37 seems to include sequences only for the alternative
part of the target chromosome, rather than a full alternative version of the
chromosome. Ensembl seems to take the other approach, releasing a full-sized
alternative chromosome sequence (at least for file downloads).

Intuitively, I imagine this is done by identifying the region in the
original chromosome that corresponds to the alternative region, and
replacing that region with the alternative sequence.
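The splicing procedure I have in mind can be sketched in Python as follows. This is only my assumption about how the full-length haplotype is built, not Ensembl's actual pipeline; `splice_alt_locus` is a hypothetical helper, and the coordinates are treated as 1-based and inclusive, as I read the GRCh37 .pos files.

```python
def splice_alt_locus(chrom: str, alt: str,
                     chrom_start: int, chrom_end: int,
                     alt_start: int, alt_end: int) -> str:
    """Replace chrom[chrom_start..chrom_end] with alt[alt_start..alt_end].

    All coordinates are assumed to be 1-based and inclusive.
    """
    # Convert 1-based inclusive coordinates to Python's 0-based half-open slices.
    left = chrom[:chrom_start - 1]     # everything before the "hole"
    right = chrom[chrom_end:]          # everything after the "hole"
    insert = alt[alt_start - 1:alt_end]
    return left + insert + right


# Toy example: replace positions 3..5 ("CDE") with the 2-base alt sequence "xy".
print(splice_alt_locus("ABCDEFG", "xy", 3, 5, 1, 2))  # ABxyFG
```

Under these assumptions the result always has length `len(chrom) - (chrom_end - chrom_start + 1) + (alt_end - alt_start + 1)`, which is the size prediction I use below.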

When I try to verify this, however, I seem to be seeing an off-by-one error
for some haplotypes, and not for others.

GRCh37 releases a small file (alt_locus_scaf2primary.pos) with each
haplotype, and this seems to provide the coordinates (start from 1,
inclusive) that determine how to insert the alternative sequence into the
parent chromosome. For example, the following details are provided for the
APD haplotype for the chromosome 6 MHC:

Chrom_start = 28696604
Chrom_end = 33335493
Alt_loci_start = 1
Alt_loci_end = 4622290

The sequence length of APD is 4622290 in GRCh37, and the full-length APD
haplotype in Ensembl is 171098467.
Since the original chromosome 6 is length 171115067, I would intuitively
think that the following procedure can be used to predict the Ensembl size
for the full haplotype:

Full_chromosome_length - (chrom_end - chrom_start + 1) + (alt_loci_end - alt_loci_start + 1)

where chrom_start and chrom_end describe the region (the "hole") in the
original chromosome that is replaced with the alternative sequence.
Indeed, this works for APD: 171115067 - 4638890 + 4622290 = 171098467, which
matches the Ensembl figure.
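For anyone who wants to rerun the check on the other haplotypes, here is the same arithmetic as a small Python sketch. Only the APD figures come from this post; `predicted_length` is a hypothetical helper name, and the values for other haplotypes would have to be taken from their own .pos files and the Ensembl downloads.

```python
def predicted_length(full_length: int, chrom_start: int, chrom_end: int,
                     alt_start: int, alt_end: int) -> int:
    """Predicted full-haplotype length, assuming 1-based inclusive coordinates."""
    hole = chrom_end - chrom_start + 1      # bases removed from the primary chromosome
    inserted = alt_end - alt_start + 1      # bases taken from the alt locus
    return full_length - hole + inserted


# APD (chromosome 6 MHC) figures quoted above.
apd = predicted_length(171115067, 28696604, 33335493, 1, 4622290)
ensembl_apd = 171098467
print(apd)                # 171098467
print(ensembl_apd - apd)  # 0, i.e. no off-by-one for APD
```

A nonzero difference for a given haplotype would show the size and direction of the discrepancy described below.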

However, it doesn't work for the haplotypes where the "hole" in the original
sequence is smaller than the region being inserted. In these cases, the
prediction is off by one.

Also, it doesn't work for the chromosome 4 haplotype, even though the "hole"
in the original sequence is larger than the region being inserted.

Could someone perhaps explain why I'm seeing this? I assume I'm missing
something simple.

Thanks for your time.