[ensembl-dev] Ensembl 59 Haplotype Sequences
bio.x2y at gmail.com
Tue Oct 12 17:20:05 BST 2010
Apologies for sending this again; I'm hoping someone might be able to
shed some light on this, as I haven't been able to find an explanation
elsewhere.
On Tue, Oct 5, 2010 at 7:16 PM, Bio X2Y <bio.x2y at gmail.com> wrote:
> I understand that Ensembl 59 is based on GRCh37.p1.
> For haplotypes, GRCh37 seems to include sequences only for the alternative
> part of the target chromosome, rather than a full alternative version of the
> chromosome. Ensembl seems to take the other approach, releasing a full-sized
> alternative chromosome sequence (at least for file downloads).
> Intuitively, I imagine this is done by identifying the region in the
> original chromosome that corresponds to the alternative region and
> replacing that region with the alternative sequence.
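The replacement described above can be sketched in Python. This is only an illustration of the idea, not Ensembl's actual pipeline: it assumes 1-based inclusive coordinates for both the chromosome and the alternate locus, and the function name `splice_alt_locus` and the toy sequences are hypothetical.

```python
def splice_alt_locus(chrom: str, alt: str,
                     chrom_start: int, chrom_end: int,
                     alt_start: int, alt_end: int) -> str:
    """Replace chrom[chrom_start..chrom_end] with alt[alt_start..alt_end].

    Assumption: all coordinates are 1-based and inclusive.
    """
    # Convert 1-based inclusive coordinates to Python's 0-based half-open slices.
    left = chrom[:chrom_start - 1]       # bases before the "hole"
    right = chrom[chrom_end:]            # bases after the "hole"
    insert = alt[alt_start - 1:alt_end]  # alternate region to splice in
    return left + insert + right


if __name__ == "__main__":
    # Toy example: replace bases 3..5 ("CGT") with the 4-base alternate "TTTT".
    print(splice_alt_locus("AACGTAA", "TTTT", 3, 5, 1, 4))  # prints AATTTTAA
```

In this toy case the result is 8 bases: 7 - 3 + 4, which matches the length formula used below.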
> When I try to verify this, however, I see an off-by-one discrepancy for
> some haplotypes but not for others.
> GRCh37 releases a small file (alt_locus_scaf2primary.pos) with each
> haplotype, and this seems to provide the coordinates (1-based, inclusive)
> that determine how to insert the alternative sequence into the parent
> chromosome. For example, the following details are provided for the APD
> haplotype of the chromosome 6 MHC:
> Chrom_start = 28696604
> Chrom_end = 33335493
> Alt_loci_start = 1
> Alt_loci_end = 4622290
> The APD sequence is 4622290 bp long in GRCh37, and the full-length APD
> haplotype chromosome in Ensembl is 171098467 bp.
> Since the original chromosome 6 is length 171115067, I would intuitively
> think that the following procedure can be used to predict the Ensembl size
> for the full haplotype:
> Full_chromosome_length - (chrom_end - chrom_start + 1)
>     + (alt_loci_end - alt_loci_start + 1)
> where chrom_start and chrom_end describe the region ("hole") in the
> original chromosome that is replaced with the alternative sequence.
> Indeed, this works for APD: we get the Ensembl figure of 171098467.
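The length prediction can be written as a small Python function and checked against the APD figures quoted above. It assumes, as the formula does, that the .pos coordinates are 1-based and inclusive, so a region [s, e] spans e - s + 1 bases; the function name is hypothetical.

```python
def predicted_alt_chrom_length(full_length: int,
                               chrom_start: int, chrom_end: int,
                               alt_start: int, alt_end: int) -> int:
    """Length of the chromosome after the "hole" is replaced by the alt locus.

    Assumption: 1-based inclusive coordinates on both sequences.
    """
    hole = chrom_end - chrom_start + 1  # bases removed from the primary chromosome
    insert = alt_end - alt_start + 1    # bases contributed by the alternate locus
    return full_length - hole + insert


# APD (chromosome 6 MHC) values from alt_locus_scaf2primary.pos, as quoted above.
apd = predicted_alt_chrom_length(171115067, 28696604, 33335493, 1, 4622290)
print(apd)  # 171098467, matching the Ensembl 59 APD length
```

Step by step: the hole is 33335493 - 28696604 + 1 = 4638890 bases, so 171115067 - 4638890 + 4622290 = 171098467.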
> However, it doesn't work for the haplotypes where the "hole" in the
> original sequence is smaller than the region being inserted. In these
> cases, the prediction is off by one.
> Also, it doesn't work for the chromosome 4 haplotype, even though the
> "hole" in the original sequence is larger than the region being inserted.
> Could someone perhaps explain why I'm seeing this? I assume I'm missing
> something simple.
> Thanks for your time.