[ensembl-dev] Ensembl 59 Haplotype Sequences

Bio X2Y bio.x2y at gmail.com
Tue Oct 12 17:20:05 BST 2010


Hi,

Apologies for sending this again; I'm just hoping someone might be able to
shed some light on this. I haven't been able to find an explanation
elsewhere.

Thanks.


On Tue, Oct 5, 2010 at 7:16 PM, Bio X2Y <bio.x2y at gmail.com> wrote:

> Hi,
>
> I understand that Ensembl 59 is based on GRCh37.p1.
>
> For haplotypes, GRCh37 seems to provide sequence only for the alternative
> region of the target chromosome, rather than a full alternative version of
> the chromosome. Ensembl seems to take the other approach, releasing a
> full-sized alternative chromosome sequence (at least for file downloads).
>
> Intuitively, I imagine this is done by identifying the region in the
> original chromosome that corresponds to the alternative region, and
> replacing it with the alternative sequence.
>
> When I try to verify this, however, I seem to see an off-by-one error for
> some haplotypes but not for others.
>
> GRCh37 releases a small file (alt_locus_scaf2primary.pos) with each
> haplotype, and this seems to provide the coordinates (1-based, inclusive)
> that determine how to insert the alternative sequence into the parent
> chromosome. For example, the following details are provided for the APD
> haplotype of the chromosome 6 MHC:
>
> Chrom_start = 28696604
> Chrom_end = 33335493
> Alt_loci_start = 1
> Alt_loci_end = 4622290
>
> The APD alt-locus sequence is 4622290 bp long in GRCh37, and the
> full-length APD haplotype in Ensembl is 171098467 bp.
> Since the original chromosome 6 has length 171115067, I would intuitively
> expect the following formula to predict the Ensembl length of the full
> haplotype:
>
>   full_chromosome_length - (chrom_end - chrom_start + 1)
>                          + (alt_loci_end - alt_loci_start + 1)
>
> where chrom_start and chrom_end describe the region ("hole") in the
> original chromosome that is replaced with the alternative region.
>
> Indeed, this works for APD: 171115067 - 4638890 + 4622290 gives the Ensembl
> figure of 171098467.
>
> However, it doesn't work for the haplotypes where the "hole" in the
> original sequence is smaller than the region being inserted. In these
> cases, the prediction is off by one.
>
> Also, it doesn't work for the chromosome 4 haplotype, even though the
> "hole" in the original sequence is larger than the region being inserted.
>
> Could someone perhaps explain why I'm seeing this? I assume I'm missing
> something simple.
>
> Thanks for your time.
>
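A minimal sketch of the splicing procedure and length formula described in the quoted message, assuming (as the post does) that the alt_locus_scaf2primary.pos coordinates are 1-based and inclusive on both the chromosome and the alt locus. This is not Ensembl's actual pipeline, and the function names splice_alt_locus and predicted_length are hypothetical. It reproduces the APD figure; it does not explain the off-by-one cases.

```python
def splice_alt_locus(chrom: str, alt: str, chrom_start: int, chrom_end: int,
                     alt_start: int, alt_end: int) -> str:
    """Replace chrom[chrom_start..chrom_end] with alt[alt_start..alt_end].

    Hypothetical helper. Assumes all coordinates are 1-based and inclusive,
    so they are converted to Python's 0-based, half-open slices.
    """
    left = chrom[:chrom_start - 1]          # bases before the "hole"
    right = chrom[chrom_end:]               # bases after the "hole"
    insert = alt[alt_start - 1:alt_end]     # alternative region to splice in
    return left + insert + right


def predicted_length(chrom_len: int, chrom_start: int, chrom_end: int,
                     alt_start: int, alt_end: int) -> int:
    """Length of the spliced chromosome under the same 1-based, inclusive assumption."""
    hole = chrom_end - chrom_start + 1
    inserted = alt_end - alt_start + 1
    return chrom_len - hole + inserted


if __name__ == "__main__":
    # APD values quoted in the message above.
    print(predicted_length(171115067, 28696604, 33335493, 1, 4622290))
    # 171098467, matching the Ensembl length quoted above

    # Toy example: replace bases 5..8 ("CCCC") with "TTT".
    print(splice_alt_locus("AAAACCCCGGGG", "TTT", 5, 8, 1, 3))
    # AAAATTTGGGG
```

Running predicted_length over each haplotype's .pos entry and comparing against the Ensembl chromosome lengths would be one way to list exactly which haplotypes disagree, and by how much.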

