[ensembl-dev] length of HG79_PATCH

Hervé Pagès hpages at fhcrc.org
Tue Nov 15 05:26:45 GMT 2011


Hi,

Related to this thread from Oct 2010:

   http://lists.ensembl.org/pipermail/dev/2010-October/000304.html

In Ensembl release 64 (and maybe in previous releases, I didn't
check), the 'seq_region' table for Homo sapiens

   ftp://ftp.ensembl.org/pub/release-64/mysql/homo_sapiens_core_64_37/seq_region.txt.gz

contains entries for some of the "patch" sequences that belong to
GRCh37.p5. Those "patch" sequences are named with the _PATCH suffix
(e.g. HG7_PATCH, HG79_PATCH, HG506_HG1000_1_PATCH, etc.),
and each "patch" has 2 entries in the table. For example, here are
the 2 rows for HG79_PATCH:

seq_region_id        name  coord_system_id     length
     100965615  HG79_PATCH                2  141223844
    1000157396  HG79_PATCH                3     330164

According to the 'coord_system' table, coord_system_id 2 and 3
correspond to "chromosome" and "supercontig", respectively.
So one possible interpretation could be that the 2 lengths
reported for HG79_PATCH are (1) the length of the chromosome
that this patch belongs to, and (2) the length of the patched
region.
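
For anyone who wants to pull the same rows out of the dump, here is a
minimal sketch. It assumes that seq_region.txt.gz is a headerless,
tab-delimited file whose columns are in the order shown above
(seq_region_id, name, coord_system_id, length). The function names are
made up for illustration.

```python
import gzip
import io


def patch_rows(lines, name):
    """Yield (seq_region_id, name, coord_system_id, length) for matching rows.

    Assumes headerless, tab-delimited input with the columns in that order.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[1] == name:
            yield int(fields[0]), fields[1], int(fields[2]), int(fields[3])


def patch_rows_from_dump(path, name):
    """Same as patch_rows(), but reads a gzip-compressed dump file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from patch_rows(handle, name)


# The two rows quoted above, used here in place of the real file.
sample = (
    "100965615\tHG79_PATCH\t2\t141223844\n"
    "1000157396\tHG79_PATCH\t3\t330164\n"
)
rows = list(patch_rows(io.StringIO(sample), "HG79_PATCH"))
for seq_region_id, name, coord_system_id, length in rows:
    print(seq_region_id, name, coord_system_id, length)
```

With a local copy of the dump, patch_rows_from_dump("seq_region.txt.gz",
"HG79_PATCH") should yield the same kind of tuples, provided the
assumption about the file layout holds.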

However, according to this page

   http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-79

HG79_PATCH belongs to chr9 and is mapped to region
136049443 - 136317858. So it's mapped to a region of length
268416, but the 2nd length reported for the patch is 330164.
That seems to confirm what the OP reported in the above thread,
i.e. that the patch replaces a region in the reference genome
with a larger region.
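
The region length above is easy to check, assuming the coordinates on
the GRC page are 1-based and inclusive at both ends (that is an
assumption on my side, but it is the only reading that gives 268416):

```python
def inclusive_length(start, end):
    """Length of a 1-based interval that includes both endpoints."""
    return end - start + 1


# Coordinates of the chr9 region that HG79_PATCH is mapped to (GRC page).
region_start, region_end = 136049443, 136317858
replaced_length = inclusive_length(region_start, region_end)

# Second length reported for HG79_PATCH in the 'seq_region' table.
patch_length = 330164

print(replaced_length)                 # 268416
print(patch_length > replaced_length)  # True: the patch is the larger one
```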

Also, according to this page

   http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/index.shtml

the length of chr9 is 141213431. But the first length reported
for HG79_PATCH in the 'seq_region' table is 141223844, which is
the length of chr9 + 10413.

Where are those 10413 extra nucleotides coming from? Could it be
that this first length reported for HG79_PATCH is the length of
chr9 *after* its alteration by the patch? But that doesn't seem
to be the case either since this alteration would add 61748 bases
to chr9 (330164 - 268416).
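
Putting the numbers side by side (a small sketch that uses only the
values quoted in this message; the variable names are mine):

```python
chr9_length = 141213431            # chr9 length from the GRC data page
chromosome_row_length = 141223844  # HG79_PATCH row with coord_system_id 2
patch_length = 330164              # HG79_PATCH row with coord_system_id 3
replaced_length = 268416           # 136049443 - 136317858, inclusive

# Extra bases actually seen in the chromosome-level row.
observed_excess = chromosome_row_length - chr9_length

# Extra bases expected if the patch simply replaced the mapped region.
expected_excess = patch_length - replaced_length

print(observed_excess)                    # 10413
print(expected_excess)                    # 61748
print(expected_excess - observed_excess)  # 51335 bases unaccounted for
```

So the "chr9 after patching" interpretation leaves 51335 bases
unaccounted for.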

So my question is: what are those 2 lengths reported for HG79_PATCH,
and for the "patch" sequences in general?

Thanks in advance for any clarification.

Cheers,
H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



