[ensembl-dev] what happened to the rest of human chrY?

Nicole Washington nlwashington at lbl.gov
Tue Feb 26 19:18:12 GMT 2013


Hi Devs,

Of course, I should have googled the question before posting it to the list.

For anyone wanting to know the answer, it is discussed here:
http://uswest.ensembl.org/info/docs/genebuild/assembly.html

It says:

The pseudoautosomal regions (PAR), where chromosome X and Y share homologous sequence, are defined for human. In Ensembl, the full-length Y chromosome is displayed on our browser. However, within our core human database, the Y chromosome is divided into four regions:
chromosome:GRCh37:Y:1 - 10000 is unique to Y but is a string of 10000 Ns
chromosome:GRCh37:Y:10001 - 2649520 is shared with X (PAR1)
chromosome:GRCh37:Y:2649521 - 59034049 is unique to Y
chromosome:GRCh37:Y:59034050 - 59373566 is shared with X (PAR2)
We store sequence for only the two unique regions of Y in our database. The DNA for PAR1 and PAR2 are loaded only for chromosome X. The full-length chromosome Y can be generated on-the-fly by our API, where we stitch in the shared sequence from X.
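
As a sanity check on those coordinates, here is a minimal sketch in plain Perl. It makes no Ensembl API calls; the @regions table and the region_length helper are illustrative names of my own, not part of the API. It only adds up the four regions quoted above and separates what is stored for Y from what is stitched in from X.

```perl
use strict;
use warnings;

# The four GRCh37 chrY regions quoted above.
# Fields: start, end, whether sequence is stored for Y, description.
my @regions = (
    [ 1,        10000,    1, 'unique to Y (string of Ns)' ],
    [ 10001,    2649520,  0, 'PAR1, sequence loaded only for X' ],
    [ 2649521,  59034049, 1, 'unique to Y' ],
    [ 59034050, 59373566, 0, 'PAR2, sequence loaded only for X' ],
);

# Length of a 1-based, inclusive coordinate range.
sub region_length {
    my ($start, $end) = @_;
    return $end - $start + 1;
}

my ($stored_on_y, $taken_from_x) = (0, 0);
for my $region (@regions) {
    my ($start, $end, $is_stored, $label) = @$region;
    my $length = region_length($start, $end);

    # Tally the region under whichever chromosome holds its sequence.
    if ($is_stored) {
        $stored_on_y += $length;
    }
    else {
        $taken_from_x += $length;
    }
    printf "Y:%d-%d\t%d bp\t%s\n", $start, $end, $length, $label;
}

my $full_length = $stored_on_y + $taken_from_x;
print "stored for Y:     $stored_on_y bp\n";
print "stitched from X:  $taken_from_x bp\n";
print "full-length chrY: $full_length bp\n";
```

The four lengths add up to 59373566, the full chrY length, and the two regions flagged as stored for Y are exactly the two slices (1..10000 and 2649521..59034049) that fetch_all('chromosome') returned in my original question below.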

Nicole

On Feb 26, 2013, at 11:07 AM, Nicole Washington wrote:

> Hi,
> 
> I'm just starting out learning the Ensembl API. I've written a simple script to fetch the human chromosomes and print out a count of the genes for each one. I am not a human-genome expert, so the answer to my question may very well be due to nuances in the human genome.
> 
> Quite simply, I do:
> 
> my $slice_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Slice' );
> my $slices = $slice_adaptor->fetch_all('chromosome');
> 
> and then iterate over the list of slices, printing out some details about each.
> 
> Chr Y behaves differently from the rest of the human chromosomes. All other chromosomes (1..22, X, MT) go from 1..max_length. But it seems that two chrY toplevel elements are retrieved, neither of which extends the complete length from 1..59,373,566. They only span 1..10,000 and 2,649,521..59,034,049, completely skipping over the range 10,001..2,649,520. It also seems that chrY is the only chromosome that is split into two toplevel features.
> 
> Does anyone know why this is?  Am I missing something very simple here?
> 
> Here is the output from my script for ChrY:
> 
> Fetching genes for Y...Properties for this chromosome
>  ID: chromosome:GRCh37:Y:1:10000:1
>  Length: 59373566
>  Build: 27507
>  Coords: 1..10000
> found 0 genes.
> Fetching genes for Y...Properties for this chromosome
>  ID: chromosome:GRCh37:Y:2649521:59034049:1
>  Length: 59373566
>  Build: 27507
>  Coords: 2649521..59034049
> found 501 genes.
> 
> 
> Thanks for any help you can provide.
> 
> Nicole
> 
> 
> Nicole Washington
> Research Scientist
> Lawrence Berkeley National Laboratory
> Genomics Division
> 1 Cyclotron Rd. MS64-121
> Berkeley, CA 94720
> 510-486-6836
> NLWashington at lbl.gov
