[ensembl-dev] GRCh37 chrY sequence ?

Bronwen Aken ba1 at sanger.ac.uk
Mon Jun 11 13:35:27 BST 2012


Hi Hiram,


For the human Y chromosome in Ensembl, we have included DNA sequence (A/G/C/T) for only the unique region. The rest of the chromosome is masked with Ns, which explains why the length of the chromosome matches the GRC chromosome while the base composition of the sequence differs. The reason we only include the unique region of Y is to make sure that we represent each region of the genome only once.

grep \> Homo_sapiens.GRCh37.67.dna.chromosome.Y.fa 
>Y dna:chromosome chromosome:GRCh37:Y:2649521:59034049:1
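
To illustrate why masking leaves the length unchanged but alters the composition, here is a minimal Python sketch on a made-up toy sequence. It is not how Ensembl produces its files; the helper name mask_ranges and the toy coordinates are purely illustrative.

```python
from collections import Counter

def mask_ranges(seq, ranges):
    """Replace each 1-based, inclusive (start, end) range of seq with Ns."""
    out = list(seq)
    for start, end in ranges:
        out[start - 1:end] = "N" * (end - start + 1)
    return "".join(out)

# Toy 10-base "chromosome"; positions 1-3 and 9-10 play the role of
# the regions that are not stored as sequence.
toy = "ACGTACGTAC"
masked = mask_ranges(toy, [(1, 3), (9, 10)])

before = Counter(toy)     # per-base counts before masking
after = Counter(masked)   # per-base counts after masking

print(len(toy), dict(before))
print(len(masked), dict(after))
```

The length is the same before and after, while the A/C/G/T counts drop and the N count rises, which is the same pattern seen in the faCount table in the quoted message below.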


To add a bit more detail, the Y chromosome has four regions, two of which are unique to Y and two of which are shared with X (the pseudoautosomal regions):
chromosome:GRCh37:Y:1-10000 is unique to Y, but it is a string of 10000 Ns
chromosome:GRCh37:Y:10001-2649520 is shared with X
chromosome:GRCh37:Y:2649521-59034049 is unique to Y
chromosome:GRCh37:Y:59034050-59373566 is shared with X
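
As a quick sanity check on these coordinates, the four regions should tile the whole chromosome. The following is a minimal Python sketch (not part of the Ensembl API; the names REGIONS and region_length are illustrative) that adds up the region lengths and compares them with the figures from the faCount table in the quoted message below. The final step assumes that the extra Ns in the Ensembl file come only from the two shared regions.

```python
# Region boundaries as listed above (1-based, inclusive).
CHR_Y_LENGTH = 59373566   # "len" column of the faCount table
N_GRC = 33720000          # N count for CM000686.1 / hg19.chrY
N_ENSEMBL = 36389037      # N count for Y.v67

REGIONS = [
    (1, 10000, "unique to Y, all Ns"),
    (10001, 2649520, "shared with X"),
    (2649521, 59034049, "unique to Y"),
    (59034050, 59373566, "shared with X"),
]

def region_length(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1

# The regions should be contiguous and cover the whole chromosome.
contiguous = all(
    REGIONS[i][1] + 1 == REGIONS[i + 1][0] for i in range(len(REGIONS) - 1)
)
total = sum(region_length(s, e) for s, e, _ in REGIONS)

# Bases in the two regions shared with X.
shared = sum(
    region_length(s, e) for s, e, label in REGIONS if label == "shared with X"
)

# Extra Ns in the Ensembl file relative to the GRC sequence.
extra_ns = N_ENSEMBL - N_GRC

# Assumption: the remainder is positions in the shared regions that
# were already N in the GRC sequence, so masking them adds nothing.
already_n_in_shared = shared - extra_ns

print(contiguous, total == CHR_Y_LENGTH, shared, extra_ns, already_n_in_shared)
```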

We store sequence for only the two unique regions of Y in our database. The full chromosome Y can be generated on the fly by our API, which stitches in the shared sequence from X. By default our API will fetch only the unique regions of Y; however, you can request that the X sequence be stitched in by setting the 4th argument of the SliceAdaptor's fetch_all method to '1':
$slice_adaptor->fetch_all('toplevel', undef, 0, 1);
The relationships between the shared regions of X and Y are stored in the assembly_exception table.
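
To make the idea of stitching concrete, here is a toy Python sketch. It is not the Ensembl API and does not read the assembly_exception table; the function name stitch, the short sequences and the coordinate mappings are all made up for illustration.

```python
def stitch(y_seq, x_seq, exceptions):
    """Return y_seq with each mapped range filled in from x_seq.

    exceptions is a list of (y_start, y_end, x_start, x_end) tuples,
    1-based and inclusive, loosely modelled on the idea of a table
    that maps shared Y ranges onto X.
    """
    out = list(y_seq)
    for y_start, y_end, x_start, x_end in exceptions:
        if y_end - y_start != x_end - x_start:
            raise ValueError("mapped ranges must have equal length")
        out[y_start - 1:y_end] = x_seq[x_start - 1:x_end]
    return "".join(out)

# Toy data: the stored Y has Ns where sequence is shared with X.
toy_x = "TTACGTACGGAAC"
toy_y = "NNNNGGGCCCNNN"   # 1-4 shared, 5-10 unique, 11-13 shared

# Hypothetical mappings: Y:1-4 -> X:2-5 and Y:11-13 -> X:11-13.
toy_exceptions = [(1, 4, 2, 5), (11, 13, 11, 13)]

full_y = stitch(toy_y, toy_x, toy_exceptions)
print(full_y)
```

The unique part of the toy Y is left untouched; only the masked ranges are replaced with the corresponding X sequence.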


Hope that helps.

Cheers,
Bronwen


On 8 Jun 2012, at 17:29, Hiram Clawson wrote:

> Good Morning Ensembl:
> 
> A user reported to UCSC that the GRCh37/hg19 chrY sequence at UCSC is
> different from the chrY sequence at Ensembl.  I picked up the v67
> chrY sequence from Ensembl and compared it to the UCSC chrY and
> the GRCh37 chrY sequence and Ensembl has a different sequence.
> I checked previous versions of chrY from Ensembl and they remain
> the same, so it isn't a patched sequence.  Anyone know what
> the story is here ?
> 
> --Hiram
> 
> faCount composition measure of chrY sequence from genbank, UCSC and Ensembl:
> 
> #seq           len     A       C       G       T       N       cpg
> CM000686.1 59373566 7667625 5099171 5153288 7733482 33720000 217906
> hg19.chrY  59373566 7667625 5099171 5153288 7733482 33720000 217906
> Y.v67      59373566 6965778 4475138 4518436 7025177 36389037 163434
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
