[ensembl-dev] GRCh37 chrY sequence ?

Bert Overduin bert at ebi.ac.uk
Mon Jun 11 14:27:37 BST 2012


Hi Bron,

The sentence "but it's a string of 1000 Ns (Pseudoautosomal region)" does
not make sense to me. First, the region is 10000 bp long, not 1000. Second,
it is not a pseudoautosomal region; the regions that are identical to X
are.
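
As a quick sanity check of the coordinates quoted below, here is a minimal
Python sketch that computes the length of each of the four regions and
compares the extra Ns in the Ensembl file with the size of the shared
regions. The region boundaries and N counts are copied from the messages
below; the names REGIONS and region_length are mine, for illustration only.

```python
# Region boundaries (1-based, inclusive) as listed in Bronwen's message.
REGIONS = [
    (1, 10000, "leading block of Ns"),
    (10001, 2649520, "shared with X"),
    (2649521, 59034049, "unique to Y"),
    (59034050, 59373566, "shared with X"),
]


def region_length(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


total = sum(region_length(s, e) for s, e, _ in REGIONS)
shared = sum(region_length(s, e) for s, e, label in REGIONS
             if label == "shared with X")

# N counts taken from the faCount table in Hiram's message.
N_GRC = 33720000       # CM000686.1 and hg19.chrY
N_ENSEMBL = 36389037   # Y.v67
extra_n = N_ENSEMBL - N_GRC

for start, end, label in REGIONS:
    print(f"{start:>9}-{end:<9} {region_length(start, end):>9}  {label}")
print("total length:", total)
print("shared with X:", shared)
print("extra Ns in Ensembl file:", extra_n)
```

The first region comes out at 10000 bp and the four regions sum to the full
chromosome length of 59373566. The shared regions are 310000 bp larger than
the surplus of Ns in the Ensembl file, which would be consistent with some
of the shared sequence already being N in the GRC assembly, assuming the
masking is the only difference between the files.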

Cheers,
Bert

On Mon, Jun 11, 2012 at 1:35 PM, Bronwen Aken <ba1 at sanger.ac.uk> wrote:

> Hi Hiram,
>
>
> For the human Y chromosome in Ensembl, we have included DNA sequence
> (A/G/C/T) for only the unique region. The rest of the chromosome is masked
> with Ns, which explains how the length of the chromosome matches the GRC
> chromosome but the composition of the sequence is shifted. The reason we
> only include the unique region of Y is to make sure that we represent each
> region of the genome only once.
>
> grep \> Homo_sapiens.GRCh37.67.dna.chromosome.Y.fa
> >Y dna:chromosome chromosome:GRCh37:Y:2649521:59034049:1
>
>
> To add a bit more detail, the Y chromosome has four regions, two of which
> are unique to Y and two of which are shared with X.
> chromosome:GRCh37:Y:1 - 10000 is unique to Y but it's a string of 1000 Ns
> (Pseudoautosomal region)
> chromosome:GRCh37:Y:10001 - 2649520 is shared with X
> chromosome:GRCh37:Y:2649521 - 59034049 is unique to Y
> chromosome:GRCh37:Y:59034050 - 59373566 is shared with X
>
> We store sequence for only the 2 unique regions of Y in our database. The
> full chromosome Y can be generated on-the-fly by our API, where we stitch
> in the shared sequence from X. By default our API will fetch only the
> unique regions of Y; however, you can request to stitch in the X sequence
> by setting the 4th argument of the SliceAdaptor's fetch_all method to '1':
> $slice_adaptor->fetch_all('toplevel', undef, 0, 1);
> The relationship between the shared regions of X and Y is stored in the
> assembly_exception table.
>
>
> Hope that helps.
>
> Cheers,
> Bronwen
>
>
> On 8 Jun 2012, at 17:29, Hiram Clawson wrote:
>
> Good Morning Ensembl:
>
> A user reported to UCSC that the GRCh37/hg19 chrY sequence at UCSC is
> different from the chrY sequence at Ensembl.  I picked up the v67
> chrY sequence from Ensembl and compared it to the UCSC chrY and
> the GRCh37 chrY sequence and Ensembl has a different sequence.
> I checked previous versions of chrY from Ensembl and they remain
> the same, so it isn't a patched sequence.  Anyone know what
> the story is here ?
>
> --Hiram
>
> faCount composition measure of chrY sequence from genbank, UCSC and
> Ensembl:
>
> #seq           len     A       C       G       T       N       cpg
> CM000686.1 59373566 7667625 5099171 5153288 7733482 33720000 217906
> hg19.chrY  59373566 7667625 5099171 5153288 7733482 33720000 217906
> Y.v67      59373566 6965778 4475138 4518436 7025177 36389037 163434
>
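
For anyone without faCount to hand, a per-base composition like the table
above can be approximated with a few lines of Python. This is only a
sketch: base_composition is a hypothetical helper of mine, it does not
compute the cpg column, and the record used here is toy data rather than
real sequence.

```python
import io
from collections import Counter


def base_composition(fasta_lines):
    """Count characters per FASTA record, case-insensitively.

    Returns a dict mapping the record name (first word of the header)
    to a Counter of upper-cased sequence characters.
    """
    counts = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: take the first word after '>' as its name.
            name = line[1:].split()[0]
            counts[name] = Counter()
        elif name is not None:
            counts[name].update(line.upper())
    return counts


# Toy record for illustration only, not real chrY sequence.
example = io.StringIO(">Y toy record\nNNNNACGT\nacgtNN\n")
comp = base_composition(example)
for base in "ACGTN":
    print(base, comp["Y"][base])
```

To run it on a real file, pass an open file handle instead of the StringIO
object, for example the Homo_sapiens.GRCh37.67.dna.chromosome.Y.fa file
mentioned earlier in the thread.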
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>


-- 
Bert Overduin, Ph.D.
Vertebrate Genomics Team

EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom

http://www.ebi.ac.uk/~bert

Ensembl browser: http://www.ensembl.org

Mailing lists: http://www.ensembl.org/info/about/contact/mailing.html

Blog: http://www.ensembl.info

YouTube: http://www.youtube.com/user/EnsemblHelpdesk
Facebook: http://www.facebook.com/Ensembl.org
Twitter: http://twitter.com/Ensembl