[ensembl-dev] Our treatment of pseudoautosomal regions (PAR)

Healy, Matthew Matthew.Healy at bms.com
Wed Nov 19 11:43:36 GMT 2014


http://link.springer.com/article/10.1007%2Fs10142-013-0323-6

________________________________________
From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] On Behalf Of Giulietta [gspudich at ebi.ac.uk]
Sent: Wednesday, November 19, 2014 4:26 AM
To: Ensembl developers list
Subject: [ensembl-dev] Our treatment of pseudoautosomal regions (PAR)

Dear all,
Ensembl is reviewing how we provide access to pseudoautosomal regions (PAR) in human, and we'd like your input.
Please let us know how our current PAR strategy works for you. What do you like and what is difficult? If our current strategy (explained below) is not working for you, please let us know why.  (If it does work for you, we’d also like to know why!)
The pseudoautosomal regions (PAR), where chromosomes X and Y share homologous sequence, are defined by the GRC for human. In Ensembl, the full-length Y chromosome is displayed in our browser. However, within our core human database, the Y chromosome is divided into five regions:
chromosome:GRCh38:Y:1:10000:1 unique to Y
chromosome:GRCh38:Y:10001:2781479:1 PAR1
chromosome:GRCh38:Y:2781480:56887902:1 unique to Y
chromosome:GRCh38:Y:56887903:57217415:1 PAR2
chromosome:GRCh38:Y:57217416:57227415:1 unique to Y
We store sequence for only the three unique regions of Y in our database. The DNA for PAR1 and PAR2 is stored with the sequence for chromosome X. The full-length Y chromosome can be generated on the fly by our API, which stitches in the shared PAR sequence from X.
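To make the five-region layout concrete, here is a minimal Python sketch. It encodes the GRCh38 Y regions listed in the post, reports which region a Y position falls in, and illustrates "stitching" a full-length Y by taking unique pieces from Y and PAR pieces from X. This is not Ensembl's Perl API. `Region`, `region_of`, `assemble_full_y` and the two fetch callbacks are hypothetical names, and the callbacks stand in for whatever actually retrieves the sequence.

```python
from typing import Callable, List, NamedTuple


class Region(NamedTuple):
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    kind: str   # "unique" for Y-only sequence, otherwise the PAR name

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# The five GRCh38 chromosome Y regions as listed in the post.
GRCH38_Y_REGIONS: List[Region] = [
    Region(1, 10000, "unique"),
    Region(10001, 2781479, "PAR1"),
    Region(2781480, 56887902, "unique"),
    Region(56887903, 57217415, "PAR2"),
    Region(57217416, 57227415, "unique"),
]


def region_of(pos: int, regions: List[Region] = GRCH38_Y_REGIONS) -> str:
    """Return the kind of region ("unique", "PAR1", "PAR2") containing a Y position."""
    for r in regions:
        if r.start <= pos <= r.end:
            return r.kind
    raise ValueError(f"position {pos} is outside chromosome Y")


def assemble_full_y(
    regions: List[Region],
    fetch_unique_y: Callable[[int, int], str],  # hypothetical: returns stored Y sequence
    fetch_par_from_x: Callable[[str], str],     # hypothetical: returns a PAR's sequence from X
) -> str:
    """Stitch a full-length Y: unique pieces come from Y, PAR pieces come from X."""
    parts = []
    for r in regions:
        if r.kind == "unique":
            piece = fetch_unique_y(r.start, r.end)
        else:
            piece = fetch_par_from_x(r.kind)
        # Each piece must exactly fill its slot, or downstream coordinates shift.
        if len(piece) != r.length:
            raise ValueError(f"{r.kind} piece has length {len(piece)}, expected {r.length}")
        parts.append(piece)
    return "".join(parts)
```

For example, `region_of(10001)` returns `"PAR1"` and `region_of(2781480)` returns `"unique"`. With a toy three-region layout, passing callbacks that return `"AA"` and `"CC"` for the unique slots and `"GGG"` for the PAR yields `"AAGGGCC"`.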
What does this mean practically?
- The PARs are identical (i.e. the same set of contigs in the same order) on human X and Y. When aligning data such as RNA-seq to the genome, masking out the PARs on Y avoids duplicate mappings, which would otherwise affect mapping scores.
- Genes are built only on the X PAR regions. (The genes you see in the Y PAR regions are identical to the genes annotated on X.)
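Masking the Y PARs before alignment amounts to replacing those spans with N while keeping the sequence length unchanged, so every other coordinate on Y stays valid. A minimal Python sketch, assuming chromosome Y is already loaded as a single string whose 1-based positions match GRCh38; `hard_mask` and `Y_PARS_GRCH38` are hypothetical names:

```python
from typing import Iterable, Tuple

# Y PAR spans from the post: (start, end, name), 1-based and inclusive.
Y_PARS_GRCH38 = [
    (10001, 2781479, "PAR1"),
    (56887903, 57217415, "PAR2"),
]


def hard_mask(seq: str, spans: Iterable[Tuple[int, int, str]], mask_char: str = "N") -> str:
    """Return a copy of `seq` with every span replaced by `mask_char`.

    The length is preserved, so positions outside the spans are unchanged.
    """
    chars = list(seq)
    for start, end, _name in spans:
        if not 1 <= start <= end <= len(chars):
            raise ValueError(f"span {start}-{end} does not fit a sequence of length {len(chars)}")
        # Convert the 1-based inclusive span to a 0-based half-open slice.
        chars[start - 1:end] = [mask_char] * (end - start + 1)
    return "".join(chars)
```

For instance, `hard_mask("ACGTACGTAC", [(3, 5, "PAR")])` gives `"ACNNNCGTAC"`. Calling `hard_mask(chr_y, Y_PARS_GRCH38)` on the full chromosome would leave reads from the PARs only one place to map: chromosome X.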
Accessing PARs on the Y chromosome:
- Our Perl API provides a range of methods that allow you to fetch only the unique regions of Y or the whole of Y
- Our websites (e.g. www.ensembl.org) show genomic sequence for the chr Y PARs
- Our FTP site provides a fasta file for the whole length of Y, with PARs replaced with Ns
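Because the FTP FASTA for Y keeps the full chromosome length but replaces the PARs with Ns, it helps to know how much of the file that covers. The following Python tally over the coordinates listed in the post also checks that the five regions tile Y with no gaps or overlaps. The helper name `summarize` is hypothetical:

```python
# GRCh38 chromosome Y regions from the post: (start, end, kind), 1-based inclusive.
Y_REGIONS = [
    (1, 10000, "unique"),
    (10001, 2781479, "PAR1"),
    (2781480, 56887902, "unique"),
    (56887903, 57217415, "PAR2"),
    (57217416, 57227415, "unique"),
]


def summarize(regions):
    """Return total, PAR and unique base counts for a list of contiguous regions."""
    total = par = 0
    expected_start = 1
    for start, end, kind in regions:
        # Each region must begin right after the previous one ends.
        if start != expected_start:
            raise ValueError(f"region starting at {start} does not follow {expected_start - 1}")
        length = end - start + 1
        total += length
        if kind != "unique":
            par += length
        expected_start = end + 1
    return {"total": total, "par": par, "unique": total - par}


print(summarize(Y_REGIONS))
# {'total': 57227415, 'par': 3100992, 'unique': 54126423}
```

So roughly 3.1 Mb of the 57.2 Mb Y FASTA is N by design. The unique regions may contain further Ns from assembly gaps, so the file's total N count would be at least this figure.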
Please send any comments by replying to this post, or let us know what you think at helpdesk (at) ensembl.org
Best wishes,
Giulietta (Ensembl Outreach)
