[ensembl-dev] Fwd: Ensembl API - pseudoautosomal regions

Julia Söllner julia.f.soellner at gmail.com
Wed Dec 7 13:33:12 GMT 2016


Dear Ensembl developers,

I access Ensembl data via the Perl API to retrieve information on genes,
transcripts etc. I noticed that when I pull data from the database's gene
table, some genes occur twice, once on the X and once on the Y chromosome.
This affects 45 human genes; for 34 of the 45, the start and end positions
on X and Y are identical.

Two examples:
geneID           biotype         chromosome  start      end
ENSG00000002586  protein_coding  X           2691179    2741309
ENSG00000002586  protein_coding  Y           2691179    2741309
ENSG00000124333  protein_coding  X           155881293  155943769
ENSG00000124333  protein_coding  Y           57067813   57130289

When querying some of these genes via the Ensembl website it turned out
that they are mapped to pseudoautosomal regions (identical sequence on X
and Y).
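
For illustration, a minimal sketch of how such duplicates can be spotted
once the per-chromosome results are merged. It does not touch the Ensembl
API; the row layout and the helper name chromosomes_by_gene are assumptions
made for this example, and the rows are the four example rows above.

```perl
use strict;
use warnings;

# Rows as [stable ID, chromosome, start, end], copied from the examples above.
my @rows = (
    [ 'ENSG00000002586', 'X', 2691179,   2741309 ],
    [ 'ENSG00000002586', 'Y', 2691179,   2741309 ],
    [ 'ENSG00000124333', 'X', 155881293, 155943769 ],
    [ 'ENSG00000124333', 'Y', 57067813,  57130289 ],
);

# Hypothetical helper: map each stable ID to the set of chromosomes it was seen on.
sub chromosomes_by_gene {
    my @rows = @_;
    my %seen;
    for my $row (@rows) {
        my ( $id, $chr ) = @$row;
        $seen{$id}{$chr} = 1;
    }
    return \%seen;
}

# Report every stable ID that appears on more than one chromosome.
my $seen = chromosomes_by_gene(@rows);
for my $id ( sort keys %$seen ) {
    my @chrs = sort keys %{ $seen->{$id} };
    print "$id: @chrs\n" if @chrs > 1;
}
```

Both example genes are reported, each with chromosomes X and Y.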

*Some more information on how I retrieve the data:*

I use API version 86.

To speed things up I iterate over chromosomes in parallel and retrieve all
genes as follows:

my $slice = $slice_adaptor->fetch_by_region('chromosome', $chr_name);
my @genes = @{ $slice->get_all_Genes() };

So ENSG00000124333 is in @genes both when querying X and when querying Y.
If, however, I fetch the gene directly by its stable ID, I only get the X
chromosome:

my $gene_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Gene' );

my $gene = $gene_adaptor->fetch_by_stable_id( 'ENSG00000124333');

print $gene->seq_region_name(); # => X

According to http://lists.ensembl.org/pipermail/dev/2010-October/000214.html,
a gene might extend beyond a pseudoautosomal region into a region unique to
the Y chromosome, which could explain why it shows up for both X and Y.
However, I checked this, and none of the gene coordinates overlap the unique
regions of Y. I used the PAR coordinates from
http://www.ensembl.org/info/genome/genebuild/assembly.html.
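
A minimal sketch of that containment check, for illustration only. The PAR
coordinates below are the GRCh38 values as I understand them from the
assembly page and should be verified there; the helper names par_containing
and y_to_x are made up for this example.

```perl
use strict;
use warnings;

# Assumed GRCh38 PAR coordinates on Y, with the start of the matching X region.
my @par = (
    { name => 'PAR1', y_start => 10001,    y_end => 2781479,  x_start => 10001 },
    { name => 'PAR2', y_start => 56887903, y_end => 57217415, x_start => 155701383 },
);

# Return the PAR record that fully contains [start, end] on Y, or undef.
sub par_containing {
    my ( $start, $end ) = @_;
    for my $p (@par) {
        return $p if $start >= $p->{y_start} && $end <= $p->{y_end};
    }
    return;
}

# Translate a Y position inside a PAR to the corresponding X position.
sub y_to_x {
    my ( $p, $pos ) = @_;
    return $pos - $p->{y_start} + $p->{x_start};
}

# Y coordinates of the two example genes.
my @y_genes = (
    [ 'ENSG00000002586', 2691179,  2741309 ],
    [ 'ENSG00000124333', 57067813, 57130289 ],
);

for my $gene (@y_genes) {
    my ( $id, $start, $end ) = @$gene;
    my $p = par_containing( $start, $end );
    if ($p) {
        printf "%s lies within %s (X equivalent %d-%d)\n",
            $id, $p->{name}, y_to_x( $p, $start ), y_to_x( $p, $end );
    }
    else {
        print "$id extends outside the PARs\n";
    }
}
```

Under these assumed coordinates, both example genes lie entirely within a
PAR, and the translated positions match the X rows in the table above.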

*Questions*

   - How come the positions are identical for some of the genes?
   - Why do I get these duplicate gene entries?
   - How can I prevent this?


Thanks in advance and kind regards,
Julia