[ensembl-dev] Finding "canonical/reference" gene by external name when ALT loci are involved

Marc Hoeppner mphoeppner at gmail.com
Mon Oct 14 13:57:02 BST 2019


Hi,

I have the following issue: I need to use an HGNC gene name to
retrieve the corresponding gene, fetch its canonical transcript, and
write out a BED-type file of all the "coding" exons in the "canonical
coordinate system" (i.e. a chromosome with start/stop).
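
To make the output format concrete, here is a minimal sketch of the
BED step, written in Python rather than against the Perl API so that
it stands alone. It assumes each coding exon is already available as
(chromosome, start, end, strand) in Ensembl's 1-based, inclusive
coordinates with strand given as 1 or -1. The function name and the
coordinates in the usage line are hypothetical and purely
illustrative.

```python
def exon_to_bed_line(chrom, start, end, name, strand):
    """Format one exon as a BED6 line.

    Input coordinates are assumed to be 1-based and inclusive (the
    Ensembl convention). BED uses a 0-based start and an exclusive end,
    so only the start needs shifting by one.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1-based coordinates with start <= end")
    bed_strand = "+" if strand >= 0 else "-"
    # Score column is set to 0 as a placeholder.
    fields = [chrom, str(start - 1), str(end), name, "0", bed_strand]
    return "\t".join(fields)


# Illustrative call with made-up coordinates (not real C4B exons).
example_line = exon_to_bed_line("6", 101, 200, "example_exon", -1)
```

The exon length is unchanged by the conversion: end minus start in the
BED line equals the number of bases in the original interval.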

So far, so good. Unfortunately, many genes now return multiple
Ensembl gene objects because of the ALT alleles. One example is
"C4B".

I figured I could use the "$gene->is_reference" method, but it is
actually set for both of the following:
LRG_138
ENSG00000233312

The former is not a genomic coordinate, exactly, and the latter is
located on CHR_HSCHR6_MHC_SSTO_CTG1 (an ALT locus).

What I think I want is "ENSG00000224389" - but that is not considered
"reference" by the API. Essentially, I just can't figure out how to
automatically determine that this is the reference gene in "canonical"
coordinates. Any flag I may have overlooked?
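
In the absence of such a flag, one workaround I can think of is to
filter the candidates by the name of the sequence region they sit on
and keep only those on a primary chromosome. The sketch below is in
Python on plain records, just to show the logic. The record layout,
the function name and the regular expression are my own assumptions,
as are the region names "6" for ENSG00000224389 and "LRG_138" for the
LRG entry. In the Perl API the region name would presumably come from
something like $gene->seq_region_name.

```python
import re

# Assumption: primary human chromosomes are named 1-22, X, Y or MT.
# ALT loci (CHR_...) and LRG regions do not match this pattern.
PRIMARY_CHROMOSOME = re.compile(r"^(?:[1-9]|1[0-9]|2[0-2]|X|Y|MT)$")


def pick_primary_genes(candidates):
    """Keep only the gene records located on a primary chromosome.

    Each candidate is a dict with the hypothetical keys 'stable_id'
    and 'seq_region_name'.
    """
    return [
        gene for gene in candidates
        if PRIMARY_CHROMOSOME.match(gene["seq_region_name"])
    ]


# The three C4B candidates from this post; region names for the first
# and last entries are assumptions, as noted above.
c4b_candidates = [
    {"stable_id": "LRG_138", "seq_region_name": "LRG_138"},
    {"stable_id": "ENSG00000233312",
     "seq_region_name": "CHR_HSCHR6_MHC_SSTO_CTG1"},
    {"stable_id": "ENSG00000224389", "seq_region_name": "6"},
]

primary_ids = [g["stable_id"] for g in pick_primary_genes(c4b_candidates)]
```

Matching on names is brittle, though, and would need adjusting for
other species or naming schemes, which is why I would much prefer a
proper flag.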

With kind regards,
Marc



