[ensembl-dev] Finding "canonical/reference" gene by external name when ALT loci are involved
Marc Hoeppner
mphoeppner at gmail.com
Mon Oct 14 13:57:02 BST 2019
Hi,
I have the following issue: I need to use an HGNC gene name to
recover the corresponding gene, fetch its canonical transcript and
write out a BED-type file of all the coding exons in the "canonical
coordinate system" (i.e. a chromosome with start/stop).
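For the BED-writing step, a minimal sketch, assuming the coding exons have already been fetched and reduced to plain records. The `CodingExon` record and `to_bed_lines` helper are hypothetical names for illustration, not part of the Ensembl API. The one detail that commonly goes wrong is the coordinate shift: Ensembl reports 1-based, inclusive coordinates, while BED uses 0-based, half-open intervals, so only the start moves by one.

```python
from dataclasses import dataclass


@dataclass
class CodingExon:
    # Hypothetical record holding one coding exon (or its coding portion).
    chrom: str
    start: int   # 1-based, inclusive (Ensembl convention)
    end: int     # 1-based, inclusive
    name: str
    strand: int  # 1 for forward, -1 for reverse


def to_bed_lines(exons):
    """Convert exons to 6-column BED lines (0-based, half-open)."""
    lines = []
    for exon in exons:
        strand = "+" if exon.strand == 1 else "-"
        # BED start = Ensembl start - 1; BED end = Ensembl end.
        fields = [exon.chrom, str(exon.start - 1), str(exon.end),
                  exon.name, "0", strand]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Illustrative coordinates only, not real C4B exons.
    demo = [CodingExon("6", 100, 200, "exon1", 1),
            CodingExon("6", 300, 350, "exon2", 1)]
    print("\n".join(to_bed_lines(demo)))
```

The score column is fixed at "0" here simply because BED requires a value in that position when strand is given.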
So far, so good. Unfortunately, many genes now return multiple
ENSEMBL gene objects because of the ALT alleles. An example would be
"C4B".
I figured I could use the "$gene->is_reference" method, but it
actually returns true for both:
LRG_138
ENSG00000233312
The former is not exactly in genomic coordinates, and the latter is
located on CHR_HSCHR6_MHC_SSTO_CTG1 (an ALT locus).
What I think I want is "ENSG00000224389", but the API does not consider
that one "reference". Essentially, I can't figure out how to
automatically determine that this is the reference gene in "canonical"
coordinates. Is there any flag I may have overlooked?
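One possible workaround, sketched here under loudly stated assumptions rather than as a known API flag: instead of relying on is_reference, keep only the candidate genes whose seq region is one of the primary-assembly chromosomes, which rejects both the LRG and the ALT-locus hits. The `GeneHit` record, the `primary_assembly_hits` helper and the chromosome-name whitelist are all hypothetical, and placing ENSG00000224389 on chromosome "6" is an assumption made for the example.

```python
from dataclasses import dataclass

# Assumption: primary-assembly human chromosome names as Ensembl writes them.
# Anything else (e.g. CHR_HSCHR6_MHC_SSTO_CTG1, LRG_138) is treated as
# non-canonical by this heuristic.
PRIMARY_CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


@dataclass
class GeneHit:
    # Hypothetical record: a gene stable ID plus the name of the seq region
    # it is annotated on.
    stable_id: str
    seq_region_name: str


def primary_assembly_hits(hits):
    """Keep only genes that lie on a primary-assembly chromosome."""
    return [h for h in hits if h.seq_region_name in PRIMARY_CHROMOSOMES]


if __name__ == "__main__":
    hits = [
        GeneHit("LRG_138", "LRG_138"),
        GeneHit("ENSG00000233312", "CHR_HSCHR6_MHC_SSTO_CTG1"),
        GeneHit("ENSG00000224389", "6"),  # assumed: on chromosome 6
    ]
    print([h.stable_id for h in primary_assembly_hits(hits)])
```

With these inputs the filter keeps only ENSG00000224389. A name whitelist is crude (it is species-specific and breaks if naming changes), so a proper attribute from the API would be preferable if one exists.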
With kind regards,
Marc