[ensembl-dev] Finding "canonical/reference" gene by external name when ALT loci are involved

mag mr6 at ebi.ac.uk
Mon Oct 14 15:13:21 BST 2019


Hi Marc,

For the example of C4B, ENSG00000233312 is indeed the reference gene in 
that it is deemed the most representative version of C4B.

It sounds like what you are looking for is the version of C4B that is 
located on the primary assembly.
For that, check whether the slice the gene sits on is a reference 
sequence, i.e. "$gene->slice->is_reference", rather than calling 
"$gene->is_reference" on the gene itself.

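As a rough sketch of how that check could fit into your workflow 
(gene name, then canonical transcript, then BED lines for the coding 
exons), something like the following might work. The helper names 
primary_assembly_genes and coding_exons_as_bed are made up for 
illustration. The extra test on the coordinate system name is an 
assumption on my part, added to drop LRG records such as LRG_138; the 
BED name and score columns are arbitrary choices.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Keep genes whose underlying slice is flagged as reference.
# Assumption: also require the 'chromosome' coordinate system so that
# LRG records are dropped.
sub primary_assembly_genes {
    my @genes = @_;
    return grep {
        my $slice = $_->slice;
        $slice->is_reference && $slice->coord_system_name eq 'chromosome';
    } @genes;
}

# Format the coding part of each exon of the canonical transcript as
# BED lines. BED starts are 0-based, hence the "- 1".
sub coding_exons_as_bed {
    my ($gene) = @_;
    my $transcript = $gene->canonical_transcript;
    my @lines;
    for my $exon (@{ $transcript->get_all_translateable_Exons }) {
        push @lines, join("\t",
            $exon->seq_region_name,
            $exon->seq_region_start - 1,
            $exon->seq_region_end,
            $gene->stable_id,
            0,
            $exon->seq_region_strand > 0 ? '+' : '-');
    }
    return @lines;
}

sub main {
    my ($symbol) = @_;

    # Loaded here so the helpers above can be reused without the API.
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',   # public Ensembl MySQL server
        -user => 'anonymous',
    );

    my $gene_adaptor = $registry->get_adaptor('Human', 'Core', 'Gene');
    my @genes = @{ $gene_adaptor->fetch_all_by_external_name($symbol) };

    for my $gene (primary_assembly_genes(@genes)) {
        print "$_\n" for coding_exons_as_bed($gene);
    }
}

# Only contact the database when a gene symbol is given, e.g. "C4B".
main($ARGV[0]) if @ARGV;
```

Run it as "perl script.pl C4B". Genes without a translation simply 
produce no lines, since get_all_translateable_Exons returns an empty 
list for them.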

Hope that helps,
mag

On 14/10/2019 13:57, Marc Hoeppner wrote:
> Hi,
>
> I have the following issue - I need to use a gene name (HGNC) to
> recover the corresponding gene, fetch the canonical transcript and
> write out a BED-type file for all the "coding" exons in the "canonical
> coordinate system" (i.e. a chromosome with start/stop).
>
> So far, so good. Unfortunately, a lot of genes now return multiple
> ENSEMBL gene objects because of the ALT alleles. An example would be
> "C4B".
>
> I figured that I can use the "$gene->is_reference" function, but that
> is actually set for both:
> LRG_138
> ENSG00000233312
>
> The former is not a genomic coordinate, exactly, and the latter is
> located on CHR_HSCHR6_MHC_SSTO_CTG1 (an ALT locus).
>
> What I think I want is "ENSG00000224389" - but that is not considered
> "reference" by the API. Essentially, I just can't figure out how to
> automatically determine that this is the reference gene in "canonical"
> coordinates. Any flag I may have overlooked?
>
> With kind regards,
> Marc
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/




