[ensembl-dev] Retrieve genes via name

Christian Cole (Staff) C.Cole at dundee.ac.uk
Mon Oct 10 15:07:10 BST 2016


Hi Kieron,

Thanks for the reply.

I've been using Ensembl for many years and I have to say the LRGs are new to me. Honestly, I don't see why they get top-level treatment rather than being reachable via an external_ref search.

The problem with fetch_all_by_external_name() is that it also matches synonyms, and many HGNC gene names are also synonyms for other genes. This is not particularly useful when you know the name of a specific gene and simply want its Ensembl ID. Hence the suggestion to use fetch_by_display_label() instead.

It surprises me that, although HGNC gene names are unique identifiers, there is no straightforward way to get a one-to-one mapping to Ensembl IDs via the API.
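Following the biotype filtering suggested in the reply below, a one-to-one lookup can be sketched by discarding LRG_gene entries and then requiring that exactly one gene remains, treating anything else as ambiguous. This is a minimal sketch, not an Ensembl API feature: it assumes only that each result object has `biotype` and `stable_id` methods (as Bio::EnsEMBL::Gene objects do), and `MockGene`, `unique_gene_id` and the placeholder IDs are hypothetical names used so the example runs without a database connection.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::EnsEMBL::Gene so the sketch runs offline;
# only the two accessors used below are mimicked.
package MockGene;

sub new {
    my ($class, %args) = @_;
    my $self = {%args};
    return bless $self, $class;
}
sub biotype   { return $_[0]->{biotype} }
sub stable_id { return $_[0]->{stable_id} }

package main;

# Hypothetical helper: drop LRG_gene entries from a list of gene objects and
# return the stable ID of the single gene left, or undef if none or several
# remain (i.e. the name does not map one-to-one).
sub unique_gene_id {
    my ($genes) = @_;
    my @non_lrg = grep { $_->biotype ne 'LRG_gene' } @$genes;
    return @non_lrg == 1 ? $non_lrg[0]->stable_id : undef;
}

# Placeholder IDs for illustration only, not real Ensembl or LRG records.
my @genes = (
    MockGene->new(stable_id => 'ENSG_PLACEHOLDER', biotype => 'protein_coding'),
    MockGene->new(stable_id => 'LRG_PLACEHOLDER',  biotype => 'LRG_gene'),
);

my $id = unique_gene_id(\@genes);
print defined $id ? "$id\n" : "no unique match\n";
```

With the real API, the arrayref passed in would be the result of a lookup such as `$gene_adaptor->fetch_all_by_external_name("FOXP2","HGNC")`, and an undef return flags a name that still needs checking by hand.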

I'll do the biotype filtering as you suggest.
Thanks,

Chris



On 10/10/2016, 13:44, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:

    Hi Chris,

    LRG genes are a parallel gene set of manual annotations (http://www.lrg-sequence.org/). They've been imported into Ensembl for more than the six years I've worked here, but their annotations have grown steadily, and perhaps more visibly, over time. There are several ways to filter them out.

    LRG genes have a different biotype (LRG_gene) from the Ensembl gene set, so you can filter the result set on biotype ($gene->biotype ne 'LRG_gene'). We do not currently provide a method that combines display name and biotype in a single request.

    If you wish to be more specific about fetching genes via their common name, or more accurately a name whose source you know, you could do the following:

    my $genes = $gene_adaptor->fetch_all_by_external_name("FOXP2", "HGNC");

    The second argument takes the Ensembl name for our external source, for example: LRG, KEGG_Enzyme or RefSeq_dna. A full list can be obtained by:

    my $db_entry_adaptor = $registry->get_adaptor($species, 'core', 'DBEntry');
    my $names_ref = $db_entry_adaptor->get_distinct_external_dbs;
    print join("\n", @$names_ref), "\n";

    If you cannot find the source you are looking for in the list, do ask us and we'll try to help. It might be faster to discard the LRG entries yourself of course.

    Hope that helps,


    Kieron


    > On 10 Oct 2016, at 10:48, Christian Cole (Staff) <C.Cole at dundee.ac.uk> wrote:
    >
    > Hi,
    >
    > Last year on this very list, I was advised to use 'fetch_by_display_label' in the Perl API to retrieve genes via their HGNC or similar name. With more recent versions of the API, however, I note that some genes have duplicate entries: one starting with 'ENSG' and one with 'LRG_'. These IDs appear to be synonyms with respect to the gene adaptor.
    >
    > Firstly, what are these LRG entries and, secondly, how do I regain the one-to-one mapping of gene names to Ensembl IDs?
    > Many thanks,
    >
    > Chris
    >
    > --
    > Dr Christian Cole
    > Co-ordinator, The Data Analysis Group
    > The Barton Group
    > Division of Computational Biology, School of Life Sciences,
    > University of Dundee, Dundee, UK.
    > Tel:+44 1382 388721
    > http://www.compbio.dundee.ac.uk/dag.html
    > twitter: @drchriscole
    > ORCID: http://europepmc.org/authors/0000-0002-2560-2484
    >
    >
    > The University of Dundee is a registered Scottish Charity, No: SC015096
    > _______________________________________________
    > Dev mailing list    Dev at ensembl.org
    > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
    > Ensembl Blog: http://www.ensembl.info/






