[ensembl-dev] Retrieve genes via name

Kieron Taylor ktaylor at ebi.ac.uk
Mon Oct 10 13:44:45 BST 2016


Hi Chris,

LRG genes are a parallel gene set of manual annotations ( http://www.lrg-sequence.org/ ). They have been imported into Ensembl for longer than the six years I've worked here, but the set has been growing steadily and has perhaps become more visible over time. There are several ways to filter them out.

LRG genes have their own biotype (LRG_gene), distinct from the biotypes in the Ensembl gene set, so you can filter the result set by biotype ($gene->biotype ne 'LRG_gene'). We do not currently provide a method that combines display name and biotype in a single request.
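
A minimal sketch of that filter, assuming you already hold an array reference of gene objects, such as the one returned by fetch_all_by_display_label. The helper name drop_lrg_genes is made up for illustration and is not part of the API; it relies only on the biotype method.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): returns a new array
# reference holding only the genes whose biotype is not 'LRG_gene'.
# Any object with a biotype() method will do, e.g. Bio::EnsEMBL::Gene.
sub drop_lrg_genes {
    my ($genes) = @_;
    return [ grep { $_->biotype ne 'LRG_gene' } @{$genes} ];
}

# Intended use, assuming $gene_adaptor is already set up:
#   my $genes        = $gene_adaptor->fetch_all_by_display_label('FOXP2');
#   my $ensembl_only = drop_lrg_genes($genes);
```

The input list is left untouched, so you can still inspect the LRG entries afterwards if you need them.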

If you wish to be more specific about fetching genes via their common name, or more accurately a name that you know the source of, then you could do the following:

$genes = $gene_adaptor->fetch_all_by_external_name("FOXP2","HGNC");

The second argument is the Ensembl name for the external source, for example LRG, KEGG_Enzyme or RefSeq_dna. A full list can be obtained by:

$db_entry_adaptor = $registry->get_adaptor($species,'core','DBEntry');
$names_ref = $db_entry_adaptor->get_distinct_external_dbs;
print join("\n",@$names_ref);
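
Rather than reading the printed list by eye, you could test for a source name directly. This is only a sketch: is_known_source is a hypothetical helper, and it assumes the list is an array reference of plain name strings, compared case-sensitively.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $source appears in the list of external
# database names, e.g. the array reference returned by
# get_distinct_external_dbs. The comparison is exact and case-sensitive.
sub is_known_source {
    my ($names_ref, $source) = @_;
    my $count = grep { $_ eq $source } @{$names_ref};
    return $count > 0;
}

# Intended use:
#   die "No such source\n" unless is_known_source($names_ref, 'HGNC');
```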

If you cannot find the source you are looking for in the list, do ask us and we'll try to help. Of course, it might be faster to discard the LRG entries yourself.
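
If what you want in the end is a name-to-ID mapping, the source-specific lookup could be wrapped along these lines. This is a sketch only: stable_ids_for_name is a made-up helper, and it warns instead of choosing for you when a name still matches more than one gene.

```perl
use strict;
use warnings;

# Hypothetical helper: look a name up against one external source and
# return the stable IDs of the matching genes as an array reference.
# $gene_adaptor must provide fetch_all_by_external_name(), as the
# Ensembl gene adaptor does.
sub stable_ids_for_name {
    my ($gene_adaptor, $name, $source) = @_;
    my $genes = $gene_adaptor->fetch_all_by_external_name($name, $source);
    my @ids   = map { $_->stable_id } @{$genes};
    warn "$name maps to " . scalar(@ids) . " genes in $source\n" if @ids > 1;
    return \@ids;
}

# Intended use:
#   my $ids = stable_ids_for_name($gene_adaptor, 'FOXP2', 'HGNC');
```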

Hope that helps,


Kieron


> On 10 Oct 2016, at 10:48, Christian Cole (Staff) <C.Cole at dundee.ac.uk> wrote:
> 
> Hi,
>  
> Last year on this very list, I was advised to use 'fetch_by_display_label' in the Perl API in order to retrieve genes via their HGNC or similar name. Now, with more recent versions of the API, I note that some genes have duplicate entries: one starting with 'ENSG' and one with 'LRG_'. These IDs appear to be synonyms with respect to the gene adaptor.
>  
> Firstly, what are these LRG entries and, secondly, how do I regain the one-to-one mapping of gene names to Ensembl IDs?
> Many thanks,
>  
> Chris
>  
> --
> Dr Christian Cole
> Co-ordinator, The Data Analysis Group
> The Barton Group
> Division of Computational Biology, School of Life Sciences,
> University of Dundee, Dundee, UK.
> Tel:+44 1382 388721
> http://www.compbio.dundee.ac.uk/dag.html
> twitter: @drchriscole
> ORCID: http://europepmc.org/authors/0000-0002-2560-2484
>  
> 
> The University of Dundee is a registered Scottish Charity, No: SC015096




