[ensembl-dev] (v80) "RefSeq_gene_name" DBLink missing for ACKR1?

Tue Jul 21 15:53:20 BST 2015

Hi Fergal,

That is indeed very helpful.

Many thanks,
Luke

From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Fergal
Sent: 21 July 2015 15:20
To: Ensembl developers list
Subject: Re: [ensembl-dev] (v80) "RefSeq_gene_name" DBLink missing for ACKR1?

Hi Luke,

The quick answer is to change your script to do the following:

foreach my $query ('NM_002036.3', 'NM_001122951.2') {
    print "q: " . $query . "\n";
    # Fetch every transcript that shares this stable id, not just one of them
    my $transcripts = $external_transcript_adaptor->fetch_all_versions_by_stable_id($query);
    foreach my $transcript (@{$transcripts}) {
        print "  t: " . $transcript->stable_id . "\n";
        foreach my $dbe (
            @{ $transcript->get_Gene->get_all_DBEntries() },
            @{ $transcript->get_all_DBEntries() },
            @{ $transcript->get_Gene->get_all_DBLinks() },
            @{ $transcript->get_all_DBLinks() }
        ) {
            printf "    XREF %s (%s)\n", $dbe->display_id(), $dbe->dbname();
        }
    }
}

The issue you ran into comes from the assumption that stable ids are unique in the otherfeatures db, as they are in the core db. This is not the case. The otherfeatures db holds external gene sets, and we do not really create stable ids for the genes and transcripts in it. For RefSeq genes, for example, we take things like the NM accession and make it the stable id. That does not make it unique: an NM accession corresponds to an mRNA, which may map to multiple places in the genome. RefSeq often maps these mRNAs to multiple places, and thus you get duplicate stable ids. In addition, there are actually two RefSeq gene sets in the db: 'refseq_human_import', a curated set closely linked to the CCDS set, and 'refseq_import', the full public annotation from the gff3 file on the RefSeq ftp site. This leads to duplicate ids between the transcripts in each set.

The problem occurs when you request a transcript by stable id. Currently the API returns a single transcript, as it assumes that stable ids are unique. In e80 the public set, 'refseq_import', has no xrefs (these are present in e81, though). The 'fetch_by_stable_id' call was returning only the transcript from the public set, and thus no xrefs were showing. In e81 you should get xrefs back from both sets, so there isn't really a problem. For e80, the above code, which uses 'fetch_all_versions_by_stable_id', will give back all transcripts, and thus you will be able to get the xrefs from the curated RefSeq set.

It is worth keeping in mind, when working with the otherfeatures db, that mRNAs may be mapped to multiple places, that stable ids are not unique, and that there are two different RefSeq gene sets. In future we are considering removing the curated set, as it is confusing and is a subset of the public set. For the moment, if you only want to work with one of the two sets, you can use fetch_all_by_logic_name('refseq_import') or fetch_all_by_logic_name('refseq_human_import'), though you'll need to move to e81 if you want the xrefs for the 'refseq_import' set.
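
As a rough sketch of working with just one of the two sets for a handful of accessions, the script below fetches all transcripts for each stable id given on the command line and keeps only those from one set. It filters on each transcript's analysis logic name instead of loading a whole set with fetch_all_by_logic_name. The helper name filter_by_logic_name is made up for this illustration and is not part of the Ensembl API; the choice of 'refseq_human_import' and the printed columns are likewise just assumptions for the example.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): keep only the
# transcripts whose analysis logic name equals $logic_name. It works on
# any objects that provide ->analysis->logic_name, as
# Bio::EnsEMBL::Transcript does.
sub filter_by_logic_name {
    my ($transcripts, $logic_name) = @_;
    return [ grep { $_->analysis->logic_name eq $logic_name } @{$transcripts} ];
}

# Stable ids are taken from the command line; with no arguments the
# script does nothing and never contacts the database.
if (@ARGV) {
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';

    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );

    my $adaptor = $registry->get_adaptor('Human', 'otherfeatures', 'Transcript');

    foreach my $query (@ARGV) {
        print "q: " . $query . "\n";

        # All transcripts sharing this stable id, from both RefSeq sets
        my $all = $adaptor->fetch_all_versions_by_stable_id($query);

        # Assumption for this example: we only want the curated set
        my $curated = filter_by_logic_name($all, 'refseq_human_import');

        foreach my $t (@{$curated}) {
            # Print the location too, since one id may map to several places
            printf "  t: %s (%s) %s:%d-%d\n",
                $t->stable_id, $t->analysis->logic_name,
                $t->seq_region_name, $t->seq_region_start, $t->seq_region_end;
        }
    }
}
```

If this were saved as, say, filter_refseq.pl, it could be run as: perl filter_refseq.pl NM_002036.3 NM_001122951.2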

Hope this helps,

Fergal.

On 21 Jul 2015, at 10:15, Luke Goodsell <Luke.Goodsell at ogt.com> wrote:

Hi EnsEMBL Devs,

Querying an external transcript adaptor for 'NM_002036.3' or 'NM_001122951.2' (transcripts of ACKR1/DARC) and calling get_all_DBEntries or get_all_DBLinks on either the returned transcript or its gene object returns nothing in v80, but in v75 it returns the expected RefSeq_gene_name links.

Have the DBLinks/Entries been intentionally removed, is this in error, or have I missed something?

Example code below.

Kind regards,
Luke

#!/usr/bin/perl

use strict;
use warnings;

use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
                -host => 'ensembldb.ensembl.org',
                -user => 'anonymous',
);

my $external_transcript_adaptor = $registry->get_adaptor( 'Human', 'otherfeatures', 'Transcript' );

foreach my $query ('NM_002036.3', 'NM_001122951.2') {
                print "q: " . $query . "\n";

                my $transcript = $external_transcript_adaptor->fetch_by_stable_id($query);

                print "  t: " . $transcript->stable_id . "\n";

                foreach my $dbe (
                                @{ $transcript->get_Gene->get_all_DBEntries() },
                                @{ $transcript->get_all_DBEntries() },
                                @{ $transcript->get_Gene->get_all_DBLinks() },
                                @{ $transcript->get_all_DBLinks() }
                ) {
                                printf "    XREF %s (%s)\n", $dbe->display_id(), $dbe->dbname();
                }
}
