[ensembl-dev] (v80) "RefSeq_gene_name" DBLink missing for ACKR1?

mr6 at ebi.ac.uk
Tue Jul 21 18:38:00 BST 2015


Hi Matthew,

The RefSeq import available in the otherfeatures database is not provided
via Biomart, as this would be redundant with fetching the gene set
directly from NCBI.

Biomart is centered around the Ensembl genes, which all have unique stable
ids.
RefSeq identifiers are still available in Biomart via external references.
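Outside Biomart, the same RefSeq external references can be read off an Ensembl gene in the core database with the Perl API. The sketch below is a minimal illustration. It assumes that the RefSeq external database names start with `RefSeq` (as `RefSeq_mRNA` and `RefSeq_peptide` do), and `refseq_xrefs` is a hypothetical helper name. A gene's `get_all_DBLinks` also includes the xrefs of its transcripts and translations.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct RefSeq accessions linked to an
# Ensembl gene. Assumes RefSeq external db names begin with "RefSeq".
sub refseq_xrefs {
    my ($gene) = @_;
    my @ids = map  { $_->display_id }
              grep { $_->dbname =~ /^RefSeq/ }
              @{ $gene->get_all_DBLinks() };
    my %seen;
    my @unique = grep { !$seen{$_}++ } @ids;
    return sort @unique;
}

# Usage against the public server (needs network access and the Ensembl
# Perl API on @INC):
#   use Bio::EnsEMBL::Registry;
#   Bio::EnsEMBL::Registry->load_registry_from_db(
#       -host => 'ensembldb.ensembl.org', -user => 'anonymous');
#   my $ga = Bio::EnsEMBL::Registry->get_adaptor('Human', 'Core', 'Gene');
#   my ($gene) = @{ $ga->fetch_all_by_external_name('ACKR1') };
#   print join(', ', refseq_xrefs($gene)), "\n";
```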


Hope that helps,
Magali

> How does Biomart handle such issues when stable ids may not be unique?
>
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf
> Of Luke Goodsell
> Sent: 21 July, 2015 10:53 AM
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] (v80) "RefSeq_gene_name" DBLink missing for
> ACKR1?
>
> Hi Fergal,
>
> That is indeed very helpful.
>
> Many thanks,
> Luke
>
> From: dev-bounces at ensembl.org
> [mailto:dev-bounces at ensembl.org] On Behalf Of Fergal
> Sent: 21 July 2015 15:20
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] (v80) "RefSeq_gene_name" DBLink missing for
> ACKR1?
>
> Hi Luke,
>
> The quick answer is to change your script to do the following:
>
> foreach my $query ('NM_002036.3', 'NM_001122951.2') {
>     print "q: " . $query . "\n";
>     # Returns every transcript stored under this stable id, not just one
>     my $transcripts =
>         $external_transcript_adaptor->fetch_all_versions_by_stable_id($query);
>     foreach my $transcript (@{$transcripts}) {
>         print "  t: " . $transcript->stable_id . "\n";
>         foreach my $dbe (
>             @{ $transcript->get_Gene->get_all_DBEntries() },
>             @{ $transcript->get_all_DBEntries() },
>             @{ $transcript->get_Gene->get_all_DBLinks() },
>             @{ $transcript->get_all_DBLinks() }
>         ) {
>             printf "    XREF %s (%s)\n", $dbe->display_id(), $dbe->dbname();
>         }
>     }
> }
>
>
> The issue you ran into comes from assuming that stable ids are unique in
> the otherfeatures db, as they are in the core db. They are not. The
> otherfeatures db holds external gene sets, and we do not really create
> stable ids for the genes and transcripts in it; for RefSeq genes, for
> example, we take the NM accession and make it the stable id. That does not
> make it unique: an NM corresponds to an mRNA that may map to multiple
> places in the genome, and RefSeq often maps these mRNAs to several places,
> so you get duplicate stable ids. In addition, there are actually two
> RefSeq gene sets in the db ('refseq_human_import', a curated set closely
> linked to the CCDS set, and 'refseq_import', the full public annotation
> from the GFF3 file on the RefSeq FTP site), which leads to duplicate ids
> between the transcripts of the two sets.
>
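You can see the duplication described above directly by listing every copy stored under one accession. This is a minimal sketch and not part of the original reply. `describe_copies` is a hypothetical helper, and it relies only on `fetch_all_versions_by_stable_id`, the `seq_region_*` accessors and `analysis->logic_name`, which Ensembl transcripts provide.

```perl
use strict;
use warnings;

# Hypothetical helper: describe every transcript stored under one stable id
# (location plus the analysis logic name, i.e. which gene set it came from).
# In the otherfeatures db one RefSeq accession can come back several times.
sub describe_copies {
    my ($transcript_adaptor, $stable_id) = @_;
    my @lines;
    my $copies = $transcript_adaptor->fetch_all_versions_by_stable_id($stable_id);
    for my $t (@{$copies}) {
        push @lines, sprintf '%s %s:%d-%d [%s]',
            $t->stable_id, $t->seq_region_name,
            $t->seq_region_start, $t->seq_region_end,
            $t->analysis->logic_name;
    }
    return @lines;
}

# Usage (needs the Ensembl Perl API and a registry loaded as in the script):
#   my $ta = Bio::EnsEMBL::Registry->get_adaptor('Human', 'otherfeatures', 'Transcript');
#   print "$_\n" for describe_copies($ta, 'NM_002036.3');
```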
> The problem occurs when you request a transcript by stable_id: the API
> currently returns a single transcript, as it assumes that stable ids are
> unique. In e80 the public set, 'refseq_import', has no xrefs (they are
> present in e81). The 'fetch_by_stable_id' call was returning only the
> transcript from the public set, so no xrefs showed up. In e81 you should
> get xrefs back from both sets, so this stops being a problem. For e80, the
> code above, which uses 'fetch_all_versions_by_stable_id', returns all
> matching transcripts, so you can get the xrefs from the curated RefSeq
> set.
>
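For e80, the workaround amounts to collecting xrefs from every copy rather than from the one copy that `fetch_by_stable_id` picks. The following is a hedged sketch using a hypothetical helper, `all_xrefs_for`. It also collapses duplicates, because a gene's `get_all_DBLinks` already includes its transcripts' xrefs and so overlaps with the transcript-level list.

```perl
use strict;
use warnings;

# Hypothetical helper: gather xrefs from every transcript (and its gene)
# stored under one stable id, rather than from the single copy that
# fetch_by_stable_id returns. Entries are reported as "dbname:display_id"
# and duplicates are dropped, keeping first-seen order.
sub all_xrefs_for {
    my ($transcript_adaptor, $stable_id) = @_;
    my (%seen, @xrefs);
    my $copies = $transcript_adaptor->fetch_all_versions_by_stable_id($stable_id);
    for my $t (@{$copies}) {
        for my $dbe (@{ $t->get_Gene->get_all_DBLinks() },
                     @{ $t->get_all_DBLinks() }) {
            my $key = $dbe->dbname . ':' . $dbe->display_id;
            push @xrefs, $key unless $seen{$key}++;
        }
    }
    return @xrefs;
}

# Usage (needs the Ensembl Perl API and a registry loaded as in the script):
#   my $ta = Bio::EnsEMBL::Registry->get_adaptor('Human', 'otherfeatures', 'Transcript');
#   print "$_\n" for all_xrefs_for($ta, 'NM_002036.3');
```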
> When working with the otherfeatures db, keep in mind that mRNAs may be
> mapped to multiple places, that stable ids are not unique, and that there
> are two different RefSeq gene sets. We are considering removing the
> curated set in future, as it is confusing and is a subset of the public
> set. For the moment, if you only want to work with one of the two sets,
> you can use fetch_all_by_logic_name('refseq_import') or
> fetch_all_by_logic_name('refseq_human_import'), though you will need to
> move to e81 if you want the xrefs for the 'refseq_import' set.
>
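If you only need a few accessions, one alternative to `fetch_all_by_logic_name` (which loads every transcript in the set) is to fetch all copies of the stable id and keep those whose analysis logic name matches. This is a different approach from the one suggested in the reply, offered here as a sketch. `fetch_from_set` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the copies of a stable id that belong to
# one gene set, identified by analysis logic name ('refseq_import' or
# 'refseq_human_import' in the e80 human otherfeatures db).
sub fetch_from_set {
    my ($transcript_adaptor, $stable_id, $logic_name) = @_;
    my $copies = $transcript_adaptor->fetch_all_versions_by_stable_id($stable_id);
    return grep { $_->analysis->logic_name eq $logic_name } @{$copies};
}

# Usage (needs the Ensembl Perl API and a registry loaded as in the script):
#   my $ta = Bio::EnsEMBL::Registry->get_adaptor('Human', 'otherfeatures', 'Transcript');
#   my @curated = fetch_from_set($ta, 'NM_002036.3', 'refseq_human_import');
```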
> Hope this helps,
>
> Fergal.
>
>
>
> On 21 Jul 2015, at 10:15, Luke Goodsell
> <Luke.Goodsell at ogt.com> wrote:
>
> Hi EnsEMBL Devs,
>
> Querying an external transcript adaptor for 'NM_002036.3' or
> 'NM_001122951.2' (transcripts of ACKR1/DARC) and calling get_all_DBEntries
> or get_all_DBLinks on either the returned transcript or its gene object
> returns nothing in v80, whereas in v75 it returns the expected
> RefSeq_gene_name links.
>
> Have the DBLinks/Entries been intentionally removed, is this in error, or
> have I missed something?
>
> Example code below.
>
> Kind regards,
> Luke
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use Bio::EnsEMBL::Registry;
>
> my $registry = 'Bio::EnsEMBL::Registry';
>
> $registry->load_registry_from_db(
>     -host => 'ensembldb.ensembl.org',
>     -user => 'anonymous',
> );
>
> my $external_transcript_adaptor =
>     $registry->get_adaptor( 'Human', 'otherfeatures', 'Transcript' );
>
> foreach my $query ('NM_002036.3', 'NM_001122951.2') {
>     print "q: " . $query . "\n";
>
>     my $transcript =
>         $external_transcript_adaptor->fetch_by_stable_id($query);
>
>     print "  t: " . $transcript->stable_id . "\n";
>
>     foreach my $dbe (
>         @{ $transcript->get_Gene->get_all_DBEntries() },
>         @{ $transcript->get_all_DBEntries() },
>         @{ $transcript->get_Gene->get_all_DBLinks() },
>         @{ $transcript->get_all_DBLinks() }
>     ) {
>         printf "    XREF %s (%s)\n", $dbe->display_id(), $dbe->dbname();
>     }
> }
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
