[ensembl-dev] Missing IDS in ENSEMBL database

Thibaut Hourlier th3 at sanger.ac.uk
Wed May 30 14:29:05 BST 2012


Hi Duarte,
I may have forgotten to mention that you need to look in the supporting features.
Have a look at the script:

use strict;
use warnings;

use Getopt::Long;
use Bio::EnsEMBL::DBSQL::DBAdaptor;

# Connection to the DB
my $host   = '';
my $port   = '';
my $user   = '';
my $pass   = '';
my $dbname = '';

GetOptions(
    'host=s'   => \$host,
    'port=s'   => \$port,
    'user=s'   => \$user,
    'pass=s'   => \$pass,
    'dbname=s' => \$dbname,
);

my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
    -host   => $host,
    -port   => $port,
    -user   => $user,
    -pass   => $pass,
    -dbname => $dbname,
);
my $transcript_adaptor = $db->get_TranscriptAdaptor();

my @transcripts_of_interest = qw( NM_001167607.1 GM2A NM_001199987.1 NDUFB6 );

foreach my $query_transcript (@transcripts_of_interest) {
    my $transcript;
    if ( $query_transcript =~ /ENST/i ) {
        # Ensembl stable IDs are fetched directly
        $transcript = $transcript_adaptor->fetch_by_stable_id($query_transcript);
    }
    else {
        # Anything else is treated as an external name; keep the first hit
        ($transcript) = @{ $transcript_adaptor->fetch_all_by_external_name($query_transcript) };
    }
    unless ($transcript) {
        print STDERR "Query: $query_transcript failed\n";
        next;
    }

    # Evidence attached to the transcript as a whole
    foreach my $supporting_feature ( @{ $transcript->get_all_supporting_features } ) {
        print STDERR 'Supporting feature of ', $transcript->display_id, ': ',
            $supporting_feature->display_id, "\n";
    }

    # Evidence attached to individual exons
    foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
        foreach my $exon_support ( @{ $exon->get_all_supporting_features } ) {
            print STDERR $query_transcript, ":\t", $exon->display_id, ' ; ',
                $exon_support->display_id, "\n";
        }
    }
}
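
If it helps to see at which level a given accession turns up, the two loops above can be wrapped in a small helper. This is only a minimal sketch: collect_supporting_ids, evidence_level and the MockFeature package are hypothetical names I made up for illustration, and the mock objects and their IDs are invented so the snippet runs without a database connection. The only API calls it relies on are the ones already used in the script (get_all_supporting_features, get_all_Exons and display_id).

```perl
use strict;
use warnings;

# Collect evidence IDs for one transcript, split by the level
# (transcript or exon) at which the supporting feature is attached.
sub collect_supporting_ids {
    my ($transcript) = @_;
    my %ids = ( transcript => [], exon => [] );

    push @{ $ids{transcript} },
        map { $_->display_id } @{ $transcript->get_all_supporting_features };

    my %seen;    # the same evidence can support several exons
    foreach my $exon ( @{ $transcript->get_all_Exons } ) {
        foreach my $support ( @{ $exon->get_all_supporting_features } ) {
            my $id = $support->display_id;
            push @{ $ids{exon} }, $id unless $seen{$id}++;
        }
    }
    return \%ids;
}

# Report where a query ID appears: 'transcript', 'exon' or 'none'.
sub evidence_level {
    my ( $transcript, $query_id ) = @_;
    my $ids = collect_supporting_ids($transcript);
    return 'transcript' if grep { $_ eq $query_id } @{ $ids->{transcript} };
    return 'exon'       if grep { $_ eq $query_id } @{ $ids->{exon} };
    return 'none';
}

# Hypothetical stand-in for the Ensembl objects (not part of the API),
# so the sketch can be run without a database.
package MockFeature {
    sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub display_id                  { $_[0]{id} }
    sub get_all_supporting_features { $_[0]{support} || [] }
    sub get_all_Exons               { $_[0]{exons}   || [] }
}

# Mock data: one transcript-level evidence and one exon-level evidence.
my $mock = MockFeature->new(
    id      => 'mock_transcript',
    support => [ MockFeature->new( id => 'mock_transcript_evidence' ) ],
    exons   => [
        MockFeature->new(
            id      => 'mock_exon_1',
            support => [ MockFeature->new( id => 'NM_001167607.1' ) ],
        ),
    ],
);

print evidence_level( $mock, 'NM_001167607.1' ), "\n";    # prints "exon"
```

With a real connection you would pass the object returned by the transcript adaptor instead of $mock. A result of 'exon' or 'none' would explain why a lookup that only considers transcript-level evidence misses the accession.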

Regards
Thibaut

On 25/05/12 13:53, Duarte Molha wrote:
> Unfortunately it does not work. The fetch_all_by_external_name is not able to retrieve the gene (when applied to the gene adaptor)
> or the correct transcript (when applied to the transcript adaptor)
>
> :(
>
>
> -----Original Message-----
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Thibaut Hourlier
> Sent: 25 May 2012 13:34
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] Missing IDS in ENSEMBL database
>
> Hi Duarte,
> I apologize for the misspelling of the method.
> It should work with the transcript adaptor. Otherwise, as you said, using the gene adaptor and looping on the transcripts should work.
>
> Regards
> Thibaut
>
> On 25 May 2012, at 12:40, Duarte Molha wrote:
>
>> Ok...
>>
>> But that was the method I was already using in my code.
>>
>> So what Thibaut was suggesting is that I use the method call on a gene adaptor to retrieve the gene and then use that gene to retrieve its transcripts?
>>
>> Best regards
>>
>> Duarte
>>
>>
>> -----Original Message-----
>> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On
>> Behalf Of Carlos Garcia Giron
>> Sent: 25 May 2012 12:36
>> To: Ensembl developers list
>> Subject: Re: [ensembl-dev] Missing IDS in ENSEMBL database
>>
>> Dear Duarte,
>>
>> The method is called "fetch_all_by_external_name" and it can be found for:
>>
>> Bio::EnsEMBL::DBSQL::GeneAdaptor::fetch_all_by_external_name()
>> Bio::EnsEMBL::DBSQL::TranscriptAdaptor::fetch_all_by_external_name()
>> Bio::EnsEMBL::DBSQL::TranslationAdaptor::fetch_all_by_external_name()
>>
>> I hope it helps.
>>
>> Kind regards,
>> Carlos
>>
>> Duarte Molha wrote:
>>> Dear Thibaut Hourlier
>>>
>>> I was searching the doxygen documentation for that method call you
>>> indicated but do not seem to be able to find it.
>>>
>>> Is it a method call for a transcript adaptor?
>>>
>>> Best regards
>>>
>>> Duarte
>>>
>>> *From:* dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] *On
>>> Behalf Of *Duarte Molha
>>> *Sent:* 25 May 2012 12:02
>>> *To:* Ensembl developers list
>>> *Subject:* Re: [ensembl-dev] Missing IDS in ENSEMBL database
>>>
>>> Thanks Thibaut Hourlier
>>>
>>> I just get very confused with all these IDS with the same format
>>> meaning different things!
>>>
>>> Best regards
>>>
>>> Duarte
>>>
>>> *From:* dev-bounces at ensembl.org<mailto:dev-bounces at ensembl.org>
>>> [mailto:dev-bounces at ensembl.org] *On Behalf Of *Thibaut Hourlier
>>> *Sent:* 25 May 2012 11:20
>>> *To:* Ensembl developers list
>>> *Subject:* Re: [ensembl-dev] Missing IDS in ENSEMBL database
>>>
>>> Dear Duarte,
>>>
>>> I went through the first four of your IDs:
>>>
>>> On 25 May 2012, at 10:14, Duarte Molha wrote:
>>>
>>> Dear Developers
>>>
>>> I created a simple script to output the exons of specific transcripts
>>> with NM ids.
>>>
>>> It works fine for all but a small list of IDS. The large majority of
>>> the failed IDS have been suppressed from NCBI
>>>
>>> Because they were found to be a "nonsense-mediated mRNA decay (NMD)
>>> candidate" so I do not mind eliminating those records from my query.
>>>
>>> However some of the ones that fail are in NCBI database and for some
>>> reason ENSEMBL is not able to query them:
>>>
>>> NM_001040409.1
>>>
>>> It is an NMD transcript.
>>>
>>> NM_001167607.1
>>>
>>> It is an exon supporting feature and not a transcript supporting
>>> feature; I think this is the reason you don't get it with your script.
>>>
>>> http://www.ensembl.org/Homo_sapiens/Transcript/SupportingEvidence?db=core;g=ENSG00000196743;r=5:150591711-150650001;t=ENST00000523466
>>>
>>> NM_001199987.1
>>>
>>> If you look in the Gene database at NCBI you will see that there are 2
>>> other sequences for NDUFB6, which are the transcript supporting
>>> features for the 2 transcripts in Ensembl for the gene.
>>>
>>> http://www.ncbi.nlm.nih.gov/gene/?term=NM_001199987.1
>>>
>>> http://www.ensembl.org/Homo_sapiens/Transcript/SupportingEvidence?db=core;g=ENSG00000165264;r=9:32552997-32573160;t=ENST00000379847
>>>
>>> NM_001204090.1
>>>
>>> Same problem as above, we did not use this sequence.
>>>
>>> What you can do is to use the HGNC identifier of these failing IDs in
>>> the fetch_all_by_external_id method, i.e. NM_001199987.1 ->  NDUFB6
>>>
>>> Regards
>>>
>>> Thibaut
>>>
>>> NM_001242881.1
>>>
>>> NM_014249.2
>>>
>>> NM_015584.3
>>>
>>> NM_024728.2
>>>
>>> Can you tell me how to retrieve these from the database?
>>>
>>> Here is the portion of my script I use to retrieve the data:
>>>
>>> foreach my $query_transcript (@transcripts_of_interest) {
>>>
>>> chomp $query_transcript;
>>>
>>> my $transcript = "";
>>>
>>> if ($query_transcript =~ /ENST/i){
>>>
>>> $transcript =
>>> $transcript_adaptor->fetch_by_stable_id("$query_transcript");
>>>
>>> }
>>>
>>> else{
>>>
>>> ($transcript) = @{
>>> $transcript_adaptor->fetch_all_by_external_name("$query_transcript") };
>>>
>>> }
>>>
>>> unless ($transcript){
>>>
>>> $progress->message("Query: $query_transcript failed");
>>>
>>> next;
>>>
>>> }
>>>
>>> foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
>>>
>>> my $estring = feature2string($exon);
>>>
>>> print "$query_transcript:\t$estring\n";
>>>
>>> }
>>>
>>> $next_update = $progress->update() if (++$j > $next_update);
>>>
>>> }
>>>
>>> Best regards
>>>
>>> Duarte Molha
>>>
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org<mailto:Dev at ensembl.org>  List admin
>>> (including subscribe/unsubscribe):
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>



