[ensembl-dev] Missing IDS in ENSEMBL database

Thibaut Hourlier th3 at sanger.ac.uk
Fri May 25 11:19:54 BST 2012


Dear Duarte,
I went through the first four of your IDs:

On 25 May 2012, at 10:14, Duarte Molha wrote:

> Dear Developers
>  
> I created a simple script to output the exons of specific transcripts with NM IDs.
> It works fine for all but a small list of IDs. The large majority of the failed IDs have been suppressed from NCBI
> because they were found to be a "nonsense-mediated mRNA decay (NMD) candidate", so I do not mind eliminating those records from my query.
>  
> However, some of the ones that fail are in the NCBI database and for some reason Ensembl is not able to query them:
> NM_001040409.1
It is an NMD transcript.
> NM_001167607.1
It is an exon supporting feature and not a transcript supporting feature. I think this is the reason you don't get it with your script:
http://www.ensembl.org/Homo_sapiens/Transcript/SupportingEvidence?db=core;g=ENSG00000196743;r=5:150591711-150650001;t=ENST00000523466
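
If you want to check this from a script rather than the web page, something along these lines might help. It is only a sketch: the helper name evidence_level is made up for illustration, $transcript is assumed to be a Bio::EnsEMBL::Transcript you have already fetched, and I am assuming the hit name of each supporting feature is stored with its version suffix (e.g. NM_001167607.1).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): report whether an
# accession supports the transcript itself, only one of its exons, or neither.
sub evidence_level {
    my ($transcript, $accession) = @_;

    # Transcript-level supporting features.
    my $at_transcript = scalar grep { $_->hseqname eq $accession }
        @{ $transcript->get_all_supporting_features };
    return 'transcript' if $at_transcript;

    # Exon-level supporting features, checked exon by exon.
    foreach my $exon ( @{ $transcript->get_all_Exons } ) {
        my $at_exon = scalar grep { $_->hseqname eq $accession }
            @{ $exon->get_all_supporting_features };
        return 'exon' if $at_exon;
    }

    return 'none';
}
```

A result of 'exon' would correspond to the case described here, where the sequence supports an exon but not the transcript as a whole.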
> NM_001199987.1
If you look in the Gene database at NCBI you will see that there are 2 other sequences for NDUFB6, which are the transcript supporting features for the 2 transcripts in Ensembl for this gene.
http://www.ncbi.nlm.nih.gov/gene/?term=NM_001199987.1
http://www.ensembl.org/Homo_sapiens/Transcript/SupportingEvidence?db=core;g=ENSG00000165264;r=9:32552997-32573160;t=ENST00000379847
> NM_001204090.1
Same problem as above, we did not use this sequence.

What you can do is use the HGNC symbol for these failing IDs in the fetch_all_by_external_name method, e.g. NM_001199987.1 -> NDUFB6
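
A rough sketch of that fallback, built on fetch_all_by_external_name (the call already used in your script). The helper name fetch_with_symbol_fallback and the %symbol_for map are made up for illustration. I am also assuming the HGNC symbol is attached to the gene rather than the transcript, so the fallback goes through a gene adaptor and returns every transcript of the matching gene; you would still need to pick the one you want.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): look up transcripts by
# accession, falling back to a gene symbol when the accession finds nothing.
sub fetch_with_symbol_fallback {
    my ($transcript_adaptor, $gene_adaptor, $accession, $symbol_for) = @_;

    # First try the accession itself, as in the quoted script.
    my @hits = @{ $transcript_adaptor->fetch_all_by_external_name($accession) };
    return @hits if @hits;

    # Fall back to the gene symbol, if one is known for this accession.
    my $symbol = $symbol_for->{$accession};
    return () unless defined $symbol;

    # Assumption: the symbol is a gene-level xref, so collect the transcripts
    # of every gene it resolves to.
    return map { @{ $_->get_all_Transcripts } }
        @{ $gene_adaptor->fetch_all_by_external_name($symbol) };
}

# Hand-made map; only the NDUFB6 pair comes from this thread.
my %symbol_for = ( 'NM_001199987.1' => 'NDUFB6' );
```

The two adaptors would come from the registry in the usual way (for example the 'Transcript' and 'Gene' adaptors for human core), and the map has to be filled in by hand for the remaining IDs.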

Regards
Thibaut

> NM_001242881.1
> NM_014249.2
> NM_015584.3
> NM_024728.2
>  
> Can you tell me how to retrieve these from the database?
>  
> Here is the portion of my script I use to retrieve the data:
>  
> foreach my $query_transcript (@transcripts_of_interest) {
>     chomp $query_transcript;
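>     my $transcript;
>     if ($query_transcript =~ /ENST/i) {
>         # Ensembl stable IDs can be fetched directly
>         $transcript = $transcript_adaptor->fetch_by_stable_id($query_transcript);
>     }
>     else {
>         # External names (e.g. RefSeq NM IDs) can match several transcripts; keep the first
>         ($transcript) = @{ $transcript_adaptor->fetch_all_by_external_name($query_transcript) };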
>     }
>  
>     unless ($transcript){
>         $progress->message("Query: $query_transcript failed");
>         next;
>     }
>     foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
>         my $estring = feature2string($exon);
>         print "$query_transcript:\t$estring\n";
>     }
>     $next_update = $progress->update() if (++$j > $next_update);
>  
> }
>  
> Best regards
>  
> Duarte Molha
>  
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
