[ensembl-dev] Use of uninitialized value $xref_display_label in substitution (s///) in TranscriptAdaptor.pm

Mon Mar 2 16:25:27 GMT 2015

Dear Juan,

The warning you’re seeing is non-fatal. It is a side effect of a recent bugfix and doesn’t affect your output. I’ve committed a minor patch to release 78 to stop it. You can either carry on and ignore the warning, or update your checkout and it’ll go away.
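
For anyone curious how a warning like this arises, here is a minimal sketch. It is not the actual patch: the helper names and the regex are hypothetical, and it only assumes that the display label can be undef for some transcripts. Running s/// on an undefined value is harmless but noisy under "use warnings", and a defined() check silences it.

```perl
use strict;
use warnings;

# Hypothetical helper: trims a trailing "-<digits>" suffix from a label.
# The regex is illustrative only, not the one in TranscriptAdaptor.pm.
sub trim_label_unguarded {
    my ($xref_display_label) = @_;
    $xref_display_label =~ s/-\d+$//;    # warns when the label is undef
    return $xref_display_label;
}

# Same helper, but the substitution only runs when a label is present.
sub trim_label_guarded {
    my ($xref_display_label) = @_;
    $xref_display_label =~ s/-\d+$// if defined $xref_display_label;
    return $xref_display_label;
}

# Count the warnings raised while running a code reference.
sub count_warnings {
    my ($code) = @_;
    my $n = 0;
    local $SIG{__WARN__} = sub { $n++ };
    $code->();
    return $n;
}

my $unguarded = count_warnings(sub { trim_label_unguarded(undef) });
my $guarded   = count_warnings(sub { trim_label_guarded(undef) });

print "unguarded: $unguarded warning(s), guarded: $guarded warning(s)\n";
```

In both versions the return value for an undefined label is still undef, which is why the warning has no effect on the sequences that get written out.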

Regards,

Kieron Taylor PhD.
Ensembl Core senior software developer

EMBL, European Bioinformatics Institute

> On 2 Mar 2015, at 16:02, Juan Pascual Anaya <jpascualanaya at gmail.com> wrote:
> 
> Hi there,
> 
> I was trying to run the following script to get either cDNA or CDS sequences of canonical transcripts of protein coding genes of a given species:
> 
> use strict;
> use warnings;
> 
> use Bio::EnsEMBL::Registry;
> use Data::Dumper;
> 
> my $species=$ARGV[0];
> my $minlength=$ARGV[1];
> my $seqtype=$ARGV[2] // '';
> if ($seqtype ne "cds" && $seqtype ne "cdna"){
>     print "Select the type of sequence you want to retrieve: cdna or cds. \"$seqtype\" is not either.\nUsage: perl retrieve_protcod_canonicaltranscript.pl speciesname mintranscriptlength cdna/cds\nExample: perl retrieve_protcod_canonicaltranscript.pl pelodiscus_sinensis 200 cdna\n\n";
>     exit;
> }
> 
> my $registry = "Bio::EnsEMBL::Registry";
> $registry->load_registry_from_multiple_dbs(
>     {-host => 'mysql-eg-publicsql.ebi.ac.uk',
>      -port => 4157, 
>      -user => 'anonymous'
>     },
>     {-host => 'ensembldb.ensembl.org',
>      -port => 5306,
>      -user    => 'anonymous'
>     }
> );
> 
> unless (open(OUTPUT, '>', "${species}_protcod_canonicaltranscript$seqtype.fa")){
>     print STDERR "Cannot create an output file: $!\n\n";
>     exit;
> }
> 
> my $gene_adaptor  = $registry->get_adaptor( $species, 'Core', 'Gene' );
> my @gene_ids= @{$gene_adaptor->list_stable_ids()};
> my $count = 0;
> my $noncodingcount = 0;
> 
> while (my $gene_id = shift @gene_ids) {
>     my $gene = $gene_adaptor->fetch_by_stable_id($gene_id);
>     if ($gene->biotype eq 'protein_coding'){
>         my $cds = $gene->canonical_transcript()->translateable_seq();
>         my $cdna = $gene->canonical_transcript()->spliced_seq();
>         $count++;
>         if (length $cdna >= $minlength && $seqtype eq 'cdna') {
>             print OUTPUT ">$gene_id\n$cdna\n";
>         } elsif (length $cds >= $minlength && $seqtype eq 'cds') {
>             print OUTPUT ">$gene_id\n$cds\n";
>         } else {
>             next;
>         }
>     } else {
>         $noncodingcount++;
>         next;
>     }
> }
> 
> my $total = $count + $noncodingcount;
> print "Found $count protein coding genes.\nThere were $noncodingcount non-coding genes.\nIn total, there are $total genes. Check this number to be sure.\nWritten fasta file with their canonical transcript sequence in ${species}_protcod_canonicaltranscript$seqtype.fa\n\n";
> exit;
> 
> and although it seems to do what it should, I get this on STDERR when it processes protein-coding genes:
> 
> Use of uninitialized value $xref_display_label in substitution (s///) at /home/champi/Software/ensemblAPI/src/ensembl/modules/Bio/EnsEMBL/DBSQL/TranscriptAdaptor.pm line 1861.
> And I don't know why. The script looks OK, and it finds the sequences that I'm looking for... 
> 
> I have installed the API branch/release 78, and was running the script for Nematostella vectensis.
> 
> Any help is very much appreciated.
> 
> Best,
> 
> Juan
> 
> 
> -- 
> 
> Juan Pascual-Anaya, PhD
> Research Scientist
> Evolutionary Morphology Laboratory, RIKEN
> 2-2-3 Minatojima-minamimachi
> Chuo-ku, Kobe, Hyogo 650-0047
> Japan
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/