[ensembl-dev] Use of uninitialized value $xref_display_label in substitution (s///) in TranscriptAdaptor.pm

Mon Mar 2 16:02:17 GMT 2015

Hi there,

I was trying to run the following script to get either cDNA or CDS
sequences of canonical transcripts of protein coding genes of a given
species:

use strict;
use warnings;

use Bio::EnsEMBL::Registry;
use Data::Dumper;

my ($species, $minlength, $seqtype) = @ARGV;
# Require all three arguments and a valid sequence type.
if (@ARGV < 3 || ($seqtype ne "cds" && $seqtype ne "cdna")) {
    my $given = defined $seqtype ? $seqtype : '';
    print "Select the type of sequence you want to retrieve: cdna or cds. " .
          "\"$given\" is not either.\n" .
          "Usage: perl retrieve_protcod_canonicaltranscript.pl speciesname mintranscriptlength cdna/cds\n" .
          "Example: perl retrieve_protcod_canonicaltranscript.pl pelodiscus_sinensis 200 cdna\n\n";
    exit 1;
}

my $registry = "Bio::EnsEMBL::Registry";
$registry->load_registry_from_multiple_dbs(
    {-host => 'mysql-eg-publicsql.ebi.ac.uk',
     -port => 4157,
     -user => 'anonymous'
    },
    {-host => 'ensembldb.ensembl.org',
     -port => 5306,
     -user    => 'anonymous'
    }
);

# Three-argument open; report the system error if the file cannot be created.
unless (open(OUTPUT, '>', "${species}_protcod_canonicaltranscript$seqtype.fa")) {
    print "Cannot create an output file: $!\n\n";
    exit 1;
}

my $gene_adaptor  = $registry->get_adaptor( $species, 'Core', 'Gene' );
my @gene_ids= @{$gene_adaptor->list_stable_ids()};
my $count = 0;
my $noncodingcount = 0;

while (my $gene_id = shift @gene_ids) {
    my $gene = $gene_adaptor->fetch_by_stable_id($gene_id);
    if ($gene->biotype eq 'protein_coding'){
        my $cds = $gene->canonical_transcript()->translateable_seq();
        my $cdna = $gene->canonical_transcript()->spliced_seq();
        $count++;
        if (length $cdna >= $minlength && $seqtype eq 'cdna') {
            print OUTPUT ">$gene_id\n$cdna\n";
        } elsif (length $cds >= $minlength && $seqtype eq 'cds') {
            print OUTPUT ">$gene_id\n$cds\n";
        } else {
            next;
        }
    } else {
        $noncodingcount++;
        next;
    }
}

my $total = $count + $noncodingcount;
close(OUTPUT);
print "Found $count protein coding genes.\n" .
      "There were $noncodingcount non-coding genes.\n" .
      "In total, there are $total genes. Check this number to be sure.\n" .
      "Written fasta file with their canonical transcript sequence in " .
      "${species}_protcod_canonicaltranscript$seqtype.fa\n\n";
exit;

Although the script seems to do what it should, I get this message on
STDERR when processing protein coding genes:

Use of uninitialized value $xref_display_label in substitution (s///) at /home/champi/Software/ensemblAPI/src/ensembl/modules/Bio/EnsEMBL/DBSQL/TranscriptAdaptor.pm line 1861.

I don't know why. The script looks OK, and it retrieves the sequences that
I'm looking for.
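
For reference, a message of this form is a Perl warning rather than a fatal
error, which would be consistent with the sequences still being written. The
sketch below reproduces the same class of warning outside the API. Note that
clean_label is a hypothetical helper and NOT the code at line 1861 of
TranscriptAdaptor.pm; the idea that a record lacks a display label is an
assumption based only on the variable name in the message.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the actual Ensembl code: it runs a substitution
# on a label that may be undef (assumed here: a record with no display label).
sub clean_label {
    my ($xref_display_label) = @_;
    $xref_display_label =~ s/\s+$//;    # warns when the value is undef
    return $xref_display_label;
}

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $with_label    = clean_label("GENE1  ");
my $without_label = clean_label(undef);    # execution continues after this

print "with label: '$with_label'\n";
print "warnings caught: ", scalar(@warnings), "\n";
print $warnings[0] if @warnings;
```

The call with undef does not stop the program; it only emits the warning,
which is why output can look complete despite the messages on STDERR.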

I have installed the API branch/release 78, and was running the script for
Nematostella vectensis.
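
To narrow down which genes trigger the message, each warning could be tagged
with the stable ID being processed at the time. A minimal sketch of that
idea, in which process_gene is a hypothetical stand-in for the per-gene API
calls in the script above and the gene IDs are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-gene API calls; it produces an undef
# label for IDs ending in an odd digit, purely for illustration.
sub process_gene {
    my ($gene_id) = @_;
    my $label = ($gene_id =~ /[13579]$/) ? undef : "label_$gene_id";
    $label =~ s/^label_//;    # emits the uninitialized-value warning on undef
    return $label;
}

my %warned;    # stable ID => list of warning messages seen for that gene
for my $gene_id (qw(GENE0001 GENE0002 GENE0003)) {
    # Record every warning raised while this gene is being processed.
    local $SIG{__WARN__} = sub { push @{ $warned{$gene_id} }, $_[0] };
    process_gene($gene_id);
}

print "Genes that triggered warnings: ", join(", ", sort keys %warned), "\n";
```

In the real script the same handler could be set with local at the top of
the while loop body, so the affected stable IDs can be listed and inspected.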

Any help is very much appreciated.

Best,

Juan

-- 

Juan Pascual-Anaya, PhD
Research Scientist
Evolutionary Morphology Laboratory, RIKEN
2-2-3 Minatojima-minamimachi
Chuo-ku, Kobe, Hyogo 650-0047
Japan