[ensembl-dev] Use of uninitialized value $xref_display_label in substitution (s///) in TranscriptAdaptor.pm
Juan Pascual Anaya
jpascualanaya at gmail.com
Mon Mar 2 16:02:17 GMT 2015
Hi there,
I was trying to run the following script to get either cDNA or CDS
sequences of canonical transcripts of protein coding genes of a given
species:
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
use Data::Dumper;
my $species=$ARGV[0];
my $minlength=$ARGV[1];
my $seqtype=$ARGV[2];
if ($seqtype ne "cds" && $seqtype ne "cdna"){
    print "Select the type of sequence you want to retrieve: cdna or cds. \"$seqtype\" is not either.\n"
        . "Usage: perl retrieve_protcod_canonicaltranscript.pl speciesname mintranscriptlength cdna/cds\n"
        . "Example: perl retrieve_protcod_canonicaltranscript.pl pelodiscus_sinensis 200 cdna\n\n";
    exit;
}
my $registry = "Bio::EnsEMBL::Registry";
$registry->load_registry_from_multiple_dbs(
    {   -host => 'mysql-eg-publicsql.ebi.ac.uk',
        -port => 4157,
        -user => 'anonymous'
    },
    {   -host => 'ensembldb.ensembl.org',
        -port => 5306,
        -user => 'anonymous'
    }
);
unless (open(OUTPUT, '>', "${species}_protcod_canonicaltranscript$seqtype.fa")) {
    print "Cannot create an output file: $!\n\n";
    exit;
}
my $gene_adaptor = $registry->get_adaptor( $species, 'Core', 'Gene' );
my @gene_ids= @{$gene_adaptor->list_stable_ids()};
my $count = 0;
my $noncodingcount = 0;
while (my $gene_id = shift @gene_ids) {
    my $gene = $gene_adaptor->fetch_by_stable_id($gene_id);
    if ($gene->biotype eq 'protein_coding') {
        # Fetch both sequence types for the canonical transcript.
        my $cds  = $gene->canonical_transcript()->translateable_seq();
        my $cdna = $gene->canonical_transcript()->spliced_seq();
        $count++;
        # Write only the requested type, and only if it is long enough.
        if (length($cdna) >= $minlength && $seqtype eq 'cdna') {
            print OUTPUT ">$gene_id\n$cdna\n";
        } elsif (length($cds) >= $minlength && $seqtype eq 'cds') {
            print OUTPUT ">$gene_id\n$cds\n";
        } else {
            next;
        }
    } else {
        # Everything that is not protein_coding is counted as non-coding.
        $noncodingcount++;
        next;
    }
}
close(OUTPUT);
my $total = $count + $noncodingcount;
print "Found $count protein coding genes.\n"
    . "There were $noncodingcount non-coding genes.\n"
    . "In total, there are $total genes. Check this number to be sure.\n"
    . "Written fasta file with their canonical transcript sequence in ${species}_protcod_canonicaltranscript$seqtype.fa\n\n";
exit;
Although it seems to do what it should, I get this warning on STDERR while
it processes protein-coding genes:
Use of uninitialized value $xref_display_label in substitution (s///) at /home/champi/Software/ensemblAPI/src/ensembl/modules/Bio/EnsEMBL/DBSQL/TranscriptAdaptor.pm line 1861.
And I don't know why. The script looks OK, and it finds the sequences that
I'm looking for...
I have installed the API branch/release 78, and was running the script for
Nematostella vectensis.
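For reference, the same class of warning can be reproduced in a few lines of plain Perl, without the API. This is only a sketch. The variable name is copied from the message, and the idea that TranscriptAdaptor.pm ends up with an undefined label for some transcripts is an assumption on my side, not something I have confirmed in the API code or the database:

```perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Hypothetical stand-in for the variable named in the message. It is left
# undefined on purpose (assumption: the API sees undef for some transcripts).
my $xref_display_label;
$xref_display_label =~ s/foo/bar/;    # s/// on undef triggers the warning

# Guarded version: the substitution only runs when the value is defined.
my $guarded_label;
$guarded_label =~ s/foo/bar/ if defined $guarded_label;    # no warning

print "Captured: $_" for @warnings;
```

If this is indeed the same situation, the warning would be harmless for my purposes, since the script only uses the sequences and never the display label.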
Any help is very much appreciated.
Best,
Juan
--
Juan Pascual-Anaya, PhD
Research Scientist
Evolutionary Morphology Laboratory, RIKEN
2-2-3 Minatojima-minamimachi
Chuo-ku, Kobe, Hyogo 650-0047
Japan