[ensembl-dev] danio rerio RNAseq genes -- relative abundance exon RPKM averaging by gene slice

Albert Vilella avilella at ebi.ac.uk
Sat Oct 23 17:00:31 BST 2010


Hi,

I would like to calculate the relative abundance of genes based on the
exon RPKM values in the Danio rerio RNAseq genes.

I've written an ensembl API script that queries the danio rerio
otherfeautures database and looks at each RPKM as the $feature->p_value
for each exon. Then multiplies the RPKM value by the feature (exon)
length (L), and sums those values over the length of the gene slice.
This gives me the sums of the RPKM*L for all exons for that gene slice,
and I store this for every gene. Is this the right way of having
comparable values for different genes?

See script below. Thanks in advance,

Albert.

----

my $daf_adaptor =
Bio::EnsEMBL::Registry->get_adaptor($species_name,'otherfeatures','DnaAlignFeature');
print
"logic_name,gene_stable_id,member_name,sequence_length,total_number_of_mappable_reads\n";
my $total_length = 0;
my $total_number_of_mappable_reads = 0;
my $logic_name='zfish_male_head';
my $feature_sequence_hash; # hash to avoid repeating features for exon
sequence
foreach my $feature
(@{$daf_adaptor->fetch_all_by_Slice($gene->feature_Slice)}) {
  next unless ($feature->analysis->logic_name eq $logic_name);
  if ($feature->p_value) {
    my $feature_sequence = $feature->seq;
    next if (defined $feature_sequence_hash->{$feature_sequence});
    $feature_sequence_hash->{$feature_sequence} = 1;
    my $length = $feature->length;
    my $rpkm = $feature->p_value;
    my $number_of_mappable_reads = sprintf("%.2f",(($length * $rpkm) /
1000));
    $total_number_of_mappable_reads += $number_of_mappable_reads;
    $total_length += $length;
  }
}
print "$logic_name,$gene_stable_id,$sequence_length,
$total_number_of_mappable_reads\n";









More information about the Dev mailing list