[ensembl-dev] The number of spanning reads for all genes

J.vandeVorst at gen.umcn.nl J.vandeVorst at gen.umcn.nl
Thu Nov 22 09:39:23 GMT 2012


Dear Thibaut,

I appreciate your quick response. I now have an idea how to get the number of spanning reads, but I have another question. Because I would like to have the number of spanning reads per gene, I fetch the dna features that overlap the gene and count all the reads as you can see in the code below. But as I print the feature IDs, I see only introns. Why are there no exons? And sometimes a big number of reads is shown, like 1728.44444444444. Is this an artefact or something?

Regard Maartje

#!/usr/bin/env perl

use strict;
use warnings;
use Bio::EnsEMBL::DBSQL::DBAdaptor;

# Connection to the rnaseq database
my $rnaseqdb = new Bio::EnsEMBL::DBSQL::DBAdaptor(
        -host =>  'ensembldb.ensembl.org',
        -port =>  5306,
        -user =>  'anonymous',
        -dbname =>  'homo_sapiens_rnaseq_69_37');

# The GeneAdaptor and the DnaFeatureAdaptor are needed
my $rnaseqsa = $rnaseqdb->get_SliceAdaptor();
my $rnaseqaa = $rnaseqdb->get_AnalysisAdaptor();
my $rnaseqdafa = $rnaseqdb->get_DnaAlignFeatureAdaptor();
foreach my $slice (@{$rnaseqsa->fetch_all('toplevel')}) {
    foreach my $analysis (@{$rnaseqaa->fetch_all()}) {
        # To analyse with a logic name
        my $logic_name = $analysis->logic_name;
        next if ($logic_name !~ /rnaseq/);
        print 'Looking at genes from analysis: ', $logic_name, "\n";
        # To load the gene
        foreach my $gene (@{$slice->get_all_Genes_by_type(undef, $logic_name, 1)}) {
            print $gene->display_id, ':', $gene->start, ':', $gene->end, ':', $gene->strand, ' on chromosome ', $gene->seq_region_name, "\n";
            # Rnaseq models have only one transcript per gene
            foreach my $transcript (@{$gene->get_all_Transcripts()}) {
                print $transcript->display_id, ':', $transcript->start, ':', $transcript->end, '  length: ', $transcript->length, "\n";
               my $tslice = $transcript->feature_Slice();
                # To fetch the dna features that overlap the gene
                my $features = $rnaseqdafa->fetch_all_by_Slice($tslice);
                my $read_count = 0;
                foreach my $feature (@{$features}) {
                        print "\t", $feature->display_id, ':', $feature->seq_region_start, ':', $feature->seq_region_end, "\tNumber of spanning reads: ", $feature->score, "\n";
                        # To count all the reads that span over a splice site
                        $read_count += $feature->score;
                }
                print "\tNumber of spanning reads: ", $read_count, "\n";
            }
            print "\n";
        }
    }
}


Van: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] Namens Thibaut Hourlier
Verzonden: 19 November 2012 17:18
Aan: dev at ensembl.org
Onderwerp: Re: [ensembl-dev] The number of spanning reads for all genes

Dear Maartje,
First I have to warn you that for next release which is due in January we will update the models of the RNASeq data for human. These models have been generated with the improved pipeline.

It is "normal" that you can't get what you want because of the way we put the data in the database. You will need to use a DnaAlignFeatureAdaptor.
Did you have a look at the perl script on this thread: http://lists.ensembl.org/pipermail/dev/2012-September/008071.html. It's not exactly what you're trying to do but it is similar and it shows a way of getting the number of spanning reads. It is stored as the score of the DnaAlignFeature linked to the exons.

Regards
Thibaut
On 19/11/12 13:39, J.vandeVorst at gen.umcn.nl<mailto:J.vandeVorst at gen.umcn.nl> wrote:
Hello Ensemble team,

I was trying to get the number of spanning reads for one gene by using the API and the database homo_sapiens_rnaseq_69-37. I used the following Perl code, but I didn't get any results. The API version I use is 69 and I would like to get the number of spanning reads for all genes. I hope you can help me.

Thank you!

Regards Maartje

use Bio::EnsEMBL::DBSQL::DBAdaptor;

$db = new Bio::EnsEMBL::DBSQL::DBAdaptor(
                 -host =>  'ensembldb.ensembl.org',
                 -port =>  5306,
                 -user =>  'anonymous',
                 -dbname =>  'homo_sapiens_core_69_37',
                 -species => 'Homo_sapiens');
$ga = $db->get_GeneAdaptor();
$gene = $ga->fetch_by_stable_id("ENSG00000109061");
$slice = $gene->slice;
$rnaseqdb = new Bio::EnsEMBL::DBSQL::DBAdaptor(
                 -host =>  'ensembldb.ensembl.org',
                 -port =>  5306,
                 -user =>  'anonymous',
                 -dbname =>  'homo_sapiens_rnaseq_69_37');
$rnaseqsa = $rnaseqdb->get_SliceAdaptor();
$rnaseqslice = $rnaseqsa->fetch_by_name($slice->name);
@transcripts = @{$rnaseqslice->get_all_Transcripts('skeletal_rnaseq')};
foreach my $transcript (@transcripts) {
     foreach my $sf (@{$transcript->get_all_supporting_features()}) {
        #The number of reads that spanned accross the intron
        print STDOUT $sf->hit_name, ' :', $sf->score, "\n";
    }
}



Het UMC St Radboud staat geregistreerd bij de Kamer van Koophandel in het handelsregister onder nummer 41055629.
The Radboud University Nijmegen Medical Centre is listed in the Commercial Register of the Chamber of Commerce under file number 41055629.




_______________________________________________

Dev mailing list    Dev at ensembl.org<mailto:Dev at ensembl.org>

Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev

Ensembl Blog: http://www.ensembl.info/




Het UMC St Radboud staat geregistreerd bij de Kamer van Koophandel in het handelsregister onder nummer 41055629.
The Radboud University Nijmegen Medical Centre is listed in the Commercial Register of the Chamber of Commerce under file number 41055629.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20121122/37e7ae23/attachment.html>


More information about the Dev mailing list