[ensembl-dev] The number of spanning reads for all genes

Thibaut Hourlier th3 at sanger.ac.uk
Thu Dec 13 10:35:42 GMT 2012


Dear Maartje,
Sorry for the late reply.
The feature you are printing are the introns, you will need to call the 
method get_all_Exons from a transcript object, and you will need to look 
at the exon boundaries to see which exon correspond to an intron.

The extra number you that is printed correspond to the sum of all the 
intron spanning reads.

Regards
Thibaut

On 22/11/12 09:39, J.vandeVorst at gen.umcn.nl wrote:
>
> Dear Thibaut,
>
> I appreciate your quick response. I now have an idea how to get the 
> number of spanning reads, but I have another question. Because I would 
> like to have the number of spanning reads per gene, I fetch the dna 
> features that overlap the gene and count all the reads as you can see 
> in the code below. But as I print the feature IDs, I see only introns. 
> Why are there no exons? And sometimes a big number of reads is shown, 
> like 1728.44444444444. Is this an artefact or something?
>
> Regard Maartje
>
> #!/usr/bin/env perl
>
> use strict;
>
> use warnings;
>
> use Bio::EnsEMBL::DBSQL::DBAdaptor;
>
> # Connection to the rnaseq database
>
> my $rnaseqdb = new Bio::EnsEMBL::DBSQL::DBAdaptor(
>
> -host =>  'ensembldb.ensembl.org',
>
> -port =>  5306,
>
> -user =>  'anonymous',
>
> -dbname =>  'homo_sapiens_rnaseq_69_37');
>
> # The GeneAdaptor and the DnaFeatureAdaptor are needed
>
> my $rnaseqsa = $rnaseqdb->get_SliceAdaptor();
>
> my $rnaseqaa = $rnaseqdb->get_AnalysisAdaptor();
>
> my $rnaseqdafa = $rnaseqdb->get_DnaAlignFeatureAdaptor();
>
> foreach my $slice (@{$rnaseqsa->fetch_all('toplevel')}) {
>
> foreach my $analysis (@{$rnaseqaa->fetch_all()}) {
>
> # To analyse with a logic name
>
> my $logic_name = $analysis->logic_name;
>
> next if ($logic_name !~ /rnaseq/);
>
> print 'Looking at genes from analysis: ', $logic_name, "\n";
>
> # To load the gene
>
> foreach my $gene (@{$slice->get_all_Genes_by_type(undef, $logic_name, 
> 1)}) {
>
> print $gene->display_id, ':', $gene->start, ':', $gene->end, ':', 
> $gene->strand, ' on chromosome ', $gene->seq_region_name, "\n";
>
> # Rnaseq models have only one transcript per gene
>
> foreach my $transcript (@{$gene->get_all_Transcripts()}) {
>
> print $transcript->display_id, ':', $transcript->start, ':', 
> $transcript->end, '  length: ', $transcript->length, "\n";
>
>                my $tslice = $transcript->feature_Slice();
>
> # To fetch the dna features that overlap the gene
>
> my $features = $rnaseqdafa->fetch_all_by_Slice($tslice);
>
> my $read_count = 0;
>
> foreach my $feature (@{$features}) {
>
> print "\t", $feature->display_id, ':', $feature->seq_region_start, 
> ':', $feature->seq_region_end, "\tNumber of spanning reads: ", 
> $feature->score, "\n";
>
> # To count all the reads that span over a splice site
>
> $read_count += $feature->score;
>
> }
>
> print "\tNumber of spanning reads: ", $read_count, "\n";
>
> }
>
> print "\n";
>
> }
>
> }
>
> }
>
> *Van:*dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] *Namens 
> *Thibaut Hourlier
> *Verzonden:* 19 November 2012 17:18
> *Aan:* dev at ensembl.org
> *Onderwerp:* Re: [ensembl-dev] The number of spanning reads for all genes
>
> Dear Maartje,
> First I have to warn you that for next release which is due in January 
> we will update the models of the RNASeq data for human. These models 
> have been generated with the improved pipeline.
>
> It is "normal" that you can't get what you want because of the way we 
> put the data in the database. You will need to use a 
> DnaAlignFeatureAdaptor.
> Did you have a look at the perl script on this thread: 
> http://lists.ensembl.org/pipermail/dev/2012-September/008071.html. 
> It's not exactly what you're trying to do but it is similar and it 
> shows a way of getting the number of spanning reads. It is stored as 
> the score of the DnaAlignFeature linked to the exons.
>
> Regards
> Thibaut
>
> On 19/11/12 13:39, J.vandeVorst at gen.umcn.nl 
> <mailto:J.vandeVorst at gen.umcn.nl> wrote:
>
>     Hello Ensemble team,
>
>     I was trying to get the number of spanning reads for one gene by
>     using the API and the database homo_sapiens_rnaseq_69-37. I used
>     the following Perl code, but I didn't get any results. The API
>     version I use is 69 and I would like to get the number of spanning
>     reads for all genes. I hope you can help me.
>
>
>     Thank you!
>
>     Regards Maartje
>
>     use Bio::EnsEMBL::DBSQL::DBAdaptor;
>
>     $db = new Bio::EnsEMBL::DBSQL::DBAdaptor(
>
>                      -host =>  'ensembldb.ensembl.org',
>
>                      -port =>  5306,
>
>                      -user =>  'anonymous',
>
>     -dbname =>  'homo_sapiens_core_69_37',
>
>     -species => 'Homo_sapiens');
>
>     $ga = $db->get_GeneAdaptor();
>
>     $gene = $ga->fetch_by_stable_id("ENSG00000109061");
>
>     $slice = $gene->slice;
>
>     $rnaseqdb = new Bio::EnsEMBL::DBSQL::DBAdaptor(
>
>                      -host =>  'ensembldb.ensembl.org',
>
>                      -port =>  5306,
>
>                      -user =>  'anonymous',
>
>               -dbname =>  'homo_sapiens_rnaseq_69_37');
>
>     $rnaseqsa = $rnaseqdb->get_SliceAdaptor();
>
>     $rnaseqslice = $rnaseqsa->fetch_by_name($slice->name);
>
>     @transcripts =
>     @{$rnaseqslice->get_all_Transcripts('skeletal_rnaseq')};
>
>     foreach my $transcript (@transcripts) {
>
>          foreach my $sf (@{$transcript->get_all_supporting_features()}) {
>
>             #The number of reads that spanned accross the intron
>
>             print STDOUT $sf->hit_name, ' :', $sf->score, "\n";
>
>         }
>
>     }
>
>
>     Het UMC St Radboud staat geregistreerd bij de Kamer van Koophandel
>     in het handelsregister onder nummer 41055629.
>     The Radboud University Nijmegen Medical Centre is listed in the
>     Commercial Register of the Chamber of Commerce under file number
>     41055629.
>
>
>
>
>     _______________________________________________
>
>     Dev mailing listDev at ensembl.org  <mailto:Dev at ensembl.org>
>
>     Posting guidelines and subscribe/unsubscribe info:http://lists.ensembl.org/mailman/listinfo/dev
>
>     Ensembl Blog:http://www.ensembl.info/
>
>
> Het UMC St Radboud staat geregistreerd bij de Kamer van Koophandel in 
> het handelsregister onder nummer 41055629.
> The Radboud University Nijmegen Medical Centre is listed in the 
> Commercial Register of the Chamber of Commerce under file number 41055629.
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20121213/bac2a6b9/attachment.html>


More information about the Dev mailing list