[ensembl-dev] getting exons from database directly

Andrea Edwards edwardsa at cs.man.ac.uk
Fri May 6 17:04:38 BST 2011


Hi Bert,


I agree with your code and i agree with the sql i originally posted. I 
don't understand why biomart and this perl code (below)
are only returning 25k, and it seems too coincidental they are returning 
the same number

The difference in results is huge. What exons is this code missing? All 
i could think of was predicted exons but it seems
unlikely there are 25k known exons and (250k-25k = 225k) predicted exons 
not assigned to genes. I don't even know if ensembl deals with predicted 
exons.
I got the same 'discrepancy' with figures when i tested human too.

===============================================

my $gene_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core', 'Gene' );
my $genes = $gene_adaptor->fetch_all();


$total_genes=0;
$exon_count = 0;
foreach $gene(@{$genes}) {
     $total_genes++;

     foreach $exon ($gene->get_all_Exons()) {
       $exon_count++;
     }
} #end for each gene


=============================================


Thank you very much

On 06/05/11 16:44, Bert Overduin wrote:
> Hi,
>
> When I use the following code:
>
> #!/usr/bin/perl
>
>
> use strict;
>
> use Bio::EnsEMBL::Registry;
>
>
> my $reg = "Bio::EnsEMBL::Registry";
>
>
> $reg->load_registry_from_db( -host => 'ensembldb.ensembl.org 
> <http://ensembldb.ensembl.org>', -user => 'anonymous' );
>
>
> my $exon_adaptor  = $reg->get_adaptor( 'Bos taurus', 'Core', 'Exon' );
>
>
> my $exons = $exon_adaptor->fetch_all;
>
>
> print scalar( @{$exons} ), "\n";
>
>
> I get:
>
> farm2-head2[bert]2: perl test.pl <http://test.pl>
> 225837
>
> Which is the same number I get with a MySQL query:
>
> mysql -u anonymous -h ensembldb.ensembl.org 
> <http://ensembldb.ensembl.org> -P 5306
> Welcome to the MySQL monitor.  Commands end with ; or \g.
> Your MySQL connection id is 8610 to server version: 5.1.34-log
>
> Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
>
> mysql> use bos_taurus_core_62_4k
> Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup with -A
>
> Database changed
> mysql> SELECT COUNT(*) FROM exon;
> +----------+
> | COUNT(*) |
> +----------+
> |   225837 |
> +----------+
> 1 row in set (0.01 sec)
>
> Cheers,
> Bert
>
>
> On Fri, May 6, 2011 at 4:22 PM, Andrea Edwards <edwardsa at cs.man.ac.uk 
> <mailto:edwardsa at cs.man.ac.uk>> wrote:
>
>     I tried 2 ways :
>
>     ===============================================
>
>     my $gene_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core',
>     'Gene' );
>     my $genes = $gene_adaptor->fetch_all();
>
>     my $exon_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core',
>     'Exon' );
>     $total_genes=0;
>     $exon_count = 0;
>     foreach $gene(@{$genes}) {
>         $total_genes++;
>
>         foreach $exon ($gene->get_all_Exons()) {
>           $exon_count++;
>         }
>     } #end for each gene
>
>
>     =============================================
>
>     This way gave even less (23k) but i'm being stricter here about
>     the chromosomes
>
>     @slices = @{ $slice_adaptor->fetch_all('chromosome', undef, 0, 1) };
>
>     $total_genes=0;
>     $exon_count = 0;
>     foreach $slice (@slices) {
>         unless ($slice->seq_region_name() =~ /Un/) {
>             print $slice->seq_region_name."\n";
>             my $genes = $gene_adaptor->fetch_all_by_Slice($slice);
>
>
>             foreach my $gene(@{$genes}) {
>                 $total_genes++;
>
>                 foreach my $exon ($gene->get_all_Exons()) {
>                       $exon_count++;
>                       print "$exon_count\n";
>                 }
>
>
>
>
>             } #end for each gene
>         }
>     }
>
>     ==============================================
>
>     But neither give anything like the sql results
>
>     Why does the sql give so many more? Which should I use?
>
>     thank you
>
>
>
>     On 06/05/11 15:50, Bert Overduin wrote:
>>     Hi Andrea,
>>
>>     I suspect that your BioMart results are truncated because the
>>     query is too large.
>>
>>     However, that doesn't explain your API results .... How does your
>>     API code look like?
>>
>>     Cheers,
>>     Bert
>>
>>     On Fri, May 6, 2011 at 3:45 PM, Andrea Edwards
>>     <edwardsa at cs.man.ac.uk <mailto:edwardsa at cs.man.ac.uk>> wrote:
>>
>>         Hello
>>
>>         I'm sorry for the basic question but I was looking at the
>>         ensembl core schema and trying to retrieve just the exons on
>>         chromosomes and couldn't work out why i am getting such
>>         different figures than with biomart and the perl api
>>
>>         For example for cow there are 25670 exons in genes with
>>         biomart and the api but with this sql  ~210k exons. This code
>>         is just looking for exons on chromosomes 1-30 and X
>>
>>         select count(distinct stable_id) from exon e inner join
>>         exon_stable_id es using(exon_id) inner join seq_region sr
>>         using(seq_region_id) where sr.coord_system_id = 2 and sr.name
>>         <http://sr.name> REGEXP '^[1-9]|^X'  and e.is_current=1
>>
>>         I get 8k just on chromosome 1
>>
>>         I'm sure this is simple and perhaps its because its Friday
>>         afternoon but I'm just not seeing it!!
>>
>>         _______________________________________________
>>         Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>         List admin (including subscribe/unsubscribe):
>>         http://lists.ensembl.org/mailman/listinfo/dev
>>         Ensembl Blog: http://www.ensembl.info/
>>
>>
>>
>>
>>     -- 
>>     Bert Overduin, Ph.D.
>>     Vertebrate Genomics Team
>>
>>     EMBL - European Bioinformatics Institute
>>     Wellcome Trust Genome Campus
>>     Hinxton, Cambridge CB10 1SD
>>     United Kingdom
>>
>>     http://www.ebi.ac.uk/~bert <http://www.ebi.ac.uk/%7Ebert>
>>
>>     Ensembl browser: http://www.ensembl.org <http://www.ensembl.org/>
>>
>>     Mailing lists: http://www.ensembl.org/info/about/contact/mailing.html
>>
>>     Blog: http://www.ensembl.info <http://www.ensembl.info/>
>>
>>     YouTube: http://www.youtube.com/user/EnsemblHelpdesk
>>     Facebook: http://www.facebook.com/Ensembl.org
>>     Twitter: http://twitter.com/Ensembl
>>
>>
>
>
>
>
> -- 
> Bert Overduin, Ph.D.
> Vertebrate Genomics Team
>
> EMBL - European Bioinformatics Institute
> Wellcome Trust Genome Campus
> Hinxton, Cambridge CB10 1SD
> United Kingdom
>
> http://www.ebi.ac.uk/~bert <http://www.ebi.ac.uk/%7Ebert>
>
> Ensembl browser: http://www.ensembl.org <http://www.ensembl.org/>
>
> Mailing lists: http://www.ensembl.org/info/about/contact/mailing.html
>
> Blog: http://www.ensembl.info <http://www.ensembl.info/>
>
> YouTube: http://www.youtube.com/user/EnsemblHelpdesk
> Facebook: http://www.facebook.com/Ensembl.org
> Twitter: http://twitter.com/Ensembl
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20110506/a44ee255/attachment.html>


More information about the Dev mailing list