[ensembl-dev] getting exons from database directly

Bert Overduin bert at ebi.ac.uk
Fri May 6 16:44:11 BST 2011


Hi,

When I use the following code:

   #!/usr/bin/perl


use strict;

use Bio::EnsEMBL::Registry;


my $reg = "Bio::EnsEMBL::Registry";


$reg->load_registry_from_db( -host => 'ensembldb.ensembl.org', -user =>
'anonymous' );


my $exon_adaptor  = $reg->get_adaptor( 'Bos taurus', 'Core', 'Exon' );


my $exons = $exon_adaptor->fetch_all;


print scalar( @{$exons} ), "\n";

I get:

farm2-head2[bert]2: perl test.pl
225837

Which is the same number I get with a MySQL query:

mysql -u anonymous -h ensembldb.ensembl.org -P 5306
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8610 to server version: 5.1.34-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use bos_taurus_core_62_4k
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> SELECT COUNT(*) FROM exon;
+----------+
| COUNT(*) |
+----------+
|   225837 |
+----------+
1 row in set (0.01 sec)

Cheers,
Bert


On Fri, May 6, 2011 at 4:22 PM, Andrea Edwards <edwardsa at cs.man.ac.uk>wrote:

>  I tried 2 ways :
>
> ===============================================
>
> my $gene_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core', 'Gene' );
> my $genes = $gene_adaptor->fetch_all();
>
> my $exon_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core', 'Exon' );
> $total_genes=0;
> $exon_count = 0;
> foreach $gene(@{$genes}) {
>     $total_genes++;
>
>     foreach $exon ($gene->get_all_Exons()) {
>       $exon_count++;
>     }
> } #end for each gene
>
>
> =============================================
>
> This way gave even less (23k) but i'm being stricter here about the
> chromosomes
>
> @slices = @{ $slice_adaptor->fetch_all('chromosome', undef, 0, 1) };
>
> $total_genes=0;
> $exon_count = 0;
> foreach $slice (@slices) {
>     unless ($slice->seq_region_name() =~ /Un/) {
>         print $slice->seq_region_name."\n";
>         my $genes = $gene_adaptor->fetch_all_by_Slice($slice);
>
>
>         foreach my $gene(@{$genes}) {
>             $total_genes++;
>
>             foreach my $exon ($gene->get_all_Exons()) {
>                   $exon_count++;
>                   print "$exon_count\n";
>             }
>
>
>
>
>         } #end for each gene
>     }
> }
>
> ==============================================
>
> But neither give anything like the sql results
>
> Why does the sql give so many more? Which should I use?
>
> thank you
>
>
>
> On 06/05/11 15:50, Bert Overduin wrote:
>
> Hi Andrea,
>
>  I suspect that your BioMart results are truncated because the query is
> too large.
>
>  However, that doesn't explain your API results .... How does your API
> code look like?
>
>  Cheers,
> Bert
>
> On Fri, May 6, 2011 at 3:45 PM, Andrea Edwards <edwardsa at cs.man.ac.uk>wrote:
>
>> Hello
>>
>> I'm sorry for the basic question but I was looking at the ensembl core
>> schema and trying to retrieve just the exons on chromosomes and couldn't
>> work out why i am getting such different figures than with biomart and the
>> perl api
>>
>> For example for cow there are 25670 exons in genes with biomart and the
>> api but with this sql  ~210k exons. This code is just looking for exons on
>> chromosomes 1-30 and X
>>
>> select count(distinct stable_id) from exon e inner join exon_stable_id es
>> using(exon_id) inner join seq_region sr using(seq_region_id) where
>> sr.coord_system_id = 2 and sr.name REGEXP '^[1-9]|^X'  and e.is_current=1
>>
>> I get 8k just on chromosome 1
>>
>> I'm sure this is simple and perhaps its because its Friday afternoon but
>> I'm just not seeing it!!
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe):
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>
>
>
> --
> Bert Overduin, Ph.D.
> Vertebrate Genomics Team
>
> EMBL - European Bioinformatics Institute
> Wellcome Trust Genome Campus
> Hinxton, Cambridge CB10 1SD
> United Kingdom
>
> http://www.ebi.ac.uk/~bert
>
>  Ensembl browser: http://www.ensembl.org
>
> Mailing lists: http://www.ensembl.org/info/about/contact/mailing.html
>
> Blog: http://www.ensembl.info
>
> YouTube: http://www.youtube.com/user/EnsemblHelpdesk
> Facebook: http://www.facebook.com/Ensembl.org
> Twitter: http://twitter.com/Ensembl
>
>
>


-- 
Bert Overduin, Ph.D.
Vertebrate Genomics Team

EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom

http://www.ebi.ac.uk/~bert

Ensembl browser: http://www.ensembl.org

Mailing lists: http://www.ensembl.org/info/about/contact/mailing.html

Blog: http://www.ensembl.info

YouTube: http://www.youtube.com/user/EnsemblHelpdesk
Facebook: http://www.facebook.com/Ensembl.org
Twitter: http://twitter.com/Ensembl
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20110506/d5f25771/attachment.html>


More information about the Dev mailing list