[ensembl-dev] getting exons from database directly
Bert Overduin
bert at ebi.ac.uk
Fri May 6 16:44:11 BST 2011
Hi,
When I use the following code:
#!/usr/bin/perl
use strict;
use Bio::EnsEMBL::Registry;
my $reg = "Bio::EnsEMBL::Registry";
$reg->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);
my $exon_adaptor = $reg->get_adaptor( 'Bos taurus', 'Core', 'Exon' );
my $exons = $exon_adaptor->fetch_all;
print scalar( @{$exons} ), "\n";
I get:
farm2-head2[bert]2: perl test.pl
225837
Which is the same number I get with a MySQL query:
mysql -u anonymous -h ensembldb.ensembl.org -P 5306
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8610 to server version: 5.1.34-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> use bos_taurus_core_62_4k
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT COUNT(*) FROM exon;
+----------+
| COUNT(*) |
+----------+
|   225837 |
+----------+
1 row in set (0.01 sec)
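One thing that may be worth checking in the two loops quoted below: as far as I know, get_all_Exons returns a reference to an array rather than a list, so looping over its return value directly runs the loop body once per gene, and the exon count ends up equal to the gene count. A minimal sketch of the difference, using a hypothetical MockGene class in place of a real Bio::EnsEMBL::Gene object so that it runs without a database connection:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a gene object (not part of the Ensembl API).
# Like the real method, get_all_Exons returns an array REFERENCE here.
package MockGene;
sub new {
    my ( $class, @exons ) = @_;
    return bless { exons => [@exons] }, $class;
}
sub get_all_Exons { return $_[0]->{exons}; }

package main;

# Two made-up genes with 3 and 2 exons respectively.
my @genes = ( MockGene->new(qw(e1 e2 e3)), MockGene->new(qw(e4 e5)) );

my ( $without_deref, $with_deref ) = ( 0, 0 );
foreach my $gene (@genes) {
    # Loops over a one-element list holding the reference: one pass per gene.
    foreach my $exon ( $gene->get_all_Exons() ) {
        $without_deref++;
    }
    # Dereferences first, so the loop visits every exon.
    foreach my $exon ( @{ $gene->get_all_Exons() } ) {
        $with_deref++;
    }
}

print "without dereference: $without_deref\n";    # prints 2 (the gene count)
print "with dereference:    $with_deref\n";       # prints 5 (the exon count)
```

With the real API the equivalent change would be to loop over @{ $gene->get_all_Exons() } in both scripts.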
Cheers,
Bert
On Fri, May 6, 2011 at 4:22 PM, Andrea Edwards <edwardsa at cs.man.ac.uk> wrote:
> I tried two ways:
>
> ===============================================
>
> my $gene_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core', 'Gene' );
> my $genes = $gene_adaptor->fetch_all();
>
> my $exon_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core', 'Exon' );
> $total_genes=0;
> $exon_count = 0;
> foreach $gene(@{$genes}) {
>     $total_genes++;
>
>     foreach $exon ($gene->get_all_Exons()) {
>         $exon_count++;
>     }
> } #end for each gene
>
>
> =============================================
>
> This way gave even fewer (23k), but I'm being stricter here about the
> chromosomes:
>
> @slices = @{ $slice_adaptor->fetch_all('chromosome', undef, 0, 1) };
>
> $total_genes=0;
> $exon_count = 0;
> foreach $slice (@slices) {
>     unless ($slice->seq_region_name() =~ /Un/) {
>         print $slice->seq_region_name."\n";
>         my $genes = $gene_adaptor->fetch_all_by_Slice($slice);
>
>         foreach my $gene(@{$genes}) {
>             $total_genes++;
>
>             foreach my $exon ($gene->get_all_Exons()) {
>                 $exon_count++;
>                 print "$exon_count\n";
>             }
>         } #end for each gene
>     }
> }
>
> ==============================================
>
> But neither gives anything like the SQL results.
>
> Why does the SQL give so many more? Which should I use?
>
> Thank you
>
>
>
> On 06/05/11 15:50, Bert Overduin wrote:
>
> Hi Andrea,
>
> I suspect that your BioMart results are truncated because the query is
> too large.
>
> However, that doesn't explain your API results ... What does your API
> code look like?
>
> Cheers,
> Bert
>
> On Fri, May 6, 2011 at 3:45 PM, Andrea Edwards <edwardsa at cs.man.ac.uk> wrote:
>
>> Hello
>>
>> I'm sorry for the basic question, but I was looking at the Ensembl core
>> schema and trying to retrieve just the exons on chromosomes, and I couldn't
>> work out why I am getting such different figures from BioMart and the
>> Perl API.
>>
>> For example, for cow there are 25670 exons in genes with BioMart and the
>> API, but ~210k exons with this SQL. The query just looks for exons on
>> chromosomes 1-30 and X:
>>
>> SELECT COUNT(DISTINCT stable_id)
>> FROM exon e
>> INNER JOIN exon_stable_id es USING (exon_id)
>> INNER JOIN seq_region sr USING (seq_region_id)
>> WHERE sr.coord_system_id = 2
>>   AND sr.name REGEXP '^[1-9]|^X'
>>   AND e.is_current = 1
>>
>> I get 8k just on chromosome 1.
>>
>> I'm sure this is simple, and perhaps it's because it's Friday afternoon, but
>> I'm just not seeing it!!
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> List admin (including subscribe/unsubscribe):
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
--
Bert Overduin, Ph.D.
Vertebrate Genomics Team
EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom
http://www.ebi.ac.uk/~bert
Ensembl browser: http://www.ensembl.org
Mailing lists: http://www.ensembl.org/info/about/contact/mailing.html
Blog: http://www.ensembl.info
YouTube: http://www.youtube.com/user/EnsemblHelpdesk
Facebook: http://www.facebook.com/Ensembl.org
Twitter: http://twitter.com/Ensembl