[ensembl-dev] getting exons from database directly

Rhoda Kinsella rhoda at ebi.ac.uk
Mon May 9 09:05:50 BST 2011


Hi Andrea
Can you send me the details of your biomart query so that I can look  
into the issue?
Regards
Rhoda

On 6 May 2011, at 17:04, Andrea Edwards wrote:

> Hi Bert,
>
>
> I agree with your code and i agree with the sql i originally posted.  
> I don't understand why biomart and this perl code (below)
> are only returning 25k, and it seems too coincidental they are  
> returning the same number
>
> The difference in results is huge. What exons is this code missing?  
> All i could think of was predicted exons but it seems
> unlikely there are 25k known exons and (250k-25k = 225k) predicted  
> exons not assigned to genes. I don't even know if ensembl deals with  
> predicted exons.
> I got the same 'discrepancy' with figures when i tested human too.
>
> ===============================================
>
> my $gene_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core',  
> 'Gene' );
> my $genes = $gene_adaptor->fetch_all();
>
>
> $total_genes=0;
> $exon_count = 0;
> foreach $gene(@{$genes}) {
>     $total_genes++;
>
>     foreach $exon ($gene->get_all_Exons()) {
>       $exon_count++;
>     }
> } #end for each gene
>
>
> =============================================
>
>
> Thank you very much
>
> On 06/05/11 16:44, Bert Overduin wrote:
>>
>> Hi,
>>
>> When I use the following code:
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use Bio::EnsEMBL::Registry;
>>
>> my $reg = "Bio::EnsEMBL::Registry";
>>
>> $reg->load_registry_from_db( -host => 'ensembldb.ensembl.org', - 
>> user => 'anonymous' );
>>
>> my $exon_adaptor  = $reg->get_adaptor( 'Bos taurus', 'Core',  
>> 'Exon' );
>>
>> my $exons = $exon_adaptor->fetch_all;
>>
>> print scalar( @{$exons} ), "\n";
>>
>> I get:
>>
>> farm2-head2[bert]2: perl test.pl
>> 225837
>>
>> Which is the same number I get with a MySQL query:
>>
>> mysql -u anonymous -h ensembldb.ensembl.org -P 5306
>> Welcome to the MySQL monitor.  Commands end with ; or \g.
>> Your MySQL connection id is 8610 to server version: 5.1.34-log
>>
>> Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
>>
>> mysql> use bos_taurus_core_62_4k
>> Reading table information for completion of table and column names
>> You can turn off this feature to get a quicker startup with -A
>>
>> Database changed
>> mysql> SELECT COUNT(*) FROM exon;
>> +----------+
>> | COUNT(*) |
>> +----------+
>> |   225837 |
>> +----------+
>> 1 row in set (0.01 sec)
>>
>> Cheers,
>> Bert
>>
>>
>> On Fri, May 6, 2011 at 4:22 PM, Andrea Edwards  
>> <edwardsa at cs.man.ac.uk> wrote:
>> I tried 2 ways :
>>
>> ===============================================
>>
>> my $gene_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core',  
>> 'Gene' );
>> my $genes = $gene_adaptor->fetch_all();
>>
>> my $exon_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core',  
>> 'Exon' );
>> $total_genes=0;
>> $exon_count = 0;
>> foreach $gene(@{$genes}) {
>>     $total_genes++;
>>
>>     foreach $exon ($gene->get_all_Exons()) {
>>       $exon_count++;
>>     }
>> } #end for each gene
>>
>>
>> =============================================
>>
>> This way gave even less (23k) but i'm being stricter here about the  
>> chromosomes
>>
>> @slices = @{ $slice_adaptor->fetch_all('chromosome', undef, 0, 1) };
>>
>> $total_genes=0;
>> $exon_count = 0;
>> foreach $slice (@slices) {
>>     unless ($slice->seq_region_name() =~ /Un/) {
>>         print $slice->seq_region_name."\n";
>>         my $genes = $gene_adaptor->fetch_all_by_Slice($slice);
>>
>>
>>         foreach my $gene(@{$genes}) {
>>             $total_genes++;
>>
>>             foreach my $exon ($gene->get_all_Exons()) {
>>                   $exon_count++;
>>                   print "$exon_count\n";
>>             }
>>
>>
>>
>>
>>         } #end for each gene
>>     }
>> }
>>
>> ==============================================
>>
>> But neither give anything like the sql results
>>
>> Why does the sql give so many more? Which should I use?
>>
>> thank you
>>
>>
>>
>> On 06/05/11 15:50, Bert Overduin wrote:
>>>
>>> Hi Andrea,
>>>
>>> I suspect that your BioMart results are truncated because the  
>>> query is too large.
>>>
>>> However, that doesn't explain your API results .... How does your  
>>> API code look like?
>>>
>>> Cheers,
>>> Bert
>>>
>>> On Fri, May 6, 2011 at 3:45 PM, Andrea Edwards <edwardsa at cs.man.ac.uk 
>>> > wrote:
>>> Hello
>>>
>>> I'm sorry for the basic question but I was looking at the ensembl  
>>> core schema and trying to retrieve just the exons on chromosomes  
>>> and couldn't work out why i am getting such different figures than  
>>> with biomart and the perl api
>>>
>>> For example for cow there are 25670 exons in genes with biomart  
>>> and the api but with this sql  ~210k exons. This code is just  
>>> looking for exons on chromosomes 1-30 and X
>>>
>>> select count(distinct stable_id) from exon e inner join  
>>> exon_stable_id es using(exon_id) inner join seq_region sr  
>>> using(seq_region_id) where sr.coord_system_id = 2 and  
>>> sr.nameREGEXP '^[1-9]|^X'  and e.is_current=1
>>>
>>> I get 8k just on chromosome 1
>>>
>>> I'm sure this is simple and perhaps its because its Friday  
>>> afternoon but I'm just not seeing it!!
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>>
>>> -- 
>>> Bert Overduin, Ph.D.
>>> Vertebrate Genomics Team
>>>
>>> EMBL - European Bioinformatics Institute
>>> Wellcome Trust Genome Campus
>>> Hinxton, Cambridge CB10 1SD
>>> United Kingdom
>>>
>>> http://www.ebi.ac.uk/~bert
>>>
>>> Ensembl browser: http://www.ensembl.org
>>> Mailing lists: http://www.ensembl.org/info/about/contact/ 
>>> mailing.html
>>> Blog: http://www.ensembl.info
>>> YouTube: http://www.youtube.com/user/EnsemblHelpdesk
>>> Facebook: http://www.facebook.com/Ensembl.org
>>> Twitter: http://twitter.com/Ensembl
>>>
>>
>>
>>
>>
>> -- 
>> Bert Overduin, Ph.D.
>> Vertebrate Genomics Team
>>
>> EMBL - European Bioinformatics Institute
>> Wellcome Trust Genome Campus
>> Hinxton, Cambridge CB10 1SD
>> United Kingdom
>>
>> http://www.ebi.ac.uk/~bert
>>
>> Ensembl browser: http://www.ensembl.org
>> Mailing lists: http://www.ensembl.org/info/about/contact/mailing.html
>> Blog: http://www.ensembl.info
>> YouTube: http://www.youtube.com/user/EnsemblHelpdesk
>> Facebook: http://www.facebook.com/Ensembl.org
>> Twitter: http://twitter.com/Ensembl
>>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton
Cambridge CB10 1SD,
UK.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20110509/0d70db8f/attachment.html>


More information about the Dev mailing list