[ensembl-dev] getting exons from database directly
Rhoda Kinsella
rhoda at ebi.ac.uk
Mon May 9 09:05:50 BST 2011
Hi Andrea
Can you send me the details of your BioMart query so that I can look
into the issue?
Regards
Rhoda
On 6 May 2011, at 17:04, Andrea Edwards wrote:
> Hi Bert,
>
>
> I agree with your code and I agree with the SQL I originally posted.
> I don't understand why BioMart and this Perl code (below)
> are only returning 25k, and it seems too coincidental that they are
> returning the same number.
>
> The difference in results is huge. Which exons is this code missing?
> All I could think of was predicted exons, but it seems
> unlikely there are 25k known exons and (250k-25k = 225k) predicted
> exons not assigned to genes. I don't even know if Ensembl deals with
> predicted exons.
> I got the same 'discrepancy' in the figures when I tested human too.
>
> ===============================================
>
> my $gene_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core',
> 'Gene' );
> my $genes = $gene_adaptor->fetch_all();
>
>
> $total_genes=0;
> $exon_count = 0;
> foreach $gene(@{$genes}) {
> $total_genes++;
>
> foreach $exon ($gene->get_all_Exons()) {
> $exon_count++;
> }
> } #end for each gene
>
>
> =============================================
>
>
> Thank you very much
>
> On 06/05/11 16:44, Bert Overduin wrote:
>>
>> Hi,
>>
>> When I use the following code:
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use Bio::EnsEMBL::Registry;
>>
>> my $reg = "Bio::EnsEMBL::Registry";
>>
>> $reg->load_registry_from_db( -host => 'ensembldb.ensembl.org',
>> -user => 'anonymous' );
>>
>> my $exon_adaptor = $reg->get_adaptor( 'Bos taurus', 'Core',
>> 'Exon' );
>>
>> my $exons = $exon_adaptor->fetch_all;
>>
>> print scalar( @{$exons} ), "\n";
>>
>> I get:
>>
>> farm2-head2[bert]2: perl test.pl
>> 225837
>>
>> Which is the same number I get with a MySQL query:
>>
>> mysql -u anonymous -h ensembldb.ensembl.org -P 5306
>> Welcome to the MySQL monitor. Commands end with ; or \g.
>> Your MySQL connection id is 8610 to server version: 5.1.34-log
>>
>> Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
>>
>> mysql> use bos_taurus_core_62_4k
>> Reading table information for completion of table and column names
>> You can turn off this feature to get a quicker startup with -A
>>
>> Database changed
>> mysql> SELECT COUNT(*) FROM exon;
>> +----------+
>> | COUNT(*) |
>> +----------+
>> | 225837 |
>> +----------+
>> 1 row in set (0.01 sec)
>>
>> Cheers,
>> Bert
>>
>>
>> On Fri, May 6, 2011 at 4:22 PM, Andrea Edwards
>> <edwardsa at cs.man.ac.uk> wrote:
>> I tried 2 ways:
>>
>> ===============================================
>>
>> my $gene_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core',
>> 'Gene' );
>> my $genes = $gene_adaptor->fetch_all();
>>
>> my $exon_adaptor = $registry->get_adaptor( 'bos_taurus', 'Core',
>> 'Exon' );
>> $total_genes=0;
>> $exon_count = 0;
>> foreach $gene(@{$genes}) {
>> $total_genes++;
>>
>> foreach $exon ($gene->get_all_Exons()) {
>> $exon_count++;
>> }
>> } #end for each gene
>>
>>
>> =============================================
>>
>> This way gave even fewer (23k), but I'm being stricter here about the
>> chromosomes:
>>
>> @slices = @{ $slice_adaptor->fetch_all('chromosome', undef, 0, 1) };
>>
>> $total_genes=0;
>> $exon_count = 0;
>> foreach $slice (@slices) {
>> unless ($slice->seq_region_name() =~ /Un/) {
>> print $slice->seq_region_name."\n";
>> my $genes = $gene_adaptor->fetch_all_by_Slice($slice);
>>
>>
>> foreach my $gene(@{$genes}) {
>> $total_genes++;
>>
>> foreach my $exon ($gene->get_all_Exons()) {
>> $exon_count++;
>> print "$exon_count\n";
>> }
>>
>>
>>
>>
>> } #end for each gene
>> }
>> }
>>
>> ==============================================
>>
>> But neither gives anything like the SQL results.
>>
>> Why does the SQL give so many more? Which should I use?
>>
>> thank you
>>
>>
>>
>> On 06/05/11 15:50, Bert Overduin wrote:
>>>
>>> Hi Andrea,
>>>
>>> I suspect that your BioMart results are truncated because the
>>> query is too large.
>>>
>>> However, that doesn't explain your API results ... What does your
>>> API code look like?
>>>
>>> Cheers,
>>> Bert
>>>
>>> On Fri, May 6, 2011 at 3:45 PM, Andrea Edwards
>>> <edwardsa at cs.man.ac.uk> wrote:
>>> Hello
>>>
>>> I'm sorry for the basic question, but I was looking at the Ensembl
>>> core schema and trying to retrieve just the exons on chromosomes,
>>> and couldn't work out why I am getting such different figures from
>>> those given by BioMart and the Perl API.
>>>
>>> For example, for cow there are 25670 exons in genes with BioMart
>>> and the API, but ~210k exons with this SQL. This code is just
>>> looking for exons on chromosomes 1-30 and X:
>>>
>>> select count(distinct stable_id) from exon e inner join
>>> exon_stable_id es using(exon_id) inner join seq_region sr
>>> using(seq_region_id) where sr.coord_system_id = 2 and
>>> sr.name REGEXP '^[1-9]|^X' and e.is_current=1
>>>
>>> I get 8k just on chromosome 1.
>>>
>>> I'm sure this is simple, and perhaps it's because it's Friday
>>> afternoon, but I'm just not seeing it!
>>>
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>>
>>> --
>>> Bert Overduin, Ph.D.
>>> Vertebrate Genomics Team
>>>
>>> EMBL - European Bioinformatics Institute
>>> Wellcome Trust Genome Campus
>>> Hinxton, Cambridge CB10 1SD
>>> United Kingdom
>>>
>>> http://www.ebi.ac.uk/~bert
>>>
>>> Ensembl browser: http://www.ensembl.org
>>> Mailing lists: http://www.ensembl.org/info/about/contact/mailing.html
>>> Blog: http://www.ensembl.info
>>> YouTube: http://www.youtube.com/user/EnsemblHelpdesk
>>> Facebook: http://www.facebook.com/Ensembl.org
>>> Twitter: http://twitter.com/Ensembl
>>>
>>
>
Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton
Cambridge CB10 1SD,
UK.
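
A note on the counts discussed above, offered as a possible explanation rather than something confirmed in this thread: in the Ensembl Perl API, get_all_Exons() is documented as returning a reference to a list of exons. If that is what happens here, `foreach $exon ($gene->get_all_Exons())` runs once per gene, so $exon_count would end up equal to $total_genes, which might account for the ~25k figure. The sketch below only illustrates the Perl behaviour; MockGene is a hypothetical stand-in, not the real Bio::EnsEMBL::Gene, and the exon names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a gene object; NOT the real Bio::EnsEMBL::Gene.
package MockGene;

sub new {
    my ( $class, @exons ) = @_;
    return bless { exons => [@exons] }, $class;
}

# Assumption: like the real method, this returns an array REFERENCE.
sub get_all_Exons {
    my ($self) = @_;
    return $self->{exons};
}

package main;

# Two made-up genes with three and two exons respectively.
my @genes = ( MockGene->new(qw(e1 e2 e3)), MockGene->new(qw(e4 e5)) );

my ( $per_gene, $per_exon ) = ( 0, 0 );
foreach my $gene (@genes) {

    # Without dereferencing, the list holds one element (the reference),
    # so this loop body runs once per gene.
    foreach my $exon ( $gene->get_all_Exons() ) {
        $per_gene++;
    }

    # Dereferencing with @{ ... } iterates over the exons themselves.
    foreach my $exon ( @{ $gene->get_all_Exons() } ) {
        $per_exon++;
    }
}

print "without dereference: $per_gene\n";    # prints 2 (the number of genes)
print "with dereference:    $per_exon\n";    # prints 5 (the number of exons)
```

With the real API the corresponding change would be to loop over `@{ $gene->get_all_Exons() }`. Even then, a per-gene total need not match COUNT(*) on the exon table exactly, so treat this as a starting point for checking rather than a full answer.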