[ensembl-dev] gene count anomaly

john samuel john.samuel at senecacollege.ca
Mon Dec 7 02:17:18 GMT 2015


Thanks Dan.
I thought of that, so I ran the same code over all the scaffolds
instead of the chromosomes, in case there were some unplaced scaffolds,
but the total across all scaffolds comes to 31,501.  As you said, this
could be all the genes mapped to chromosomes plus those on some
unplaced scaffolds, but it doesn't match either of the other totals,
so I'm no closer to knowing which total is correct.
Any other thoughts?
John

On 15-12-06 09:07 PM, Daniel Lawson wrote:
> Hi John,
>
> There may be other sequences in the assembly that have not been 
> assigned to a chromosome. You can check this via the API or in Mart. I 
> expect you'll find a bunch of small sequences that harbour some genes 
> - maybe that will get your totals to balance.
>
> cheers
> Dan
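A minimal sketch of the check Dan describes, done through the Perl API. It assumes the Ensembl Perl API is installed and that the public server ensembldb.ensembl.org accepts anonymous connections; split_counts is a hypothetical helper written for this example. It fetches every top-level sequence (chromosomes plus anything not placed on one) and reports genes on chromosomes separately from genes on the other sequences. The live query only runs when a species name is passed on the command line, e.g. perl count_toplevel.pl danio_rerio.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: takes array refs of [coord_system_name, gene_count]
# and sums gene counts on chromosomes vs. on all other top-level sequences.
sub split_counts {
    my @rows = @_;
    my %sum = ( chromosome => 0, other => 0, other_seqs => 0 );
    for my $row (@rows) {
        my ( $cs_name, $count ) = @$row;
        if ( $cs_name eq 'chromosome' ) {
            $sum{chromosome} += $count;
        }
        else {
            $sum{other} += $count;
            # count the non-chromosome sequences that carry at least one gene
            $sum{other_seqs}++ if $count > 0;
        }
    }
    return \%sum;
}

# Live query; only runs when a species name is given as the first argument.
if (@ARGV) {
    my $species = $ARGV[0];
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );
    my $slice_adaptor = $registry->get_adaptor( $species, 'Core', 'Slice' );

    # 'toplevel' = the highest-level sequences in the assembly
    my @rows;
    for my $slice ( @{ $slice_adaptor->fetch_all('toplevel') } ) {
        push @rows, [ $slice->coord_system_name(),
                      scalar @{ $slice->get_all_Genes() } ];
    }
    my $sum = split_counts(@rows);
    printf "genes on chromosomes:\t%d\n", $sum->{chromosome};
    printf "genes on other top-level sequences:\t%d (on %d sequences)\n",
        $sum->{other}, $sum->{other_seqs};
}
```

If the two numbers printed at the end add up to a figure different from the chromosome-only loop, the difference sits on the non-chromosome sequences Dan mentions.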
>
>
>
>
> On 7 December 2015 at 01:59, john samuel <john.samuel at senecacollege.ca 
> <mailto:john.samuel at senecacollege.ca>> wrote:
>
>     Hi,
>     I am trying to get an accurate count of all the ENSDARG genes in
>     the latest zebrafish data (GRCz10) in Ensembl.
>     If I use the Perl API to get all the genes on all the chromosomes,
>     I get a total of 31,388:
>
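>     use strict;
>     use warnings;
>     use Bio::EnsEMBL::Registry;
>
>     # Connect to the public Ensembl MySQL server
>     my $registry = 'Bio::EnsEMBL::Registry';
>     $registry->load_registry_from_db(
>         -host => 'ensembldb.ensembl.org',
>         -user => 'anonymous',
>     );
>
>     my $slice_adaptor = $registry->get_adaptor( 'danio_rerio', 'Core', 'Slice' );
>     my @slices = @{ $slice_adaptor->fetch_all('chromosome') };
>     my $total = 0;
>     my %all;
>     foreach my $slice (@slices) {
>         # get_all_Genes() with no arguments returns genes of every biotype
>         my $count = scalar @{ $slice->get_all_Genes() };
>         $all{ $slice->seq_region_name() } = $count;
>         $total += $count;
>     }
>     # Numeric sort by chromosome name; non-numeric names (MT) sort first
>     foreach my $sorted ( sort { ( $a =~ /^\d+$/ ? $a : 0 ) <=> ( $b =~ /^\d+$/ ? $b : 0 )
>                                 || $a cmp $b } keys %all ) {
>         print "chromosome: $sorted\t$all{$sorted}\n";
>     }
>     print "gene total is\t$total\n";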
>
>     chromosome: MT    37
>     chromosome: 1    1386
>     chromosome: 2    1587
>     chromosome: 3    1611
>     chromosome: 4    3103
>     chromosome: 5    1704
>     chromosome: 6    1280
>     chromosome: 7    1507
>     chromosome: 8    1216
>     chromosome: 9    1108
>     chromosome: 10    1108
>     chromosome: 11    1039
>     chromosome: 12    952
>     chromosome: 13    1013
>     chromosome: 14    953
>     chromosome: 15    1146
>     chromosome: 16    1241
>     chromosome: 17    1048
>     chromosome: 18    942
>     chromosome: 19    1123
>     chromosome: 20    1253
>     chromosome: 21    1092
>     chromosome: 22    1174
>     chromosome: 23    1031
>     chromosome: 24    800
>     chromosome: 25    934
>     gene total is    31388
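The loop above counts every gene whatever its biotype, since get_all_Genes() with no arguments returns all of them, while the Annotation page figure mentioned below excludes pseudogenes. A minimal sketch that breaks the chromosome count down by biotype so the two can be compared like for like, assuming the same public-server setup; tally_biotypes is a hypothetical helper, and treating any biotype containing "pseudogene" as a pseudogene is an assumption that may not match exactly how the Annotation page groups them. The live query runs only when a species name is given, e.g. perl biotypes.pl danio_rerio.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: given one biotype string per gene, return a hash ref of
# biotype => count, plus the number of genes whose biotype looks like a pseudogene.
sub tally_biotypes {
    my %by_biotype;
    $by_biotype{$_}++ for @_;
    # Heuristic: any biotype containing "pseudogene" (e.g. processed_pseudogene)
    my $pseudo = 0;
    $pseudo += $by_biotype{$_} for grep { /pseudogene/ } keys %by_biotype;
    return ( \%by_biotype, $pseudo );
}

# Live query; only runs when a species name is given as the first argument.
if (@ARGV) {
    my $species = $ARGV[0];
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );
    my $slice_adaptor = $registry->get_adaptor( $species, 'Core', 'Slice' );

    my @biotypes;
    for my $slice ( @{ $slice_adaptor->fetch_all('chromosome') } ) {
        push @biotypes, map { $_->biotype() } @{ $slice->get_all_Genes() };
    }
    my ( $by_biotype, $pseudo ) = tally_biotypes(@biotypes);
    for my $bt ( sort { $by_biotype->{$b} <=> $by_biotype->{$a} } keys %$by_biotype ) {
        print "$bt\t$by_biotype->{$bt}\n";
    }
    printf "all genes:\t%d\nexcluding pseudogenes:\t%d\n",
        scalar @biotypes, @biotypes - $pseudo;
}
```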
>
>     Does anyone see anything wrong with how I get the total?  I don't,
>     but when I go to BioMart (screenshot in the original message, not
>     preserved in this archive), I get a total of 31,953.
>
>     And on the genome's annotation info page at
>     http://useast.ensembl.org/Danio_rerio/Info/Annotation I see yet
>     another total (31,650, not counting pseudogenes).
>
>     Does anyone know why the totals differ, which one to believe, and
>     whether there's anything wrong with treating the one my code
>     calculated as the definitive one?  I need to compare the total
>     number of genes with the number we find under certain conditions,
>     to do some statistics.
>     John
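One possible source of the higher BioMart figure, offered as an unverified assumption rather than a confirmed cause: genes annotated on non-reference sequences (such as alternate haplotypes), if the assembly has any. A minimal sketch to test that idea, assuming the same public-server setup: it fetches top-level slices with the include-non-reference flag (the third argument of SliceAdaptor fetch_all) set, then counts distinct stable IDs on reference vs. non-reference slices. Counting unique stable IDs also keeps a gene from being counted twice if it is returned for more than one slice. count_by_reference is a hypothetical helper; the live query runs only when a species name is given, e.g. perl refcount.pl danio_rerio.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: takes [stable_id, is_reference] pairs and returns the number
# of distinct genes on reference slices, on non-reference slices only, and overall.
sub count_by_reference {
    my ( %ref, %nonref );
    for my $pair (@_) {
        my ( $id, $is_ref ) = @$pair;
        if ($is_ref) { $ref{$id} = 1 } else { $nonref{$id} = 1 }
    }
    # grep in scalar context returns the number of matches
    my $nonref_only = grep { !exists $ref{$_} } keys %nonref;
    my %all = ( %ref, %nonref );
    return {
        reference      => scalar keys %ref,
        nonref_only    => $nonref_only,
        distinct_total => scalar keys %all,
    };
}

# Live query; only runs when a species name is given as the first argument.
if (@ARGV) {
    my $species = $ARGV[0];
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );
    my $slice_adaptor = $registry->get_adaptor( $species, 'Core', 'Slice' );

    my @pairs;
    # Third argument = 1: also return non-reference top-level sequences
    for my $slice ( @{ $slice_adaptor->fetch_all( 'toplevel', undef, 1 ) } ) {
        my $is_ref = $slice->is_reference() ? 1 : 0;
        push @pairs, map { [ $_->stable_id(), $is_ref ] } @{ $slice->get_all_Genes() };
    }
    my $c = count_by_reference(@pairs);
    print "distinct genes on reference sequences:\t$c->{reference}\n";
    print "distinct genes only on non-reference sequences:\t$c->{nonref_only}\n";
    print "distinct genes overall:\t$c->{distinct_total}\n";
}
```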
>
>
>
>
>     _______________________________________________
>     Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>     Posting guidelines and subscribe/unsubscribe info:
>     http://lists.ensembl.org/mailman/listinfo/dev
>     Ensembl Blog: http://www.ensembl.info/
>
>
>
>
> -- 
> VectorBase | i5K insect genome initiative
>