[ensembl-dev] gene count anomaly

Daniel Lawson lawson at ebi.ac.uk
Mon Dec 7 02:07:30 GMT 2015


Hi John,

There may be other sequences in the assembly that have not been assigned to
a chromosome. You can check this via the API or in Mart. I expect you'll
find a bunch of small sequences that harbour some genes - maybe that will
get your totals to balance.
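
For the API route, something along these lines might do it. Treat it as a rough sketch: it assumes the public server at ensembldb.ensembl.org with the anonymous user (which serves the current release, so the numbers may not match GRCz10), and the helper name tally_genes_by_coord_system is made up for illustration. Asking for 'toplevel' instead of 'chromosome' should also return sequences that were never placed on a chromosome, and grouping by coordinate system shows how many genes sit on them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: sum gene counts per coordinate system
# (e.g. chromosome, scaffold) for a list of slice objects.
sub tally_genes_by_coord_system {
    my ($slices) = @_;
    my %tally;
    foreach my $slice ( @{$slices} ) {
        $tally{ $slice->coord_system_name() } += scalar @{ $slice->get_all_Genes() };
    }
    return \%tally;
}

# Live query; runs only when the Ensembl Perl API is installed.
if ( eval { require Bio::EnsEMBL::Registry; 1 } ) {
    my $registry = 'Bio::EnsEMBL::Registry';

    # Assumption: public MySQL server, anonymous access, current release.
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );
    my $slice_adaptor = $registry->get_adaptor( 'danio_rerio', 'Core', 'Slice' );

    # 'toplevel' asks for every top-level sequence, placed or not.
    my $tally = tally_genes_by_coord_system( $slice_adaptor->fetch_all('toplevel') );

    my $total = 0;
    foreach my $cs ( sort keys %{$tally} ) {
        print "$cs\t$tally->{$cs}\n";
        $total += $tally->{$cs};
    }
    print "gene total is\t$total\n";
}
else {
    warn "Ensembl Perl API not found; skipping the live query\n";
}
```

If a coordinate system other than chromosome shows up with a non-zero count, those genes are the ones a chromosome-only loop leaves out.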

cheers
Dan




On 7 December 2015 at 01:59, john samuel <john.samuel at senecacollege.ca>
wrote:

> Hi,
> I am trying to get an accurate count of all the ENSDARG genes in the
> latest zebrafish assembly (GRCz10) in Ensembl.
> If I use the Perl API to count the genes on every chromosome, I get a
> total of 31,388:
>
> my $slice_adaptor = $registry->get_adaptor( 'danio_rerio', 'Core', 'Slice' );
> my @slices = @{ $slice_adaptor->fetch_all('chromosome') };
> my $total = 0;
> my %all;
> foreach my $slice (@slices) {
>     my @genes = @{ $slice->get_all_Genes() };
>     my $count = scalar @genes;
>     $all{$slice->seq_region_name()}=$count;
>     $total += $count;
> }
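> # Sort numerically; non-numeric names such as MT compare as 0 and come first.
> foreach my $sorted (sort { ($a =~ /^\d+$/ ? $a : 0) <=> ($b =~ /^\d+$/ ? $b : 0) } keys %all) {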
>     print "chromosome: $sorted\t$all{$sorted}\n";
> }
> print "gene total is\t$total\n";
>
> chromosome: MT    37
> chromosome: 1    1386
> chromosome: 2    1587
> chromosome: 3    1611
> chromosome: 4    3103
> chromosome: 5    1704
> chromosome: 6    1280
> chromosome: 7    1507
> chromosome: 8    1216
> chromosome: 9    1108
> chromosome: 10    1108
> chromosome: 11    1039
> chromosome: 12    952
> chromosome: 13    1013
> chromosome: 14    953
> chromosome: 15    1146
> chromosome: 16    1241
> chromosome: 17    1048
> chromosome: 18    942
> chromosome: 19    1123
> chromosome: 20    1253
> chromosome: 21    1092
> chromosome: 22    1174
> chromosome: 23    1031
> chromosome: 24    800
> chromosome: 25    934
> gene total is    31388
>
> Does anyone see anything wrong with how I get the total? I don't, but
> when I go to BioMart (see below), I get a total of 31,953
>
>
>
> and if I go to the info page for the genome at
> http://useast.ensembl.org/Danio_rerio/Info/Annotation I see a different
> total there too (31,650 not counting pseudogenes).
>
>
>
> Does anyone know why the totals differ, which one to believe, and whether
> there is anything wrong with treating the total my code calculated as the
> definitive one? I need to compare the total number of genes against the
> number that we find under certain conditions, to do some stats.
> John
>
>
>
>


-- 
VectorBase | i5K insect genome initiative