[ensembl-dev] gene count anomaly
Daniel Lawson
lawson at ebi.ac.uk
Mon Dec 7 02:07:30 GMT 2015
Hi John,
There may be other sequences in the assembly, such as unplaced scaffolds,
that have not been assigned to a chromosome. You can check this via the API
or in BioMart. I expect you'll find a bunch of small sequences that harbour
some genes; maybe that will get your totals to balance.
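
For the API route, a minimal sketch might look like the following. It
assumes a locally installed Ensembl Perl API and the public MySQL server
(ensembldb.ensembl.org, user anonymous); the helper name
tally_by_coord_system is made up for illustration. It fetches 'toplevel'
slices instead of only 'chromosome' slices, so sequences not placed on a
chromosome are included, and it reports gene counts per coordinate system.
Whether the resulting total matches the BioMart or info-page figure exactly
is not something this sketch guarantees.

```perl
use strict;
use warnings;

# Tally genes per coordinate system for a list of slice-like objects.
# Each object only needs coord_system_name() and get_all_Genes().
# (Hypothetical helper, named here for illustration only.)
sub tally_by_coord_system {
    my ($slices) = @_;
    my %genes_by_cs;
    my $total = 0;
    for my $slice (@$slices) {
        my $count = scalar @{ $slice->get_all_Genes() };
        $genes_by_cs{ $slice->coord_system_name() } += $count;
        $total += $count;
    }
    return ( $total, \%genes_by_cs );
}

# Query the live database only if the Ensembl Perl API is installed.
if ( eval { require Bio::EnsEMBL::Registry; 1 } ) {
    my $registry = 'Bio::EnsEMBL::Registry';
    # Assumption: the public Ensembl server with anonymous access
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );
    my $slice_adaptor =
        $registry->get_adaptor( 'danio_rerio', 'Core', 'Slice' );

    # 'toplevel' covers chromosomes plus any sequence regions that are
    # not assembled into a chromosome
    my ( $total, $by_cs ) =
        tally_by_coord_system( $slice_adaptor->fetch_all('toplevel') );

    print "$_\t$by_cs->{$_}\n" for sort keys %$by_cs;
    print "gene total is\t$total\n";
}
else {
    warn "Bio::EnsEMBL::Registry not installed; skipping live query\n";
}
```

If the non-chromosome coordinate systems account for a few hundred genes,
that would go some way towards explaining the gap between the API and
BioMart totals.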
cheers
Dan
On 7 December 2015 at 01:59, john samuel <john.samuel at senecacollege.ca>
wrote:
> Hi,
> I am trying to get an accurate count of all the ENSDARG genes from the
> latest zebrafish data (GRCz10) in Ensembl.
> If I use the Perl API to get all the genes in all the chromosomes I get a
> total of 31,388, i.e.
>
> my $slice_adaptor = $registry->get_adaptor( 'danio_rerio', 'Core', 'Slice' );
> # Fetch only slices in the 'chromosome' coordinate system
> my @slices = @{ $slice_adaptor->fetch_all('chromosome') };
> my $total = 0;
> my %all;    # seq region name => gene count
> foreach my $slice (@slices) {
>     my @genes = @{ $slice->get_all_Genes() };
>     my $count = scalar @genes;
>     $all{ $slice->seq_region_name() } = $count;
>     $total += $count;
> }
> # Numeric sort; the non-numeric name 'MT' compares as 0 and so prints first
> foreach my $sorted ( sort { $a <=> $b } keys %all ) {
>     print "chromosome: $sorted\t$all{$sorted}\n";
> }
> print "gene total is\t$total\n";
>
> chromosome: MT 37
> chromosome: 1 1386
> chromosome: 2 1587
> chromosome: 3 1611
> chromosome: 4 3103
> chromosome: 5 1704
> chromosome: 6 1280
> chromosome: 7 1507
> chromosome: 8 1216
> chromosome: 9 1108
> chromosome: 10 1108
> chromosome: 11 1039
> chromosome: 12 952
> chromosome: 13 1013
> chromosome: 14 953
> chromosome: 15 1146
> chromosome: 16 1241
> chromosome: 17 1048
> chromosome: 18 942
> chromosome: 19 1123
> chromosome: 20 1253
> chromosome: 21 1092
> chromosome: 22 1174
> chromosome: 23 1031
> chromosome: 24 800
> chromosome: 25 934
> gene total is 31388
>
> Does anyone see anything wrong with how I get the total? I don't, but then
> when I go to BioMart (see below), I get a total of 31,953.
>
> And if I go to the info page for the genome at
> http://useast.ensembl.org/Danio_rerio/Info/Annotation I see a different
> total there too (31,650 not counting pseudogenes).
>
> Does anyone have any idea why the totals differ, which one to believe, and
> whether there's anything wrong with using the one that my code calculated
> as the definitive one? I need to compare the total number of genes against
> the number that we are finding under certain conditions, to do some stats.
> John
--
VectorBase | i5K insect genome initiative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 38620 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20151207/df92cd12/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 23932 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20151207/df92cd12/attachment-0001.png>