[ensembl-dev] gene count anomaly

Daniel Lawson lawson at ebi.ac.uk
Mon Dec 7 02:24:56 GMT 2015


Hi John,

You are missing 565 loci, correct (31953 - 31388)?
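As a quick check of that arithmetic, here is a small Perl sketch that sums the per-chromosome counts from John's output quoted below and subtracts the sum from the Mart total he reports. The counts are copied from the thread; nothing here queries Ensembl.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Per-chromosome gene counts as printed by the API script quoted in this thread
my %chromosome_counts = (
    MT => 37,   1 => 1386,  2 => 1587,  3 => 1611,  4 => 3103,  5 => 1704,
    6 => 1280,  7 => 1507,  8 => 1216,  9 => 1108, 10 => 1108, 11 => 1039,
    12 => 952, 13 => 1013, 14 => 953,  15 => 1146, 16 => 1241, 17 => 1048,
    18 => 942, 19 => 1123, 20 => 1253, 21 => 1092, 22 => 1174, 23 => 1031,
    24 => 800, 25 => 934,
);

my $api_total  = sum( values %chromosome_counts );  # genes on chromosomes only
my $mart_total = 31953;                            # BioMart total quoted in the thread
my $gap        = $mart_total - $api_total;         # genes left unaccounted for

printf "API total: %d\nMart total: %d\nDifference: %d\n",
    $api_total, $mart_total, $gap;
```

The difference printed this way is the same 565 that the Mart region filter finds on the non-chromosome scaffolds.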

I opened Mart and, using the Region filter, selected all non-chromosome
scaffolds. The 'Gene' count for these is 565; see the attached image if that
works on the email list. Otherwise, here is a URL for the Mart query:

http://www.ensembl.org/biomart/martview/68b3c7a216540966e2b2f569b596e7e6?VIRTUALSCHEMANAME=default&ATTRIBUTES=drerio_gene_ensembl.default.feature_page.ensembl_gene_id|drerio_gene_ensembl.default.feature_page.ensembl_transcript_id&FILTERS=drerio_gene_ensembl.default.filters.chromosome_name."KN149679.1,KN149681.1,KN149682.1,KN149684.1,KN149686.1,KN149687.1,KN149688.1,KN149689.1,KN149690.1,KN149691.1,KN149694.1,KN149695.1,KN149696.1,KN149697.1,KN149698.1,KN149702.1,KN149704.1,KN149706.1,KN149707.1,KN149710.1,KN149711.1,KN149713.1,KN149715.1,KN149717.1,KN149719.1,KN149725.1,KN149727.1,KN149730.1,KN149731.1,KN149732.1,KN149734.1,KN149735.1,KN149739.1,KN149753.1,KN149755.1,KN149764.1,KN149765.1,KN149776.1,KN149779.1,KN149781.1,KN149782.1,KN149784.1,KN149787.1,KN149790.1,KN149795.1,KN149797.1,KN149798.1,KN149799.1,KN149803.1,KN149813.1,KN149816.1,KN149818.1,KN149829.1,KN149830.1,KN149831.1,KN149842.1,KN149843.1,KN149846.1,KN149847.1,KN149850.1,KN149855.1,KN149857.1,KN149858.1,KN149859.1,KN149861.1,KN149868.1,KN149874.1,KN149878.1,KN149880.1,KN149883.1,KN149884.1,KN149886.1,KN149894.1,KN149895.1,KN149896.1,KN149897.1,KN149900.1,KN149904.1,KN149906.1,KN149909.1,KN149910.1,KN149912.1,KN149914.1,KN149916.1,KN149917.1,KN149921.1,KN149923.1,KN149929.1,KN149930.1,KN149933.1,KN149934.1,KN149936.1,KN149939.1,KN149943.1,KN149945.1,KN149946.1,KN149947.1,KN149948.1,KN149951.1,KN149955.1,KN149959.1,KN149962.1,KN149964.1,KN149966.1,KN149968.1,KN149978.1,KN149986.1,KN149987.1,KN149989.1,KN149992.1,KN149995.1,KN149997.1,KN149998.1,KN150000.1,KN150001.1,KN150002.1,KN150003.1,KN150008.1,KN150013.1,KN150015.1,KN150027.1,KN150032.1,KN150038.1,KN150039.1,KN150040.1,KN150041.1,KN150042.1,KN150046.1,KN150051.1,KN150052.1,KN150056.1,KN150062.1,KN150064.1,KN150066.1,KN150067.1,KN150071.1,KN150072.1,KN150075.1,KN150079.1,KN150080.1,KN150084.1,KN150086.1,KN150088.1,KN150090.1,KN150096.1,KN150099.1,KN150102.1,KN150104.1,KN150108.1,KN150109.1,KN150112.1,KN150115.1,KN150120.1,KN150125.1,KN150127.1,KN150128.1,KN150131.1,KN150137.1,KN150141.1,KN150142.1,KN150148.1,KN150156.1,KN150158.1,KN150162.1,KN150164.1,KN150165.1,KN150168.1,KN150169.1,KN150170.1,KN150171.1,KN150172.1,KN150173.1,KN150176.1,KN150177.1,KN150178.1,KN150188.1,KN150189.1,KN150193.1,KN150196.1,KN150199.1,KN150205.1,KN150207.1,KN150208.1,KN150212.1,KN150213.1,KN150214.1,KN150216.1,KN150221.1,KN150229.1,KN150230.1,KN150232.1,KN150239.1,KN150240.1,KN150241.1,KN150251.1,KN150259.1,KN150262.1,KN150265.1,KN150267.1,KN150269.1,KN150272.1,KN150273.1,KN150277.1,KN150285.1,KN150305.1,KN150307.1,KN150311.1,KN150312.1,KN150314.1,KN150317.1,KN150320.1,KN150322.1,KN150324.1,KN150326.1,KN150328.1,KN150332.1,KN150334.1,KN150335.1,KN150336.1,KN150339.1,KN150342.1,KN150345.1,KN150346.1,KN150348.1,KN150350.1,KN150351.1,KN150353.1,KN150355.1,KN150359.1,KN150361.1,KN150362.1,KN150365.1,KN150366.1,KN150371.1,KN150372.1,KN150379.1,KN150380.1,KN150383.1,KN150387.1,KN150390.1,KN150399.1,KN150400.1,KN150401.1,KN150402.1,KN150403.1,KN150405.1,KN150407.1,KN150411.1,KN150412.1,KN150415.1,KN150416.1,KN150424.1,KN150425.1,KN150432.1,KN150433.1,KN150435.1,KN150442.1,KN150447.1,KN150449.1,KN150451.1,KN150456.1,KN150470.1,KN150474.1,KN150475.1,KN150482.1,KN150487.1,KN150490.1,KN150491.1,KN150492.1,KN150505.1,KN150506.1,KN150508.1,KN150516.1,KN150518.1,KN150521.1,KN150527.1,KN150530.1,KN150531.1,KN150532.1,KN150541.1,KN150543.1,KN150544.1,KN150545.1,KN150550.1,KN150552.1,KN150561.1,KN150562.1,KN150564.1,KN150566.1,KN150568.1,KN150570.1,KN150572.1,KN150574.1,KN150576.1,KN150578.1,KN150589.1,KN150590.1,KN150596.1,KN150597.1,KN150600.1,KN150603.1,KN150605.1,KN150608.1,KN150614.1,KN150616.1,KN150617.1,KN150620.1,KN150628.1,KN150630.1,KN150631.1,KN150635.1,KN150636.1,KN150637.1,KN150642.1,KN150647.1,KN150650.1,KN150653.1,KN150654.1,KN150663.1,KN150665.1,KN150666.1,KN150667.1,KN150670.1,KN150672.1,KN150674.1,KN150677.1,KN150680.1,KN150681.1,KN150683.1,KN150685.1,KN150691.1,KN150696.1,KN150698.1,KN150699.1,KN150700.1,KN150702.1,KN150703.1,KN150706.1,KN150708.1,KN150709.1"&VISIBLEPANEL=filterpanel
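The same chromosome versus non-chromosome split can be made on the API side. Below is a minimal Perl sketch, not an official Ensembl recipe, that tallies genes per coordinate system so the two groups can be compared with the Mart figures. The subroutine name `count_genes_by_coord_system` is hypothetical; it only assumes each slice offers `coord_system_name()`, `seq_region_name()` and `get_all_Genes()`, as `Bio::EnsEMBL::Slice` does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): tally genes per
# coordinate system (e.g. 'chromosome' vs 'scaffold') and per region.
# Each slice only needs coord_system_name(), seq_region_name() and
# get_all_Genes() returning an array ref, as Bio::EnsEMBL::Slice provides.
sub count_genes_by_coord_system {
    my @slices = @_;
    my %by_system;    # coord system name => total genes in that system
    my %by_region;    # seq region name   => genes on that region
    for my $slice (@slices) {
        my $n = scalar @{ $slice->get_all_Genes() };
        $by_system{ $slice->coord_system_name() } += $n;
        $by_region{ $slice->seq_region_name() } = $n;
    }
    return ( \%by_system, \%by_region );
}
```

To use it, one would pass `@{ $slice_adaptor->fetch_all('toplevel') }` rather than `fetch_all('chromosome')`, which should return the unplaced scaffolds as well as the chromosomes. Assuming the unplaced sequences in this assembly sit in a 'scaffold' coordinate system, the 'scaffold' total would be expected to land near the 565 above, and the two totals together near the Mart figure.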


Hope that helps, or at least goes some way to explaining the difference
between Mart and your API script. I can't comment on whether either of these
is the definitive gene count for zebrafish.
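The totals may also differ in what they count: the Info page figure is quoted without pseudogenes, while John's API script keeps every biotype. Below is a minimal Perl sketch, using a hypothetical `count_by_biotype` subroutine, that tallies gene objects by `biotype()` (a method `Bio::EnsEMBL::Gene` provides) and reports totals with and without pseudogenes. Treating any biotype whose name contains 'pseudogene' as a pseudogene is an assumption, not the Info page's documented definition.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): tally genes by biotype.
# Each gene only needs a biotype() method, as Bio::EnsEMBL::Gene provides.
# Assumption: any biotype whose name contains 'pseudogene' is a pseudogene.
sub count_by_biotype {
    my @genes = @_;
    my %by_biotype;
    $by_biotype{ $_->biotype() }++ for @genes;

    my $non_pseudo = 0;
    for my $biotype ( keys %by_biotype ) {
        $non_pseudo += $by_biotype{$biotype} unless $biotype =~ /pseudogene/;
    }
    # per-biotype counts, all-biotype total, total without pseudogenes
    return ( \%by_biotype, scalar(@genes), $non_pseudo );
}
```

Feeding it the genes from every top-level slice would give an all-biotype total to set against Mart and a pseudogene-free total to set against the Info page.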

regards
Dan


On 7 December 2015 at 02:17, john samuel <john.samuel at senecacollege.ca>
wrote:

> Thanks Dan.
> I thought of that, and tried the same code but looking for genes on all
> the scaffolds instead, in case some genes were on unplaced scaffolds. The
> total for all scaffolds adds up to 31,501. As you said, this could be all
> the genes mapped to chromosomes plus those on some unplaced scaffolds, but
> it doesn't match any of the other totals, so I'm no closer to knowing which
> total is correct.
> Any other thoughts?
> John
>
>
> On 15-12-06 09:07 PM, Daniel Lawson wrote:
>
> Hi John,
>
> There may be other sequences in the assembly that have not been assigned
> to a chromosome. You can check this via the API or in Mart. I expect you'll
> find a bunch of small sequences that harbour some genes - maybe that will
> get your totals to balance.
>
> cheers
> Dan
>
>
>
>
> On 7 December 2015 at 01:59, john samuel <john.samuel at senecacollege.ca>
> wrote:
>
>> Hi,
>> I am trying to get an accurate count of all the ENSDARG genes from the
>> latest zebrafish data (GRCz10) in Ensembl.
>> If I use the Perl API to get all the genes on all the chromosomes, I get a
>> total of 31,388:
>>
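>> use strict;
>> use Bio::EnsEMBL::Registry;
>>
>> # Connect to the public Ensembl database server
>> my $registry = 'Bio::EnsEMBL::Registry';
>> $registry->load_registry_from_db(
>>     -host => 'ensembldb.ensembl.org',
>>     -user => 'anonymous',
>> );
>>
>> my $slice_adaptor = $registry->get_adaptor( 'danio_rerio', 'Core', 'Slice' );
>> # Chromosome-level regions only: unplaced scaffolds are not returned
>> my @slices = @{ $slice_adaptor->fetch_all('chromosome') };
>> my $total = 0;
>> my %all;
>> foreach my $slice (@slices) {
>>     my @genes = @{ $slice->get_all_Genes() };
>>     my $count = scalar @genes;
>>     $all{$slice->seq_region_name()}=$count;
>>     $total += $count;
>> }
>> # Non-numeric names such as MT compare as 0 here, so they sort first
>> foreach my $sorted (sort {$a<=>$b} keys %all) {
>>     print "chromosome: $sorted\t$all{$sorted}\n";
>> }
>> print "gene total is\t$total\n";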
>>
>> chromosome: MT    37
>> chromosome: 1    1386
>> chromosome: 2    1587
>> chromosome: 3    1611
>> chromosome: 4    3103
>> chromosome: 5    1704
>> chromosome: 6    1280
>> chromosome: 7    1507
>> chromosome: 8    1216
>> chromosome: 9    1108
>> chromosome: 10    1108
>> chromosome: 11    1039
>> chromosome: 12    952
>> chromosome: 13    1013
>> chromosome: 14    953
>> chromosome: 15    1146
>> chromosome: 16    1241
>> chromosome: 17    1048
>> chromosome: 18    942
>> chromosome: 19    1123
>> chromosome: 20    1253
>> chromosome: 21    1092
>> chromosome: 22    1174
>> chromosome: 23    1031
>> chromosome: 24    800
>> chromosome: 25    934
>> gene total is    31388
>>
>> Does anyone see anything wrong with how I get the total? I don't, but
>> when I go to BioMart (see the attached screenshot), I get a total of 31,953.
>>
>>
>>
>> And if I go to the annotation info page for the genome at
>> http://useast.ensembl.org/Danio_rerio/Info/Annotation, I see yet another
>> total (31,650, not counting pseudogenes).
>>
>>
>>
>> Does anyone have any idea why the totals differ, which one to believe,
>> and whether there's anything wrong with treating the total my code
>> calculated as the definitive one? I need to compare the total number of
>> genes with the number we find under certain conditions, to do some
>> statistics.
>> John
>>
>>
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>
>
> --
> VectorBase | i5K insect genome initiative
>
>
>
>


-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 23932 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20151207/7bfa4c78/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 38620 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20151207/7bfa4c78/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2015-12-07 at 02.22.22.png
Type: image/png
Size: 128592 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20151207/7bfa4c78/attachment-0002.png>

