[ensembl-dev] 1KGp3 markers under-annotated for human chr 22 in Ensembl API v96?
andrew126 at mac.com
Wed Oct 9 14:40:49 BST 2019
Thanks very much, Helen.
I can confirm that I get counts returned from the script against ensembldb.ensembl.org now .. smile.
Best,
Andrew
> On Oct 4, 2019, at 8:30 AM, Helen Schuilenburg <helens at ebi.ac.uk> wrote:
>
> Hi Andrew
>
> Thank you for the list. We will correct chr22 in the next release. Thank you for reporting this.
>
> The issue with the database on ensembldb.ensembl.org is fixed. Counts should be returned if you run your script.
>
> Regards
>
> Helen
>
> On 01/10/2019 10:22, andrew126 at mac.com wrote:
>> Hi Helen,
>>
>> Thanks for the response.
>>
>> I ran the numbers for all autosomes. Chr 9 seems to show a bit of a deficit relative to its neighbors, but nothing as obvious as chr 22.
>>
>> 1 .. 6123396
>> 2 .. 6873996
>> 3 .. 5642383
>> 4 .. 5585136
>> 5 .. 5110655
>> 6 .. 4842774
>> 7 .. 4559179
>> 8 .. 4465690
>> 9 .. 3433163
>> 10 .. 3847658
>> 11 .. 3912747
>> 12 .. 3647900
>> 13 .. 2772148
>> 14 .. 2556819
>> 15 .. 2345835
>> 16 .. 2609023
>> 17 .. 2235443
>> 18 .. 2192269
>> 19 .. 1752444
>> 20 .. 1747663
>> 21 .. 1066903
>> 22 .. 3487
>>
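One way to make the "deficit relative to its neighbors" comparison concrete is to divide each chromosome's count by the mean of the counts for the adjacent chromosomes. The sketch below does this with awk on lines in the script's `CHR .. COUNT` output format. The function name `flag_low_counts` and its threshold argument are illustrative choices, not part of the Ensembl API, and adjacent chromosomes are only a rough baseline because chromosome lengths differ.

```shell
# Hypothetical helper: flag chromosomes whose count falls below a given
# fraction of the mean count of their immediate neighbours.
# Usage: flag_low_counts THRESHOLD < counts.txt
# Input lines look like "22 .. 3487" (chromosome, "..", count).
flag_low_counts() {
  awk -v t="$1" '
    NF >= 3 { k++; chr[k] = $1; cnt[k] = $3 + 0 }   # skip blank or short lines
    END {
      for (i = 1; i <= k; i++) {
        sum = 0; n = 0
        if (i > 1) { sum += cnt[i-1]; n++ }          # previous chromosome
        if (i < k) { sum += cnt[i+1]; n++ }          # next chromosome
        if (n > 0 && cnt[i] < t * sum / n)
          printf "%s\t%.4f\n", chr[i], cnt[i] / (sum / n)
      }
    }'
}
```

With the 22 autosome counts listed above, a threshold of 0.5 flags only chr 22, while a threshold of 0.85 also flags chr 9.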
>> Thanks for the help.
>>
>> Best,
>>
>> Andrew
>>
>>
>>
>>
>>> On Sep 30, 2019, at 10:40 AM, Helen Schuilenburg <helens at ebi.ac.uk> wrote:
>>>
>>> Hi Andrew
>>>
>>> Thank you for reporting these issues.
>>>
>>> We are looking into why the marker counts for chr22 are low and will correct this in a future release.
>>>
>>> We are also investigating why ensembldb.ensembl.org is returning 0 counts.
>>>
>>> Regards
>>>
>>> Helen
>>>
>>> On 30/09/2019 06:25, andrew126 at mac.com wrote:
>>>> Hi,
>>>>
>>>> We have a local install of Ensembl 96 (API and human db), and the Perl script below returns the following marker counts for 1KGp3:EUR on chromosomes 20, 21, and 22:
>>>>
>>>> 20 .. 1747663
>>>> 21 .. 1066903
>>>> 22 .. 3487
>>>>
>>>> We see exactly the same counts whether we use locally downloaded 1KG VCFs or the remote VCFs (as configured in ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json).
>>>>
>>>> I can't find a separate resource with summary numbers, but chr 22 being three orders of magnitude below chr 20 or 21 seems unexpected. Is that value correct?
>>>>
>>>> Further, gunzipping the ALL.chr###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz files for chroms 20-22 and counting rsIDs gives the more consistent numbers we would expect, via 'cut -f3 FILE | grep rs | grep -v "#" | wc -l':
>>>>
>>>> 20: 1,811,392
>>>> 21: 1,100,065
>>>> 22: 1,100,429
>>>>
>>>> All of the chrom 22 variants are QUAL=100, FILTER=PASS.
>>>>
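The rsID count described above can also be taken straight from the compressed file, without writing a gunzipped copy to disk. The following is a minimal sketch under stated assumptions: `count_rsids` is a hypothetical helper name, and it counts data lines whose ID column starts with `rs`, which is slightly stricter than the `grep rs` in the pipeline above (that matches `rs` anywhere in the column).

```shell
# Hypothetical helper: count VCF records whose ID column (field 3) starts with "rs".
# Usage: count_rsids FILE.vcf.gz
count_rsids() {
  gzip -dc "$1" |
    awk -F'\t' '!/^#/ && $3 ~ /^rs/ { n++ } END { print n + 0 }'
}

# Example call (substitute the real per-chromosome file name):
#   count_rsids chr22.vcf.gz
```

Header lines (starting with `#`) are skipped explicitly, and records with an ID of `.` are not counted.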
>>>> Any help understanding why the chr-22 1KGp3:EUR marker count is so low would be greatly appreciated. I'm not sure if this impacts other chromosomes or not.
>>>>
>>>> Trying to reproduce the result against remote Ensembl 96 (ensembldb.ensembl.org) has exposed another issue: the same script returns 0 counts for each chromosome, regardless of whether it is using remote or local 1KG VCFs. Further, no errors are reported or thrown.
>>>>
>>>> 20 .. 0
>>>> 21 .. 0
>>>> 22 .. 0
>>>>
>>>> Can you also clarify why we see that behavior, given that the script reports values against a local Ensembl 96 install?
>>>>
>>>> Many thanks for any help/information.
>>>>
>>>> Best,
>>>>
>>>> Andrew
>>>>
>>>>
>>>> Here is the perl script:
>>>>
>>>> use strict;
>>>> $|=1;  # flush output after every print
>>>> use Bio::EnsEMBL::Registry;
>>>> use Bio::EnsEMBL::ApiVersion;
>>>>
>>>> my $registry = 'Bio::EnsEMBL::Registry';
>>>> $registry->load_all();
>>>> $registry->load_registry_from_db(
>>>>     -host => 'ensembldb.ensembl.org',
>>>>     -user => 'anonymous'
>>>> );
>>>>
>>>> # Fetch the 1KGp3 EUR variation set with VCF-backed lookups switched on
>>>> my $vs_adaptor = $registry->get_adaptor('human','variation','variationset');
>>>> $vs_adaptor->db->use_vcf(1);
>>>> my $vs = $vs_adaptor->fetch_by_short_name('1kg_3_eur');
>>>>
>>>> foreach my $chr (20..22) {
>>>>     my $slice_adaptor = $registry->get_adaptor('homo_sapiens', 'core', 'slice');
>>>>     my $slice = $slice_adaptor->fetch_by_region('chromosome', $chr);
>>>>     my $s_start = $slice->start;
>>>>     my $s_end = $slice->end;
>>>>     my $tmp_start = $s_start;
>>>>     my $tmp_end = -1;
>>>>     my $tcnt=0;
>>>>     # Walk the chromosome in consecutive windows and sum the features per window
>>>>     while ($tmp_start<=$s_end) {
>>>>         $tmp_end = $tmp_start + 1000;
>>>>         $slice = $slice_adaptor->fetch_by_region('chromosome', $chr, $tmp_start, $tmp_end);
>>>>         my $vfs = $vs->get_all_VariationFeatures_by_Slice($slice);
>>>>         $tcnt += scalar @{$vfs};
>>>>         $tmp_start = $tmp_end+1;
>>>>     }
>>>>     print "$chr .. $tcnt\n";
>>>> }
>>>>
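In the script above each window runs from $tmp_start to $tmp_start + 1000 inclusive, so it covers 1001 bases, and the last window can extend past the end of the chromosome slice. As a rough illustration of the tiling arithmetic only (not of the Ensembl API), the sketch below prints non-overlapping windows of a fixed length and clamps the final one to the last base. `tile_windows` is a hypothetical helper name.

```shell
# Hypothetical helper: print "START END" pairs that tile [FIRST, LAST]
# with non-overlapping windows of LENGTH bases; the final window is clamped.
# Usage: tile_windows FIRST LAST LENGTH
tile_windows() {
  awk -v s="$1" -v e="$2" -v w="$3" 'BEGIN {
    for (a = s; a <= e; a += w) {
      b = a + w - 1          # inclusive end of this window
      if (b > e) b = e       # do not run past the last base
      print a, b
    }
  }'
}

# Example call:
#   tile_windows 1 2500 1000
```

For a range of 1 to 2500 with a length of 1000, this yields three windows, the last of which ends at 2500.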
>>>>
>>>> _______________________________________________
>>>> Dev mailing list Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>
>
>