[ensembl-dev] 1KGp3 markers under-annotated for human chr 22 in Ensembl API v96?

andrew126 at mac.com andrew126 at mac.com
Wed Oct 9 14:40:49 BST 2019


Thanks very much, Helen.

I can confirm that the script now returns counts when run against ensembldb.ensembl.org .. smile.

Best,

Andrew

> On Oct 4, 2019, at 8:30 AM, Helen Schuilenburg <helens at ebi.ac.uk> wrote:
> 
> Hi Andrew
> 
> Thank you for the list.  We will correct chr22 in the next release. Thank you for reporting this.
> 
> The issue with the database on ensembldb.ensembl.org is fixed. Counts should be returned if you run your script.
> 
> Regards
> 
> Helen
> 
> On 01/10/2019 10:22, andrew126 at mac.com wrote:
>> Hi Helen,
>> 
>> Thanks for the response.
>> 
>> I ran the numbers for all autosomes.  Chr 9 seems to show a bit of a deficit relative to its neighbors, but nothing as obvious as chr 22.
>> 
>> 1 .. 6123396
>> 2 .. 6873996
>> 3 .. 5642383
>> 4 .. 5585136
>> 5 .. 5110655
>> 6 .. 4842774
>> 7 .. 4559179
>> 8 .. 4465690
>> 9 .. 3433163
>> 10 .. 3847658
>> 11 .. 3912747
>> 12 .. 3647900
>> 13 .. 2772148
>> 14 .. 2556819
>> 15 .. 2345835
>> 16 .. 2609023
>> 17 .. 2235443
>> 18 .. 2192269
>> 19 .. 1752444
>> 20 .. 1747663
>> 21 .. 1066903
>> 22 .. 3487
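
One way to put a number on "deficit relative to its neighbors" is to divide each count by the mean of the counts immediately before and after it in the list. The sketch below is an illustration only, not something posted in the thread: the function name `neighbor_ratio` is made up here, and it assumes input lines in the `CHR .. COUNT` form shown above with the `>>` quote prefix already stripped.

```shell
# Read "CHR .. COUNT" lines and print each chromosome's count divided by
# the mean count of its immediate neighbors in the list. The first and
# last entries have a single neighbor, so that one value is used as-is.
# NOTE: neighbor_ratio is a hypothetical helper name, not an Ensembl tool.
neighbor_ratio() {
    awk '
        { chr[NR] = $1; cnt[NR] = $3 }
        END {
            for (i = 1; i <= NR; i++) {
                sum = 0; k = 0
                if (i > 1)  { sum += cnt[i - 1]; k++ }
                if (i < NR) { sum += cnt[i + 1]; k++ }
                if (k > 0 && sum > 0)
                    printf "%s %.3f\n", chr[i], cnt[i] / (sum / k)
            }
        }'
}
```

Run on the table above, this gives a ratio of roughly 0.83 for chr 9 and roughly 0.003 for chr 22 (which has only chr 21 as a neighbor), which matches the description of a mild dip versus an obvious gap.
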
>> 
>> Thanks for the help.
>> 
>> Best,
>> 
>> Andrew
>> 
>> 
>> 
>> 
>>> On Sep 30, 2019, at 10:40 AM, Helen Schuilenburg <helens at ebi.ac.uk> wrote:
>>> 
>>> Hi Andrew
>>> 
>>> Thank you for reporting these issues.
>>> 
>>> We are looking into why the marker counts for chr22 are low and will correct this in a future release.
>>> 
>>> We are also investigating why ensembldb.ensembl.org is returning 0 counts.
>>> 
>>> Regards
>>> 
>>> Helen
>>> 
>>> On 30/09/2019 06:25, andrew126 at mac.com wrote:
>>>> Hi,
>>>> 
>>>> We have a local install of Ensembl 96 (API and human db), and the Perl script below returns the following marker counts for 1KGp3:EUR on chromosomes 20, 21, and 22:
>>>> 
>>>> 	20 .. 1747663
>>>> 	21 .. 1066903
>>>> 	22 .. 3487
>>>> 
>>>> We see exactly the same counts whether we use locally downloaded 1KG VCFs or the remote VCFs (as configured in ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json).
>>>> 
>>>> I can't find a separate resource with summary numbers, but chr 22 being roughly 300- to 500-fold below chr 21 and chr 20, respectively, seems unexpected.  Is that value correct?
>>>> 
>>>> Further, gunzipping the ALL.chr###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz files for chromosomes 20-22 and counting rs IDs with 'cut -f3 FILE | grep rs | grep -v "#" | wc -l' gives much more consistent numbers:
>>>> 
>>>> 	20: 1,811,392
>>>> 	21: 1,100,065
>>>> 	22: 1,100,429
>>>> 
>>>> All of the chrom 22 variants are QUAL=100, FILTER=PASS.
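
The ID count and the QUAL/FILTER check can also be done by streaming each file, without decompressing it to disk first. A minimal sketch, assuming the standard VCF column order (ID in column 3, QUAL in column 6, FILTER in column 7); the function names and the file name in the usage comment are hypothetical, and QUAL is compared as the literal string "100".

```shell
# Count VCF data records whose ID column (field 3) contains "rs".
# Header lines (starting with "#") are skipped explicitly rather than
# filtered out after the fact.
count_rs_records() {
    awk -F'\t' '!/^#/ && $3 ~ /rs/ { n++ } END { print n + 0 }'
}

# Count data records that are NOT both QUAL=100 and FILTER=PASS.
# A result of 0 means every record passes that check.
count_not_qual100_pass() {
    awk -F'\t' '!/^#/ && ($6 != "100" || $7 != "PASS") { n++ } END { print n + 0 }'
}

# Usage (chr22.vcf.gz is a placeholder file name):
#   gzip -dc chr22.vcf.gz | count_rs_records
#   gzip -dc chr22.vcf.gz | count_not_qual100_pass
```

Like the original pipeline, the first function counts a record once even when its ID field lists several rs IDs, so the two approaches should be counting the same thing.
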
>>>> 
>>>> Any help understanding why the chr-22 1KGp3:EUR marker count is so low would be greatly appreciated.  I'm not sure if this impacts other chromosomes or not.
>>>> 
>>>> Trying to reproduce the result against remote Ensembl 96 (ensembldb.ensembl.org) has exposed another issue:  the same script returns 0 counts for each chromosome, regardless of whether it uses remote or local 1KG VCFs.  Further, no errors are reported or thrown.
>>>> 
>>>> 	20 .. 0
>>>> 	21 .. 0
>>>> 	22 .. 0
>>>> 
>>>> Can you also clarify why we see that behavior, given that the script reports values against a local Ensembl 96 install?
>>>> 
>>>> Many thanks for any help/information.
>>>> 
>>>> Best,
>>>> 
>>>> Andrew
>>>> 
>>>> 
>>>> Here is the perl script:
>>>> 
>>>> use strict;
>>>> use warnings;
>>>> use Bio::EnsEMBL::Registry;
>>>> 
>>>> $| = 1;    # unbuffered output, so each count prints as soon as it is ready
>>>> 
>>>> my $registry = 'Bio::EnsEMBL::Registry';
>>>> # Load any local registry configuration (ENSEMBL_REGISTRY or ~/.ensembl_init),
>>>> # then the databases available on the given host.
>>>> $registry->load_all();
>>>> $registry->load_registry_from_db(
>>>>     -host => 'ensembldb.ensembl.org',
>>>>     -user => 'anonymous',
>>>> );
>>>> 
>>>> my $vs_adaptor = $registry->get_adaptor('human', 'variation', 'variationset');
>>>> $vs_adaptor->db->use_vcf(1);    # enable VCF-backed data (see vcf_config.json)
>>>> my $vs = $vs_adaptor->fetch_by_short_name('1kg_3_eur')
>>>>     or die "Variation set '1kg_3_eur' not found\n";
>>>> 
>>>> # The slice adaptor does not depend on the chromosome, so fetch it once.
>>>> my $slice_adaptor = $registry->get_adaptor('homo_sapiens', 'core', 'slice');
>>>> 
>>>> foreach my $chr (20 .. 22) {
>>>>     my $chr_slice = $slice_adaptor->fetch_by_region('chromosome', $chr);
>>>>     my $chr_end   = $chr_slice->end;
>>>>     my $count     = 0;
>>>> 
>>>>     # Walk the chromosome in consecutive windows of start .. start + 1000.
>>>>     my $win_start = $chr_slice->start;
>>>>     while ($win_start <= $chr_end) {
>>>>         my $win_end = $win_start + 1000;
>>>>         my $slice   = $slice_adaptor->fetch_by_region('chromosome', $chr, $win_start, $win_end);
>>>>         my $vfs     = $vs->get_all_VariationFeatures_by_Slice($slice);
>>>>         $count     += scalar @{$vfs};
>>>>         $win_start  = $win_end + 1;
>>>>     }
>>>>     print "$chr .. $count\n";
>>>> }
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>>> Ensembl Blog: http://www.ensembl.info/




