[ensembl-dev] 1KGp3 markers under-annotated for human chr 22 in Ensembl API v96?

andrew126 at mac.com andrew126 at mac.com
Tue Oct 1 10:22:43 BST 2019


Hi Helen,

Thanks for the response.

I ran the numbers for all autosomes.  Chr 9 seems to show a bit of a deficit relative to its neighbors, but nothing as obvious as chr 22.

1 .. 6123396
2 .. 6873996
3 .. 5642383
4 .. 5585136
5 .. 5110655
6 .. 4842774
7 .. 4559179
8 .. 4465690
9 .. 3433163
10 .. 3847658
11 .. 3912747
12 .. 3647900
13 .. 2772148
14 .. 2556819
15 .. 2345835
16 .. 2609023
17 .. 2235443
18 .. 2192269
19 .. 1752444
20 .. 1747663
21 .. 1066903
22 .. 3487

Thanks for the help.

Best,

Andrew




> On Sep 30, 2019, at 10:40 AM, Helen Schuilenburg <helens at ebi.ac.uk> wrote:
> 
> Hi Andrew
> 
> Thank you for reporting these issues.
> 
> We are looking into why the marker counts for chr22 are low and will correct this in a future release.
> 
> We are also investigating why the ensembldb.ensembl.org is returning 0 counts.
> 
> Regards
> 
> Helen
> 
> On 30/09/2019 06:25, andrew126 at mac.com wrote:
>> Hi,
>> 
>> We have a local install of Ensembl 96 (API and human db), and the below perl script returns the following marker counts for 1KGp3:EUR, chromosomes 20, 21, and 22:
>> 
>> 	20 .. 1747663
>> 	21 .. 1066903
>> 	22 .. 3487
>> 
>> We see the exact same counts regardless of if we use locally downloaded 1KG vcfs or the remote vcfs (as config'd in ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json).
>> 
>> I can't find a separate resource with summary numbers, but chr 22 being three orders of magnitude below chr 20 or 21 seems unexpected.  Is that value correct?
>> 
>> Further, gunzip'ing the ALL.chr###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz files for chroms 20-22 gives more expected consistency via 'cut -f3 FILE | grep rs | grep -v "#" | wc -l':
>> 
>> 	20: 1,811,392
>> 	21: 1,100,065
>> 	22: 1,100,429
>> 
>> All of the chrom 22 variants are QUAL=100, FILTER=PASS.
>> 
>> Any help understanding why the chr-22 1KGp3:EUR marker count is so low would be greatly appreciated.  I'm not sure if this impacts other chromosomes or not.
>> 
>> Trying to reproduce the problem/result against remote Ensembl 96 (ensembldb.ensembl.org) has exposed another issue:  the same script returns 0 counts for each chromosome, regardless of if is using remote or local 1KG vcfs.  Further, no errors are reported/thrown.
>> 
>> 	20 .. 0
>> 	21 .. 0
>> 	22 .. 0
>> 
>> Can you also clarify why we see that behavior, given that the script reports values against a local Ensembl 96 install?
>> 
>> Many thanks for any help/information.
>> 
>> Best,
>> 
>> Andrew
>> 
>> 
>> Here is the perl script:
>> 
>> use strict;
>> $|=1;
>> use Bio::EnsEMBL::Registry;
>> use Bio::EnsEMBL::ApiVersion;
>> 
>> my $registry = 'Bio::EnsEMBL::Registry';
>> $registry->load_all();
>> $registry->load_registry_from_db(
>>     -host => 'ensembldb.ensembl.org',
>>     -user => 'anonymous'
>>     );
>> 
>> my $vs_adaptor = $registry->get_adaptor('human','variation','variationset');
>> $vs_adaptor->db->use_vcf(1);
>> my $vs = $vs_adaptor->fetch_by_short_name('1kg_3_eur');
>> 
>> foreach my $chr (20..22) {
>>     my $slice_adaptor = $registry->get_adaptor('homo_sapiens', 'core', 'slice');
>>     my $slice = $slice_adaptor->fetch_by_region('chromosome', $chr);
>>     my $s_start = $slice->start;
>>     my $s_end = $slice->end;
>>     my $tmp_start = $s_start;
>>     my $tmp_end = -1;
>>     my $tcnt=0;
>>     while ($tmp_start<=$s_end) {
>>         $tmp_end = $tmp_start + 1000;
>>         $slice = $slice_adaptor->fetch_by_region('chromosome', $chr, $tmp_start, $tmp_end);
>>         my $vfs = $vs->get_all_VariationFeatures_by_Slice($slice);
>>         my @tmp = @{$vfs};
>>         $tcnt+=($#tmp+1);
>>         $tmp_start = $tmp_end+1;
>>     }
>>     print "$chr .. $tcnt\n";
>> }
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/





More information about the Dev mailing list