[ensembl-dev] bgzf_read_block message in stderr

andrew126 at mac.com andrew126 at mac.com
Tue Dec 5 12:29:41 GMT 2017


Hi Will,

Thanks for the response.

Extra information:

	*There is no marker that reliably recreates the problem, but rs887204533 is one which has previously shown the issue.  My current work-around is to have a master script run a sub-script job which collects allele frequencies for a small chunk of the region of interest .. upon the completion of the sub-script, the master script scans the sub-script's stderr for the bgzf message .. if found, the sub-script runs its job over again.  This repeats until no bgzf message is found.  Rerunning the sub-script a small number of times on the same region/chunk always seems to create a run with no bgzf message, fortunately.

	*These vcfs are being accessed remotely (I had no idea the vcfs carried frequency information (vrs. just genotypes)).

Below shows the recovered allele frequencies for rs887204533 without/with the bgzf message.
Below that is a simplified version of the sub-script.

Please let me know if anything else would be helpful.

Thanks for the assistance.

Best,

Andrew


When no bgzf message found:

        rs887204533
        G/A
        Chromosome: 16
        Start-End: 5985270-5985270
        Consequences: non_coding_transcript_variant,intron_variant
        Allele Frequencies:
                Allele A has frequency: 3.4343e-005 in population TOPMed.
                Allele G has frequency: 0.999965657 in population TOPMed.
                Allele A has frequency: 0.00000e+00 in population gnomADg:AMR.
                Allele G has frequency: 1 in population gnomADg:AMR.
                Allele A has frequency: 0.00000e+00 in population gnomADg:NFE.
                Allele G has frequency: 1 in population gnomADg:NFE.
                Allele A has frequency: 1.14837e-04 in population gnomADg:AFR.
                Allele G has frequency: 0.999885163 in population gnomADg:AFR.
                Allele A has frequency: 0.00000e+00 in population gnomADg:FIN.
                Allele G has frequency: 1 in population gnomADg:FIN.
                Allele A has frequency: 0.00000e+00 in population gnomADg:EAS.
                Allele G has frequency: 1 in population gnomADg:EAS.
                Allele A has frequency: 0.00000e+00 in population gnomADg:OTH.
                Allele G has frequency: 1 in population gnomADg:OTH.
                Allele A has frequency: 3.24570e-05 in population gnomADg:ALL.
                Allele G has frequency: 0.999967543 in population gnomADg:ALL.
                Allele A has frequency: 0.00000e+00 in population gnomADg:ASJ.
                Allele G has frequency: 1 in population gnomADg:ASJ.

When bgzf message found:

        rs887204533
        G/A
        Chromosome: 16
        Start-End: 5985270-5985270
        Consequences: non_coding_transcript_variant,intron_variant
        Allele Frequencies:
                Allele A has frequency: 3.4343e-005 in population TOPMed.
                Allele G has frequency: 0.999965657 in population TOPMed.



use strict;
$|=1;
use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::ApiVersion;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_all();
$registry->load_registry_from_db(
    -host => 'erismysql01.eisai.local', # alternatively 'useastdb.ensembl.org'
    -user => 'ensembl_ro',
    -pass => 'ensembl_ro'
    );

$registry->set_reconnect_when_lost(1);

my $chr = $ARGV[0];
my $lw = $ARGV[1];
my $hg = $ARGV[2];

my $gene_adaptor = $registry->get_adaptor('homo_sapiens','core','gene');
my $slice_adaptor = $registry->get_adaptor('homo_sapiens','core','slice');
my $vf_adaptor = $registry->get_adaptor( 'human', 'variation', 'variationfeature' );
$vf_adaptor->db->use_vcf(1);

my $slice2 = $slice_adaptor->fetch_by_region( 'chromosome', "$chr", $lw, $hg);
my $varfs = $vf_adaptor->fetch_all_by_Slice($slice2); #return ALL variationfeatures defined in $slice
foreach my $vf (@{$varfs}){
    print "\t".$vf->variation_name(),"\n"; # print rsID
    print "\t".$vf->allele_string(),"\n"; # print alleles
    print "\tChromosome: ".$slice2->seq_region_name."\n";
    print "\tStart-End: ".$vf->seq_region_start."-".$vf->seq_region_end."\n";
    print "\tConsequences: ".join(",",@{$vf->consequence_type()}),"\n"; # print consequenceType
    my $akc = 1;
    print "\tAllele Frequencies:\n";
    my $alleles = $vf->variation->get_all_Alleles();
    foreach my $allele (@{$alleles}) {
        next unless (defined $allele->population);
        my $allele_string   = $allele->allele;
        my $frequency       = $allele->frequency || 'NA';
        my $population_name = $allele->population->name;
        printf("\t\tAllele %s has frequency: %s in population %s.\n", $allele_string, $frequency, $population_name);
    }
    print "\n";
}





> On Dec 5, 2017, at 1:41 AM, William McLaren <wm2 at ebi.ac.uk> wrote:
> 
> Hi Andrew,
> 
> A few of pieces of information would be useful:
> 
> - a particular variant or location you’re querying that recreates the issue
> - the code and context from which you’re making the call
> - whether you’re using remotely accessed VCFs or locally downloaded (if you don’t know then it’s likely the VCFs are being accessed remotely)
> 
> Regards
> 
> Will McLaren
> Ensembl Variation
> 
> 
> On 4 December 2017 at 5:31:07 pm, andrew126 at mac.com <mailto:andrew126 at mac.com> (andrew126 at mac.com <mailto:andrew126 at mac.com>) wrote:
> 
>> Hi, 
>> 
>> In using API 90, when recovering all available allele frequencies for a particular variation feature, occassionally this message is shown in stderr: 
>> 
>> [W::bgzf_read_block] EOF marker is absent. The input is probably truncated 
>> 
>> That message seems to correlate with some allele frequencies not being found/reported by my script (in my case, the gnomAD frequencies). 
>> 
>> That is, when that message is not seen in stderr, I recover gnomAD frequencies for a particular variation feature; but when the message is seen, then it can look like that same variation feature doesn't have any gnomAD frequencies (even though it does). 
>> 
>> Is there any way to make such an event fatal to any API script? I'd much rather have my script fail/exit under such errors, than to not have it fail and risk believing such allele frequencies don't exist. 
>> 
>> Thanks. 
>> 
>> Please let me know if any other information would be useful. 
>> 
>> Best, 
>> 
>> Andrew 
>> 
>> 
>> _______________________________________________ 
>> Dev mailing list  Dev at ensembl.org <mailto:Dev at ensembl.org> 
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev <http://lists.ensembl.org/mailman/listinfo/dev> 
>> Ensembl Blog: http://www.ensembl.info/ <http://www.ensembl.info/> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org <mailto:Dev at ensembl.org>
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev <http://lists.ensembl.org/mailman/listinfo/dev>
> Ensembl Blog: http://www.ensembl.info/ <http://www.ensembl.info/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20171205/78eca3a4/attachment.html>


More information about the Dev mailing list