[ensembl-dev] r2-pair calculations

andrew126 at mac.com andrew126 at mac.com
Sat Jun 2 16:10:50 BST 2018


Hi,

rs1406071 shows identical mapping information, 1KG variant-set participation, and 1KG allele frequencies between Ensembl API 90 and 92; however, API 90 returns no high-r2 LD results, while API 92 returns high-r2 LD results.  Script to reproduce and output shown below.  I'm not sure why API 90 returns no results?

In general, when $ldFeatureContainerAdaptor->fetch_by_VariationFeature returns no r2 results, it either means the variation feature has no high-LD pairs or the variation feature is not in the LD dataset and so no r2 pairs can be calculated.  It seems very useful (and important) to distinguish those two possibilities when interpretting r2 results.

Based on the API 90 result, pre-screening for query variant features which have 1KG frequencies and 1KG variant sets is insufficient for determining if a variation feature is capable of returning 1KG high-LD results.  Is there a method to answer "can this variant feature generate any LD results?" .. presumably it would also cover tri-allelics, which I don't think report LD results either?

Thanks for any guidance.

Best regards,

Andrew

API 90 output:

	The API version used is 90
	variation-feature chromosomes:
	        17
	        CHR_HSCHR17_1_CTG5
	        CHR_HSCHR17_2_CTG5
	allele frequencies:
	        Allele T has frequency: 0.759443339960239 in population 1000GENOMES:phase_3:EUR.
	        Allele C has frequency: 0.240556660039761 in population 1000GENOMES:phase_3:EUR.
	variant sets:
	        1000 Genomes 3 - EUR
	        1000 Genomes 3 - EUR - common
	LD results >= r2 0.99:

API 92 output:

	The API version used is 92
	variation-feature chromosomes:
		17
		CHR_HSCHR17_1_CTG5
		CHR_HSCHR17_2_CTG5
	allele frequencies:
		Allele C has frequency: 0.240556660039761 in population 1000GENOMES:phase_3:EUR.
		Allele T has frequency: 0.759443339960239 in population 1000GENOMES:phase_3:EUR.
	variant sets:
		1000 Genomes 3 - EUR - common
		1000 Genomes 3 - EUR
	LD results >= r2 0.99:
		rs1406071 rs74348235 1.000000
		rs1406071 rs17660251 1.000000
		rs1406071 rs2696707 1.000000
		rs1406071 rs2696590 0.994581
		rs1406071 rs2696703 1.000000
		rs1406071 rs62061766 1.000000
		rs1406071 rs2732596 0.994566
		rs1406071 rs62061821 1.000000
	Etc.

Script:

use strict;
$|=1;
use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::ApiVersion;
printf( "The API version used is %s\n", software_version() );

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_all();
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org', # alternatively 'useastdb.ensembl.org'
    -user => 'anonymous',
    );

$registry->set_reconnect_when_lost(1);
my $v_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation');
$v_adaptor->db->use_vcf(1);
my $pop_adaptor = $registry->get_adaptor("human","variation","population");
my $ldFeatureContainerAdaptor = $registry->get_adaptor('human', 'variation', 'ldfeaturecontainer');
$ldFeatureContainerAdaptor->max_snp_distance(100000);
my $population = $pop_adaptor->fetch_by_name("1000GENOMES:phase_3:EUR");
my $variation = $v_adaptor->fetch_by_name("rs1406071");
my $vfref = $variation->get_all_VariationFeatures();
my $query_variation_feature;
print "variation-feature chromosomes:\n";
foreach my $vf (@{$vfref}) {
    print "\t".$vf->seq_region_name."\n";
    if ($vf->seq_region_name eq "17") {
        $query_variation_feature = $vf;
    }
}
print "allele frequencies:\n";
my $alleles = $query_variation_feature->variation->get_all_Alleles();
foreach my $allele (@{$alleles}) {
    next unless (defined $allele->population);
    my $allele_string   = $allele->allele;
    my $frequency       = $allele->frequency || 'NA';
    my $population_name = $allele->population->name;
    if ($population_name=~/EUR/ && $population_name=~/1000/) {
        printf("\tAllele %s has frequency: %s in population %s.\n", $allele_string, $frequency, $population_name);
    }
}
my $variant_sets = $query_variation_feature->get_all_VariationSets();
print "variant sets:\n";
if (defined $variant_sets) {
    foreach my $vs (@{$variant_sets}){
        if ($vs->name()=~/EUR/) {
            print "\t".$vs->name()."\n";
        }
    }
}
print "LD results >= r2 0.99:\n";
my $ldFeatureContainer = $ldFeatureContainerAdaptor->fetch_by_VariationFeature($query_variation_feature,$population);
foreach my $r_square (@{$ldFeatureContainer->get_all_r_square_values}){
    if ($r_square->{r2} >= 0.99){
        print "\t".$r_square->{variation1}->variation_name." ".$r_square->{variation2}->variation_name." ".$r_square->{r2}."\n";
    }
}







More information about the Dev mailing list