[ensembl-dev] LDFeatureContainerAdaptor->fetch_by_VariationFeature is returning unrelated results co-localized with query variation feature

Anja Thormann anja at ebi.ac.uk
Tue Jan 3 13:07:41 GMT 2017


Hi Andrew,

thank you for your detailed error report. This has now been fixed for release/86 and release/87 of our ensembl-variation repository. Please update your checkout. 

rs111781203 returned data before the fix because we were using the genotypes from the co-located variant rs7556897 to compute LD. I added another check to avoid that now.

The main reason why rs111781203 is not going to return LD data now is the lack of genotypes from the 1000 Genomes project which we need for LD computation. rs7556897 has genotypes from the 1000 Genomes project.


Thank you,
Anja

> On 2 Jan 2017, at 22:57, andrew126 at mac.com wrote:
> 
> Hi,
> 
> I’m using API v86.
> 
> When trying to recover variant features in high-LD with a particular variant via LDFeatureContainerAdaptor->fetch_by_VariationFeature(), results are being returned that do not contain the query variation feature .. instead the extra results appear to belong to variant features that live at the same location as the query variation feature.
> 
> For example, querying high-LD pairs to rs111781203 is returning the below results ..
> 
> High r2 (0.758709) between variations rs58145932-rs7556897
> High r2 (0.807347) between variations rs733693-rs7556897
> High r2 (0.810721) between variations rs2063021-rs7556897
> High r2 (0.806414) between variations rs2396495-rs7556897
> High r2 (0.807347) between variations rs12694756-rs7556897
> High r2 (1.000000) between variations rs7556897-rs4973341
> High r2 (0.814778) between variations rs548837969-rs7556897
> High r2 (0.766439) between variations rs11692411-rs7556897
> High r2 (0.810721) between variations rs1395338-rs7556897
> High r2 (0.807347) between variations rs8179522-rs7556897
> Etc.
> 
> rs7556897 happens to live at the same location as (but is considered distinct from) rs111781203.
> 
> Is this the expected behavior?
> 
> Thanks for any guidance.
> 
> Best,
> 
> Andrew
> 
> 
> The script to recreate this is below:
> 
> use strict;
> $|=1;
> use DBI;
> use IPC::Run;
> use Bio::EnsEMBL::Registry;
> use Bio::EnsEMBL::ApiVersion; 
> printf( "The API version used is %s\n", software_version() ); 
> 
> # API version is same as Ensembl version, which provides version of gene annotations
> 
> my $registry = 'Bio::EnsEMBL::Registry';
> $registry->load_all();
> $registry->set_reconnect_when_lost(1);
> $registry->load_registry_from_db(
>    -host => 'useastdb.ensembl.org', # alternatively ensembldb 'useastdb.ensembl.org'
>    -user => 'anonymous'
>    );
> 
> $registry->set_reconnect_when_lost(1);
> my ($index_var, $chrom, $index_region_start, $index_region_end) = split /\s+/, "rs111781203 2 227795396 227795396";
> my $pop_adaptor = $registry->get_adaptor("human","variation","population");
> $pop_adaptor->db->use_vcf(1);
> my $population = $pop_adaptor->fetch_by_name("1000GENOMES:phase_3:EUR");
> 
> my $ldFeatureContainerAdaptor = $registry->get_adaptor('human', 'variation', 'ldfeaturecontainer'); #get adaptor for LDFeatureContainer object
> $ldFeatureContainerAdaptor->db->use_vcf(1);
> $ldFeatureContainerAdaptor->max_snp_distance(50000);
> 
> print "index variant: $index_var $chrom $index_region_start\-$index_region_end\n";
> 
> my $variation_adaptor = $registry->get_adaptor( 'human', 'variation', 'variation' );
> $variation_adaptor->db->use_vcf(1);
> my $query_variation_feature;
> my $variation = $variation_adaptor->fetch_by_name("$index_var");
> my $vfref = $variation->get_all_VariationFeatures();
> foreach my $vf (@{$vfref}) {
>    if ($vf->variation_name eq $index_var) {
> 	if ($chrom eq $vf->seq_region_name) {
> 	    if ($index_region_start eq $vf->seq_region_start && $index_region_end eq $vf->seq_region_end) {
> 		$query_variation_feature = $vf;
> 	    }
> 	}
>    }
> }
> 
> my $ldFeatureContainer = $ldFeatureContainerAdaptor->fetch_by_VariationFeature($query_variation_feature,$population);
> 	
> foreach my $r_square (@{$ldFeatureContainer->get_all_r_square_values}){
>    if ($r_square->{r2} >= 0.6){ 
> 	print "High r2 (".($r_square->{r2}).") between variations ", $r_square->{variation1}->variation_name, "-", $r_square->{variation2}->variation_name, "\n";
>    }
> }
> 
> 
> 
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/





More information about the Dev mailing list