[ensembl-dev] Problems with fetch_by_VariationFeatures
    Anja Thormann 
    anja at ebi.ac.uk
       
    Tue May  9 17:08:42 BST 2017
    
    
  
Hi Andrew,
thank you very much for the excellent bug report. You have again found some very subtle bugs:
1) rs56391266 doesn’t have any genotypes from 1kg which we can use for calculating LD. I will add a warning message to the code which will be thrown if this is the case.
2) the result you get is the LD computation for rs11291652 and rs72643554. The name rs11291652 gets overwritten by rs72643554 because they are both are located at the same position. I have to add a better filtering to make sure we return the correct LD stats for the given input variants. (For your case it would return no results because we only have genotypes for one of your variants)
I fix 1) and 2) asap and will let you know when the updated code is available.
Thank you,
Anja
> On 6 May 2017, at 21:43, andrew126 at mac.com wrote:
> 
> Hi,
> 
> I am using v88 of the API.
> 
> I have two variants recovered from REST eQTL data:
> 
> rs56391266
> http://www.ensembl.org/Homo_sapiens/Variation/Explore?r=11:60376122-60377123;v=rs56391266;vdb=variation;vf=11255007
> It has no reported allele frequencies in 1000GENOMES:phase_3 datasets
> 
> rs11291652
> http://www.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=11:60360031-60361031;v=rs11291652;vdb=variation;vf=6622904
> It is typed in 1000GENOMES:phase_3 datasets
> 
> Using script I paste below, I try to calculate r2 between these two variants in 1000GENOMES:phase_3:EUR.
> 
> Here is the output of script:
> 
> 	variant 1: rs72643554 60360531 60360531
> 	variant 2: rs72643554 60360531 60360531
> 	r^2 0.180796
> 
> Here are my concerns/questions:
> 
> 	*If rs56391266 is not typed in 1000GENOMES:phase_3:EUR (I did not know this before I calculate r2), how can any result be reported?
> 	*Function reports neither of the query variants.
> 	*A single variant is reported twice (rs72643554 .. which lives at same genomic location as rs11291652, but is considered a different variant)
> 	*If function is calculating r2 between variant and itself, why does it not report 1.000?
> 
> Please let me know if I am doing anything incorrectly.
> 
> Thanks very much.
> 
> Best regards,
> 
> Andrew
> 
> 
> 
> 
> use strict;
> $|=1;
> use IPC::Run;
> use Bio::EnsEMBL::Registry;
> use Bio::EnsEMBL::ApiVersion; 
> printf( "The API version used is %s\n", software_version() ); 
> 
> my $registry = 'Bio::EnsEMBL::Registry';
> $registry->load_all();
> $registry->set_reconnect_when_lost(1);
> $registry->load_registry_from_db(
>    -host => 'useastdb.ensembl.org', # alternatively ensembldb 'useastdb.ensembl.org'
>    -user => 'anonymous'
>    );
> 
> my $LD_BASE_RADIUS = 100000;
> my $chrom = "11";
> 
> my $var1 = "rs56391266";
> my $region_start1 = "60376622";
> my $region_end1 = "60376623";
> my $var2 = "rs11291652";
> my $region_start2 = "60360531";
> my $region_end2 = "60360531";
> 
> 
> my $pop_adaptor = $registry->get_adaptor("human","variation","population");
> $pop_adaptor->db->use_vcf(1);
> my $ldFeatureContainerAdaptor = $registry->get_adaptor('human', 'variation', 'ldfeaturecontainer'); #get adaptor for LDFeatureContainer object
> $ldFeatureContainerAdaptor->db->use_vcf(1);
> $ldFeatureContainerAdaptor->max_snp_distance($LD_BASE_RADIUS);
> 
> my $variation_adaptor = $registry->get_adaptor( 'human', 'variation', 'variation' );
> $variation_adaptor->db->use_vcf(1);
> 	
> my $population_name = "1000GENOMES:phase_3:EUR";
> my $population = $pop_adaptor->fetch_by_name($population_name); #get population object from database
> 	
> my $variation1 = $variation_adaptor->fetch_by_name("$var1");
> my $vfref1 = $variation1->get_all_VariationFeatures();
> my $query_variation_feature1;
> my $found=0;
> foreach my $vf1 (@{$vfref1}) {
>    if ($chrom eq $vf1->seq_region_name) {
> 	if ($region_start1 eq $vf1->seq_region_start && $region_end1 eq $vf1->seq_region_end) {
> 	    $found=1;
> 	    $query_variation_feature1 = $vf1;
> 	}
>    }
> }
> if ($found==0) {die;}
> 
> my $variation2 = $variation_adaptor->fetch_by_name("$var2");
> my $vfref2 = $variation2->get_all_VariationFeatures();
> my $query_variation_feature2;
> $found=0;
> foreach my $vf2 (@{$vfref2}) {
>    if ($chrom eq $vf2->seq_region_name) {
> 	if ($region_start2 eq $vf2->seq_region_start && $region_end2 eq $vf2->seq_region_end) {
> 	    $found=1;
> 	    $query_variation_feature2 = $vf2;
> 	}
>    }
> }
> if ($found==0) {die;}
> 
> my $ldFeatureContainer = $ldFeatureContainerAdaptor->fetch_by_VariationFeatures([$query_variation_feature1, $query_variation_feature2],$population); #retrieve all LD values in the region
> 
> foreach my $r_square (@{$ldFeatureContainer->get_all_r_square_values}){
>    if ($r_square->{r2} >= 0.1){ 
> 	print "variant 1: ".$r_square->{variation1}->variation_name." ".$r_square->{variation1}->seq_region_start." ".$r_square->{variation1}->seq_region_end."\n";
> 	print "variant 2: ".$r_square->{variation2}->variation_name." ".$r_square->{variation2}->seq_region_start." ".$r_square->{variation2}->seq_region_end."\n";
> 	print "r^2 ".$r_square->{r2}."\n";
>    }
> }
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
    
    
More information about the Dev
mailing list