[ensembl-dev] Problems with fetch_by_VariationFeatures
    Anja Thormann 
    anja at ebi.ac.uk
       
    Wed May 17 14:40:16 BST 2017
    
    
  
Hi Andrew,
I’m sorry for that. I left a debug print in the code. I removed it now. Please update your code again.
Thank you,
Anja
> On 16 May 2017, at 19:10, andrew126 at mac.com wrote:
> 
> Thank you, Anja.
> 
> I refreshed the API.
> 
> Now test code does not print out incorrect LD .. smile.
> 
> However, the test code in OP now dumps this to stdout: 
> 
> /home/andrew/gte/src_88/ensembl-variation/C_code//ld_vcf -f /home/andrew/gte/vcfs/ALL.chr11.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz -r 11:60376621-60376624 -g /home/andrew/gte/vcfs/ALL.chr11.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz -s 11:60360530-60360532 -l HG00158,NA12812,HG00253,NA20808,HG01762,HG00309,NA20783,HG00320,NA20538,NA11931,HG00338,HG00139,NA20525,HG00272,NA11930,NA20797,HG00284,NA20803,HG00137,NA12144,HG00243,NA12287,HG01679,HG01503,NA20752,NA20586,NA20585,HG00383,NA20508,HG00183,HG00118,HG00266,HG01513,HG00171,NA11830,NA20792,HG01680,NA12890,NA20522,NA20819,NA12006,HG00141,NA20768,HG00150,HG01781,HG00250,NA07051,HG00277,HG00366,HG01602,HG00173,NA20587,NA20814,NA20534,HG01507,NA11995,HG01710,NA20818,NA20588,HG00238,NA12748,HG01789,HG00110,HG00269,NA12046,HG01537,HG00346,HG01695,NA20518,NA11992,HG01624,NA20541,NA12873,NA20809,HG00157,NA10851,NA20756,HG00126,NA06984,HG00123,HG01531,HG00234,NA12776,HG00122,NA12156,HG01697,HG00319,HG00140,HG01766,HG00339,HG00376,NA06985,NA20815,NA20775,NA20543,HG00102,NA20509,HG01705,HG00372,NA20533,HG00114,NA20765,HG00345,NA20504,HG01608,NA20821,NA11843,NA20524,HG02232,NA20804,NA12718,NA12348,NA12717,NA20515,NA20807,HG01501,HG02219,HG00111,HG00244,HG01530,HG01612,HG00182,HG00337,HG01747,NA20787,HG00341,NA12546,NA07037,HG00311,NA20530,NA20770,HG00252,HG00336,HG01761,HG01756,NA11881,NA20801,NA12751,HG00097,NA20503,NA20589,NA20521,NA11893,HG00256,HG00133,HG00357,HG00379,HG01673,NA12815,NA12749,NA20539,HG00369,HG01672,NA12878,HG00190,NA20582,HG01700,HG00356,NA20764,NA12249,HG00245,HG00142,NA12874,NA20536,NA12347,NA11829,NA12005,HG00186,HG00329,NA12842,HG00128,HG00106,HG01603,HG00310,NA20532,NA12282,HG01686,HG00323,HG01684,NA20513,HG00178,HG01528,HG00332,HG01770,HG00179,NA12777,NA12775,HG01790,NA12413,HG00349,HG00315,HG01669,HG01334,NA12383,HG01678,NA11933,NA20757,NA12872,NA12763,NA12843,HG01628,NA20822,HG01709,NA11894,NA20542,HG00362,HG00355,NA20527,NA20760,HG01785,NA20514,HG01791,NA12829,HG02238,HG01605,HG00145,NA20826,HG01631,HG00263,NA07347,HG00313,HG01675,HG01757,HG00096,HG00239,HG00260,HG01519,NA20805,HG01522,HG01527,NA07056,HG01623,NA11832,HG00343,HG00282,NA20517,HG01626,NA20795,HG00107,NA20531,HG00330,NA20540,NA20507,NA06989,HG01618,HG00280,HG00103,NA20516,NA20811,HG02220,HG01606,HG01783,HG00237,HG00361,HG01773,HG01694,NA20767,HG00151,HG01607,HG00344,NA20512,HG00265,HG01524,NA20774,HG01699,HG00335,HG00381,NA12778,HG00318,HG00321,NA12340,HG00325,NA12058,HG00326,HG01509,HG01510,NA20802,NA20519,NA20771,NA20763,HG00240,HG00262,NA12003,HG01632,HG00285,HG00254,HG00187,NA20790,HG01521,HG00176,NA12045,NA11831,HG00108,HG01771,HG00267,HG00100,NA20796,NA11994,HG00131,NA12004,HG00125,HG00112,NA11918,HG02215,HG00101,HG01765,HG00143,HG01625,NA11920,HG01518,NA12716,HG00328,HG00154,NA11919,HG02239,NA07357,HG00188,HG00268,HG00274,HG01704,NA20511,HG01670,NA07048,HG00368,HG00246,NA20778,HG00290,HG00360,HG00160,HG01610,NA20506,HG00235,HG01619,HG01777,HG00251,NA20581,HG00180,HG00109,HG00281,HG01613,HG00351,HG00174,HG00334,NA06986,HG00138,HG01506,HG00261,HG00380,NA20505,HG00129,HG00181,NA12813,HG01767,NA20812,NA12762,HG01516,NA20828,NA20529,NA20769,NA12414,HG00155,NA20827,NA12828,HG02233,NA12827,HG00130,HG00327,HG00373,HG00149,HG01776,HG00257,HG00115,HG00148,HG00255,NA20772,HG00159,HG00271,NA12234,NA20510,NA12889,HG01708,HG00264,HG00382,NA20753,NA06994,NA12342,HG00127,HG02223,NA12830,NA12044,HG00099,HG01512,HG01630,HG00132,HG00350,HG00259,NA20832,HG00119,HG00324,HG01775,HG00189,HG00331,NA20502,HG02230,NA12750,NA12814,NA20754,NA11932,NA12341,NA20535,NA20759,NA12283,HG01702,HG01779,HG00371,HG00367,NA12489,NA20766,NA12275,HG00365,NA20798,HG01682,HG00273,HG00278,HG00375,HG01685,HG02235,NA20520,HG00342,NA20786,HG02231,HG01620,NA20761,HG00358,NA12761,HG00120,NA20800,HG00233,HG00275,HG00236,HG02221,NA20813,NA20785,NA07000,HG00121,HG01668,NA11892,NA12043,NA20773,HG01786,NA12286,HG00308,NA12272,HG01536,NA12760,HG01784,HG00105,NA12399,NA12273,HG00384,NA20762,HG00177,HG00232,NA20810,HG00304,NA20799,HG01617,HG00353,NA20544,HG01676,HG00117,HG02236,NA10847,NA20806,HG00113,HG00116,NA12400,HG01746,HG01500,NA11840,HG00242,HG01707,NA20758,HG00185,HG01515,HG00231,NA12155,HG01615,HG00258,HG00288,HG01768,HG01525,HG00378,NA20755,HG00136,HG00364,HG00146,HG00276,NA12154,HG02224,HG00306,HG01504,NA20528
> 
> Is that on purpose?
> 
> I think stdout is because of:
> 	print "$cmd\n";
> (line 595, LDFeatureContainerAdaptor.pm)
> 
> Is that line needed?
> 
> Thank you.
> 
> Best,
> 
> Andrew
> 
> 
> 
>> On May 16, 2017, at 4:33 AM, Anja Thormann <anja at ebi.ac.uk> wrote:
>> 
>> Hi Andrew,
>> 
>> this is now fixed. Please refresh your checkout of the ensembl-variation API.
>> 
>> Kind regards,
>> Anja
>> 
>>> On 11 May 2017, at 09:17, andrew126 at mac.com wrote:
>>> 
>>> Hi Anja,
>>> 
>>> Great .. I am glad they reproduce for you .. smile.
>>> 
>>> We will wait for changes.
>>> 
>>> Many thanks,
>>> 
>>> Andrew
>>> 
>>>> On May 9, 2017, at 9:08 AM, Anja Thormann <anja at ebi.ac.uk> wrote:
>>>> 
>>>> Hi Andrew,
>>>> 
>>>> thank you very much for the excellent bug report. You have again found some very subtle bugs:
>>>> 
>>>> 1) rs56391266 doesn’t have any genotypes from 1kg which we can use for calculating LD. I will add a warning message to the code which will be thrown if this is the case.
>>>> 2) the result you get is the LD computation for rs11291652 and rs72643554. The name rs11291652 gets overwritten by rs72643554 because they are both are located at the same position. I have to add a better filtering to make sure we return the correct LD stats for the given input variants. (For your case it would return no results because we only have genotypes for one of your variants)
>>>> 
>>>> I fix 1) and 2) asap and will let you know when the updated code is available.
>>>> 
>>>> Thank you,
>>>> Anja
>>>> 
>>>>> On 6 May 2017, at 21:43, andrew126 at mac.com wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am using v88 of the API.
>>>>> 
>>>>> I have two variants recovered from REST eQTL data:
>>>>> 
>>>>> rs56391266
>>>>> http://www.ensembl.org/Homo_sapiens/Variation/Explore?r=11:60376122-60377123;v=rs56391266;vdb=variation;vf=11255007
>>>>> It has no reported allele frequencies in 1000GENOMES:phase_3 datasets
>>>>> 
>>>>> rs11291652
>>>>> http://www.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=11:60360031-60361031;v=rs11291652;vdb=variation;vf=6622904
>>>>> It is typed in 1000GENOMES:phase_3 datasets
>>>>> 
>>>>> Using script I paste below, I try to calculate r2 between these two variants in 1000GENOMES:phase_3:EUR.
>>>>> 
>>>>> Here is the output of script:
>>>>> 
>>>>> 	variant 1: rs72643554 60360531 60360531
>>>>> 	variant 2: rs72643554 60360531 60360531
>>>>> 	r^2 0.180796
>>>>> 
>>>>> Here are my concerns/questions:
>>>>> 
>>>>> 	*If rs56391266 is not typed in 1000GENOMES:phase_3:EUR (I did not know this before I calculate r2), how can any result be reported?
>>>>> 	*Function reports neither of the query variants.
>>>>> 	*A single variant is reported twice (rs72643554 .. which lives at same genomic location as rs11291652, but is considered a different variant)
>>>>> 	*If function is calculating r2 between variant and itself, why does it not report 1.000?
>>>>> 
>>>>> Please let me know if I am doing anything incorrectly.
>>>>> 
>>>>> Thanks very much.
>>>>> 
>>>>> Best regards,
>>>>> 
>>>>> Andrew
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> use strict;
>>>>> $|=1;
>>>>> use IPC::Run;
>>>>> use Bio::EnsEMBL::Registry;
>>>>> use Bio::EnsEMBL::ApiVersion; 
>>>>> printf( "The API version used is %s\n", software_version() ); 
>>>>> 
>>>>> my $registry = 'Bio::EnsEMBL::Registry';
>>>>> $registry->load_all();
>>>>> $registry->set_reconnect_when_lost(1);
>>>>> $registry->load_registry_from_db(
>>>>> -host => 'useastdb.ensembl.org', # alternatively ensembldb 'useastdb.ensembl.org'
>>>>> -user => 'anonymous'
>>>>> );
>>>>> 
>>>>> my $LD_BASE_RADIUS = 100000;
>>>>> my $chrom = "11";
>>>>> 
>>>>> my $var1 = "rs56391266";
>>>>> my $region_start1 = "60376622";
>>>>> my $region_end1 = "60376623";
>>>>> my $var2 = "rs11291652";
>>>>> my $region_start2 = "60360531";
>>>>> my $region_end2 = "60360531";
>>>>> 
>>>>> 
>>>>> my $pop_adaptor = $registry->get_adaptor("human","variation","population");
>>>>> $pop_adaptor->db->use_vcf(1);
>>>>> my $ldFeatureContainerAdaptor = $registry->get_adaptor('human', 'variation', 'ldfeaturecontainer'); #get adaptor for LDFeatureContainer object
>>>>> $ldFeatureContainerAdaptor->db->use_vcf(1);
>>>>> $ldFeatureContainerAdaptor->max_snp_distance($LD_BASE_RADIUS);
>>>>> 
>>>>> my $variation_adaptor = $registry->get_adaptor( 'human', 'variation', 'variation' );
>>>>> $variation_adaptor->db->use_vcf(1);
>>>>> 	
>>>>> my $population_name = "1000GENOMES:phase_3:EUR";
>>>>> my $population = $pop_adaptor->fetch_by_name($population_name); #get population object from database
>>>>> 	
>>>>> my $variation1 = $variation_adaptor->fetch_by_name("$var1");
>>>>> my $vfref1 = $variation1->get_all_VariationFeatures();
>>>>> my $query_variation_feature1;
>>>>> my $found=0;
>>>>> foreach my $vf1 (@{$vfref1}) {
>>>>> if ($chrom eq $vf1->seq_region_name) {
>>>>> 	if ($region_start1 eq $vf1->seq_region_start && $region_end1 eq $vf1->seq_region_end) {
>>>>> 	    $found=1;
>>>>> 	    $query_variation_feature1 = $vf1;
>>>>> 	}
>>>>> }
>>>>> }
>>>>> if ($found==0) {die;}
>>>>> 
>>>>> my $variation2 = $variation_adaptor->fetch_by_name("$var2");
>>>>> my $vfref2 = $variation2->get_all_VariationFeatures();
>>>>> my $query_variation_feature2;
>>>>> $found=0;
>>>>> foreach my $vf2 (@{$vfref2}) {
>>>>> if ($chrom eq $vf2->seq_region_name) {
>>>>> 	if ($region_start2 eq $vf2->seq_region_start && $region_end2 eq $vf2->seq_region_end) {
>>>>> 	    $found=1;
>>>>> 	    $query_variation_feature2 = $vf2;
>>>>> 	}
>>>>> }
>>>>> }
>>>>> if ($found==0) {die;}
>>>>> 
>>>>> my $ldFeatureContainer = $ldFeatureContainerAdaptor->fetch_by_VariationFeatures([$query_variation_feature1, $query_variation_feature2],$population); #retrieve all LD values in the region
>>>>> 
>>>>> foreach my $r_square (@{$ldFeatureContainer->get_all_r_square_values}){
>>>>> if ($r_square->{r2} >= 0.1){ 
>>>>> 	print "variant 1: ".$r_square->{variation1}->variation_name." ".$r_square->{variation1}->seq_region_start." ".$r_square->{variation1}->seq_region_end."\n";
>>>>> 	print "variant 2: ".$r_square->{variation2}->variation_name." ".$r_square->{variation2}->seq_region_start." ".$r_square->{variation2}->seq_region_end."\n";
>>>>> 	print "r^2 ".$r_square->{r2}."\n";
>>>>> }
>>>>> }
>>>>> 
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>> 
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
    
    
More information about the Dev
mailing list