[ensembl-dev] API LD calculation questions

Anja Thormann anja at ebi.ac.uk
Mon Jul 18 16:55:12 BST 2016


Hi Andrew,

I think we fixed the bug you are describing in 2). Could you please pull the changes from ensembl-variation/release-84 branch and check if that is solving the problem?

For 1) yes, we are calculating LD only for bi-allelic variants at the moment. It would definitely be great to extend the computation to multi-allelic variants. We will keep you updated about any progress we are going to do with this.

Regards,
Anja


> On 17 Jul 2016, at 11:19, andrew126 at mac.com wrote:
> 
> Hi,
> 
> I am using API 84 on Ubuntu (64-bit).
> 
> 1)
> 
> Is it correct that LD will not be calculated for any variants having more than 2 alleles?  (It looks like being biallelic is a very hard coded requirement of calc_genotypes.c)
> 
> 2)
> 
> I am confused by some discrepancies I see with the Ensembl LD calculations vrs. other LD calculators.
> 
> As an example, imagine I want all variants in high LD with rs13181561 that occur in a particular slice.
> 
> LD calculations using the below perl API script find an r^2 of >0.6 between rs13181561 and rs372693892.
> 
> This is confusing because no other LD calculators find this result, and Ensembl's web page on the variant indicates that no sample genotypes are available:
> 
> http://useast.ensembl.org/Homo_sapiens/Variation/Explore?r=5:139447545-139448545;v=rs372693892;vdb=variation;vf=54723062 <http://useast.ensembl.org/Homo_sapiens/Variation/Explore?r=5:139447545-139448545;v=rs372693892;vdb=variation;vf=54723062>
> 
> Further, the script does NOT find rs7446197 as being in high LD with rs13181561 (other engines find it to have r^2 > 0.6).
> 
> I do not know if there is any complication because both rs372693892 and rs7446197 have the same genomic location?
> 
> Can you provide some guidance on this?
> 
> Please let me know if any other information would be useful.
> 
> Thanks very much.
> 
> Best,
> 
> Andrew
> 
> 
> 
> use strict;
> $|=1;
> use IPC::Run;
> use Bio::EnsEMBL::Registry;
> 
> use Bio::EnsEMBL::ApiVersion; 
> printf( "The API version used is %s\n", software_version() ); 
> 
> my $registry = 'Bio::EnsEMBL::Registry';
> $registry->load_all();
> $registry->set_reconnect_when_lost(1);
> $registry->load_registry_from_db(
>     -host => 'useastdb.ensembl.org <http://useastdb.ensembl.org/>',
>     -user => 'anonymous'
>     );
> 
> my $slice_adaptor = $registry->get_adaptor('human', 'core', 'slice');
> my $population_adaptor = $registry->get_adaptor('human', 'variation', 'population');
> my $ldFeatureContainerAdaptor = $registry->get_adaptor('human', 'variation', 'ldfeaturecontainer');
> $ldFeatureContainerAdaptor->db->use_vcf(1);
> $registry->set_reconnect_when_lost(1);
> 
> my $pop = "1000GENOMES:phase_3:AMR";
> my $chr = 5;
> my $low = 139445000;
> my $high = 139475000;
> 
> my $slice = $slice_adaptor->fetch_by_region('chromosome',$chr,$low,$high); 
> my $population = $population_adaptor->fetch_by_name($pop);
> my $ldFeatureContainer = $ldFeatureContainerAdaptor->fetch_by_Slice($slice,$population);
> foreach my $r_square (@{$ldFeatureContainer->get_all_r_square_values}){
>     if ($r_square->{r2} >= 0.05){ 
> 	if ($r_square->{variation1}->variation_name eq "rs13181561" || $r_square->{variation2}->variation_name eq "rs13181561") {
> 	    print "High r2 (".($r_square->{r2}).") between variations ", $r_square->{variation1}->variation_name, "-", $r_square->{variation2}->variation_name, " in $pop\n";
> 	}
>     }
> }
> 
> Output from script:
> 
> High r2 (0.692506) between variations rs11954057-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.625299) between variations rs372693892-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.900663) between variations rs111805170-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.686462) between variations rs10463977-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.264367) between variations rs28419191-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.375759) between variations rs7380062-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.903044) between variations rs13181561-rs13153461 in 1000GENOMES:phase_3:AMR
> High r2 (0.898459) between variations rs71574449-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.883522) between variations rs13181561-rs55792153 in 1000GENOMES:phase_3:AMR
> High r2 (0.360986) between variations rs73257323-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.898459) between variations rs9716069-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.702607) between variations rs763007006-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.533529) between variations rs34149439-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.096040) between variations rs114079768-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.286249) between variations rs10875554-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.853556) between variations rs34580585-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.702607) between variations rs36137978-rs13181561 in 1000GENOMES:phase_3:AMR
> High r2 (0.088433) between variations rs140049780-rs13181561 in 1000GENOMES:phase_3:AMR
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160718/3d3a62ce/attachment.html>


More information about the Dev mailing list