[ensembl-dev] API LD calculation questions

andrew126 at mac.com andrew126 at mac.com
Sun Jul 17 11:19:47 BST 2016


Hi,

I am using API 84 on Ubuntu (64-bit).

1)

Is it correct that LD will not be calculated for any variants having more than 2 alleles?  (It looks like being biallelic is a very hard coded requirement of calc_genotypes.c)

2)

I am confused by some discrepancies I see with the Ensembl LD calculations vrs. other LD calculators.

As an example, imagine I want all variants in high LD with rs13181561 that occur in a particular slice.

LD calculations using the below perl API script find an r^2 of >0.6 between rs13181561 and rs372693892.

This is confusing because no other LD calculators find this result, and Ensembl's web page on the variant indicates that no sample genotypes are available:

http://useast.ensembl.org/Homo_sapiens/Variation/Explore?r=5:139447545-139448545;v=rs372693892;vdb=variation;vf=54723062 <http://useast.ensembl.org/Homo_sapiens/Variation/Explore?r=5:139447545-139448545;v=rs372693892;vdb=variation;vf=54723062>

Further, the script does NOT find rs7446197 as being in high LD with rs13181561 (other engines find it to have r^2 > 0.6).

I do not know if there is any complication because both rs372693892 and rs7446197 have the same genomic location?

Can you provide some guidance on this?

Please let me know if any other information would be useful.

Thanks very much.

Best,

Andrew



use strict;
$|=1;
use IPC::Run;
use Bio::EnsEMBL::Registry;

use Bio::EnsEMBL::ApiVersion; 
printf( "The API version used is %s\n", software_version() ); 

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_all();
$registry->set_reconnect_when_lost(1);
$registry->load_registry_from_db(
    -host => 'useastdb.ensembl.org',
    -user => 'anonymous'
    );

my $slice_adaptor = $registry->get_adaptor('human', 'core', 'slice');
my $population_adaptor = $registry->get_adaptor('human', 'variation', 'population');
my $ldFeatureContainerAdaptor = $registry->get_adaptor('human', 'variation', 'ldfeaturecontainer');
$ldFeatureContainerAdaptor->db->use_vcf(1);
$registry->set_reconnect_when_lost(1);

my $pop = "1000GENOMES:phase_3:AMR";
my $chr = 5;
my $low = 139445000;
my $high = 139475000;

my $slice = $slice_adaptor->fetch_by_region('chromosome',$chr,$low,$high); 
my $population = $population_adaptor->fetch_by_name($pop);
my $ldFeatureContainer = $ldFeatureContainerAdaptor->fetch_by_Slice($slice,$population);
foreach my $r_square (@{$ldFeatureContainer->get_all_r_square_values}){
    if ($r_square->{r2} >= 0.05){ 
	if ($r_square->{variation1}->variation_name eq "rs13181561" || $r_square->{variation2}->variation_name eq "rs13181561") {
	    print "High r2 (".($r_square->{r2}).") between variations ", $r_square->{variation1}->variation_name, "-", $r_square->{variation2}->variation_name, " in $pop\n";
	}
    }
}

Output from script:

High r2 (0.692506) between variations rs11954057-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.625299) between variations rs372693892-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.900663) between variations rs111805170-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.686462) between variations rs10463977-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.264367) between variations rs28419191-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.375759) between variations rs7380062-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.903044) between variations rs13181561-rs13153461 in 1000GENOMES:phase_3:AMR
High r2 (0.898459) between variations rs71574449-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.883522) between variations rs13181561-rs55792153 in 1000GENOMES:phase_3:AMR
High r2 (0.360986) between variations rs73257323-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.898459) between variations rs9716069-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.702607) between variations rs763007006-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.533529) between variations rs34149439-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.096040) between variations rs114079768-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.286249) between variations rs10875554-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.853556) between variations rs34580585-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.702607) between variations rs36137978-rs13181561 in 1000GENOMES:phase_3:AMR
High r2 (0.088433) between variations rs140049780-rs13181561 in 1000GENOMES:phase_3:AMR
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160717/ca7d46b4/attachment.html>


More information about the Dev mailing list