[ensembl-dev] Error/warning when using get_r_square of LDFeatureContainer

Anja Thormann anja at ebi.ac.uk
Mon Feb 29 17:37:12 GMT 2016


Hi Johanne,

There are cases when the script cannot return LD values between two SNPs even though they are located on the same chromosome:
- A variant has a minor allele frequency close or equal to 0
- A variant does not have enough genotypes to calculate LD values
- Estimated r2 values are below 0.05 and have been filtered out

The last reason seems to be the case for rs6678275 and rs6656401. I just compared their genotypes for 1000GENOMES:phase_3:CEU and they differ for many samples:
rs6678275
http://www.ensembl.org/Homo_sapiens/Variation/Sample?db=core;r=1:193655603-193656603;v=rs6678275;vdb=variation;vf=104742563#373514_tablePanel

rs6656401
http://www.ensembl.org/Homo_sapiens/Variation/Sample?db=core;r=1:207518204-207519204;v=rs6656401;vdb=variation;vf=104722274#373514_tablePanel

So you are using the function correctly. But it can be the case that two SNPs don't have LD values.

Computing r2 values for all pairwise SNPs on one chromosome can take a very long time. It would make sense to split the chromosome into chunks and then compute LD values for this region. You can use fetch_by_Slice for this (http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1DBSQL_1_1LDFeatureContainerAdaptor.html#ac5fe7e8519a161611200c278744d9b14 <http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1DBSQL_1_1LDFeatureContainerAdaptor.html#ac5fe7e8519a161611200c278744d9b14>). Here is a tutorial of how to define a region and pass the region on to the LDFeatureContainerAdaptor: http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#ld <http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#ld>.

It might also be worth to have a look at different software: PLINK can list all inter-chromosomal SNPs pairs. There is a warning that this will take a long time: http://pngu.mgh.harvard.edu/~purcell/plink/ld.shtml <http://pngu.mgh.harvard.edu/~purcell/plink/ld.shtml>.

Best,
Anja

> On 29 Feb 2016, at 09:00, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
> 
> 
> Dear Ensembl dev team,
> 
> I try to get rsquare between two SNPs (on the same chromosome). When I use the ldFeatureContainer function get_r_square, however, I get a warning and no return rsquare value. Here is the function signature from the documentation:
> 
> $r_square = $obj->get_r_square($vf1,$vf2,$population_id);
> 
> The script I am using is the following:
> 
> # START SCRIPT
> use strict;
> use warnings;
> use Bio::EnsEMBL::Registry;
> 
> my $start_run = time();
> 
> my $registry = 'Bio::EnsEMBL::Registry';
> 
> $registry->load_registry_from_db(
> -host => 'ensembldb.ensembl.org <http://ensembldb.ensembl.org/>',
> -user => 'anonymous'
> );
> 
> # Connect to the databases:
> my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation');
> my $ldfc_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'ldfeaturecontainer');
> my $pop_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'population');
> $variation_adaptor->db->use_vcf(1);
> 
> my $ld_population = $pop_adaptor->fetch_by_name('1000GENOMES:phase_3:CEU');
> 
> # Build tagSNP list of SNPs on same chromosome:
> my $tagSNP1 = 'rs6678275';
> my $tagSNP2 = 'rs6656401';
> my @variations;
> push (@variations, $variation_adaptor->fetch_by_name($tagSNP1));
> push (@variations, $variation_adaptor->fetch_by_name($tagSNP2));
> 
> my @all_features;
> foreach my $variation (@variations) {
> my @var_features = @{ $variation->get_all_VariationFeatures() };
> foreach my $vf (@var_features) {
> push (@all_features, $vf);
> }
> }
> 
> my $ldfc = $ldfc_adaptor->fetch_by_VariationFeatures(\@all_features, $ld_population);
> 
> # Print r2 between all SNPs:
> foreach my $vf (@all_features) {
> foreach my $vf2 (@all_features) {
> my $rsid = $vf->name;
> my $rsid2 = $vf2->name;
> my $start = $vf->start;
> my $start2 = $vf2->start;
> my $region = $vf->seq_region_name;
> my $region2 = $vf2->seq_region_name;
> 
> print "$region\t$rsid\t$start\n";
> print "$region2\t$rsid2\t$start2\n";
> 
> my $r2 = $ldfc->get_r_square($vf, $vf2);
> #print "\tr2=$r2\n";
> }
> }
> # END SCRIPT
> 
> I get this printout:
> 
> $ perl rsquareTagSNPs.pl out_tagsnps.txt 0.9
> 1
> rs6678275 193656103
> 1
> rs6678275 193656103
> 
> -------------------- WARNING ----------------------
> MSG: variation features have no pairwise ld information
> FILE: EnsEMBL/Variation/LDFeatureContainer.pm LINE: 207
> CALLED BY: rsquareTagSNPs.pl  LINE: 59
> Date (localtime)    = Mon Feb 29 09:47:44 2016
> Ensembl API version = 83
> ---------------------------------------------------
> 1
> rs6678275 193656103
> 1
> rs6656401 207518704
> 
> -------------------- WARNING ----------------------
> MSG: variation features have no pairwise ld information
> FILE: EnsEMBL/Variation/LDFeatureContainer.pm LINE: 207
> CALLED BY: rsquareTagSNPs.pl  LINE: 59
> Date (localtime)    = Mon Feb 29 09:47:44 2016
> Ensembl API version = 83
> ---------------------------------------------------
> 1
> rs6656401 207518704
> 1
> rs6678275 193656103
> 
> -------------------- WARNING ----------------------
> MSG: variation features have no pairwise ld information
> FILE: EnsEMBL/Variation/LDFeatureContainer.pm LINE: 207
> CALLED BY: rsquareTagSNPs.pl  LINE: 59
> Date (localtime)    = Mon Feb 29 09:47:44 2016
> Ensembl API version = 83
> ---------------------------------------------------
> 1
> rs6656401 207518704
> 1
> rs6656401 207518704
> 
> -------------------- WARNING ----------------------
> MSG: variation features have no pairwise ld information
> FILE: EnsEMBL/Variation/LDFeatureContainer.pm LINE: 207
> CALLED BY: rsquareTagSNPs.pl  LINE: 59
> Date (localtime)    = Mon Feb 29 09:47:44 2016
> Ensembl API version = 83
> ---------------------------------------------------
> 
> Am I using the function on the wrong kind of ldfc? Or have I misunderstood the documentation? The ultimate goal of my script is to find r2 between all SNPs/rsIDs I pass to the script which are on the same chromosome.
> 
> Best,
> Johanne Håøy Horn
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160229/e17c3492/attachment.html>


More information about the Dev mailing list