[ensembl-dev] RsIDs consistently fail with variation API, v83

Johanne Håøy Horn johannhh at ifi.uio.no
Sun Mar 6 12:32:21 GMT 2016


Dear ensembl team,

I have gotten quite some help from you previously with the variation api. I wish to expand variations to include the variants they are in LD with. My script works for the most part, but I am unable to run it on some of my rsID files. I have broken the files into increasingly smaller parts, but the runs still fails. Could you help me with what I am doing wrong?

Some of the issue seems to be connected to the chromosome the rsID is on. In many of the failing files of rsIDs, this is what the script produce (example taken from out_0_Asthma3.txt, full output file attached to this email):
6 rs3129969 32414832 r2=0.538176
6 rs3763312 32408571 r2=0.074106
6 rs553071835 32487947 r2=0.058328
CHR_HSCHR6_MHC_DBB_CTG1 rs3129890 32421214
CHR_HSCHR6_MHC_MANN_CTG1 rs3129890 32485127
CHR_HSCHR6_MHC_MCF_CTG1 rs3129890 32521873
CHR_HSCHR6_MHC_QBL_CTG1 rs3129890 32403957
CHR_HSCHR6_MHC_SSTO_CTG1 rs3129890 32453108
CHR_HSCHR6_8_CTG1 rs3129890 32393290
6 rs9272346 32636595

In stead of getting a line of <chromosome> <rsID> <position> <r2> for the variation, the same SNP is repeated multiple times with a chromosome/region of a different format than what I get in the scripts that do not fail.

I have attached some of the GWAS disease files of rsIDs I wish to expand, but which seem to produce errors, and some error printouts I have gotten is pasted below. Two output files belonging to the erroneous input files, with out_0_<inputfilename> as filename is also attached.

Is it something related to the rsIDs I am using? Are you able to run the script with these input files?
I noticed that longer files of rsIDs were more prone to fail, as the connection to the database was broken. Therefore, I have divided the files into multiple small files. For most of the GWAS data, this worked fine, and I get output files on the format I want. But I am now left with 10-15 files of 10-20 rsIDs which fail with various error messages. I have tried calling my script with them multiple times, and as they fail consistently, I do not think it is my internet connection that is the problem.

_START TERMINAL PRINT ERROR 1_
inputfile Asthma3.txt
outputfile out_0_Asthma3.txt
DBD::mysql::st execute failed: Lost connection to MySQL server during query at /Users/Johanne/src/ensembl/modules//Bio/EnsEMBL/DBSQL/BaseAdaptor.pm line 482, <OUT> line 3739.

-------------------- EXCEPTION --------------------
MSG: Detected an error whilst executing SQL 'SELECT  v.variation_id, v.name AS v_name, v.class_attrib_id AS v_class_attrib_id, v.source_id AS v_source_id, v.somatic AS v_somatic, v.flipped AS v_flipped, v.ancestral_allele AS v_ancestral_allele, vs.moltype AS vs_moltype, vs.name AS vs_name, s2.name AS vs_source_name, v.minor_allele, v.minor_allele_freq, v.minor_allele_count, v.clinical_significance, v.evidence_attribs
FROM (( (variation v, source s1)
  LEFT JOIN variation_synonym vs ON v.variation_id = vs.variation_id )
  LEFT JOIN source s2 ON vs.source_id = s2.source_id )
 WHERE v.name = ?  AND

        s1.source_id = v.source_id
     AND v.display = 1
': DBD::mysql::st execute failed: Lost connection to MySQL server during query at /Users/Johanne/src/ensembl/modules//Bio/EnsEMBL/DBSQL/BaseAdaptor.pm line 482, <OUT> line 3739.

STACK Bio::EnsEMBL::DBSQL::BaseAdaptor::generic_fetch /Users/Johanne/src/ensembl/modules//Bio/EnsEMBL/DBSQL/BaseAdaptor.pm:483
STACK Bio::EnsEMBL::Variation::DBSQL::VariationAdaptor::fetch_by_name /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/VariationAdaptor.pm:497
STACK Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::_ld_calc /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:865
STACK Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::fetch_by_Slice /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:177
STACK Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::fetch_by_VariationFeature /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:246
STACK toplevel expandSNPs.pl:42
Date (localtime)    = Sat Mar  5 18:03:25 2016
Ensembl API version = 83
---------------------------------------------------
rs3117098
rs3117098
rs3117098
rs3117098
rs3117098
rs9268516
rs9268516
rs9268516
rs9268516
rs9268516
rs9268516
rs9268516
rs9268516
rs9268516
rs3129890
rs3129890
rs3129890
rs3129890
rs3129890
rs3129890
rs3129890
rs9272346
_END TERMINAL PRINT ERROR 1_

_START TERMINAL PRINT ERROR 2_
inputfile Atopicdermatitis3.txt
outputfile out_0_Atopicdermatitis3.txt
connect: Network is unreachable
rs1050654
_END TERMINAL PRINT ERROR 2_

_START TERMINAL PRINT ERROR 3_
inputfile textfiles/Atopicdermatitis3.txt
outputfile textfiles/out_0_Atopicdermatitis3.txt
Can't call method "get_all_VariationFeatures" on an undefined value at expandSNPs.pl line 33, <IN> line 3.
rs1050654
rs3853601
rs3853601
rs3853601
rs3853601
rs3853601
rs3853601
rs3853601
rs3853601
_END TERMINAL PRINT ERROR 3_

_START SCRIPT_
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $inputfile = $ARGV[0];
my $outputfile = $ARGV[1];
my $r_limit = $ARGV[2];
my $start_run = time();

open (IN, "<$inputfile");
open (OUT, ">$outputfile");

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
-host => 'ensembldb.ensembl.org<http://ensembldb.ensembl.org>',
-user => 'anonymous'
);

# Connect to the databases:
my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation');
my $ldfc_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'ldfeaturecontainer');
my $pop_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'population');
$variation_adaptor->db->use_vcf(1);

my $ld_population = $pop_adaptor->fetch_by_name('1000GENOMES:phase_3:CEU');

# Loop through all SNPs available and find SNPs in LD
while(<IN>) {
chomp;
my $variation_name = $_;
my $variation = $variation_adaptor->fetch_by_name($variation_name);
my @var_features;

if ($variation) {
@var_features = @{ $variation->get_all_VariationFeatures() };
} else {
print 'failing variation name: ', $variation_name, "\n";
next;
}

foreach my $vf (@var_features) {
my $rsid = $vf->name;
print $rsid, "\n";
my $start = $vf->start;
my $region = $vf->seq_region_name;
print OUT "$region\t$rsid\t$start\n";
my $ldfc = $ldfc_adaptor->fetch_by_VariationFeature($vf, $ld_population);
my @ld_values = @{ $ldfc->get_all_ld_values() };
      foreach my $ld_hash (@ld_values) {
        my $r2 = $ld_hash->{r2};
        if ($r2 >= $r_limit) {
    my $variation_name1 = $ld_hash->{variation1}->variation_name;
    my $variation_name2 = $ld_hash->{variation2}->variation_name;
    my $pos1 = $ld_hash->{variation1}->seq_region_name;
    my $pos2 = $ld_hash->{variation2}->seq_region_name;
    my $start1 = $ld_hash->{variation1}->start;
    my $start2 = $ld_hash->{variation2}->start;

    if ($variation_name1 eq $rsid) {
    print OUT "$pos2\t$variation_name2\t$start2\tr2=$r2\n";
    } else {
    print OUT "$pos1\t$variation_name1\t$start1\tr2=$r2\n";
    }
        }
      }
}
}

print "done\n";
close OUT;
close IN;
my $end_run = time();
my $run_time = $end_run - $start_run;
print "Job took $run_time seconds\n";
_END SCRIPT_
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Asthma3.txt
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment.txt>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment.htm>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: out_0_Asthma3.txt
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment-0001.txt>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment-0001.htm>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Atopicdermatitis3.txt
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment-0002.txt>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment-0002.htm>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: out_0_Atopicdermatitis3.txt
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment-0003.txt>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment-0003.htm>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Ulcerativecolitis3.txt
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment-0004.txt>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment-0004.htm>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Ulcerativecolitis5.txt
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment-0005.txt>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20160306/9b6edf6d/attachment-0005.htm>


More information about the Dev mailing list