[ensembl-dev] Missense SNPs and frequencies

Jens Christian Nielsen jcfnielsen at gmail.com
Mon Mar 11 11:52:18 GMT 2013


For a list of GenBank accession numbers, I want to extract all missense
variations and their frequencies. Right now my script extracts all SNPs
from the slice ($slice). How can I restrict it to print only the SNPs
that change the protein sequence? I would also like it to return the
frequencies of those SNPs.

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $reg = 'Bio::EnsEMBL::Registry';
$reg->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous',
);

my $gene_name = shift or die "Usage: $0 GENE_NAME\n";
my $ga  = $reg->get_adaptor('Human', 'Core', 'Gene');
my $sa  = $reg->get_adaptor('Human', 'Core', 'Slice');
my $vfa = $reg->get_adaptor('Human', 'Variation', 'VariationFeature');

# Look up every gene matching the given external name (e.g. a gene symbol)
my $genes = $ga->fetch_all_by_external_name($gene_name);
for my $gene (@{$genes}) {
  my $chr    = $gene->seq_region_name;
  my $start  = $gene->seq_region_start;
  my $end    = $gene->seq_region_end;
  my $length = $end - $start + 1;   # was used below but never defined
  my $region = sprintf "%s:%d-%d", $chr, $start, $end;
  print join("\t", $gene->stable_id, $region, $length,
    $gene->external_name, $gene->description // ''), "\n";

  # Fetch all variation features overlapping the gene's genomic span
  my $slice = $sa->fetch_by_region('chromosome', $chr, $start, $end);
  my @vfs = @{$vfa->fetch_all_by_Slice($slice)};
  for my $vf (@vfs) {
    print
      $vf->variation_name, ' has alleles ', $vf->allele_string,
      ' located at ', $slice->seq_region_name, ':',
      $vf->seq_region_start, '-', $vf->seq_region_end, "\n";
  }
}
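A minimal sketch of the two pieces the question asks for, written as plain Perl so the filtering logic can be checked without a database connection. It assumes that `$vf->consequence_type` returns an array ref of Sequence Ontology terms (so a protein-changing SNP would appear as `missense_variant`) and that frequencies come from the Allele objects returned by `$vf->variation->get_all_Alleles` (via `->population->name`, `->allele` and `->frequency`). Both assumptions should be checked against the API documentation for the release in use. The helper names `has_consequence` and `format_frequencies` are hypothetical, and the hash refs only stand in for Allele objects.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if any consequence term in $terms (an array
# ref) is one of @wanted, else 0. In the Ensembl script, $terms would be
# $vf->consequence_type, assumed to hold SO names such as 'missense_variant'.
sub has_consequence {
    my ($terms, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    return scalar(grep { $want{$_} } @{$terms}) ? 1 : 0;
}

# Hypothetical helper: formats allele frequencies as "POP:ALLELE=FREQ",
# skipping entries that lack a population or a frequency. Each hash ref
# stands in for an Allele object (assumed accessors: ->population->name,
# ->allele, ->frequency) from $vf->variation->get_all_Alleles.
sub format_frequencies {
    my ($alleles) = @_;
    return join ', ',
        map  { sprintf '%s:%s=%.3f', $_->{population}, $_->{allele}, $_->{frequency} }
        grep { defined $_->{population} && defined $_->{frequency} } @{$alleles};
}
```

Inside the loop over `@vfs`, the filter could then be something like `next unless has_consequence($vf->consequence_type, 'missense_variant');`. Other protein-altering terms (for example `stop_gained`) can be passed as extra arguments if "changes the protein sequence" should cover more than missense.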