[ensembl-dev] Enumerating repeat types for get_all_RepeatFeatures()

John Marshall jm18 at sanger.ac.uk
Mon Jan 16 14:14:20 GMT 2012


I recently noticed from the documentation that Bio::EnsEMBL::Slice::get_all_RepeatFeatures() can optionally take a repeat type argument to limit the types of repeat feature returned:

	Example: @repeat_feats = @{$slice->get_all_RepeatFeatures(undef,'LTR')};

This is very convenient for one of my scripts, which, for any requested species, dumps out a selection of repeat features arranged by type.  However, there doesn't appear to be a way to enumerate all the repeat types present for a given species -- while the type names appear to be fairly standardised, some species are missing a few of them, and I wouldn't want to assume that I had seen all the possibilities.

I solved this in my script with the following hackish function:

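# Returns the set of all types of repeat features in $adaptor's species.
# $adaptor must provide prepare(), as adaptors derived from
# Bio::EnsEMBL::DBSQL::BaseAdaptor do.
sub fetch_all_repeat_types {
  my ($adaptor) = @_;

  my ($sth, @types, $repeat_type);
  # Queries the core schema directly, so this depends on the table layout.
  $sth = $adaptor->prepare("SELECT DISTINCT repeat_type FROM repeat_consensus");
  $sth->execute();
  $sth->bind_columns(\$repeat_type);
  while ($sth->fetch()) { push @types, $repeat_type }
  $sth->finish();
  return @types;
}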

but of course this makes the script susceptible to schema changes, and ideally such a function would instead be available as Bio::EnsEMBL::DBSQL::RepeatConsensusAdaptor::fetch_all_repeat_types() or similar.  (Admittedly this is probably opening a can of worms as there may well be many other items that could use similar enumeration functions...)
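
For context, here is a minimal sketch of how the enumerated types might feed the per-type dump described above. The helper name group_repeats_by_type() is hypothetical and not part of the Ensembl API; it only assumes that $slice offers get_all_RepeatFeatures($logic_name, $repeat_type) returning an array reference, as in the documented example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): groups the repeat
# features on a slice by repeat type, making one get_all_RepeatFeatures()
# call per type.  $slice is assumed to be a Bio::EnsEMBL::Slice, or any
# object with a compatible get_all_RepeatFeatures() method; $types is an
# array reference of type names, e.g. as returned by
# fetch_all_repeat_types().
sub group_repeats_by_type {
  my ($slice, $types) = @_;

  my %features_by_type;
  foreach my $type (@$types) {
    # First argument is the analysis logic name (undef = any analysis),
    # second is the repeat type to restrict the results to.
    my $features = $slice->get_all_RepeatFeatures(undef, $type);

    # Leave out types that have no features on this slice.
    next unless @$features;
    $features_by_type{$type} = $features;
  }
  return \%features_by_type;
}
```

It could then be called as group_repeats_by_type($slice, [fetch_all_repeat_types($adaptor)]). Issuing one query per type keeps the sketch simple; fetching everything once and grouping on each feature's repeat consensus type would be an alternative if the number of types is large.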

Any thoughts as to the right way to do this?

Thanks,

    John


