DZone Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world

Snippets has posted 5883 posts at DZone. View Full User Profile

Implement Ruby String Class Prev / Pred / Prev! / Pred! - Opposite Of Next / Succ Methods

08.26.2006
| 6178 views |
  • submit to reddit
        The Ruby String class includes the "next" function - also called "succ" for "successor" - which advances a string to the next logical item in, usually, an alphanumeric progression. For example, "9" goes to "10" goes to "11"; or "a" goes to "b" goes to "c". There is no opposite call in the standard String class. Searcing on the Web reveals that conventional wisdom regards this as difficult! It's certainly impossible to cover every case, since both "09".next and "9".next return "10", so going backwards from "10", it isn't possible to know whether to return "09" or "9".

Despite such difficulties, it's not actually that hard to make something that covers the vast majority of cases if you accept that it can't ever be perfect. The code below makes a very good job of returning, using "prev", to a string upon which "next" had been called. Extensive comments describe the implementation and its limitations.

class String
  # The opposite of String::next / String::succ. It is impossible to be a
  # *complete* opposite because both "9".next = "10" and "09".next = "10";
  # if going backwards from "10" there's no way to know whether the result
  # should be "09" or "9". Where the first ranged character is about to
  # underflow and the next character is within the same range the result
  # is shrunk down - that is, "10" goes to "9", "aa" goes to "z"; any non-
  # range prefix or suffix is OK, e.g. "+!$%10-=+" goes to "+!$%9-=+".
  # Items in the middle of a string don't do this - e.g. "12.10" goes to
  # "12.09", to match how "next" would work as best as possible.
  #
  # The standard "next" function works on strings that contain *no*
  # alphanumeric characters, using character codes. This implementation
  # of "prev" does *not* work on such strings - while strings may contain
  # any characters you like, only the alphanumeric components are operated
  # upon.
  #
  # Should total underflow result, "nil" will be returned - e.g. "00".prev
  # returns 'nil', as does "a".prev. This is done even if there are other
  # characters in the string that were not touched - e.g. "+0.0".prev
  # also returns "nil". Broadly speaking, a "nil" return value is used for
  # any attempt to find the previous value of a string that could not have
  # been generated using "next" in the first place.
  #
  # As with "next" sometimes the result of "prev" can be a little obscure
  # so it is often best to try out things using "irb" if unsure. Note in
  # particular that software revision numbers do not necessarily behave
  # predictably, because they don't with "next"! E.g. "12.4.9" might go to
  # "12.4.10" for a revision number, but "12.4.9".next = "12.5.0". Thus
  # "12.5.0".prev = "12.4.9" and "12.4.10".prev = "12.4.09" (because the
  # only way to make "12.4.10" using "next" is to start at "12.4.09").
  #
  # Since 'succ' (successor) is an alias for 'next', so 'pred'
  # (predecessor) is an alias for 'prev'.
  #
  def prev(collapse = false)
    str        = self.dup
    early_exit = false
    any_done   = false
    ranges     = [
                   ('0'[0]..'9'[0]),
                   ('a'[0]..'z'[0]),
                   ('A'[0]..'Z'[0]),
                   nil
                 ]

    # Search forward for the first in-range character. If found check
    # to see if that character is "1", "a" or "A". If it is, record
    # its index (from 0 to string length - 1). We'll need this if
    # underflows wrap as far as the found byte because in that case
    # this first found byte should be deleted ("aa..." -> "z...",
    # "10..." -> "9...").

    first_ranged = nil

    for index in (1..str.length)
      byte = str[index - 1]

      # Determine whether or not the current byte is a number, lower case
      # or upper case letter. We expect 'select' to only find one matching
      # array entry in 'ranges', thus we dereference index 0 after the
      # 'end' to put a matching range from within 'ranges' into 'within',
      # or 'nil' for any unmatched byte.

      within = ranges.select do |range|
        range.nil? or range.include?(byte)
      end [0]

      unless within.nil?
        case within.first
          when '0'[0]
            match_byte = '1'[0]
          else
            match_byte = within.first
        end

        first_ranged = index - 1 if (byte == match_byte)
        first_within = within
        break
      end
    end

    for index in (1..str.length)

      # Process the input string in reverse character order - fetch the
      # bytes via negative index.

      byte = str[-index]

      within = ranges.select do |range|
        range.nil? or range.include?(byte)
      end [0]

      # Skip this letter unless within a known range. Otherwise note that
      # at least one byte was able to be processed.

      next if within.nil?
      any_done = true

      # Decrement the current byte. If it is still within its range, set
      # the byte and bail out - we're finished. Flag the early exit. If
      # the byte is no longer within range, wrap the character around
      # and continue the loop to carry the decrement to an earlier byte.

      byte = byte - 1

      if (within.include? byte)
        str[-index] = byte
        early_exit  = true
        break
      else
        str[-index] = within.last

        # If we've just wrapped around a character immediately after the
        # one found right at the start ('0', 'a' or 'A') then this first
        # ranged character should be deleted (so "10" -> "09"

        if (first_ranged != nil and first_within.include?(byte + 1) and (first_ranged - str.length) == -(index + 1))
          str.slice!(-(index + 1))
          early_exit = true
          break
        end
      end

    end # From outer 'for' loop

    # If we did process at least one byte but we did not exit early, then
    # the loop completed due to carrying a decrement to other bytes. This
    # means an underflow condition - return 'nil'.

    if (any_done == true and early_exit == false)
      return nil
    else
      return str
    end
  end

  # As (extended) String::pred / String::prev, but modifies the string in
  # place rather than returning a copy. If underflow occurs, the string
  # will be unchanged. Returns 'self'.
  #
  def prev!
    new_str = prev
    self.replace(new_str) unless new_str.nil?
    return self
  end

  alias pred  prev
  alias pred! prev!

end

To prove it does an OK job, try some test code:

# Test code - create strings of random lengths between 1 and 16
# characters inclusive, including randomly selected characters from
# '0' to '9', 'a' to 'z' and 'A' to 'Z' inclusive. Check that 'next'
# then 'prev' returns to the original string and vice versa; ensure
# that the in-place functions ('!' variants) behave the same way as
# their transient counterparts and ensure that the aliases work too.
#
# Pass a number of iterations (that is, the number of random strings
# to generate).
#
def run_tests(iterations = 1000)
  puts "\n"
  puts "Running tests...\n"

  iterations.times do
    chars = ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a
    @newpass = ""
    len = rand(16)+1
    1.upto(len) { |i| @newpass << chars[rand(chars.size-1)] }

    next if @newpass.prev.nil?

    @passcp1 = @newpass.dup; @passcp1.prev!
    @passcp2 = @newpass.dup; @passcp2.pred!

    if (
         @newpass != @newpass.next.prev or @newpass != @newpass.prev.next or
         @newpass != @newpass.succ.pred or @newpass != @newpass.pred.succ or
         @newpass.prev != @passcp1 or
         @newpass.pred != @passcp2
       )

      puts "Failed on '#{@newpass}'\n"
      puts " with --> '#{@newpass}' vs '#{@newpass.next.prev}' vs '#{@newpass.prev.next}'\n"
      puts "   or --> '#{@newpass.prev}' vs '#{@passcp1}'\n"
      puts "   or --> '#{@newpass.pred}' vs '#{@passcp2}'\n"

      raise "Tests failed!"
    end
  end

  puts "Tests completed successfully.\n"
end

# Example call to test code.

run_tests(10000)
    

Comments

Snippets Manager replied on Mon, 2009/05/25 - 11:41pm

RegExp offers us a shorter, faster and easier to read (?) alternative: class String # TODO: unicode support def pred # '0Aa\0' are the "zero" characters, '9Zz\377' the "one-less" characters. # The basic idea is to find the trailing zeros, borrow (the counterpart to # carry) 1 from the previous non-zero and set the zeros to one-less. There # are two main cases: strings with alphanumerics and pure punctuation # strings. # For strings with alphanumerics, ignore all punctuation. The string will # either end with a non-zero followed possibly by some zeros, or will have # only zeros. In the former case, borrow 1 from the non-zero and set the # trailing zeros to one-less. In the latter case, the string will have a # predecessor if the first two zeros it contains are 'AA' or 'aa', in which # case the first 'A' or 'a' can be removed and the others set to one-less. # If the string is non-alphanumerics followed by a zero, it will also have # a predecesor (an all-punctuation one). # For punctuation strings, we have to borrow for trailing nulls. An all # null string will thus result in underflow. The only other case to consider # is if decrementing results in an alphanumeric, which is not a valid # predecessor for a punctuation string. match=false if /[0-9A-Za-z]/.match(self) preds = [ [/[1-9B-Zb-z](?:[^0-9A-Za-z]|[0Aa])*$/, # borrow proc { |m| match=true m.tr('a-zA-Z0-9', 'za-yZA-Y90-8') }], [/^([^0Aa]*)([Aa])(\2.*)$/, # eg aaA0 -> zZ9 proc { |m| match=true $1 + $3.tr('a-zA-Z0-9', 'za-yZA-Y90-8') }], [/^([^0-9A-Za-z]*)([0Aa])$/, proc { |m| match=true m[-1] -= 1 m }] ] preds.each { |re, repl| new_str = self.sub(re, &repl) return new_str if match } else # no alphanumeric characters new_str = self.sub(/[^\0]\0*$/) { |m| m[0]-=1 if /^[^9Zz]/.match(m) match=true m.tr("\0", "\377") end } return new_str if match end return nil end def pred! new_str = pred self.replace(new_str) unless new_str.nil? return self end alias prev pred alias prev! pred! end This implementation is slightly different in behavior than the original, loop based implementation. It doesn't strip leading 0s and it decrements non-alphanumeric strings. On my MacBook running Ruby 1.8.7, the RegExp based String#pred was ~10x faster overall when profiled.
  %   cumulative   self              self     total
t time   seconds   seconds    calls  ms/call  ms/call  name
  3.80    70.79      4.71    10000     0.47     4.13  String#prevloop
  3.61    75.27      4.48    10000     0.45     4.11  String#predloop
...
  2.95    87.11      3.66    20000     0.18     0.41  String#predre
  2.78    90.56      3.45    20000     0.17     0.38  String#prevre
'predloop' is the original looping pred; 'predre' is the RegExp based pred. An earlier version of predre, more compatible with prevloop, was even faster:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  8.59   243.37     35.91    59840     0.60     3.73  String#prevloop
  4.22   284.16     17.63    29911     0.59     3.73  String#predloop
...
  4.35    57.56      3.34    49814     0.07     0.16  String#predre
  3.87    60.53      2.97    39886     0.07     0.16  String#prevre
To test the correctness, I used a slightly altered version of the original test. Four tests were performed: str.pred.succ == str, str.succ.pred == str, str.pred == str.pred! and str.pred == str.predloop. I also added edge cases. The only tests that failed were by design: predloop stripped a leading 0 and predre didn't, or predre decremented a non-alphanumeric string.
Running tests...
"09" == "10".pred != "10".predloop == "9"
"--," == "---".pred != "---".predloop == "---"
"--`" == "--a".pred != "--a".predloop == nil
"@" == "A".pred != "A".predloop == nil
"/" == "0".pred != "0".predloop == nil
"`" == "a".pred != "a".predloop == nil
...
Finally, here's the test code. require 'pred' require 'predloop' def testErrMsg(str, rval, rexpr, lval=nil, lexpr=nil) if lval.nil? return '%s != "%s".%s == %s' % [lval.inspect, str, rexpr, rval.inspect] else return '%s == "%s".%s != "%s".%s == %s' % [lval.inspect, str, lexpr, str, rexpr, rval.inspect] end end def test(newpass) passcp = newpass.dup; passcp.pred! fails=0 if newpass != newpass.succ.pred puts testErrMsg(newpass, newpass.succ.pred, 'succ.pred') fails += 1 end if newpass != newpass.pred.succ puts testErrMsg(newpass, newpass.pred.succ, 'pred.succ') fails += 1 end if newpass.pred != passcp puts testErrMsg(newpass, passcp, 'pred!', newpass.pred, 'pred') fails += 1 end if newpass.pred != newpass.predloop puts testErrMsg(newpass, newpass.predloop, 'predloop', newpass.pred, 'pred') fails += 1 end return fails end def run_tests(iterations = 1000) puts "\n" puts "Running tests...\n" fails=0 %w(aa z 00 10 a0 b0 Aa AAa Aaa 0A <> ++b10++ 54.32.10 32.a.0 32.e.0 32.a.10 32.e.10 --- --a).each do |newpass| next if newpass.prev.nil? fails += test(newpass) end chars = ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a iterations.times do @newpass = "" len = rand(16)+1 1.upto(len) { |i| @newpass << chars[rand(chars.size-1)] } next if @newpass.prev.nil? fails += test(@newpass) end if (fails > 0) print fails, " tests failed.\n" else puts "Tests completed successfully.\n" end end # Example call to test code. run_tests(10000)

Snippets Manager replied on Mon, 2009/05/25 - 11:41pm

To put the problem another way, String#succ isn't invertible. No matter the implementation String#prev, there will always be some str such that "str.succ.prev != str". The other problem (underflow) is that succ maps into, not onto, the set of strings. What we can achieve is an invertible prev, a prev such that "str.prev.succ == str", as long as str is the successor to something.