FasterGenerator (#66)

Ruby includes a useful Generator library for switching internal iterators to external iterators. It is used like this:

ruby
require 'generator'

# Generator from an Enumerable object
g = Generator.new(['A', 'B', 'C', 'Z'])

while g.next?
puts g.next
end

I've heard complaints in the past that this library is slow. One reason is that it was implemented with continuations, which have performance issues in the current version of Ruby. "Was" is the keyword there though, because I've just learned that Generator was recently re-implemented. I learned some good tricks reading the new version, so let's try fixing Generator ourselves. (No peeking at the new version!)

This week's Ruby Quiz is to write FasterGenerator, your own re-implementation of Generator with an eye towards working faster. (This is a small library. My version is 54 lines.) It is possible to go even faster than the new implementation, with certain trade-offs:

### Construction ###

Rehearsal -----------------------------------------------------------
Current Generator 0.280000 0.320000 0.600000 ( 0.598148)
Old callcc Generator 0.630000 1.070000 1.700000 ( 1.701587)
James's FasterGenerator 0.000000 0.000000 0.000000 ( 0.003935)
-------------------------------------------------- total: 2.300000sec

user system total real
Current Generator 0.710000 0.640000 1.350000 ( 1.356412)
Old callcc Generator 0.220000 1.540000 1.760000 ( 1.762054)
James's FasterGenerator 0.000000 0.000000 0.000000 ( 0.003240)

### next() ###

Rehearsal -----------------------------------------------------------
Current Generator 15.960000 0.110000 16.070000 ( 16.116409)
Old callcc Generator 5.840000 34.560000 40.400000 ( 40.452677)
James's FasterGenerator 0.040000 0.000000 0.040000 ( 0.038548)
------------------------------------------------- total: 56.510000sec

user system total real
Current Generator 15.910000 0.100000 16.010000 ( 16.063431)
Old callcc Generator 5.810000 34.430000 40.240000 ( 40.300504)
James's FasterGenerator 0.030000 0.000000 0.030000 ( 0.036539)

It you want to see the class documentation, you can find it here:

Generator Documentation

If you want to make sure your implementation is correct, you can use these tests straight out of the current implementation:

ruby
require 'test/unit'

class TC_Generator < Test::Unit::TestCase
def test_block1
g = Generator.new { |g|
# no yield's
}

assert_equal(0, g.pos)
assert_raises(EOFError) { g.current }
end

def test_block2
g = Generator.new { |g|
for i in 'A'..'C'
g.yield i
end

g.yield 'Z'
}

assert_equal(0, g.pos)
assert_equal('A', g.current)

assert_equal(true, g.next?)
assert_equal(0, g.pos)
assert_equal('A', g.current)
assert_equal(0, g.pos)
assert_equal('A', g.next)

assert_equal(1, g.pos)
assert_equal(true, g.next?)
assert_equal(1, g.pos)
assert_equal('B', g.current)
assert_equal(1, g.pos)
assert_equal('B', g.next)

assert_equal(g, g.rewind)

assert_equal(0, g.pos)
assert_equal('A', g.current)

assert_equal(true, g.next?)
assert_equal(0, g.pos)
assert_equal('A', g.current)
assert_equal(0, g.pos)
assert_equal('A', g.next)

assert_equal(1, g.pos)
assert_equal(true, g.next?)
assert_equal(1, g.pos)
assert_equal('B', g.current)
assert_equal(1, g.pos)
assert_equal('B', g.next)

assert_equal(2, g.pos)
assert_equal(true, g.next?)
assert_equal(2, g.pos)
assert_equal('C', g.current)
assert_equal(2, g.pos)
assert_equal('C', g.next)

assert_equal(3, g.pos)
assert_equal(true, g.next?)
assert_equal(3, g.pos)
assert_equal('Z', g.current)
assert_equal(3, g.pos)
assert_equal('Z', g.next)

assert_equal(4, g.pos)
assert_equal(false, g.next?)
assert_raises(EOFError) { g.next }
end

def test_each
a = [5, 6, 7, 8, 9]

g = Generator.new(a)

i = 0

g.each { |x|
assert_equal(a[i], x)

i += 1

break if i == 3
}

assert_equal(3, i)

i = 0

g.each { |x|
assert_equal(a[i], x)

i += 1
}

assert_equal(5, i)
end
end

The Generator library also includes a SyncEnumerator class, but it is written to use Generator and will work fine with a new version, as long as it is API-compatible.


Quiz Summary

There were two kinds of solutions to this problem. The first were Array-based solutions. The logic here is that we can do the iteration, shove all the results in an Array, and then hand them out one-at-a-time. Here is Christoffer Lerno's solution, doing exactly that:

ruby
class MyGenerator
attr_reader :index
def initialize(enum = nil)
if enum then
@array = enum.to_a
else
@array = Array.new
yield self
end
@index = 0
end
def current
raise EOFError unless next?
@array[@index]
end
def next
value = current
@index += 1
return value
end
def next?
return @index < @array.length
end
def rewind
@index = 0
self
end
def each(&block)
@array.each(&block)
end
def yield(value)
@array << value
end
def pos
return @index
end
def end?
return !next?
end
end

Even though Generator supposedly works off of the each() method, it claims to accept an Enumerable argument. If we go on that assumption, there should be a to_a() method which will return all the objects each() iteration would. (If you are uncomfortable with this logic, you could iterate and add them to an Array by hand.) You can see Christoffer putting this shortcut to work in initialize(), filling @array with a simple call to to_a().

The rest of the methods are trivial. An @index is saved pointing into the @array, which can be incremented or reset to walk the objects.

The good news: this is wicked fast in comparison to the old or even the new Generator library. The bad news: it doesn't work for all cases. First, an iterator does not have to be finite, and trying to shove something like a lazily evaluated infinite stream in an Array is going to go poorly. Furthermore, iteration may be sensitive to external factors like when it is executed. In those cases, running the full iteration up front may produce different results.

For these reasons the standard Generator library requires a more general solution. However, I would still tuck the Array-and-index trick away in the back of your mind, as it may just come in handy some day. If you can be sure you aren't dealing with any of the edge cases mentioned above and sure you can spare the memory, using an Array and an index is simple and fast.

For all other times, you have the standard Generator library.

We know that the library use to work with continuations, but now it has a new implementation. The new technique was the other one used by the solutions. It goes like this: launch a Thread to iterate over the elements, stop the Thread just before each iteration, and wake it up each time you need a new item.

Let's see how Jacob Fugal implemented that (with a small patch from Ross Bamford). Here's the setup:

ruby
require 'thread'

class FasterGenerator
def initialize( enum=nil, &block )
@position = 0
@values = []
@done = false
@mutex = Mutex.new
@block = block

if enum
@block = proc{ |g|
enum.each{ |x| g.yield x }
}
end
end

# ...

This almost looks like the Array-and-index trick again, when you see @position and @values, but this version isn't going to slurp the values all at once. @done is just a flag to tell us when we have exhausted the values and @mutex is a locking tool from the thread library.

Now @block is a little more interesting. Generator can be called in two ways, either with an Enumerable object, or with a block that yield()s values to the Generator. The code above merges the two cases, by providing a block that yield()s through each() when the Enumerable object is given.

That means the next piece of the puzzle is the yield() method:

ruby
# ...

def yield( x )
@mutex.synchronize{ @values << x }
Thread.stop
end

# ...

We're starting to see threading work here. This method grabs the lock, adds the object to @values, drops the lock, and halts the Thread that fetched the object. The lock is to ensure that multiple Threads don't stomp on the internal @values queue at the same time and the halting is the pause before each iteration I mentioned earlier.

So, let's see see what current() and next() look like:

ruby
# ...

def current
collect_value
raise EOFError if eof?
@values[@position]
end

# ...

def next
x = current
@position += 1
x
end

# ...

Seems like all the magic here is hidden in the collect_value() method and everything else is just indexing. Here's the magic method:

ruby
# ...

private
def collect_value
@thread ||= Thread.new {
Thread.stop
@block[self]
@mutex.synchronize{ @done = true }
}
Thread.pass until @thread.stop?
@thread.run if @thread.status and need_value?
Thread.pass while need_value?
end

def need_value?
@mutex.synchronize{ not @position < @values.size and not @done }
end

# ...

collect_value() makes sure we have a Thread running, or creates it if needed. The Thread is trivial: halt (to pause before the first iteration), run the block, grab the lock, toggle the @done flag, and release the lock. Once we are sure there is a Thread, we can pass() the calling Thread until the generator Thread gets a chance to run() and reaches its stop()ing point.

need_value?() is just a check to make sure we don't have one queued, and we're not finished with iteration.

The only challenge left is knowing when we are at the end?() of iteration:

ruby
# ...

def next?
return (not end?)
end

def end?
collect_value
return eof?
end

# ...

private

# ...

def eof?
@mutex.synchronize{ not @position < @values.size and @done }
end
end

The secret to end?() is that we must make sure we have a value queued, if possible. That's because we pause just after each item is added. We might be in the very last pause before iteration would end. Calling collect_value() will find the next item or cause the @done flag to be set. Either way, we then have enough information to properly answer the question.

Here are the rest of the methods for Jacob's solution:

ruby
# ...

def pos
@position
end

# ...

def rewind
@position = 0
self
end

def each
self.rewind
while self.next?
yield self.next
end
end

def index
pos
end

# ...

There shouldn't be any surprises left in there.

Some other solutions that used this same technique were faster, or even slower. This seems primarily related to which tools were used to control Thread synchronization. Mutex, for example, is fairly slow while `Thread.critical = true` is quick and probably all you need in this instance.

Let me send out a big thank you to all who tried this quiz and an even bigger thank you to Ross Bamford for helping me out with the timings.

Tomorrow we have a deep meditation on metaprogramming...