Word Search (#107)

by Daniel Finnie

Today's quiz would've been most useful in elementary school, where over half of the homework assignments were word search puzzles. The concept of these puzzles is simple enough that an elementary school student could understand it: given a box of letters, find a line containing the letters of a specified word in order.

For example, find the words ruby, dan, rocks, and matz in the following text:

U E W R T R B H C D
C X G Z U W R Y E R
R O C K S B A U C U
S F K F M T Y S G E
Y S O O U N M Z I M
T C G P R T I D A N
H Z G H Q G W T U V
H Q M N D X Z B S T
N T C L A T N B C E
Y B U R P Z U X M S

The correct answer in the correct output format:

+ + + R + + + + + +
+ + + + U + + + + +
R O C K S B + + + +
+ + + + + + Y + + +
+ + + + + + + + + M
+ + + + + + + D A N
+ + + + + + + T + +
+ + + + + + Z + + +
+ + + + + + + + + +
Y B U R + + + + + +

Notice that the words can go backwards and diagonally, and can intersect one another. Searching is case insensitive.

The word search solver should accept input entered by the user after running the program, i.e., not exclusively through STDIN or a file, by entering the puzzle line by line, pressing return after each line. A blank line indicates the end of the puzzle and the start of the comma separated words to find. The following example shows how a user would enter the above puzzle, with descriptive text from the program removed.

$ ./wordsearch.rb
UEWRTRBHCD
CXGZUWRYER
ROCKSBAUCU
SFKFMTYSGE
YSOOUNMZIM
TCGPRTIDAN
HZGHQGWTUV
HQMNDXZBST
NTCLATNBCE
YBURPZUXMS

Ruby, rocks, DAN, matZ

Now, by itself, this quiz is fairly simple, so I offer an additional challenge. Write a beautiful, extensible, and easily-modifiable program without looking at the extra credit before starting. When you're done, try implementing extra credit options using less than 5 or 6 (reasonable) lines of code.

* An output format superior to the one given. The output format given
should remain the default unless both formats don't differ on a
textual basis. That should sound cryptic until pondered, I can't
give too much away!
* Allow for "snaking" of answers, in other words, the letters
composing a word don't have to be in a straight line.
* An option to give a hint, i.e., "The word ruby traverses the bottom
left and bottom right quadrants."
* Decide what to do with accented letters.
* Allow for wildcard letters.


Quiz Summary

I had a few moments this weekend and sat down to work this problem. I was quite surprised to find it tougher than I had expected. I fiddled around for about an hour trying to find a solution I felt was elegant, but never really got there. Luckily, we have many submitters smarter than me.

Almost all of the solutions this week took different approaches and they are all quite interesting. This doesn't seem to be one of those problems we have a standard approach for. Some solutions did a boring search using X and Y coordinates; others started by building a map of points in the puzzle to letters or even letters to points; one even works by performing board transformations. I'm going to show the basic X and Y search in this summary, but the other method were equally interesting and well worth a look.

Let's talk about the solution from Bob Showalter, which begins like this:

ruby
class WordSearch

class Board < Array

def to_s
collect {|s| s.split(//).join(' ')}.join("\n")
end

end

# ...

Here we see an Array subclass with a trivial modification. Board objects are just two-dimensional Arrays that know how to draw themselves in the quiz output format.

Let's move on to the setup code:

ruby
# ...

attr_reader :board, :solution

# creates a new, empty solver
def initialize
@board = Board.new
@solution = Board.new
end

# resets the solution
def reset
@solution.clear
@board.each {|row| @solution << row.gsub(/./, '+')}
end

# ...

Here we have the initialization for two Board objects. One will hold the actual board, or puzzle object, and the other the solution-in-progress. reset() gives us hints at the solution object structure, which is cleared to a board full of plus characters.

The board is loaded with the following code:

ruby
# ...

# checks that the board contains only letters and that it has a uniform
# rectangular shape
def validate
@board.size > 0 or raise "Board has no rows"
@board.grep(/[^A-Z]/).empty? or raise "Board contains non-letters"
w = @board.collect {|row| row.size}.uniq
w.size == 1 or raise "Board rows are not all the same length"
w.first > 0 or raise "Board has no columns"
end

# parses the board by reading lines from io until a blank line (or EOF)
# is read.
def parse(io = ARGV)
@board.clear
while line = io.gets
line = line.strip.upcase
break if line == ''
@board << line
end
validate
reset
end

# ...

validate() is just a reality check for the board. It verifies that we have some rows, those rows contain letters, there are columns in each row, and in fact the same number of columns. That check is run towards the end of parse(), which just reads line by line filling the board object. Note that reset() is also called to prepare the solution object.

The real work comes in the methods that search for words:

ruby
# ...

# search for word. returns number of times found. solution is updated
# with all occurences.
def search(word)
found = 0
0.upto(board.size-1) do |y|
0.upto(board[y].size-1) do |x|
[-1, 0, 1].each do |dy|
[-1, 0, 1].each do |dx|
next if dx == 0 and dy == 0
found += 1 if search_for(word.strip.upcase, x, y, dx, dy)
end
end
end
end
found
end

# search for word in board starting at position (x,y) and moving in
# direction (dx,dy). returns true if found, false if not found.
def search_for(word, x, y, dx, dy)
return false if x < 0 or
x >= board.first.size or
y < 0 or
y >= board.size
return false if board[y][x] != word[0]
prev = solution[y][x]
solution[y][x] = board[y][x]
return true if word.length <= 1
found = search_for(word[1,word.length-1], x + dx, y + dy, dx, dy)
solution[y][x] = prev unless found
found
end

# ...

The search() method manages the hunt for a single term. It's a trivial brute-force find from every starting square in all eight directions. The method maintains and returns a count, for all occurrences found during the search. The actual letter match is handed off to search_for() though.

In search_for() a recursive search is used to find letter by letter. The tricky part here is that the solution object is modified as we search, assuming we will find the word. This means that an extra walk isn't needed to update the solution later, but it also means the code must revert the changes to the solution object if the word isn't found. Those are the gymnastics you see in the second half of the method.

The next two methods wrap an interface around these calls:

ruby
# ...

# creates a new puzzle by parsing the board from io. see
# WordSearch#parse
def self.parse(io = ARGF)
obj = new
obj.parse(io)
obj
end

def to_s
solution.to_s
end

end

# ...

The class method parse() constructs an object and parses the board from the passed IO object. The instance method to_s() can then be used to show final results, after one or more calls to search().

The solution ends with code to kick-off the process:

ruby
# ...

# parse the board first
p = WordSearch.parse

# parse the words until a blank line is read
words = []
while line = ARGF.gets
line = line.strip.upcase
break if line == ''
words += line.gsub(',', ' ').split
end

# submit each word and show how many times it was found
for word in words.sort.uniq
n = p.search(word)
puts word + ' was ' + (n == 0 ? 'not found' :
n == 1 ? 'found once' :
"found #{n} times")
end

# show the solution
puts p

Here we see the WordSearch object constructed and the board parsed out of the input. The remainder of the input is divided into words and search() is called for each one. A found count is printed for each word as the search is made, then the final solution is displayed.

+ + + M + + + + + +
+ + + + Y + + + + +
T H A N K S + + + +
O + + + + + + + + +
+ L L A + T H E + +
+ + + + C + + + + +
+ + S O L V E R S +
+ + + + e + + + + +
+ + + + V + + + + +
+ + + + E + + + + +
+ + + + R + + + + +

Tomorrow we have a new word game, so stay tuned...