Goedel (#147)

by Hugh Sasse

In the book "Starburst" by Frederik Pohl ISBN 0-345-27537-3, page 56, without really spoiling the plot, some characters complain about the verbosity of communications and encode a message by Gödelizing it (detailed on page 58).

The encoding works by taking each successive character of a message and raising each successive prime to some function of that character, and multiplying these powers of primes together. So for example we could use the ASCII code + 1 to allow for nulls to be encoded. Then "Ruby\n" would end up as:

(2 ** R) * (3 ** u) * (5 ** b)....

10992805522291106558517740012022207329045811217010725353610920778
28664749233402453985379760678149866991742205982820039955872246774
86029159248495553882158351479922840433375701904296875000000000000
00000000000000000000000000000000000000000000000000000000000000000
000000

The idea is partly to obscure the message by the amount of factorization needed. This quiz is to write a program to Gödelize a message, and a program to deGödelize it.

The funtion used to map characters described in the book is "A" => 1, "B" => 2, etc and an example is given where spaces are 0. Nothing further is said about punctuation, or lower case. The message sent in the book is:

msg = (3.875 * (12 ** 26)) +
(1973 ** 854) + (331 ** 852) +
(17 ** 2008) + (3 ** 9707) + (2 ** 88) - 78

which it turns out has lots of 0 powers in it, so I strictly don't need the ASCII + 1 I've used in my example, I could use just ASCII, and the nulls would not increase the size of the resulting number. This further means that if a list of characters is sent in decreasing frequency order with the message, the most frequent could be encoded as 0 and the number would be that much smaller. In English it is likely to be an "e" or " " which ends up coded as 0.

Interesting things arising from this:

1 Finding the power once a prime is selected
2 Getting the list of primes in the first place
3 encoding of characters, as mentioned above
4 representing the number that results from encoding.


Quiz Summary

This quiz is really just an optimization problem. It's pretty trivial to do a conversion to and from a Goedel number. See Eric Lavigne's under-30-lines solution for a great example of this. The challenge arises when the message gets big enough that finding all of the factors takes significant time.

Eric I. figured out how to cut quite a few corners on the decoding process, so I want to take a look at his code.

Before we get into the actual encoding and decoding process though, let's talk a bit about primes. Obviously, we need a source of prime numbers to do our work. Ruby does ship with a standard mathn library that includes a Prime class. Many solutions did put that library to good use. There are two downsides to that approach though: mathn is a pure Ruby library and the Prime class implementation in Ruby 1.8 is not very clever. Both of these slow us down.

To get around that, Eric built a drop-in replacement for the Prime class that cheats. It simply reads the numbers from a huge list you can download, skipping any calculation effort. This turns out to be faster for our needs. Here's the code:

ruby
# Generates a stream of prime numbers as they're read from a sequence
# of files with names such as "primes1.txt", "primes2.txt", and so
# forth. Such files can be downloaded from:
# http://primes.utm.edu/lists/small/millions/

class Prime
def initialize
@current_file = 0
@io = open_next_file
@current_primes = []
@current_index = 0
end

def next
load_next_primes until value = @current_primes[@current_index]
@current_index += 1
value
end

private

def load_next_primes
while true
while line = @io.gets
if line =~ /^\s*\d+(\s+\d+)*\s*$/
@current_primes = line.split.map { |e| e.to_i }
@current_index = 0
return
end
end
@io.close
open_next_file
end
end

def open_next_file
@current_file += 1
filename = "primes%d.txt" % @current_file
begin
@io = open(filename)
rescue
raise "ran out of primes because couldn't open file \"%s\"" %
filename
end
end
end

As you can see, this is simple stuff. The class just opens a file called primes1.txt when initialize()d (see open_next_file()). As needed, lines are read from this file, split() and converted into Integers, and tucked away inside an Array (see load_next_primes()). Primes are then just handed out from this Array (see next()) and when the supply is exhausted new lines are read. When we run out of lines, the code will move on to a primes2.txt file.

The site linked to in the comment has the first 15 million primes available in files like this. That more than covers the needs of this code, so this turns out to be a simple but effective cheat to save time.

With a zippy Prime class defined, we are ready to get down to the real work:

ruby
require 'primes' # or the standard mathn library

# Put the coder in a separate class, so we have the potential to use
# other coders, such as the one from the Starburst novel.
class RubyQuizCoder
def encode(char)
char[0] + 1
end

def decode(number)
(number - 1).chr
end

def max_code
127
end
end

# ...

Here we see the require for the code that we just examined. Note that this code will work fine with a mathn require as well though.

The class defined here is the simple encoding described in the quiz. As the comment indicates, pulling this code into the class makes it easy to swap out with other encoding schemes.

The work horse methods for the solution are encode() and decode(), of course. Here's the easy one:

ruby
# ...

def encode(input, primes, coder)
goedel_value = 1

input.each_line do |line|
0.upto(line.size - 1) do |i|
char = line[i, 1]
encoding = coder.encode char
next if encoding.nil? # skip characters without encoding
goedel_value *= primes.next ** encoding
end
end

puts goedel_value
end

# ...

The code works its way line by line and character by character through the input. Each character is encoded using the RubyQuizCoder class we saw earlier, used as an exponent for a prime based on its position, and finally multiplied into the overall Goedel value. When all of the characters have been dealt with, the overall value is printed as a result.

The reverse operation is harder to digest, because it's where the optimizations are hiding:

ruby
# ...

# Attempt to decode quickly by trying to perfectly divide by
# prime**(2**6), prime**(2**5), prime**(2**4), ..., prime**(2**0) and
# then adding the powers of 2 for which the division worked without a
# remainder. For example, if a number were divisible by prime**101,
# then it's also divisible by prime**64 * prime**32 * prime**4 *
# prime**1 since 64 + 32 + 4 + 1 = 101. So, we'll have to divide the
# large number exactly 7 times per prime no matter what the exponent.
# Note: 7 assumes that the encoding results in no value greater than
# 127.
def decode(input, primes, coder)
goedel_value = input.gets.to_i
max_two_expnt = (Math.log(coder.max_code) / Math.log(2)).to_i
factors = (0..max_two_expnt).map { |i| [2**i, nil] }

while goedel_value > 1
current_prime = primes.next
encoded = 0

factors[0][1] = current_prime
(1..max_two_expnt).each do |i|
factors[i][1] = factors[i - 1][1] ** 2
end

factors.reverse_each do |expnt, factor|
quotient, remainder = goedel_value.divmod(factor)
if remainder == 0
encoded += expnt
goedel_value = quotient
end
end

char = coder.decode(encoded)
putc char unless char.nil?
end
end

# ...

The biggest trick in here is the use of factorization to narrow down the divisions needed. Until the overall value hits one, each prime is pulled in turn and factored into the possible divisors. Each of those numbers is then tried in reverse order. Those that divide evenly are added to the encoded character count and drop the overall count accordingly. After all of the factors have been tried, the character count is passed through our RubyQuizCoder object and the resulting character is printed.

This strategy results in a constant number of divisions for each character and in most cases, those divisions should be significantly less than the brute force approach.

The rest of the code just provides an interface to these routines:

ruby
# ...

def usage
STDERR.puts "Usage: %s -e[ncode]|-d[ecode] [file]" % $0
exit 1
end

# process command-line args and figure out which method to call

task = nil
input = nil
ARGV.each do |arg|
case arg
when /^-+e/ : task = :encode
when /^-+d/ : task = :decode
else if input : usage
else input = open(arg)
end
end
end

input = STDIN if input.nil?
primes = Prime.new
coder = RubyQuizCoder.new

case task
when :encode : encode(input, primes, coder)
when :decode : decode(input, primes, coder)
else usage
end

This is just some basic argument parsing code. It hunts for a -e or -d switch to figure out if we are encoding or decoding. It also opens a file of input for the first non-switch argument or defaults to STDIN. Failing to select a mode or providing multiple input parameters triggers the usage message and an exit() call. Otherwise, the selected routine is called with the input, a prime generator, and the coder.

356592611993533159357704171943707065506245018107843654420869995255779400
940327637098004636805658940369248254005741705095861927174094085632357462
907907043655503052325570584135928519171701028451925211424184061134912519
346213124822184698766352106671144335797309339814960788404357329829880747
370816749488671200451708065007982023372864606432786154466561150882383337
665739304198857199016703678239557259404453555953430721034149637848167371
696741427087919115033228735590823604859393603457456530272005167282069777
767057670186595718538513692305202904620672162211361327059254731120977063
134449493970743739621694338383775782689258152219852727482221819916188854
728922544056620806588370831318213885828889837200379775053853865140225412
662845287202726609004691108889366849057859320331241672049374448503639685
391414041859040062271216363732269502663137649053220830210776147948360695
030849925289805667675539825659032822205396840604174306471211962857755270
446984071785349996985198175348758225276208258987173267939423506126205306
366658339817220773597781191289736402029252868902666231417359731588146165
666320778680777303480755853750057521054497117887073508511901428397946093
449376833443738252027373639560059173037124844900113551258953525303486104
043722150604562552368672756555770801627883622940792114968157184742775843
032899157541783062241525730231786371568328435169052449375068965888786768
418148739617709373626622156165087821359521473101982925802046545744126160
160591686651704019368238740807898239650093187094408149981270808624486401
576666800438277327766454315294642692798530476745070050857038213692995319
240862517113963583832414299967798488646237585020392202957001143544708578
006399968115609032521439107622793677586436726234995703756213467526362754
053665223526567818472971465689078159087061330179433352620768373562074252
272173230499663950656487104976363578669988246793426315517241324669161225
95456614469899787029059315850805248000000000000000000000000000000000

Tomorrow we will unreverse some backwards math...