Monday, January 9th, 2012...17:13

AI Class

Jump to Comments

Towards the end of last year I signed up for the “basic” track of the Stanford Engineering Introduction To Artificial Intelligence class, mainly out of curiosity, but also because the instructors (Sebastian Thrun and Peter Norvig) and I share the same employer. I intended to follow along at a leisurely pace, unsure if I would be able to commit the time necessary to watch all the lectures and go through the homework problems. However, during the first week I found the homework assignments unavailable to “basic” track participants until after they were due (this was later changed) so I switched to the “advanced” track and made a concerned effort to adhere to the class schedule.

It’s similar in style to the Khan Academy, and it definitely took my understanding of Bayes’ Rule, Hidden Markov Models, etc. to another level. Sure there are tons of books and tutorials out there, but I feel there’s a limit to what one can learn without practice. One form of practice is to create a toy programming project, the main downside of which is never knowing if you’re doing it “right”. Contests ala Netflix Prize are another which at least provides better/worse feedback. The main benefit of online courses though is one can practice solving problems and know whether or not an answer is correct.

I didn’t participate in any of the discussions, though I did find the group website useful for clarification on a few of the problem statements. Even so, there were still a couple of questions throughout the homework and exams which I misinterpreted, leading to an incorrect answer; however, the largest cause of incorrect answers seemed to be my inability to punch numbers into a calculator correctly…I guess I haven’t gotten any better at that since I was last in school :S

An unexpected realization was how much I value flexibility in scheduling. There were definitely a few weeks I had to scramble in order to make it through the videos and homework before the deadline. Funny how the addition of one more deadline per week caused me so much grief. I was interested in the material, but some of it took quite a bit of time to get through (especially once I slowed down after rushing through a couple of homework assignments and doing poorly).

While one could certainly write programs to solve some of the quiz/homework/exam problems, all of them were solvable with pen and paper. The lack of problems requiring writing code was one of the top complaints. They held an AI competition; I didn’t participate as I feared that would take up too much of my time. However, towards the end of the class, the instructors provided an optional quiz which required writing some code: Optional Natural Language Processing Problems. This seemed like a much less time consuming set of questions and I would also get instant feedback.

The first problem was easily solved with brute force. My basic idea was to try every possible rotation and see which one yielded the most words which appear in an English dictionary.

#!/usr/bin/python

import sys
import string

# Problem text, all lowercase, no punctuation.
SENTENCE = 'esp qtcde nzyqpcpynp zy esp ezatn zq lcetqtntlw tyepwwtrpynp hld spwo le olcexzfes nzwwprp ty estd jplc'

# Parse the system's English dictionary into a set.
def WordsToSet():
  word_set = set()
  for w in open('/usr/share/dict/words'):
    word_set.add(w.strip().lower())
  return word_set

def p1(word_set):
  # Setup character position lookup table.
  chars = string.ascii_lowercase
  num_chars = len(chars)
  letter_positions = dict([(c, i) for i, c in enumerate(chars)])

  # Track the rotation that yields the most real words.
  max_word_count = 0 
  best_i = 1 
  sentences = {}

  # Try each rotation.
  for i in xrange(1, 25):
    temp_word_count = 0 
    decoded = []

    # Build up a decoded sentence.
    for char in SENTENCE:
      if char == ' ':
        decoded.append(char)
      else:
        decoded.append(chars[(letter_positions[char.lower()] - i) % num_chars])
    sentences[i] = ''.join(decoded)

    # See how many words in the sentence exist in the word set.
    for word in sentences[i].split(' '):
      if word in word_set:
        temp_word_count += 1
    if temp_word_count > max_word_count:
      best_i = i 
      max_word_count = temp_word_count

  # Output our best guess.
  print best_i, sentences[best_i]

def main(argv):
  word_set = WordsToSet()
  p1(word_set)

if __name__ == '__main__':
  main(sys.argv)

The second problem proved to be a bit more difficult. It couldn’t be brute forced since there were 18 “strips” of characters and thus 18! possible combinations (6,402,373,705,728,000 so if one could generate 1 every millisecond, it’d take 200,000+ years and that doesn’t even factor in the evaluation of whether the combination is gibberish). To make life easier, I first got rid of all the punctuation, lowercased all the characters, and converted the “strips” of characters into a Python list of lists. My idea was to pick the first strip, try the remaining 17 to see which one “fit” best before or after strip 0, then try the remaining 16 on the joined strips, etc. For best “fit” I would try to maximize the number of legitimate words which could be composed by the left or right most series of characters (whitespace delimited).

Initially it seemed like counting a partial word match with fewer than 3 characters wouldn’t be useful e.g. “er” appears in tens of thousands of words. That wasn’t good enough to keep my program on track though, as it would get a few strips aligned correcting and then veer off track creating sequences of characters that were found in many words, but ended up being gibberish in this example. So I had to augment the solution to detect when a sequence of characters should appear at the beginning of a word (whitespace before the sequence, but not after) or when a sequence of characters should be matched exactly (whitespace before and after the sequence). In this case, I didn’t have to detect whether a sequence of characters should appear at the end of a word (whitespace after the sequence, but not before), but it would be trivial to add.

Another necessary feature of my model was the ability to short-circuit if more than 1 row of characters would result in zero possible words. At first I had the threshold set at more than 0, but “shannon”, a last name as it turns out, is not in the dictionary. The code below takes about 90 seconds to solve the problem and could easily be optimized to reduce the number of loops or lookups in the dictionary, but I think those changes obscure the heart of the algorithm. Below is the code as well as a copy of the output generated by the program as it solved the problem.

#!/usr/bin/python

import sys
import string

STRIPS = [
  ['de',' ',' f','cl','nf','ed','au',' i','ti',' ','ma','ha','or','nn','ou',' s','on','nd','on'],
  ['ry',' ','is','th','is',' b','eo','as',' ',' ','f ','wh',' o','ic',' t',' ',' ','he','h '],
  ['ab',' ','la','pr','od','ge','ob',' m','an',' ','s ','is','el','ti','ng','il','d ','ua','c '],
  ['he',' ','ea','of','ho',' m',' t','et','ha',' ',' t','od','ds','e ','ki',' c','t ','ng','br'],
  ['wo','m ','to','yo','hi','ve','u ',' t','ob',' ','pr','d ','s ','us',' s','ul','le','ol','e '],
  [' t','ca',' t','wi',' m','d ','th',' a','ma','l ','he',' p','at','ap','it','he','ti','le','er'],
  ['ry','d ','un','th',' ','io','eo','n ','is',' ','bl','f ','pu','co','ic',' o','he','at','mm'],
  ['hi',' ',' ','in',' ',' ',' t',' ',' ',' ',' ','ye',' ','ar',' ','s ',' ',' ',' ']
]

def WordsToSet():
  word_set = set()
  for w in open('/usr/share/dict/words'):
    word_set.add(w.strip().lower())
  return word_set

# Determine whether word is, starts with or contins prospective_word.
def WordMatch(word, prospective_word, exact, startswith):
  if exact and prospective_word == word:
    return 1
  elif startswith and word.startswith(prospective_word):
    return 1
  elif not exact and not startswith and prospective_word in word:
    return 1
  return 0

def ProspectiveWord(i, j, row, prefix):
  # Construct a new result based on the current result row, the ith row and,
  # jth column of STRIPS and whether the new set of characters should be
  # positioned before or after the current result row.
  if prefix:
    prospective_result = STRIPS[i][j] + row
  else:
    prospective_result = row + STRIPS[i][j]

  words = prospective_result.split(' ')
  prospective_word = ''
  exact = False
  startswith = False
  if prefix:
    if words[0] == '':  # There was a leading space.
      prospective_word = words[1]
      if len(words) > 2:  # And a space after prospective_word.
        exact = True
      else:
        startswith = True
    else:
      prospective_word = words[0]
  else:
    if words[-1] == '':  # There was a trailing space.
      prospective_word = words[-2]
      if len(words) > 2:  # And a space before prospective_word.
        exact = True
    else:
      prospective_word = words[-1]
      if len(words) > 1:
        startswith = True
  return prospective_word, exact, startswith

def p2(word_set):
  # Set the first column of STRIPS as the current result.
  result = [row[0] for row in STRIPS]
  # The order of j's, not used to solve the problem, but useful to print at the
  # end. i.e. the order of the STRIPS.
  strip_order = [0]
  # Keep track of the unused j's i.e. columns of STRIPS
  remaining_strips = set([j for j in xrange(1, len(STRIPS[0]))])

  while len(remaining_strips) > 0:
    # Make one pass placing column j of STRIPS before the current result and
    # then another pass placing it after.
    for prefix in [True, False]:
      # Keep track of the best j and highest possible probablistic word count.
      best_j = 0
      best_word_count = 0
      for j in remaining_strips:  # Columns
        word_count = 0
        # Keep track of the number of rows without probable words. Rows which are
        # not checked i.e. without a prospective word of length > 2 or where
        # STRIPS[i][j] is a space (doesn't add any new information) do not count.
        rows_without_words = 0
        for i, row in enumerate(result):  # Rows
          # More than one row without probable words means this column is likely
          # in the wrong position.
          if rows_without_words > 1:
            break
          if STRIPS[i][j] == ' ':  # No new information, continue to next row.
            continue
          prospective_word, exact, startswith = ProspectiveWord(i, j, row, prefix)
          # Too short a word/word fragment, continue to next row.
          if len(prospective_word) < 3:
            continue

          # Total up all the prospective word matches and update the state.
          temp_word_count = 0
          for word in word_set:
            temp_word_count += WordMatch(word, prospective_word, exact, startswith)

          if temp_word_count == 0:
            rows_without_words += 1
          else:
            word_count += temp_word_count

        # More than one row without probable words means this column is likely
        # in the wrong position.
        if rows_without_words > 1:
          continue

        if word_count > best_word_count:
          best_word_count = word_count
          best_j = j

      if best_j != 0:
        if prefix:
          strip_order.insert(0, best_j)
        else:
          strip_order.append(best_j)
        for i in xrange(len(result)):
          if prefix:
            result[i] = STRIPS[i][best_j] + result[i]
          else:
            result[i] = result[i] + STRIPS[i][best_j]
        remaining_strips.remove(best_j)
        print 'strip_order: ', strip_order
        for row in result:
          print row
          print ' '

def main(argv):
  word_set = WordsToSet()
  p2(word_set)

if __name__ == '__main__':
  main(sys.argv)

Here’s what the output looks like so you can see it building up the solution:

strip_order: [6, 0]
aude
eory
obab
the
u wo
th t
eory
thi

strip_order: [6, 0, 15]
aude s
eory
obabil
the c
u woul
th the
eory o
this

strip_order: [3, 6, 0, 15]
claude s
theory
probabil
of the c
you woul
with the
theory o
in this

strip_order: [3, 6, 0, 15, 11]
claude sha
theory wh
probabilis
of the cod
you would
with the p
theory of
in this ye

strip_order: [3, 6, 0, 15, 11, 13]
claude shann
theory whic
probabilisti
of the code
you would us
with the pap
theory of co
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18]
claude shannon
theory which
probabilistic
of the code br
you would use
with the paper
theory of comm
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18, 2]
claude shannon f
theory which is
probabilistic la
of the code brea
you would use to
with the paper t
theory of commun
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18, 2, 14]
claude shannon fou
theory which is t
probabilistic lang
of the code breaki
you would use to s
with the paper tit
theory of communic
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18, 2, 14, 17]
claude shannon found
theory which is the
probabilistic langua
of the code breaking
you would use to sol
with the paper title
theory of communicat
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18, 2, 14, 17, 5]
claude shannon founded
theory which is the b
probabilistic language
of the code breaking m
you would use to solve
with the paper titled
theory of communicatio
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18, 2, 14, 17, 5, 7]
claude shannon founded i
theory which is the bas
probabilistic language m
of the code breaking met
you would use to solve t
with the paper titled a
theory of communication
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18, 2, 14, 17, 5, 7, 4]
claude shannon founded inf
theory which is the basis
probabilistic language mod
of the code breaking metho
you would use to solve thi
with the paper titled a m
theory of communication
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18, 2, 14, 17, 5, 7, 4, 12]
claude shannon founded infor
theory which is the basis o
probabilistic language model
of the code breaking methods
you would use to solve this
with the paper titled a mat
theory of communication pu
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18, 2, 14, 17, 5, 7, 4, 12, 10]
claude shannon founded informa
theory which is the basis of
probabilistic language models
of the code breaking methods t
you would use to solve this pr
with the paper titled a mathe
theory of communication publ
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18, 2, 14, 17, 5, 7, 4, 12, 10, 8]
claude shannon founded informati
theory which is the basis of
probabilistic language models an
of the code breaking methods tha
you would use to solve this prob
with the paper titled a mathema
theory of communication publis
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18, 2, 14, 17, 5, 7, 4, 12, 10, 8, 16]

theory which is the basis of
probabilistic language models and
of the code breaking methods that
you would use to solve this proble
with the paper titled a mathemati
theory of communication publishe
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18, 2, 14, 17, 5, 7, 4, 12, 10, 8, 16, 1]
claude shannon founded information
theory which is the basis of
probabilistic language models and
of the code breaking methods that
you would use to solve this problem
with the paper titled a mathematica
theory of communication published
in this year

strip_order: [3, 6, 0, 15, 11, 13, 18, 2, 14, 17, 5, 7, 4, 12, 10, 8, 16, 1, 9]
claude shannon founded information
theory which is the basis of
probabilistic language models and
of the code breaking methods that
you would use to solve this problem
with the paper titled a mathematical
theory of communication published
in this year

Comments are closed.