
Infomer is Isomer's IRC bot. Its claim to fame is being a better infobot than infobot itself: it exploits more knowledge about the English language and uses that to learn. Infomer is written in Python.

Infomer is currently in pieces while I rewrite it to be more modular and easier to extend.

A lot of controversy arose from Infomer on #wlug, largely because it lent itself to abuse. Specifically:

  1. When the bot does say something, it can go on for pages at a time.
  2. The bot often interjects with random crud.
  3. People teach it silly facts and then delight in having it recite them.

In answer:

  1. This was mostly for debugging and testing; it's very hard to see what a bot is learning if it will only repeat facts when asked. It has been removed from the new version of the bot, and new interfaces are being examined.
  2. I'm hoping to fix the random interjections by making the bot smarter about what it replies to and when it should speak. For instance: when someone asks a question, wait a few seconds, and only reply if no one else has spoken.
  3. Well, this obviously shows that people enjoy playing with the bot; there isn't much you can code around here.
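The wait-then-reply idea in point 2 can be sketched as a small state machine. Everything below (the class name, the delay value, the method names) is hypothetical, not Infomer's actual code:

```python
import time

class ReplyGate:
    """Decide whether the bot should answer a question.

    Sketch: wait `delay` seconds after a question arrives; if anyone
    else speaks in the meantime, assume a human is answering and stay
    quiet.
    """

    def __init__(self, delay=3.0):
        self.delay = delay
        self.pending = None  # (question, time it was asked)

    def on_question(self, question, now=None):
        self.pending = (question, now if now is not None else time.time())

    def on_message(self, now=None):
        # Someone else spoke; drop the pending question.
        self.pending = None

    def should_reply(self, now=None):
        """Poll this periodically; returns the question to answer, or None."""
        if self.pending is None:
            return None
        question, asked_at = self.pending
        now = now if now is not None else time.time()
        if now - asked_at >= self.delay:
            self.pending = None
            return question
        return None
```

Passing explicit `now` values keeps the logic testable without real sleeps.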

As an experiment I'm placing some of Infomer's code in the wiki for people to update. This code isn't executed directly (I cut and paste it into Infomer every so often), but I'm curious whether and how people will update it.

parse_line

This function takes a line and splits it up into sentences, doing some munging along the way. It should also figure out whether this is a directed message and, if so, who it is directed to. For example:

 <Isomer> Infomer: you suck

should figure out that the sentence "you suck" is directed at Infomer. It doesn't have to figure out what "you" refers to; parse_sentence does that. It should, however, split compound sentences into multiple sentences. eg:

 <Isomer> Isomer is cool, and is funky

should be split up into "Isomer is cool" and "Isomer is funky" (with who being "Isomer" and the target being None). Punctuation in general should be stripped.

 import re

 def parse_line(who, text):
     # Check for a directed message, eg "Infomer: you suck".
     tm = re.match(r"(?P<target>\w+): ([\w' ]+)", text)

     if not tm:
         target = None
         line = text
     else:
         target = tm.group("target")
         line = tm.group(2)

     # With the target out of the way, break the line into sentences.
     # Treat "and" and "," as sentence breaks too; the word boundaries
     # stop "and" being clobbered inside words like "sandwich".
     line = re.sub(r"[?.,]|\band\b", ".", line)

     for s in line.split("."):
         words = s.split()

         # prefixes is a module-level list of filler words to drop.
         for p in prefixes:
             if p in words:
                 words.remove(p)

         if words:
             parse_sentence(who, target, " ".join(words))

parse_sentence

This function's job is to take a sentence and clean it up: replacing "you are" with "<target> is", removing little words, and rearranging sentences to make more sense to the bot. For example, this function should be able to take:

 (who=Isomer,target=Infomer) "You are a very stupid bot"

and turn it into

 (who=Isomer,target=Infomer) "Infomer is very stupid"
 (who=Isomer,target=Infomer) "Infomer is a bot"

 def parse_sentence(speaker, target, sentence):
     # target is who the sentence was aimed at (None if undirected).

     # replacements is a list of (matchlist, replacement) tuples.
     # It's ok for a replacement to be further expanded by another rule.
     # Match lists must be lower-case, since user text is lower-cased
     # for the comparison below.
     replacements = [
         # abbreviations
         (["you're"],     ["you", "are"]),
         (["i'm"],        ["i", "am"]),
         (["it's"],       ["it", "is"]),
         (["i", "am"],    [speaker, "is"]),
         (["my"],         [speaker + "'s"]),
     ]

     if target is not None:
         replacements.extend([
             (["you", "are"], [target, "is"]),
             (["are", "you"], ["is", target]),
             (["your"],       [target + "'s"]),
             ### bad idea?  (["you"], [target]),  # catch-all
         ])

     unparsed_tokens = sentence.split()
     parsed_tokens = []

     while unparsed_tokens:
         made_expansion = 0
         for match, replacement in replacements:
             term_len = len(match)
             if len(unparsed_tokens) >= term_len and \
                [t.lower() for t in unparsed_tokens[:term_len]] == match:
                 # Replace the matched tokens with the replacement.
                 unparsed_tokens = replacement + unparsed_tokens[term_len:]
                 made_expansion = 1
                 break

         if made_expansion == 0:
             # No rule matched at this position; this token is final.
             parsed_tokens.append(unparsed_tokens.pop(0))

     parse_phrase(speaker, parsed_tokens)
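To see what the rewrite loop does, here is a standalone copy of it with an abridged rule list that returns the rewritten string instead of calling parse_phrase; useful for experimenting outside the bot:

```python
def expand(speaker, target, sentence):
    """Standalone copy of parse_sentence's rewrite loop (rules trimmed)."""
    replacements = [
        (["you're"], ["you", "are"]),
        (["i", "am"], [speaker, "is"]),
    ]
    if target is not None:
        replacements.extend([
            (["you", "are"], [target, "is"]),
            (["are", "you"], ["is", target]),
            (["your"],       [target + "'s"]),
        ])

    unparsed, parsed = sentence.split(), []
    while unparsed:
        made_expansion = False
        for match, repl in replacements:
            n = len(match)
            if [t.lower() for t in unparsed[:n]] == match:
                # Rewrite in place; the result may match a later rule.
                unparsed = repl + unparsed[n:]
                made_expansion = True
                break
        if not made_expansion:
            parsed.append(unparsed.pop(0))
    return " ".join(parsed)
```

Note how "you're" expands to "you are", which then matches the "you are" rule on the next pass.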

parse_phrase

parse_phrase takes a sentence and calls the database primitives on it.

 def parse_phrase(who, text):
      if not text:
          return ""

      # Questions, eg "what is X" -> look the fact up.
      for i in questions:
          if i == text[0].lower():
              obj = Object(text[2:])
              return get_fact(obj, text[1])

      # Find the earliest recognised verb.
      first = len(text)
      first_verb = None
      for i in verbs:
          if i in text and text.index(i) < first:
              first = text.index(i)
              first_verb = i

      # Not a recognised statement.
      if first_verb is None:
          return ""

      # Split into two halves and a verb, eg:
      #  Perry's hair is very cool -> (Perry's hair, is, very cool)
      lhs = text[:first]
      verb = text[first]
      rhs = text[first + 1:]
      if " ".join(lhs).lower() in fake_lhs:
          return ""

      obj = Object(lhs)
      add_fact(obj, verb, parse_definition(verb, rhs))

      return ""

Misc functions

This function strips filler prefixes from the start of a (tokenised) sentence.

 def remove_prefix(text):
      # text is a list of tokens.
      prefixes = [
          ["and"], ["also"], ["ah"], ["ahh"], ["anyway"], ["apparently"],
          ["although"], ["but"], ["bah"], ["besides"], ["no"], ["yes"],
          ["yeah"] ]
      flag = 1
      while flag == 1:
          flag = 0
          for i in prefixes:
              if [t.lower() for t in text[:len(i)]] == i:
                  text = text[len(i):]
                  flag = 1
      return text

An object to hold information about an, uh, object.

 # A class to hold an object's information
 class Object:
     def __init__(self, tokens):
         self.tokens = []
         token = ""
         # Join words up into tokens, splitting on possessives.
         # eg:  Isomer's Left Foot's sole -> (Isomer, Left Foot, sole)
         for i in tokens:
             token = token + " " + i
             if len(token) > 2 and token[-2:] == "'s":
                 token = token[:-2]
                 self.tokens.append(token.strip())
                 token = ""
         # This intentionally appends an empty token when token is "".
         self.tokens.append(token.strip())

     def __repr__(self):
         return repr(self.tokens)
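The possessive-splitting loop can be tried standalone; this sketch repeats the grouping logic above as a plain function:

```python
def possessive_split(tokens):
    """Join words and break on a trailing "'s", so possessive chains
    become a list of sub-objects (same loop as Object.__init__)."""
    out, token = [], ""
    for word in tokens:
        token = token + " " + word
        if len(token) > 2 and token[-2:] == "'s":
            out.append(token[:-2].strip())
            token = ""
    out.append(token.strip())  # may append "" when the phrase ends in "'s"
    return out
```

This gives the (Isomer, Left Foot, sole) breakdown from the comment in the class.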

Figures out whether this is a special type of information.

 def parse_definition(verb, text):
      if text and text[0].lower() in ["a", "an", "the"]:
          return (ISA, text[1:])
      if " ".join(text[:2]).lower() == "known as":
          return (AKA, text[2:])
      if " ".join(text[:3]).lower() == "also known as":
          return (AKA, text[3:])
      return (NORMAL, text)
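For example (ISA, AKA and NORMAL are module constants elsewhere in Infomer; the values here are stand-ins):

```python
# Stand-in constants; Infomer defines its own values elsewhere.
ISA, AKA, NORMAL = "isa", "aka", "normal"

def classify(text):
    """Mirror of parse_definition's tagging, minus the verb argument."""
    if text and text[0].lower() in ("a", "an", "the"):
        return (ISA, text[1:])
    if " ".join(text[:2]).lower() == "known as":
        return (AKA, text[2:])
    if " ".join(text[:3]).lower() == "also known as":
        return (AKA, text[3:])
    return (NORMAL, text)
```

So "Infomer is a bot" stores an ISA fact, while "Infomer is very cool" stores a NORMAL one.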

TODO

Parse:

 Infomer is a very stupid bot

as

 Infomer is very stupid
 Infomer is a bot

requires knowledge of adverbs and the like. Adverbs can mostly be detected by checking for an "ly" ending on words.
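A first cut at that split might look like the sketch below. The article heuristic is an assumption, not what Infomer currently does: if the right-hand side starts with an article and has words between it and the final noun, emit an adjective fact and an ISA fact.

```python
def split_article_fact(lhs, verb, rhs):
    """Split "<lhs> is a <modifiers> <noun>" into two facts.
    Everything between the article and the last word is treated as a
    modifier; an "ly" check could further filter these to adverbs."""
    facts = []
    if rhs and rhs[0].lower() in ("a", "an", "the") and len(rhs) > 2:
        modifiers, noun = rhs[1:-1], rhs[-1]
        facts.append(lhs + [verb] + modifiers)    # eg Infomer is very stupid
        facts.append(lhs + [verb, rhs[0], noun])  # eg Infomer is a bot
    else:
        facts.append(lhs + [verb] + rhs)
    return facts
```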

Add:

 tell nick that message

so that when nick next says something, the bot says "who wanted me to tell you: message".
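A minimal sketch of how that could work (all names here are hypothetical): queue messages per nick and flush them the next time that nick is active.

```python
# Pending messages, keyed by lower-cased nick.
pending_tells = {}

def tell(who, nick, message):
    """Record that `who` wants `nick` to see `message`."""
    pending_tells.setdefault(nick.lower(), []).append((who, message))

def on_activity(nick):
    """Call whenever `nick` says anything; returns lines the bot should say."""
    lines = []
    for who, message in pending_tells.pop(nick.lower(), []):
        lines.append("%s: %s wanted me to tell you: %s" % (nick, who, message))
    return lines
```

Popping the queue means each message is delivered exactly once, and the lower-cased key makes delivery case-insensitive.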