Infomer
Infomer is [Isomer|PerryLorier]'s [IRC] bot. Infomer's claim to fame is being a better infobot than infobot: it exploits more knowledge about the English language and uses it to learn. Infomer is written in [Python]. Infomer is currently in pieces while I rewrite it to be more modular and more easily extensible.

A lot of controversy arose from Infomer on #wlug, due to it being used for abuse, specifically:

# When the bot does say something, it can go on for pages and pages at a time.
# The bot often interjects with random crud.
# People teach it silly facts and then take delight in having it recite them.

In answer:

# The repetition is mostly for debugging and testing; it's very hard to see what a bot is learning if it will only repeat other facts. This has been removed from the new version of the bot, and new interfaces are being examined.
# I'm hoping to fix the random interjections by making the bot smarter about what it replies and when it should speak. For instance, when someone asks a question, wait a few seconds and only reply if no one else has spoken.
# This obviously shows that people enjoy playing with the bot; there's not much you can code around there.

As an experiment I'm placing some of Infomer's code in the wiki for people to update. This code isn't executed directly (I have to cut and paste it into Infomer every so often), but I'm curious if/how people will update it.

!!Parse_line

This function takes a line, splits it into sentences, and does some munging along the way. It should also figure out whether this is a directed message, and if so, who it is directed to. For example:

<pre>
<Isomer> Infomer: you suck
</pre>

should figure out that the sentence "you suck" is directed at Infomer. It doesn't have to figure out what "you" refers to; parse_sentence does that. It should, however, split compound sentences up into multiple simple sentences.
eg:

<pre>
<Isomer> Isomer is cool, and is funky
</pre>

should be split up into "Isomer is cool" and "Isomer is funky" (with who being "Isomer" and the target being None). Punctuation in general should be stripped.

<verbatim>
import re

def parse_line(who, text):
    # Is this line directed at someone, eg "Infomer: you suck"?
    tm = re.match(r"(?P<string>\w+)\: ([\w\' ]+)", text)
    if not tm:
        target = None
        line = text
    else:
        target = tm.group(1)
        line = tm.group(2)
    # Now, with the target out of the way, split on sentence boundaries
    # and conjunctions.  (Crude: replace() will also split words that
    # merely contain "and".)
    delims = ["?", ".", "and", ","]
    for d in delims:
        line = line.replace(d, ".")
    sentences = line.split(".")
    for s in sentences:
        words = s.split(" ")
        # prefixes is a global list of ignorable leading words.
        for p in prefixes:
            if p in words:
                words.remove(p)
        parse_sentence(who, target, " ".join(words))
</verbatim>

!!Parse_sentence

This function's job is to take a sentence and clean it up: replacing "you" with the target's name, removing little words, and rearranging sentences to make more sense to the bot. For example, this function should be able to take:

<pre>
(who=Isomer,target=Infomer) "You are a very stupid bot"
</pre>

and turn it into:

<pre>
(who=Isomer,target=Infomer) "Infomer is very stupid"
(who=Isomer,target=Infomer) "Infomer is a bot"
</pre>

<verbatim>
def parse_sentence(speaker, target, sentence):
    # target is who the sentence was aimed at.
    # replacements is a list of (matchlist, replacement) tuples.
    # It's ok to have a replacement that is further expanded by another rule.
    # Match terms are lower case, since user text is lower-cased for the
    # comparison.
    replacements = [
        # abbreviations
        (["you're"], ["you", "are"]),
        (["i'm"], ["i", "am"]),
        (["it's"], ["it", "is"]),
        (["i", "am"], [speaker, "is"]),
        (["my"], [speaker + "'s"]),
    ]
    if target != None:
        replacements.extend([
            (["you", "are"], [target, "is"]),
            (["are", "you"], ["is", target]),
            (["your"], [target + "'s"]),
            (["you"], [target]),  # catch-all - bad idea?
        ])
    unparsed_tokens = sentence.split()
    parsed_tokens = []
    while len(unparsed_tokens) > 0:
        made_expansion = 0
        for pair in replacements:
            term_len = len(pair[0])
            if len(unparsed_tokens) >= term_len and \
               [w.lower() for w in unparsed_tokens[:term_len]] == pair[0]:
                # Replace the matched words with their expansion.
                unparsed_tokens = pair[1] + unparsed_tokens[term_len:]
                made_expansion = 1
                break
        if made_expansion == 0:
            # We couldn't make any expansions at this point.
            parsed_tokens.append(unparsed_tokens.pop(0))
    parse_phrase(speaker, parsed_tokens)
</verbatim>

!!Parse_phrase

parse_phrase takes a sentence and calls the database primitives on it.

<verbatim>
def parse_phrase(who, text):
    # questions, verbs and fake_lhs are global word lists.
    for i in questions:
        if i == text[0].lower():
            obj = Object(text[2:])
            return get_fact(obj, text[1])
    first = len(text)
    first_verb = None
    for i in verbs:
        if i in text and text.index(i) < first:
            first = text.index(i)
            first_verb = i
    # Not a recognised statement
    if first_verb == None:
        return ""
    # Split into two halves and a verb, eg:
    # Perry's hair is very cool -> (Perry's hair, is, very cool)
    lhs = text[:first]
    verb = text[first]
    rhs = text[first+1:]
    if " ".join(lhs).lower() in fake_lhs:
        return ""
    obj = Object(lhs)
    add_fact(obj, verb, parse_definition(verb, rhs))
    return ""
</verbatim>

!!Misc functions

This function removes a prefix from a sentence (text is a list of words):

<verbatim>
def remove_prefix(text):
    prefixes = [["and"], ["also"], ["ah"], ["ahh"], ["anyway"],
                ["apparently"], ["although"], ["but"], ["bah"],
                ["besides"], ["no"], ["yes"], ["yeah"]]
    flag = 1
    while flag == 1:
        flag = 0
        for i in prefixes:
            if [w.lower() for w in text[:len(i)]] == i:
                text = text[len(i):]
                flag = 1
    return text
</verbatim>

An object to hold information about an, uh, object.
<verbatim>
# A class to hold an object's information.
class Object:
    def __init__(self, tokens):
        self.tokens = []
        token = ""
        # Join words up into tokens,
        # eg: Isomer's Left Foot's sole -> (Isomer, Left Foot, sole)
        for i in tokens:
            token = token + " " + i
            if len(token) > 2 and token[-2:] == "'s":
                token = token[:-2]
                self.tokens.append(token.strip())
                token = ""
        # This intentionally adds an empty token when token is "".
        self.tokens.append(token.strip())

    def __repr__(self):
        return repr(self.tokens)
</verbatim>

Figures out if this is a special type of information:

<verbatim>
def parse_definition(verb, text):
    if text[0].lower() in ["a", "an", "the"]:
        return (ISA, text[1:])
    if " ".join(text[:2]).lower() == "known as":
        return (AKA, text[2:])
    if " ".join(text[:3]).lower() == "also known as":
        return (AKA, text[3:])
    return (NORMAL, text)
</verbatim>

!!TODO

Parse:

<pre>
Infomer is a very stupid bot
</pre>

as

<pre>
Infomer is very stupid
Infomer is a bot
</pre>

This requires knowledge of adverbs and such. Adverbs can mostly be detected by checking for "ly" on the end of words.

Add:

<pre>
tell ''nick'' that ''message''
</pre>

When ''nick'' next says something, say "''who'' wanted me to tell you that ''message''".
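The first TODO item could be attacked with the "ly" heuristic mentioned above. The sketch below is purely illustrative and not part of Infomer: the names is_adverbish and split_article_phrase are invented here, and the intensifier list is an assumption.

<verbatim>
# Hypothetical sketch of the "ly" adverb heuristic from the TODO.
INTENSIFIERS = ("very", "really", "quite")

def is_adverbish(word):
    # The crude test suggested above: adverbs mostly end in "ly".
    # A few common intensifiers are special-cased.
    w = word.lower()
    return w.endswith("ly") or w in INTENSIFIERS

def split_article_phrase(subject, rhs):
    """Split e.g. "a very stupid bot" (as a word list) into simpler facts."""
    if len(rhs) < 3 or rhs[0].lower() not in ("a", "an", "the"):
        # Nothing to peel off; keep the phrase as one fact.
        return ["%s is %s" % (subject, " ".join(rhs))]
    article, rest = rhs[0], rhs[1:]
    # Greedily consume adverb-ish words plus the word each one modifies.
    i = 0
    while i < len(rest) - 1 and is_adverbish(rest[i]):
        i += 2
    modifiers, remainder = rest[:i], rest[i:]
    facts = []
    if modifiers:
        facts.append("%s is %s" % (subject, " ".join(modifiers)))
    facts.append("%s is %s %s" % (subject, article, " ".join(remainder)))
    return facts

print(split_article_phrase("Infomer", "a very stupid bot".split()))
# -> ['Infomer is very stupid', 'Infomer is a bot']
</verbatim>

Note that without an adverb-ish cue the sketch can't tell a compound noun ("stupid bot") from an adjective plus noun, so it conservatively keeps the phrase whole.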