Home

Programming a Chat Bot to Understand a Sentence

by Gary J. Shannon

Started Sept. 17, 2008
Last Updated Sept. 17, 2008

World Models

If I say to you, "The post office is three blocks north of here." there's little doubt that you would easily understand what I've just told you. But what, exactly, is going on during this exchange? To begin with, I am standing on Oak Street at the corner of 8th Avenue, and, being familiar with the city, I have a mental picture of my immediate neighborhood. Whether that picture is verbal, visual, or procedural is irrelevant. I might close my eyes and visualize the post office as being three blocks north on the mental map I see, or I might know, procedurally, that by walking three blocks north I will encounter the post office, even without an actual visual model. I can "count" the blocks, procedurally, by noting to myself something like "Well, first I pass the Taco place next to my barbershop, and then the concert hall, and then the post office, so let me see, that's um, one, two, three blocks." In either case, I have constructed, in my brain, a model of the city, and I have a means of examining that model to extract certain information from it.

Now if I also tell you that the library is 4 blocks south of here, and if you understand what I told you, you will be able to tell me that the library is 7 blocks south of the post office. However, while you might be able to perform this feat with a combination of deductive reasoning and simple arithmetic, it would also be possible to come to this conclusion by examining your mental map and simply counting the number of blocks you "see" on you mental map.

From this we could hypothesize that "to understand" a statement is to have incorporated the information in the statement into one's mental model of the world. That, of course, presupposes that the listener has a mental model of the world to begin with. Therefor, in order for a computer program to "understand" any statement given to it, it must first have a mental model of the world. In other words, the program must support, in some fashion or another, a simulation of at least some portion of the real world.

My own background includes programming aerodynamic simulations for pilot training, and before that, simulating worlds in computer gaming applications. In both of those cases, when it was required to know something about the state of the world, or the objects in the world, the answer was found by interrogating the simulation directly. If, in the course of playing the game, you put your sword down on the table in the great hall, the query "Where is my sword" is answered by literally asking the sword to reveal it's "location" attribute. Likewise the question "What is on the table in the great hall?" is answered, not by any chain of deductive reasoning, but by simply asking the table to reveal it's contents. In no case does the program have to "remember" "facts" that it was told. Whater facts were given to the bot are now incorporated into the bot's own mental model of the world.

Using simulation instead of deductive processes, one can perform the indicated actions in the mental model, and observe the results directly. For example, consider the following exchange:

PERSON: I had three apples and I gave one to John.
BOT: (Places three apples into person.inventory)
BOT: (Moves one apple from person.inventory to John.inventory)
BOT: OK.
PERSON: How many apples do I have now?
BOT: (Queries person.inventory and counts number of apples present)
BOT: Two.

Incomplete Information and Virtual Containers

Some objects in the simulation belong to the class of objects called "containers" in that they are able to contain other objects. A person's inventory, a table, or a box are examples of container objects. When the person playing the simulation with computer (hereafter referred to as the "player character" or "PC") makes a direct statement like "I put my keys in my pocket." The simulation can respond by moving the simulated keys into the simulated container object pc.pocket (the player character's pocket). Later if the PC asks "Where are my keys?" the answer is found directly from the state of the simulation.

But what happens in case the PC says something like: "There was a book and an apple on the table, and I took something from the table and gave the other thing to John." In this case the simulation does not know which object to move to which container. To handle situations such as this there is a special kind of container called a virtual container. A virtual container is always linked with at least one other virtual container. The contents of a virtual container are taken to be possible contents rather than actual contents. Thus if a virtual container holds "an apple" and "a book", when the container is queried for its contents it replies "Either an apple OR a book.", indicating the uncertainty that the container has about its contents.

Since every object has to actually be somewhere in the simulation, there must be enough linked virtual containers to hold all the objects. If we discover that a certain object is not in one of the linked virtual containers, then it must, necessarily be in one of the others, and if an object is found to definately be in one of the virtual containers, then it can be removed from all the other linked containers, since it cannot be in two places at once. That may sound a bit confusing, but consider this example:

PC: On the table there was an apple, a book, and a cheese sandwich.
BOT: (Places apple, book, and sandwich into table.inventory.)
BOT: OK.
PC: John took something from the table.
BOT: (Creates John.virtual_box and table.virtual_box and links them togther.)
BOT: (Moves apple, book, and sandwich from table into both virtual containers.)
BOT: (Marks John.virtual_box as holding 1 object. Marks table.virtual_box as holding 2 objects.)
BOT: OK.
PC: What is on the table?
BOT: (table.virtual_box contains any 2 of the objects in its contents.)
BOT: Any two of these: an apple, a book, or a sandwich.
BOT: (table.virtual_box contains, by definition whatever is not contained in John.virtual_box.)
BOT: Whatever John is not holding.
PC: Alice took the book from the table.
BOT: (Removes book from all virtual containers and places it into Alice.inventory.)
BOT: (Marks table.virtual_box as holding 1 object.)
BOT: OK.
PC: What is on the table?
BOT: (queries table.virtual_box which contains "apple" and "sandwich".)
BOT: Either an apple or the sandwich.
PC: John does not have the apple.
BOT: (Remove apple from all virtual containers and place into table.inventory.)
BOT: (Virtual container John.virtual_box now contains only one object.)
BOT: (Move single object, sandwich, from virtual to John.inventory.)
BOT: (Discard empty virtual container and observe that certainty has been acheived by voluteering:)
BOT: Then John must have the sandwich.

This "logic puzzle" was solved without any recourse to deductive reasoning. The kind of puzzles that might typically be given to a chat bot in order to test its understanding could easily be "solved" by this method of simple simulation.

Knowing vs. Guessing

In addition to assigning values to attributes of objects, we would like to know something about how reliable that information is. On the one hand we might say "I think John's car is either blue or green." and on the other hand, "No, I'm sure it's red. I can see it in his driveway." Each piece of information given to the bot, or each piece of information provided by the bot itself, must have a degree of certainty attached to it. Typical values might include {UNKNOWN, ASSERTED, ASSUMED, UNCERTAIN, ... }. Each class might even have certain assumed values for certain attributes which will be assigned as defaults to new instances of that class. Thus when the word "apple" is mentioned, an instance of the class "apple" is created, and, among other things, it is "ASSUMED" that apple.color = red. This is vastly different from having been specifically told that this particular apple is red.

PC: I have an apple.
BOT: (Creates instance of class "apple" and places it in pc.inventory.)
BOT: OK.
PC: What color is my apple?
BOT: (apple.color has an "ASSUMED" value, rather than an "ASSERTED" value.)
BOT: (Bot is uncertain, but can assume default value of "red".)
BOT: I don't know. I guess maybe red.
PC: My apple is green.
BOT: (Assigns "green" to apple.color with certainty = "ASSERTED".)
BOT: OK.
PC" What color is my apple?
BOT: (apple.color has an "ASSERTED" value of "green", therefore bot is certain of answer.)
BOT: It's green.

It is also possible that the value assigned to an attribute is not a single value, but a list of values. For example, John might have two brothers so that the John.brother attribute has the list { Edward, Michael }. Likewise, assumed values might also be lists as in the example:

PC: I have a dog.
BOT: (Creates instance of object of class "dog" and places it in pc.inventory.)
BOT: (Assigns default value to dog.name = { Fido, Rover, Spot } as "ASSUMED".)
BOT: OK.
PC: What is my dog's name?
BOT: (dog.name has assumed list. Express uncertainty, and suggest from list.)
BOT: I don't know. I guess maybe Fido, or Rover. Maybe Spot?
PC: My dog's name is Maximillian.
BOT: (Replaces list value in dog.name with "Maximillian" as ASSERTED.)
BOT: OK.
PC: What is my dog's name?
BOT: (dog.name is ASSERTED)
BOT: It's Maximillian.

Curiosity Killed the Bot (or did Curiosity Fill the Slot?)

A natural consequence of the ability to understand, is the need to resolve bits of information that are not fully understood. In the course of natural conversation, if I say to you, "I have a dog." you would not be likely to respond as the bot above did with "OK." Rather, as a curious human being, you might be prompted to ask for more information, such as "Oh, what kind of dog is it?".

You don't ask this question because you were "programmed" to ask questions, but rather, as the consequence of a genuine desire to know. Having been given the information "I have a dog." you realize that there are significant gaps in your understanding of exactly what it is that I am claiming to have. Motivated by the desire to fill out these missing bits of information, you move the coversation forward by asking reasonable questions. In other words, you have certain attribute slots in your mental model of "dog", and you seek to fill those slots.

It is not merely enough that there are empty slots that need to be filled, but those slots must be prioritized in some manner. When discussing someone's pet dog, the question "What kind of dog is it?" is likely to come up long before the question "What is the registration number on its dog license tag?" Likewise, a curious bot, one which seeks to fill empty slots, must be given relative priorities for each of those empty slots.

The bot should also volunteer information whenever it indirectly fills a previously unknown slot. For example, one attribute of dog might be whether you like that particular dog or not. This slot might be filled internally once you know what kind of dog it is, as then that information could be volunteered to move the conversation forward:

PC: I have a dog.
BOT: (Creates instance of object of class "dog" and places it in pc.inventory.)
BOT: (Assigns default value to dog.name = { Fido, Rover, Spot } as "ASSUMED".)
BOT: (Assigns default value to dog.breed = { Collie, Pit Bull, OTHER } as "ASSUMED".)
BOT: (Any missing information? Yes - dog.name, dog.breed, dog.I_like.)
BOT: (Breed has highest priority)
BOT: What is it, Collie, Pit Bull, or what?
PC: Beagle.
BOT: (Assigns value dog.breed = "beagle" as "ASSERTED").
BOT: (Template for "beagle" contains default value dog.I_like = "yes". Volunteer this.)
BOT: Oh, I really like beagles.

Already we can begin to see how a little bit of understanding and a touch of curiosity would cause a bot to become not only more relevant with its replies, but more proactive in holding up its end of the conversation.

Parsing for Understanding

In order to connect the simulation to the real world, we must be able to parse incomming sentences, and to generate responses. The generation of responses is by far the easier of the two tasks, so that will be saved for a little later. For now, we will take a closer look at parsing.

Since the end result of parsing must be that actions take place in the simulation, each English language sentence must be translated into a sequence of basic simulation commands. Typically, informal conversational English has sentences and, frequently, sentence fragments that are unusally simple, and often in violation of one or more basic grammatical rules. The process of parsing I propose is not based on a formal grammar. Any human being, and by extension, any reasonably good Turing bot, should be able to easily understand sentence fragments in context, and meaningful, but grammatically incorrect sentences like "Them dogs is mean.", and "Think you small am I?"