
Building a Context Network from a Narrative
An Approach to Chatbot Question Answering

by Gary J. Shannon

Started Nov. 2, 2008
Last Updated Nov 7, 2008
Page One: Overview
Page Two: Details of Parsing
Page Three: World Models and Reasoning by Analogy
Page Four: Exploring Some Complications
Page Five: Nuts and Bolts 1: Part of Speech Tagging
Page Six: Nuts and Bolts 2: From Tags to Nodes
Page Seven: Chunking and Knowledge Units

A VERY Rough Sketch of the Basic Idea

As a conversation proceeds, or as a story is told, the chatbot should be building up a deeper understanding of the topics under discussion. It is not enough for a chatbot to be able to understand a single sentence. That sentence must be understood within the context of the entire conversation if the responses of the chatbot are to be at all convincing.

With that in mind, the parse tree of a sentence should be replaced by a parse graph of the entire conversation. This network will depict the items discussed so far and their relationships. As an example, consider the following few sentences from a children's reader. As each sentence is parsed, its parse is added to the complete network, with connections made wherever they seem appropriate. To read out the meaning of the graph, begin at "start" and follow the heavy line straight down.

  1. For weeks the children had been looking forward to summer vacation.
  2. They could hardly wait for the long summer days.
  3. Then there would be no more hurrying off after breakfast.
  4. No more homework in the evening!
  5. There would be nothing to do but play and enjoy themselves from morning until night.
  6. At least, that was how it seemed to Bill and Martha Strong and the Baker twins and Sally Green, the little group of friends who lived at the far end of town.
  7. As they had gone to and from school together the last few days, they had talked of nothing but vacation and what good times they were going to have.
  8. It would be just too wonderful.

The boxes containing question marks (???) are slots that the chatbot might want to fill. In other words, they serve as the bot's curiosity bump, motivating it to hold up its end of the conversation by asking pertinent questions.

This is just a rough sketch, and a lot of detail has been omitted for the sake of simplicity. The more complete graph would include labeling of all edges, and something to indicate whether a given edge was explicitly stated, or just deduced from other clues. Unsatisfied edges would also give the bot "curiosity" about how to fill those edges, which could lead to questions appropriate to the conversational context.
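The labeled edges, the stated-versus-deduced flag, and the unfilled "???" slots described above could be represented along the following lines. This is only a minimal sketch: the names `Edge`, `ContextNetwork`, and `open_questions`, and the "age" and "begins" slots, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field

UNKNOWN = "???"  # placeholder for a slot the bot has not filled yet


@dataclass
class Edge:
    label: str              # e.g. "live at", "looking forward to"
    target: str             # name of the target node, or UNKNOWN
    explicit: bool = True   # False when the edge was deduced, not stated


@dataclass
class ContextNetwork:
    edges: dict = field(default_factory=dict)  # node name -> list of Edge

    def add(self, source, label, target=UNKNOWN, explicit=True):
        self.edges.setdefault(source, []).append(Edge(label, target, explicit))

    def open_questions(self):
        """Yield (node, label) pairs for every unfilled slot."""
        for source, outgoing in self.edges.items():
            for edge in outgoing:
                if edge.target == UNKNOWN:
                    yield source, edge.label


net = ContextNetwork()
net.add("children", "looking forward to", "summer vacation")
net.add("children", "live at", "far end of town")
net.add("children", "age")                            # hypothetical open slot
net.add("summer vacation", "begins", explicit=False)  # hypothetical, deduced
curiosity = list(net.open_questions())
```

Each pair in `curiosity` is a candidate for a question the bot could ask, such as "How old are the children?"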

Answering Questions

Since the context network contains everything that the bot has been told, it should be possible for the bot to answer questions about what it has been told. Consider the question:

Where do the kids live?

Beginning at the box containing "children" (we have "kids=children" in the thesaurus), we step from box to box looking for the verb "live" and its "where" argument. Presumably we would also include whatever synonyms for "live" seem likely in some kind of thesaurus. The graph below shows the path taken by the successful search in yellow, and all the dead-end paths in red. Notice that the heavy lines are not traversed, because those only represent the temporal relationship among the sentences.
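The search just described might look something like the breadth-first walk below. It is a rough sketch under several assumptions: the triples are a hand-made fragment of the network, the edge label "next" stands in for the heavy temporal lines, the "same as" edge and the synonym table are invented for the example, and the verb and its "where" argument are collapsed into a single edge label such as "live at".

```python
from collections import deque

# (source, label, target) triples; "next" marks the heavy temporal edges.
TRIPLES = [
    ("start", "next", "sentence 1"),
    ("children", "looking forward to", "summer vacation"),
    ("children", "same as", "group of friends"),
    ("group of friends", "live at", "far end of town"),
    ("summer vacation", "has", "long summer days"),
]
SYNONYMS = {"kids": "children", "reside": "live", "dwell": "live"}


def canonical(word):
    """Map a word to its thesaurus head word, if it has one."""
    return SYNONYMS.get(word, word)


def answer_where(subject, verb):
    start, verb = canonical(subject), canonical(verb)
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        for source, label, target in TRIPLES:
            if source != node or label == "next":
                continue  # not an edge of this node, or a temporal edge
            if canonical(label.split()[0]) == verb:
                return target  # found the verb; its target is the answer
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return None  # every path was a dead end
```

Here `answer_where("kids", "live")` reaches "far end of town" by way of the "group of friends" node, while a verb that never appears in the network leaves only dead ends and yields `None`.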

The network also allows for answering more general, or open-ended, questions like: "Tell me everything you know about the kids." To answer such a question, the bot begins with the "kids" node and spreads out in all directions, paraphrasing the knowledge represented by each node-edge-node triple that it encounters. Thus it might answer: "The kids are Bill, Martha, the Baker twins and Sally. They live at the far end of town. They are looking forward to summer vacation. ..." and so on.
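As a very small illustration of paraphrasing node-edge-node triples, the sketch below turns each triple that starts at the subject into one sentence. The triples and the function name `tell_everything` are assumptions made for the example, and it only looks one step out from the starting node rather than spreading through the whole network.

```python
TRIPLES = [
    ("kids", "are", "Bill, Martha, the Baker twins and Sally"),
    ("kids", "live at", "the far end of town"),
    ("kids", "are looking forward to", "summer vacation"),
    ("summer vacation", "has", "long summer days"),
]


def tell_everything(subject, triples, pronoun="They"):
    """Render every triple leaving `subject` as a plain sentence."""
    sentences = []
    for source, label, target in triples:
        if source != subject:
            continue
        # Name the subject once, then switch to the pronoun.
        name = pronoun if sentences else "The " + subject
        sentences.append(f"{name} {label} {target}.")
    return " ".join(sentences)


report = tell_everything("kids", TRIPLES)
```

A fuller version would keep walking outward from each target node and would need real grammar for verb agreement and articles, which this sketch sidesteps by storing ready-made phrases as edge labels.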

Frames and Scripts

One approach to artificial intelligence uses data structures called frames. A frame is an object with a certain number of "slots" that must be filled. For example, for a frame representing some physical object, the slots might be such attributes as size, weight, color, etc.

When the slots of a frame need to be filled in some specific chronological order then the frame is called a "script". The classic example of a script is a trip to the restaurant. There is an expected series of steps, or slots, beginning with waiting to be seated, placing your order, waiting for your food, eating, paying the check, and so on. The script is created by a human programmer and made available to the AI program.

However, if the above graph-building technique is used, the bot can be walked through an example of the script (i.e., "taken" on a trip to the restaurant) and then told to remember the resulting graph. This graph can then become a model for occasions that seem to follow a similar set of steps. As long as a good set of slots is associated with each raw verb and each raw noun, the slots of the frame or script will fall into place of their own accord as the bot builds the frame or script in the form of the context network.

The essential difference between a script and a frame now becomes a matter of which set of edges is followed. The vertical, or temporal, edges define the script, and the horizontal edges define the frame. Attribute questions like "Where do the kids live?" are answered on the horizontal edges, and temporal questions like "What did the kids do after playing checkers?" would be answered on the vertical edges.
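One way to picture the two kinds of edges is to store them separately, as in the sketch below. The events other than "played checkers" are invented for the example, and the names `EVENTS`, `NEXT`, `who_did`, and `what_happened_after` are hypothetical.

```python
# Horizontal (frame) edges: the attribute slots of each event node.
EVENTS = {
    "e1": {"verb": "played checkers", "by": "kids"},
    "e2": {"verb": "ate lunch", "by": "kids"},       # invented event
    "e3": {"verb": "went swimming", "by": "kids"},   # invented event
}
# Vertical (script) edges: which event follows which.
NEXT = {"e1": "e2", "e2": "e3"}


def who_did(verb):
    """Attribute question: answered from the horizontal edges only."""
    for slots in EVENTS.values():
        if slots["verb"] == verb:
            return slots["by"]
    return None


def what_happened_after(verb):
    """Temporal question: answered by following one vertical edge."""
    for event_id, slots in EVENTS.items():
        if slots["verb"] == verb and event_id in NEXT:
            return EVENTS[NEXT[event_id]]["verb"]
    return None
```

The same nodes serve both kinds of question; only the set of edges that is followed changes.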

If, at some future date, the bot is taken on another virtual trip to a different restaurant, it can follow its existing script, adding to the data nodes any deviations from the first example trip. From this data a way might be found for the bot to automate its own generalization of what a typical trip to the restaurant is like.

Breaking Down the Process

Getting from the sentence to the final network will require a very detailed set of procedures for each step in the process. It will also require a thesaurus to do any necessary preprocessing of the text, and a dictionary in which to look up the words to determine their part of speech and their expected network links. One example of thesaurus preprocessing might be replacing "his" with "he's" so that the parser can properly separate the pronoun from its possessive suffix. After that step, "he's book" looks just like "John's book" and the parser doesn't need to deal with special cases. The thesaurus would also have the job of replacing idiomatic and slang phrases with their basic English equivalents: "chew the fat" -> "talk", "turned him down" -> "refused his request", "give me a hand" -> "help me", etc.
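The thesaurus preprocessing step could be sketched as two passes of whole-word substitution, idioms first and possessive pronouns second, so that a "his" introduced by an idiom replacement is also rewritten. The table names and the two-pass ordering are assumptions made for this example; the sketch also handles lowercase text only.

```python
import re

IDIOMS = {
    "chew the fat": "talk",
    "turned him down": "refused his request",
    "give me a hand": "help me",
}
PRONOUNS = {"his": "he's"}


def _apply(table, text):
    # Try longer phrases first; \b keeps "his" from matching inside "this".
    keys = sorted(table, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(key) for key in keys) + r")\b"
    return re.sub(pattern, lambda match: table[match.group(1)], text)


def preprocess(text):
    """Replace idioms, then rewrite possessive pronouns."""
    return _apply(PRONOUNS, _apply(IDIOMS, text))
```

For example, `preprocess("she turned him down")` first becomes "she refused his request" and then "she refused he's request", which is the form the parser is meant to see.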

Each dictionary entry will contain different information depending on whether the word is a noun, a verb, an adjective, or some other part of speech, and depending also on how many different parts of speech a particular word might be. A verb might contain a list of the types of links that that verb expects. Take the verb "looked", for example. This verb can mean that someone looked at something: "He looked at the sky.", or that something had a particular attribute (adjective): "He looked angry.", or that something looked like some other thing: "He looked like crap.", or that something looked like some embedded sentence: "He looked like he wanted to laugh."

Since the word "looked" in any given sentence can only fill one of those roles, the links are mutually exclusive. The network might be initially built with all four of those kinds of links in the "looked" node, but as soon as one of those links is satisfied, the others are discarded. Other verbs would have other kinds of links depending on how they are normally used. For example, "tempted" would have a link for "to", to reflect usage like "tempted to run".
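The mutually exclusive links might be handled as in the following sketch, where a verb node starts with every candidate link type and drops the alternatives as soon as one is satisfied. The class name `VerbNode` and the link-type strings are hypothetical.

```python
class VerbNode:
    def __init__(self, word, candidate_links):
        self.word = word
        self.candidates = set(candidate_links)  # link types still possible
        self.link = None                        # (type, target) once satisfied

    def satisfy(self, link_type, target):
        if link_type not in self.candidates:
            raise ValueError(f"{self.word!r} cannot take a {link_type!r} link")
        self.link = (link_type, target)
        self.candidates = set()  # the alternatives are mutually exclusive


looked = VerbNode("looked", ["at NP", "ADJ", "like NP", "like S"])
looked.satisfy("ADJ", "silly")   # "He looked so silly ..."
```

After the call, `looked.link` holds the satisfied link and any later attempt to satisfy one of the discarded alternatives raises an error.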

Nouns could have links in the dictionary connecting them to things which were already known about those nouns. For example, the word "softball" might link to the existing general knowledge network as a type of "ball", that is used in a "game" called "softball" played by "teams", and so on. That network is pre-built and can be linked to by the new network that represents the current sentences. In fact, that general knowledge network could also be built conversationally with the bot, by telling the bot that "softball" is a type of "ball", and so on, and then telling the bot to remember what it had been told.

Words in the dictionary could also include synonyms linked to each word so that if the bot were told "My running shorts are neon pink", when asked the color of the shorts, it might reply "hot pink", using "hot" as a synonym for "neon" in this context, and adding a bit of lifelike ability to paraphrase. It might, for example, also be told of a tee shirt, and then refer to it as a T-shirt, more evidence that the bot is not simply parroting the string of characters it was given.

A Worked Example

It's all well and good to dream about getting from an input sentence to a completed network graph, but for that dream to become a reality it has to be demonstrated that there exists some definite set of procedural steps that can transform the sentence into the network. Using the following example taken from an elementary school reader, we will show how the thesaurus, dictionary, and rule set might be used to perform that transformation.

He looked so silly in his neon pink running shorts and shapeless tee shirt printed with the name of his hospital's softball team that Jessica was tempted.

We assume that we already have a dictionary and thesaurus capable of recognizing the words in this sentence:

Thesaurus
---------

his -> he's

Dictionary
----------

's - PO
and - CN
he - NP link[is]->N
hospital - NP
in - PR
Jessica - NP
looked - V (overloaded: [NP looked ADJ] or [NP looked (at NP)] or [NP looked (like NP|S)])
name - NP
neon pink - ADJ [color] = pink [type]->ADJ:neon|hot
of - OF
printed with - PR 
running shorts - NP = shorts [type]->ADJ:running
shapeless - ADJ [attr]
silly - ADJ
so - PR
softball - NP = ball [type]->softball; [attr]->soft
softball team - NP = team [type]->softball
tee shirt - NP = shirt [type]->ADJ:tee|T
that - PR
the - ART
was tempted - V (NP was tempted (to V))

    

Notice that some "words" in the dictionary are actually multiple words. This is done here primarily to simplify the example. Creating rules to construct these phrases is quite straightforward in some cases, and in other cases, the multi-word idiomatic usages would actually appear in the final dictionary. Verb tenses would obviously be determined algorithmically, not stored as every possible tense of every possible verb, as in this abbreviated sample dictionary.
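A dictionary lookup that copes with multi-word entries can be sketched as a greedy longest-match scan, shown below. It assumes the thesaurus step has already turned "his" into "he's", keeps only the part-of-speech tag from each sample dictionary entry, and marks unknown words with "???"; the function name `tag` is hypothetical.

```python
DICTIONARY = {
    "'s": "PO", "and": "CN", "he": "NP", "hospital": "NP", "in": "PR",
    "jessica": "NP", "looked": "V", "name": "NP", "neon pink": "ADJ",
    "of": "OF", "printed with": "PR", "running shorts": "NP",
    "shapeless": "ADJ", "silly": "ADJ", "so": "PR", "softball": "NP",
    "softball team": "NP", "tee shirt": "NP", "that": "PR", "the": "ART",
    "was tempted": "V",
}
MAX_WORDS = max(len(entry.split()) for entry in DICTIONARY)


def tag(sentence):
    # Split the possessive suffix off as its own token: "he's" -> "he 's".
    words = sentence.lower().rstrip(".").replace("'s", " 's").split()
    tagged, i = [], 0
    while i < len(words):
        # Try the longest phrase starting at word i, then shorter ones.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in DICTIONARY:
                tagged.append((phrase, DICTIONARY[phrase]))
                i += n
                break
        else:
            tagged.append((words[i], "???"))  # word not in the dictionary
            i += 1
    return tagged


SENTENCE = ("He looked so silly in he's neon pink running shorts and "
            "shapeless tee shirt printed with the name of he's hospital's "
            "softball team that Jessica was tempted.")
tags = tag(SENTENCE)
```

Because longer phrases are tried first, "softball team" is tagged as a single entry rather than as "softball" followed by an unknown word "team".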

Rather than include links into an existing knowledge network (which doesn't exist yet) the world-knowledge of the bot is just hard-coded into the sample dictionary. This is for illustration purposes only. In reality, the links would be to a database external to the dictionary itself.

The goal is to arrive at this network graph for the above sentence: (NOTE: This graph was produced using the open source graph drawing software Graphviz.)

Or expressed in text form as nodes and links:

NP(1): he
	link: [is] -> ???
	from: [belong to] <- NP(2)
	from: [belong to] <- NP(3)
	from: [belong to] <- NP(5)
	from: [by] <- V(1)
NP(2): shorts
	link: [type] -> ADJ(1)
	link: [color] -> ADJ(2)
	link: [belong to] -> NP(1)
	from: [in] <- ADJ(7)
NP(3): shirt
	link: [type] -> NP(10)
	link: [attr] -> ADJ(5)
	link: [printed with] -> NP(4)
	from: [in] <- ADJ(7)
	link: [belong to] -> NP(1)
NP(4): name
	from: [printed with] <- NP(3)
	link: [of] -> NP(6)
NP(5): hospital
	link: [belong to] -> NP(1)
	from: [belong to] <- NP(6)
NP(6): team
	link: [type] -> NP(8)
	link: [belong to] -> NP(5)
	from: [of] <- NP(4)
NP(8): softball
	from: [type] <- NP(6)
NP(9): Jessica
	from: [is] <- V(2)
NP(10): tee shirt| T-shirt
	from: [type] <- NP(3)
ADJ(1): running
	from: [type] <- NP(2)
ADJ(2): pink [color]
	link: [type] -> ADJ(3)
	from: [color] <- NP(2)
ADJ(3): neon | hot
	from: [type] <- ADJ(2)
ADJ(5): shapeless
	from: [attr] <- NP(3)
ADJ(7): silly
	link: [in] -> NP(2)
	link: [in] -> NP(3)
	from: [attr] <- V(1)
V(1): looked
	link: [by] -> NP(1)
	link: [attr ADJ|at NP] -> ADJ(7)
	link: [so that] -> V(2)
	from: [by] <- V(2)
V(2): was tempted
	link: [by] -> V(1)
	link: [is] -> NP(9)
	link: [to] -> ???
	from: [so that] <- V(1)
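Since every "from:" line mirrors a "link:" line on another node, one option is to store only the forward links and derive the rest. The sketch below does that for a simplified subset of the listing (the "attr ADJ|at NP" label is shortened to "attr") and also writes the links out as Graphviz DOT text. The names `LABELS`, `LINKS`, `from_lines`, and `to_dot` are hypothetical, and this is not necessarily how the original graph was generated.

```python
LABELS = {"NP(1)": "he", "NP(9)": "Jessica", "ADJ(7)": "silly",
          "V(1)": "looked", "V(2)": "was tempted"}
LINKS = [  # (source, label, target), a simplified subset of the "link:" lines
    ("V(1)", "by", "NP(1)"),
    ("V(1)", "attr", "ADJ(7)"),
    ("V(1)", "so that", "V(2)"),
    ("V(2)", "by", "V(1)"),
    ("V(2)", "is", "NP(9)"),
]


def from_lines(node):
    """Derive the "from:" lines of a node from the forward links."""
    return [f"from: [{label}] <- {source}"
            for source, label, target in LINKS if target == node]


def to_dot():
    """Render the nodes and links as DOT source for Graphviz."""
    lines = ["digraph sentence {"]
    for node, word in LABELS.items():
        lines.append(f'    "{node}" [label="{node}: {word}"];')
    for source, label, target in LINKS:
        lines.append(f'    "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


dot_source = to_dot()
```

Deriving the back-references this way keeps the two directions from drifting out of step when the listing is edited by hand.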
    


