Building a Context Network from a Narrative
An Approach to Chatbot Question Answering
Page Four: Exploring Some Complications
by Gary J. Shannon
Started Nov. 9, 2008
Last Updated Nov. 9, 2008
Page One: Overview
Page Two: Details of Parsing
Page Three: World Models and Reasoning by Analogy
Page Four: Exploring Some Complications
Page Five: Nuts and Bolts 1: Part of Speech Tagging
Page Six: Nuts and Bolts 2: From Tags to Nodes
Page Seven: Chunking and Knowledge Units
Some Complications
This page is an aside, a preliminary exploration of some of the many complications that arise in translating a sentence into a knowledge network. It touches on various topics in natural language processing that will need to be addressed more completely before attempting to build a chatbot based on these principles.
The complications will not be discussed in any particular order, and no final solution to any of them is offered here. This page should be considered as preliminary in the extreme, and should be treated as my "stream of consciousness" journal as I mull over these problems.
Hypothetical, Provisional, Counter-Factual and Context Dependent "Truth"
Not everything the bot is told can be taken at face value and assumed to be true. Some things are true only in certain limited contexts; some things are believed to be true by someone, and may or may not be true in the larger context; and some things can be considered true provisionally or hypothetically in order to advance some line of thought.
The statement "Unicorns have one horn." is only true within the context of a fantasy or fairy tale, yet the chatbot should understand statements like "Unicorns do not exist and they have one horn." Likewise the bot should be able to answer questions like "Does Bilbo live in Middle Earth?", yet recognize the incongruity of questions like "How far is Middle Earth from Paris?" Facts about Middle Earth need to be compartmentalized from facts about the real world. Yet certain real-world facts still apply in Middle Earth. Hobbits have mothers and fathers (within the fantasy context), and they are born and die like other mortals. Some real-world generalizations clearly can be imported into the fantasy realm without difficulty, while others cannot.
Compartmentalization of context-specific facts is also necessary when discussing historical settings. Plato existed in the real world, not in a fantasy world, yet Plato must be compartmentalized from the real world of the present so that the incongruity of statements like "Plato took the train to Boston." can be recognized. (This assumes we are not discussing a modern character who happens to have the name "Plato", a fact which the bot should also be able to recognize.)
Another class of assertions is based on what one or more people believe to be true, whether it is true or not. For example, "John thinks Mary is in New York." Now what if the bot is asked the question: "Is Mary in New York?" The only proper answer is "I don't know." Or the bot might also reply with something like "I don't know, but John thinks she is."
The third class involves things which are assumed, provisionally, to be true for the sake of argument. "If Mary has my book, then she should give it back." The bot should generate the correct response to "Does Mary have my book?" and not confuse a hypothetical assertion with a factual assertion.
As a random starting point, let's take a closer look at case two, assertions about what someone believes to be true. The most obvious approach, given the methods outlined in the previous pages, would be something like this:

The obvious problem with such an analysis is that the "reasoning engine" of the bot knows only how to follow arcs to find connections between the arguments in the question ("Mary" and "New York") in order to determine whether the relationship in the question exists. Clearly the relationship does exist in the graph, and so the question would be incorrectly answered in the affirmative.
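The failure is easy to reproduce. The sketch below assumes the graph is stored as (subject, relation, object) triples and uses a hypothetical `connected` function to stand in for the reasoning engine; neither the triple layout nor the function name comes from the earlier pages, which describe the network only in diagrams.

```python
# Naive representation: the embedded clause is stored as an ordinary arc.
# The triple layout and the name `connected` are assumptions made for
# illustration only.
naive_graph = {
    ("John", "thinks", "Mary"),
    ("Mary", "is in", "New York"),
}

def connected(graph, subject, relation, obj):
    """Return True if an arc with this relation links subject to obj."""
    return (subject, relation, obj) in graph

# "Is Mary in New York?" is wrongly answered in the affirmative, because
# the arc exists even though it only records John's belief.
print(connected(naive_graph, "Mary", "is in", "New York"))  # True
```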
A different approach would be for the parser to recognize the dependent nature of the embedded assertion "Mary is in New York", and compound the two verbs "thinks" and "is" into a single verb node: "believed to be":

Now the search of nodes comes up with the fact that Mary "is believed to be" in New York, but not the false conclusion that she is in New York. This requires no change to the reasoning engine, only a change in how the knowledge is represented in the graph.
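A minimal sketch of the compound-verb idea, again assuming a triple store and a hypothetical `connected` lookup (both are illustrative and not taken from the earlier pages). The lookup is deliberately the simplest possible arc test; only the stored relation differs from the naive version.

```python
# The parser has fused "thinks" + "is" into one compound relation.
# The relation label "is believed to be in" is an assumed spelling.
compound_graph = {
    ("Mary", "is believed to be in", "New York"),
}

def connected(graph, subject, relation, obj):
    """Return True if an arc with this relation links subject to obj."""
    return (subject, relation, obj) in graph

# The plain "is in" question no longer matches ...
print(connected(compound_graph, "Mary", "is in", "New York"))                 # False
# ... but the belief itself can still be found.
print(connected(compound_graph, "Mary", "is believed to be in", "New York"))  # True
```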
The advantage of this method can be seen in the slightly more complex sentence: "John thinks Mary is in New York, but I know she's in Denver." The original method would conclude that Mary is both in New York, and simultaneously in Denver. The alternate approach gives us this unambiguous result:

Here the positive "I know" is taken at face value to indicate that the following assertion is known to be true. Issues revolving around the trustworthiness of the informant will not be explored at this time.
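The two-location problem can be sketched the same way. The triple layout, the relation labels, and the helper name `locations` are all assumptions for illustration; the point is only that the two representations give different answers to "Where is Mary?".

```python
def locations(graph, person, relation="is in"):
    """Collect every object linked to person by the given relation."""
    return {obj for (subj, rel, obj) in graph if subj == person and rel == relation}

# Original method: both clauses become plain "is in" arcs.
naive = {
    ("Mary", "is in", "New York"),
    ("Mary", "is in", "Denver"),
}

# Alternate method: the clause under "thinks" gets the compound relation,
# while the clause under "I know" is taken at face value.
compound = {
    ("Mary", "is believed to be in", "New York"),
    ("Mary", "is in", "Denver"),
}

print(sorted(locations(naive, "Mary")))     # ['Denver', 'New York']
print(sorted(locations(compound, "Mary")))  # ['Denver']
```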
It may also be useful to compound verbs of "knowing" (or "seeing", or "hearing", etc.) as well. I'm not sure what this would accomplish, but it's something to think about, as shown in the graph below:

A final approach (although there may be others I haven't thought of yet) would be a variation on the first one. The copula node "is" is kept separate, but is replaced with a "qualified is", which is only regarded as true within the context specified by the "context" link. That would look like this:

Now the erroneous conclusion that Mary is in New York will not be reached simply because "?is" does not match "is". The reasoning engine would have to be modified to include knowledge of the special properties of qualified verbs, and that their truth is dependent on what is found at the other end of the "context" link.
This method has the additional advantage that it could handle fictional "facts" as well. The "qualified is" in "Bilbo is a hobbit." could have a context link that points to "[Middle Earth] -of-> [fictional works] -of-> [J.R.R.Tolkien]" or some similar contextual structure that defined the domain in which "is" is taken to be true.
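One way the "qualified is" might be modeled is sketched below. The `Arc` class, the `holds` function, and the use of a plain string for the context are assumptions for illustration; in particular, the chain of "-of->" links described above is flattened here into a single label. The matching rule used (an arc counts only when the question names exactly the same context) is the simplest choice available and does not address which real-world facts should carry over into a fictional context.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Arc:
    subject: str
    verb: str
    obj: str
    context: Optional[str] = None  # None = plain "is"; otherwise a "?is"

arcs = [
    Arc("Mary", "is in", "New York", context="John's beliefs"),
    Arc("Mary", "is in", "Denver"),
    Arc("Bilbo", "is a", "hobbit", context="Middle Earth"),
]

def holds(arcs, subject, verb, obj, context=None):
    """True only if a matching arc exists in exactly the context asked about."""
    return any(
        (arc.subject, arc.verb, arc.obj) == (subject, verb, obj)
        and arc.context == context
        for arc in arcs
    )

print(holds(arcs, "Mary", "is in", "New York"))                            # False
print(holds(arcs, "Mary", "is in", "New York", context="John's beliefs"))  # True
print(holds(arcs, "Bilbo", "is a", "hobbit", context="Middle Earth"))      # True
```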
The Donkey Sentence
There is a class of sentences called "donkey sentences" which seem to be problematic when attempts are made to translate them into formal logic. The classic example is "Every farmer who owns a donkey beats it." See the Wikipedia article on donkey sentences for more information. Using my approach to graphing the meaning of a sentence, such sentences are no problem at all. The sentence is simply graphed in the normal way and included as part of the prototype for "farmer" (and "donkey"), so that whenever an instance of "farmer" (or of "donkey") is invoked, the graph of the sentence is inherited from the prototype.
What is unusual about the graph is that it contains two verbs which are marked with the question mark to indicate that they are of unknown truth. In this case, however, the context in which one verb can be taken as true is the truth of the other verb. Thus, if we learn that the farmer does own a donkey, then the truth of that assertion provides the context in which the assertion that the farmer beats the donkey is also true. In other words, if the question mark is ever removed from "? belong to" then any existing "then" links are followed, and their question marks are also removed. Thus what was a hypothetical statement in the "farmer" prototype becomes a true statement in this particular instance of "farmer".
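The propagation rule just described can be sketched as follows. The `VerbNode` class, its field names, and the `confirm` function are hypothetical, and copying the prototype with `copy.deepcopy` is only one assumed way of modeling inheritance from the prototype.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class VerbNode:
    label: str
    hypothetical: bool = True                 # the "?" mark on the verb
    then: list = field(default_factory=list)  # outgoing "then" links

def confirm(node):
    """Remove the "?" from node and from every verb reachable via "then" links."""
    if not node.hypothetical:
        return  # already confirmed; also guards against cycles
    node.hypothetical = False
    for consequent in node.then:
        confirm(consequent)

# Prototype for "farmer": both verbs are of unknown truth.
prototype_beats = VerbNode("beats")
prototype_belong_to = VerbNode("belong to", then=[prototype_beats])

# A particular farmer inherits a copy of the prototype graph.
instance_belong_to = copy.deepcopy(prototype_belong_to)

# We learn that a donkey does belong to this farmer.
confirm(instance_belong_to)

print(instance_belong_to.then[0].hypothetical)  # False: this farmer beats it
print(prototype_beats.hypothetical)             # True: prototype unchanged
```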
Here's what that graph looks like:

Pronoun Resolution
Another problem is the binding of a pronoun to the noun it refers to, whether within a single sentence, or within a longer discourse. Some bindings will be syntactically unambiguous as in: "John saw the ball and picked it up." Others are inherently ambiguous from a syntactic perspective, yet semantics guides the selection of binding, as in: "If your kitten doesn't like cold milk, put it in the microwave." As far as pure syntax is concerned, this could, with equal validity, be a suggestion to put the milk in the microwave, or to put the kitten in the microwave. Semantics, however, tells us that the meaning of "put the kitten in the microwave" is not socially acceptable, and that interpretation could be disregarded. (Note, however, that sometimes misleading the listener as to pronoun binding could be intentional for the purpose of creating humor.)
Finally, there are sentences which are intrinsically ambiguous, and the listener has no choice but to ask for clarification: "John and Fred were standing in the park when he hit him." The bot must, if it is smart enough to recognize the problem, ask "Who hit whom?"
For the present project we will simply use the algorithm detailed in this paper: [Hobbs, Jerry R., 1976. "Pronoun Resolution". Research Report 76-1, Department of Computer Sciences, City College, City University of New York. August 1976].
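The sketch below is not the Hobbs algorithm cited above, which works by walking the syntactic parse tree. It only illustrates the semantic cases described earlier in this section: candidates are filtered by a plausibility test, and the bot asks for clarification when the filter does not leave exactly one. The function name `bind_pronoun`, the plausibility tests, and the `microwavable` set are all hypothetical.

```python
def bind_pronoun(candidates, is_plausible):
    """Return the single plausible antecedent, or None if the bot must ask."""
    viable = [noun for noun in candidates if is_plausible(noun)]
    if len(viable) == 1:
        return viable[0]
    return None  # zero or several readings survive: ask for clarification

# "If your kitten doesn't like cold milk, put it in the microwave."
# Assumed world knowledge: only some things may go in a microwave.
microwavable = {"milk"}
print(bind_pronoun(["kitten", "milk"], lambda noun: noun in microwavable))  # milk

# "John and Fred were standing in the park when he hit him."
# Both candidates are equally plausible, so no binding is chosen and
# the bot should ask "Who hit whom?"
print(bind_pronoun(["John", "Fred"], lambda noun: True))  # None
```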