Building a Context Network from a Narrative
An Approach to Chatbot Question Answering
Page Six: Nuts and Bolts 2: From Tags to Nodes
by Gary J. Shannon
Started Nov. 20, 2008
Last Updated Nov. 22, 2008
Page One: Overview
Page Two: Details of Parsing
Page Three: World Models and Reasoning by Analogy
Page Four: Exploring Some Complications
Page Five: Nuts and Bolts 1: Part of Speech Tagging
Page Six: Nuts and Bolts 2: From Tags to Nodes
Page Seven: Chunking and Knowledge Units
Baby Steps
Before we can translate a tagged sentence into a graph with nodes and edges, we need to decide which words in the sentence will become nodes and which will become edges. Take a look at this sample sentence, and the way the tagger tags each word:
Yesterday John said he thought the baby would be walking by Friday, but she's already walking this morning.
STRT RB=Yesterday NN=John VBD=said PRP=he VBD=thought DT=the NN=baby XW=would X2=be VBG=walking IN=by NN=Friday PUN=, CC=but PRP=she X2='s RB=already VBG=walking DT=this NN=morning FST=.
Peeking ahead to what we want the completed graph to look like, we can compare each node and edge to each word and start to classify word types according to what role they will play in the final graph.

The first thing we notice is that nouns, verbs, adjectives and adverbs each get their own individual node in the graph. The links consist of everything that's left: prepositions, conjunctions, and implied verb argument relationships. Drawing the graph becomes a simple(?) matter of identifying the content words and identifying the links between them. Also notice the node "? will walk". The question mark indicates that this node does not represent a verified factual statement, but a statement that is possible, provisional, or true only in some context, as explained earlier under counter-factual statements.
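The split between node words and link words can be sketched in a few lines of Python. This is only an illustration, not the tagger or graph builder itself: the function names are made up for the example, the JJ adjective tag is assumed (it does not occur in the sample sentence), and treating only IN and CC as link words is a simplification. Pronouns, determiners and auxiliaries are set aside here rather than resolved.

```python
# Tagged output copied from the sample sentence above.
TAGGED = (
    "STRT RB=Yesterday NN=John VBD=said PRP=he VBD=thought DT=the "
    "NN=baby XW=would X2=be VBG=walking IN=by NN=Friday PUN=, CC=but "
    "PRP=she X2='s RB=already VBG=walking DT=this NN=morning FST=."
)

# Assumption: tags starting with these prefixes mark content words (nodes).
# JJ (adjective) is assumed; it does not appear in the sample sentence.
NODE_PREFIXES = ("NN", "VB", "JJ", "RB")
# Assumption: prepositions and conjunctions supply link labels.
LINK_TAGS = ("IN", "CC")


def parse_tagged(text):
    """Split 'TAG=word' tokens into (tag, word) pairs.

    Bare markers such as STRT carry no word and get None.
    """
    pairs = []
    for token in text.split():
        tag, sep, word = token.partition("=")
        pairs.append((tag, word if sep else None))
    return pairs


def classify(pairs):
    """Sort words into node candidates, link candidates, and everything else."""
    nodes, links, other = [], [], []
    for tag, word in pairs:
        if word is None:
            continue  # sentence markers carry no word
        if tag.startswith(NODE_PREFIXES):
            nodes.append(word)
        elif tag in LINK_TAGS:
            links.append(word)
        else:
            other.append(word)  # pronouns, determiners, auxiliaries, punctuation
    return nodes, links, other


nodes, links, other = classify(parse_tagged(TAGGED))
print("nodes:", nodes)
print("links:", links)
print("other:", other)
```

The "other" bucket is where the remaining work hides: pronouns still have to be resolved to the nouns they stand for, and auxiliaries such as "would be" still have to be folded into their verbs.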
Taking Apart a Sentence
Ideally, each sentence would have a single verb, a single subject, and anywhere from zero to two objects, making it easy to graph. There would be a single oval node for the verb surrounded by anywhere from one to three additional noun or adjective nodes. For example: "John gave the book to Mary."

In real life, however, the typical sentence will be somewhat more complex than that. Nouns may have adjectives attached, verbs might be modified by adverbs, additional objects might be present, connected to the verb by prepositions, and there might be multiple verbs in the sentence, connected together by conjunctions indicating their causal or temporal relationships.
Before it can be easily graphed, however, the sentence has to be taken apart and recast as however many separate simple sentences are needed to express the concept. A simple sentence is any sentence with a single verb along with its subject and objects. Here, for example, is how a complex sentence might be broken into two simple sentences:
The boy who has my book won't give it back.
[The boy] [who has my book] [won't give it back].
[The boy] [who has my book] [ ]
[The boy] [ ] [won't give it back].
The boy has my book. The boy won't give it back.
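The recombination step in that breakdown can be sketched as follows. This is a toy illustration only: it assumes the sentence has already been split into its three bracketed chunks, and it handles only the case where the relative pronoun stands in for the subject. The function name and the small pronoun set are made up for the example; the actual chunking algorithm is the subject of later pages.

```python
# Assumption: a small illustrative set of relative pronouns.
RELATIVE_PRONOUNS = {"who", "that", "which"}


def split_relative_clause(subject, relative_clause, main_predicate):
    """Pair the subject with each verb phrase to form two simple sentences.

    Only covers clauses where the relative pronoun is the subject of the
    clause, as in "who has my book".
    """
    words = relative_clause.split()
    if words and words[0].lower() in RELATIVE_PRONOUNS:
        words = words[1:]  # the subject replaces the pronoun
    first = f"{subject} {' '.join(words)}."
    second = f"{subject} {main_predicate}."
    return [first, second]


for sentence in split_relative_clause(
    "The boy", "who has my book", "won't give it back"
):
    print(sentence)
```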
Once the sentence has been broken down, the two verbs can be linked together by an arc showing that they are two parts of a single thought. Then each verb can be linked to its own subject and object nodes by the appropriate kind of links. Nouns, verbs, adjectives and adverbs all get their own separate nodes, and the links are either labeled by the relationship implied by the sentence structure, or by the preposition that ties the node to another node.
One problem that comes up in the above sentence is the adverb "back". Somewhere in the process of parsing the sentence, it has to be recognized that "give ... back" is an idiomatic way of saying "give ... TO proper owner or possessor", so that the correct "give ... TO" relationship link can be drawn. Exactly how that happens has yet to be explored in detail. Clearly there will be some kind of "idiom processor" stage that irons out those details with some sort of dictionary of idioms.
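Since the idiom processor has not been designed yet, the following is only a guess at what a dictionary lookup of that kind might look like. The table layout, the "owner" role, and the function name are all assumptions made for the sake of the example.

```python
# Hypothetical idiom table: (verb, particle) -> (verb, link label, target role).
IDIOMS = {
    ("give", "back"): ("give", "to", "owner"),
}


def resolve_idiom(verb, particle, obj, owners):
    """Turn a verb + particle idiom into a (verb, link label, target) triple.

    `owners` maps an object to its known owner, e.g. taken from "my book".
    Returns None when the verb/particle pair is not a known idiom.
    """
    entry = IDIOMS.get((verb, particle))
    if entry is None:
        return None
    base_verb, label, role = entry
    target = owners.get(obj) if role == "owner" else None
    return (base_verb, label, target)


# "my book" tells us the book belongs to the speaker.
print(resolve_idiom("give", "back", "book", {"book": "me"}))
```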
For now, let's just jump ahead and look at what nodes and links need to be created from the two sentence pieces above:
The boy has my book.
NODES:
NN1: the boy
NN2: book
NN3: me
VB1: have
LINKS:
VB1->NN1 "who" - who has something?
VB1->NN2 "what" - what does he have?
NN2->NN3 "belong to" - to whom does it belong?
The boy will not give my book to me.
NODES:
VB2: will not give
LINKS:
VB1->VB2 "next" - what else do we have to say?
VB2->NN1 "who" - who will not give?
VB2->NN2 "what" - what will he not give?
VB2->NN3 "to" - to whom will he not give it?
Or in graphical form:

Notice that the three words "will not give" are all counted as a single verb. This is done to simplify the graph, by condensing all the negatives, and various tense, mood and aspect auxiliaries into a single verbal concept. Internally, there might well be a separate negative node, a separate tense node, and so on, hanging off the verb node. That remains to be decided.
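The nodes and links listed above can be held in a very small data structure. The sketch below is one possible layout, not a settled design: the class name, its methods, and the choice to store "will not give" as a single label on the verb node are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContextGraph:
    """Toy container for labeled nodes and labeled, directed links."""
    nodes: dict = field(default_factory=dict)  # node id -> label
    links: list = field(default_factory=list)  # (source id, target id, label)

    def add_node(self, node_id, label):
        self.nodes[node_id] = label

    def add_link(self, source, target, label):
        # Refuse links that point at nodes we have not created yet.
        for node_id in (source, target):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.links.append((source, target, label))

    def links_from(self, source):
        """Return (link label, target label) pairs leaving one node."""
        return [
            (label, self.nodes[target])
            for src, target, label in self.links
            if src == source
        ]


graph = ContextGraph()
for node_id, label in [
    ("NN1", "the boy"), ("NN2", "book"), ("NN3", "me"),
    ("VB1", "have"), ("VB2", "will not give"),
]:
    graph.add_node(node_id, label)

for source, target, label in [
    ("VB1", "NN1", "who"), ("VB1", "NN2", "what"), ("NN2", "NN3", "belong to"),
    ("VB1", "VB2", "next"),
    ("VB2", "NN1", "who"), ("VB2", "NN2", "what"), ("VB2", "NN3", "to"),
]:
    graph.add_link(source, target, label)

print(graph.links_from("VB2"))
```

If the negative and tense markers later get nodes of their own, they could be added as extra nodes linked from VB2 without changing this layout.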
Chunking and Knowledge Units
The individual pieces into which the sentence is broken are called knowledge units, and the process of breaking down the sentence is called chunking, after the somewhat similar procedures in traditional natural language processing called noun phrase chunking and verb phrase chunking. The next page will explore the details of the construction of a knowledge unit, and the following page will begin exploring the algorithm by which sentences are chunked.