A proposed solution to the problem of levels in error-message generation.
THE PROBLEM OF LEVELS
Most computer users are not familiar with how their systems work beyond a certain level of operations. Consequently, there is often a problem with error-message generation in interactive software systems. Such problems often occur at a program execution level unknown to the user. Consider the case of an airline reservation clerk who uses a software package specifically designed for that job. To find out who is booked on flight 173 from Heathrow on October 1, 1986, the clerk might type a command such as
Listbookings (Heathrow, 173, 10/1/86).
In response, an error message is issued by the system:
Buffer overflow, cannot complete the creation of file 173100186.TEMP. Reset buffer to 256 records.
This message will not make much sense to the user, whose intention is to view a listing of passengers booked on a certain flight.
What buffer is being talked about? Why did the system have to create a file called 173100186.TEMP? The "friendly" system even told the user what to do to fix the error: Reset buffer to 256 records. But how does one "reset" the buffer? To answer these questions, the user needs to know many technical details about the implementation of the system: The system searched for the records with matching keys for the specified airport, flight number, and date. Those records were stored in a temporary file with a name reflecting the flight number and date; hence the file name 173100186.TEMP. This temporary file was to be stored in a special buffer where the screen display program could find it. But the disk was full, and the display file could not be stored. Resetting the buffer would mean defining a new upper bound on the number of records in that file.
The situation described here is imaginary. Nevertheless it represents the general flavor of messages generated by many actual systems. For example, the message below is generated by 4.2BSD UNIX while formatting files on the terminal (see  for a more detailed discussion of this message):
Longjmp botch: core dumped.
IBM PC-DOS2.10 may signal the following message when it is turned on:
Bad or missing <filename>;
or the following message in response to a print command:
File creation error.
Many computer users have probably experienced these kinds of error messages. The problem with these messages is that, among many other things, they are signaled by a lower level operation that has been invoked by the user command without the user being aware of it. Figure 1 illustrates this concept schematically. A user types the command P_i, which invokes lower level processes such as P_j. P_j may in turn invoke other lower level processes, and so on. An error is detected at the lowest level by P_l, and P_l signals an error message.
For the user, an error message generated by lower level operations cannot be expected to make sense. As far as the user is concerned, the command he or she has issued is an indivisible operation. The user is not always aware of the existence or purpose of lower level processes. Furthermore, although the combined effects of lower level operations may satisfy the user's command, there is a distinction between the user's intention and the intention of individual operations at the lower levels. For example, the user's intention may be to print a file, whereas a lower level operation invoked by the print command attempts to create a temporary print file. Despite this distinction in intentions, the error message generated by the create command is designed with the assumption that the user intended to create a file.
Is it possible for lower level operations to generate messages that make sense at the user's level of intention? The answer is almost always no unless the lower level operation "knows" the user's intention or the message says
An internal failure has occurred (IBM PC-DOS2.10).
The difficulties in error-message generation in such cases arise because a lower level operation can be invoked by many different user commands that serve radically different user intentions. In this sense, a more realistic representation of a system may be the one shown in Figure 2 rather than in Figure 1. Figure 2 shows two levels of intentions, but there may be many such levels. An operation at level i may invoke many processes at level i - 1. Conversely, a process at level i - 1 may be invoked by many different operations at level i. When designing an error message for a lower level operation, it is difficult to find the magic phrase that is informative and makes sense in the context of every possible user intention.
"LET'S TALK ABOUT IT" STRATEGY
AS THE ANSWER
The problem of levels in response message generation has also been recognized by other researchers [2, 5], who have proposed a "let's talk about it" (LTAI) strategy as a possible solution, although they did not discuss how to implement it.
In this strategy the system first generates a message at the highest level of intention and indicates there has been a difficulty. Lewis and Norman comment:
Let the user be given tools to explore the problem to whatever depth is desired. Let the user be able to trace down the levels, to see where the original mismatch occurred, how that level was reached, and the state of system at each level. The programmer can explore in detail: The experienced user can explore until the basic problem is clear, and the uninterested can remain uninterested. . . . This kind of constructive interaction with the system has the possibility of solving the levels problem along with the error message problem. 
The purpose of this article is to describe how an LTAI strategy can be implemented. To test the results of this research, a prototype system was designed to simulate a desk calculator. The following scenario is typical of the kind of dialogue that the system is capable of carrying out. If the user typed the + command, forgetting that there is only one value in the memory, the system would initially generate the following message:
I cannot add two values, because there is only one value.
The user could initiate a dialogue to trace down the levels of program execution by typing why?:
System: To add two values, I needed to get the two values, but there is only one value.
System: To get the two values, I needed to get the first value, which I did.
System: Then I needed to get the second value, but there is no more value.
The user has the option to quit this dialogue at any stage by typing OK. Assuming it is required to trace the levels down as far as possible, the message That is all indicates that the error level has been reached. This will be illustrated later in a longer version of the preceding dialogue.
One might convincingly argue that in the case of the + operation there is no need to generate error messages: A default value of zero may be used for the second operand. This is indeed the case for most desk calculators. The purpose of an implemented system, however, is not to fix errors, but to report them. It is designed to test whether or not the LTAI strategy works in the way the theoretical research indicated it would.
In the proposed model, error messages are constructed automatically from the description of the application package being used. Hence, the problem of designing and maintaining a large number of messages is eliminated. The response generator needs to be designed only once, and can be used for any application package with a formal system description. Provision of a system description is required while the system is being built. Finally, use of the proposed model provides uniformity in the style of response messages among the application packages that use it.
THE MESSAGE GENERATION MODEL
As noted earlier, an error message signaled at any level is designed with the assumption that the receiver is the caller of that operation. There is, of course, nothing wrong with this assumption if the command is activated by the user. The trouble occurs when the user is not the caller of that operation.
In the proposed model, lower level operations do not signal any messages to the user. The user only receives messages from an operation P_i that he or she has invoked. In the case of a dialogue in the LTAI strategy, messages are generated by the highest level operation being traced by the user.
Conceptually, this mechanism works as follows: Referring back to the invocation tree in Figure 1, when a lower level operation P_l detects an error, E_l, for instance, it signals this error to its caller, P_k. P_k receives this error signal, interprets its significance for its own purposes, and if necessary, signals another error to its caller P_j, and so forth. Eventually, the error signal propagates up to the user command P_i, which generates a message for the user. Now this message will make sense to the user, because it is generated by the user command P_i.
In the case of the LTAI strategy, an error message will be generated by a lower level operation after the user has traced down to this level and the purpose of that operation has been explained. By repeating the process of telling the user the intention of the lower level processes and the cause of error at each level, it is expected that the error messages will make sense to the user all the way down to the lowest level.
This outlines the basic idea behind the message generation model. Figure 3 shows the fundamental components of the model. In this figure, the desk calculator description consists of three modules: The first module contains a set of invocation trees, one tree for each user command. Figure 4 shows the invocation tree for a + command. The second module is a table of pieces of text that describes what each operation is supposed to do. The entries in this table are indexed by the index numbers attached to the command names in the invocation trees. For example, the entries for the index numbers in the invocation tree of Figure 4 are shown in Figure 5. The third module is another table that contains text describing error conditions, one entry for each error. Figure 6 shows the entries for all possible error conditions in the invocation tree of Figure 4. All of these data structures are constructed from a formal system description supplied by the programmer. A programmer description method suitable for this purpose is proposed in .
The context stack is used to store currently active operation names. At the bottom there is always a command invoked by the user. Newer operations are pushed as they are invoked by higher levels. An operation is popped from the stack after its successful completion. Thus, the context stack provides a means of monitoring the system's state, and is used in message generation in order to determine where in the exact path of the invocation tree the error has occurred.
The message generator performs the task of error propagation and message generation. Details of these processes are described later. In order to keep these mechanisms independent of the application program, these processes are implemented separately as an independent unit. The message generator performs the same routine no matter what application package is used. Therefore, it is also possible to time-share it among many application packages as long as a system description is provided for each package.
To illustrate how these mechanisms work in the LTAI strategy, let us assume that the user has typed the + command forgetting that there is only one value in the calculator's memory. The first thing + does is push its name onto the context stack. Then, + invokes its lower level operations (immediate children in the invocation tree) in the order prescribed by its algorithm (shown left-to-right). This is repeated for each operation called, all the way down to the lowest levels. When an operation completes its task successfully, it pops its name from the context stack before terminating. An operation is considered successful when all of its children in the invocation tree have terminated successfully.
When an error is detected in the lowest level of program execution, the current state of the context stack is preserved. The set of nodes in the context stack constitutes a path in the invocation tree, from the root down to the terminal node where the error has occurred. This path is called the error path. In the case of the + operation, an error is detected inside getbelow by the operation testifempty. Getbelow is called by getoperands, which is called by +. The context stack in Figure 7a shows the nodes in the error path when this error occurs.
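The stack discipline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Pascal prototype: the helper `run`, the `op_` function names, and the operation name "getabove" (for fetching the first value) are hypothetical, and "getabove" is assumed not to perform its own emptiness check.

```python
# Hypothetical sketch of the context stack; not the author's implementation.

class CalculatorError(Exception):
    """Raised by an operation that cannot complete its task."""

context_stack = []   # the bottom entry is always the user command
work_space = [5]     # only one value is in the calculator's memory

def run(name, body):
    """Push the operation name, run the operation, pop only on success."""
    context_stack.append(name)
    result = body()
    context_stack.pop()  # never reached if body() raised: the stack is preserved
    return result

def op_testifempty():
    if not work_space:
        raise CalculatorError("empty")

def op_getabove():       # assumed name for "get the first value"
    return work_space.pop()

def op_getbelow():
    run("testifempty", op_testifempty)
    return work_space.pop()

def op_getoperands():
    first = run("getabove", op_getabove)
    second = run("getbelow", op_getbelow)
    return first, second

def op_plus():
    a, b = run("getoperands", op_getoperands)
    work_space.append(a + b)

try:
    run("+", op_plus)
except CalculatorError:
    # The preserved stack is the error path, from the root to the failing node.
    error_path = list(context_stack)

print(error_path)
```

Because a successful operation pops its own name, "getabove" has already left the stack by the time the error occurs, so only the nodes on the error path remain.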
The error detected by testifempty is empty, but before generating a message, this error is converted into another error in the user's level of intention. The error-conversion mechanism will be discussed later. In this case the following are the stages in error conversion: In the context of getbelow, the error empty corresponds to the error nomore. Therefore, empty is converted into nomore. Similarly, in the context of getoperands, the error nomore corresponds to onlyone; and finally, in the context of + the error onlyone corresponds to itself (i.e., it is preserved). This is the error in the user's level of intention that is used in message generation.
In the implementation of the message generation mechanism, the names of errors are inserted into an error list every time they are signaled to higher levels. The first error inserted into the list corresponds to the lowest level operation, and the last corresponds to the user command. The last error name, together with the user command at the bottom of the stack, determines the initial error message generated. Figure 7b shows the error list corresponding to the operations in the error path for the case considered here.
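The conversion stages and the resulting error list can be sketched as a table lookup. The table entries below come from the example in the text; the table-driven representation and the names `CONVERSIONS` and `build_error_list` are assumptions made for illustration.

```python
# (operation, error received from below) -> error signaled to the caller.
# Hypothetical representation; the entries follow the + example in the text.
CONVERSIONS = {
    ("getbelow", "empty"): "nomore",
    ("getoperands", "nomore"): "onlyone",
    ("+", "onlyone"): "onlyone",   # preserved at the user's level
}

def build_error_list(error_path, detected_error):
    """Collect the error names from the lowest level up to the user command."""
    error_list = [detected_error]
    current = detected_error
    # Skip the last node (where the error was detected) and walk up the callers.
    for operation in reversed(error_path[:-1]):
        current = CONVERSIONS.get((operation, current), current)
        error_list.append(current)
    return error_list

error_path = ["+", "getoperands", "getbelow", "testifempty"]
error_list = build_error_list(error_path, "empty")
print(error_list)
```

The first entry belongs to the lowest level operation and the last to the user command, matching the ordering described above.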
In generating a message from an error, a basic template is used. The template for the highest level errors is in the form
I CANNOT <operation>, BECAUSE <error>.
The <operation> and <error> placeholders are replaced by pieces of text explaining the operation invoked and the error signaled. These are found in the tables of operation and error descriptions, that is, modules 2 and 3 of the system description. For the error case considered here, the message obtained would be
I CANNOT ADD TWO VALUES, BECAUSE THERE IS ONLY ONE VALUE.
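Filling the template amounts to two table lookups. A minimal sketch, assuming the description tables are plain dictionaries (the names `OPERATION_TEXT`, `ERROR_TEXT`, and `initial_message` are hypothetical):

```python
# Excerpts of the operation and error description tables (modules 2 and 3).
OPERATION_TEXT = {"+": "ADD TWO VALUES"}
ERROR_TEXT = {"onlyone": "THERE IS ONLY ONE VALUE"}

def initial_message(context_stack, error_list):
    """Combine the user command (bottom of stack) with the last error name."""
    command = context_stack[0]
    error = error_list[-1]
    return f"I CANNOT {OPERATION_TEXT[command]}, BECAUSE {ERROR_TEXT[error]}."

message = initial_message(
    ["+", "getoperands", "getbelow", "testifempty"],
    ["empty", "nomore", "onlyone", "onlyone"],
)
print(message)
```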
Once the user has received an error message, he or she can start a dialogue by typing why?. During a dialogue a pointer T is used to indicate the current operation name in the invocation tree. In constructing an error message, the current operation name pointed to by T is used as the focus of the dialogue. That is, the error message first explains the purpose of the current operation and then explains the error detected in the context of that operation. Initially T points to the root of the invocation tree. Therefore, the message generated uses the root of the invocation tree as the context of its initial message.
During a dialogue the user can type why? or then? to change the focus of the dialogue, that is, to move the pointer T between the nodes of the invocation tree. Every time T moves to a new node, a new message is generated in the context of that node. Figure 8 illustrates the possible pointer movements in a binary tree, depending on the user query, discussed below.
Why? moves the pointer T to the leftmost child of the node pointed to by T. If T is pointing to a terminal node that is in the error path, then T remains in its original place. In this case a response of THAT IS ALL is printed. If, however, T points to a terminal node that is not in the error path, then the effect of why? is the same as the effect of then?, described next.
Then? moves the pointer T to the right sibling of the node pointed to by T, subject to the following exceptions: (1) if there is no right sibling, T moves to the right sibling of the parent of the node pointed to by T. If the parent has no right sibling, then the parent of the parent is considered, and so forth. If none of the ancestors has a right sibling, then T remains in its original place. In this case a THAT IS ALL message is printed. (2) If the node pointed to by T is one of those in the error path, the effect of then? is the same as the effect of why? above.
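The two movement rules can be sketched as a pair of mutually dependent functions. This is an illustrative reading of the rules, not the prototype's code: the `Node` class, the function names, the node name "getabove", and the choice of why? as the final query are assumptions. A return value of `None` stands for "T stays in place and THAT IS ALL is printed."

```python
class Node:
    """A node of the invocation tree with links to parent and children."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def right_sibling(node):
    if node.parent is None:
        return None
    siblings = node.parent.children
    position = siblings.index(node)
    return siblings[position + 1] if position + 1 < len(siblings) else None

def move_why(node, error_path):
    if node.children:
        return node.children[0]           # leftmost child
    if node in error_path:
        return None                       # terminal node on the error path
    return move_then(node, error_path)    # terminal node off the error path

def move_then(node, error_path):
    if node in error_path:
        return move_why(node, error_path)  # exception (2)
    current = node
    while current is not None:             # exception (1): climb the ancestors
        sibling = right_sibling(current)
        if sibling is not None:
            return sibling
        current = current.parent
    return None

# Invocation tree inferred from the + example ("getabove" is an assumed name).
leaf = Node("testifempty")
getbelow = Node("getbelow", [leaf])
getabove = Node("getabove")
getoperands = Node("getoperands", [getabove, getbelow])
plus = Node("+", [getoperands])
error_path = [plus, getoperands, getbelow, leaf]

visited = []
t = plus
for query in ["why?", "why?", "then?", "then?", "why?"]:
    move = move_why if query == "why?" else move_then
    target = move(t, error_path)
    if target is None:
        visited.append("THAT IS ALL")
    else:
        t = target
        visited.append(t.name)

print(visited)
```

The second then? is issued while T points to getbelow, which lies on the error path, so it descends to testifempty just as why? would.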
Depending on the user query, a certain template is used in the message constructed. These templates are as follows:
why?   TO <parent-op>, I NEEDED TO <current-op>, BUT <error> | WHICH I DID
then?  THEN I NEEDED TO <current-op>, BUT <error> | WHICH I DID
In the why? template, the parent-op refers to the parent of the node pointed to by T. Current-op is the node pointed to by T. Vertical bars represent different options in the message generated. If the node pointed to by T is in the error path, then the BUT <error> option is used. Otherwise, no error has occurred in the operation pointed to by T, and therefore, the WHICH I DID option is used. Similar comments are valid for the then? template. The following dialogue illustrates the use of these templates, together with the data structures in Figures 4-6:
System: TO ADD TWO VALUES, I NEEDED TO GET THE TWO VALUES, BUT THERE IS ONLY ONE VALUE.
System: TO GET THE TWO VALUES, I NEEDED TO GET THE FIRST VALUE, WHICH I DID.
System: THEN I NEEDED TO GET THE SECOND VALUE, BUT THERE IS NO MORE VALUE.
System: TO GET THE SECOND VALUE, I NEEDED TO CHECK IF THERE IS ANY VALUE IN THE WORK SPACE, BUT THE WORK SPACE IS EMPTY.
System: THAT IS ALL.
In the first two messages, the why? template is used. The third message uses the then? template. In the fourth message, the why? template is used although the user asked then?. This is because the pointer T was pointing to an operation in the error path before the user asked then?. Therefore, the effect of this query is the same as the effect of why?.
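The four explanatory messages of this dialogue can be reproduced by a small rendering function. The sketch below assumes dictionary-based tables whose text is copied from the dialogue; the names `PARENT`, `SIGNALED`, `render`, and the operation name "getabove" are hypothetical, and the choice of template for each step is supplied by hand rather than derived from pointer movement.

```python
PARENT = {
    "getoperands": "+",
    "getabove": "getoperands",   # assumed name for "get the first value"
    "getbelow": "getoperands",
    "testifempty": "getbelow",
}
OPERATION_TEXT = {
    "+": "ADD TWO VALUES",
    "getoperands": "GET THE TWO VALUES",
    "getabove": "GET THE FIRST VALUE",
    "getbelow": "GET THE SECOND VALUE",
    "testifempty": "CHECK IF THERE IS ANY VALUE IN THE WORK SPACE",
}
ERROR_TEXT = {
    "empty": "THE WORK SPACE IS EMPTY",
    "nomore": "THERE IS NO MORE VALUE",
    "onlyone": "THERE IS ONLY ONE VALUE",
}
# Error signaled by each operation on the error path below the root.
SIGNALED = {"getoperands": "onlyone", "getbelow": "nomore", "testifempty": "empty"}

def render(template, node):
    """Fill the why? or then? template for the node that T now points to."""
    if node in SIGNALED:
        outcome = f"BUT {ERROR_TEXT[SIGNALED[node]]}"
    else:
        outcome = "WHICH I DID"
    if template == "why?":
        return (f"TO {OPERATION_TEXT[PARENT[node]]}, "
                f"I NEEDED TO {OPERATION_TEXT[node]}, {outcome}.")
    return f"THEN I NEEDED TO {OPERATION_TEXT[node]}, {outcome}."

steps = [("why?", "getoperands"), ("why?", "getabove"),
         ("then?", "getbelow"), ("why?", "testifempty")]
dialogue = [render(template, node) for template, node in steps]
for line in dialogue:
    print("System:", line)
```

Only eight short phrases are stored, yet they combine into every sentence of the dialogue, which is the economy the model relies on.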
For any given error, there are many different dialogues possible. The number of dialogues is equal to the number of distinct paths between the root of the invocation tree and the terminal node where the error has occurred. The actual path followed during a dialogue depends on the sequence of why? or then? queries the user types. For example, if an error occurs in the right-most terminal node in the tree of Figure 8, 10 different dialogues are possible, since there are 10 distinct paths from the root to this node.
In general, it is well known from graph theory that the number of distinct paths in a graph increases as an exponential function of the number of nodes in it. For example, by adding one more level (i.e., 16 nodes) to the graph of Figure 8, the number of distinct paths from the root to the rightmost terminal node increases to 260. Generating such a large number of dialogues would only require adding 16 new lines of explanatory text for the new operations and a proportional number of error text lines.
ERROR HANDLING AND PROPAGATION
In this section error handling is considered a part of a system design discipline for software reliability. In a system consisting of many levels of procedure calls, the notion of "atomicity" of operations becomes a fundamental design concept. That is, as far as higher levels are concerned, internal details of lower level operations are irrelevant. A higher level operation only needs to know the input parameters required by the lower level operations, and the results they return.
Earlier software engineering researchers characterized an atomic operation as follows: An atomic operation is one that, during its execution, does not communicate with other processes defined at the same level of abstraction. Performing an atomic operation causes no indirect state change as a side effect of its execution. An atomic operation is not aware of the existence of any other active processes, and no other process is aware of its activity. The reader may refer to  for a detailed discussion of atomic operations.
With this formalism in mind, the notion of levels of intention readily corresponds to levels of abstraction in software engineering terms. This concept is similarly represented in Figure 2. Here a plane represents a level of abstraction. The circles on the plane represent atomic operations at this level, and the arrows between atomic operations represent mappings between two levels. It can be seen that many atomic operations at level i - 1 may be "fused" into a single atomic operation at level i. Alternatively, an atomic operation at level i - 1 may be called by many atomic operations at level i.
In this formalism the notion of error is completely eliminated. It is called an exception. This is because the term error implies a mistake or negligence on the part of the programmer that will cause the program to crash. An exception, on the other hand, implies a potential error that is anticipated by the programmer. The term exception handling refers to taking care of potential errors in a program in a systematic way. Errors cannot be tolerated in programs, whereas exceptions are inevitable facts of life.
In order to devise mechanisms for exception handling, it is necessary to better understand just what an exception is. In general, exception refers to an object state inconsistent with the purpose of a computation. It is important to realize that an exceptional state of an object is defined in relation to an operation applied to this object. No object state can be said to be exceptional in isolation. For example, an empty stack may cause an exception in the context of a POP operation, whereas in the context of a PUSH operation it would be considered a normal state.
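The stack example can be made concrete with a short sketch. The names `EmptyStackError`, `push`, and `pop` are hypothetical and chosen only for illustration.

```python
class EmptyStackError(Exception):
    """Signals that POP was applied to a state it cannot accept."""

def push(stack, value):
    # An empty stack is a perfectly normal starting state for PUSH.
    stack.append(value)

def pop(stack):
    # The same empty state is exceptional only in relation to POP.
    if not stack:
        raise EmptyStackError("empty")
    return stack.pop()

stack = []
try:
    pop(stack)
    pop_outcome = "normal"
except EmptyStackError:
    pop_outcome = "exception"

push(stack, 42)   # the same empty state causes no difficulty here
print(pop_outcome, stack)
```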
In a modular program, command definitions may also include definitions of alternative operations to be run when an atomic operation fails; that is, it cannot be executed either because the object states are not consistent with the purpose of the operation or because one of its constituent atomic operations fails. These alternatives may involve calling a backup operation, or conveying the exception to the higher level. Thus, handling an exceptional object state means considering it as a valid state for the activation of a backup operation and carrying out the execution accordingly. This is true even if the backup operation decides to halt the execution of a program, and resignal the exception to the higher level, possibly as a different exception. This process continues until the end user is reached.
To implement these mechanisms, a generic form of an exception handler looks like the following:
Call P_j
except when
    e_1 : Q_1
    e_2 : Q_2
    ...
    e_n : Q_n
end except
Here P_j is one of the operations called by P_i. Various exceptions that may be signaled by P_j are e_1, e_2, ..., e_n. The corresponding alternative operations for these exceptions are Q_1, Q_2, ..., Q_n. Any of these alternative operations may either consist of a procedure call, having a structure similar to the one above, or a statement such as
signal (e'),
which causes P_i to resignal an exception e' to its caller. By repetitive resignaling of exceptions, an exception at the lowest level of abstraction can be propagated to the user's level.
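In a language with built-in exceptions, the generic handler maps naturally onto try/except. The following Python sketch is an analogue under stated assumptions, not the Pascal CASE-statement simulation used in the prototype: `Signal`, `call_with_handlers`, `resignal`, and the `op_` function names are hypothetical.

```python
class Signal(Exception):
    """An exception identified by its name, e.g. "empty" or "nomore"."""
    def __init__(self, name):
        super().__init__(name)
        self.name = name

def call_with_handlers(operation, handlers):
    """Call the operation; on exception e_k, run the alternative Q_k."""
    try:
        return operation()
    except Signal as exc:
        alternative = handlers.get(exc.name)
        if alternative is None:
            raise                      # no alternative defined: pass it up
        return alternative()

def resignal(name):
    """Build an alternative operation that signals a new exception upward."""
    def alternative():
        raise Signal(name)
    return alternative

def op_testifempty():
    raise Signal("empty")              # the work space is assumed empty here

def op_getbelow():
    return call_with_handlers(op_testifempty, {"empty": resignal("nomore")})

def op_getoperands():
    return call_with_handlers(op_getbelow, {"nomore": resignal("onlyone")})

def op_plus():
    return call_with_handlers(op_getoperands, {"onlyone": resignal("onlyone")})

try:
    op_plus()
    reached_user = None
except Signal as exc:
    reached_user = exc.name

# A backup operation instead of resignaling: supply a default value of zero.
default_value = call_with_handlers(op_testifempty, {"empty": lambda: 0})
print(reached_user, default_value)
```

The last call shows the other kind of alternative: a backup operation that treats the exceptional state as valid input and lets execution continue.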
Some programming languages, such as Ada and CLU, provide mechanisms to implement this exception-handling methodology. For other programming languages, it is possible to simulate this mechanism using the language features provided. In the desk calculator program, written in Pascal, the above exception-handling mechanism was simulated using CASE statements.
How does all this relate to the LTAI strategy? We have already seen that a context stack and an error list must be maintained in the proposed model of Figure 3. Now the task of propagating errors, managing the context stack, and constructing the error list may be carried out by the application program itself. In the case of an error, the context stack and the error list may be passed to the message generator to be used in a dialogue with the user. The task of error propagation does not demand any extra work by the programmer since reliable programs must always contain mechanisms for dealing with errors. Nevertheless, the tasks of maintaining the context stack and constructing the error list are extra tasks that need to be carried out by the application program.
Rather than the programmer writing extra code for these tasks, it is possible to modify the compiler so that it is done automatically: For every procedure in the system, the compiler can insert a PUSH statement, as the first executable statement, that will push the name of the procedure to the context stack. Similarly, a POP statement can be inserted as the last executable statement of every procedure. This takes care of the task of maintaining the context stack.
To construct the error list, the compiler can again insert appropriate code, so that, every time an error is signaled to the higher level, the name of the error is inserted on the error list. Once such a compiler is implemented, the message generator becomes compatible with any program that is compiled by that compiler. One of my students is now extending a small subset of Pascal, called Pascal-S, to incorporate these extensions.
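The effect of such compiler-inserted code can be imitated in Python with a decorator. This is an analogue for illustration only, not the Pascal-S extension mentioned above; `traced`, `Signal`, and the `op_` function names are hypothetical.

```python
import functools

context_stack = []
error_list = []

class Signal(Exception):
    def __init__(self, name):
        super().__init__(name)
        self.name = name

def traced(name):
    """Wrap a procedure with the PUSH, POP, and error-list bookkeeping."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            context_stack.append(name)        # inserted PUSH
            try:
                result = func(*args, **kwargs)
            except Signal as exc:
                error_list.append(exc.name)   # record the error signaled upward
                raise                         # stack is left untouched
            context_stack.pop()               # inserted POP, on success only
            return result
        return wrapper
    return decorate

@traced("testifempty")
def op_testifempty(work_space):
    if not work_space:
        raise Signal("empty")

@traced("getbelow")
def op_getbelow(work_space):
    try:
        op_testifempty(work_space)
    except Signal:
        raise Signal("nomore")                # conversion written by the programmer
    return work_space.pop()

try:
    op_getbelow([])
except Signal:
    pass

print(context_stack, error_list)
```

The procedure bodies contain only the error conversion that a reliable program needs anyway; the stack and list maintenance is added entirely by the wrapper.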
SUMMARY AND FURTHER RESEARCH AREAS
The problem of levels in error-message generation is important, since messages generated by lower level operations are often of little help to the user. Based on Lewis and Norman's recommendation , a model has been developed that enables the user to trace down to the levels of program execution to find out the source of the error. The proposed model is essentially based on two concepts: (1) atomic operation, which is the foundation of a technique to locate, detect, and handle errors; and (2) error abstractions, which can be used to tailor the semantics of errors to the user. Based on these concepts, implementation of a prototype message generation system is described.
One important advantage of the proposed model is its simplicity. My undergraduate students were able to implement it as a programming assignment. Also, the possible number of dialogues increases exponentially as the number of entries in the system description increases linearly. This means that a relatively small number of phrases are combined in various ways to obtain many different sentences in a large number of possible dialogues. Therefore, the memory space used by system descriptions is well justified.
There are important problems that require further research. One of these is the development of generalized protocols for user-computer dialogues in an LTAI strategy. In the prototype system, a dialogue can be achieved based on changing the focus by using the two different queries, why? and then?. No experimental data, however, are provided to test the preferability of alternative or additional protocols.
Another important problem is how to handle messages received from the lower level system where the user's application package is embedded. In an actual implementation, it may be necessary to suspend these messages, or alter their meaning to the user level, or generate a message such as the internal-failure message quoted earlier. The on-line decision of which course of action to select, as well as its implementation, is an important problem.
Closely related to error-message generation is the problem of partially executed commands. Conceptually, the intention is served entirely by the set of lower level operations invoked by the user command. In the case of an error, the intention may have been served only partially, and it may be necessary to reverse the partial effects caused by lower level operations. Tools provided to reverse the effect of partially executed commands would be of invaluable aid to the user.
Finally, on-line help facilities that enable the user to understand how to fix the system, after the basic problem is clarified, are necessary. The user should be given the option to start a dialogue, through which the system can suggest alternative courses of action to reach the intended goal. On-line generation of explanation text that shows what the system can and cannot do is presented in . Much more research, however, is needed before a satisfactory set of tools is available to the user of interactive software.
|Publication:||Communications of the ACM|
|Date:||Nov 1, 1987|