DEBL: a knowledge-based language for specifying and debugging distributed programs.
The semantics of a distributed program can be separated into two parts: inter-pricess communication (IPC) and individual process semantics. The IPC is regarded as a more difficult but little understood area since each process by itself is just a sequential program.
The IPC aspect of distributed programs is our focus. Our goal is to have the debugging procedure automatically and distributively executed at each local computer system, and have execution error(s) isolated to process levels. Each potential faulty process can then be examined and tested independently and concurrently. A specification language, Distributed Event-Based Language (DEBL), that allows individual process behavior to be described in terms of local IPC events is designed for this purpose. A knowledge-base construct is also integrated into it. DEBL is based on a formal temporal logic system, Temporal Event Logic (TEL), to ensure rigorous deduction. That is, the specification expression of DEBL will be transformed into a corresponding logic formula of TEL. Interpreting the specificaiton expression is equivalent to evaluating the corresponding logic formula. DEBL provides a structured and user-oriented description tool and TEL supplies a formal base. Together, they can be the inference engines of a knowledge-based system where DEBL serves as the front-end processor of TEL.
DEBL provides an event hierarchy to conform to a programmer's logical concepts. Processes are specified in terms of individual's IPC activities. Temporal logic is used for expressing time-related properties. The knowledge based has the same structure used by the user-defined process to provide a uniform view of the system structure. Instead of interactively controlling the program execution, our approach is a retrospective one. That is, we examine the consistency between the process specification and the trace of execution. No further coordination between computer systems is required. The results are forwarded to a control agent to determine possible faulty processes. Figure 1 illustrates the system structure of our distributed knowledge-based debugging system.
This article gives an overview of DEBL. The detailed description of TEL, transformation rules from DEBL to TEL, and a complete description of a distributed knowledge-based debugging system can be found in . The basic concepts, syntactic structures, and informal semantics of DEBL will be described in this article. We will also briefly discuss TEL.
OVERVIEW OF DEBL
The basic event unit of DEBL is the primitive event. Specifically, it represents an interprocess communication activity. A primitive event takes zero time to complete. Two interprocess communication mechanisms are identified in DEBL, synchronous and asynchronous IPC. For instance, a primitive event such as (S, R, msg, SR) expresses that S is the sending process and R is the receiving process. The data being transmitted is of msg type. SR indicates that this is a synchronous communication. If we have an (S, R, msg, R) instead, then it will be an asynchronous communication and is a receiving action performed by R. On the other hand, if we have an (S, R, msg, S), it will be a sending action performed by S.
Conceptually, each process is an event stream and a distributed program is a graph of these event streams. The nodes in this graph are IPC events and the arcs depict the relation of sending and receiving IPC messages. The graph may consist of several independent, connected subgraphs. For example, a computer network is divided into isolated subnetworks when communication lines break down. For a totally isolated process, i.e., one that does not have any interprocess communication with other processes, its description will not exist, or will be NULL, in our system. During program execution, interprocess communication messages will be recorded. After that, these traces of IPC messages are used for matching against specification. A Backus Normal Form (BNF) definition of DEBL is provided in the appendix.
Process and Event Specification
Each process is described in terms of its possible interaction (or events) with other processes. Specifically, a process is a tree or event sequences. the ordering among these events are; (sequence) and $ (selection). Repetition is designated with $. Events may be aggregate events or primitive events. EOP, ANY, and NULL are three special events. NULL is a null event and ANY can be substitued with any of the primitive events except NULL. This is useful when users elect to examine only certain important aspects of the program like the detection of a unique inter-process communication pattern. For example, the third and fifth IPC messages should be directed to processes p1 and p2 such as in (ANY: ANY: call-p1; ANY; call-p2; * ANY). In this case, intermediate events are of no importance. EOP indicates the event of end-of-the-process. This is a special event which is generated by the local host system when a process has reached the end of its normal execution. An exclusion operator (-) is used to exclude specific primitive events from even description. For example, *(-(E1, E2)) expressed sequences of all possible combinations of primitive events without involving E1 or E2.
Aggregate events are constructed from primitive events and previously defined aggregate events. A process's behavior is described in three respects: pattern, which contains the process's possible evet sequences, property, which has their associated properties, and clue, which relates process descriptions with the knowledge base. Interaction between processes is implicitly specified in each process. In order to adapt to an individual user's perception, DEBL supports a hierarchy of logical levels of events. High-level intuition can be reflected in the form of aggregate events. In this regard, a process is an aggregate event which in turn may consist of more aggregate events. The process's event pattern and properties need to hold throughout its existence. This structure can be refined to a desired degree.
For example, in a full-duplex computer protocol program, a communication process will be talking with its host, counter-sender, and counter-receiver processes. It may be specified in DEBL as *(from_host * from_sender * to_receiver). Each aggregate event in the specification can be further specified in finer detail. With this information, the debugging system can isolate violation(s) of specifications on different logical levels, depending on user's current perception. If from_sender is further defined as (re_from; CRC_check; seq_check; (to_host * NULL)), the focus of debugging can be narrowed down even more. If there is an error in the event from_sender, the programmer can look into the corresponding implementation of that event. By representing debugging information in an environmental that is meaningful to programmers, this hierarchy can help programmers understand and locate programming error(s).
Syntactic Structure and Informal Semantics
Each of the aggregate events has four entries: name, pattern, property, and clue. Name is a unique identification. Pattern represents the possible ordering of the events used in the current definition. There are sequence, selection, and testing (if-then-else) operators to control the event ordering. The testing part may involve inquiry of previous action and current state. Usually properties or relations can be established among different aggregate events by forming another higher-level aggregate event comprised of those particular event definitions and referencing them accordingly.
Properly is an entry to certify certain properties that need to hold in the current aggregate event. An interval, which is a sequence of primitive events, may be specified by an aggregate event definition. The interval associated with the aggregate event starts when an instance of the event begins, and the interval ends when that instance of the event concludes. The clue entry is the bridge between user's program and the knowledge base. By combining a user's program with the events provided in the knowledge base, clue becomes the means for the users to access the knowledge base. Both properly and clue entries can be used as either a validity checker or an error identifier. The validity checker will check for the satisfaction of the formula, and the error identifier can be used to signal the existence of a particular error. If [omega.sub.1] expresses something that must hold (e.g., a liveness properly that the process eventually terminates) and [omega.sub.2] expresses a particular type of possible error (e.g., a safety property indicating which processes are not deadlocked), we can then form an expression like [omega.sub.1] [and] [is approx.] [omega.sub.2]. If [omega.sub.1] is not satisfied, we know the property was violated. If [is approx.] [omega.sub.2] is not satisfied, then we know the error did occur.
Conceptually, every object we are dealing with in DEBL is an interval. A process is an interval, a trace is an interval, and an aggregate event is also an interval. We describe an aggregate event, or the interval it represents, in three respects. The pattern entry specifies the possible event patterns of an aggregate event. It despicts a coarse picture of an aggregate event. The property entry then refines this coarse picture with additional qualification, i.e., the necessary properties of the interval, which are temporal formulas. Finally, the clue entry adds more restriction to further qualify the aggregate event by asserting properties of the knowledge base.
Temporal Event System (TEL) is a predicate temporal logic system that employs a linear time, event-based model and is based on the interval concept. Time is modeled as a series of discrete events. TEL is a modified version of System DX [5, 10] which is a linear time, state-based temporal logic system, with the interval concept carried over from Tempura , an interval temporal logic system.
Interval temporal logic has been widely discussed and applied tp various areas (e.g., [9, 11]), and it is a natural scheme for our system as well. We mix temporal logic with DEBL to enhance the expressive power of DEBL to express certain important temporal properties like eventually and infinetely frequent which are impossible to express in other languages like EDL , ECSP  or TSL [L]. In DEBL, aggregate events are interpreted with respect to intervals. Each instance of an aggregate event forms an interval or "time zone." All the properties and temporal specifications are checked against events within this interval. Nested time intervals are a natural implication of DEBL's aggregate event structure.
Several internal temporal operators are used such as *(next), *(eventually), *(repeat match), and *(always). These operators are necessary for expressing time-related properties and for having greater expressive power. Precise definitions of these temporal operators are given in . Only an intuitive appeal of these temporal operators is explained here. * is used to specify the property that should be true after the first p-event of an interval. * is used to state that a subinterval of the current interval satisfies the property specified by it. * can be used to express that the property specified will be true repeatedly in all the subintervals of one possible permutation of the current interval. * is used to specify the property that will be true in all possible subintervals of the current interval.
AN EXAMPLE: READER/WRITER PROBLEM
In this example, we have assumed a reliable, order-preserving communication link between processes. This can always be accomplished through providing "lower layers" of software to carry out the task. Within this example, delimiters such as () and  are used interchangeably, when there is no ambiguity, in the definition of aggregate events.
Description of the Example
We have two readers R1 and R2, two writers W1 and W2, one common buffer B, and one controller C. Each is a process that may run on different processors. In this example, synchonous communication is assumed as the IPC mechanism.
R1, R2, W1, and W2 are all to access the common buffer B. To ensure a consistent state of data in B, coordination between potential users is necessary. To achive this goal, R1, R2, W1, and W2 must check with the controller C before accessing B. C enforces its policy of coordination by accepting the messages from the desired process. The policy is as follows: writers are mutually exclusive of each other, and likewise for the reader and writer. However, R1 and R2 are allowed to access B together.
Ports (entries) of the processes are start-read, end-read, start-write, end-write, read, and write. The basis data unit exchanged between processes is msg. In process C, an aggregate event Conds together with other are defined to alleviate complexity and to provide a more intuitive and hierarchical description.
The logical expression at the clue entry in event Conds represents certain conditions that need to be validated. These conditions depend mainly on the characteristics of individual events.
Events of the Example
In this example, 12 primitive events are defined, all of which are synchronous message exchanges. For example, E11 means process R1 sends a message to port start-write of process C synchronously and E7 means that process W1 sends a message to port write of process B with the same scheme.
Primitive Events Definition
E11: (R1, C.start-read, msg, SR) E12: (R1, C.end-read, msg, SR) E21: (R2, C.start-read, msg, SR) E22: (R2, C.end-read, msg, SR) E31: (W1, C.start-write, msg, SR) E32: (W1, C.end-write, msg, SR) E41: (W2, C.start-write, msg, SR) E42: (W2, C.end-write, msg. Sr) E5: (R1, B.read, msg. SR) E6: (R2, B.read, msg, SR) E7: (W1, B.write, msg, SR) E8: (W2, B.write, msg, SR)
R1: *[E11; *E5; E12] R2: *[E21; *E6; E22] W1: *[E31; *E7; E32] W2: *[E41; *E8; E42] B: *[E5 ~ E6 ~ E7 ~ E8] C: *[W1C ~ W2C ~ *Conds]
where events R1C, W1C, Conds are defined in Figures 2, 3, and 4 respectively, R2C and W2C have definitions similar to R1C and W1C. mutex, which appears in the event Conds, is provided by the knowledge base and is used for automating properties checking when faul detection and localization procedure is carried out. Its purpose in this case is to verify that the events of interest are mutually exclusive during the lifetime of a current event instance. In Conds, countg is a function that returns the number of times of a specific event which has occurred.
COMPARISON TO OTHER WORK
The aggregate event construct in DEBL is motivated by EDL  and has a similar structure. Using externally observable events to model program behaviors approach has been used in many systems (such as CCS), but our primitive event format is based on . In , interval temporal logic was applied to debugging concurrent software. However, its query language is limited in expressive power and lacks the knowledge-base construct. Gupta  advocates localizing erro(s) to process(es) level for practical reasons that we think are valid in large-scale software systems.
The contribution of DEBL is that it can provide a uniform, rigorous, and programming-language-independent environment for distributed program specification and debugging. Debugging is carried out distributively. Programs can be specified by programmers using DEBL or readily available knowledge-base events. This knowledge base is integrated into DEBL to assist programmers developing program specification or help in debugging systems to locate programming error(s). Its structure is the same for specifying userhs programs. This supports a consistent view of both DEBL and knowledge-base structures which should make knowledge bases more available to users. The rigorous deduction power of DEBL will help automate processing the trace data. We believe DEBL can be a descriptive tool to provide a practical but still rigorous environment for developing distributed programs.
1. Baiardi, F., de Francesco N., and Vaglini, G. Development of a debugger for a concurrent language. IEEE Trans. Softw. Eng., SE-12, 4 (Apr. 1986), 547-553.
2. Bates, P.C., and Wileden, J.C. High-level debugging of distributed systems: The behavioral abstraction approach. J. Syst. Soft. 3, Elsevier Science Publishing Co., 1984, 255-264.
3. Cheng, W.S. A knowledge-based system for debugging distributed programs, Ph. D. dissertation, Department of Computing and Information Sciences, Kansas State University, 1988.
4. Gupta, N.K., and Serviora, R.E. An expert sysstem approach to real time system debugging. In IEEE Proceedings of the 1st Conference on A.I. Applications (Denver, Colo., Dec. 5-7, 1984), 336-343.
5. Hailpern, B.T. Verifying concurrent processes using temporal logic. Lecture Notes in Computer Science, 129, Springer-Verlag, 1982.
6. Helmbold, D., and Luckham, D. TSL: Task sequencing language. In Proceedings of the ADA International Conference (Paris, France, 1985).
7. Lamport, L. Time, clocks, and the ordering of events in a distributed sysstem. commun. ACM 21, 7 (July 1978), 558-565.
8. LeDoux, C.H. A knowledge-based system for debugging concurrent software, Ph. D. dissertation. University of California, Los Angeles, March 1986.
9. Moszkowski, B.C. Executing Temporal Logic Programs. Cambridge University Press, 1986.
10. Pnueli, A. The temporal semantics of concurrent programs, Semantics of Concurrent Computation. G. Kahn, Ed., Lecture Notes in Computer Science 70, Springer-Verlag, 1979.
11. Schwartz, R.L., and Melliar-Smith, P.M. An Interval Logic for High-Level Temporal Reasoning. In Proceedings of the 2nd Annual ACM Symposium on Principles of Distributed Computing (Montreal, Canada, August 17-19, 1983), pp. 173-186.
CR Categories and Subject Descriptors: D.2.5 [Software Engineering]: Testing and Debugging--debugging aids, tracing; D.3.3 [Programming Languages]; Language Constructs--concurrent programming structures; D.4.7 [Operating Systems]: Organization and Design--distributed systems; I.2.4 [Artificial Intelligence]: Knowledge Representation Formalism and Methods--prediate logic, representation languages
General Terms: Design, Languages
Additional Key Words and Phrases: Hierarchical structure, logic
WAN-HONG S. CHENG is a member of the technical staff at AT&T Bell Laboratories. He has been involved in network design and development. His current research interests include data communications, computer networking, and distributed systems. Author's Present Address: AT&T Bell Laboratories, RM 1E-424; Crawfords Corner Road, Holmdel, NJ 07733.
VIRVIL E. WALLENTINE is a professor and department head of the Department of Computing and Information Sciences at Kansas State University. His research interests include operating systems, distributed systems, computer networks, and concurrent programming languages. Authors' Present Address: Department of Computing and Information Sciences, Kansas State University, Manhattan, KS 66506.
|Printer friendly Cite/link Email Feedback|
|Author:||Cheng, Wan-Hong S.; Wallentine, Virgil E.|
|Publication:||Communications of the ACM|
|Date:||Sep 1, 1989|
|Previous Article:||Automatic determination of grain size for efficient parallel processing.|
|Next Article:||Using a global name space for parallel execution of UNIX tools.|