N-fold inspection: a requirements analysis technique.
The user requirement document (URD) plays a key role in the success or failure of software development. In paradigm, exploratory, or mission-critical software development, undetected faults in the URD can propagate through the life cycle, resulting in major recoding efforts or poor system integrity. The number one item on Boehm's top-ten list of industrial software metrics says:
Finding and fixing a software problem after delivery is 100 times more expensive than finding and fixing it during the requirements and early design phases.
This insight has been a major driver in focusing industrial software practice on thorough requirements analysis and design, on early verification and validation, and on up-front prototyping and simulation to avoid costly downstream fixes.
Figure 1 shows how faults can be introduced and propagated at every phase of software development (i.e., during specification, design, and coding). Clearly, faults in products of early phases have significant impact on software development and need to be detected as quickly as possible. According to Fagan:
While defects become manifest in the end product documentation or code, most of them are actually injected as the functional aspects of the product and its quality attributes are being created; [SIC] during development of the requirements, the design and coding, or by insertion of changes.
Several methods can aid the detection of URD faults. These methods may apply only to certain phases of development or may imply an overall methodology precluding the use of certain phases. Four methods of reducing URD faults are listed here.
(1) Formal specification. A formal specification, written in a notation such as VDM, can be developed to determine the consistency and completeness of the URD, but its primary function is to provide a formal baseline for downstream software development, such as design, coding, and testing. Automated tools such as RSL/REVS can assist with some of these tasks. Formal specification may expose some URD faults as the URD is translated into a formal specification, but undetected faults will be cast into a rigid document that may actually hinder eventual detection. Developing a formal specification directly from the URD is expensive, and it requires detecting the URD faults at the same time.
(2) Design and Implementation. This phase can proceed directly from the URD, bypassing requirements inspection and formal specification. In other words, downstream development activities can be used to detect URD faults. However, design and implementation used in this way are not cost-effective in detecting URD faults, as Boehm's number one item indicates.
(3) Prototyping. Building a software prototype directly from the URD will help determine its feasibility and soundness. Since a prototype is usually an incomplete system, its usefulness is limited to detecting faults in those fragments of the URD where feasibility is an issue. Prototyping is probably best used together with other development techniques.
(4) Formal Inspection of URD. Before software development begins, the URD is examined manually using inspections. URD inspections give software engineers a chance to understand the URD and to detect its faults before the specification phase. If feasibility is an issue, URD inspections and prototyping can be done simultaneously. For mission-critical software, a formal specification is also developed to ensure reliability.
Inspection is a method of checking that can be performed on any of the intermediate products of software development (i.e., the URD, specification, design, or code). Fagan defines an inspection as a formal process that is usually carried out manually by inspection teams. Inspection is claimed to be an effective testing technique. For example, item nine on Boehm's top-ten list states:
Walkthroughs catch 60 percent of the errors.
The structured walkthrough (software inspection) has been the most cost effective technique to date for eliminating software errors. It also has significant side benefits in team building and in ensuring backup knowledge if a designer or programmer leaves the project.
Proponents of inspections emphasize their use on software design and coding. Code inspection methodologies are well developed. Specifically, checklists for code inspection are widely available, and the code author can readily participate in the inspection process. URD inspection is different. Checklist items are limited to consistency, completeness, and functionality [5, 7, 15]. The URD developer may be available on only a limited basis. Thus, URD inspection may not find URD faults as effectively as code inspection finds coding faults.
N-Fold inspection uses formal inspections but replicates these inspection activities using N independent teams. The same software artifact (e.g., a URD or functional specification document) is given to all N teams. An appointed moderator supervises the efforts of all N teams. Each team performs formal inspection using a checklist and analyzes the software artifact. Several teams may identify the same fault, but the moderator gathers all results of the independent inspection efforts and records each fault once in a database. After the faults are collected, they can be used to rework a new version of the software artifact.
We suggest using N-Fold inspection for the initial phase of software development for mission-critical software. N-Fold inspection is designed to detect faults in the URD as early as possible, indirectly increasing the reliability and quality of the URD's implementation. The primary use of N-Fold inspection is to identify faults that might not be detected by a single inspection team. This article discusses URD inspection using the N-Fold technique, but this technique can also be used during other phases of development, such as specification, design, or coding.
THE PILOT STUDY
To better understand N-Fold inspection, we conducted a pilot study observing ten independent software development teams doing requirements analysis. Each team conducted formal inspections on the same URD, a description of the requirements of a real-time railroad traffic control system. To learn how teams locate URD faults, we collected data about the number and kinds of faults detected by each team. Specifically, we wanted to know how much the complexity of a fault hindered its detection, how many faults a single team could find, and especially, the increase in fault detection that is gained by using more than one team to inspect the URD.
While this study represents only one case, the results are encouraging. Faults detected by different teams overlapped little, and the number of URD faults found by any one team was small compared to the total number of faults detected. Thus, N-Fold inspection is effective in locating URD faults early, since two or more teams doing URD inspection uncover significantly more URD faults than a single team.
The study participants were: the user, who developed the URD; a moderator, who was responsible for coordinating the efforts of the teams and collecting data; and forty software engineers organized into ten independent development teams.
In addition to monitoring the progress of each development team, the moderator served as a liaison between the user and all ten development efforts, as shown in Figure 2. The moderator was also responsible for collecting the study data. The teams were formed from a mix of graduate and undergraduate students; half of them had worked professionally in software development. The user's primary duty was to develop the URD, but he was also available on a limited basis to meet with the teams.
Each team carried out formal inspection of the URD. Using techniques similar to those described by Fagan [11, 12], teams read the URD with a critical eye to detect faults. Each team looked for URD faults, assessed feasibility, and tried to arrive at an understanding of the user's needs. Team inspection meetings helped to identify glaring faults in the URD, and additional meetings with the user unearthed more faults. At the end of the inspection phase, each team delivered a document that described faults in the URD to the moderator. To assure the independence of the teams' efforts, the moderator and the user were instructed to meet the teams individually, without discussing the efforts of other teams.
The User Requirement Document (URD)
During the synthesis of the URD for our study, we privately consulted with two major vendors (Digital Concepts Inc. and Union Switch and Signal) of railroad centralized traffic control (CTC) software.
The URD used in this study is modeled after a request for computer automation of a typical North American railway. (1) The document is written in English by a user trained in railroad engineering. The hardware functionality is shown in Figure 3. The author of the URD is experienced in railroad traffic control technology and has served as a project leader in the development of a major railroad CTC system. The user was asked to deliver a reliable document, and he believed the result contained at most one or two minor faults. (2)
Each team assembled a list of URD faults as they were detected. The moderator met with a representative from each team to spend time validating each of the faults discovered. The moderator catalogued each URD fault in a database that contained the following information:
(1) A description of the fault (a short paragraph),
(2) URD cross-referencing information (line and page number),
(3) Miscellaneous classification information (fault repair, fault type, severity), (3) and
(4) Which team(s) found the fault and what method(s) were used to find the fault (e.g., individual reading, team inspection meeting, or meeting with the moderator).
The database was used for coordinating version control among the ten working copies of the URD (one per team). Each team was given a response to the faults it reported but received no information regarding faults reported by other teams. At the end of the inspection phase (i.e., at the end of the study) each of the ten teams had compiled its own updated version of the URD.
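The catalogue described above can be pictured as a small data structure. The following is only a minimal sketch: the class names, field names, the sample fault descriptions, and the rule that two reports describe the same fault when they share page, line, and fault type are all assumptions made for illustration. In the study, the moderator validated and merged reports by hand.

```python
from dataclasses import dataclass, field


@dataclass
class FaultRecord:
    """One catalogued URD fault (field names are illustrative, not the study's schema)."""
    description: str
    page: int
    line: int
    fault_type: str
    severity: int  # five-point scale; the direction of the scale is an assumption
    found_by: dict = field(default_factory=dict)  # team id -> set of detection methods


class FaultCatalogue:
    """Records each fault once, however many teams report it."""

    def __init__(self):
        self._records = {}

    def report(self, team, method, description, page, line, fault_type, severity):
        # Assumed identity rule: same location and type means the same fault.
        key = (page, line, fault_type)
        record = self._records.get(key)
        if record is None:
            record = FaultRecord(description, page, line, fault_type, severity)
            self._records[key] = record
        record.found_by.setdefault(team, set()).add(method)
        return record

    def faults_seen_by(self, team):
        """Faults a team reported itself; it learns nothing about other teams' reports."""
        return [r for r in self._records.values() if team in r.found_by]

    def __len__(self):
        return len(self._records)


# Hypothetical reports, invented for this example.
catalogue = FaultCatalogue()
catalogue.report(1, "individual reading",
                 "Signal behavior undefined for a failed switch", 4, 12, "omission", 4)
catalogue.report(7, "team inspection meeting",
                 "Signal behavior undefined for a failed switch", 4, 12, "omission", 4)
catalogue.report(7, "meeting with the moderator",
                 "Units of train length not stated", 9, 3, "ambiguousness", 2)

print(len(catalogue))                    # distinct faults recorded
print(len(catalogue.faults_seen_by(1)))  # faults team 1 receives a response about
```

Keeping the reporting teams inside each record is what lets the same catalogue answer both questions the study asks later: how many distinct faults were found overall, and how many teams found each one.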
Validating the Data
After URD inspection, faults detected by all ten teams were catalogued. It was not clear, however, whether or not all faults in the URD had been found. To find as many of the remaining URD faults as possible, we asked the teams to continue to develop from their revised URDs using rigorous specification, design, and verification and validation (V&V). To ensure maximum likelihood of finding additional faults, both specification and design diversity were used during these steps.
Specifically, the methodologies used for specification were structured analysis, structured English, MSG, and Ada PDL. For design, only Ada and MSG were used. Structured analysis is a popular specification methodology. This technique uses graphical diagrams with a data dictionary. The data dictionary is used to index and cross-reference the names of objects throughout the diagrams. It provides techniques for reducing incompleteness and inconsistencies within the specification. Structured English is a less formal specification technique. Borrowing constructs from programming languages, structured English provides a uniform format for expressing system specifications in English. MSG, developed by Berzins and Gray, uses an object-based model and provides a variety of built-in types to support the development of general-purpose specifications and designs. Although Ada is intended to be used as a programming language, its major syntactic elements provide a skeletal framework for encoding specifications and designs supporting step-wise refinement. For descriptions of Ada and its use in formal specification, see the Ada entries in the reference list. Each team performed rigorous independent verification after the formal specification phase and again after the design phase. Each specification and design methodology was used in the framework of clearly delineated phases. The URD faults found by each team during specification and design were catalogued in addition to the URD faults found during inspection. (4)
It is possible that undetected faults still remain in the URD. By keeping track of URD faults that were discovered after the requirements inspection phase, later phases could be compared with the inspection phase.
STUDY DATA ANALYSIS
This section examines N-Fold inspection in light of the study data we collected. Three phases of URD fault detection are recorded in the study data: inspection, specification, and design. The following analysis and discussion give insight into the nature of the faults that teams found during inspection as compared with other phases of development.
Looking at the collective efforts of all ten teams, most URD faults were found during the inspection phase. A total of ninety-two URD faults were identified collectively by all ten teams after the inspection, specification, and design phases. Of these ninety-two faults, seventy-seven were found by at least one team during the inspection phase. The grand totals indicate that inspection can locate most URD faults. Unfortunately, each individual team's performance does not compare favorably with this.
A single team found very few URD faults compared with the total number of faults found during URD inspection. During the inspection phase, the average team found only twenty-five faults. Figure 4 shows the number of faults found by each team during inspection.
Proceeding with software development after only one team does a URD inspection is inviting disaster. Sixty-three known URD faults remain to be detected during downstream phases. These faults are cast into a rigid specification document and now must be detected during the specification and design phases lest the faults become incorporated into the code.
The sparsity of URD faults found by a single team deserves further investigation. In this study, not many teams found the same fault. Figure 5 is a scatter plot showing the number of teams that found each fault. On average, fewer than four of the ten inspection teams identified the same fault. It is precisely this difficulty that N-Fold inspection tries to overcome. The biased detection manifested by the inspection teams of our study is overcome by using N independent teams instead of a single team. The combined efforts of all N teams thus produce a more reliable investigation of the URD.
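The overlap between teams can be tabulated directly from the per-team fault lists. The sketch below uses a small invented data set (three teams, six faults); the names `reports` and `teams_per_fault` are hypothetical and the numbers are not the study's data. As a cross-check on the study's own figures, ten teams averaging twenty-five faults each produce 250 reports, and spreading those over seventy-seven distinct faults gives about 3.2 teams per fault, which agrees with "fewer than four."

```python
from collections import Counter

# Hypothetical inspection results: team id -> ids of the faults it reported.
reports = {
    "A": {1, 2, 3},
    "B": {2, 4},
    "C": {2, 3, 5, 6},
}


def teams_per_fault(reports):
    """Map each fault id to the number of teams that reported it."""
    counts = Counter()
    for faults in reports.values():
        counts.update(faults)
    return dict(counts)


counts = teams_per_fault(reports)
mean_teams = sum(counts.values()) / len(counts)
found_by_one = sorted(f for f, c in counts.items() if c == 1)

print(mean_teams)    # 1.5 teams per fault in this toy data
print(found_by_one)  # [1, 4, 5, 6]: faults a single-team inspection could easily miss
```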
The Difficulty of Detecting Faults
The average team is most successful at finding URD faults during inspection, yet some faults are not found until the specification or design phase. This implies that inspection is good but not good enough. We need to address the reasons that these faults are not detected during the inspection phase:
(1) Fault Complexity. Some URD faults are obvious mistakes and are easy to detect immediately, while other faults may involve complex interactions. The ramifications of more complex faults may not be discovered through static checking. Formal specification and design provide the rigor necessary to bring these faults to the surface.
(2) Inspection Deficiency. The team inspection process is flawed. A team's demographic features or environmental factors may prevent the team from performing a complete URD inspection. A team may have an inherent bias for detecting only certain types of faults or only faults occurring in some subset of the URD.
Fifteen of the total 92 faults fall into the first category. These faults were found by some teams during specification and design but were not detected by any team during URD inspection. These fifteen faults are indeed difficult to find. Fault complexity is a plausible explanation.
Most URD faults, however, fall into the second category. These faults were missed by some teams during inspection, yet at the same time, found independently by other teams. One cannot assume a fault is complex just because a team misses it during inspection. For each fault, Figure 6 shows the number of teams that found it during URD inspection and after inspection. Inspection by a single team is hardly effective. Of the total faults, the average team found only 37 percent during URD inspection. Replicated inspection, using the N-Fold technique, is quite effective since it can find faults that would otherwise only be found during later phases of development.
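The two categories can be separated mechanically once each team's findings are recorded per phase. The following sketch uses invented data and a hypothetical function name, `classify`; it only illustrates the bookkeeping. In the study, the first set corresponds to the fifteen faults (92 minus 77) that no team found during inspection.

```python
def classify(inspection, later):
    """Split faults by whether any team caught them during inspection.

    inspection, later: dict of team id -> set of fault ids found in that phase.
    Returns (missed_by_all, missed_by_some).
    """
    found_in_inspection = set().union(*inspection.values())
    found_later = set().union(*later.values())
    all_faults = found_in_inspection | found_later

    # Category (1) candidates: no team found these during inspection.
    missed_by_all = all_faults - found_in_inspection

    # Category (2): found by at least one team during inspection, missed by another.
    missed_by_some = {
        f for f in found_in_inspection
        if any(f not in faults for faults in inspection.values())
    }
    return missed_by_all, missed_by_some


# Hypothetical data: three teams, five faults.
inspection = {"A": {1, 2}, "B": {2, 3}, "C": {2}}
later = {"A": {3, 4}, "B": {5}, "C": {1}}

missed_by_all, missed_by_some = classify(inspection, later)
print(sorted(missed_by_all))   # [4, 5]
print(sorted(missed_by_some))  # [1, 3]
```

Fault 2 appears in neither set because every team found it during inspection.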
During the first phase, the team's efforts are directed at inspecting the URD. During later phases, the team's emphasis has shifted to creating other products, such as specification and design, which leaves less time for scrutinizing the URD. In the study, there was a substantial decrease in the number of URD faults identified after inspection. This trend is apparent even if we examine the detection abilities team by team. Figure 7 shows the number of URD faults detected by each team during each phase. During URD inspection, the average team had already located 70 percent of the URD faults it found during all three phases, i.e., URD inspection, specification, and design.
Choosing the Best Value for N
Obviously, increasing the number of URD inspection teams will result in finding more faults, but at some point adding more teams will not significantly increase the number of faults detected. Figure 8 shows this phenomenon: it plots the number of faults detected by N teams during URD inspection for various values of N. On average, two teams found 13.3 more faults than a single team, and three teams found approximately 9.5 more faults than two teams. For larger values of N, the payoff from each additional team becomes small. For example, ten teams found only 2.2 more faults than the average set of nine teams. In general, choosing the best value for N will depend on many factors, such as the availability and cost of additional inspection teams and the potential expense of letting a URD fault slip by undetected.
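A curve of this kind can be computed by averaging, for each N, the number of distinct faults found by every possible group of N teams. The sketch below does this for a small invented data set; the function name `mean_faults_found` and the data are assumptions for illustration. With the study's figures, the same calculation starts at twenty-five faults for one team, rises to roughly 38.3 for two and 47.8 for three, and ends at 77 for all ten, up from about 74.8 for nine.

```python
from itertools import combinations


def mean_faults_found(reports, n):
    """Average number of distinct faults found by every possible group of n teams."""
    groups = list(combinations(reports.values(), n))
    return sum(len(set().union(*group)) for group in groups) / len(groups)


# Hypothetical inspection results: team id -> ids of the faults it reported.
reports = {
    "A": {1, 2, 3},
    "B": {2, 4},
    "C": {2, 3, 5, 6},
}

curve = [mean_faults_found(reports, n) for n in range(1, len(reports) + 1)]
gains = [later - earlier for earlier, later in zip(curve, curve[1:])]

print([round(x, 2) for x in curve])  # [3.0, 4.67, 6.0]
print([round(x, 2) for x in gains])  # [1.67, 1.33]
```

Even in this toy case, the gain from each added team shrinks as N grows, because later teams increasingly rediscover faults that are already known.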
Is N-Fold Inspection Cost-Effective?
By examining the diagnostic efforts of all possible pairs of teams, we discovered that, on average, a pair of teams found more faults during one week of inspection than a single team located after inspection, specification, and design. This leads us to consider the cost-effectiveness of N-Fold inspection compared to other methodologies. Three other methodologies for reliable software construction are worth mentioning; they are illustrated in Figure 9.
(1) The traditional waterfall model of development.
(2) Dual development.
(3) N-version programming.
N-Fold inspection uses N independent teams to inspect the user requirements (N to be chosen), while later phases of development are carried out by only a single team.
The traditional waterfall model is equivalent to N-Fold inspection when N is 1. A single team does URD inspection and carries out the remaining development phases.
The dual development approach was used by Ramamoorthy et al. in their software development for a nuclear reactor. Dual development uses two independent teams throughout. While the efforts of the two teams are independent, the two teams cross-examine each other at the end of each development phase.
Another methodology for increased reliability is N-version programming. Avizienis uses the combined effort of N independent designs and implementations to produce fault-tolerant code. N-version programming uses replicated software and the multiple computation method for the tolerance of design and implementation faults. It also uses N independent teams, but the emphasis is on producing design and code of greater reliability.
Except for the waterfall model, each development approach uses replication during some phases of development. N-Fold inspection uses N teams initially, then only one; dual development uses two teams throughout; N-version programming uses a single team and then N teams during design and coding.
Data in our study can be used to formulate a hypothetical model comparing these four development approaches. The model measures the number of URD faults detected against the amount of replication used during development. Specifically, the study data record the independent efforts of ten teams during the first three phases of software development: one week on URD inspection, three weeks on building a specification, and three weeks on design. These data can be used to approximate the cost-effectiveness of these methods in detecting URD faults. Calculations are based on team-weeks, so that replicating development efforts (i.e., using N teams) at any phase increases team-week costs. (5) N remains variable in calculating team-week costs for N-Fold inspection and for N-version programming. The team-weeks for each of the four approaches are given in Table I. (6)
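Table I is not reproduced in this text. As a rough illustration of how team-week costs of this kind can be derived, the sketch below multiplies the number of teams active in each phase by the phase durations reported in the study. It assumes that cost is simply teams times weeks, ignoring any overhead for moderation or for the cross-examination used in dual development, so its output is a reconstruction under that assumption and not the authors' table. The function names are hypothetical.

```python
# Phase durations reported in the study (weeks per team).
PHASE_WEEKS = {"inspection": 1, "specification": 3, "design": 3}

# Approximate conversion given in the study's notes (four-member, part-time teams).
HOURS_PER_TEAM_WEEK = 100


def team_weeks(teams):
    """Total cost in team-weeks, given the number of teams working in each phase."""
    return sum(PHASE_WEEKS[phase] * teams[phase] for phase in PHASE_WEEKS)


def waterfall():
    return team_weeks({"inspection": 1, "specification": 1, "design": 1})


def n_fold_inspection(n):
    # N teams inspect; a single team carries out the later phases.
    return team_weeks({"inspection": n, "specification": 1, "design": 1})


def dual_development():
    # Two teams throughout.
    return team_weeks({"inspection": 2, "specification": 2, "design": 2})


def n_version_programming(n):
    # A single team up to specification, then N teams for design.
    return team_weeks({"inspection": 1, "specification": 1, "design": n})


for label, cost in [
    ("waterfall", waterfall()),
    ("N-fold inspection, N=4", n_fold_inspection(4)),
    ("N-fold inspection, N=10", n_fold_inspection(10)),
    ("dual development", dual_development()),
    ("N-version programming, N=3", n_version_programming(3)),
]:
    print(f"{label}: {cost} team-weeks, about {cost * HOURS_PER_TEAM_WEEK} person-hours")
```

Under these assumptions, N-fold inspection costs N + 6 team-weeks and N-version programming costs 3N + 4, so replicating the one-week inspection phase is far cheaper per added team than replicating the three-week design phase.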
Although this model does not permit measurement of the quality of the software products produced by each of the approaches, the number of URD faults detected gives a strong indication of the teams' understanding of the URD. It does not account for other sources of reliability such as design diversity, an obvious advantage of N-version programming. N-version programming and dual development may offer increased reliability which cannot be captured by our model, since we are interested only in the detection of URD faults.
Figure 10 shows the average number of URD faults detected compared with development costs for all four approaches. The conventional waterfall development is really just a special case of N-fold inspection in which N equals 1. The number of faults found by N-fold inspection is obtained by averaging over all possible subsets of N of the ten teams during inspection. The number of faults found by N-version programming is obtained by summing the average number of faults found by a single team during inspection, the average number found by a single team during requirement specification, and the average number found by N teams during design. When N equals 4, N-fold inspection finds more URD faults than dual development. When N equals 10, N-fold inspection finds almost 50 percent more URD faults than dual development, yet team-week costs are almost equal. By directing development efforts toward earlier phases, a more reliable product can be achieved for the same cost.
In software engineering, requirements inspection and analysis are the most important aspects of mission-critical software development. Our study presents a case of N-fold inspection of the URD. The data suggest it is more cost-effective in locating URD faults than other methods which concentrate on downstream activities.
The early detection of URD faults is an obvious advantage of N-fold inspection, but some indirect benefits of N-fold inspection can also be realized. In addition to producing a more reliable URD at a lower development cost, more software engineers become familiar with the products developed within their organization. To reiterate Boehm's item 9, "It also has significant side benefits in team building and in ensuring backup knowledge if a designer or programmer leaves the project."
N-fold inspection has the side benefit of providing a foundation for later stages of development, and it can be used as a precursor to other downstream development activities. The N-version programming method is effective in providing reliable designs and implementations, but its success hinges on the quality of the formal specification and a reliable interpretation of the URD. Thus, one of the principal issues in N-version programming is the development of a robust specification, as outlined by Avizienis.
The most critical condition for the independence of design faults is the existence of a complete and accurate specification of the requirements that are to be met by the diverse designs. This is the "hard core" of this fault tolerance approach. Latent defects, such as inconsistencies, ambiguities, and omissions, in the specification are likely to bias otherwise entirely independent programming or logic design efforts toward related design faults... With this approach, perfection is required only at the highest level of specification... The independent writing and subsequent comparison of two specifications, using two formal languages is the next step that is expected to increase the dependability of specification beyond the present limits.
N-fold inspection can provide this type of reliability at the URD and specification level.
N-fold inspection can be useful in other phases of development, such as during the development of a functional specification or design. For example, when developing highly reliable software, N-fold inspection can first be used to find many faults in the URD. Once these faults are corrected, a formal specification can be developed, and the specification can be checked for correctness using N-fold inspection. The design document can also be subjected to N-fold inspection. Further downstream development can be done by borrowing the best aspects of dual development, N-version programming, and N-fold inspection, choosing the most cost-effective techniques for each phase.
(1) There are about 200,000 miles of mainline rail in North America. Roughly one fourth (50,000 miles) uses computerized CTC. This software primarily provides a human interface function, aimed at reducing support and maintenance staff. In the U.S. there are 30 major railroads. On average, each purchases two computerized CTC systems per year. The typical system controls 240 miles of traffic and costs an average of 10,700 dollars per mile. This industry is expected to spend 100 million dollars annually in 1990.
(2) The author of the URD was amazed at the number of faults detected and their severities. Seventy-seven URD faults were eventually detected during inspection. Some faults were serious enough to jeopardize system integrity or result in operation mishaps.
(3) Faults were categorized by seven types: ambiguousness, functional, interface, omission, sequencing, typographical, and inconsistency. Severities were assessed on a five-point scale. Although this information was recorded for each URD fault, analysis of the data did not provide a sufficient statistical basis to support any proposition relating to fault severity and fault types.
(4) In addition to reporting URD faults, the teams also reported defects found in their formal specification document and design document. The variety of methodologies used did not permit a quantitative comparison of these data.
(5) Teams worked part-time and were composed of four members, so in our study, one team-week equals approximately 100 person-hours.
(6) We do not consider URD faults detected during coding and later phases.
REFERENCES
1. Alford, M. SREM at the age of eight: The distributed computing design system. IEEE Comput. 18, 4 (Apr. 1985), 36-46.
2. Avizienis, A. The n-version approach to fault-tolerant software. IEEE Trans. on Softw. Eng. 11, 12 (Dec. 1985), 1491-1501.
3. Avizienis, A., Lyu, M. R., and Schutz, W. In search of effective diversity: A six-language study of fault-tolerant flight-control software. In Proceedings of the IEEE 18th International Symposium of Fault Tolerant Computing (Tokyo, Japan, June 27-30). IEEE/CS, New York, 1988, pp. 15-22.
4. Berzins, V., and Gray, M. Analysis and design in MSG 84: Formalizing functional specifications. IEEE Trans. on Softw. Eng. 11, 8 (Aug. 1985), 657-670.
5. Birrell, N. D., and Ould, M. A. A Practical Handbook for Software Development. Cambridge University Press, Melbourne, Australia, 1985.
6. Boehm, B. Industrial software metrics top-10 list. IEEE Softw. 3, 9 (Sept. 1987), 84-85.
7. Boehm, B. Verifying and validating software requirements and design specifications. IEEE Softw. 1, 1 (Jan. 1984), 75-88.
8. Booch, G. Software Engineering with Ada. Benjamin Cummings Publishing Co., Inc., Menlo Park, Calif., 1983.
9. DeMarco, T. Structured Analysis and System Specification. Yourdon Press, New York, 1978.
10. U. S. Department of Defense. Reference Manual for the Ada Programming Language. ANSI-MIL-STD-1815A-1983, 1983.
11. Fagan, M. E. Advances in software inspections. IEEE Trans. on Softw. Eng. 12, 7 (July 1986), 744-751.
12. Fagan, M. E. Design and code inspections to reduce errors in program development. IBM Syst. J. 15, 3 (1976), 182-211.
13. Jones, C. B. Systematic Software Development using VDM. Prentice-Hall, Englewood Cliffs, N. J., 1986.
14. Nielsen, K., and Shumate, K. Designing large real-time systems with Ada. Commun. ACM 30, 8 (Aug. 1987), 695-715.
15. Ould, A., and Unwin, C. Testing in Software Development. Cambridge University Press, 1986.
16. Ramamoorthy, C. V., et al. Application of a methodology for the development and validation of reliable process control software. IEEE Trans. on Softw. Eng. 7, 6 (Nov. 1981), 537-555.
17. Sheil, B. Power tools for programmers. Datamation (Feb. 1983), 131-143.
ABOUT THE AUTHORS:
JOHNNY MARTIN is a Ph.D. candidate in the Department of Computer Science at the University of Minnesota, where he served as President of the University of Minnesota Student ACM Chapter. His research interests include formal language theory, software engineering, and software development tools.
W.T. TSAI is currently an assistant professor in the Department of Computer Science at the University of Minnesota. His areas of interest are software engineering, computer security, and computer systems. Author's Present Address: Department of Computer Science, University of Minnesota, Minneapolis, MN 55455.
|Author:||Martin, Johnny; Tsai, W. T.|
|Publication:||Communications of the ACM|
|Date:||Feb 1, 1990|