Software failure: counting up the risks.

When Boeing's new 777 airliner first takes to the skies in a few years, computers will control such crucial functions as setting flaps and adjusting engine speed. Electrical circuits will relay a pilot's actions to these computers, where complicated programs will interpret the signals and send out the instructions necessary for carrying out the appropriate maneuvers. Pilots will no longer fly the aircraft via direct electrical and mechanical controls, except when using an emergency backup system.

Because of the disastrous consequences of even a single fault, the software for such a computer system must be extremely reliable. A new analysis, however, demonstrates that testing complex software to estimate the probability of failure cannot establish that a given computer program actually meets such high levels of reliability. The analysis also affirms that using multiple programs, which independently arrive at an answer to a given problem, doesn't necessarily guarantee sufficiently high reliability.

"This leaves us in a terrible bind," say Ricky W. Butler and George B. Finelli of the NASA Langley Research Center in Hampton, Va., the computer scientists who performed the analysis.
"We want to use digital processors in life-critical applications, but we have no feasible way of establishing that they meet their ultra-reliability requirements." In a paper presented last week in New Orleans New Orleans (ôr`lēənz –lənz, ôrlēnz`), city (2006 pop. 187,525), coextensive with Orleans parish, SE La., between the Mississippi River and Lake Pontchartrain, 107 mi (172 km) by water from the river mouth; founded at the Association for Computing (body) Association for Computing - (ACM, before 1997 - "Association for Computing Machinery") The largest and oldest international scientific and educational computer society in the industry. Machinery's conference on software for critical systems, they argue: "Without a major change in the design and verification methods used for life-critical systems A life-critical system or safety-critical system is a system whose failure or malfunction may result in:
Many military aircraft and the European-built A320 airliner already use computer-controlled "fly-by-wire" systems. Computers also play important roles in medical technology, transportation systems, industrial plants, nuclear power stations and telephone networks - realms in which a software failure can cause tragedy (SN: 2/16/91, p.104).

"I think this is ... an important paper," says David L. Parnas, a computer scientist at McMaster University in Hamilton, Ontario. "It's very convincing and provides a lot of insight."

The traditional method of determining the reliability of a light bulb or a piece of electronic equipment involves observing the frequency of failures among a sample of test specimens operated under realistic conditions for a predetermined period of time. Using these data, engineers can estimate failure probabilities of not only individual components but also entire systems.

Unlike hardware, however, software doesn't wear out or break. "Software errors are the product of improper human reasoning," Butler says. Unless they are caught, software errors persist throughout a system's lifetime. That makes conventional methods of risk assessment difficult to apply. The problem is further compounded by the high degree of reliability required for life-critical applications.
Historically, manufacturers of aircraft and other systems in which faults could threaten human lives have accepted a reliability level that corresponds to a failure rate of about 1 in a billion for every hour of operation. Butler and Finelli demonstrate that techniques often used by computer scientists and programmers to quantify software risk take too long to be practical when used to assess systems that require such high reliability.

For example, software design often involves a repetitive cycle of testing and repair, in which the program is tested until it fails. Testing resumes after the cause of failure is determined and the fault repaired. But it generally takes longer and longer to find and remove each successive fault. To establish that a complicated computer program presents minimal risk would require years, if not decades, of testing on the fastest computers available, Butler says.

In an attempt to reduce the risk of failure, computer-system designers sometimes use multiple versions of a program, written by different teams, to perform certain functions. The idea is that although each version may contain flaws, it's highly unlikely that all or even a majority of the programs would contain the same error. However, experiments have shown that computer programs independently written to do the same thing often contain surprisingly similar mistakes.

Many computer experts at last week's meeting pointed to these findings as evidence that limits should be placed on the complexity of computer programs that go into life-critical applications. "Do we want to run with systems that are not as demonstrably safe as we say they are ... when we cannot demonstrate ultra-reliability before development?" asks Martyn Thomas of Praxis plc, in Bath, England.

"We should build only those systems that rely on software to a degree that can be assessed," contends Bev Littlewood of City University in London, England. That means accepting a higher risk or building simpler computer systems.

A few remain optimistic. "Maybe we're being a lot more demanding than we need to be," says John D. Musa of AT&T Bell Laboratories in Murray Hill.
He adds that software developers have a variety of tools and techniques that can help them deliver - if not assess - highly reliable systems.
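
The testing-time argument reported above can be made concrete with a rough calculation. The sketch below is a textbook life-testing bound, not necessarily the calculation in Butler and Finelli's paper. It assumes a constant failure rate (an exponential model) and a test campaign in which no failures at all are observed. The function names, the 99 percent confidence level and the number of parallel test machines are illustrative assumptions.

```python
import math

HOURS_PER_YEAR = 24 * 365.25


def required_test_hours(max_failure_rate, confidence):
    """Failure-free test hours needed to claim, at the given confidence,
    that the true failure rate is no higher than max_failure_rate.

    Assumes a constant failure rate: the chance of surviving T hours is
    exp(-rate * T), so we need exp(-rate * T) <= 1 - confidence.
    """
    if max_failure_rate <= 0:
        raise ValueError("max_failure_rate must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) / max_failure_rate


def required_test_years(max_failure_rate, confidence, parallel_units=1):
    """Calendar years of testing when the hours are split evenly across
    parallel_units identical test machines (a hypothetical setup)."""
    hours = required_test_hours(max_failure_rate, confidence)
    return hours / parallel_units / HOURS_PER_YEAR


if __name__ == "__main__":
    rate = 1e-9  # about 1 failure in a billion hours, the figure cited above
    for units in (1, 1000):
        years = required_test_years(rate, 0.99, parallel_units=units)
        print(f"{units} test machine(s): about {years:,.0f} years")
```

Under these assumptions, a single test machine would need several billion failure-free hours, on the order of half a million years, and even a thousand machines running in parallel would need centuries. Any failure seen along the way would lengthen the campaign further.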