
Software failure: counting up the risks.

When Boeing's new 777 airliner first takes to the skies in a few years, computers will control such crucial functions as setting flaps and adjusting engine speed. Electrical circuits will relay a pilot's actions to these computers, where complicated programs will interpret the signals and send out the instructions necessary for carrying out the appropriate maneuvers. Pilots will no longer fly the aircraft via direct electrical and mechanical controls, except when using an emergency backup system.

Because of the disastrous consequences of even a single fault, the software for such a computer system must be extremely reliable. A new analysis, however, demonstrates that testing complex software to estimate the probability of failure cannot establish that a given computer program actually meets such high levels of reliability.

The analysis also affirms that using multiple programs, which independently arrive at an answer to a given problem, doesn't necessarily guarantee sufficiently high reliability.

"This leaves us in a terrible bind," say Ricky W. Butler and George B. Finelli of the NASA Langley Research Center in Hampton, Va., the computer scientists who performed the analysis. "We want to use digital processors in life-critical applications, but we have no feasible way of establishing that they meet their ultra-reliability requirements."

In a paper presented last week in New Orleans at the Association for Computing Machinery's conference on software for critical systems, they argue: "Without a major change in the design and verification methods used for life-critical systems, major disasters are almost certain to occur with increasing frequency."

Many military aircraft and the European-built A320 airliner already use computer-controlled "fly-by-wire" systems. Computers also play important roles in medical technology, transportation systems, industrial plants, nuclear power stations and telephone networks - realms in which a software failure can cause tragedy (SN: 2/16/91, p.104).

"I think this is ... an important paper," says David L. Parnas, a computer scientist at McMaster University in Hamilton, Ontario. "Its very convincing and provides a lot of insight."

The traditional method of determining the reliability of a light bulb or a piece of electronic equipment involves observing the frequency of failures among a sample of test specimens operated under realistic conditions for a predetermined period of time. Using these data, engineers can estimate failure probabilities of not only individual components but also entire systems.
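A simple hypothetical calculation (ours, not the article's) illustrates the method: if 5 bulbs in a sample of 1,000 burn out during a 2,000-hour test, the observed failure rate is roughly

    \hat{\lambda} \approx \frac{5 \text{ failures}}{1{,}000 \times 2{,}000 \text{ bulb-hours}} = 2.5 \times 10^{-6} \text{ per hour},

and confidence in such an estimate tightens as more units and operating hours accumulate.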

Unlike hardware, however, software doesn't wear out or break. "Software errors are the product of improper human reasoning," Butler says.

Unless they are caught, software errors persist throughout a system's lifetime. That makes conventional methods of risk assessment difficult to apply.

The problem is further compounded by the high degree of reliability required for life-critical applications. Historically, manufacturers of aircraft and other systems in which faults could threaten human lives have accepted a reliability level that corresponds to a failure rate of about 1 in a billion for every hour of operation.

Butler and Finelli demonstrate that techniques often used by computer scientists and programmers to quantify software risk take too long to be practical when used to assess systems that require such high reliability. For example, software design often involves a repetitive cycle of testing and repair, in which the program is tested until it fails. Testing resumes after the cause of failure is determined and the fault repaired.

But it generally takes longer and longer to find and remove each successive fault. To establish that a complicated computer program presents minimal risk would require years, if not decades, of testing on the fastest computers available, Butler says.
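Although the article doesn't spell out the arithmetic, a standard back-of-envelope bound (our sketch, assuming failures arrive independently at a constant rate \lambda) shows the scale of the problem. If a program survives t hours of realistic testing without failing, the 95 percent upper confidence bound on its failure rate is

    \lambda \le \frac{-\ln(0.05)}{t} \approx \frac{3}{t}.

Driving that bound below the required 10^{-9} per hour therefore demands t \ge 3 \times 10^{9} failure-free hours, on the order of 340,000 years on a single processor; even a thousand machines testing in parallel would need centuries.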

In an attempt to reduce the risk of failure, computer-system designers sometimes use multiple versions of a program, written by different teams, to perform certain functions. The idea is that although each version may contain flaws, it's highly unlikely that all or even a majority of the programs would contain the same error. However, experiments have shown that computer programs independently written to do the same thing often contain surprisingly similar mistakes.
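To make the voting idea concrete, here is a minimal Python sketch (ours, not from the paper); the three "versions" and their fault are hypothetical stand-ins for programs written by independent teams:

    # Minimal sketch of N-version programming with majority voting.
    # version_a, version_b and version_c are hypothetical stand-ins for
    # independently written implementations of the same specification.
    from collections import Counter

    def version_a(x):
        return x * x                          # correct

    def version_b(x):
        return x ** 2                         # independently written, also correct

    def version_c(x):
        return x * x if x >= 0 else -(x * x)  # faulty for negative inputs

    def vote(x):
        """Return the answer a majority of the versions agree on."""
        results = [version_a(x), version_b(x), version_c(x)]
        answer, count = Counter(results).most_common(1)[0]
        if count >= 2:   # a single faulty version is outvoted
            return answer
        raise RuntimeError("no majority: all versions disagree")

    print(vote(-3))  # prints 9; the fault in version_c is masked

The scheme works only as long as faults stay uncorrelated: if two of the teams had made the same mistake with negative inputs, the wrong answer would win the vote, which is precisely the failure mode the experiments revealed.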

Many computer experts at last week's meeting pointed to these findings as evidence that limits should be placed on the complexity of computer programs that go into life-critical applications. "Do we want to run with systems that are not as demonstrably safe as we say they are ... when we cannot demonstrate ultra-reliability before deployment?" asks Martyn Thomas of Praxis plc, in Bath, England.

"We should build only those systems that rely on software to a degree that can be assessed," contends Bev Littlewood of City University in London, England. That means accepting a higher risk or building simpler computer systems.

A few remain optimistic. "Maybe we're being a lot more demanding than we need to be," says John D. Musa of AT&T Bell Laboratories in Murray Hill, N.J. "There are risks in everything we do in engineering."

He adds that software developers have a variety of tools and techniques that can help them deliver - if not assess - highly reliable systems.
COPYRIGHT 1991 Science Service, Inc.

Article Details
Author: Ivars Peterson
Publication: Science News
Date: Dec 14, 1991

