
Phone glitches and other computer faults.

The widespread disruption of local telephone service in three separate areas last week serves as a dramatic example of what can unexpectedly go wrong in complex, computer-controlled systems. These incidents demonstrate how a seemingly minor, isolated fault can trigger a massive system failure when system designers, developers and operators fail to anticipate all of the problems that can possibly occur.

In some instances, a few minor problems can add up to a huge disaster, says computer scientist Peter G. Neumann of SRI International in Menlo Park, Calif. "We may think we are dealing with independent problems, but a confluence of unrelated or seemingly unrelated events . . . may cause the system to behave in an unanticipated way."

Neumann presented his nominations for the "computer-related risk of the year" at last week's conference on computer assurance (COMPASS '91), held at the National Institute of Standards and Technology (NIST) in Gaithersburg, Md.

As an example, Neumann cited the Feb. 25 failure of a Patriot missile battery to track and intercept an Iraqi-launched Scud missile, which subsequently struck a warehouse used as a barracks for U.S. forces at Dhahran, Saudi Arabia. The fault was traced to a 0.36-second error in the timing of a software-driven clock used for tracking the incoming missile. That error was large enough to prevent the system's radar from locking on to its target.

But several other factors exacerbated the problem. The original specifications for the Patriot were based on the assumption that the system would operate continuously for no more than 14 hours before being shut down for maintenance. With that in mind, its computer programmers decided to use a set of equations for the tracking clock that happened to produce answers with an error of a millionth of a second per second. Such an error would normally pose no serious difficulties--if the system were shut down and reset after 14 hours.

But the crew operating the Dhahran battery didn't know about the accumulated clock error. By the time the Scud missile appeared, they had been running the system for 100 hours without any discernible problems. By then, however, the clock error had grown large enough to make a difference, especially for tracking a missile traveling at six times the speed of sound.
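
The arithmetic behind these figures can be checked with a short sketch. It uses only the drift rate and running times quoted above; the speed of sound (343 meters per second, roughly the sea-level value) and the function names are assumptions introduced for illustration, not details from the Patriot software.

```python
# Drift rate quoted in the article: one millionth of a second per second.
DRIFT_PER_SECOND = 1e-6

# Assumption (not from the article): speed of sound near sea level, in m/s.
SPEED_OF_SOUND_M_S = 343.0


def accumulated_clock_error(hours_running: float) -> float:
    """Return the accumulated clock error, in seconds, after the given uptime."""
    return hours_running * 3600 * DRIFT_PER_SECOND


def tracking_offset_m(clock_error_s: float, mach: float = 6.0) -> float:
    """Return how far, in meters, a target at the given Mach number
    travels during a time interval equal to the clock error."""
    return clock_error_s * mach * SPEED_OF_SOUND_M_S


if __name__ == "__main__":
    # Compare the design limit (14 hours) with the Dhahran uptime (100 hours).
    for hours in (14, 100):
        err = accumulated_clock_error(hours)
        offset = tracking_offset_m(err)
        print(f"{hours:>3} h: clock error {err:.4f} s, target moves {offset:.0f} m")
```

At 100 hours the accumulated error works out to 0.36 second, matching the figure traced by investigators, while at the 14-hour design limit it is only about 0.05 second. Under the assumed speed of sound, a missile at six times that speed covers roughly 740 meters in 0.36 second, which suggests why the radar could no longer lock on.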

Ironically, U.S. Army personnel had identified the timing problem a week earlier and produced a corrected software cassette. The new cassette, delivered first to Patriot batteries close to the Iraqi border, didn't arrive in Dhahran until the day after the Scud attack. Meanwhile, no one had warned the Patriot crew of the fault.

"A combination of circumstances led to the deadly results," Neumann says. "A lot of the things that we think of as multiple events are actually correlated in some hidden way that nobody recognizes."

Last week's disruption of telephone service in Washington, D.C., and neighboring states apparently started in a single, faulty circuit board at a computer facility in Baltimore, one of four centers used by the Chesapeake & Potomac Telephone Companies to route local calls. By itself, the circuit-board failure wasn't particularly serious, but it somehow triggered a catastrophic response in the computer software running the system.

Similar service disruptions also occurred on the same day in the Los Angeles area and a few days later in Pittsburgh and western Pennsylvania. In all three cases, the local telephone systems failed to cope with the effects of suddenly overloaded telephone lines.

Although no one has yet identified the precise causes of these telephone-system collapses, investigators are centering their attention on possible software glitches in Signaling System 7, a recently installed, sophisticated communications system for telling telephone switches over which routes to send calls. AT&T's long-distance telephone network has an equivalent but unrelated communications system for routing calls between switches in different parts of the country (SN: 2/16/91, p.104).

Some computer experts suggest that even more instances of telephone-system disruptions and other computer-related failures will occur in the future. "The number of problems is increasing," says Dolores R. Wallace of NIST. "We know there are a lot of things out there waiting to happen."

Neumann agrees. "We can't seem to learn the lessons well enough to avoid running into the same problems in the future," he says. "Things are getting worse faster."

One difficulty is that computer-system designers and software engineers aren't using many of the innovative techniques now available to improve system reliability and avoid software errors. "I don't think it has anything to do with a lack of desire to do the best job possible," Wallace says. "They just don't understand what they need to do the best possible job."

"We certainly need a lot more research into technology to find good ways of doing things," she adds, "but equally important is taking what we have and getting it into practice."
COPYRIGHT 1991 Science Service, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 1991, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Title Annotation:analysis of computer failures in telephone systems and in the Patriot missile
Author:Peterson, Ivars
Publication:Science News
Date:Jul 6, 1991