
Analyzing multiple component failures in a system; an analysis of each failed component--and knowing how components fail--can help uncover initial failure.


When a product fails due to a single component, determining why the component failed is relatively easy. You examine the component specifications and application in the product to be sure that the component was not being misapplied and that its maximum stress levels were not being exceeded.

However, when multiple components fail simultaneously, determining the root cause is not so easy. It becomes necessary to postulate either a single point of failure that causes the rest of the components to fail simultaneously, or a series of component failures that still permits the system to operate until a final failure causes the product to quit.

This article deals with the analysis of the failure of a switching power supply on a computer motherboard. The failing components were a combination of capacitors, MOSFET transistors, a power supply controller integrated circuit (IC) and traces on the printed wiring board (PWB).

The power supply is used to reduce +5V to lower voltages necessary for the central processing unit (CPU). The system shown in Figure 1 consists of a control IC, two MOSFET transistors that act as current switches, two inductors and three aluminum electrolytic capacitors connected in parallel on the input and five parallel aluminum electrolytic capacitors on the output side of the supply. The capacitors are connected in parallel to reduce the overall equivalent series resistance (ESR) of the capacitors as well as to increase the capacitance.
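The effect of paralleling the capacitors can be sketched with a short calculation. This is a minimal illustration that assumes identical capacitors; the function name and the part values (1000 uF, 0.06 ohm) are hypothetical and are not taken from the article. Only the counts of three input and five output capacitors come from the text.

```python
def parallel_bank(capacitance_uf, esr_ohm, count):
    """Return (total capacitance in uF, effective ESR in ohms)
    for `count` identical capacitors connected in parallel."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Capacitances add in parallel; equal ESRs divide by the count.
    return capacitance_uf * count, esr_ohm / count


# Hypothetical part values, chosen only for illustration.
CAP_UF = 1000.0
ESR_OHM = 0.06

input_bank = parallel_bank(CAP_UF, ESR_OHM, 3)   # three input capacitors
output_bank = parallel_bank(CAP_UF, ESR_OHM, 5)  # five output capacitors

print("input bank (uF, ohm):", input_bank)
print("output bank (uF, ohm):", output_bank)
```

Under these assumptions the input bank has three times the capacitance and one third the ESR of a single part, which is why losing even one capacitor changes the stress on the rest.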

The field failure rate of the computer was between 30% and 40%. The motherboards usually failed after several months of use. Initially, we relied on the judgment of the manufacturer, who proposed a failure mechanism based on an earlier problem that had been noted. The manufacturer suggested that a grounding clip between the backside of the motherboard near the CPU socket and the case had not been installed correctly, which caused the clip to pop off and, in some cases, to short the motherboard to the case. Months were lost discussing this failure mechanism and how to fix it.

In retrospect, no one seemed to notice obvious problems. Our product was stationary and secluded. At the time of failure, nothing was around to bump the machine to cause the clip to fall off or to move a loose clip lying in the bottom of the case. In any case, a loose clip would produce a series of random and sometimes intermittent failures by shorting a signal line. All of our failures, however, were catastrophic in nature--drop dead failures.

[FIGURE 1 OMITTED]

When we examined a set of failed motherboards, we realized immediately that the clip theory was not appropriate. We found that the aluminum electrolytic capacitors had bulged and ruptured, leaking their electrolyte. We also found that some of the MOSFETs had shorted, that the controller IC output MOSFET drivers were damaged and that about half the PWBs had damaged traces that supplied current to the MOSFETs.

The clip short theory would explain the presence of a fused open or damaged trace seen on some of the motherboards (Figure 2). However, the theory would not explain the bulged and ruptured capacitors (Figure 3) nor the shorted MOSFETs and controller ICs.

[FIGURE 2 OMITTED]

The capacitors had failed because they overheated, causing excessive pressure to build up inside them. In essence, the electrolyte had boiled. A short external to the capacitor would simply have discharged the capacitors one time before something else failed. The capacitors would not have heated excessively by this one-time occurrence.

No thermal damage occurred to the PWB, except directly beneath the fused trace, to indicate that the capacitors had been heated externally. These capacitors were rated at 105°C and could operate at temperatures as high as 125°C, although with a greatly reduced lifetime. An external temperature source great enough to heat these capacitors to this level would have left thermal damage on the other components in the vicinity.

Failure Scenario

We determined that the failure scenario occurred in this manner:

1. One of the three input capacitors failed due to internal overheating. This event would occur slowly, and the capacitor would leak and eventually become an open circuit after several months' use.

2. The second and third capacitors then failed within a few weeks due to the extra loading they now experienced. The computer would still be functional at this time, but the individual components in the supply circuitry would be stressed beyond their operating limits. We found one system in a lab at our facility that was still operating with all but one input capacitor failed.

3. Finally, all the input capacitors failed. As the input capacitors failed, the AC ripple voltage riding on the DC voltage supplied to the MOSFET switching transistors increased. To compensate for the varying voltage, the high side switching MOSFET conducted greater amounts of current, which placed an excessive load on it and caused it to overheat and fail. MOSFET transistors failing in this manner usually fail with a drain to source short.

[FIGURE 3 OMITTED]

4. When the high side transistor failed in the shorted condition, the +5V from the supply was applied to the output inductor continuously, which caused the output voltage to rise. The control IC tried to reduce this voltage by dumping charge to ground through the low side MOSFET. However, when the low side MOSFET was turned on, a +5V to ground short occurred through both MOSFETs. Normally, this directly shorted condition is prevented by the control IC, which cannot turn on both MOSFETs simultaneously, but the IC cannot detect a short in the high side MOSFET.

5. This power supply short did two things. First, it caused the voltage on the output capacitors to swing alternately from near +5V to 0V. This swing far exceeded their ripple voltage and current specifications, causing them to overheat and rupture within a few seconds. Second, it caused the power supply trace supplying current to the short to overheat and ultimately fuse open.

Steps 1 and 2 could, and did, take months to initiate the failure. Steps 3 and beyond could happen in the course of minutes or seconds. The customer, who relies on the operation of the computer to detect a problem, would not know a problem exists until the computer shuts down at Steps 4 or 5.
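The accelerating cascade in Steps 1 through 3 can be illustrated with a rough calculation of the heating in each surviving input capacitor. This sketch assumes that the total ripple current stays constant and divides equally among identical capacitors, which is a simplification; the function name and the numeric values are hypothetical and do not come from the article's measurements.

```python
def per_capacitor_heating(total_ripple_a, esr_ohm, working_caps):
    """Return (ripple current per capacitor in A, I^2 * ESR heating per
    capacitor in W), assuming equal current sharing among identical parts."""
    if working_caps < 1:
        raise ValueError("at least one working capacitor is required")
    current = total_ripple_a / working_caps
    return current, current ** 2 * esr_ohm


# Hypothetical operating point, chosen only for illustration.
TOTAL_RIPPLE_A = 3.0
ESR_OHM = 0.06

# Heating per capacitor with all three input capacitors working.
baseline = per_capacitor_heating(TOTAL_RIPPLE_A, ESR_OHM, 3)[1]

for working in (3, 2, 1):
    current, power = per_capacitor_heating(TOTAL_RIPPLE_A, ESR_OHM, working)
    print(f"{working} working: {current:.2f} A each, {power:.3f} W each, "
          f"{power / baseline:.2f}x baseline heating")
```

Because the heating scales with the square of the current, each survivor dissipates 2.25 times the baseline with two capacitors working and 9 times the baseline with one. This is consistent with the first failure taking months and the later ones following within weeks.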

This scenario covers all noted component failures. It also covers those PWBs in which not all the indicated components had failed. Every failed system had all of the input capacitors bulged and ruptured. In addition, the high side MOSFET had also failed 100% of the time, but the low side MOSFET did not always fail. In these cases the system shut itself down before this MOSFET could short. In some of these cases, the trace fused open before the low side MOSFET could fail. Not all output capacitors were found bulged or ruptured. In these cases, other components failed before the output capacitor could overheat.

Summary

The advice of a specially trained failure analyst is of tremendous value in examining failures. The designer knows his/her circuit and how it should operate when it is operating correctly. However, he/she does not have the knowledge to determine if a failed capacitor caused a MOSFET, for example, to fail or vice versa. Neither can the designer determine if a MOSFET failed due to a high voltage spike or an overcurrent condition. The failure analyst has the tools and knowledge at his/her disposal to determine these things based on subtle differences in the physical characteristics of the failed devices.

We made electrical and temperature measurements of the circuit under simulated failure conditions, with some or all of the input capacitors removed. These showed that the suppositions we made during the initial portion of the analysis were, indeed, correct. The temperature of the MOSFETs increased as the input capacitance was reduced and the ripple voltage increased.

Once we determined that the capacitors were the initiator of the failure, we set out to determine why they had failed. We examined them physically and found no evidence of punch-through or of reverse bias. We found nothing physically wrong with the foil electrodes or separator papers.

We performed a life test on them. We placed them in an oven under correct DC bias and at the maximum ripple current allowed to see if they met their manufacturer's specified lifetime. We found that these capacitors failed after only about 50% to 60% of the specified lifetime. About a year after these findings, the industry reported similar failures of low-ESR capacitors from several manufacturers. Investigation showed that a key component of the electrolyte used in these capacitors had been left out and that the electrolyte was breaking down prematurely, causing the pressure to rise within the capacitor and subsequent failure. (1) (2) (3)

Conclusion

Almost a year passed between the first few field failures and the point at which the capacitors were found to be at fault. During that time products continued to fail and were replaced at great cost. The supplier of the computer systems looked at failed units, but its staff was not trained in the analysis of component and systems failures. Relying only on the supplier's analysis delayed the correct solution.

To correctly understand the failure of an electronic system, the analyst must not only have a broad understanding of electrical engineering but also of physics, chemistry, thermal mechanics and metallurgy. The use of a trained failure analyst could have reduced the effects of the failures by determining the cause and corrective action quickly.

The use of a life test on the capacitors to verify the conclusions was also necessary because the failures were not immediate. Measurements of operating computers showed that the ratings of the various components were not being exceeded.

For more information, cell: Susan Jones 404/822-8900

References

1. Passive Component Industry Magazine, "Low-ESR Aluminum Electrolytic Failures Linked to Taiwanese Raw Material Problems," September/October 2002.

2. Passive Component Industry Magazine, Bettyann Liotta, "Taiwanese Cap Makers Deny Responsibility," November/December 2002.

3. IEEE Spectrum, Yu-Tzu Chiu (Taipei) and Samuel K. Moore, "Leaking Capacitors Muck up Motherboards," February 2003.

RELATED ARTICLE: Failure Modes in Electrolytic Capacitors

Aluminum electrolytic capacitors are basically two aluminum foil electrodes separated by strips of porous paper that are rolled together and placed in an aluminum canister. The canister is then filled with an electrolytic solution and sealed with a rubber plug. The aluminum electrodes are oxidized to produce a thin dielectric layer of aluminum oxide. The anode plate is formed by one of the aluminum foils. The cathode electrode is formed by the electrically conductive electrolyte. The second aluminum foil is used only to make electrical contact with the electrolyte.

The aluminum electrolytic capacitor has a built-in wear-out mechanism. All aluminum electrolytic capacitors will ultimately fail given enough time. The liquid electrolyte is sealed with a hard rubber plug that is crimped into the aluminum canister. Over time, the rubber will lose its elasticity and the electrolyte will then leak out. Since the electrolyte forms one of the electrodes, its loss results in an open capacitor.

Normally, this process happens so slowly that most capacitors will operate for over 10 years at room temperature. Also, a buildup in pressure within the canister will cause it to leak around the seal. This pressure can be caused by an increase in temperature of the capacitor. Under some failure conditions the electrolyte will break down, producing hydrogen and/or oxygen gas. In addition to causing a loss in the electrolyte by decomposition, these gases increase the pressure within the canister, causing it to leak through the seal.
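The dependence of capacitor life on temperature can be sketched with a widely used rule of thumb for aluminum electrolytic capacitors: expected life roughly doubles for every 10°C reduction in operating temperature below the rated temperature. This rule is not stated in the article, and it ignores self-heating from ripple current. The 105°C rating comes from the text; the function name and the rated life of 2,000 hours are assumptions made only for illustration.

```python
def estimated_life_hours(rated_life_h, rated_temp_c, operating_temp_c):
    """Estimate capacitor life with the 10-degree doubling rule of thumb.

    Life doubles for every 10 C below the rated temperature and halves
    for every 10 C above it. Ripple-current self-heating is ignored.
    """
    return rated_life_h * 2.0 ** ((rated_temp_c - operating_temp_c) / 10.0)


# Hypothetical rated life at the 105 C rating mentioned in the article.
RATED_LIFE_H = 2000.0
RATED_TEMP_C = 105.0

for temp_c in (125.0, 105.0, 85.0, 65.0):
    life = estimated_life_hours(RATED_LIFE_H, RATED_TEMP_C, temp_c)
    print(f"{temp_c:.0f} C: about {life:,.0f} hours")
```

Under these assumptions, running 20°C above the rating cuts the estimated life to a quarter, while running 40°C below it extends the estimate sixteenfold. The same rule lets a short, hot oven test stand in for years of normal use.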

The loss of electrolyte forms one failure mode in the capacitor. The other two failure modes are similar to each other and difficult to separate. The thin oxide covering the aluminum can be broken down by the application of too much voltage. This can occur if the maximum bias voltage is exceeded or at a much reduced voltage potential if the bias voltage is reversed. In cases of substantial DC current through the rupture, visual evidence of the damage on the electrodes occurs. Determining which polarity of voltage damaged the oxide is difficult, but an inspection of the placement of the capacitor in the circuit and a measurement of the combined DC and AC voltages can determine the source of the damage.

Mark Woolley is a research scientist, email: woolleym@avaya.com; and Jae Choi is manager, email: jaechoi@avaya.com--both with the Product Reliability Laboratory at Avaya, Westminster, CO.
COPYRIGHT 2003 UP Media Group, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Failure Analysis
Author:Choi, Jae
Publication:Circuits Assembly
Date:Dec 1, 2003
Words:2123