Simplify complexity: the solution for improved service levels and reduced risk.


Complexity drives up support effort, which raises operational risk and cost and degrades service levels. Industry research often attributes 80% of all unplanned downtime to people and process issues, such as performing an operation out of sequence. Yet most of the time we blame the failure to manage networks or systems on people and process, not on the complexity itself.

Two common ways to manage complexity are to improve Mean Time Between Failures (MTBF) and Mean Time to Repair (MTTR). Both are effective, but they also introduce more complexity into a system. For example, the hot-swappable disk improves MTTR but is internally far more complex than a non-swappable disk.
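The effect of the two measures can be made concrete with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below is illustrative only: the function name and the sample hours are assumptions chosen for the example, not measurements of any real disk.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a component is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: same failure rate, different repair times.
fixed_disk = availability(mtbf_hours=10_000, mttr_hours=8)    # power down, replace, restore
hot_swap = availability(mtbf_hours=10_000, mttr_hours=0.5)    # swap while running

print(f"fixed disk: {fixed_disk:.5f}")
print(f"hot swap:   {hot_swap:.5f}")
```

Shortening the repair time raises availability without touching the failure rate, which is exactly the gain the hot-swappable disk buys at the cost of internal complexity.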

The primary vehicle for attacking complexity is tools that automate tasks otherwise performed manually. These tools wrap technologies behind simple, well-defined human interfaces. For example, some tools automate data center operations tasks for Tier 1 support people. There are tremendous cost advantages to automating Tier 1 tasks because Tier 1 operations are generally staffed 24x7, so the cost justification for these "enterprise management tools" is amplified by the staffing costs of three shifts.

Not only do Tier 1 tools reduce the number of operational people, but they also reduce the skill level required, because the Tier 1 tools enforce strict operational processes in a workflow. For example, an alarm event occurs in the network and sends a trap to the Fault Management System's operational console. The Tier 1 console operator determines that the fault is a failed hardware component on the network and dispatches a repair person, opening a trouble ticket to track the event. The repair person receives the ticket, resolves the issue and closes the incident. The tasks performed by the console operator and repair person are clearly defined in a workflow that spans the Fault Management and Trouble Ticketing systems.
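One way to picture how a tool enforces such a workflow is a ticket that only accepts steps in the defined order. The following is a minimal sketch under stated assumptions: the state names, the `TroubleTicket` class and the `handle_trap` function are hypothetical and do not describe any particular fault management or ticketing product.

```python
from dataclasses import dataclass, field
from enum import Enum


class TicketState(Enum):
    OPEN = "open"
    DISPATCHED = "dispatched"
    RESOLVED = "resolved"
    CLOSED = "closed"


# The workflow: each state lists the only states that may follow it.
ALLOWED = {
    TicketState.OPEN: {TicketState.DISPATCHED},
    TicketState.DISPATCHED: {TicketState.RESOLVED},
    TicketState.RESOLVED: {TicketState.CLOSED},
    TicketState.CLOSED: set(),
}


@dataclass
class TroubleTicket:
    device: str
    fault: str
    state: TicketState = TicketState.OPEN
    history: list = field(default_factory=list)

    def advance(self, new_state: TicketState) -> None:
        """Move to new_state, rejecting any step that is out of sequence."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}"
            )
        self.history.append((self.state, new_state))
        self.state = new_state


def handle_trap(device: str, fault: str) -> TroubleTicket:
    """Console operator step: open a ticket for the trap and dispatch a repair."""
    ticket = TroubleTicket(device, fault)
    ticket.advance(TicketState.DISPATCHED)
    return ticket


# Hypothetical device and fault, walked through the full workflow.
ticket = handle_trap("switch-07", "power supply failed")
ticket.advance(TicketState.RESOLVED)   # repair person fixes the hardware
ticket.advance(TicketState.CLOSED)     # repair person closes the incident
```

Because the allowed transitions live in one table, an operator cannot close a ticket that was never resolved; the process is enforced by the tool rather than remembered by the person.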

Tier 1 tools automate operations while Tier 2 tools automate the operation of technology. In the simplest case, the Tier 2 tools automate the management of technology by simplifying the tasks performed by Tier 2 support persons, such as configuring the technology. If incorrectly configured, extremely complex technologies like operating systems, switches or databases are unreliable. Tier 2 tools make these complex technologies manageable by humans.

A second class of tools for managing complexity affects MTTR by using redundancy mechanisms to route traffic around failures. Load balancers are a perfect example, in which traffic is routed around a failed web server to maintain availability in spite of the inevitable failure of the underlying technology.
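The routing idea can be sketched in a few lines. This is a simplified round-robin balancer written for illustration; the class name, method names and server names are assumptions, and a real load balancer would also run its own health checks rather than being told which servers are down.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in turn, skipping any that are marked as failed."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._ring = cycle(self._servers)

    def mark_down(self, server: str) -> None:
        self._healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self) -> str:
        if not self._healthy:
            raise RuntimeError("no healthy servers available")
        # One full pass over the ring is enough to reach a healthy server.
        for _ in range(len(self._servers)):
            candidate = next(self._ring)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


# Hypothetical pool of three web servers, one of which has failed.
balancer = RoundRobinBalancer(["web1", "web2", "web3"])
balancer.mark_down("web2")
picks = [balancer.next_server() for _ in range(4)]
print(picks)
```

Requests keep flowing to the surviving servers while the failed one is repaired, so the user sees availability even though a component underneath has failed.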

Go back in time and compare Assembler language to Java, or MS-DOS to Windows. We can generalize that early technologies are simple in comparison to their modern counterparts, but many wouldn't know it by observing their crude controls. This is not only true for computing technology; this is true for all technology. Besides the manual adjustments for film advance, focus, aperture and shutter speeds, early photographers were also the director, artist, producer and developer of their own creations. In stark contrast, with the modern "point and click" camera even a child can take a great picture. During the exploratory stage, the controls for a technology are crude because the technology has not yet found its niche.

As technologies like cars, computers or cameras become widely adopted, the human interfaces become simpler but the technology "under the hood" gets far more complex. This is the first rule of simplification: the price tag for simplifying an interface is increasing the complexity behind the scenes. Tier 1 tools hide the complexity of the data center. An Ethernet port hides the complexity of the Internet just as an automotive ignition key is simpler to use, yet internally far more complex than its predecessor the crank-start.

Today's automobiles can practically think for themselves, adjusting fuel, brakes and traction to compensate for changes in road conditions. If the family car breaks down today, Dad will call AAA, or even click OnStar. Pervasive technologies are designed to be operated by anyone, but the price for this utility is that experts must service them. It takes a small army of experts in most IT departments to manage the corporate computing assets and desktops and keep them humming along without interruptions.

Early technology requires experts. When a technology "crosses the chasm" (Crossing the Chasm: Marketing and Selling High-Tech Products to Mainstream Customers by Geoffrey A. Moore, Regis McKenna) and becomes pervasive, the human interface is simplified while it encapsulates still greater complexity. Experts are then required to service the technology. TV repair, film development, VCR rental, gas stations and IT departments are examples of experts who offer services that support the automation of simplified technologies.

There is no greater simplification in our society than utilities like power, phone or water. Utilities mask tremendous complexity behind extremely simple interfaces like the light switch, telephone receiver or water faucet. As we become addicted to these simple human interfaces, we also put our blind faith in the underlying complexity. One of the best examples of this occurred in 2003 when a power blackout temporarily postponed the Information Age for 50 million people in the northeastern United States. Utility disruptions resulting from natural disasters, technology failures or criminal acts can threaten our way of life.

As we become dependent on technology we become vulnerable. Our once personal computers are anything but personal anymore. They need to be protected by firewalls, virus scanners, anti-adware software, and pop-up blockers. These protection schemes are a stark reminder that there is a darker side to becoming part of the largest and most complex man-made machine on the planet.

What about the impact of all this complexity on people? Let's go back to our earlier reference: industry research has suggested that 80% of all unplanned downtime can be attributed to people and process issues and only 20% is caused by technology failures. The Information Technology industry has a disproportionate number of automated solutions for addressing the 20% problem, and so we blame the rest of the unplanned downtime on people.

This scene repeats itself thousands of times per night: the Tier 1 operational support person detects a fault and awakens a Tier 2 support expert to resolve it, and that expert must now make a choice:

The firefighter can meet Service Level Agreements and get back to bed by just rebooting the environment to restore service.

The debugger can risk the Service Level Agreements and try to debug the problem so that he might get a full night's sleep tomorrow.

A firefighter is a hero when he restores service without impacting the user. On the other hand, no one ever says to the debugger, "Wow, it's been a year since we've seen that problem." Restoring service quickly is recognized and rewarded, frequently at the expense of long-term stability. Companies spend millions of dollars on high availability solutions to be able to tolerate a failure of technology, but solutions that help people and process are sorely lacking.

Why? We know that we should spend more time preparing and planning, but we don't. We would rather spend time in "firefighter" mode, reacting to urgent and even non-important things. Stephen Covey best represented this concept in his book "First Things First", explaining that we tend to be driven by a sense of urgency. Tools that help us prioritize our activities and use time more effectively are sorely lacking because the demand for them is so low. We place less value on avoiding a problem and reward "firefighting".

If the people in Tier 2 support were proactively keeping better control of the infrastructure, they would synchronize all machines performing similar functions with the same patch levels, tunable parameters, hardware configurations and/or workloads, creating standard configurations for given functions. In doing this, they can leverage commonality to manage and understand problems. For example, vendors are notorious for pushing a patch as the aspirin of infrastructure stability. In a standardized environment they can push back and seek an explanation as to why the other 79 machines are not experiencing the exact same problem.

Policy-based management is the name of the solution space that minimizes choices when simplifying complex technologies. When a policy engine detects state changes to a saved baseline, it triggers activities to identify and resolve the issue, returning the system to a known state. This eliminates variability, eliminates choices, and ultimately reduces complexity. The human interface to the technology is simplified because the variable controls (such as specific configuration parameters) are in essence "locked" into a standard position, making the complex technology more manageable.
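The detect-and-restore loop of a policy engine can be sketched as a comparison against a saved baseline. This is a minimal illustration, not any vendor's implementation: the function names, the setting names and their values are all hypothetical.

```python
def find_drift(baseline: dict, current: dict) -> dict:
    """Return {setting: (expected, actual)} for each setting that left the baseline."""
    drift = {}
    for key, expected in baseline.items():
        actual = current.get(key)  # None if the setting has gone missing
        if actual != expected:
            drift[key] = (expected, actual)
    return drift


def enforce(baseline: dict, current: dict) -> dict:
    """Return a copy of current with every drifted setting reset to the baseline."""
    fixed = dict(current)
    for key, (expected, _actual) in find_drift(baseline, current).items():
        fixed[key] = expected
    return fixed


# Hypothetical standard configuration for one class of server.
baseline = {"patch_level": "2004-11", "max_connections": 512, "swap_gb": 4}

# Hypothetical machine state after someone hand-tuned a parameter.
current = {"patch_level": "2004-11", "max_connections": 2048, "swap_gb": 4}

drift = find_drift(baseline, current)
print(drift)
restored = enforce(baseline, current)
```

Settings that the baseline does not mention are left alone in this sketch; only the "locked" parameters are pulled back to their standard position.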

If organizations are willing to fight the urge to be a "firefighter", there are companies that provide bona fide solutions that simplify complexity, providing policy-based management solutions that create and implement standards. These solutions lower cost, improve service levels and reduce business risk for both security and compliance.

Dave Nocera is chief technology officer at Innovativ Systems Design (Edison, NJ).

www.innovativ.com
COPYRIGHT 2004 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Business of Technology
Author:Nocera, Dave
Publication:Computer Technology Review
Date:Dec 1, 2004
Words:1498