Ensuring reliability of customized software in distributed systems.
IT professionals already have the standardized tools necessary to manage hardware systems, networks and databases, but they still can't manage the behavior of the most complicated and important link in the processing chain--the customized software applications themselves. Until now, that is. In this article, we will detail a viable solution: an emerging technology called application behavior management (ABM).
Customized software applications automate core business processes and, through their uniqueness and value, define and differentiate the companies that create them. When these mission-critical applications fail, the results can be catastrophic, including loss of revenue and clients and damage to reputation and shareholder value--not to mention the resource-draining efforts involved in damage assessment, post-outage diagnosis and analysis, and application recovery. Moreover, for the inevitable eventuality of application failures, there are contingency costs to consider for redundant systems, standby personnel, etc.
Unfortunately, the rapid introduction of new distributed system frameworks is making the problem even more acute and the solutions to correct it more complex than in client/server or mainframe environments. That's because all the binding between software components happens at runtime. This makes distributed systems extremely difficult to debug, and a company exposes important automated segments of its enterprise to trusted partners whose data can bring its system down. When these systems are integrated, there's no way to know everything that can happen, and it's progressively harder to fully and realistically test an application's custom code before it goes into production. In essence, the software development lifecycle extends into early production cycles, with customers often experiencing performance problems and application defects.
When a customized software application does start behaving abnormally, how do we pinpoint the source of the problem and reduce downtime? Some companies have developed their own solutions. For the most part, these non-standard methods address problems only as they occur, the cost of implementing them is typically high, and the returns diminish over time.
Moreover, an application's availability and performance need to be monitored, verified and reported on constantly. The primary means of determining availability parameters for production applications has been to rely on external simulation testing, typically using ghost transactions with various levels of sophistication.
This technique does supply some useful information and is a good fit for simply verifying acceptable response levels. Unfortunately, an external view of an application produces an incomplete picture of its overall behavior in production. One reason is a lack of data granularity, which makes it impossible to clearly and accurately identify the cause of bottlenecks when behavior issues are detected. Another reason is that ghost transactions generally need to stop short of exercising certain production-oriented services used by the application system, e.g., dummy credit card transactions. In other words, looking at application system behavior "from the outside in" yields a view that is both incomplete and, in places, inaccurate.
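To make the external-monitoring approach concrete, here is a minimal sketch of a ghost-transaction probe. The names and thresholds are hypothetical; the point is that an external probe can only measure end-to-end response time and success or failure--it sees nothing of the application's internal behavior.

```python
import time

def ghost_probe(transaction, threshold_s=2.0):
    """Run a synthetic (ghost) transaction and time it from the outside.

    Returns (ok, elapsed): ok is False if the transaction raised an
    exception or exceeded the acceptable response threshold.  Note that
    the probe cannot tell *why* a transaction was slow -- that is the
    granularity gap described above.
    """
    start = time.monotonic()
    try:
        transaction()  # e.g. a dummy login or dummy credit-card check
    except Exception:
        return False, time.monotonic() - start
    elapsed = time.monotonic() - start
    return elapsed <= threshold_s, elapsed

# Usage: probe a stand-in transaction that takes ~10 ms
ok, elapsed = ghost_probe(lambda: time.sleep(0.01), threshold_s=1.0)
```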
The Complete View, The Correct Data
If we could see into the inner workings of any software application, we could monitor in detail how it is behaving in production, just as we already have insight into the operating system, network and database. For example:
* Which parts of the system are performing abnormally slowly?
* What are the overall system activity levels at a given time of day?
* How long is it taking to settle my credit card transactions through each of my two banks?
* How many transactions is each bank processing at various times of the day?
If access to this kind of information were available to us in a production application environment, the resulting data could be put to powerful and valuable use to verify acceptable performance; recognize and pinpoint problems as, or even before, they occur; send notification of problems to the appropriate specialists; and initiate automated pre-configured tasks to take remedial action.
This is where application behavior management comes in, providing that information access. ABM technology involves collecting and analyzing specific internal performance information and using that data intelligently, acting upon changing conditions to prevent costly failures. But who or what determines that a specific software process is behaving acceptably at all times? Is potential failure assessment a black-and-white issue, or does it depend on who is asking the questions and when? What facilities are currently provided to proactively minimize the failure of software systems and to manage them when they do fail? These are some of the problem areas associated with ABM.
To be truly effective, an ABM solution must provide its users with the capability to:
* Produce and expose application behavior data with various degrees of granularity;
* Process this data to determine the acceptability level of application behavior in real time, as well as maintain database statistics for later reporting and analysis;
* Provide intuitive and flexible management capabilities to respond to changes in application behavior, both proactively and reactively.
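The three capabilities above can be sketched in a few lines. This is an illustrative, in-process collector, not any particular vendor's implementation; the class and method names are assumptions made for the example. It produces per-operation behavior data, evaluates acceptability in real time, and keeps rolled-up statistics for later reporting.

```python
import time
from collections import defaultdict
from statistics import mean

class BehaviorMonitor:
    """Minimal ABM-style collector (illustrative sketch only)."""

    def __init__(self):
        self.samples = defaultdict(list)   # raw, granular behavior data
        self.thresholds = {}

    def watch(self, name, threshold_s):
        """Decorator that instruments a function, recording its latency."""
        self.thresholds[name] = threshold_s
        def decorator(fn):
            def wrapper(*args, **kwargs):
                start = time.monotonic()
                try:
                    return fn(*args, **kwargs)
                finally:
                    self.samples[name].append(time.monotonic() - start)
            return wrapper
        return decorator

    def acceptable(self, name):
        """Real-time check: is the latest observation within threshold?"""
        return self.samples[name][-1] <= self.thresholds[name]

    def report(self, name):
        """Rolled-up statistics for later reporting and analysis."""
        data = self.samples[name]
        return {"count": len(data), "mean_s": mean(data), "max_s": max(data)}

# Usage: instrument a hypothetical settlement call
monitor = BehaviorMonitor()

@monitor.watch("settle", threshold_s=1.0)
def settle():
    time.sleep(0.005)   # stand-in for the real third-party call

settle()
```

A production ABM platform would additionally ship these samples off-process so that, as the next paragraph notes, collection never burdens the managed application itself.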
Implementing ABM requires a standardized platform for the collection, distribution and management of application behavior data, but the ABM capabilities must not be allowed to burden the application; to manage application behavior, the management process itself has to stay out of the way of the managed process and not impact performance. In addition, ABM's problem detection and resolution capabilities must be provided automatically and remotely, without incurring severe security risks by opening up the applications from the inside out. ABM also needs to generate, collect and make available an appropriate amount of information without producing information overload. Finally, it must provide the required management capabilities to roll up component-level behavior information into custom views that are intuitive to the various members of the organization involved with the reliability of its customized, mission-critical applications.
ABM in Action
In the following case study, a car company offers an online service for financing vehicle leases and loans. The financial model is based on usage from three main areas: the car dealer's staff, shoppers who are online at the dealer's location, and customers online via the Internet.
The system depends completely on third-party service offerings in order to successfully complete a transaction. In the past, this was accomplished with a dedicated line linking the systems between the companies. Monitoring and analyzing the traffic on the dedicated line provided sufficient indication of the operability and availability of the third-party vendor's service. This one-to-one approach is no longer viable, given today's use of the Internet as a distributed services platform for transactions that span multiple companies in a one-to-many fashion.
Accordingly, the business needed a reliable indicator of whether the third party involved in the distributed transaction was continually providing adequate quality of service. When it was not, problem identification in this scenario was challenging, delaying the start of remediation. Yet the remedial actions are vital and need to begin immediately: notifying the third-party vendor; notifying the customer relationship management (CRM) application so customer care can respond to customer calls more effectively; and dynamically modifying the interface on the public site to minimize user frustration while the situation is being rectified.
To support its needs for application behavior monitoring and management, the car company deployed an ABM platform, enabling it to see into the internal execution of the application in production. Using the data supplied by the ABM platform, a baseline was determined for acceptable application behavior. A clear understanding of what constituted acceptable behavior based on hard data, combined with the ABM ability to monitor application behavior in real time, yielded a solution that provided accurate and dynamic responses to potentially problematic conditions. The line-of-business owner was provided with a secure view to a single intuitive indicator or gauge, which showed the level of service being obtained from the third party as it related to this critical transaction. Additionally, network managers and developers now had a facility to drill down to get accurate, detailed, real-time information about the source of the problem.
As a result of deploying an ABM platform, the car company can now determine in real time when the third-party service they depend on is responding too slowly and automate a programmed response. Moreover, requests can be automatically diverted to the CRM application for online support while an acceptable level of service from the third-party is being reestablished.
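The baseline-and-divert logic described in the case study can be sketched as follows. This is an assumption-laden illustration, not the car company's actual system: the baseline here is simply the observed mean latency plus a multiple of its standard deviation, and the service names are hypothetical.

```python
from statistics import mean, stdev

def baseline_threshold(samples, k=3.0):
    """Derive an acceptable-latency ceiling from observed production data:
    mean plus k standard deviations (a common, simple baseline rule)."""
    return mean(samples) + k * stdev(samples)

def route_request(latest_latency, threshold, primary, fallback):
    """Divert to the fallback (e.g. the CRM support path) when the
    third-party service exceeds its behavioral baseline."""
    return fallback if latest_latency > threshold else primary

# Hypothetical settled-transaction times (seconds) observed in production
history = [0.8, 0.9, 1.0, 0.85, 0.95]
limit = baseline_threshold(history)

# A 5-second response is well outside the baseline, so the request
# is diverted to the CRM application for online support.
handler = route_request(5.0, limit, "third_party", "crm_fallback")
```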
The beauty of distributed system frameworks is that they give us easy access to the power of networked computer systems without exposing us to the underlying complexity. The down side is that, in a distributed system, the execution path of any request made to a software application is inherently complex because of its non-centralized architecture. When breakdowns occur in production, this complexity increases the cost in time and money of finding the source of the problem and fixing it. ABM technology offers a high-payback solution that provides fine-grained visibility into the running of critical applications, automatically determines acceptable behavior, and automatically reacts to deviations from it.
Oded Noy is chief technology officer for PATH Communications, Inc. (Marina Del Rey, Calif.)
Publication: Computer Technology Review, Jul 1, 2003