Keep .NET applications trouble-free: global consulting and actuarial firm uses end-to-end application monitoring to manage code.
Milliman's recordkeeping development and operations teams have built and rolled out more than a dozen different Web-based applications under the leadership of Craig Burma, the company's director of benefits technology. "We wrote several million lines of code," Burma says, "combining Visual Basic, C#, .NET and Delphi in a dozen Web sites and Web services on the Microsoft Windows Server 2003 platform.
"But managing and maintaining these applications was becoming a major challenge," he adds. One .NET production issue, for example, remained unresolved despite more than a year of research and in excess of $20,000 spent in staff time. Even calls to Microsoft technical support did not solve the problem. So Burma tasked Ryan Miller, a Milliman senior systems administrator, with finding a suite of tools to help the operations team solve this issue and better manage the .NET applications overall.
Milliman already had a variety of products to help manage its data center and applications. These tools provided broad operational monitoring and troubleshooting, but they lacked three critical application-level capabilities needed to troubleshoot application-coding issues: completeness, traceability and .NET details.
Milliman's existing tools provided some pieces to the puzzle of its production issue. One provided approximations of actual performance, but did not monitor the actual user click-through interactions with the applications. Another provided device and component availability, but no insight into application-level stresses on the servers. "The existing tools provided a glance but did not provide the whole picture," says Miller.
TRACING TO ROOT CAUSE
Simply detecting an issue was insufficient for Milliman. Given the overwhelming complexity of a growing code base and limited IT and development resources, Miller needed tools to pinpoint the specific trail of performance and availability issues. He needed a tool that could trace his .NET performance issues to their root cause and provide actionable data to developers.
Milliman's data-intensive applications rely on .NET to do most of the heavy transaction management, leaving the SQL servers streamlined to select, update, insert and delete data. "Most of the performance issues arose from within the .NET stack," Miller notes. "Our existing tools didn't provide the details from within .NET needed to make swift and accurate diagnoses. We needed to know the stack calls within the troubled applications."
Miller also assembled four additional criteria to apply during the selection process. With IT resources already stretched, he first needed a solution he could deploy quickly. He also required a solution that was easy to use and configure and that added minimal workload, which was especially critical given the pace of application development.
With Milliman's systems responsible for reconciling transactions for more than $8 billion in assets, Miller also knew that monitoring all transactions was essential. This comprehensive monitoring would allow Milliman's IT group to pinpoint specific errors from within the application servers, and eliminate the possibility of being blind-sided by customers on an issue.
Finally, because the applications being managed were so complex, Milliman's IT team needed a better way to communicate issues to Milliman's developers. Miller wanted reports on application performance to show developers where and what needed to be optimized.
Armed with these criteria, Miller conducted an extensive search via Internet, Web conferences and in-person meetings with vendors for a tool to meet his specific needs. After reviewing all the options, including the offerings from his existing vendors, he chose Symphoniq's TrueView product.
To prepare for the TrueView proof-of-concept trial, Milliman first set up a Windows Server 2003 server with an instance of SQL Server 2000. This server had to be accessible from Milliman's Web and .NET servers, since those servers would be pushing performance data to the centralized management server. Miller also installed Microsoft's free SQL Server 2000 Reporting Services module, since TrueView's reports use that Microsoft standard.
INSTALLATION IS EASY
Installation of the TrueView software took less than an hour. With Symphoniq personnel supervising via Web conference, Miller performed the installation himself. First, he installed the TrueView Management Server and Data Repository on the Windows Server 2003 box, followed by the TrueView Server Probes on Milliman's Web and .NET servers (running IIS 6 and .NET 1.1 on Windows Server 2003).
Once installed, the TrueView software began providing useful data almost immediately. "Within 15 minutes of deploying TrueView, I was able to drill down and identify the exact cause of our production issue," says Miller. "We got farther in 48 hours than we had in a year."
As it turned out, the .NET production issue related to a single method call in the .NET native code, out of the thousands of different method calls sprinkled throughout Milliman's applications. Armed with this information, Miller worked with developers and Microsoft to fix the issue.
Another issue was the amount of data being collected by TrueView's .NET tracing function. Because Milliman's application was complex and computationally intensive, it was using thousands of method calls and SQL queries to render a single Web page. TrueView recorded all of these calls and queries, and the resulting flow of data threatened to bog down the system and consume the available disk space. "The monitoring function itself was creating a production issue," offers Miller.
Again, Miller worked with Symphoniq to address these concerns, adding parameters that filter out routine data and retain only the elements of the .NET trace that are causing issues.
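The article does not describe TrueView's actual filter parameters, so the following Python sketch is only a rough illustration of the idea: keep the trace entries that are slow or that failed, and discard routine ones. The record fields, the sample call names and the 200 ms threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TraceEntry:
    """One recorded call; the field names are hypothetical."""
    name: str             # method or SQL query identifier
    duration_ms: float    # time spent in the call
    error: bool = False   # whether the call raised an error


def filter_trace(entries, slow_ms=200.0):
    """Keep only entries that failed or met the slow threshold."""
    return [e for e in entries if e.error or e.duration_ms >= slow_ms]


# Hypothetical trace for a single page render.
trace = [
    TraceEntry("Account.Load", 12.0),
    TraceEntry("Ledger.Reconcile", 950.0),
    TraceEntry("SELECT balance", 3.5),
    TraceEntry("Report.Render", 40.0, error=True),
]

kept = filter_trace(trace)
print([e.name for e in kept])  # ['Ledger.Reconcile', 'Report.Render']
```

Filtering at collection time, rather than after storage, is what would relieve the disk-space and throughput pressure described above, since routine entries are never written.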
Based on a successful trial period, Miller recommended a significant investment in the full TrueView platform to Burma. "It was a large five-figure deal that required a presentation to my oversight board of directors," Burma says. "But once I mentioned how TrueView identified our production issue in 15 minutes, the approval was quickly granted."
Today, Milliman uses the TrueView product as a vital part of its daily Web operations. Says Miller, "It works seamlessly to provide actual transaction metrics experienced by end-users, with detail that allows operational and development staff to analyze and troubleshoot our applications step-by-step."
For more information from Symphoniq: rsleads.com/609cn-255
RELATED ARTICLE: Monitor application performance.
by Fred Dumoulin
An effective monitoring strategy should zero in on Web performance problems from three directions: the user application experience, device availability and site availability. A three-pronged performance-monitoring strategy can improve problem identification and resolution. This comprehensive, multi-pronged approach enables IT managers to streamline error detection and resolution, while ensuring that most errors are fixed well before users are impacted.
Of particular interest is a new class of Web-monitoring technology called real-user monitoring (RUM). RUM's real-time view of each user's experience enables IT organizations to quickly point to causes of errors and to find or fix even transient brownouts by correlating seemingly unrelated error patterns.
RUM is important on three key levels. First, it lets operations teams find and isolate problems instantly, reducing mean time to repair and fixing problems before the phone rings. Second, it lets engineering teams track the effects of changes to content, applications, networks and infrastructure, reducing risk and providing change-management accountability. Third, it enables line-of-business and marketing teams to monitor customer quality of service, optimizing performance and availability while resolving service disputes and protecting revenue/renewals.
Site availability is best assessed by invoking synthetic testing solutions from various locations worldwide. Some organizations improperly extend this approach to assessing service levels and user satisfaction for individual transactions. That is likely to be ineffective for brownouts and transient errors, which occur only intermittently and affect a small percentage of users. Such errors are often related to the state of a real user's session, the timing of the request, navigation, the user's browser or the resource selected by a load balancer.
Monitoring the user application experience requires seeing and tracking pertinent details of every transaction, every page, every object and every problem encountered by each user, every day. RUM tools cull through and drill down deeply into this session-level data on a point-and-click basis to quickly and accurately assess performance levels, correlate common errors and isolate root causes of problems. By logging and time-stamping all pertinent transactions and error conditions in every user's session, RUM identifies problems and isolates their source. Web-management personnel can then drill down through the session to uncover root causes.
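As a simplified sketch of how time-stamped session records can be correlated to isolate a common cause, the Python example below groups requests by the back-end server that handled them and computes each server's error rate. The event layout, server names and timestamps are hypothetical and are not taken from any particular RUM product.

```python
from collections import Counter

# Hypothetical time-stamped events: (timestamp, session_id, backend, status)
events = [
    ("09:00:01", "s1", "web-a", 200),
    ("09:00:02", "s2", "web-b", 500),
    ("09:00:03", "s3", "web-a", 200),
    ("09:00:04", "s4", "web-b", 500),
    ("09:00:05", "s5", "web-c", 200),
    ("09:00:06", "s2", "web-b", 200),
]


def error_rate_by_backend(events):
    """Return {backend: fraction of its requests that ended in a 5xx status}."""
    total, errors = Counter(), Counter()
    for _timestamp, _session, backend, status in events:
        total[backend] += 1
        if status >= 500:
            errors[backend] += 1
    return {backend: errors[backend] / total[backend] for backend in total}


rates = error_rate_by_backend(events)
worst = max(rates, key=rates.get)
print(worst, round(rates[worst], 2))  # web-b 0.67
```

Here the errors seen by two different users trace back to one server behind the load balancer, the kind of pattern that only emerges when every session is logged.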
The best RUM systems detect early warning signs and evaluate overall performance against a service-level agreement (SLA). These and other capabilities improve production networks and the entire application life cycle, including development and deployment testing.
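One common way to evaluate measured performance against an SLA is to compare a high percentile of page load times with a target. The Python sketch below assumes a hypothetical SLA of "95 percent of pages load within 4 seconds" and uses made-up sample values; real SLA terms vary by organization.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_sla(load_times_s, target_s=4.0, pct=95):
    """True if the pct-th percentile load time is within the target."""
    return percentile(load_times_s, pct) <= target_s


# Hypothetical page load times in seconds; one user hit a brownout.
samples = [1.2, 0.9, 1.5, 2.1, 1.1, 0.8, 1.3, 6.4, 1.0, 1.4]

print(percentile(samples, 95), meets_sla(samples))  # 6.4 False
```

The median of these samples is only 1.2 seconds, so an average-based check would look healthy; the percentile check is what exposes the single slow session.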
Today's RUM tools also integrate with complementary monitoring and diagnostic tools for a more detailed, accurate and comprehensive view of Web site health and performance. They include application programming interfaces and other hooks to third-party tools and infrastructure. This enables RUM summary or individual user activity transcripts to be transported to data warehouses, for instance, or for RUM output to be streamed to business or operations dashboards in real time, extracted for archival purposes, or loaded into third-party or custom reporting tools or enterprise-wide data buses.
Employees accessing inappropriate Web sites will always be a concern to organizations of all sizes, but the growing presence of harmful applications on desktops and the use of instant messaging and peer-to-peer applications have also dramatically increased organizations' risks. According to a study by the Websense Resource Center:
* 90% of businesses suffered hacker attacks in the last year.
* 30% of peer-to-peer requests are for pornographic downloads.
* One in three corporations has spyware on its network.
* 45% of IT managers reported that viruses have infected their company's network.
* 70% of all Internet porn traffic occurs during the 9-to-5 work day.
Fred Dumoulin is a project manager for Coradiant, Poway, Calif.
For more information: rsleads.com/609cn-250