Benchmarking relevance?
Like many of the world's largest software vendors, database software giant Oracle uses product benchmarking results as the cornerstone of multimillion-dollar marketing campaigns. One of Oracle's recent advertisements compared the performance of its flagship 8i database with rival IBM's DB2 database: "Oracle runs SAP [applications] four times faster."
Oracle is hardly alone. BEA Systems gleaned publicity after boasting that its application server, WebLogic, could run four times faster than IBM's own application server, WebSphere, under IBM's own 'Trade2' online brokerage benchmark. BEA also says that under the Java 'Pet Store' benchmark, its product is over 50% faster than Oracle's 9i Application Server.
Effective marketing material, no doubt, but IT decision-makers need to ask themselves: to what extent should such claims of superiority be taken at face value?
Not at all, argues Ulrich Marquard, vice president of performance, benchmarking and data archiving at German software giant SAP. "In the Oracle case, our research shows the numbers Oracle compared [to DB2] were produced on hardware with an absolutely different number of CPUs [central processing units]," he says. In fact, Oracle's database was tested on a 128-way symmetric multiprocessing (SMP) computer architecture, while IBM's was tested on a 24-way SMP system.
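The effect of mismatched configurations can be illustrated with a naive per-processor normalisation. The sketch below is hypothetical: the function name and throughput figures are invented placeholders, not the published Oracle or IBM results, and dividing throughput by CPU count assumes perfectly linear scaling, which real SMP systems rarely achieve. It only shows how a headline ratio can shrink, or even reverse, once processor counts are taken into account.

```python
def per_cpu_throughput(throughput: float, cpus: int) -> float:
    """Naive throughput per processor; assumes perfectly linear SMP scaling."""
    if cpus <= 0:
        raise ValueError("cpus must be a positive integer")
    return throughput / cpus


# Hypothetical figures: system A posts 4x the raw throughput of system B,
# but on a 128-way machine versus B's 24-way machine.
a_throughput, a_cpus = 4000.0, 128
b_throughput, b_cpus = 1000.0, 24

raw_ratio = a_throughput / b_throughput
per_cpu_ratio = per_cpu_throughput(a_throughput, a_cpus) / per_cpu_throughput(b_throughput, b_cpus)

print(f"raw ratio:     {raw_ratio:.2f}x")
print(f"per-CPU ratio: {per_cpu_ratio:.2f}x")
```

With these made-up numbers, the 4.00x headline becomes 0.75x per processor. Neither figure settles the comparison, since scaling is not linear, but the gap shows why results from a 128-way and a 24-way system cannot be compared directly.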
Perhaps not surprisingly, technology industry analysts' reaction to the way suppliers use benchmarking data is scathing. "It is useless," says Ted Schadler, group director at Forrester Research's TechRankings business, which assesses technology product strengths and weaknesses across a broad set of subjective and objective categories.
Benchmarking, in itself, is not without its uses, of course. In the early 1990s, for example, software performance benchmarking data (generated by the independent Transaction Processing Performance Council, or TPC) was very helpful to IT decision-makers, says Susan Dallas, research director at analyst Gartner. When client-server systems were first introduced, organisations struggled to get all of the different components of an IT system to work together and to figure out its potential throughput and the source of bottlenecks, she explains.
But over the last few years, the validity of benchmarking has dwindled as vendors have come to see benchmarks as key marketing vehicles - even going as far as to build benchmarking optimisers into product code. So just how relevant are benchmarks in helping users make good software purchasing decisions?
Reality check
A major problem is that most benchmarks, by their very nature, are carried out under laboratory conditions. "Benchmarks are not the real world [of most enterprise IT architectures]," says Mark Brockbank, a consultant in the software group at IBM.
For example, application server software that is capable of processing large volumes of web transactions per second in a test environment may struggle to reproduce a fraction of that performance on an organisation's actual systems. "This could be down to an organisation using different hardware, but our research shows it is the design of the [organisation's own] application that determines 90% of the performance characteristics that they get in the end," adds IBM's Brockbank.
However, apart from generating performance data, analysts still see an important role for benchmarking within specific technology markets, particularly when applied to emerging technologies. This is certainly the case with security software, says Dallas at Gartner.
The NSS Group, for example, provides independent third-party testing and certification services for security products. In particular, NSS specialises in performance- and feature-oriented tests on firewalls, intrusion detection system products and public key infrastructure products. Bob Walder, director of NSS, acknowledges that "vendors will present their test results in a certain way that will make their product look better."
To support his argument, he contrasts how NSS and an intrusion detection system (IDS) software vendor, which he declines to name, benchmarked the vendor's software. NSS used a system to blast 148,000 64-byte packets of web traffic per second at the IDS software to see how many malicious attacks, such as denial-of-service packets, it recognised. This test focuses on pure performance, or "raw sniffing speed", says NSS Group's Walder.
The IDS software vendor, by contrast, used 1,514-byte packets instead of 64-byte packets to perform its own benchmarking tests. Dramatically increasing the packet size means that far fewer packets pass through the IDS software each second, leaving it fewer packets to inspect at the same level of network saturation. "This way, the vendor could say it achieved a detection rate of over 90% at 100% network saturation, but when we tested it [with 64-byte packets] it scored way, way lower than that, and in fact, the software crashed," says Walder.
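The effect of packet size on load can be checked with simple line-rate arithmetic. The sketch below assumes a 100 Mbit/s Fast Ethernet link (the article does not state the link speed), treats the quoted sizes as Ethernet frame sizes, and adds the standard 20 bytes of per-frame wire overhead (an 8-byte preamble plus a 12-byte inter-frame gap). The function and constant names are illustrative. Under those assumptions, the maximum rate for 64-byte frames comes out at roughly 148,800 packets per second, close to the figure NSS used.

```python
# Assumed per-frame overhead on the wire: 8-byte preamble + 12-byte inter-frame gap
WIRE_OVERHEAD_BYTES = 20

# Assumed link speed: 100 Mbit/s Fast Ethernet
FAST_ETHERNET_BPS = 100_000_000


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second a fully saturated link can carry."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


for size in (64, 1514):
    pps = packets_per_second(FAST_ETHERNET_BPS, size)
    print(f"{size:>5}-byte packets: {pps:,.0f} packets/s")
```

At the same 100% saturation, 1,514-byte packets arrive at only about 8,150 per second, roughly 18 times fewer than 64-byte packets. A detection rate measured with large packets therefore says little about how the software copes with a flood of small ones.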
In addition to security product performance capabilities, end users are now demanding far more detailed benchmarking criteria about software, including functionality, reliability and integration, according to [...]. This is an area that Forrester Research [...]
Each product is tested against between 400 and 600 criteria, says Schadler.
In this format, benchmarking appears to be of more use to prospective customers - an acceptable 'second best' to an organisation running its own real-life target applications against different products. Benchmarking, however, will never replace the need to carry out quality assurance and fine tuning of whole systems and their individual components.
Stress tests
The most common benchmarks are bespoke - that is, tests conceived and run internally within organisations to gauge the performance characteristics of their existing or proposed applications. But over the past decade, a slew of standard software benchmarks has emerged to score the performance of core software products.
TPC (Database applications)
By far the most prominent of the independent benchmarking and testing organisations is the Transaction Processing Performance Council (TPC). Founded in 1988, the TPC is a non-profit organisation that provides a strict set of parameters for benchmarking hardware and systems software as they process database transactions and queries. The tests are audited and verified by the TPC - albeit on behalf of hardware and software vendors.
The TPC tests cover four areas: TPC-C simulates an order-entry environment; TPC-H measures the capability of a system to process ad hoc queries; TPC-R tests configurations for their ability to handle advanced decision-support processes; and TPC-W is a transactional web benchmark measuring web interaction processing.
Tests are typically carried out by server vendors in cooperation with one of the major database software companies and are scored both on raw performance and price/performance. But there is one caveat: while results have to be approved by the TPC, vendors are under no obligation to publish unfavourable results.
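Because TPC results are reported on two axes, the fastest system is not necessarily the best value. Price/performance is the total price of the tested configuration divided by its throughput (for TPC-C, dollars per tpmC), so lower is better. The sketch below is hypothetical: the class, system names and figures are invented for illustration and do not follow any official TPC reporting format.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    """Hypothetical result record; not an official TPC reporting schema."""
    system: str
    throughput: float   # e.g. transactions per minute (tpmC for TPC-C)
    total_price: float  # total price of the tested configuration, in dollars

    def price_performance(self) -> float:
        """Dollars per unit of throughput; lower is better."""
        return self.total_price / self.throughput


results = [
    BenchmarkResult("System A", throughput=400_000, total_price=12_000_000),
    BenchmarkResult("System B", throughput=150_000, total_price=2_250_000),
]

fastest = max(results, key=lambda r: r.throughput)
best_value = min(results, key=lambda r: r.price_performance())

for r in results:
    print(f"{r.system}: {r.price_performance():.2f} dollars per unit of throughput")
print("fastest:", fastest.system)
print("best price/performance:", best_value.system)
```

With these invented figures, System A wins on raw throughput at 30.00 dollars per unit, while System B, at 15.00 dollars per unit, wins on price/performance: a vendor can legitimately advertise whichever axis flatters it.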
Pet Store (Application servers)
Vendors themselves also provide benchmarks for software infrastructure technologies. This is the case with the Java Pet Store application benchmark developed by Sun Microsystems. Pet Store has become the industry benchmark for testing the capabilities of application servers that adhere to the Java 2 Enterprise Edition (J2EE) interoperability standard. The idea is to simulate the ecommerce activities of an online pet store.
Trade2 (Application servers)
Systems and software giant IBM has also developed the Trade2 benchmark to show off the performance characteristics and cost benefits of its WebSphere Application Server product. Formerly known as the WebSphere eBusiness Benchmark, Trade2 tries to replicate the real-world application behaviour of an online brokerage firm to measure the overall performance, as well as the individual components, of WebSphere. However, the benchmark can be run against other application servers.
SAP SAB (Various)
A different approach to benchmarking has been taken by German software giant SAP. Since 1993, it has provided a standard set of application benchmarks for other vendors to test new hardware, system software components and database systems against SAP applications. The company's Standard Application Benchmarks (SAP SAB) are available for most core modules of its enterprise resource planning (ERP) system, mySAP.com, including financials, retail, and sales and distribution. For example, a relational database software vendor might use SAP SAB to test performance capabilities, including scalability, concurrency and multi-user behaviour of systems running on its database.
This also benefits SAP because it can measure how well specific products work with SAP applications.
Compared to the complexity of software benchmarks, hardware tests are far more focused on measuring fixed elements of systems, such as the performance of a computer's processors, rather than overall system performance. The de facto standard performance test for processors is that of the Standard Performance Evaluation Corporation (SPEC), a non-profit organisation formed by hardware vendors to create standardised benchmarks that can genuinely rate servers and workstations.
But SPECrate results should not be read in isolation; a system's performance can be substantially influenced by interconnected components such as its disks and memory. This is one reason why hardware vendors have relied on the benchmarks of the Transaction Processing Performance Council, which aim to measure entire system workloads.
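SPEC's CPU suites summarise processor performance by timing each benchmark program against a reference machine and combining the per-program ratios with a geometric mean. The sketch below shows that calculation in simplified form. The timings and function names are hypothetical, and real SPEC runs, including the SPECrate throughput metrics that run several copies of each program at once, follow SPEC's own run and reporting rules rather than this code.

```python
import math


def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """Per-benchmark ratio: how many times faster than the reference machine."""
    return reference_seconds / measured_seconds


def geometric_mean(values: list[float]) -> float:
    """Geometric mean, computed via logarithms to avoid overflow on long lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical timings: (reference time, measured time) per benchmark, in seconds
timings = [(1400.0, 700.0), (1800.0, 450.0), (1100.0, 1100.0)]
ratios = [spec_ratio(ref, measured) for ref, measured in timings]

print(f"ratios: {ratios}")
print(f"composite: {geometric_mean(ratios):.2f}")
```

Here ratios of 2, 4 and 1 give a composite of 2.00. The geometric mean has the useful property that comparing two machines' composites gives the same answer whichever reference machine is chosen, but the result still describes only the programs that were timed.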
Rateable value
Market research organisation Forrester Research's TechRankings service aims to give organisations a depth of data about software products that empirical benchmarks lack.
It combines Forrester's research data, including vendor interviews and analysis, with laboratory-based product testing by its partner Doculabs. However, unlike with other benchmark tests, such as the TPC's, vendors cannot prevent publication of the analysis once their products have been tested - even if it is unfavourable.
Forrester claims that end users are now putting pressure on vendors to put their products through its TechRankings test - to prove their products are up to scratch. Product test areas include content management software, integration servers, and commerce platforms.