Storage benchmarks

There is at this time hardly one activity in industry and society that is not subject to the scrutiny of a 'benchmark', usually Government subscribed. In computer technology, benchmarks, which are as old as computers themselves, are useful tools for comparing different configuration alternatives for purchase or performance-tuning decisions. There are a large number of storage-related benchmarks, and this feature discusses the most popular ones, including the following topics:

* Benchmarking definitions, general tips, and guidelines
* Block I/O-based storage benchmarks
* File system-based storage benchmarks
* Application-based system benchmarks

Benchmarking Considerations

Artificial, over-simplified programs (known as benchmarks) are needed to study performance because real applications and workloads are costly, hard to measure, impractical to set up, not repeatable, and made of unknown, complex interactions. Benchmarks, which overcome these limitations, are sets of well-defined, representative workloads that can be executed on many systems to compare their performance, providing measurable, repeatable results. They are used for monitoring system performance, diagnosing problems, and comparing alternatives.

Benchmarks are limited by definition (they must be simple) and provide only a partial picture. Consider them a complementary source of information to other studies involving cost, reliability, ease of use, and so on. Benchmarks range from toy benchmarks that have simple kernel loops to large system benchmarks that simulate enterprise-level information processing. This analysis focuses on benchmarks that provide meaningful information about storage subsystems.

A good benchmark must provide understandable, relevant information for its domain. It must be scalable for testing a wide range of systems, and must be sufficiently unbiased to be acceptable to a wide range of users and vendors.

The following is a list of items that you should consider when setting up a benchmark test-bed for storage. Further tips are discussed later as different benchmarking concepts are introduced.

* In multiple central processing unit (CPU) systems (Symmetric Multi-Processing, or SMP), the number of processors will affect the system's performance. In addition, on these systems, assigning subsets of processors to specific processes and their threads will change execution performance. To utilize processor caches more effectively, threads could be assigned to a small subset of available processors. On many systems, this is accomplished through processor affinity system calls. Using affinity calls should be carefully considered.
* Caches change benchmark behavior in unexpected ways.
Therefore, you must make sure that all caches are empty before each benchmark execution. Examples of caches to consider in storage benchmarks include the processor caches (L1, L2, and L3), the file system buffer cache, and network file system client and server caches. You may need to power-cycle client and server machines, or unmount and remount file systems between benchmark runs, to ensure that the caches have a cold start.
* In addition to the choice of using or not using a cache, the size and scheduling policies of caches profoundly affect execution profiles. Expanding or restricting cache sizes and changing cache policies are some of the first factors to try in a benchmarking study.
* In benchmarks that can execute on multiple machines simultaneously, you might need to synchronize the machine times and timings of events. Network Time Protocol (NTP) can be used to synchronize machine clocks. Distributed benchmark utilities generally have options that can be set to synchronize the benchmark events (such as start and stop execution) between multiple benchmark instances.

Here we discuss storage benchmarks in three major categories: block I/O benchmarks, file system benchmarks, and application-level benchmarks.

Block I/O Benchmarks

Benchmarks that exercise and measure storage systems using block I/O interfaces are useful for obtaining the performance characteristics of storage layers below the file system level.
These could be regarded as raw storage performance benchmarks. IOmeter and SPC-1 are the two most widely accepted benchmarks in this category.

IOmeter

IOmeter was developed by Intel Corp., and is currently distributed as an open-source project. It is a tool for generating tightly controlled I/O workloads and collecting performance data such as response time, throughput, and CPU usage. It is best used for stress testing I/O systems to find system bottlenecks. Because it does not have any prescribed, real-world workload definitions, it cannot be used as an application performance predictor. However, it can easily be configured to generate almost any kind of I/O pattern through a graphical user interface.

Although IOmeter sometimes appears in literature for benchmarking file servers, it lacks the necessary capabilities to generate a proper file access workload. IOmeter generates reads and writes to volumes (raw or formatted). However, file server workloads are generally dominated by metadata-type operations such as directory searches and file attribute checks. When used to benchmark file systems, IOmeter is only useful for checking the read/write throughput of a file system. It is more useful for testing the block devices directly (as raw devices, which bypass the file system).
At the block device level, everything is a data read or a data write operation, no matter what the higher levels might be doing (for example, accessing file attributes).

IOmeter can be set to execute with any queue length. A device's queue length denotes the number of outstanding I/Os on that device at a given time. Deep queue lengths generally increase both the throughput and the response time.

Figure 1 contains sample throughput-response time curves obtained using IOmeter. In these experiments, four IOmeter "worker" processes are executed on four different client machines. These machines access a disk array as their back-end storage, using 16KB random write operations. In the figure, the two curves represent tests with write caching enabled and disabled on the disk array, respectively. Successive data points on each curve are obtained by setting the client queue depths at values of 2, 8, 32, 128, and 512. The figure shows the range where write caching significantly reduces the response time. The figures are obtained from spreadsheet outputs of IOmeter runs. This example shows the usefulness of IOmeter as a stress test tool.

Storage Performance Council Benchmarks

Storage Performance Council (SPC) is a group composed of companies that are predominantly in the data storage and server business. SPC was formed to develop industry-standard benchmarks for storage networks. By overseeing the development and publication of benchmark results, the group aims to provide a level playing field for storage system vendors. SPC plans to release a series of benchmarks, each intended for use in a different environment. The first benchmark, SPC-1 (SPC, 2002; McNutt, 2001), represents a workload that is both throughput- and response time-sensitive.
It was developed by studying the workload of transaction processing systems that require small, mostly random, read and write operations (for example, database systems, OLTP systems, and mail servers). SPC started working on the SPC-2 benchmark, which will represent workloads with large I/O sizes and mostly sequential access. It is intended to emulate video on-demand servers, film rendering applications, and backup/restore operations.

SPC benchmark source code is controlled by the member companies, and SPC has put an audit structure in place to authenticate test results. After a test sponsor (a member company) submits benchmark results in the form of a Full Disclosure Report (FDR), auditors assigned by SPC review the results and post them for peer review for 60 days, at the end of which time the results are considered official.

SPC-1 reports two main performance metrics.
SPC-1 IOs per second (IOPS) represents the highest IOPS rate achieved during the benchmark. Any reported SPC-1 IOPS result should not have a response time greater than 30ms. The second reported metric is the SPC-1 LRT (Least Response Time), which is obtained at 10 percent of the load level of the reported SPC-1 IOPS rate. Figure 2 illustrates these two metrics.

An FDR is supposed to contain the throughput-response time curve, as shown in Figure 2, and the IOPS rate and LRT. In addition, sponsors who want to publish SPC-1 results are supposed to disclose the total price of their Tested Storage Configuration (TSC). An FDR contains a cost/performance value in the form of dollars per SPC-1 IOPS. Table 1 presents a summary of public SPC-1 results currently available. Test sponsors are also supposed to disclose the data capacity and the data protection level they have used in the tests, as shown in Table 1.

In SPC-1, all host computers involved in the test, the storage network, and the storage subsystems comprise the Benchmark Configuration (BC), as shown in Figure 3. All the storage-related items, including the host adapters, cables, network switches, and hubs, constitute the Tested Storage Configuration (TSC). This is significant because the reported system cost involves everything in the TSC.
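
As a rough illustration of how the two SPC-1 metrics and the cost/performance value relate to a throughput-response time curve, the following Python sketch derives them from a list of measured points. The function name summarize_spc1, the sample curve, and the price are hypothetical values invented for illustration; they are not SPC results, and the actual SPC-1 rules contain more detail than is modeled here.

```python
from typing import List, Tuple

# Each point: (load level as a fraction of full load, IOPS, average response time in ms).
Curve = List[Tuple[float, float, float]]

MAX_RESPONSE_MS = 30.0  # response-time ceiling for a reportable SPC-1 IOPS result


def summarize_spc1(curve: Curve, total_price: float) -> dict:
    """Derive SPC-1 style summary values from a measured curve (illustrative only)."""
    # The point with the highest load level stands in for the 100 percent load run.
    _, iops, response_ms = max(curve, key=lambda p: p[0])
    if response_ms > MAX_RESPONSE_MS:
        raise ValueError("response time at full load exceeds 30 ms")
    # LRT is the response time measured at the 10 percent load level.
    lrt_point = min(curve, key=lambda p: abs(p[0] - 0.10))
    return {
        "iops": iops,
        "lrt_ms": lrt_point[2],
        "dollars_per_iops": total_price / iops,
    }


# Hypothetical measurements, not taken from any published result.
sample_curve = [
    (0.10, 1000.0, 2.0),
    (0.50, 5000.0, 4.5),
    (0.80, 8000.0, 9.0),
    (1.00, 10000.0, 21.0),
]
summary = summarize_spc1(sample_curve, total_price=250000.0)
print(summary)
```

Because the price covers the whole Tested Storage Configuration, adding host adapters or switches to reach a higher IOPS rate also raises the numerator of the dollars-per-IOPS value.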

The workload generator in SPC-1 is based on Business Scaling Units (BSUs). One BSU represents a group of users collectively generating a prescribed I/O demand. Each BSU demands 50 I/O operations per second. These are generated through eight streams. Five streams generate random reads and writes. Two of the streams generate sequential reads. The final stream generates sequential writes. The streams are assigned to Application Storage Units (ASUs), as shown in Figure 4. There are three ASUs. ASU-1 is the data store that holds the raw data for the applications. It contains 45 percent of the total ASU capacity. ASU-2 is the user store that holds an organized, secure store for user files. It contains 45 percent of the total ASU capacity. ASU-3 is the log that provides information consistency for the data in ASU-1 and ASU-2. It contains 10 percent of the total ASU capacity.

An official SPC-1 test, which can be submitted to SPC, comprises several mandatory phases. Figure 5 shows the progress of the test in time (not drawn to scale). Changing the number of simulated BSUs controls the load in SPC-1. To increase the throughput, more BSUs are added, and to decrease the load, fewer BSUs are used. The primary metrics (IOPS and LRT) are measured only in the stable states (shown as plateaus in Figure 5). The first phase is the sustainability test, which executes at the highest BSU load for a long period. The intention is to ensure that the highest throughput can be sustained over time.
Toward the end of the sustainability test, a measurement is taken to obtain the official SPC-1 IOPS rate. Then, in the response time ramp test, the load is decreased gradually to obtain a response time-throughput graph. At the end of the ramp, at the 10 percent load level, SPC-1 LRT is recorded. IOPS and LRT measurements are repeated twice to check the repeatability of the test results after system shutdowns. SPC-1 IOPS and SPC-1 LRT results must be within 5 percent of the values obtained in the sustainability and repeatability tests to be valid for official submittal.

SPC-1 is the first and only industry benchmark applicable to storage networking environments. Although SPC-1 has a narrow workload scope, SPC is working on additional benchmarks to broaden the covered workload types.

File System Benchmarks

File systems add another layer above the block I/O interface, and they change the storage workload coming down to the storage devices. File system caching and metadata handling cause a unique type of workload that must be tested with special benchmarks. This category includes Bonnie, IOzone, NetBench, PostMark, and SPEC SFS.

File System Benchmarking Considerations

File system performance is sensitive to many system configuration parameters, and benchmarking file systems can be a tricky task. The following is a short list of tips to consider when setting up file systems for benchmarking:

* File systems (local or network-mounted) buffer write operations in buffer caches before committing them to stable storage (such as hard disks).
This data destaging can happen immediately before the write call returns, or later at a synchronization time or at the file close time. Applications that require absolute reliability can use synchronous write operations, which force all data to stable storage before an acknowledgement is generated. In most systems, this is done through the O_SYNC option at the file open time. You should be aware of the method your benchmark is using. Using synchronous write operations will prohibit any gains from write buffering. If synchronous operations are allowed, the benchmark developer or user must decide whether to include file destaging overhead (fsync and fflush calls) in the benchmark timings.
* File locking keeps data consistent when multiple readers/writers are operating on the file. Files can be locked on local or network-mounted file systems. In both cases, locking files might disable file caching. Check your benchmark options for allowing or disallowing file locks.
* Memory-mapped files effectively cache the entire file in the client computer's memory. This eliminates almost all disk I/O until the file is closed or a synchronization system call is made. The performance and reliability implications of this behavior should be considered in file system benchmarks.

The following list contains tips that are especially useful when benchmarking network file systems:

* Decide where the benchmark program, data files, and output files will reside. When benchmarking network file systems, executing the benchmark programs from local file systems while the data files reside on the remote mounted directories is advisable.
* Network File System (NFS)
client caches must be cleared by unmounting and remounting the remote directories between test runs. This guarantees that the performance of later benchmark runs is not tainted by the cached data of previous runs.
* Server write commit times must be included in the execution time. Any outstanding write operations on the server side are committed to the disk storage at file close time. This could have a large impact on the performance of small files that are entirely in the server cache.
* While benchmarking networked storage, the parameters used to set up the network connection will directly impact performance. NFS, for example, can execute over Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) sockets. The choice between the two protocols will determine where data error handling is performed (in the network stack in the case of TCP, and in upper-layer protocols with UDP).
* Socket buffer sizes and TCP window sizes determine the amount of in-transit data over the network.
These can be changed by editing the registry keys on MS Windows-based machines, and by manipulating the system parameters in the /proc file system on Linux machines.
* NFS allows the user to set the granularity of the read and write data exchange (rsize and wsize). You will need to experiment with multiple values to find the optimum settings for your environment.
* Operational parameters of Ethernet connections are important as well. Full-duplex connections will eliminate contention at the wire level. Many older Ethernet adapter cards might be set to half-duplex connections by default. In addition, a larger Maximum Transmission Unit (MTU) size (jumbo frames) will reduce the fragmentation of network packets.
* Network file system performance varies with the number of clients accessing the file server. Both the throughput and the response time increase with the number of clients.
* The number of NFS daemons on the client (biod) and server (nfsd) affects the throughput. The number of daemons determines the number of operations that can be served simultaneously.
* Besides data, file system metadata (inodes and vnodes) is cached on both the client side and the server side. Metadata caches allow fast access to file attributes, and they will eliminate disk accesses as long as the data is in the cache. Therefore, the inode cache size is important.
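
The earlier tips about synchronous writes and destaging overhead can be made concrete with a small timing harness. The Python sketch below is a hedged illustration, not part of any benchmark discussed here: it times the same write workload with plain buffered writes, with one fsync included in the measured interval, and with the file opened using O_SYNC. The function name timed_write, the block size, and the temporary paths are hypothetical. os.O_SYNC is not available on every platform, so the sketch assumes a fallback of 0 (plain buffered writes) where it is missing.

```python
import os
import tempfile
import time


def timed_write(path: str, block: bytes, count: int,
                sync_open: bool = False, fsync_at_end: bool = False) -> float:
    """Write `count` blocks to `path` and return the elapsed time in seconds.

    sync_open    -- open with O_SYNC so each write is forced to stable storage
    fsync_at_end -- include one fsync (destaging) in the measured time
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync_open:
        flags |= getattr(os, "O_SYNC", 0)  # assumption: buffered fallback if absent
    start = time.perf_counter()
    fd = os.open(path, flags, 0o600)
    try:
        for _ in range(count):
            os.write(fd, block)
        if fsync_at_end:
            os.fsync(fd)
    finally:
        os.close(fd)  # the close is deliberately inside the timed interval
    return time.perf_counter() - start


block = b"x" * 4096
results = {}
with tempfile.TemporaryDirectory() as tmp:
    cases = [("buffered", {}),
             ("buffered+fsync", {"fsync_at_end": True}),
             ("O_SYNC", {"sync_open": True})]
    for index, (name, kwargs) in enumerate(cases):
        path = os.path.join(tmp, f"case{index}.dat")
        results[name] = timed_write(path, block, 64, **kwargs)

for name, seconds in results.items():
    print(f"{name:15s} {seconds * 1000:.2f} ms")
```

For a meaningful comparison, the path would have to point at the file system under test rather than a temporary directory, and the caches would need to be cleared between runs as described above.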

Bonnie

Bonnie and its variants (for example, Bonnie++) are simple file system workload generators that can be used to quickly test a file system's throughput on UNIX machines. You can use it for quick comparisons. However, the benchmark results do not have a real-world correspondence. Bonnie uses standard C library calls, which are portable to many platforms. The benchmark performs a series of operations on a large file. A sample output looks like this:

                  -------Sequential Output-------- ---Sequential Input-- --Random--
                  -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
    testsys   500  4332 13.7  4722  3.3  1413  2.8  4674 10.3  4744  5.6  52.0  1.0

One of the useful outputs is the CPU utilization percentage, which can be used to check whether the CPU is a bottleneck.

IOzone

IOzone is a free, open-source file system benchmark. It enables the study of the effects of system configurations on file system performance. The user can set IOzone to generate a wide variety of access patterns and to collect statistics on performance. Rather than being an application-specific benchmark, IOzone is a tool for generating a large number of access patterns. The source code is in ANSI C and can be compiled on a large number of platforms, including many Microsoft Windows
and UNIX-based machines. The parameters can be set to generate the following:

* Read
* Write
* Re-read
* Re-write
* Read/write backwards/strided
* Random read/write
* Memory-mapped read/write
* Asynchronous read/write

On a single machine, the benchmark can use multiple processes or threads. On multiple machines, IOzone can execute as a distributed file system benchmark. IOzone can purge processor caches and mount/unmount file systems to remove dirty cache effects. It can generate Microsoft Excel output data that can be used to draw surface plots showing the interactions between file sizes, access sizes, and performance metrics. IOzone can be configured to generate synchronous or asynchronous I/O operations.

IOzone has useful parameters for benchmarking both local file systems and network mounted file systems.
A typical invocation of IOzone for a network mounted file system would look like this:

    ./iozone -acR -U /mnt/test -f /mnt/test/testfile -b output.xls > logfile

In this example, -a is used to let IOzone test all file sizes between 64KB and 512MB and record sizes from 4KB to 16MB. The parameter -c is used to tell IOzone to include write commit times at the end of an NFS V3 file close. The -R option generates output data as Excel spreadsheets. The -U option causes IOzone to unmount the file system between tests. The next two parameters denote the directory and file name to use for data accesses. -b denotes the Excel output filename. The standard output is redirected to a local file.

Figure 6 shows the results of file system performance data obtained using IOzone. Here, two file systems (fs1 and fs2) are compared for two file sizes (16MB and 1GB) and various buffer sizes (4KB to 16MB). The figure clearly shows the effects of various caches on the data path. For small-sized reads from a small file, most of the data is in the processor cache, which yields a very high throughput. The data for the large file size comes directly from the physical disks, causing a big drop in throughput. The point that differentiates the two file systems is the use of the buffer cache for the small file size. The first file system (fs1) effectively uses the buffer cache, while the second one (fs2) bypasses the buffer cache and performs poorly as a result. These results were obtained on Linux servers using two commercially available journaling file systems.
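
For scripted test runs, the IOzone invocation described above can be assembled programmatically. The Python sketch below only builds the argument list from the options that appear in the example (-a, -c, -R, -U, -f, -b). The helper names build_iozone_command and run_iozone are hypothetical, and the command is not executed by default, since running it assumes an iozone binary in the current directory and a file system that can be unmounted at /mnt/test.

```python
import subprocess
from typing import List


def build_iozone_command(mount_point: str, test_file: str, excel_out: str,
                         binary: str = "./iozone") -> List[str]:
    """Return the argument list for the IOzone run described in the text."""
    return [
        binary,
        "-acR",             # -a: all file/record sizes, -c: time the close, -R: Excel report
        "-U", mount_point,  # unmount the file system between tests
        "-f", test_file,    # file used for data accesses
        "-b", excel_out,    # Excel output file name
    ]


def run_iozone(cmd: List[str], logfile: str) -> int:
    """Run the command with standard output sent to `logfile` (like `> logfile`)."""
    with open(logfile, "w") as log:
        return subprocess.run(cmd, stdout=log, check=False).returncode


cmd = build_iozone_command("/mnt/test", "/mnt/test/testfile", "output.xls")
print(" ".join(cmd))
# run_iozone(cmd, "logfile")  # uncomment on a machine where IOzone is installed
```

Keeping the command as a list rather than a shell string avoids quoting problems when the mount point or file name contains spaces.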

Although IOzone results cannot be used to predict the performance of a particular application on a particular platform, its wide variety of configuration parameters and ease of use make it an excellent tool for diagnosing and debugging the performance pitfalls in file system-based storage networks.

NetBench

NetBench is a network file system benchmark for Common Internet File System (CIFS) clients and servers. CIFS is a network file system protocol based on Microsoft's Server Message Block (SMB), and is the native resource-sharing protocol for Microsoft Windows platforms. Although NetBench is freely available, the source code is controlled by Ziff-Davis, Inc. NetBench accepts workload definition files and replays these workloads on client machines.
In standard NetBench practice, the "disk-mix" workload definition file provided as part of the distribution is used. This workload was obtained by collecting traces of popular desktop applications. Figure 7 shows a breakdown of SMB operations generated between a client machine running NetBench and a CIFS server. The figure shows that a vast majority of the operations are writes and metadata access (get attribute, open, close) operations.

The workload has a profound effect on performance outcomes. For example, a home-directory server, which keeps and serves user files, will have a very different operation distribution than the one shown in Figure 7. A home-directory server will face mostly metadata-type operations (such as directory opens, closes, searches, and file attribute checks). One study shows that actual reads and writes in a home-directory server are less than 25 percent of all operations (Ramany, 2001).

Figure 8 shows the distribution of operation sizes generated by the NetBench disk-mix workload. While most of the write operations (updates) are less than 1KB, read operations center around 4KB.

Another important factor determining workload is the place where the workload is defined in the storage network. In NetBench, CIFS workload is defined from the perspective of end-user desktops. However, network-attached storage (NAS) devices are increasingly being used as back-end storage for file servers, Web servers, or database servers.
In such configurations, the client traffic is filtered and transformed into server traffic before it reaches a CIFS server (for example, a NAS device). Note: Previous studies showed that file servers generate write-dominant traffic because almost all read and metadata traffic is captured by the large caches on the servers.

NetBench executes workload generators on multiple clients (Windows 95/98/NT/2000), which are controlled through a control station (Windows NT/2000). It incorporates a GUI-based control program, which enables the easy launch of the benchmark and generation of output files. The output is in the form of Excel spreadsheets that contain total bandwidth (throughput, in NetBench terms) in Mb/s and average response time in milliseconds. A single setup can be repeated for different numbers of clients.

A sample result is shown in Figure 9. The figure combines the NetBench results obtained for two different server file systems. The objective is to study the effect of the server's local file system on overall network file access performance. In this example, the first file system scaled better to a high number of clients and consistently provided better response time.

As the figure shows, NetBench performance depends on the number of clients. A high-end file server or a NAS device might require 60 clients before it is saturated. This makes NetBench impractical if you do not have access to a large test-bed. Another concern with NetBench is the small footprint of the accessed data. This causes most of the data to be served from client and/or server caches and makes the benchmark insensitive to back-end stable storage (disk) performance. The server's processing and communication power becomes the key factor for higher NetBench results. To isolate the effects of the various caches, each cache must be enabled or disabled separately through system configuration changes.

PostMark

PostMark is a very specialized file system workload generation tool.
PostMark is intended for testing small-file, high-throughput environments such as e-mail and netnews servers. It generates a large pool of small files and performs opens, closes, reads, and updates on that pool. The results are very specific to that environment and are not generally applicable to other workloads. PostMark can exercise local file systems and both CIFS- and NFS-mounted file systems.

SPEC SFS

Standard Performance Evaluation Corporation (SPEC) is an industry consortium that develops and publishes a broad range of benchmarks, from CPU to Web server benchmarks, for the evaluation of computing systems. System File Server (SFS) is SPEC's benchmark for the performance of NFS servers. It is based on two earlier benchmarks, LADDIS and Nhfsstone. The latest version is SPEC SFS97_R1 V3.0. It supports both NFS V2 and NFS V3, as well as TCP and UDP as transport protocols. Newer versions are updated to include workload specifications for modern NFS servers. SFS executes on client computers (which must be UNIX-based) and can access any server that supports NFS. The primary outputs include a table of throughput versus response time. The single figure of merit is the highest throughput obtained at a response time less than 50ms.

SPEC SFS is storage throughput-sensitive, and using more spindles will provide better SFS numbers. Instead of using the newer, bigger disk drives, using a larger number of older, smaller drives is more advantageous.
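The spindle-count effect can be illustrated with a rough sketch. It assumes, purely for illustration, that every drive sustains about the same number of random I/O operations per second regardless of its capacity. The 100 IOPS figure, the 720GB target capacity, and the 18GB and 72GB drive sizes are hypothetical values, not measurements from any system discussed here, and RAID overhead and caching are ignored.

```python
import math

# Hypothetical inputs chosen only to illustrate the argument.
CAPACITY_GB = 720        # usable capacity the server must provide
IOPS_PER_DRIVE = 100     # assumed random IOPS per spindle, independent of drive size


def drives_needed(capacity_gb, drive_gb):
    """Number of drives required to reach the target capacity."""
    return math.ceil(capacity_gb / drive_gb)


def aggregate_iops(capacity_gb, drive_gb, iops_per_drive):
    """Total random IOPS if every spindle contributes equally."""
    return drives_needed(capacity_gb, drive_gb) * iops_per_drive


small_drives = aggregate_iops(CAPACITY_GB, 18, IOPS_PER_DRIVE)  # many small drives
large_drives = aggregate_iops(CAPACITY_GB, 72, IOPS_PER_DRIVE)  # few large drives

print(f"18GB drives: {drives_needed(CAPACITY_GB, 18)} spindles, {small_drives} IOPS")
print(f"72GB drives: {drives_needed(CAPACITY_GB, 72)} spindles, {large_drives} IOPS")
```

Under these assumptions the same capacity built from 18GB drives uses four times as many spindles as the 72GB configuration, and therefore offers four times the aggregate operation rate.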
Although the advantage of many smaller drives might be seen as a benchmark anomaly, it is a fact of throughput performance.

Application-Level Benchmarks

Application-level benchmarks stress the system end to end, from the CPU to the network to storage devices. These benchmarks emulate business application loads, and their results are more meaningful in their application domains. This category includes TPC and SPEC benchmarks.

TPC Benchmarks

The Transaction Processing Performance Council (TPC) is a group of companies that produces benchmarks for transaction processing and database applications. Most of the TPC benchmarks are system-level, end-to-end benchmarks that exercise almost all parts of the computing system, including the clients, the network, the servers, and the storage subsystems. The current flagship TPC benchmark is TPC-C, which simulates an Online Transaction Processing (OLTP) environment with multiple terminal sessions in a warehouse-based distribution operation (TPC, 1998).
It contains read-only and read/write operation mixes that simulate new-order, payment, order-status, stock-level, and delivery transactions. TPC-C can be scaled quite well by increasing the number of warehouses and users. The TPC-C metrics include throughput (new-order transactions per minute, tpmC) and price/performance ($/tpmC). TPC-C is a widely accepted benchmark with results submitted from all major systems companies. Because it stresses all components, it is hard to tell the effect of storage subsystems on TPC-C results directly, unless the storage subsystem is the bottleneck. One obvious expectation from storage subsystems is a high throughput rate, rather than high data bandwidth.

TPC-H and TPC-R benchmarks simulate Decision Support System (DSS) environments. However, they are not as popular as TPC-C, and most of the time vendors ignore them.

TPC-W is one of the latest benchmarks from TPC (TPC, 2000). TPC-W simulates a transactional Web environment such as that seen with e-commerce sites. It provides performance and price/performance metrics. It is modeled after a Web bookstore. Primary transactions include browsing, shopping, ordering, and business-to-business transactions. TPC-W's primary metrics are Web Interactions per Second (WIPS), dollars per WIPS ($/WIPS), and Web Interaction Response Time (WIRT). TPC-W improves over TPC-C/H/R by requiring a very detailed system performance disclosure that includes the CPU utilizations, database logical and physical I/O activity, and network and storage I/O rates.

SPEC Web

SPEC produced a series of Web server benchmarks over the years (Eigenmann, 2001). The latest version, SPECWeb99, is based on Web workloads obtained from logs of large Web installations and agreed upon by major server vendors. A companion benchmark, SPECWeb99_SSL, measures the performance of Web servers using secure communication protocols. Newer versions reflect the latest developments in Web technology, including dynamic HTTP, rotating ads, cookies, and so on. Similar to SPEC SFS, SPECWeb99 is a client-based benchmark and supports any Web server capable of serving HTTP. The benchmark's primary outputs are a table of requested load and response times. The peak throughput is the single figure of merit, with no limits on response time. Web servers generate mostly read, random, small-size storage I/O operations. Therefore, SPECWeb99 will be sensitive to the throughput of the storage system for such I/O patterns.

Real Applications as Benchmarks

A common assumption is that the best benchmark for testing alternative systems is the real application that will be used on these systems in the production phase. Although there is truth in this argument, there are some pitfalls as well. The problem with real applications is that their workload is very difficult to control. Real workloads are mostly dynamic and time- and input-sensitive, which makes repeating the same execution twice almost impossible. Without repeatability, comparing two configurations in a meaningful way is difficult. Real applications are also difficult, costly, and time-consuming to set up.
Therefore, the application under study is the best benchmark for a purchase decision or performance tuning only if it enables the generation of repeatable workloads. Performance results obtained from applications will not be publishable because they will not be repeatable outside the test-bed in which they are obtained.

Tested Storage Configuration                          SPC-1 IOPS  SPC-1 LRT  ASU Capacity (GB)  $/IOPS  Data Protection Level  FDR Submission Date
3PAR InServ S800 Storage Server                           47,001       2.34           4,444.44  $34.65  Mirroring              10-Oct-02
Dell Corp., PERC3/QC SCSI RAID Controller                  7,650       3.10             440.00   $4.48  Mirroring              19-Jun-02
HP StorageWorks Enterprise Virtual Array Model 2C12D      24,006       2.29            2,596.3  $22.00  Mirroring              2-Oct-02
IBM Enterprise Storage Server F20                          8,009       2.99           1,259.85  $44.58  RAID5                  20-May-02
LSI E4600 FC Storage System                               15,708       1.64             400.00  $16.01  Mirroring              20-May-02
Sun StorEdge 9910                                          8,404       2.07             343.51  $74.29  Mirroring              20-May-02

Table 1. SPC-1 results submitted as of the end of October 2002
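The price/performance column of Table 1 can be explored with a short script. The sketch below copies the SPC-1 IOPS and $/IOPS values from the table, ranks the configurations by $/IOPS, and multiplies the two columns to estimate a total price. It assumes that $/IOPS is the total cost of the priced configuration divided by its SPC-1 IOPS; the function names are illustrative, and rounding in the published figures makes the estimated prices approximate.

```python
# Rows copied from Table 1: (tested storage configuration, SPC-1 IOPS, $/IOPS).
results = [
    ("3PAR InServ S800 Storage Server", 47001, 34.65),
    ("Dell Corp., PERC3/QC SCSI RAID Controller", 7650, 4.48),
    ("HP StorageWorks Enterprise Virtual Array Model 2C12D", 24006, 22.00),
    ("IBM Enterprise Storage Server F20", 8009, 44.58),
    ("LSI E4600 FC Storage System", 15708, 16.01),
    ("Sun StorEdge 9910", 8404, 74.29),
]


def implied_price(iops, dollars_per_iops):
    """Approximate total price, assuming $/IOPS = total price / SPC-1 IOPS."""
    return iops * dollars_per_iops


def rank_by_price_performance(rows):
    """Sort configurations from the lowest to the highest $/IOPS."""
    return sorted(rows, key=lambda row: row[2])


ranked = rank_by_price_performance(results)
for name, iops, dollars_per_iops in ranked:
    price = implied_price(iops, dollars_per_iops)
    print(f"{name:<52} {iops:>7,} IOPS  ${dollars_per_iops:>6.2f}/IOPS  ~${price:>12,.0f}")
```

Ranking by $/IOPS and ranking by raw IOPS give different orders for these six results, which is why both metrics are reported.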