
An overview of Motorola's PowerPC simulator family.

The successful introduction of a new computer architecture into the marketplace requires that both software and hardware be available when the system is introduced. Moreover, software tools (e.g., compilers and simulators) are needed during the design phase of the microprocessor itself. Such phased development requires coordination among several groups:

* Application and operating system software developers need to be able to compile, execute and debug their programs so applications and operating systems are available at system introduction.

* System hardware designers need to be able to design, simulate and debug circuit boards, system ASICs and monitor executive software so the system (e.g., a workstation or microcomputer) can be integrated as soon as possible after working silicon becomes available.

* Microprocessor designers themselves need software tools in order to write and debug diagnostic software used to debug chip designs and test the functionality of the final silicon.

* Prospective customers (either systems vendors or end users) need to be able to "kick the tires" of a new microprocessor design by compiling well-known or application-specific benchmark code and running it on a simulator in order to get estimates of system performance.

An essential part of each of these functions is a software simulator of the new microprocessor(s) that meets the user's needs. Currently, no single simulator program can meet the diverse needs of all these user groups. However, any microprocessor simulator should have several characteristics that enhance its utility to the end user. It should be highly configurable, so that a configuration suited to the end user's particular problem is available. It should be extensible, so that the end user's effort in using the simulator has maximum effect and the simulator implementers avoid unnecessary redesign and "creeping featurism." And it should be as efficient as possible, allowing the end user to iterate through as many compile-execute-debug cycles as possible in a given amount of time and minimizing the time needed to run a benchmark.

Motorola and IBM are currently developing the PowerPC family of RISC microprocessors [7, 8, 9]. To support this effort, Motorola's RISC software team is developing new software tools, including state-of-the-art compilers and simulators. This article describes three members of Motorola's PowerPC simulator family. All of these simulators are based on a common set of object-oriented software technology.

Features of Motorola's PowerPC Simulators

All members of Motorola's PowerPC simulator family share some common features. This common feature set ensures that support code developed on one simulator will run appropriately on another member of the family. It also lessens the burden of learning a new set of commands for a new simulator.

Command line interpreter. All of Motorola's PowerPC simulators use Tcl [6] as a command line interpreter (CLI) language. Tcl ("tool command language," pronounced "tickle") is an efficiently parsed language created by John Ousterhout of the University of California at Berkeley. Tcl offers excellent facilities for string manipulation and program extension. Many of the system support utilities are implemented in Tcl.

Tcl allows users to define variables and subroutines, implement flow control and, in general, adapt the user interface to their particular needs. Tedious sequences of simulator commands can be replaced by parameterized command shortcuts. The simulators also try to make as few policy decisions as possible; instead, policy routines may be implemented as Tcl subroutines, which the simulator command interpreter then invokes as callbacks. In this way, the end user may implement arbitrarily complex handler routines for managing simulator events or error conditions.

A distinct advantage of Tcl as a CLI language is the availability of the Tk extension. Tk is a widget library that works with the X Window System to facilitate rapid development of graphical user interfaces (GUIs). Future versions of Motorola's PowerPC architectural and timing simulators will be extended with user-extensible GUIs based on the Tk toolkit.

There is also a very easy-to-use alias facility (also implemented in Tcl) that allows an end user to use one-word abbreviations for longer commands, much like the alias facility available in some Unix shells.

Operating system emulation mode. All of Motorola's PowerPC simulators were written to simulate the entire virtual machine presented by the corresponding PowerPC microprocessor. This in particular allows the end user to boot, debug and analyze the performance of an operating system kernel. This capability requires that an appropriate kernel be compiled with the correct device drivers and that corresponding external device simulation modules have been written and dynamically linked into the simulator.

Booting a kernel is an excessive amount of work if the goal of simulation is to evaluate the performance of user-level programs. Therefore, the Motorola PowerPC simulators have the ability to intercept the system call trap and to vector the simulator code to a predefined set of subroutines which emulate the operating system trap interface. The simulators come with an emulator for AIX (Advanced Interactive eXecutive, IBM's version of Unix). However, this OS emulation module may be replaced by the end user by means of a user-supplied trap table and trap emulation code which is linked at run time by means of a shared object file.

Support for multiprocessing configurations. The Motorola PowerPC architectural and timing simulator families provide direct support for multiprocessing configurations. In the architectural simulators, a very simple nonpipelined, one-instruction-per-clock timing model is used for the shared memory bus. The timing simulators use a clock-phase-accurate simulation of the shared memory bus to support multiprocessing. In both cases, the full semantics of the shared memory system (in particular, the bus snooping required to implement a MESI cache protocol) are implemented.
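To make the snooping requirement concrete, here is a minimal sketch, assuming two caches that each hold a single line and a bus that broadcasts every miss to the other cache. It shows the MESI transitions that shared-bus snooping enforces. The PowerPC bus's distinct invalidating operations (such as read-with-intent-to-modify and kill) are collapsed into one invalidating snoop, and all names are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Minimal MESI sketch: one line per cache, two caches on a snooping bus.
enum class Mesi { Modified, Exclusive, Shared, Invalid };

struct Cache {
    Mesi state = Mesi::Invalid;
    int writebacks = 0;  // times this cache flushed dirty data to memory

    // Another cache missed on a read: dirty data is written back and any
    // valid copy drops to Shared. Returns true if this cache held a copy.
    bool snoopRead() {
        if (state == Mesi::Invalid) return false;
        if (state == Mesi::Modified) ++writebacks;
        state = Mesi::Shared;
        return true;
    }
    // Another cache intends to write: flush if dirty, then invalidate.
    void snoopInvalidate() {
        if (state == Mesi::Modified) ++writebacks;
        state = Mesi::Invalid;
    }
};

struct SnoopingBus {
    std::array<Cache, 2> caches;

    void read(std::size_t cpu) {
        Cache& self = caches[cpu];
        if (self.state != Mesi::Invalid) return;  // read hit: no bus cycle
        bool shared = false;
        for (std::size_t i = 0; i < caches.size(); ++i)
            if (i != cpu) shared = caches[i].snoopRead() || shared;
        self.state = shared ? Mesi::Shared : Mesi::Exclusive;
    }

    void write(std::size_t cpu) {
        Cache& self = caches[cpu];
        // Exclusive -> Modified is silent; Shared or Invalid needs a bus cycle.
        if (self.state == Mesi::Shared || self.state == Mesi::Invalid)
            for (std::size_t i = 0; i < caches.size(); ++i)
                if (i != cpu) caches[i].snoopInvalidate();
        self.state = Mesi::Modified;
    }
};
```

For example, if CPU 0 reads (line becomes Exclusive), CPU 1 reads (both Shared), CPU 1 writes (CPU 1 Modified, CPU 0 Invalid), and CPU 0 reads again, CPU 1 must write back its dirty copy and both end up Shared.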

Object-oriented design in C++. All of the simulators are implemented in C++ [4, 5] and have an internal design which follows the Booch methodology [2]. This methodology has improved the reuse of code among all of the members of the simulator family.

C++ and the Booch methods offer several features that facilitate the implementation of this set of simulators. This is especially important when one considers the diversity of microprocessors modeled and that some of these microprocessors offer primitives which are unavailable using more conventional programming languages.

The PowerPC Architecture supports both signed and unsigned 64-bit integers. However, all of Motorola's PowerPC simulators will initially be hosted on 32-bit architectures, and conventional languages (e.g., C) do not typically offer a 64-bit integer type. C++ allowed us to define a 64-bit integer class, with operators that make 64-bit integers as simple to use (syntactically) as the usual 32-bit integers.
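As an illustration of this approach, here is a minimal sketch of such a class for the unsigned case. It stores two 32-bit halves, as a host without a native 64-bit type would have to, and propagates carries and borrows by hand. The class name U64 and its members are hypothetical. A complete class would also need signed arithmetic, multiplication, shifts and logical operators.

```cpp
#include <cstdint>

// Hypothetical 64-bit unsigned integer built from two 32-bit halves.
class U64 {
public:
    U64(std::uint32_t lo = 0) : hi_(0), lo_(lo) {}
    U64(std::uint32_t hi, std::uint32_t lo) : hi_(hi), lo_(lo) {}

    std::uint32_t hi() const { return hi_; }
    std::uint32_t lo() const { return lo_; }

    friend U64 operator+(U64 a, U64 b) {
        std::uint32_t lo = a.lo_ + b.lo_;          // wraps modulo 2^32
        std::uint32_t carry = lo < a.lo_ ? 1 : 0;  // a wrap means a carry out
        return U64(a.hi_ + b.hi_ + carry, lo);
    }
    friend U64 operator-(U64 a, U64 b) {
        std::uint32_t borrow = a.lo_ < b.lo_ ? 1 : 0;
        return U64(a.hi_ - b.hi_ - borrow, a.lo_ - b.lo_);
    }
    friend bool operator==(U64 a, U64 b) { return a.hi_ == b.hi_ && a.lo_ == b.lo_; }
    friend bool operator<(U64 a, U64 b) {
        return a.hi_ < b.hi_ || (a.hi_ == b.hi_ && a.lo_ < b.lo_);
    }
    U64& operator+=(U64 b) { return *this = *this + b; }

private:
    std::uint32_t hi_, lo_;
};
```

With these operators, code such as `a + b` or `x += y` reads the same whether the operands are 32-bit integers or U64 values, which is the syntactic convenience described above.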

The ability to define an abstract register class has allowed us to largely ignore the differences between 32-bit and 64-bit instructions in the source code of the instruction handlers. Where the instruction handlers do differ in semantics from implementation to implementation, they are declared as virtual functions. The customization of such functions can then be incrementally added and maintained.
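The following sketch illustrates the two ideas in that paragraph. It substitutes a C++ template parameter for the article's abstract register class, so a handler such as add is written once and serves both 32-bit and 64-bit models. Behavior that genuinely differs between implementations is a virtual function. The example used here is that the 603 reloads its TLB in software while the 604 uses a hardware table search. All class names, including the 64-bit model, are hypothetical.

```cpp
#include <cstdint>

// Register width is hidden behind the template parameter Reg.
template <typename Reg>
struct RegisterFile {
    Reg gpr[32] = {};
};

template <typename Reg>
class CpuModel {
public:
    virtual ~CpuModel() = default;
    RegisterFile<Reg> regs;

    // add rD,rA,rB: one source for every width; wraps at the width of Reg.
    void add(int rD, int rA, int rB) {
        regs.gpr[rD] = static_cast<Reg>(regs.gpr[rA] + regs.gpr[rB]);
    }

    // Implementation-dependent behavior is customized per model.
    virtual bool softwareTlbReload() const = 0;
};

class Ppc603 : public CpuModel<std::uint32_t> {
public:
    bool softwareTlbReload() const override { return true; }
};

class Ppc604 : public CpuModel<std::uint32_t> {
public:
    bool softwareTlbReload() const override { return false; }
};

// Hypothetical 64-bit model reusing the same add() source unchanged.
class Model64 : public CpuModel<std::uint64_t> {
public:
    bool softwareTlbReload() const override { return false; }
};
```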

The ability to derive support classes for the internal interfaces used in the timing simulators allowed specialized classes (representing microprocessor functional units) to take advantage of the interfaces (and constructors) of the ancestor classes. This feature of C++ simplified the construction of a simulated microprocessor and avoided many of the programming problems associated with command parsing and distribution.

The PowerPC Architectural Simulator

The PowerPC Architectural Simulator (PPCArch) family is a set of programs that emulate various PowerPC microprocessors (see Figure 1). Currently, the PowerPC 603 and 604 microprocessors are simulated by different versions of PPCArch, and other models will be simulated in the future. These are instruction set architecture (ISA) simulators. This means that the simulators do no modeling of timing effects: each simulated instruction is assumed to complete before the next instruction starts, and there is no simulation of a pipeline in PPCArch. The goal is to run PowerPC object code as accurately and as quickly as possible.

The PPCArch simulators are based on a type of simulator originally developed for the Motorola M88K RISC microprocessor by Robert Bedichek for Tektronix [1]. Bedichek developed a style of threaded-code simulator that used a dedicated C-language and assembly-code macro to emulate each instruction of the 88100. He was able to decode an instruction once and reuse the decoded form many times, with the degree of reuse depending on locality of code reference and the size of the simulated instruction cache. He was also able to simulate the 88K virtual machine well enough to boot Unix on the simulator. The performance of this simulator was also very impressive: an average of 130,000 instructions per second when hosted on a 2.5-MIPS 68020 Tektronix workstation.

Like Bedichek's simulator, each instruction has its own simulation function (the instruction handler) in PPCArch. However, unlike Bedichek we implement our handlers in portable C++ as opposed to assembler macros or nonportable language features. We are willing to trade execution speed for maintainability and portability as PPCArch will be hosted on multiple architectures. We also cache our decoded instructions, but we use an abstract cache of arbitrary size as opposed to using a model of the actual instruction cache.
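A minimal sketch of the decode-once scheme, restricted to two real PowerPC encodings: addi (primary opcode 14) and add (primary opcode 31, extended opcode 266). Each decoded record carries a pointer to its instruction handler and the pre-extracted operand fields. An unordered_map keyed by address stands in for the abstract decode cache of arbitrary size. The names Cpu, Decoded, decode and run are hypothetical. Branches, exceptions and cache invalidation for self-modifying code are omitted.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Cpu;
struct Decoded {
    void (*handler)(Cpu&, const Decoded&);  // the instruction handler
    int rD, rA, rB;
    std::int32_t simm;                      // sign-extended 16-bit immediate
};

struct Cpu {
    std::uint32_t gpr[32] = {};
    std::uint32_t pc = 0;
    std::unordered_map<std::uint32_t, Decoded> decodeCache;  // abstract cache
    int decodes = 0;  // how many times the decoder actually ran
};

// addi rD,rA,SIMM: rA == 0 means the value 0, not the contents of r0.
void doAddi(Cpu& c, const Decoded& d) {
    std::uint32_t base = d.rA == 0 ? 0 : c.gpr[d.rA];
    c.gpr[d.rD] = base + static_cast<std::uint32_t>(d.simm);
}
// add rD,rA,rB
void doAdd(Cpu& c, const Decoded& d) { c.gpr[d.rD] = c.gpr[d.rA] + c.gpr[d.rB]; }
// Placeholder: a real simulator would raise a program exception here.
void doIllegal(Cpu&, const Decoded&) {}

Decoded decode(std::uint32_t w) {
    Decoded d{doIllegal, int((w >> 21) & 31), int((w >> 16) & 31),
              int((w >> 11) & 31), static_cast<std::int16_t>(w & 0xFFFF)};
    std::uint32_t op = w >> 26;
    if (op == 14) d.handler = doAddi;
    else if (op == 31 && ((w >> 1) & 0x3FF) == 266) d.handler = doAdd;
    return d;
}

// Execute n straight-line instructions from word-addressed memory.
void run(Cpu& c, const std::vector<std::uint32_t>& mem, int n) {
    for (int i = 0; i < n; ++i) {
        auto it = c.decodeCache.find(c.pc);
        if (it == c.decodeCache.end()) {  // miss: decode once and remember
            ++c.decodes;
            it = c.decodeCache.emplace(c.pc, decode(mem[c.pc / 4])).first;
        }
        it->second.handler(c, it->second);
        c.pc += 4;
    }
}
```

Running the two-instruction program `li r3,5` (0x38600005) followed by `add r3,r3,r3` (0x7C631A14) twice leaves r3 equal to 10 after each pass. The decoder runs only twice, because the second pass finds both instructions already decoded.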

The basic design of the physical memory system in PPCArch was also inspired by Bedichek's design. We have a physical memory manager which allocates physical memory pages by an on-demand (i.e., "lazy allocation") scheme. The user is able to allocate a physical memory system of a fixed size without paying the potentially high start-up costs of allocating large chunks of host heap memory. Instead, the physical pages are allocated as they are touched.
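The lazy allocation policy can be sketched in a few lines. The simulated memory size is fixed at construction, but a host page is allocated (zero-filled) only when some address in it is first read or written. The 4KB page size, the hash-map page directory and the class name are assumptions made for illustration, not details from the article.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <unordered_map>

// Hypothetical on-demand physical memory.
class PhysicalMemory {
public:
    static constexpr std::uint32_t kPageSize = 4096;  // assumed page size

    explicit PhysicalMemory(std::uint64_t bytes) : size_(bytes) {}

    std::uint8_t read8(std::uint32_t addr) { return page(addr)[addr % kPageSize]; }
    void write8(std::uint32_t addr, std::uint8_t v) { page(addr)[addr % kPageSize] = v; }

    std::size_t pagesAllocated() const { return pages_.size(); }

private:
    // Find the host page backing addr, allocating it on first touch.
    std::uint8_t* page(std::uint32_t addr) {
        if (addr >= size_) throw std::out_of_range("physical address beyond memory");
        auto& p = pages_[addr / kPageSize];
        if (!p) p.reset(new std::uint8_t[kPageSize]());  // zero-filled
        return p.get();
    }

    std::uint64_t size_;
    std::unordered_map<std::uint32_t, std::unique_ptr<std::uint8_t[]>> pages_;
};
```

A 64MB simulated memory therefore costs almost nothing at start-up; host memory grows only with the set of pages the simulated program actually touches.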

An interrupt and bus simulation module provides the interface between the physical memory manager and the caches. The interrupt/bus manager can attach, at run time, an arbitrary number of memory-mapped I/O device simulator modules to specified ranges of physical addresses. Whenever such an address range is read from or written to, the appropriate memory-mapped handler is invoked with the address of the access, the type of access (read, write or execute) and the data (if appropriate) as arguments. By way of a simple deferred interrupt manager and an interface to the real memory system, the memory-mapped handler can simulate the actions of a memory-mapped DMA device without knowing the details of how the simulator works. In this way, device drivers (the part of an OS kernel that directly manages hardware devices) can be tested by writing simple memory-mapped handlers to simulate the associated devices.
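The attachment mechanism can be sketched as follows, assuming that a handler receives the address, the access type and the data, and returns a value for reads. The names Bus, attach, Access and MmioHandler are hypothetical, and the deferred interrupt manager and DMA path are not modeled.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

enum class Access { Read, Write, Execute };

// Handler signature: (address, access type, data) -> value for reads.
using MmioHandler = std::function<std::uint32_t(std::uint32_t, Access, std::uint32_t)>;

class Bus {
public:
    // Attach a device model to [base, base + size) at run time.
    void attach(std::uint32_t base, std::uint32_t size, MmioHandler h) {
        devices_.push_back({base, size, std::move(h)});
    }

    // Returns true (and sets result) if a device claimed the address;
    // otherwise the access would continue to ordinary physical memory.
    bool access(std::uint32_t addr, Access type, std::uint32_t data, std::uint32_t& result) {
        for (auto& d : devices_)
            if (addr - d.base < d.size) {  // unsigned wrap makes this a range check
                result = d.handler(addr, type, data);
                return true;
            }
        return false;
    }

private:
    struct Device { std::uint32_t base, size; MmioHandler handler; };
    std::vector<Device> devices_;
};
```

For instance, a simulated serial port might be attached at a hypothetical range such as 0xF0000000 to 0xF000000F, with a handler that records written characters. A driver under test then sees an ordinary device register.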

The interrupt/bus module also supports symmetric multiprocessing simulation configurations. In this configuration, CPU, effective address translation and cache modules are instantiated as a unit and associated with the interrupt/bus module. The interrupt/bus module provides support for snooping between the caches and physical memory. This configuration will allow OS developers to debug multithreaded operating systems.

The effective address translation module, similar to the physical memory module, has a lazy allocation scheme for its structure maintenance. However, the effective address translation module's primary function is to maintain and to accelerate the effective → virtual → physical translation and to support the semantics of the PowerPC effective memory system. Both memory systems use sparse inverted tree structures to implement the address searches.
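For readers unfamiliar with the PowerPC translation chain, here is a simplified sketch of the 32-bit effective → virtual → physical path. The top 4 bits of the effective address select one of 16 segment registers, whose 24-bit VSID replaces them to form a 52-bit virtual address. The virtual page number (VSID plus 16-bit page index) then maps to a physical page. A hash map stands in for both the architected hashed page table and the simulator's own sparse tree structures. Block address translation, protection and the TLB are omitted, and the class name is hypothetical.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

class Mmu {
public:
    std::uint32_t sr[16] = {};  // segment registers; only the VSID is modelled

    // Install a translation from virtual page number to physical page number.
    void map(std::uint64_t vpn, std::uint32_t rpn) { pageTable_[vpn] = rpn; }

    // VSID (24 bits) concatenated with the page index (EA bits 4..19).
    static std::uint64_t virtualPageNumber(std::uint32_t vsid, std::uint32_t ea) {
        std::uint32_t pageIndex = (ea >> 12) & 0xFFFF;
        return (std::uint64_t(vsid & 0xFFFFFF) << 16) | pageIndex;
    }

    // Effective -> virtual -> physical; empty result models a page fault.
    std::optional<std::uint32_t> translate(std::uint32_t ea) const {
        std::uint32_t vsid = sr[ea >> 28] & 0xFFFFFF;  // EA bits 0..3 pick the SR
        auto it = pageTable_.find(virtualPageNumber(vsid, ea));
        if (it == pageTable_.end()) return std::nullopt;
        return (it->second << 12) | (ea & 0xFFF);      // physical page : byte offset
    }

private:
    std::unordered_map<std::uint64_t, std::uint32_t> pageTable_;
};
```

For example, with segment register 1 holding VSID 0x123456 and that segment's page 5 mapped to physical page 0x42, the effective address 0x10005ABC translates to physical address 0x42ABC.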

Memory-mapped handlers (similar to the instruction handlers and the I/O simulation handlers) are used to implement traps and memory-resident breakpoints. The breakpoint handlers are set using a user-defined code which is returned to the CLI when the user hits the breakpoint. The user is then free to perform whatever manipulations are appropriate to the breakpoint code returned.

The full virtual machine model is primarily implemented with the instruction handlers and with special handler functions that are associated with the supervisor visible control registers. This allows us to treat control registers as uniformly as possible while allowing the diversity of function associated with them.

AIX emulation is accomplished by using a monolithic memory model and a special trap handler which is associated with the system call trap vector. When a system call trap is encountered, control is transferred to the system call trap handler. This handler then examines the appropriate general-purpose registers and emulates the system call using the host operating system. The necessary stack space for the user program is mapped separately using CLI routines. Although not all of the AIX system call interface is provided, provision is made for this set of calls to be extended by the user.
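The trap-table arrangement can be sketched as follows. When the simulated program executes its system-call instruction, the trap handler reads the call number and arguments from the general-purpose registers, services the call and writes the result back. The register convention (call number in r0, arguments from r4, result in r3) and the call numbers are assumptions for illustration only; they are not the AIX interface. The names are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

struct Gprs { std::uint32_t r[32] = {}; };

class SyscallEmulator {
public:
    using Handler = std::function<std::uint32_t(Gprs&)>;

    // Users extend (or override) the emulated interface by adding entries.
    void define(std::uint32_t number, Handler h) { table_[number] = std::move(h); }

    // Invoked from the system-call trap vector. Returns false for calls the
    // emulator does not know, so the caller can report an error.
    bool trap(Gprs& g) {
        auto it = table_.find(g.r[0]);  // assumed: call number in r0
        if (it == table_.end()) return false;
        g.r[3] = it->second(g);         // assumed: result returned in r3
        return true;
    }

private:
    std::map<std::uint32_t, Handler> table_;
};
```

A real handler would forward the request to the host operating system (for example, by passing the buffer address and length from the simulated registers to the host's write call). Keeping the table data-driven is what lets a user-supplied shared object add calls without modifying the simulator.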

The code for PPCArch (which supports both the 603 and 604 PowerPC microprocessor models) comprises approximately 49,000 lines of noncomment, nonblank C++ code.

The PowerPC 603 Timing Simulator

The PowerPC 603 Timing Simulator (PPCSim603) is the first of a series of simulators that accurately model both the instruction set semantics and the detailed pipeline behavior of a PowerPC microprocessor implementation. The basic design emphasis is similar to the design of the PPCArch simulators, although the organization of the simulator internal to the CPU module is significantly different from the architectural simulators. In particular, the organization of PPCSim603 is based on the microarchitectural features of the actual PowerPC 603 microprocessor, as opposed to the architecture of the PowerPC instruction set as a whole. Therefore, the structure of PPCSim603 is organized around functional units and the synchronized communication among functional units.

Internal to each simulated PowerPC 603 microprocessor in PPCSim603 are functional units that correspond to the actual functional blocks in a physical PowerPC 603 microprocessor. Each of these functional units is implemented as a C++ class and has a specific procedural interface, made up of its public member functions, appropriate to its functional type. For example, the bus interface unit (BIU) has functional interfaces that allow it to communicate directly with the caches and with the external bus. A unit also inherits functional interfaces from its base classes. This use of C++ inheritance greatly facilitates the implementation of clock and command distribution networks.

Synchronization of the functional units is accomplished by means of a simulated two-phase clock. A functional unit may make no assumptions about its order of invocation relative to any other functional unit within a clock phase. The user initially toggles the clock by issuing a command to the bus. The bus distributes the clock signal to all of its bus devices, which include the memory system and any PowerPC 603 microprocessors attached to it. Within the simulated PowerPC 603, the BIU may multiply the bus clock and then redistribute it to all remaining clocked functional units. The code within a functional unit (for instance, the code that simulates a floating-point unit pipeline) is responsible for its own synchronization.
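The distribution scheme can be sketched with a small class hierarchy. Every clocked object implements two phase methods. The bus drives all attached devices through both phases, and the CPU (standing in for its BIU) turns each bus clock into several internal clocks for its own units. The 2:1 multiplier, the class names and the choice to issue the internal burst during the second bus phase are simplifying assumptions.

```cpp
#include <vector>

class Clocked {
public:
    virtual ~Clocked() = default;
    virtual void phase1() = 0;  // e.g. sample inputs
    virtual void phase2() = 0;  // e.g. update state and drive outputs
};

// Stand-in for a functional unit or memory: just counts completed cycles.
class Counter : public Clocked {
public:
    int cycles = 0;
    void phase1() override {}
    void phase2() override { ++cycles; }
};

// The CPU's BIU multiplies the bus clock and redistributes it.
class Cpu : public Clocked {
public:
    explicit Cpu(int multiplier) : multiplier_(multiplier) {}
    std::vector<Clocked*> units;

    void phase1() override {}
    void phase2() override {
        for (int i = 0; i < multiplier_; ++i) {
            // Units may be called in any order within a phase.
            for (auto* u : units) u->phase1();
            for (auto* u : units) u->phase2();
        }
    }

private:
    int multiplier_;
};

class SystemBus {
public:
    std::vector<Clocked*> devices;
    void tick() {  // one full bus clock, as toggled by a user command
        for (auto* d : devices) d->phase1();
        for (auto* d : devices) d->phase2();
    }
};
```

Because each phase is completed for every device before the next phase begins, a unit that samples in phase 1 and updates in phase 2 never sees a neighbor's half-updated state, whatever order the units are called in.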

Any functional unit may receive commands directly from the command line. The basic syntax of such a command is "verb unit arg ...", where verb is either display or modify, unit is the pathname of the functional unit of interest (for instance, motherboard.cpu3.fpu) and arg ... is an arbitrary list of arguments passed to the functional unit for further interpretation. This simple syntax greatly simplifies command parsing. The dispatch network that delivers these commands is transparent to the functional unit implementer: to begin receiving commands, a functional unit simply derives from the Commandable C++ class and overrides its virtual functions display and modify to suit the particular functional unit.
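A minimal sketch of this dispatch scheme follows. Only the names Commandable, display and modify come from the text; their signatures, the CommandRegistry class and the example Fpu unit are assumptions. The registry splits a command line into verb, pathname and arguments, and forwards the arguments untouched, so unit implementers never parse pathnames themselves.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

class Commandable {
public:
    virtual ~Commandable() = default;
    virtual std::string display(const std::vector<std::string>& args) = 0;
    virtual std::string modify(const std::vector<std::string>& args) = 0;
};

class CommandRegistry {
public:
    void add(const std::string& path, Commandable* unit) { units_[path] = unit; }

    // Parse "verb unit arg ..." and forward the args to the named unit.
    std::string execute(const std::string& line) {
        std::istringstream in(line);
        std::string verb, path, word;
        in >> verb >> path;
        std::vector<std::string> args;
        while (in >> word) args.push_back(word);
        auto it = units_.find(path);
        if (it == units_.end()) return "error: no unit " + path;
        if (verb == "display") return it->second->display(args);
        if (verb == "modify") return it->second->modify(args);
        return "error: unknown verb " + verb;
    }

private:
    std::map<std::string, Commandable*> units_;
};

// Example unit exposing a single field (hypothetical).
class Fpu : public Commandable {
public:
    std::string display(const std::vector<std::string>&) override {
        return "fpscr=" + std::to_string(fpscr_);
    }
    std::string modify(const std::vector<std::string>& args) override {
        if (args.size() != 2 || args[0] != "fpscr") return "error: usage: modify <unit> fpscr <value>";
        fpscr_ = std::stoul(args[1], nullptr, 0);  // base 0 accepts 0x... too
        return "ok";
    }

private:
    unsigned long fpscr_ = 0;
};
```

After registering an Fpu under motherboard.cpu3.fpu, the command `modify motherboard.cpu3.fpu fpscr 0x10` sets the field, and `display motherboard.cpu3.fpu` returns `fpscr=16`.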

PPCSim603 has an event mechanism which allows the end user to instrument a simulator configuration. Events occur when the simulator changes state. An end user can easily write his or her own program to analyze the events that are generated from a simulation run. Events are generated by predefined code internal to the functional unit modules within PPCSim603. Each functional unit can individually enable or disable generation of events. There is also a global event enable switch within the CLI. Examples of event types include cache hits and misses, bus cycles, instruction pipeline state changes and so on.

The event's value is a character string, with colon-separated name = value pairs. For instance, a hypothetical instruction event may have the value:

evtype=instr:opcode=0x48000000:unit=fxu:addr=0xf000:stage=finish:clock=10

In this example, there are six name = value pairs in the event. In order, they indicate that the event type is an instruction event, the hexadecimal opcode is 0x48000000, the event takes place in the fixed-point unit, the address of the instruction is 0xf000, the stage of the instruction pipeline indicates that the instruction is finished and that the clock count is 10. This form of event is particularly simple to parse and evaluate in Tcl code.
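The article parses events in Tcl. For comparison, an equivalent C++ parser is equally short: it splits the string on colons and each field on its first "=". This sketch does not trim whitespace around "=", and the function name is hypothetical.

```cpp
#include <map>
#include <string>

// Parse "name=value:name=value:..." into a lookup table.
std::map<std::string, std::string> parseEvent(const std::string& ev) {
    std::map<std::string, std::string> fields;
    std::string::size_type start = 0;
    while (start <= ev.size()) {
        auto end = ev.find(':', start);
        if (end == std::string::npos) end = ev.size();
        std::string pair = ev.substr(start, end - start);
        auto eq = pair.find('=');
        if (eq != std::string::npos)  // silently skip malformed fields
            fields[pair.substr(0, eq)] = pair.substr(eq + 1);
        start = end + 1;
    }
    return fields;
}
```

Applied to an instruction event such as `evtype=instr:opcode=0x48000000:unit=fxu:addr=0xf000:stage=finish:clock=10`, the parser yields six fields. Numeric values such as the opcode can then be converted with std::stoul.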

When an enabled event occurs, a global Tcl procedure called the event handler is called with the event's value. Within this routine, a user may filter events, create histograms, gather and process instruction statistics and so forth. The default event handler merely echoes the event string to the command line.
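The gating and delivery rules can be sketched in C++ as follows. An event reaches the handler only if both the global switch and the emitting unit's own switch are on, and the default handler simply echoes the string. The sketch checks the switches before formatting, so disabled events cost almost nothing. The class names and the cache-miss event fields are hypothetical; in the simulators themselves the handler is a Tcl procedure.

```cpp
#include <functional>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

class EventSink {
public:
    using Handler = std::function<void(const std::string&)>;

    bool globalEnable = true;  // the CLI-wide switch
    Handler handler = [](const std::string& ev) { std::cout << ev << '\n'; };  // default: echo

    bool active(bool unitEnabled) const { return globalEnable && unitEnabled && handler; }
    void deliver(const std::string& ev) const { handler(ev); }
};

class FunctionalUnit {
public:
    FunctionalUnit(std::string name, const EventSink& sink)
        : name_(std::move(name)), sink_(sink) {}

    bool eventsEnabled = false;  // per-unit switch

    void cacheMiss(unsigned long addr) const {
        if (!sink_.active(eventsEnabled)) return;  // skip formatting when disabled
        std::ostringstream ev;
        ev << "evtype=cache:unit=" << name_ << ":addr=0x" << std::hex << addr
           << ":result=miss";
        sink_.deliver(ev.str());
    }

private:
    std::string name_;
    const EventSink& sink_;
};
```

Replacing `handler` with a user-supplied function corresponds to redefining the event handler procedure, for example to count events or build a histogram instead of echoing them.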

Events have proved extremely useful during the development of PPCSim603. They have been used for internal debugging, and instruction trace formatters have been written using the instruction event as a generation mechanism. Finally, events have been used to verify the timing accuracy of PPCSim603. In this last application, a stream of instruction events is parsed and an instruction timing histogram (using Tcl's associative arrays) is constructed during the simulation of a program. This histogram is then compared with a histogram generated from an instrumented version of the PowerPC 603 hardware models, and the discrepancies are analyzed to detect timing inaccuracies. This verification would be much more difficult to implement without the event mechanism.

Interactions between the PPCSim603 development and test teams made it clear that error handling, like event handling, is a policy decision best left to the end user. Although an error could be considered just another kind of event, a separate handler for errors was judged to make the most sense for a casual simulator user. Therefore, an error handler routine (analogous to the event handler) is called when an error condition arises during a simulation run. Error state information is held in two global Tcl variables (errorCode and errorInfo), which can be interrogated within the body of the error handler routine. Errors fall into two general categories:

* an error condition may be related to an inappropriate request to the underlying operating system (for example, the user may try to open a nonexistent file), or

* a simulator command (e.g., display or modify) may be in error.

In either case, the error is analyzed and additional information is presented in errorCode and errorInfo for use by the user's error handler.
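The following C++ analogue sketches this policy split. Before calling the user's handler, the simulator fills two fields named after the Tcl globals errorCode and errorInfo, tagging the error with one of the two categories listed above. The tag strings, the ErrorReporter class and its members are hypothetical.

```cpp
#include <functional>
#include <string>

enum class ErrorCategory { HostOs, Command };

struct ErrorState {
    std::string errorCode;  // short machine-readable tag
    std::string errorInfo;  // human-readable detail
};

class ErrorReporter {
public:
    ErrorState state;
    std::function<void(const ErrorState&)> handler;  // user-supplied policy

    // Record the error, then let the user's handler decide what to do.
    void raise(ErrorCategory cat, const std::string& detail) {
        state.errorCode = cat == ErrorCategory::HostOs ? "HOSTOS" : "COMMAND";
        state.errorInfo = detail;
        if (handler) handler(state);
    }
};
```

Because the reporter only records and forwards, whether an error aborts the run, is logged or is ignored remains entirely the handler's decision.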

The code for PPCSim603 comprises approximately 45,000 lines of noncomment, nonblank C++ code.

The PowerPC 603 Functional Model

Neither PPCArch nor PPCSim603 directly addresses the needs of the system hardware designer. However, PPCSim603 was carefully designed to preserve the interfaces which are present in a real microprocessor system. In particular, the PowerPC 603 CPU module was driven by a clock distributed from the simulated bus, and the memory system was accessible only from that bus. The adherence to these hardware-oriented interfaces by the PPCSim603 implementers made the CPU code from PPCSim603 especially adaptable to other simulation environments. Indeed, one of the requirements of the PPCSim603 design was that the CPU code could be adapted to an industry-standard hardware simulation environment by means of a software "wrapper." The ability to adapt code to run in a different simulation environment also leverages the many engineer-months of testing that go into bringing a simulator to market.

PPCSim603 was adapted to run under the Cadence programming language interface (PLI) for the Verilog simulation environment [3, 10]. This conversion was accomplished by replacing a particular C++ class (the Pin class) with a variant class that implemented the PLI interface. The real-memory system was also adapted to run under the PLI interface. Neither the CPU nor the real-memory system can detect whether its environment is standalone (i.e., PPCSim603) or Verilog.
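The isolation that made this swap possible can be sketched with an abstract pin interface. CPU-side logic talks only to Pin, so it cannot tell whether a plain in-memory implementation or a PLI-backed one sits underneath. The PLI variant is not shown, since it would depend on the PLI library's routines. A virtual interface is used here, whereas the article's Pin class was replaced by a variant class. The bus request and grant pins are treated as active-high for simplicity, and all names other than Pin are hypothetical.

```cpp
// Environment-neutral view of a single signal.
class Pin {
public:
    virtual ~Pin() = default;
    virtual bool level() const = 0;
    virtual void drive(bool v) = 0;
};

// Standalone-simulator implementation: the value simply lives in memory.
class StandalonePin : public Pin {
public:
    bool level() const override { return value_; }
    void drive(bool v) override { value_ = v; }

private:
    bool value_ = false;
};

// CPU-side logic sees only Pin references.
class BusRequestLogic {
public:
    BusRequestLogic(Pin& br, Pin& bg) : br_(br), bg_(bg) {}

    // Assert bus request; returns true once the arbiter has granted the bus.
    bool requestBus() {
        br_.drive(true);
        return bg_.level();
    }

private:
    Pin& br_;
    Pin& bg_;
};
```

In a standalone build, the simulated arbiter drives the grant pin. In a Verilog build, the same BusRequestLogic code would be handed pins whose level() and drive() go through the PLI.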

The CLI code from PPCSim603 was also included in the module that runs under the PLI interface. This is not strictly necessary, since all of the semantics of the processor are accessible through the Pin interface. However, it proved especially convenient, since much of the test code for PPCSim603 was written in Tcl and required the CLI in order to run.

This adaptation of the basic PPCSim603 code from standalone simulator to embedded hardware functional model took approximately three engineer-months of effort, much of it spent learning the Verilog PLI. The PLI was designed for the C language, but the PPCSim603 CPU module is written entirely in C++. C++ provides a mechanism for calling C externals, but C has no corresponding mechanism for calling into C++. In particular, some of the object-oriented semantics associated with C++ classes (the constructors for static objects) caused some initial problems. This problem was solved by having a C++ main() routine call the Verilog PLI main() routine, which was renamed vmain() and declared as an extern "C" function, so that the C++ run-time environment was initialized before control passed to the Verilog code. All compilation was done by the native C++ compiler and all linking by the native linker. The initial development was done on an IBM RS/6000 workstation running AIX, using the xlC C++ compiler.

Approximately 900 lines of C wrapper code were needed to implement the hardware environment interface. In addition, approximately 1,500 lines of (conditionally compiled) C++ code were added to the PPCSim603 source code base. There are currently no plans to turn this PowerPC 603 functional model into a product, since other alternatives satisfy this market segment. Nonetheless, the design constraints that permit a standalone simulator to be adapted to a hardware simulator PLI will prove valuable: they will allow Motorola to produce PLI-compliant hardware simulation modules much earlier in the microprocessor design cycle than was previously possible.

Summary

PPCArch and PPCSim603 are the first members of a family of simulators developed by Motorola's RISC software group. They are intended to serve a broad audience of computer system software and hardware engineers.

PPCArch is an architectural simulator that supports both the PowerPC 603 and PowerPC 604 microprocessor models. It is useful to system and application software engineers as a general simulation environment. It has been used to simulate hundreds of application programs for a variety of operating systems, and is being used by OS developers to port OS kernels to the PowerPC Architecture.

PPCSim603 is the first of Motorola's family of cycle-accurate timing simulators for PowerPC microprocessors. It has a number of innovative features, including a user-programmable command line interpreter, a bus interface for user-defined memory-mapped devices, a user-extensible OS emulator and flexible event and error-handling facilities. Using an object-oriented design approach, the CPU core of PPCSim603 was successfully adapted to run in Cadence's Verilog environment with a minimum of additional engineering.

PPCArch and PPCSim603 are the first major projects in M&MTG RISC Software to be implemented using object-oriented design techniques and using the C++ language. The software technology developed in these projects will see much reuse in future Motorola software simulation products.

References

[1] Bedichek, R. Some efficient architecture simulation techniques. In Proceedings of Winter USENIX, 1990.

[2] Booch, G. Object Oriented Design with Applications. Benjamin/Cummings, Calif., 1991.

[3] Cadence Design Systems, Inc. Programming Language Interface Reference Manual for Verilog, Vols. 1 and 2, 1992.

[4] Ellis, M. and Stroustrup, B. The Annotated C++ Reference Manual. Addison-Wesley, Reading, Mass., 1990.

[5] Lippman, S. C++ Primer, Second ed. Addison-Wesley, Reading, Mass., 1991.

[6] Ousterhout, J. Tcl: An embeddable command language. In Proceedings of Winter USENIX, 1990.

[7] PowerPC User Instruction Set Architecture, Book I, Version 1.04, May 4, 1993.

[8] PowerPC Virtual Environment Architecture, Book II, Version 1.04, May 4, 1993.

[9] PowerPC Operating Environment Architecture, Book III, Version 1.04, May 4, 1993.

[10] Voith, R. The PowerPC 603 C++ Verilog interface model. In Proceedings of Spring Compcon '94, San Francisco, 1994.
COPYRIGHT 1994 Association for Computing Machinery, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.

Article Details
Title Annotation:The Making of the PowerPC
Author:Anderson, William
Publication:Communications of the ACM
Article Type:Cover Story
Date:Jun 1, 1994
Words:4025