Using embedded platform management with WBEM/CIM: add IPMI to provide "Last Mile" Manageability for CIM-based solutions. (Enterprise Networking).
Equally, IPMI has benefited from a multi-vendor consortium. Over the past five years its promoters (Dell, Intel, HP and NEC) have standardized the hardware platform management interface across a wide range of components--power, fans, chassis, controllers, sensors, etc.--and the specification is now supported by 150 other adopters. Both CIM and IPMI thus offer key benefits. But how do they complement each other within the Data Center? And how do you exploit these standards and make money doing so?
"Plus ca change, plus c'est la meme chose": The more things change, the more they stay the same. For all the innovation, today's Enterprise Data Center appears to be very similar to the one 10 years ago. Application (and the resulting server) sprawl has always challenged the applications or server administrators to set-up, deploy and manage at the farther corners of the business empire. New technologies still inundate the enterprise with promises of 'breakthrough' ROI--everything is 'web-based.' Everything can be dynamically discovered. Application interoperability is seemingly achieved in a heartbeat.
New product offerings promise tantalizing benefits. Symmetrical Multi-Processing (SMP) clusters simplify management and thus reduce TCO. New form factors, such as blade technology, appear as the ultimate weapon for dynamic resource allocation, improving price/performance using power-saving designs at ever-increasing MIPS/rack. Virtualization then takes its place as the new mantra--build systems that can accommodate 'Resources on Demand,' where Compute, Storage and I/O are decoupled from the reality of the rack. Grid computing is now 'on tap' in true utility-like fashion. It's enough to make you tear your hair out.
Well, so much for no change--the Data Center remains at the 'center' of the 'data' universe. However, it should come as no surprise that every Data Center is different. Place the same equipment and same number of people in two identical buildings with the same goals of reducing TCO and improving reliability, availability and serviceability, and you'll get two very different results. Why? Well, leaving aside human fallibility, most systems are modular to some degree. Like Lego blocks, the same pieces can be assembled in very different ways. While there are many characteristics to building out a successful Data Center infrastructure, the chances of success increase when using a building-block approach based on products that support standards. Vendors who embrace a modular building-block approach can deliver very real benefits to the Data Center, especially in delivering interoperable management to server and network administrators. Products that support modularity have three common characteristics:
Standard Interface: A standardized interface within the device that allows you to monitor and control the components as and when it's needed without regard to the BIOS, OS, Processor or color of the box. The ability to manage a modular server like a single system is crucial. This interface then needs to be exposed to support integration with existing management points. This is vital to ensure a holistic and unified view is retained in a complex and dynamic network infrastructure. Finally, this common interface must be 'Always On' (highly available) and be programmatic so that different applications and other standards can support and extend it.
Platform Instrumentation: Platform instrumentation is key, as without a certain level of "embedded intelligence" you'll be left on the outside looking in. And because no network is truly identical or homogenous, cross platform management requirements are pragmatic. This is best achieved by using standards-based building blocks. Standards help provide support for the basic hardware elements that make up a system--processor, fans, temperature, power, chassis, etc., irrespective of the manufacturer or state of the system.
Support Extensibility: Building blocks also have to embrace differentiation. They need to support the common hardware, firmware and software elements but also allow expansion to support unique elements. They also need to be utilized and reusable by other OEMs, OSVs, IHVs and ISVs. This modularity supports differentiation by adding features specific to certain markets while also retaining existing development IP.
Starting with CIM, let's examine some of the other embedded management technologies that offer Data Center equipment vendors a modular, open building-block approach.
The Distributed Management Task Force (DMTF) introduced the Common Information Model (CIM) 2.0 in 1998. Its purpose is to define an open and consistent method of describing manageable devices and systems. By using an information model with standardized management information, the goal is to abstract, organize and communicate the available information about the managed environment. CIM, now at v2.7, has two parts--the specification and the schema. The CIM specification covers the language and naming conventions used, usage and properties of devices, but most importantly (to this discussion at least), the mapping to other management standards such as SNMP and IPMI. The CIM Schema includes models for Systems, Applications, Networks, Devices, Events and vendor extensions. In fact, there are now over 800 IT entity classes. Typically, vendors express this management information in an ASCII or UNICODE file (MOF, or Managed Object Format) for use by an application. For example, Windows Management Instrumentation (WMI) is Microsoft's adaptation of CIM for Windows.
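To make the MOF idea concrete, here is a minimal sketch of the kind of MOF fragment a vendor tool might emit for a fan sensor. The class name ACME_FanSensor and its properties are invented for illustration; only CIM_Sensor is a real CIM schema class.

```python
# Sketch: emitting a minimal, hypothetical MOF (Managed Object Format)
# fragment describing a vendor-extension fan sensor class.
# ACME_FanSensor and its property names are illustrative, not from the
# published CIM schema; CIM_Sensor is the real base class.

def render_mof(class_name, superclass, properties):
    """Render a CIM class declaration as a MOF text fragment."""
    lines = [f"class {class_name} : {superclass}", "{"]
    for prop_type, prop_name in properties:
        lines.append(f"    {prop_type} {prop_name};")
    lines.append("};")
    return "\n".join(lines)

mof = render_mof(
    "ACME_FanSensor",            # hypothetical vendor extension class
    "CIM_Sensor",                # CIM schema base class
    [("uint32", "CurrentRPM"),
     ("uint32", "LowerThresholdCritical"),
     ("boolean", "IsRedundant")],
)
print(mof)
```

A management application would compile such a fragment into the CIMOM's repository, making the vendor's extension class queryable alongside the standard schema.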
CIM also forms the basis of the WBEM (Web Based Enterprise Management) initiative, also at the DMTF. WBEM offers additional standards to help standardize the communication and representation. xmlCIM uses XML to represent the 800+ CIM classes and instances. It acts as the binding of entity class instance 'state' to an XML representation. CIM operations over HTTP use these xmlCIM bindings to transport CIM information across open systems. It is interesting to note that Microsoft's WMI is based on CIM schema v2.2. However, MS DCOM (Distributed Component Object Model) is used in place of WBEM's HTTP CIM Ops.
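As a small illustration of the xmlCIM binding, the sketch below builds a CIM instance as XML. The element names (INSTANCE, PROPERTY, VALUE) follow the CIM-XML mapping; the fan-speed reading and the choice of property are invented for the example.

```python
# Sketch: representing a CIM instance in xmlCIM, the XML binding used by
# WBEM's "CIM operations over HTTP". Element names follow the CIM-XML
# mapping (INSTANCE, PROPERTY, VALUE); the sensor reading is invented.
import xml.etree.ElementTree as ET

inst = ET.Element("INSTANCE", CLASSNAME="CIM_Sensor")
prop = ET.SubElement(inst, "PROPERTY", NAME="CurrentReading", TYPE="sint32")
ET.SubElement(prop, "VALUE").text = "4200"   # fan speed in RPM (example)

print(ET.tostring(inst, encoding="unicode"))
```

In a real WBEM exchange this fragment would be wrapped in the CIM-XML message envelope and carried in an HTTP POST between the client and the CIMOM.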
In terms of implementation, there are five elements to consider: providers (agents that supply data about the managed device); a management client (an application using that data); a CIM Manager and information repository for that data (CIM Object Manager and Database); the managed device; and finally, the communication between these elements (WBEM). Figure 1 shows a conceptual view of CIM to demonstrate the hierarchical approach.
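The roles of those elements can be reduced to a toy sketch: a provider surfaces live device data, the CIMOM caches it in its repository and serves a management client. All class and method names here are illustrative, and the fifth element (WBEM communication) is elided for brevity.

```python
# Sketch of the CIM implementation elements from the text, reduced to
# their roles. Names are illustrative, not a real CIMOM API; the WBEM
# transport between client and CIMOM is omitted.

class FanProvider:                       # 1. provider: reads the managed device
    def get_instance(self):
        return {"ClassName": "CIM_Sensor", "CurrentReading": 4200}

class CIMOM:                             # 2. object manager + 3. repository
    def __init__(self):
        self.providers, self.repository = {}, {}

    def register(self, class_name, provider):
        self.providers[class_name] = provider

    def query(self, class_name):         # 4. serves the management client
        inst = self.providers[class_name].get_instance()
        self.repository[class_name] = inst   # cache in the repository
        return inst

cimom = CIMOM()
cimom.register("CIM_Sensor", FanProvider())
print(cimom.query("CIM_Sensor")["CurrentReading"])   # management client call
```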
Why CIM Needs Embedded Manageability
As you can tell, there's a lot of software involved in coordinating and making CIM management happen. Sometimes, though, OS-resident software just isn't enough. Let's take the example of a failing fan in a chassis sitting in a rack in the far corner of the Data Center. 'Fan out of Threshold' on its own is important data, primarily because it indicates the potential for service interruption. However, if the CPU decides to overheat, that Oracle database isn't going to complete those online credit card transactions. The user gets frustrated with yet another failed shopping cart and cancels her order. Sound familiar? So now the server has overheated and the application is but a faded memory behind a blue screen. Now we have a serious problem. How do you respond to that failed node? Using a CIM-based management application, this failing node may have alerted the administrator as an event on a map, as an e-mail or as an alarm on a network map. But what if the alerting depended on a management agent running on that server? Of course, communication is now lost. Frequently, the server needs someone to restart it and possibly redirect the BOOT to a different device/disk. When it comes down to hardware issues, software agents cannot recover that failed system. So what can? What exists at the hardware level, complementing CIM's software-level management stack, that can autonomously react to and recover from such an outage? Before we go any further, it's necessary to step back and take a look at some of the management standards and a few definitions that will be helpful moving forward:
* OS-absent state: No OS has been loaded. System is in a 'Bare Metal' state.
* Pre-OS state: POST/BOOT phases.
* OS-Present state: Normal operational running.
* OS-Unresponsive/Hung state: Unrecoverable error. BSOD (Blue Screen of Death).
Management Layers and Technologies
Different devices serve different purposes unique to their task. This requires different approaches as to how they are built, configured, deployed, managed and maintained. Management is a complex business for all. The requirements for managing an application on a high-end server are different from those for managing power to a rack. So while the technologies are different, typically the usage scenarios (how people use the management features) are not. As we've discussed, we will focus on those technologies that require an embedded approach (normally resident below the OS), complementing CIM to provide additional platform services. We'll start by defining some that are in use today.
Pre-boot eXecution Environment (PXE)/Boot Integrity Service (BIS): PXE is a common and consistent set of pre-boot firmware services whose sole purpose is the dynamic acquisition of a network boot program (NBP) over the LAN. BIS is an addition to PXE and uses public key crypto to digitally sign bootable OS images to ensure authenticity of the boot image.
BIOS/Extensible Firmware Interface (EFI): The BIOS is responsible for booting the computer by providing a basic set of instructions. It performs all the tasks that need to be done at start-up time: POST (Power-On Self Test) and booting an operating system from a disk/drive. Furthermore, it provides an interface to the underlying hardware for the operating system in the form of a library of interrupt handlers. EFI, originally introduced as part of the Wired for Management (WfM) initiative, represents a new model for the interface between operating systems and platform firmware. The interface consists of data tables that contain platform-related information, plus boot and runtime service calls that are available to the operating system and its loader. (Note: Compared to PXE/BIS, EFI provides additional authentication. It also allows an OEM to extend the services/targets it supports.)
Systems Management BIOS (SMBIOS)/System Management Bus (SMBus): SMBIOS is a DMTF specification for how motherboard and system vendors present management information about their products in a standard format by extending the BIOS interface. The information is intended to allow generic instrumentation to deliver this information to management applications that use DMI, CIM or direct access. SMBus was defined by Intel Corporation in 1995. It is used in personal computers and servers for low-speed system management communications. It is based on the Philips two-wire I2C bus standard.
Advanced Configuration and Power Interface (ACPI)/OSPM (OS-Directed Power Management): An industry-standard interface for OS-directed configuration and power management on laptops, desktops, and servers using power modes (suspend, hibernate, etc.). ACPI is the successor to Advanced Power Management (APM).
Alert Standard Format (ASF): A remote control and alert messaging interface that best serves clients' OS-absent environments. ASF is defined by the DMTF pre-OS working group.
Service Availability Forum (SAF)/Hardware Platform Interface (HPI): An industry-standard (open) interface specification for monitoring and controlling highly available systems. The design focus and primary application is telecommunications systems needing extreme reliability, availability and serviceability (RAS).
In summary, many of these technologies require an OS-hosted "agent" for remote access. Many also have no library standards for user or system access, and certainly none that are industry-wide. Finally, the implementation focus is on a one-host/board/blade approach. None of these technologies offer the 'Last Mile Manageability' benefits most eagerly sought in the Data Center. For the Data Center to anticipate and recover from hardware events and outages, a new embedded management approach is required.
IPMI v1.5 Offers "Last Mile" Manageability
IPMI (Intelligent Platform Management Interface) is a standard for server or Industrial PC/Device management initiated by Intel, HP, NEC and Dell back in 1998. It is a management technology that monitors server hardware health conditions such as temperature, fan speed, voltage and chassis intrusion. It is implemented on the server motherboard (or a blade) using a dedicated IPMI chip. This chip, or Baseboard Management Controller (BMC), is a standalone microprocessor that can control all the sensors and the power, even when the OS is hung. It is also agnostic to the BIOS, processor, backplane and OS, making it extremely versatile.
At the software level, IPMI isolates management software from hardware changes. Its standardized platform hardware interface allows the software and drivers to work across various platforms. IPMI's ability to customize software mapping to various management interfaces--such as CIM, DMI, SNMP, WMI, and others--ensures adoption across multiple environments. This is a distinct benefit for an OEM vendor, SI, VAR or ISV, as it eliminates any hardware dependencies (unlike SMBIOS) to other systems and applications. The latest release, IPMI v1.5, has a remote LAN access feature (IPMI uses the term out-of-band to describe this approach) that controls the power of the server remotely, to power down, up or reset, in addition to retrieving events and hardware information over a LAN (or serial) connection. This "Last Mile" manageability is unique as it allows administrators to remotely monitor, manage and recover a server, even if the operating system has halted or the server is off. The next release, IPMI 2.0, will add new features, including, but not limited to: IPMI Serial-over-LAN (iSOL)--to redirect the text console over the LAN; support for MS Emergency Management Services (EMS) Systems Administration Console (SAC) over LAN and Linux serial console over LAN; and a Command Line Interface (CLI) to support scripting of commands remotely.
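The remote power control just described is a matter of sending a single standardized command to the BMC. The sketch below composes the raw request for the IPMI Chassis Control command; the network function and command codes are from the IPMI specification, while the session setup and transport (serial or RMCP over LAN) are omitted for brevity.

```python
# Sketch: composing the raw IPMI request for the Chassis Control command
# used to remotely power a server down, up, or reset it. The NetFn and
# command codes come from the IPMI specification; session establishment
# and the serial/LAN transport are intentionally omitted.

CHASSIS_NETFN = 0x00          # Chassis network function
CHASSIS_CONTROL_CMD = 0x02    # Chassis Control command

POWER_DOWN, POWER_UP, HARD_RESET = 0x00, 0x01, 0x03

def chassis_control_request(action):
    """Return the (netfn, cmd, data) triple handed to the BMC."""
    if action not in (POWER_DOWN, POWER_UP, HARD_RESET):
        raise ValueError("unsupported chassis control action")
    return (CHASSIS_NETFN, CHASSIS_CONTROL_CMD, bytes([action]))

print(chassis_control_request(POWER_UP))   # (0, 2, b'\x01')
```

Because the BMC services this request independently of the host processor, the command works even when the OS has halted, which is precisely the "Last Mile" property the text describes.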
Figure 3 shows a typical IPMI implementation for a pedestal, rack, or bladed server. All IPMI functions are accomplished by sending commands to the BMC, using standardized instructions identified in the specification. Messages all use a request/response protocol (carried over the LAN via RMCP) and commands are grouped by functionality. The management controller receives and logs event messages, and maintains a Sensor Data Record (SDR) repository that describes the population of sensors in a system. This includes location, ID, access method and type. The SDR is also what allows software to present relevant data, such as normal reading ranges for temperature. It also helps identify the Field Replaceable Unit (FRU) associated with each sensor. The BMC can be used to deliver such messages to the system's management software and vice versa. IPMI's integration capabilities with system software, and its always-available manageability features, make it ideal for a variety of deployment scenarios.
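To illustrate what an SDR buys the management software, here is a sketch of the kind of information one record carries and how a reading is interpreted against it. The field names paraphrase the text (ID, type, location, FRU, normal ranges); they are not the binary SDR layout from the specification.

```python
# Sketch: the kind of information a Sensor Data Record (SDR) carries and
# how software uses it to interpret a raw reading. Field names paraphrase
# the article (location, ID, type, normal range, FRU), not the binary
# record layout defined in the IPMI specification.
from dataclasses import dataclass

@dataclass
class SensorDataRecord:
    sensor_id: str        # e.g. "Baseboard Temp"
    sensor_type: str      # e.g. "temperature"
    location: str         # entity the sensor monitors
    fru: str              # Field Replaceable Unit tied to this sensor
    normal_min: float     # lower end of the normal reading range
    normal_max: float     # upper end of the normal reading range

    def in_range(self, reading):
        """True if the reading falls inside the normal range."""
        return self.normal_min <= reading <= self.normal_max

sdr = SensorDataRecord("Baseboard Temp", "temperature",
                       "motherboard", "MB-01", 10.0, 60.0)
print(sdr.in_range(72.5))   # False: out of threshold, worth an event
```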
To ensure compliance with the IPMI specification, system vendors should validate IPMI implementations using the IPMI Conformance Test Suite (ICTS). ICTS is designed to verify IPMI 1.5 conformance by automating IPMI tests using a text or GUI-based command tool over serial and LAN connections. The use of ICTS ensures that enterprise IT groups enjoy broad interoperability across many different system vendors' implementations. The reduction in TCO should be very compelling.
Working Example: CIM and IPMI in a Blade Environment
Let's examine a CIM/IPMI implementation using blades as an example (see Figure 4). Here, the overall management goal is to automatically recover from a failed blade by failing over to a spare (Hot-Standby) blade.
The active server blade experiences a critical fault. The IPMI watchdog (a routine that sets a timeout within the system to detect unresponsive components) detects the failure. The BMC logs this critical event into a System Event Log (SEL in Figure 3) and sends this event to the Chassis Management Module (CMM). The CIM provider sends this to the CIMOM. The CIMOM updates the redundancy configuration in the CIM object schema, sends a CIM Event to the CIM proxy and then to the Management software using WBEM. A message about the fault is then displayed/alerted at the management console.
Depending on the automated policies that have been configured, the CMM sends an IPMI command to power on the spare blade. It also sends any extra information about the failed blade, re-configures the network switch to put the new blade in the active configuration and, using IPMI commands, turns on the visual indicators on the failed blade (to be diagnosed/repaired offline). The BMC on the new blade communicates performance and other relevant data back to the CMM. Any updated information is then reflected back in the CIMOM Repository.
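The failover sequence above can be sketched as a small state machine: a watchdog timeout triggers the CMM to log the critical event, flag the failed blade for offline repair, power on the standby blade and swap the roles. The class and method names are illustrative, not from any real IPMI or CIM API.

```python
# Sketch of the blade failover sequence from the text. A watchdog timeout
# on the active blade drives the Chassis Management Module (CMM) policy:
# log the event, light the fault LED, power on the spare, swap roles.
# All names here are illustrative, not a real IPMI/CIM API.

class Blade:
    def __init__(self, slot):
        self.slot, self.powered, self.fault_led = slot, False, False

class ChassisManagementModule:
    def __init__(self, active, standby):
        self.active, self.standby, self.event_log = active, standby, []

    def on_watchdog_timeout(self):
        # The BMC logged the critical event; the CMM reacts per policy.
        self.event_log.append(f"critical: blade {self.active.slot} unresponsive")
        self.active.fault_led = True    # visual indicator for offline repair
        self.standby.powered = True     # IPMI power-up command to the spare
        self.active, self.standby = self.standby, self.active

cmm = ChassisManagementModule(Blade(1), Blade(2))
cmm.on_watchdog_timeout()
print(cmm.active.slot)   # 2: the spare is now the active blade
```

In the real system, each of these steps is an IPMI command to a BMC or the CMM, and the role change is reflected back into the CIMOM repository so the management console shows the new configuration.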
It is worth noting that the Industrial PC/CompactPCI Industry Association (PICMG) has also standardized on IPMI as the manageability specification for PICMG-related devices. Products conforming to PICMG 3.X or Advanced TCA (ATCA) must comply with the IPMI 1.0 specification. ATCA also refers to a series of backplane specifications (Infiniband, etc.). Reliability, availability and serviceability (RAS) are heavily emphasized, as is the modular nature (blades, chassis, etc.) of these designs. In fact, IPMI is already seeing adoption by vendors in the telecommunications equipment space in equipment such as LAN/WAN line cards, routers, firewalls and switches.
CIM Schema for IPMI
As explained earlier, the schema provides the mapping from the real world to the abstracted (CIM) world. Companies such as OSA Technologies and Intel are looking at defining IPMI schemas. Below are some of the physical characteristics that are being mapped:
* Dynamic Namespace creations/deletions
* Power control
* Critical Event Log
* Boot Options--selection/control
* Power Control & Cooling
  * Power Unit Redundancy, Power Supply, Fan monitoring
* Display Manager
  * Front Panel, Alarm LEDs, Alarm connector
  * Media Selection (CDROM, Floppy)
  * Only 'Control' through CIM
* Server Blade Control/Monitoring/Configuration
  * Power Control, Reset, Environmental Monitoring (temperature, voltage, etc.)
  * Remote Keyboard/Video, Serial multiplexer
  * Alerts (CIM Events)
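A schema provider for IPMI essentially maintains a mapping table like the list above: each IPMI-managed capability is paired with the CIM construct that exposes it. The sketch below shows the idea; the specific pairings are illustrative, since the actual schemas were still being defined at the time of writing.

```python
# Sketch: the kind of mapping an IPMI-to-CIM schema provider might keep,
# pairing IPMI-managed capabilities from the list above with the CIM
# constructs that could expose them. The pairings are illustrative.

IPMI_TO_CIM = {
    "power control":       "CIM method call (e.g. RequestStateChange)",
    "fan monitoring":      "CIM_Sensor instance properties",
    "temperature/voltage": "CIM_NumericSensor readings",
    "critical event log":  "CIM indications (events)",
    "alarm LEDs":          "CIM_AlarmDevice control",
}

for capability, cim_construct in sorted(IPMI_TO_CIM.items()):
    print(f"{capability:20s} -> {cim_construct}")
```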
Management Standards offer a unique way for vendors to customize solutions to the market and build truly interoperable products. These standards are critical building blocks to enable next generation autonomic, utility computing environments.
For system OEMs and software vendors, standards such as CIM and IPMI help promote interoperability. Customers do not want to be locked in by proprietary technologies that increase their IT capital and operating expenses. Operating System Vendors can now determine the system motherboard features in real time without fat, resource-intensive agents. Management software vendors are now able to identify system component information, especially management controller information, irrespective of the system state (Pre-OS, etc.). They can also support health monitoring directly within their applications. Overall, more accurate system information feeds into the global map/database to the benefit of the administrator. Enterprise Managers (End-Users) gain enhanced system-specific information (serial number, make, model), as well as health monitoring, all pre-integrated into the system.
Finally, if you are a VAR or system integrator, you can build customized, value-add enterprise solutions running on top of the standardized, interoperable platforms from different vendors. For example, consider offering a Remote Management Service--moving your customer relationship from one of 'supplier' to 'business partner.'
Enterprise IT and data centers can also accelerate the industry-wide standardization by making CIM and IPMI mandatory in their next vendor/supplier RFP/RFQ. Standardization of manageability building blocks is the critical next step towards the vision of autonomic, utility computing.
Vendors interested in implementing CIM and IPMI into their products can start by visiting the following websites:

* DMTF: www.dmtf.org
* IPMI: www.intel.com/design/servers/ipmi/index.htm
* ATCA: www.picmg.org/newinitiative.stm
* SAF: www.saforum.org
* DMTF/IPMI Building Blocks: www.osatechnologies.com
Steve Rokov is director of marketing and Steffen Hulegaard is director of engineering at OSA Technologies (San Jose, Calif.)
Publication: Computer Technology Review, Jun 1, 2003