Airflow management in a liquid-cooled data center.

ABSTRACT

Electronics densification is continuing at an unrelenting pace at the server, rack, and facility levels. With increasing facility density levels, airflow management has become a major challenge and concern. Hot spots, air short-circuiting, and inadequate tile airflow are a few of the issues that are complicating airflow management. This paper focuses on a thermal management approach that simplifies facility airflow management in a cost-effective and efficient manner. Implementation of the technology was undertaken with the DOE's Pacific Northwest National Laboratory. Under the effort, a single 8.2 kW rack of HP rx2600 servers was converted from air cooling to liquid cooling.
[...] perforated tiles in the data center. The air exiting an 8.2 kW air-cooled rack located in a best-case facility location reached a maximum of 34°C and the air exiting an air-cooled rack located in a worst-case location reached a maximum of 44°C, while the air exiting the liquid-cooled rack was 10°C to 20°C cooler, reaching a maximum of 24°C. The air is delivered to the tiles at approximately 14°C. The thermal gradient over the air-cooled racks approximated 10°C (hottest servers at the top), while that over the liquid-cooled rack was less than half and on the order of 3°C-4°C. The image of the internal areas of the air-cooled server showed some significant hot spots on the power pods and memory, while these were significantly diminished for the liquid-cooled server. The tile airflow measurements revealed that the vast majority of the tiles delivered approximately 725 cfm, with five tiles delivering between 1,300 and 1,480 cfm. This paper provides further details on the study and analyzes the manner in which facility airflow management complexity and cost can be reduced for a liquid-cooled facility.

INTRODUCTION

Electronics densification is continuing at an unrelenting pace at the server, rack, and facility levels. While the most common rack power level is still in the 2-3 kW range (Rasmussen 2005), several vendors are offering racks upwards of 20 kW. ASHRAE's Datacom Equipment Power Trends and Cooling Applications (ASHRAE 2005) provides additional details with regard to equipment power trends. In particular, Figure 3.12 of that book provides an update to the original Uptime Institute power trend chart.
As the capital, construction, and operating costs of facilities continue to climb, data center managers are forced to push for more computationally dense and productive facilities. This forces the data center manager to seek out racks of increasingly higher power levels and to drive facility densities upward. With increasing rack and facility density levels, airflow management has become a major challenge and concern. Data center hot spots, air short-circuiting, and inadequate tile airflow are a few of the challenges that are now plaguing today's data centers. These conditions are making it increasingly difficult for data center managers to maintain recommended server or rack inlet conditions. ASHRAE's Thermal Guidelines for Data Processing Environments offers guidelines covering items such as rack inlet temperatures and humidities for equipment racks in data centers (ASHRAE 2004). The area of data center airflow management, as a means to ensure proper server or rack inlet conditions, is now getting a lot of attention in the academic and industrial research communities. Researchers are attacking the problem from many different angles, with the overall objective of ensuring that server or rack inlet conditions meet manufacturers' specifications.
Sharma et al. (2004) have proposed a supply heat index (SHI) for use in the design and optimization of air-cooled data centers. The study reports results from the first comprehensive heat transfer and fluid flow experiments in a production-level data center. The authors use the experimental results from the study to calculate the SHI under varying conditions and demonstrate the utility of this dimensionless parameter to ensure proper rack inlet conditions. Wang (2004) has proposed a new door design to prevent hot air from being recirculated into the tops of the racks. Wang proposes a mostly solid rack door with perforations restricted to the base of the rack door. This allows the door to pull in chilled air from the perforated tiles and not from the hot air stratified toward the tops of the racks. For a given 3.5 kW rack, Wang shows a reduction in temperature rise over the rack, i.e., the temperature difference between the air issuing from the perforated tile and that entering the equipment at the top of the rack, from 12°C down to 4°C. Wang acknowledges that this design is susceptible to high inlet air velocities, which increases the potential to entrain particulate contaminants. Bhopte et al. (2005) studied the minimization of rack inlet air temperatures via a multi-variable optimization study. The variables studied were data center floor plenum
depth, floor tile placement, and ceiling height. The authors showed a significant effect of all three variables on rack inlet air temperature. Future study is suggested in the areas of computer room air-conditioning (CRAC) unit placement, CRAC flow rates, and floor tile resistances. Schmidt et al. (2005) have designed a water-cooled (rack) rear door heat exchanger (RDHX) to extract a large portion of the rack heat load from the exhaust air before it is placed back into the data center. The RDHX relies on a cooling distribution unit (CDU) to deliver water above the dew point of a given facility. For a demonstration rack with six IBM BladeCenters (25 kW rack simulated), the RDHX was shown to remove 50%-60% of the heat from the exhaust air while simultaneously lowering the exhaust air temperature 25°C-30°C. Schmidt et al. also demonstrated a favorable total cost of ownership for this solution.
Heydari and Sabounchi (2004) propose refrigeration-assisted hot spot cooling of data centers by placing refrigeration/fan-coil heat exchanger units over the hot spots. The authors combined thermal hydraulic modeling of the refrigeration system with computational fluid dynamic (CFD) analysis of the data center airflow. Their analytical results show a reduction in data center hot spots. An alternative to managing the data center airflow is provided through the use of so-called "refrigerated" racks. Such racks are totally enclosed and include an air-to-liquid heat exchanger. The air inside the rack is cooled when it passes over the air-to-liquid heat exchanger and is then delivered to the servers. The authors of this paper propose a technology that significantly reduces facility airflow management challenges. The approach is to spot-cool the microprocessors with dielectric liquid-cooled cold plates. This approach allows approximately 45% of the rack computing heat load to be rejected directly to the facility chilled water (and eventually directly to the cooling towers). By reducing the amount of heat dissipated
directly to the facility ambient, there is a dramatic reduction in the volume of airflow required per rack and, in turn, for a full facility populated with liquid-cooled racks. The present paper focuses on an analysis of the impact that the technology has on facility airflow management. Infrared images of the rear door of a liquid-cooled rack and of several air-cooled racks are used in the analysis.

HARDWARE SETUP

This study was conducted using several air-cooled racks and a single liquid-cooled rack in the Molecular Sciences Computing Facility (MSCF) of Pacific Northwest National Laboratory (PNNL). An infrared camera was used to take the infrared images used in the study. The following sections provide further detail on the liquid-cooling hardware and the infrared camera.

Liquid-Cooling Hardware

This study was conducted on racks of servers housed in MSCF's supercomputer. The supercomputer consists of 84 racks of air-cooled HP rx2600 2U servers (dual processor, 1.5 GHz IA64 server). Under a DOE Energy Smart Data Center study, a single rack has been converted from air cooling to liquid cooling.
As part of the conversion, the air-cooled fan heatsinks were removed and replaced with ISR spray module kits (SMKs). A single spray module and a converted server are shown in Figure 1. Each SMK is supplied with conditioned dielectric coolant that is used to keep the processors cool. The heat absorbed from the processors converts the single-phase coolant supplied into a two-phase mixture. All SMKs have a dielectric coolant supply line leading to them from a server manifold and a return line leading away from them. The server manifold, in turn, has a supply line leading to it from a rack supply manifold (see Figure 1) and a return line leading away from it to the rack return manifold. These supply and return lines connect to their respective manifolds
[...] (TMU) sitting under the raised floor, underneath the rack (this unit is also designed to mount in a standard 19 in. rack). The TMU consists of a pump, a reservoir, a controller, power supplies, and a liquid-to-liquid heat exchanger. The liquid-to-liquid heat exchanger is supplied with facility water, which condenses all the vapor and provides a subcooled single-phase liquid. The TMU then supplies the conditioned coolant to the supply manifold. [FIGURE 1 OMITTED] The liquid-cooled rack is installed on MSCF's main floor as part of the supercomputing cluster.

Infrared Camera

A FLIR Systems Thermacam S45 was used to take the infrared images for this study. The camera is designed for research and development and scientific applications and produces high-resolution (320 x 240 pixels), high-quality images. The camera has a thermal sensitivity of 0.08°C at 30°C (i.e., it can read temperature differences as low as 0.08°C) and can record temperatures in the range of -40°C to +1,500°C (up to a maximum of +2,000°C with additional hardware). The camera has a field of view of 24° x 18°, which allowed approximately one-third of a rack door to be imaged from roughly five feet away. The camera has an accuracy of ±2°C, or ±2% of the reading.
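The statement that roughly one-third of a rack door fits in the frame from five feet away can be checked with simple geometry. The sketch below is illustrative only: it uses the 24° dimension of the field of view and the five-foot distance quoted above, while the 6.5 ft door height is an assumed nominal value that does not come from the paper.

```python
import math


def footprint(distance, fov_deg):
    """Linear extent covered by a field-of-view angle at a given distance.

    The result is in the same unit as `distance`.
    """
    return 2.0 * distance * math.tan(math.radians(fov_deg) / 2.0)


# Values quoted in the text: 24 degree field of view, roughly 5 ft stand-off.
extent_ft = footprint(5.0, 24.0)

# ASSUMPTION: nominal rack door height, not stated in the paper.
ASSUMED_DOOR_HEIGHT_FT = 6.5
fraction_of_door = extent_ft / ASSUMED_DOOR_HEIGHT_FT

print(f"Imaged extent: {extent_ft:.2f} ft")
print(f"Fraction of assumed door height: {fraction_of_door:.2f}")
```

Under these assumptions the imaged extent is a little over two feet, which is in line with the one-third figure given in the text.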
TEST METHODOLOGY

The testing consisted primarily of thermal performance testing, software benchmarking, facility airflow measurements, and infrared imaging. The following sections provide further details.

Test Conditions

The spray modules in the servers were supplied with PF5050, a dielectric coolant. PF5050 (Fluorinert) has the following approximate properties given at 1 atm and room temperature: boiling point of 30°C, specific heat of 1,048 J/(kg·K), viscosity of 4.69E-4 kg/(m·s), thermal conductivity of 0.056 W/(m·K), and a latent heat of vaporization of 102.9 kJ/kg.
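To give a feel for what these properties imply, the following sketch estimates the coolant mass flow needed to absorb a given heat load by sensible heating plus partial vaporization. It is a simple energy balance, not the sizing method used in the study. The latent heat and specific heat are the values quoted above; the 170 W load is the two-processor figure reported later in the paper, and the vapor fraction and subcooling arguments are hypothetical inputs chosen for illustration.

```python
# Property values quoted in the text (approximate, 1 atm, room temperature).
LATENT_HEAT_J_PER_KG = 102.9e3
SPECIFIC_HEAT_J_PER_KG_K = 1048.0


def coolant_mass_flow(heat_w, vapor_fraction=1.0, subcooling_k=0.0):
    """Mass flow (kg/s) needed to absorb `heat_w` watts.

    Energy absorbed per kg = sensible heating up to the boiling point
    (cp * subcooling) plus latent heat for the fraction that vaporizes.
    Both `vapor_fraction` and `subcooling_k` are illustrative assumptions.
    """
    if not 0.0 < vapor_fraction <= 1.0:
        raise ValueError("vapor_fraction must be in (0, 1]")
    energy_per_kg = (SPECIFIC_HEAT_J_PER_KG_K * subcooling_k
                     + LATENT_HEAT_J_PER_KG * vapor_fraction)
    return heat_w / energy_per_kg


# 170 W is the two-processor heat load reported later in the paper.
full_boil = coolant_mass_flow(170.0)
half_boil = coolant_mass_flow(170.0, vapor_fraction=0.5)

print(f"Full vaporization: {full_boil * 1000:.2f} g/s")
print(f"50% vaporization:  {half_boil * 1000:.2f} g/s")
```

Because the return flow is described as a two-phase mixture, the full-vaporization case is a lower bound on the flow rather than an operating point.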
The coolant was delivered at an atomizing pressure of approximately 20 psid across the atomizers, and the system pressure was maintained at roughly one atmosphere. The facility was designed as an air-cooled facility. Sixteen air-handling units (AHUs) located on the periphery of the data center deliver 13°C-15°C chilled air to all the racks. Chilled air is drawn in through the front of all the servers and is exhausted out the backs. The heated air mixes with the residual air that is not passed through the servers and returns to the AHUs (no special ducting is used). For the purposes of this study, PNNL provided a chilled water supply line to the rack and also provided a water return line. The TMU deploys a fluorinert-to-water heat exchanger, which uses the facility water. The facility delivers chilled water at a temperature as low as 7°C at a supply pressure of 50 psid and a calculated flow rate of 6.5 gpm.

Benchmark Routine

In order to get the air-cooled and liquid-cooled racks to their maximum operating temperatures, several benchmark routines were run. While the system cases were closed, both High Performance Linpack (HPL) (Petitet et al. 2004; Dongarra et al. 1979, 2003) and Stream 2 (McAlpin 2005) were executed. For the test where the servers' internal temperatures were monitored, the interconnect was disconnected in order to open the server case.
This precluded the running of HPL within the default environment under the time constraints during which the test took place. Instead of HPL, a small C program was constructed to exercise similar portions of the processor. A separate instance of HPL, with two processes, was executed on each dual-processor server. The test cycled through different sizes of N, including 3000, 4000, and 5000, with a P of 1, a Q of 2, and NBs of 8, 16, and 32. Unfortunately, the programs were compiled on a stand-alone system on which the optimizing Intel compiler was not available due to licensing issues, so an older version of the GNU Compiler Collection (GCC) (Redhat release 2.96) was used. This limited performance to only 50% of peak (±5%, or around 3 Gflops/CPU). The Stream2 benchmark, run with NMIN = 3 and NMAX = 2,000,000, was used to observe the thermal behavior of the memory chips and supporting systems under load. The test program used to stimulate the processor when the system case was open executed two nested loops of floating-point multiply-add instructions, with periodic divides to rescale
the data to avoid overflows. The data were blocked such that they would efficiently pipeline onto all four of the Itanium floating-point units. The overall measure of work performed was similar to HPL (50% of peak, ±5%) as measured by the pfmon program (HP 2005). Two instances of this program were executed per node in order to exercise both CPUs. For the desired purpose of exercising the CPU, this was deemed to be functionally equivalent to running HPL to create a temperature increase.

Facility Airflow Rate Measurements

A TSI Model 8373 AccuBalance tile hood was used to measure the airflow rate from all of the perforated and high-percentage open grate tiles at PNNL. The tile hood is capable of measuring flow rates in the range of 30 to 2,000 cfm, at ±5 cfm or ±5% of the reading. The tile hood can be used to measure airflow rates within a temperature range of 0°C to 60°C, with a resolution of 0.1°C and an accuracy of ±0.5°C. The flow rates from all the tiles and grates were measured. Results are presented in the section entitled
"Airflow Management in the Molecular Sciences Computing Facility."

Infrared Imaging

The manner in which the server racks were fully exercised has been described in detail in the "Benchmark Routine" section. A given rack was allowed to run HPL for at least 30 minutes before an image was taken. Given the size of the camera's field of view and the width of the data center aisles (approximately four feet wide), only one-third of a rack's door could be imaged at once. In addition, the images were taken at an angle to the rack door's plane. In the initial stages of the imaging, a type-T thermocouple was placed on a rack door. The thermocouple was read with a calibrated Fluke handheld thermocouple reader. The camera was then focused on the thermocouple and the image probe placed over the top of the thermocouple. This was done to ensure that the camera settings were correct and that there was minimal deviation between the temperature shown at the probe and that recorded by the thermocouple. In all cases, the camera measured within 2°C of the thermocouple. Images were taken over the entire surface of several rack doors as well as the fronts of several racks. In the case of the liquid-cooled rack, the door was opened and several images were taken of the supply and return tubing. Images were also taken of the internal areas of an air-cooled server and of a liquid-cooled server. The images of the internal areas were taken while the servers were running HPL.
Several images were also taken of the perforated tiles.

RESULTS AND DISCUSSION

The primary focus of this study was to quantify the projected positive impact of liquid-cooling technology on PNNL's facility ambient environmental conditions. One of the means selected to do this was through the use of infrared imaging. Infrared images were taken of several air-cooled racks of servers and the liquid-cooled rack of servers. In addition, images were taken of the internal areas of an air-cooled and liquid-cooled server as well as of several perforated tiles. The images and associated results are discussed in the following sections.

Comparison of Air-Cooled and Liquid-Cooled Racks

Prior to taking any infrared images, several racks were chosen for the study. One rack, referred to as a "worst-case" or "hottest" rack, was selected from a group of racks located in the hottest spot in the data center. These racks were located 3-4 tiles away from the outlet of an AHU. This placement resulted in an extremely high air velocity under these racks, meaning that the volumetric airflow rate issuing from the dedicated tiles was less than optimal (see airflow discussion in the "Airflow Management in the Molecular Sciences Computing Facility" section). At least one rack, referred to as a "best-case" or "coolest" rack, was located in the coolest part of the data center. This rack was located between two other racks and was optimally located with respect to any AHU. The liquid-cooled rack was located on the end of a row of racks and was subject to hot airflow recirculation around the front of the rack.
The liquid-cooled rack was intentionally located in a less-than-optimal location in the data center. As discussed in the "Benchmark Routine" section, HPL was run on all servers in order to get them to their maximum operating temperatures. Figures 2 and 3 show infrared (IR) images of the inlet and outlet to a worst-case rack of air-cooled servers. In Figure 3, the camera's probe indicates a rack outlet temperature of 43.4°C, with a maximum recorded temperature of 44°C. A rough analysis of this image shows a temperature gradient of at least 10°C over the surface area of the rack shown. Comparing Figures 2 and 3, and using the temperatures indicated at the probe locations (approximately two-thirds up the height of the rack), shows that the air rises approximately 20°C across the rack. This temperature rise will differ based upon location of the probe. Figures 4 and 5 show IR images of the inlet and outlet to the liquid-cooled rack of servers. In Figure 5, the camera's probe is actually in error (temperature has been blacked out), but investigation of the temperature scale suggests that the temperature at the location of the probe is approximately 22°C, with a maximum recorded temperature of 24°C. The image does not have enough information to allow for an estimation of the gradient over the full door, but the field measurements showed a highly uniform temperature distribution over this door. Comparing Figures 4 and 5, and using the temperatures indicated at the probe locations, shows that the air rises approximately 7°C across the rack. The temperature rise of 20°C across the air-cooled rack is 187% greater than the 7°C rise for the liquid-cooled rack.
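The relative comparison above is a plain percentage difference, sketched below using the rounded temperature rises quoted in the text. With the rounded inputs the result is about 186%, slightly below the 187% reported; the small gap presumably reflects unrounded probe readings, which are not given in the paper.

```python
def percent_greater(value, reference):
    """How much larger `value` is than `reference`, in percent of `reference`."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (value - reference) / reference * 100.0


# Rounded temperature rises quoted in the text (degrees C).
air_cooled_rise_c = 20.0
liquid_cooled_rise_c = 7.0

difference_pct = percent_greater(air_cooled_rise_c, liquid_cooled_rise_c)
print(f"Air-cooled rise is {difference_pct:.0f}% greater than liquid-cooled rise")
```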
It should be noted that the temperature rise across the two racks is highly dependent upon the location of the temperature probing point. Figure 6 shows an image of the rear of a given liquid-cooled server (rear door has been opened). In particular, the image shows the coolant supply and return lines. The temperatures of approximately 20°C for the supply line and 30°C for the return line are consistent with the temperatures measured by the temperature sensors utilized by the liquid-cooling system. [FIGURE 2 OMITTED] [FIGURE 3 OMITTED] [FIGURE 4 OMITTED] [FIGURE 5 OMITTED] [FIGURE 6 OMITTED] [FIGURE 7 OMITTED] Figure 7 shows an image of the rear doors of two air-cooled racks in a best-case location in the data center. These racks are in the middle of the data center and are supplied with chilled air from CRACs on two opposing walls. The rack on the left-hand side is in a favorable position, as it is sandwiched between two other racks. Estimating from the temperature indicated by the probe placed on the rack on the end, the temperature on the rear door of this rack, at the same height as the probe, is approximately 32°C. It is also safe to assume that the maximum temperature for the area of the door shown is 34°C. Comparison of Figures 5 and 7 shows that the air-cooled rack outlet air, for either of the two racks shown in Figure 7, is at least 10°C hotter than the outlet air for the liquid-cooled rack. Comparison of Figures 3 and 5 shows that the worst-case air-cooled rack outlet air is at least 20°C hotter than the outlet air for the liquid-cooled rack.
Comparison of Air-Cooled and Liquid-Cooled Servers

Figure 8 shows the inside of an air-cooled server, while Figure 9 shows the inside of a liquid-cooled server. For both servers, the server lid was removed and the image taken immediately thereafter. Removing the server lid compromises the airflow over the server components, but taking the image rapidly upon opening the lid provides a relatively good comparison of the internal temperatures of the two servers. Comparison of the two figures shows the components inside the air-cooled server running significantly hotter than those of the liquid-cooled server. This is particularly evident for the dual inline memory modules (DIMMs). Testing of three different servers running Burn P6 showed the air-cooled memory DIMMs running 3.3°C to 9.7°C hotter than the DIMMs in the liquid-cooled server (data are not shown in this paper). By removing approximately 170 W for the two processors (average of 85 W per processor), or roughly 45% of the average total server power dissipation, the server internal ambient runs significantly cooler. A cooler server internal ambient results in a cooler motherboard, a cooler server chassis, and cooler components. An additional benefit is that the total server power dissipation decreases with a reduced internal server ambient.

Airflow Management in the Molecular Sciences Computing Facility

A liquid-cooled data center currently does not exist.
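As a rough illustration of why removing about 45% of the server heat at the processors lowers the airflow a rack needs, the sketch below applies the sensible heat balance Q = rho * cp * V * dT to the 8.2 kW rack. This is not a calculation from the paper: the air density, air specific heat, and the use of the same 20°C air temperature rise for both cases are assumptions made only for the comparison.

```python
# ASSUMPTIONS: nominal air properties, not taken from the paper.
AIR_DENSITY_KG_M3 = 1.2
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0
M3_PER_S_TO_CFM = 2118.88  # unit conversion, cubic metres/s to cubic feet/min


def required_airflow_cfm(heat_w, delta_t_k):
    """Airflow (cfm) needed to carry `heat_w` watts at an air rise of `delta_t_k`."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    flow_m3_s = heat_w / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K * delta_t_k)
    return flow_m3_s * M3_PER_S_TO_CFM


RACK_POWER_W = 8200.0        # rack power quoted in the text
LIQUID_FRACTION = 0.45       # share of heat removed by the coolant, per the text
ASSUMED_AIR_RISE_K = 20.0    # ASSUMPTION: same air rise used for both cases

air_cooled_cfm = required_airflow_cfm(RACK_POWER_W, ASSUMED_AIR_RISE_K)
liquid_cooled_cfm = required_airflow_cfm(
    RACK_POWER_W * (1.0 - LIQUID_FRACTION), ASSUMED_AIR_RISE_K
)

print(f"Air-cooled rack:    {air_cooled_cfm:.0f} cfm")
print(f"Liquid-cooled rack: {liquid_cooled_cfm:.0f} cfm")
```

At a fixed air temperature rise the required airflow scales directly with the heat left in the air stream, so the liquid-cooled case needs 55% of the air-cooled flow under these assumptions.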
The objective of this paper is to analyze and discuss airflow management for a single rack of liquid-cooled servers. The results for a single rack of servers are used to perform a rough scale-up to a full-scale liquid-cooled data center in the section entitled "Scale-Up to a Liquid-Cooled Data Center."

[FIGURE 8 OMITTED] [FIGURE 9 OMITTED]

As discussed previously, a TSI Model 8373 tile hood was used to measure the airflow rate for each of the perforated and grated tiles installed at PNNL. In total, 110 tiles are installed. Table 1 shows the distribution of airflow rate for all the tiles, while Figure 10 shows an infrared image of several perforated tiles. The majority of the tiles provide approximately 725 cfm of airflow, with the grated tiles at approximately 1,400 cfm. The tile directly in front of the best-case air-cooled rack discussed in "Comparison of Air-Cooled and Liquid-Cooled Servers" provides 675 cfm, while the tile in front of the liquid-cooled rack provides 680 cfm. Each rack receives air from an average of 1.5 tiles. In total, the facility provides roughly 76,788 cfm from 16 CRAC units.

Airflow management challenges in a data center arise in a number of different ways. For example, racks located too close to CRACs may experience very low to negative static pressure at their tiles, thereby receiving very limited airflow. Racks located at the end of a row may be subjected to hot air recirculation around the side of the rack from the rack exhaust to the rack inlet. Other racks may recirculate rack exhaust air over the top of the racks if poor facility airflow patterns do not effectively return this air to the CRACs (see Figure 3 in Wang [2004]). Sharma et al. (2004) have proposed a supply heat index (SHI) as a means of gauging rack and facility airflow recirculation and air delivery design.
The index is defined as

$\mathrm{SHI} = \dfrac{T_{\mathrm{rack,in}} - T_{\mathrm{CRAC}}}{T_{\mathrm{rack,out}} - T_{\mathrm{CRAC}}}$, (1)

where $T_{\mathrm{rack,in}}$ = rack inlet air temperature, $T_{\mathrm{rack,out}}$ = rack outlet air temperature, and $T_{\mathrm{CRAC}}$ = temperature of air as supplied by the CRACs (tile exit temperature used). For this index, the higher the value, the greater the airflow recirculation and the poorer the air delivery design. The index values are comparable only for identical racks under identical workloads operating in identical airflow conditions. For their studies, Sharma et al. (2004) do not report values much higher than 0.5.

[FIGURE 10 OMITTED] [FIGURE 11 OMITTED]

The SHI has been calculated for an air-cooled rack located on the end of a row (see right-hand rack in Figure 7) and for the liquid-cooled rack at PNNL. Figure 11 also shows the facility locations of the air-cooled rack (end of row) and the liquid-cooled rack. The results from the calculation of the SHI have been tabulated in Table 2. The index has been calculated for the bottom, middle, and top servers. With the exception of the top liquid-cooled server, the air-cooled servers have SHI values significantly higher than those of the liquid-cooled servers. The high values of SHI for the air-cooled servers support the idea that a significant amount of heated rack exhaust air is being recirculated around the side of the rack and re-entrained into the front of the rack. An additional contributor to this difference is the fact that the liquid-cooled rack needs significantly less airflow than the equivalent air-cooled rack. A lower airflow rate requirement reduces the chance of hot air recirculation.

Figure 12 presents airflow requirements for blade servers and standard IT equipment. This chart can also be used to place the airflow rate requirement for PNNL's enterprise servers in perspective. The most common airflow rate of 725 cfm per tile at PNNL is indicated on the horizontal axis. Using direct measurements of the power dissipated by all the liquid-cooled servers (while running HPL) and the measured air temperature rise over each server (multiple thermocouples at both the inlet and outlet of each server), an energy balance over each server provided the required airflow rate per server. The average server airflow rate for the rack was then calculated; this value is indicated on the horizontal axis of Figure 12. A similar energy balance was conducted for the air-cooled rack. The result of this calculation indicates that the air-cooled rack needs 543 cfm of air, which is 83% more volumetric airflow than the 300 cfm needed by the liquid-cooled rack. This is based upon semi-empirical data for the energy balance across the rack and actual test data. This result supports the idea that, due to the lower volumetric airflow rate requirement, the liquid-cooled rack is much less susceptible to hot exhaust air recirculation, even though it is also located on the end of a row of racks. The lower airflow rate requirement for the liquid-cooled rack also means that it will be much less likely to recirculate hot exhaust air in from above the rack or even be highly affected by the low tile flow for locations very close to CRACs. The lower airflow rate required by the liquid-cooled rack will also allow PNNL to return to more reasonable flow rates for perforated tiles, as indicated in Figure 12.

[FIGURE 12 OMITTED]

Figure 12 includes the note "Full board cooling." This refers to a new implementation of liquid cooling that is currently being developed.
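
The energy balance used above to obtain the required airflow per server can be sketched as follows. This is a minimal illustration, not the authors' calculation: the air density and specific heat are assumed round values, the function names are hypothetical, and the 1,000 W / 10 K inputs are illustrative only (the paper does not list the per-server measurements). The relation used is the standard sensible-heat balance, volumetric flow = power / (density x specific heat x temperature rise).

```python
# Sensible-heat energy balance for air passing through a server or rack.
# Air properties below are assumed round values, not taken from the paper.
AIR_DENSITY_KG_M3 = 1.2          # assumption: typical value for room-temperature air
AIR_SPECIFIC_HEAT_J_KGK = 1005.0  # assumption: typical value for dry air
M3_PER_S_TO_CFM = 2118.88         # 1 m^3/s expressed in cubic feet per minute


def required_airflow_cfm(power_w: float, delta_t_k: float) -> float:
    """Airflow (cfm) needed to carry `power_w` watts at an air temperature rise `delta_t_k`."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    flow_m3_s = power_w / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KGK * delta_t_k)
    return flow_m3_s * M3_PER_S_TO_CFM


def percent_more(larger: float, smaller: float) -> float:
    """How much larger `larger` is than `smaller`, in percent."""
    return (larger / smaller - 1.0) * 100.0


if __name__ == "__main__":
    # Illustrative inputs only: 1 kW rejected to air with a 10 K rise.
    print(f"{required_airflow_cfm(1000.0, 10.0):.1f} cfm")
    # Rack airflow figures quoted in the text (rounded values).
    print(f"{percent_more(543.0, 300.0):.0f}% more airflow")
```

With the rounded figures quoted in the text (543 cfm and 300 cfm), `percent_more` gives about 81%; the reported 83% presumably reflects unrounded values that are not given in the paper.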
With full board cooling, the entire board would be cooled, which would allow PNNL to address emerging problem areas such as the memory and communications chips. This approach offers additional opportunity to further reduce the new requirement for large volumetric airflow rates in data centers.

Scale-Up to a Liquid-Cooled Data Center

As part of the DOE's Energy Smart Data Center program, the authors have used the results of this study to investigate the feasibility of scaling up to a full-scale liquid-cooled data center. The analysis was conducted for the current 2U servers, for 1U servers, and for dual Opteron blade servers. For each scale-up exercise, PNNL's system architects conducted an analysis to ensure that the correct supercomputer system balance was maintained to allow them to run current production jobs. Results for the 2U and 1U servers are discussed in this paper.

A full inventory of all of PNNL's hardware was taken. The primary hardware consisted of 2U servers ("thin" node racks with only compute servers and "fat" node racks with an additional 2U of storage per 2U server), interconnect switch racks, storage racks, and network equipment. Using the power dissipation numbers provided by the system vendor, the supercomputer's total power dissipation was calculated at 590 kW. PNNL's current facility uses a combination of thin node and fat node server racks for a total of 84 racks. It also employs 24 racks of interconnect switches. The theoretical computational capacity for the facility is 11.232 TeraFlops.
For the 2U server scale-up, it was assumed that all the processors would be cooled with spray modules and that each rack would be cooled by a single thermal management unit (no change to the total number of racks). Using the measured average power dissipation per CPU, it was estimated that the CPU load for the facility's 1,994 CPUs would be approximately 156 kW, or 26% of the supercomputer's total power dissipation, which includes network and storage power. Since the facility requires 76,788 cfm for the full 590 kW, a linear scaling shows that four fewer CRACs are needed if 156 kW are rejected directly to the process chilled water and not the facility air, keeping in mind that the pumping power required to operate a liquid-cooled facility is equal to the power used by 1.4 CRACs. Schmidt et al. (2005) use a similar argument in their scale-up study for IBM's CoolBlue rear door heat exchanger. While the total cost of ownership and COP benefits of scaling up to a liquid-cooled facility were favorable, they were less attractive than those of a scale-up using 1U servers. No benefits were assumed for the space freed up by the removal of four CRACs or from the lower airflow requirement for the liquid-cooled racks.

Before conducting the scale-up to 1U servers, an HP rx1620 1U server was converted to liquid cooling to verify that it could be efficiently cooled. For the conversion, the processors were liquid-cooled similarly to the 2U servers. The liquid-cooled server was cooled with a Fluorinert-to-water thermal management unit. The liquid-cooled server was investigated over a range of fluid temperatures in order to demonstrate that the server could still be effectively cooled when rejecting to water with a temperature as high as 30[degrees]C. The reason for rejecting to such warm water was to show the ability to bypass PNNL's chillers and to reject the heat directly to the cooling tower water--PNNL's highest summer water temperatures are unlikely to exceed 30[degrees]C.
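
The linear CRAC scaling described for the 2U scale-up can be reproduced with a few lines of arithmetic. The sketch below uses only figures stated in the text (590 kW, 156 kW, 76,788 cfm, 16 CRACs, and the 1.4-CRAC pumping power equivalent); the function name and the choice to round the CRAC count down to a whole unit are assumptions made for illustration, as is treating the pumping power as an offset against the CRACs removed.

```python
# Linear scale-up arithmetic: airflow (and CRAC count) assumed proportional
# to the heat load that must be rejected to the facility air.
TOTAL_POWER_KW = 590.0        # supercomputer total power dissipation (from the text)
CPU_POWER_KW = 156.0          # CPU load rejected to liquid instead of air (from the text)
TOTAL_AIRFLOW_CFM = 76788.0   # facility airflow (from the text)
CRAC_COUNT = 16               # installed CRAC units (from the text)
PUMP_POWER_CRAC_EQUIV = 1.4   # pumping power, expressed in CRAC power units (from the text)


def linear_crac_scaling(total_kw, removed_kw, airflow_cfm, crac_count):
    """Return (heat fraction removed, airflow no longer needed, CRACs no longer needed)."""
    fraction = removed_kw / total_kw
    return fraction, airflow_cfm * fraction, crac_count * fraction


if __name__ == "__main__":
    fraction, airflow_saved, cracs_saved = linear_crac_scaling(
        TOTAL_POWER_KW, CPU_POWER_KW, TOTAL_AIRFLOW_CFM, CRAC_COUNT
    )
    whole_cracs = int(cracs_saved)  # assumption: only whole units can be removed
    print(f"CPU share of total power: {fraction:.1%}")
    print(f"Airflow no longer required: {airflow_saved:,.0f} cfm")
    print(f"CRACs no longer required: {cracs_saved:.2f} (whole units: {whole_cracs})")
    # Assumed interpretation: pumping power offsets part of the CRAC power saved.
    print(f"Net saving in CRAC power units: {whole_cracs - PUMP_POWER_CRAC_EQUIV:.1f}")
```

Under these assumptions the CPU share comes out near 26% and the scaling yields slightly more than four CRACs, in line with the figures quoted in the text.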
Additional test results for the 1U server and other platforms are provided in Cader and Regimbal (2005). Rejecting to cooling tower water would increase the COP by simply removing the water chiller power load. This affects the process for cooling the processors but does not affect the remaining air-cooled components. Therefore, the air temperature and airflow rate within the data center would need to be maintained.

The scale-up to the 1U servers showed that the current supercomputer balance can be maintained with 69 server racks (combination of thin node and fat node racks), 16 racks of node switches, and 8 racks of top switches. By switching to 1U servers, it was assumed that the current Itanium2 processors would be used, meaning that the computational capacity of 11.232 TFlops would be achieved with 15 fewer racks of servers. By rejecting the 156 kW of CPU power to the cooling tower water, four CRACs can be removed. In addition, the removal of 15 racks means that the facility can be significantly reduced in footprint, resulting in the removal of additional CRACs or the addition of other computational resources. Scale-up assuming 1U servers resulted in a 22% increase in facility COP, relative to the current air-cooled facility, and a payback time ranging from 0.5 to 2.8 years.
The range in the payback years depends upon the assumptions made, with 0.5 year taking advantage of the fact that PNNL can increase computational capacity without increasing facility footprint.

The scale-up exercise has highlighted the benefits of a liquid-cooled data center at the facility level. The scale-up was conducted in a relatively conservative fashion. It is clear from the results that the reduction in required airflow rate for the facility will dramatically reduce the facility airflow management challenges. The reduced need for airflow rate delivered to a rack will allow datacenter operators to deploy significantly higher power (density) racks.

CONCLUSION

Under funding from the DOE's Energy Smart Data Center program, an analysis of the airflow management in PNNL's Molecular Sciences Computing Facility was conducted. As part of the analysis, several high-performance air-cooled racks and a single liquid-cooled rack were investigated. High Performance Linpack was run on the racks of servers while thermal data, airflow rate data, and infrared images were captured. The results of the study were also used to conduct an initial study of the feasibility of scaling up to at least one vision of a full-scale liquid-cooled data center.

The infrared images show that the exhaust air from the liquid-cooled rack is 10[degrees]C-20[degrees]C cooler than the exhaust air from the air-cooled racks investigated. The measured data also showed that the air temperature rise across the hottest air-cooled rack investigated was 187% greater than that across the liquid-cooled rack. The SHI for the majority of the air-cooled servers analyzed was significantly higher than that for the liquid-cooled servers used in the comparison. The high value of SHI for the air-cooled rack supported the idea that the rack was re-entraining a significant amount of hot exhaust air.
This was also supported by the fact that energy balances over the air-cooled rack and liquid-cooled rack showed that the air-cooled rack needed 83% more airflow. The scale-up study showed a favorable result when using liquid-cooled 1U servers.

The results indicate multiple benefits for the liquid-cooled rack. Key among the benefits are (1) cooler air exhausting from the liquid-cooled rack into the facility ambient; (2) a significantly lower airflow rate requirement for the liquid-cooled rack, which has the effect of reducing the amount of airflow short-circuiting; (3) fewer CRACs; and (4) little to no limitation on the data center placement of liquid-cooled racks. By rejecting the heat directly to the facility's chilled water, or even directly to the cooling tower water, the challenges of facility airflow management are dramatically reduced. While significant challenges remain, scale-up to a liquid-cooled data center appears to be feasible.

There are still challenges with implementing a liquid-cooled facility, including the stigma associated with using water near computing equipment. Historical precedents indicate, however, that this stigma can be overcome: the CRAY 2 supercomputer and IBM mainframes have long histories in datacenter computing. Key techniques for mitigating perceived risks associated with water include advanced plumbing and leak detection technology, which, when integrated at the facility level, will mitigate the risk of operating coolant water in the datacenter.

ACKNOWLEDGMENTS

This research was performed in part using the Molecular Sciences Computing Facility in the William R. Wiley Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by the US Department of Energy's Office of Biological and Environmental Research and located at the Pacific Northwest National Laboratory, operated for the Department of Energy by Battelle. The assistance of Andrew Wolf (ISR) and Kevin Fox (PNNL) is acknowledged.

REFERENCES

ASHRAE. 2004. Thermal Guidelines for Data Processing Environments. Atlanta: American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc.
ASHRAE. 2005. Datacom Equipment Power Trends and Cooling Applications. Atlanta: American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc.
Bhopte, S., D. Agonafer, R. Schmidt, and B. Sammakia. 2005. Optimization of data center room layout to minimize rack inlet air temperature. Proceedings of InterPACK05, San Francisco, CA, July 18-22.
Cader, T., and K. Regimbal. 2005. Energy smart data center. InterPACK05, San Francisco, CA, July 18-22.
Dongarra, J., J. Bunch, C. Moler, and G.W. Stewart. 1979. Linpack Users Guide. Philadelphia: SIAM.
Dongarra, J., P. Luszczek, and A. Petitet. 2003. The Linpack benchmark: Past, present, and future. Concurrency and Computation: Practice and Experience Journal 15:1-18.
HP. 2005. Perfmon project. http://www.hpl.hp.com/research/linux/perfmon/. Hewlett-Packard Development Company.
Heydari, A., and P. Sabounchi. 2004. Refrigeration-assisted spot cooling of a high heat density data center. Proceedings of Itherm 2004, Las Vegas, NV, June 1-4.
McCalpin, J. Stream: Sustainable Memory Bandwidth in High Performance Computers. Computer Science Department, University of Virginia. http://www.cs.virginia.edu/stream/stream2/.
Petitet, A., R.C. Whaley, J. Dongarra, and A. Cleary. 2004. HPL--A portable implementation of the high-performance Linpack benchmark for distributed-memory computers, version 1.0a. Innovative Computing Laboratory of the University of Tennessee Computer Science Department. http://www.netlib.org/benchmark/hpl/.
Rasmussen, N. 2005. Cooling strategies for ultra-high density racks and blade servers. APC White Paper #46. http://www.apcmedia.com/salestools/SADE5-TNRK6_R4_EN.pdf.
Schmidt, R., R.C. Chu, M. Ellsworth, M. Iyengar, and D. Porter. 2005. Maintaining datacom rack inlet air temperatures with water-cooled heat exchanger. Proceedings of InterPACK05, San Francisco, CA, July 18-22.
Sharma, R., C. Bash, C. Patel, and M. Beitelmal. 2004. Experimental investigation of design and performance of data centers. Proceedings of Itherm 2004, Las Vegas, NV, June 1-4.
Wang, D. 2004. A passive solution to a difficult data center environmental problem. Proceedings of Itherm 2004, Las Vegas, NV, June 1-4.

Tahir Cader, PhD
Levi Westra
Kevin Regimbal
Ryan Mooney

Tahir Cader is the technical director and Levi Westra is a mechanical engineer in the High Performance Computing Group, Isothermal Systems Research, Liberty Lake, Washington. Kevin Regimbal is the Information Technology manager and Ryan Mooney is a technical specialist for MSCF Operations/EMSL at Pacific Northwest National Laboratory, Richland, Washington.

Table 1. Distribution of Airflow Per Tile Throughout PNNL's Data Center

Number of Tiles        0    11    20    55    11     5     5
Airflow Rate (cfm)   550   600   650   700   750   800  more

Table 2. Supply Heat Index for an Air-Cooled Rack of Servers and a Liquid-Cooled Rack of Servers

                 Air-Cooled Servers   Liquid-Cooled Servers
Bottom server    0.342                0.068
Middle server    0.635                0.395
Top server       0.062                0.166
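
The supply heat index values in Table 2 follow from Equation 1. As a minimal sketch of that calculation (the function name is hypothetical, and the inlet and outlet temperatures in the example are illustrative values rather than the measured PNNL data; only the approximately 14[degrees]C tile supply temperature is taken from the paper):

```python
def supply_heat_index(t_rack_in: float, t_rack_out: float, t_crac: float) -> float:
    """Supply heat index per Equation 1: (T_rack,in - T_CRAC) / (T_rack,out - T_CRAC).

    All temperatures must share one unit (e.g., degrees C). Higher values
    indicate more recirculation of heated exhaust air into the rack inlet.
    """
    denominator = t_rack_out - t_crac
    if denominator <= 0:
        raise ValueError("rack outlet temperature must exceed the CRAC supply temperature")
    return (t_rack_in - t_crac) / denominator


if __name__ == "__main__":
    # Supply air at about 14 C (tile exit temperature quoted in the paper);
    # the 20 C inlet and 34 C outlet are hypothetical inputs for illustration.
    print(f"SHI = {supply_heat_index(20.0, 34.0, 14.0):.3f}")
```

An inlet temperature equal to the supply temperature gives an index of zero, meaning no exhaust air is re-entrained at that server.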