# Simulations of air distributions in buildings by FFD on GPU.

INTRODUCTION

Computer simulations of air distributions have been widely applied in buildings (Nielsen 2004; Axley 2007; Megri and Haghighat 2007; Chen 2009). Many applications require the simulations to be both informative and fast. For instance, to design natural ventilation in a building, the designer needs to know the details of air velocity and temperature. In addition, the simulation should be fast enough to keep pace with the rapid changes in the design process. Another example is smoke and air management in the case of a building fire. If one could simulate the detailed smoke distribution faster than real time, the results could help manage the building fire.

The most popular models for indoor airflow are nodal models and Computational Fluid Dynamics (CFD) models. Nodal models, including multizone models (Axley 2007) and zonal models (Megri and Haghighat 2007), assume that the air and species are uniform in a large space. This homogeneous assumption allows the nodal models to represent flow and species information in a building with a few nodes. Consequently, they need little computing effort. On the other hand, they are unable to describe the characteristics of flow in detail with the limited quantity of nodes. Moreover, the nodal models solve only the mass continuity, energy conservation, and species concentration equations but not the momentum equations (Wang 2007). Therefore, they fail to provide detailed and accurate information about the airflow and species transport.

By numerically solving the Navier-Stokes equations and other transport equations with an enormous number of computing nodes, CFD can precisely capture the flow features (Ladeinde and Nearon 1997; Nielsen 2004; Chen et al. 2007). However, CFD simulation usually requires a long computing time. For instance, to precisely evaluate the annual energy performance of a small room of 3 x 3 x 3 m (9.84 x 9.84 x 9.84 ft) with detailed airflow information, a coupled energy-CFD simulation requires at least 150 h of computing time (Zhai and Chen 2003), and over 99% of that time is used by the CFD.

In order to accelerate CFD simulation, some researchers (Crouse et al. 2002; Beghein et al. 2005; Mazumdar and Chen 2008) used multi-processor supercomputers or computer clusters. The simulations were much faster, but this approach required expensive computing facilities, space for installing the computers, and a large cooling system (Feng and Hsu 2004). Hence, multi-processor supercomputers and computer clusters are a luxury for building designers and emergency management teams.

Ideally, one should be able to obtain detailed information about airflow motion, temperature distribution, and species concentration in faster-than-real time with minimal costs. This investigation explored different approaches to meet that challenge.

FAST FLUID DYNAMICS MODEL

The first approach is the use of fast fluid dynamics (FFD), which is an intermediate model between the nodal and CFD models. FFD, developed by Stam (1999) for computer flow visualization, can efficiently solve the Navier-Stokes equation (Equation 1), energy equation (Equation 2), and species transport equation (Equation 3):

$$\frac{\partial U_i}{\partial t} = -U_j\frac{\partial U_i}{\partial x_j} + \nu\frac{\partial^2 U_i}{\partial x_j^2} - \frac{1}{\rho}\frac{\partial P}{\partial x_i} + \frac{1}{\rho}S_{F,i} \qquad (1)$$

$$\frac{\partial T}{\partial t} = -U_j\frac{\partial T}{\partial x_j} + \alpha\frac{\partial^2 T}{\partial x_j^2} + S_T \qquad (2)$$

$$\frac{\partial C_i}{\partial t} = -U_j\frac{\partial C_i}{\partial x_j} + k_{C,i}\frac{\partial^2 C_i}{\partial x_j^2} + S_{C,i} \qquad (3)$$

where $i, j = 1, 2, 3$; $U_i$ is the $i$th component of the velocity vector; $P$ is the static pressure of the flow field; and $S_{F,i}$ is the $i$th component of the source, such as the buoyancy force and other external forces. $\nu$ denotes the kinematic viscosity, $\rho$ is the fluid density, $T$ is the temperature, $\alpha$ is the thermal diffusivity, and $S_T$ is the heat source. $C_i$ is the concentration of the $i$th species, and $k_{C,i}$ and $S_{C,i}$ are the corresponding diffusivity and source of the $i$th species. Because of their similarity, Equations 1, 2, and 3 can be written as a general equation:

$$\frac{\partial \phi}{\partial t} = -U_j\frac{\partial \phi}{\partial x_j} + k\frac{\partial^2 \phi}{\partial x_j^2} + S + G \qquad (4)$$

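where $\phi$ is the field variable ($U_i$, $T$, or $C_i$), $k$ is the corresponding diffusivity ($\nu$, $\alpha$, or $k_{C,i}$), $S$ is the source term, and $G$ is the pressure term, which is nonzero only in the momentum equation. The corresponding variables and terms of Equations 1, 2, and 3 in Equation 4 are given in Table 1.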

The FFD method applies a time-splitting method (Ferziger and Peric 2002) to solve the governing Equation 4. The purpose of the splitting method is to divide a complex problem or equation into several simple ones (John 1982; Levi and Peyroutet 2001; Ferziger and Peric 2002), each of which can be solved easily and quickly. The solutions of these simple equations are then combined into an approximate solution of the complex equation. The split equations in the FFD are as follows:

$$\frac{\phi^{(1)} - \phi^{(n)}}{\Delta t} = S \qquad (5)$$

$$\frac{\phi^{(2)} - \phi^{(1)}}{\Delta t} = k\frac{\partial^2 \phi^{(2)}}{\partial x_j^2} \qquad (6)$$

$$\frac{\phi^{(3)} - \phi^{(2)}}{\Delta t} = -U_j\frac{\partial \phi^{(2)}}{\partial x_j} \qquad (7)$$

$$\frac{\phi^{(n+1)} - \phi^{(3)}}{\Delta t} = G \qquad (8)$$

where superscripts $(n)$ and $(n+1)$ denote the current and next time steps, and superscripts (1), (2), and (3) denote temporary (intermediate) variables.

The FFD sequentially computes the above four equations. The source is added through Equation 5. Then the FFD calculates diffusion Equation 6 by using a first-order implicit scheme. After that, advection Equation 7 is solved with a semi-Lagrangian solver (Courant et al. 1952). For the momentum equation, the FFD solves pressure Equation 8 together with the continuity equation by using a pressure-correction projection method (Chorin 1967). It is worth noting that there is an extra projection step before the advection step in the implemented FFD code, which is to provide a divergence-free velocity field for the semi-Lagrangian solver in the advection equation.
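The four-step sequence can be illustrated for a single scalar such as temperature or concentration, for which the pressure term $G$ in Equation 4 vanishes because Equations 2 and 3 contain no pressure gradient. The following Python/NumPy sketch is our own illustration on a one-dimensional uniform grid, not the authors' FFD code: the function names are hypothetical, the diffusion step uses a dense backward-Euler matrix with zero-gradient boundaries for clarity rather than an efficient solver, and the semi-Lagrangian step traces each node back along a given velocity and interpolates linearly.

```python
import numpy as np


def implicit_diffusion(phi, k, dt, dx):
    """Step 2 (Eq. 6): first-order implicit (backward Euler) diffusion.

    Zero-gradient boundaries are imposed by setting a ghost cell equal to the
    boundary value; every column of A then sums to one, so the total amount
    of phi is conserved.
    """
    n = phi.size
    r = k * dt / dx**2
    A = np.diag(np.full(n, 1.0 + 2.0 * r))
    A += np.diag(np.full(n - 1, -r), 1) + np.diag(np.full(n - 1, -r), -1)
    A[0, 0] = A[-1, -1] = 1.0 + r  # ghost cell equals the boundary value
    return np.linalg.solve(A, phi)


def semi_lagrangian_advect(phi, u, dt, dx):
    """Step 3 (Eq. 7): trace each node back along u and interpolate linearly."""
    x = np.arange(phi.size) * dx
    x_dep = np.clip(x - u * dt, x[0], x[-1])  # departure points kept inside the domain
    return np.interp(x_dep, x, phi)           # linear interpolation between nodes


def ffd_scalar_step(phi, u, k, source, dt, dx):
    """One scalar time step; G = 0 here, so phi^(n+1) = phi^(3) (Eq. 8)."""
    phi1 = phi + dt * source                        # Step 1 (Eq. 5): add the source
    phi2 = implicit_diffusion(phi1, k, dt, dx)      # Step 2: diffusion
    return semi_lagrangian_advect(phi2, u, dt, dx)  # Step 3: advection
```

For the momentum equation, a pressure-projection step enforcing continuity would follow step 3; it is omitted from this scalar sketch.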

Performance of the FFD has been systematically evaluated by simulating different indoor airflows, including a fully developed plane channel flow (Kim et al. 1987), a forced convection flow (Nielsen 1990), a natural convection flow (Betts and Bokhari 2000), and a mixed convection flow (Blay et al. 1992). For comparison, the same flows were also computed using the commercial CFD software FLUENT (www.fluent.com) with the standard RNG k-ε model (Yakhot and Orszag 1986). For instance, Figure 1 compares the FFD and CFD predictions for the mixed convection flow, which represents airflow in a room with mechanical ventilation and floor heating. The grid resolution was 20 x 20 for both FFD and CFD, but the grid distributions were adjusted to obtain the best result for each model. As shown in Figure 1a, cold air enters the room through the upper-left corner. The supply air temperature is 15°C (59°F), the same as that of the side walls and ceiling. The floor is heated to 35°C (95°F), so the average room air temperature was around 19°C (66.2°F). Figure 1b compares the temperature distributions at the center of the room predicted by the FFD and CFD with the same numerical settings. The CFD prediction agrees with the experimental data (Blay et al. 1992) at most measured points. The FFD also captured the mixing of the air. However, there were some differences between the FFD prediction and the experimental data. Figure 1c compares the horizontal velocity. Again, the CFD results agreed well with the experimental data. The FFD predicted the correct flow direction but overpredicted the velocity.

[FIGURE 1 OMITTED]

Although the FFD is not as accurate as the CFD, it can provide more detailed information than multizone or zonal models. In addition, all the information can be visualized online. As in computer games, our FFD program allows users to interact with the simulation while it runs, for example by releasing contaminants or changing boundary conditions. Figure 2 shows screen shots of the FFD simulation for the mixed convection case. In the velocity field window (Figure 2a), there is a large clockwise circulation driven by the momentum of the inlet jet. There are also small recirculations near the corners due to the influence of the walls. Figure 2b illustrates the temperature distribution, which shows the mixing of hot air from the floor with cold air from the jet. Blay's experiment measured only velocity and temperature. Our FFD simulation of concentration started with a uniform distribution of white smoke (species) in the room. The smoke was then diluted by the fresh supply air, as shown in Figure 2c. The smoke concentration was low along the flow path and high at the center of the large recirculation, which looks plausible. Overall, the FFD gives sufficient flow information for conceptual design and emergency management.

[FIGURE 2 OMITTED]

Table 2 summarizes the performance of FFD and CFD in the four studied cases. We defined the relative error of a simulation as $\left|(\phi_{sim} - \phi_{exp})/\phi_{exp}\right|$, where $\phi_{sim}$ and $\phi_{exp}$ are the simulated and experimental data, respectively. Model performance is ranked 4 if the relative errors are less than 10% at the majority (>80%) of measured points. If the error is between 10% and 30%, the rank is 3. Likewise, rank 2 is for errors between 30% and 50%, and rank 1 for errors greater than 50%. The average performance grade of the FFD model was 2.25/4.0, which means it can capture the general trend of the flow but is not very accurate. This is not surprising, because FFD was proposed to produce plausible flow in real time (Stam 1999). On the other hand, the average grade of the CFD with the RNG k-ε model was 3.75/4.0, which explains why the RNG k-ε model is recommended for indoor airflow simulations (Chen 1995; Zhang et al. 2007). For more details of the FFD model, see Zuo and Chen (2009a).
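The ranking rule can be written out as a small function. This Python sketch reflects one reading of the rule, which is our assumption: a model receives the highest rank whose error bound (10%, 30%, or 50%) is met at more than 80% of the measured points, and rank 1 otherwise. The function names are hypothetical, and the experimental values must be nonzero for the relative error to be defined.

```python
def relative_errors(sim, exp):
    """|(phi_sim - phi_exp) / phi_exp| at each measured point (phi_exp != 0)."""
    return [abs((s - e) / e) for s, e in zip(sim, exp)]


def performance_rank(sim, exp, majority=0.8):
    """Rank 4, 3, 2, or 1 using the error bounds described in the text."""
    errors = relative_errors(sim, exp)
    for rank, bound in ((4, 0.10), (3, 0.30), (2, 0.50)):
        share = sum(err < bound for err in errors) / len(errors)
        if share > majority:  # bound met at more than 80% of the points
            return rank
    return 1


def grade_average(ranks):
    """Average grade over several test cases, on a 4.0 scale."""
    return sum(ranks) / len(ranks)
```

For example, hypothetical ranks of 3, 2, 2, and 1 over four cases give an average grade of 2.0/4.0.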

By sacrificing some accuracy in the numerical scheme, FFD gains a significant improvement in computing speed. Figure 3 compares the computing time required by the FFD and CFD with the same numerical settings. The time step size was 0.1 s. The computing time of both models varied linearly with the number of grids. However, the FFD required only 2% of the computing time of the CFD; that is, it was about 50 times faster.

[FIGURE 3 OMITTED]

The accuracy and speed of the FFD model can be further improved. For instance, the FFD has significant numerical diffusion due to the linear interpolation used in the semi-Lagrangian solver for the advection equation. For simplicity, the one-dimensional form of the linear interpolation is as follows:

$$\phi(x) = \phi_i + (x - x_i)\frac{\phi_{i+1} - \phi_i}{\Delta x} \qquad (9)$$

where $\Delta x$ is the mesh size, $x$ lies between $x_i$ and $x_{i+1}$, and $\phi_i$ and $\phi_{i+1}$ are the values of $\phi$ at $x_i$ and $x_{i+1}$, respectively.
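The smoothing effect of Equation 9 can be seen by repeatedly advecting a sharp profile. In this Python sketch (our illustration, with hypothetical function names), the semi-Lagrangian departure point lies a fraction `courant` $= U\Delta t/\Delta x$ of a cell upstream, so Equation 9 reduces to a weighted average of two neighboring values. A periodic domain is assumed only to keep the example short. The total amount of $\phi$ is preserved, but the peak is flattened, which is the numerical diffusion described above.

```python
import numpy as np


def linear_interp(phi, x, dx):
    """Eq. 9 on a uniform grid with nodes x_i = i * dx, 0 <= x <= (n - 1) * dx."""
    i = min(int(x // dx), phi.size - 2)  # index of the left node x_i
    return phi[i] + (x - i * dx) * (phi[i + 1] - phi[i]) / dx


def advect_linear_periodic(phi, courant, steps):
    """Semi-Lagrangian advection with Eq. 9 on a periodic grid, U > 0, 0 < courant <= 1.

    The departure point of node j lies between x_{j-1} and x_j, so Eq. 9
    gives phi_j_new = courant * phi_{j-1} + (1 - courant) * phi_j.
    """
    out = phi.copy()
    for _ in range(steps):
        out = courant * np.roll(out, 1) + (1.0 - courant) * out
    return out


# A triangular pulse of half-width 10 cells centered at cell 50 (peak = 1).
n = 200
pulse = np.maximum(0.0, 1.0 - np.abs(np.arange(n) - 50) / 10.0)
moved = advect_linear_periodic(pulse, courant=0.5, steps=100)  # shifts 50 cells
```

With `courant=1.0` the departure points coincide with grid nodes and the pulse is shifted exactly, which shows that the damping comes from interpolating between nodes.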

To reduce the numerical diffusion, some researchers (Fedkiw et al. 2001; Song et al. 2005) tried high-order interpolations in the semi-Lagrangian solver. However, none of those approaches was satisfactory (Zuo and Chen 2010a). On the one hand, a low-order interpolation may introduce numerical diffusion but can stabilize the simulation. On the other hand, a high-order interpolation can reduce numerical diffusion but may lead to numerical dispersion, which appears as spurious oscillations.

To obtain a stable interpolation with low numerical diffusion, we may combine different schemes into a hybrid method. For instance, where the profile increases or decreases monotonically, a high-order scheme can be applied to obtain better accuracy. Otherwise, a less accurate but more stable low-order scheme may be used to damp the oscillations.

In this study, we proposed a hybrid scheme by using the first- and third-order interpolations. Assuming a uniform grid distribution, the one-dimensional formula of the hybrid interpolation is as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (10)
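Equation 10 is not reproduced in this text, so its exact form is not shown here. As a hedged illustration of the switching idea described above, the following Python sketch (hypothetical names, with a switching criterion of our own choosing) uses a cubic Lagrange polynomial through the four surrounding nodes where those values are monotonic and falls back to the linear interpolation of Equation 9 elsewhere. It should not be read as the authors' Equation 10.

```python
import numpy as np


def cubic_lagrange(f0, f1, f2, f3, s):
    """Cubic through nodes at -1, 0, 1, 2 (in cell units), evaluated at s in [0, 1]."""
    return (-s * (s - 1) * (s - 2) / 6 * f0
            + (s + 1) * (s - 1) * (s - 2) / 2 * f1
            - (s + 1) * s * (s - 2) / 2 * f2
            + (s + 1) * s * (s - 1) / 6 * f3)


def hybrid_interp(phi, x, dx):
    """Cubic where the 4-point stencil is monotonic, linear (Eq. 9) otherwise."""
    n = phi.size
    i = min(int(x // dx), n - 2)  # left node of the cell containing x
    s = x / dx - i                # normalized position inside the cell
    if 1 <= i <= n - 3:           # full stencil phi[i-1 .. i+2] available
        f = phi[i - 1:i + 3]
        d = np.diff(f)
        if np.all(d >= 0) or np.all(d <= 0):  # locally monotonic: high order
            return cubic_lagrange(*f, s)
    return phi[i] + s * (phi[i + 1] - phi[i])  # fallback: first order
```

On a monotonic cubic profile the sketch reproduces the exact value, whereas at a local extremum it reverts to the linear formula, trading some accuracy for the absence of overshoots.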

This study simulated two simple flows to evaluate the linear and hybrid interpolations. One is the transport of a one-dimensional triangular wave in an inviscid fluid. The initial condition of the flow is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (11)

Neumann boundary conditions are applied on both sides of the domain:

$$\frac{\partial \phi}{\partial x} = 0 \qquad (12)$$

If the wave travels from left to right at a velocity of 1 m/s (3.28 ft/s), the exact solution is

$$\phi(x, t) = \phi(x - t, 0) \qquad (13)$$

A uniform one-dimensional mesh with 200 grids was used for the triangular wave case. The other case is a lid-driven cavity flow, which is well documented in the literature (Ghia et al. 1982; Shankar and Deshpande 2000; Erturk et al. 2005). A uniform 65 x 65 mesh was used for this flow.

Figure 4 compares the FFD results with the linear and hybrid interpolations for both flows. FFD with hybrid interpolation can compute the flow profiles much better than with linear interpolation.

[FIGURE 4 OMITTED]

Besides numerical diffusion, our studies also showed that the FFD results did not always satisfy mass conservation. One possible reason is that the pressure-correction step in FFD corrects the velocities only once, so the corrected velocity may not satisfy mass conservation. In comparison, the pressure correction in the SIMPLE algorithm (Patankar 1980) requires multiple corrections to achieve a converged solution. We tried multiple pressure corrections in FFD but did not see obvious improvements. In addition, the simulation slowed down dramatically because of the extra computing cost of the trial-correction loop. Although the reason for the mass imbalance is not yet clear, we can still take a simple approach to enforce a mass balance over the domain:

$$U_{\perp}^{new}\big|_{out} = \left[1 + \alpha\left(\frac{M_{in}}{M_{out}} - 1\right)\right] U_{\perp}\big|_{out} \qquad (14)$$

where $M_{in}$ and $M_{out}$ are the total mass flows into and out of the domain, respectively; $\alpha$ is a correction coefficient; and $U_{\perp}|_{out}$ and $U_{\perp}^{new}|_{out}$ are the velocities perpendicular to the outlet surface before and after the correction. To keep the numerical simulation stable, the recommended value of $\alpha$ is 0.7.
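Equation 14 is straightforward to apply once the outlet mass flow is known. In this Python sketch, the outlet is assumed to be discretized into faces with areas `area_out`, and $M_{out}$ is computed as $\rho \sum U_\perp A$ over those faces; the function name and this discretization are our assumptions. Because the corrected outflow is $M_{out} + \alpha (M_{in} - M_{out})$, each application removes the fraction $\alpha$ of the current imbalance: $\alpha = 1$ would balance the mass in one step, while $\alpha < 1$ under-relaxes the correction.

```python
import numpy as np


def correct_outflow(u_out, area_out, m_in, rho, alpha=0.7):
    """Scale outlet-normal velocities by Eq. 14.

    u_out    : velocities perpendicular to the outlet faces, U_perp|out (m/s)
    area_out : areas of the outlet faces (m^2), an assumed discretization
    m_in     : total mass flow into the domain, M_in (kg/s)
    rho      : fluid density (kg/m^3)
    alpha    : correction coefficient; 0.7 is the value recommended in the text
    """
    u_out = np.asarray(u_out, dtype=float)
    m_out = rho * np.sum(u_out * np.asarray(area_out, dtype=float))  # M_out (nonzero)
    factor = 1.0 + alpha * (m_in / m_out - 1.0)
    return factor * u_out
```

With $M_{in}$ = 1.2 kg/s and an initial $M_{out}$ = 1.0 kg/s, one correction with $\alpha$ = 0.7 raises the outflow to 1.14 kg/s, leaving 30% of the original imbalance.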

The benefits of applying the mass correction in Equation 14 can be seen from an example of a forced convection flow in an empty room (Nielsen 1990). As shown in Figure 5a, air was injected from the inlet at the upper-left corner and exhausted from the outlet at the lower-right corner. The CFD results (Zuo and Chen 2009b) obtained with the RNG k-ε model (Yakhot and Orszag 1986) agreed with the experimental data (Nielsen 1990), so they were used as the reference here, because the experimental data do not include streamlines. Figures 5c and 5d show the streamlines predicted by FFD without and with the mass correction, respectively. The streamlines computed by FFD with mass correction (Figure 5d) clearly agree much better with the reference than those without (Figure 5c). It is worth mentioning that FFD without mass correction computed streamlines similar to those of the CFD with the k-ω SST model (Rong and Nielsen 2008). This similarity is probably coincidental, because the FFD and the CFD with the k-ω SST model are quite different.

[FIGURE 5 OMITTED]

Besides improving accuracy, one can accelerate the FFD program by optimizing the numerical scheme. For instance, solving the advection equation first eliminates the extra projection step and can reduce the computing time by 23% (Zuo 2010). In addition, the computing time can be further reduced by optimizing the implementation of the FFD program, such as storing the coefficient matrices of the matrix equations instead of recalculating them every time. This effort can save another 30% of the computing time (Zuo 2010). Overall, optimizing the numerical scheme and the model implementation can save around 50% of the computing time.
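If the two savings act independently (our assumption), they combine multiplicatively rather than additively, which is consistent with the estimate of around 50%:

$$1 - (1 - 0.23)(1 - 0.30) = 1 - 0.77 \times 0.70 = 1 - 0.539 = 0.461 \approx 46\%$$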

GRAPHICS PROCESSING UNIT

The second approach is to run the FFD in parallel on graphics processing units (GPUs). GPUs were originally designed for computer graphics, with a highly parallelized structure for image processing. A GPU can have a few hundred processors, which gives it large computing power for parallel tasks. Until recently, running flow simulations on GPUs was difficult because it required special programming skills. With new general-purpose GPU programming languages, such as Stream by AMD (2010), CUDA by NVIDIA (2007), Ct by Intel (2010), and Brook by Stanford (Stanford University Graphics Lab 2010), it is possible to extend GPU applications from visualization to general-purpose computing, including linear algebra (Bell and Garland 2008; Ries et al. 2009), signal processing (Tenllado et al. 2008), molecular dynamics (Yang et al. 2007; Anderson et al. 2008), and indoor airflow simulations (Zuo and Chen 2009b, 2010b).

General parallel programming techniques for multiple CPUs can be applied to GPU programming, although some details may differ because of the specific structure of GPU hardware. Our investigation used the CUDA language (NVIDIA 2007) on an NVIDIA GeForce 8800 GPU. CUDA treats the CPU as the host and the GPU as the device. The host controls the entire program, initializes data, and writes out results. The device performs the parallel computing with the data initialized by the host. After the computation, the results are sent back to the host. CUDA further organizes the work on the device into three levels: grid, block, and thread. A CUDA grid includes multiple blocks, which are made up of many threads. A thread is the basic computing unit; a GeForce 8800 GPU can have as many as 12,288 threads running at the same time. For simplicity, our implementation defined only one CUDA grid with multiple blocks, each of which has 256 threads. To associate the threads with the mesh data, we assigned one mesh grid to each thread. Thus, if the number of mesh grids is a multiple of 256, the GPU needs exactly the same number of threads, which is a balanced allocation. Otherwise, the allocation is unbalanced. For example, to carry 257 mesh grids, the program needs 2 GPU blocks with 512 threads in total, although 255 of those threads are not associated with any mesh grid. Unfortunately, unbalanced allocation seriously degrades performance.
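The allocation arithmetic can be checked with a few lines of Python. This sketch (hypothetical helper names) computes how many 256-thread blocks are needed when one mesh grid is assigned to each thread and how many threads are left idle. The bounds check mirrors the usual CUDA pattern of computing a global index as `blockIdx.x * blockDim.x + threadIdx.x` and skipping threads whose index lies past the last mesh grid.

```python
THREADS_PER_BLOCK = 256  # block size used in the implementation described above


def launch_config(n_cells, threads_per_block=THREADS_PER_BLOCK):
    """Return (blocks, total_threads, idle_threads) for one mesh grid per thread."""
    blocks = (n_cells + threads_per_block - 1) // threads_per_block  # ceiling division
    total_threads = blocks * threads_per_block
    return blocks, total_threads, total_threads - n_cells


def cell_of_thread(block_idx, thread_idx, n_cells, threads_per_block=THREADS_PER_BLOCK):
    """Global mesh-grid index handled by a thread, or None for an idle thread."""
    idx = block_idx * threads_per_block + thread_idx
    return idx if idx < n_cells else None
```

With 257 mesh grids, `launch_config(257)` returns `(2, 512, 255)`: only thread 0 of the second block does useful work.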

To evaluate the GPU performance for FFD simulations, FFD simulations with the same numerical settings were performed on both the CPU and the GPU. Figure 6 compares the results for a natural convection flow in a tall cavity. The lines are the computed velocity and temperature profiles at various heights across the cavity, and the symbols are the experimental data (Betts and Bokhari 2000). Although discrepancies existed between the simulated results and the experimental data because of the limitations of the FFD model, the FFD simulations on the CPU and GPU yielded the same results. Thus, as computing hardware, the GPU is as trustworthy as the CPU.

[FIGURE 6 OMITTED]

The FFD ran much faster on the GPU than on the CPU. When the allocation of grids and threads is balanced, the speedup can reach 30 times. Even with an unbalanced allocation, the GPU code is still 10 times faster than the CPU version. Note that the simulations on both the CPU and GPU were done in single precision, because the GPU in our study did not support double precision.

Overall, FFD on a GPU can be 500 to 1500 times faster than CFD on a CPU. In other words, if a CFD simulation on a CPU needs 24 hours, the FFD on a GPU can provide the same amount of information within about one to three minutes. At this speed, FFD on a GPU can perform a real-time simulation with half a million grids and $\Delta t$ = 0.1 s, which can be sufficient for the conceptual design of a small building.
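These figures follow from combining the two speedups reported above, the FFD being about 50 times faster than CFD (2% of the computing time) and the GPU being 10 to 30 times faster than the CPU:

$$50 \times 10 = 500, \qquad 50 \times 30 = 1500$$

$$\frac{24 \times 60\ \text{min}}{1500} \approx 0.96\ \text{min}, \qquad \frac{24 \times 60\ \text{min}}{500} \approx 2.9\ \text{min}$$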

Our current GPU program was intended to evaluate the feasibility of flow simulation on a GPU, so the code has not yet been optimized. As shown in Table 3, even at its best performance, our GPU code used only 3% of the computing power offered by the GPU. Thus, there is great potential to accelerate the simulation through optimization. For instance, instead of our own code for solving the matrix equations, more efficient solvers recently added to the CUDA libraries could be used.

Running FFD on better GPUs is another way to reduce computing time. The GPU used in our study was purchased in 2007 and is no longer the fastest. For instance, an NVIDIA Tesla C2050 GPU (NVIDIA 2010) is four times faster than ours. In addition, performance can be even higher with multi-GPU systems. For instance, a Tesla system with four C2050 GPUs can deliver around 2 teraflops in double precision and 4 teraflops in single precision. Using only 5% of this computing capacity, FFD on a Tesla GPU system can be about 558 times faster than FFD on a CPU and 27,900 times faster than CFD on a CPU. With a time step size of 0.1 s, this speed is sufficient for a real-time flow simulation with $10^7$ grids, which is enough for a moderately sized building.
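The $10^7$ estimate can be checked against the earlier numbers, assuming (as Figure 3 indicates) that the computing time grows linearly with the number of grids. The Tesla system's speed relative to CFD on a CPU is 558 x 50 = 27,900. Taking the best case on the GeForce 8800 (1500 times faster than CFD, which supports about half a million grids in real time) as the baseline gives

$$\frac{27{,}900}{1500} = 18.6, \qquad 0.5 \times 10^{6} \times 18.6 \approx 9.3 \times 10^{6} \approx 10^{7}\ \text{grids}$$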

CONCLUSION

This paper discussed new techniques for informative and fast simulations of air distributions in buildings. This investigation used an FFD model to provide the same kind of detailed air distribution information as a CFD model. Although the FFD model was not as accurate as the CFD model, it was 50 times faster.

The accuracy of the FFD model has been improved by reducing numerical diffusion with a hybrid interpolation method in the semi-Lagrangian solver and by enforcing mass conservation with a correction function. The computing speed of the FFD model can be further increased by modifying the time-splitting method and by optimizing the FFD program.

The other approach to accelerating FFD simulation is running it on GPUs. The FFD model on a GPU produced the same results as on a CPU but ran 10 to 30 times faster. Overall, FFD on a GPU is 500 to 1500 times faster than CFD on a CPU. This speed is sufficient for a real-time flow simulation of a small building with half a million grids and $\Delta t$ = 0.1 s. In addition, the speed can be further increased by optimizing the implementation and using better GPUs or GPU clusters, so real-time flow simulation of a moderately sized building with $10^7$ grids and $\Delta t$ = 0.1 s is possible.

ACKNOWLEDGMENT

This study was funded by the U.S. Federal Aviation Administration (FAA) Office of Aerospace Medicine through the National Air Transportation Center of Excellence for Research in the Intermodal Transport Environment, under Cooperative Agreement 07-CRITE-PU. Although the FAA sponsored this project, it neither endorses nor rejects the findings of this research. The presentation of this information is in the interest of invoking technical community comment on the results and conclusions of the research.

Dr. Wangda Zuo would also like to thank ASHRAE for providing him a grant-in-aid during the study.

NOMENCLATURE

C = concentration of species

$k_C$ = diffusivity of species

M = mass flow rate

P = static pressure of flow field

$S_C$ = source of species

$S_{F,i}$ = $i$th component of the source in momentum equation

$S_T$ = heat source

T = temperature

U = horizontal velocity

$U_i$ = $i$th component of the velocity vector

x = coordinate at horizontal direction

Greek Symbols

$\alpha$ = thermal diffusivity; correction coefficient in Equation 14

$\Delta t$ = time step size

$\Delta x$ = mesh size

$\phi$ = field variable

$\nu$ = kinematic viscosity of fluid

$\rho$ = density of fluid

Subscripts

in = inlet boundary

out = outlet boundary

perp = perpendicular

REFERENCES

Advanced Micro Devices. 2010. ATI Stream software development kit (SDK) v2.01. Available from: http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx.

Anderson, J.A., C.D. Lorenz, and A. Travesset. 2008. General purpose molecular dynamics simulations fully implemented on graphics processing units. Journal of Computational Physics 227(10):5342-59.

Axley, J. 2007. Multizone airflow modeling in buildings: History and theory. HVAC&R Research 13(6):907-28.

Beghein, C., Y. Jiang, and Q. Chen. 2005. Using large eddy simulation to study particle motions in a room. Indoor Air 15(4):281-90.

Bell, N. and M. Garland. 2008. Efficient sparse matrix-vector multiplication on CUDA. NVIDIA Technical Report NVR-2008-004.

Betts, P.L. and I.H. Bokhari. 2000. Experiments on turbulent natural convection in an enclosed tall cavity. International Journal of Heat and Fluid Flow 21(6):675-83.

Blay, D., S. Mergui, and C. Niculae. 1992. Confined turbulent mixed convection in the presence of horizontal buoyant wall jet. Fundamentals of Mixed Convection 213:65-72.

Chen, Q. 1995. Comparison of different k-ε models for indoor air-flow computations. Numerical Heat Transfer Part B: Fundamentals 28(3):353-69.

Chen, Q. 2009. Ventilation performance prediction for buildings: A method overview and recent applications. Building and Environment 44(4):848-58.

Chen, Q., Z. Zhang, and W. Zuo. 2007. Computational fluid dynamics for indoor environment modeling: Past, present, and future. Presented at The 6th International Indoor Air Quality, Ventilation and Energy Conservation in Buildings Conference (IAQVEC 2007). Sendai, Japan.

Chorin, A.J. 1967. A numerical method for solving incompressible viscous flow problems. Journal of Computational Physics 2(1):12-26.

Courant, R., E. Isaacson, and M. Rees. 1952. On the solution of nonlinear hyperbolic differential equations by finite differences. Communications on Pure and Applied Mathematics 5:243-55.

Crouse, B., M. Krafczyk, S. Kuhner, E. Rank, and C. van Treeck. 2002. Indoor air flow analysis based on lattice Boltzmann methods. Energy and Buildings 34(9):941-49.

Erturk, E., T.C. Corke, and C. Gokcol. 2005. Numerical solutions of 2-D steady incompressible driven cavity flow at high Reynolds numbers. International Journal for Numerical Methods in Fluids 48(7):747-74.

Fedkiw, R., J. Stam, and H.W. Jensen. 2001. Visual simulation of smoke. Presented at SIGGRAPH 2001, Los Angeles.

Feng, W. and C. Hsu. 2004. The origin and evolution of green destiny. Presented at IEEE Cool Chips VII: An International Symposium on Low-Power and High-Speed Chips. Yokohama, Japan.

Ferziger, J.H. and M. Peric. 2002. Computational methods for fluid dynamics, 3rd ed. New York: Springer.

Ghia, U., K.N. Ghia, and C.T. Shin. 1982. High-Re solutions for incompressible flow using the Navier-Stokes equations and a multigrid method. Journal of Computational Physics 48(3):387-411.

Integrated Electronics Corporation. 2010. Ct technology. Available from: http://software.intel.com/en-us/data-parallel/.

John, R. 1982. Time-split methods for partial differential equations. Ph.D. dissertation, Department of Computer Science, Stanford University.

Kim, J., P. Moin, and R. Moser. 1987. Turbulence statistics in fully-developed channel flow at low Reynolds-number. Journal of Fluid Mechanics 177:133-66.

Ladeinde, F. and M.D. Nearon. 1997. CFD applications in the HVAC&R industry. ASHRAE Journal 39(1):44-48.

Levi, L. and F. Peyroutet. 2001. A time-fractional step method for conservation law related obstacle problems. Advances in Applied Mathematics 27(4):768-89.

Mazumdar, S. and Q. Chen. 2008. Influence of cabin conditions on placement and response of contaminant detection sensors in a commercial aircraft. Journal of Environmental Monitoring 10(1):71-81.

Megri, A.C. and F. Haghighat. 2007. Zonal modeling for simulating indoor environment of buildings: Review, recent developments, and applications. HVAC&R Research 13(6):887-905.

Nielsen, P.V. 1990. Specification of a two-dimensional test case. Aalborg University, Denmark.

Nielsen, P.V. 2004. Computational fluid dynamics and room air movement. Indoor Air 14:134-43.

NVIDIA. 2007. NVIDIA CUDA compute unified device architecture--Programming guide (version 1.1). NVIDIA Corporation: Santa Clara, CA.

NVIDIA. 2010. Tesla C2050/C2070 GPU computing processor. Available from: http://www.nvidia.com/object/product_tesla_C2050_C2070_us.html.

Patankar, S.V. 1980. Numerical heat transfer and fluid flow. Hemisphere: New York.

Ries, F., T.D. Marco, M. Zivieri, and R. Guerrieri. 2009. Triangular matrix inversion on graphics processing unit. Presented at Conference on High Performance Networking and Computing, Portland, Oregon.

Rong, L. and P.V. Nielsen. 2008. Simulation with different turbulence models in an annex 20 room benchmark test using Ansys CFX 11.0. DCE Technical Report No. 46. Department of Civil Engineering, Aalborg University, Denmark.

Shankar, P.N. and M.D. Deshpande. 2000. Fluid mechanics in the driven cavity. Annual Review of Fluid Mechanics 32:93-136.

Song, O.-Y., H. Shin, and H.-S. Ko. 2005. Stable but nondissipative water. ACM Transactions on Graphics 24(1):81-97.

Stam, J. 1999. Stable fluids. Presented at 26th International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH'99). Los Angeles.

Stanford University Graphics Lab. 2010. BrookGPU. Available from: http://graphics.stanford.edu/projects/brookgpu/.

Tenllado, C., J. Setoain, M. Prieto, L. Pinuel, and F. Tirado. 2008. Parallel implementation of the 2D discrete wavelet transform on graphics processing units: Filter bank versus lifting. IEEE Transactions on Parallel and Distributed Systems 19(3):299-310.

Wang, L. 2007. Coupling of multizone and CFD programs for building airflow and contaminant transport simulations. Ph.D. dissertation, Department of Mechanical Engineering, Purdue University, West Lafayette, IN.

Yakhot, V. and S.A. Orszag. 1986. Renormalization-group analysis of turbulence. Physical Review Letters 57(14):1722-24.

Yang, J.K., Y.J. Wang, and Y.F. Chen. 2007. GPU accelerated molecular dynamics simulation of thermal conductivities. Journal of Computational Physics 221(2):799-804.

Zhai, Z. and Q. Chen. 2003. Solution characters of iterative coupling between energy simulation and CFD programs. Energy and Buildings 35(5):493-505.

Zhang, Z., W. Zhang, Z. Zhai, and Q. Chen. 2007. Evaluation of various turbulence models in predicting airflow and turbulence in enclosed environments by CFD: Part 2: Comparison with experimental data from literature. HVAC&R Research 13(6):871-86.

Zuo, W. 2010. Advanced simulations of air distributions in buildings. Ph.D. dissertation, Department of Mechanical Engineering, Purdue University, West Lafayette, IN.

Zuo, W. and Q. Chen. 2009a. Real-time or faster-than-real-time simulation of airflow in buildings. Indoor Air 19(1):33-44.

Zuo, W. and Q. Chen. 2009b. Fast parallelized flow simulations on graphic processing units. Presented at the 11th International Conference on Air Distribution in Rooms (RoomVent 2009), Busan, Korea.

Zuo, W. and Q. Chen. 2010a. Fast and informative flow simulations in a building by using fast fluid dynamics model on graphics processing unit. Building and Environment 45(3):747-57.

Zuo, W. and Q. Chen. 2010b. Improvements on the fast fluid dynamics model for indoor airflow simulation. Presented at the 4th National Conference of IBPSA-USA (SimBuild 2010), New York, NY.

Received April 12, 2010; accepted July 20, 2010

Wangda Zuo and Qingyan Chen work at the National Air Transportation Center of Excellence for Research in the Intermodal Transport Environment (RITE), School of Mechanical Engineering, Purdue University, West Lafayette, IN.

Wangda Zuo, PhD

Associate Member ASHRAE

Qingyan Chen, PhD

Fellow ASHRAE

Author: Zuo, Wangda; Chen, Qingyan

Publication: HVAC&R Research

Article Type: Report

Date: Nov 1, 2010