Solution of linear equations and performance analysis on desktop computing systems.
A number of real-life engineering applications can be modeled as a system of linear equations. For a small number of equations (n less than 4), linear equations can be solved readily by simple techniques. However, for four or more equations, solutions become arduous and computers must be utilized. Historically, the inability to solve all but the smallest sets of equations by hand has limited the scope of problems addressed in many engineering applications. Linear equations can be solved using a number of methods; the most popular among them are the Gauss elimination method and the Gauss-Jordan method. These methods have been chosen in this work because of their extensive use in finite element applications and in other engineering applications.
Consider a system of linear equations whose coefficient matrix is of size n. These equations take the form
AX = B
where A is a dense matrix with elements A[i][j] = aij, the solution X is a vector of dimension n with components xi, and the free term B is a vector of dimension n with components bi. This system can be solved using one of the following methods: Gauss elimination or the Gauss-Jordan method. The system of equations has been solved on different platforms and a complete analysis of the results has been carried out.
Gaussian Elimination Method
The Gaussian elimination method comprises two phases, namely the forward elimination phase and the backward substitution phase. Through a series of algebraic manipulations, the original system of equations is reduced to an upper triangular form U * X = Y, where U is a unit upper triangular matrix. In the second phase, the upper triangular system is solved for the variables in reverse order; this is the backward substitution phase.
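The two phases above can be sketched as follows. This is a minimal illustration in plain C (the paper's actual routines embed 80x87 assembly, which is not reproduced here); it assumes non-zero pivots, whereas a production solver would use partial pivoting.

```c
#define N 5  /* matrix size used in the paper's experiments */

/* Gaussian elimination: forward elimination followed by
   backward substitution. Solves A x = b for x. */
void gauss_eliminate(double a[N][N], double b[N], double x[N])
{
    /* forward elimination: reduce A to upper triangular form */
    for (int k = 0; k < N - 1; k++) {
        for (int i = k + 1; i < N; i++) {
            double factor = a[i][k] / a[k][k];
            for (int j = k; j < N; j++)
                a[i][j] -= factor * a[k][j];
            b[i] -= factor * b[k];
        }
    }
    /* backward substitution: solve for the unknowns in reverse order */
    for (int i = N - 1; i >= 0; i--) {
        double sum = b[i];
        for (int j = i + 1; j < N; j++)
            sum -= a[i][j] * x[j];
        x[i] = sum / a[i][i];
    }
}
```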
Gauss-Jordan Method
The strategy for solving the system of linear equations is quite similar to that of the Gaussian elimination method explained previously, but with a few modifications. When an unknown is eliminated in this method, it is eliminated from all equations rather than just the subsequent ones. In addition, each row is normalized by dividing it by its pivot element. Thus, the elimination procedure results in an identity matrix rather than a triangular matrix. As a consequence, backward substitution need not be applied, since the coefficient matrix has been reduced to the identity. A detailed explanation of the above two algorithms is given in .
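The eliminate-from-all-rows strategy can be sketched as below; again a plain-C illustration rather than the paper's 80x87-embedded routine, assuming non-zero pivots for brevity. Because A is reduced to the identity, the right-hand side vector holds the solution on exit and no back substitution is needed.

```c
#define N 5  /* matrix size used in the paper's experiments */

/* Gauss-Jordan: normalize each pivot row and eliminate the unknown
   from ALL other rows, reducing A to the identity matrix.
   On exit, b holds the solution vector x. */
void gauss_jordan(double a[N][N], double b[N])
{
    for (int k = 0; k < N; k++) {
        /* normalize the pivot row */
        double pivot = a[k][k];
        for (int j = 0; j < N; j++)
            a[k][j] /= pivot;
        b[k] /= pivot;
        /* eliminate the unknown from every other row,
           above as well as below the pivot */
        for (int i = 0; i < N; i++) {
            if (i == k)
                continue;
            double factor = a[i][k];
            for (int j = 0; j < N; j++)
                a[i][j] -= factor * a[k][j];
            b[i] -= factor * b[k];
        }
    }
}
```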
The efficient software design of the solution of a system of linear equations using the Gaussian elimination and Gauss-Jordan methods is largely dependent on exploiting the features of the underlying target machine architecture. The Gaussian elimination and Gauss-Jordan methods are implemented here using assembly language routines and tested on a variety of platforms. A detailed analysis has also been carried out with respect to the number of instructions required to complete the operations, the types of instructions required, the time complexity, and the speedup achieved. This analysis adds another dimension to the complexity analysis of these methods for solving systems of linear equations. Implementations of these algorithms for large matrices are available in the literature; two of them are discussed below.
Solving the linear equations on a shared memory system
The implementation of the Gauss elimination algorithm on several parallel machines using the shared memory design and the message passing programming model is reported in . It is reported that as the number of processors in a shared memory design is scaled up, hot spots become a significant problem and must be carefully sought out and avoided. It is also reported that the floating point performance obtained using 30 processors of the Sequent Symmetry (11 MFLOPS) and 48 processors of the BBN TC2000 (about 50 MFLOPS) is not impressive when compared with the approximately 300 MFLOPS that a single processor of the Cray Y-MP can achieve on a linear system solve. The performance of these methods in a parallel processing environment is therefore not convincing. It is clear that the effective use of local memory, whether in the form of cache or explicit local memories, will be a critical issue in exploiting scalable machines in the future.
Solving the linear equations on a network of workstations
The system of linear equations is solved using the Gaussian elimination and Gauss-Jordan methods on a network of workstations in , a setup which has the highest possibility of implementing pipeline concepts. It can be inferred that the speedup of the operations is directly related to the number of workstations: if the number of workstations is 4, the speedup is at most four times, and so on. Performance evaluations of large scale applications on cluster and grid platforms are given in  and .
To solve a system with a large number of equations, one would normally go for parallel machines or a network of clusters. However, the above two inferences clearly show that there is no significant boost in performance when the linear equations are solved either on parallel machines or on a network of workstations.
Our objective is to solve the system of linear equations using computers equipped with either single core or multi-core processors and to compare the speedups achieved, as most of the academic and research community tends to use a dedicated computer to solve their problems. The programs for solving the equations using the Gaussian elimination and Gauss-Jordan methods have been developed using assembly language programming, and the results have been tested on different types of processors. The time complexity analysis is also presented.
Analysis of Results Obtained Using the Gaussian Elimination and Gauss-Jordan Methods
The solution of linear equations using the Gaussian elimination and Gauss-Jordan methods was tested with programs written in C, embedded with 80x87 instructions (since the coefficients are floating point numbers). For the purpose of analysis, a 5x5 matrix was taken and the program was executed one million times to measure the execution time. The details of the results are given in the following tables; the number of instructions required in each group is given in Table 1.
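The million-repetition timing described above can be reproduced with a harness along the following lines. This is a sketch based on the standard clock() facility; the paper does not list its harness, and solve_once here is a hypothetical stand-in for the solver routine under test.

```c
#include <time.h>

#define RUNS 1000000L  /* number of repetitions used in the paper */

/* hypothetical solver under test; stands in for the
   Gaussian elimination or Gauss-Jordan routine */
void solve_once(void);

/* Run the solver RUNS times and return the average time
   for one execution, in microseconds. */
double time_per_run_us(void)
{
    clock_t start = clock();
    for (long r = 0; r < RUNS; r++)
        solve_once();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC * 1e6 / RUNS;
}
```

Averaging over a million runs amortizes the coarse resolution of clock() down to well below a microsecond, which is why the per-execution times in Table 2 can be reported to two decimal places.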
From an analysis of the programs and Table 1, the following inferences were made:
* The number of arithmetic instructions is higher in the Gauss-Jordan method (159) than in the Gaussian elimination method (140), since Gauss-Jordan eliminates each unknown from all equations rather than only from those below the pivot.
* The total number of instructions is also higher for the Gauss-Jordan method: 438 assembly language instructions against 406 for the Gaussian elimination method. In Gauss-Jordan the pivot row is normalized at every step, whereas in Gaussian elimination the pivot row is left as it is; hence Gauss-Jordan needs more operations.
* The solution of a tridiagonal matrix takes approximately one third of the number of instructions required for a regular (dense) matrix.
* It is also found that, irrespective of the algorithm, arithmetic instructions account for only about one third of the total instruction count, while the data transfer instructions that move the data and intermediate values between memory and the processor account for about two thirds.
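The roughly one-third instruction count for the tridiagonal case follows because each elimination step touches only the sub- and super-diagonal entries, so a single multiply/subtract pair suffices per row. A sketch of elimination specialized to a tridiagonal system is shown below (in the Thomas-algorithm form, an assumption, since the paper does not list its tridiagonal code; non-zero pivots are assumed).

```c
#define N 5  /* system size used in the paper's experiments */

/* Gaussian elimination specialized to a tridiagonal system:
   sub[i] is the subdiagonal (sub[0] unused), diag[i] the diagonal,
   sup[i] the superdiagonal (sup[N-1] unused). Solves for x. */
void tridiag_solve(double sub[N], double diag[N], double sup[N],
                   double b[N], double x[N])
{
    /* forward sweep: eliminate the subdiagonal; only one
       multiply/subtract pair per row is needed */
    for (int i = 1; i < N; i++) {
        double m = sub[i] / diag[i - 1];
        diag[i] -= m * sup[i - 1];
        b[i]    -= m * b[i - 1];
    }
    /* back substitution */
    x[N - 1] = b[N - 1] / diag[N - 1];
    for (int i = N - 2; i >= 0; i--)
        x[i] = (b[i] - sup[i] * x[i + 1]) / diag[i];
}
```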
The execution time of these programs was also analyzed by running them on different machines. The time taken to execute the program once is given in Table 2, which also gives the relative speedup compared with the performance of a desktop system with a 1.5 GHz processor; the speedup is given in brackets.
With respect to the timing analysis, the following inferences were made;
* In general, the speedup is much higher on Core 2 Duo and multi-core / multiprocessor server machines.
* A laptop with an Intel Core 2 Duo 2.6 GHz processor executes the programs approximately 8 to 10 times faster (a speedup of 8 to 10) than the Intel Pentium IV 1.5 GHz based system.
* The speedup was 5 to 10 on a server equipped with an Intel Xeon quad core processor. The speedup may be higher if other services are stopped.
* The speedup on a server based on two AMD Opteron 250 processors (AMD64 architecture) is approximately 25 for Gauss-Jordan and 107 for Gaussian elimination. This performance is the highest among all the machines on which the test was conducted.
From the above inferences, it is clear that the system of linear equations can be solved efficiently using desktop machines equipped with Core 2 Duo or multi-core processors. This should be compared with the results for networks of clusters and clusters of workstations, where the speedup is directly proportional to the number of nodes, i.e., for an n-node cluster the speedup is at most n. In contrast, the speedup on Core 2 Duo processors is 8 to 10, and the speedup on the system with the AMD Opteron 250 dual processor (a 64-bit processor) is 25 to 107. These experiments show that, for computationally intensive applications, systems with multi-core processors are more efficient than networks of clusters or clusters of workstations with single core processors.
References
Steven C. Chapra, Raymond P. Canale, Numerical Methods for Engineers, TMH, 2002.
Karen H. Warren, Eugene D. Brooks III, Gauss Elimination: A Case Study on Parallel Machines, IEEE International Conference Compcon Spring '91, March 1991.
Gabriel Dimitriu, Felicia Ionescu, Solving Systems of Linear Equations in a Network of Workstations, Proceedings of the Fifth International Symposium on Parallel and Distributed Computing (ISPDC'06), 2006.
Lamine M. Aouad, Serge G. Petiton, Mitsuhisa Sato, Grid and Cluster Matrix Computation with Persistent Storage and Out-of-core Programming, IEEE Conference Proceedings on Cluster Computing, Sept. 2005.
Lamine M. Aouad, Serge G. Petiton, Parallel Basic Matrix Algebra on the Grid'5000 Large Scale Distributed Platform, IEEE Conference Proceedings on Cluster Computing, Sept. 2005.
K. Sridhar (*), B. Abirami (*), A. Rajaraman (**) and T.R. Sivaramakrishnan (***)
(*) School of Computing/SASTRA E-mail: email@example.com, firstname.lastname@example.org
(**) Visiting Professor, IIT, Madras, Arar
(***) School of EEE/SASTRA
Table 1: Details of the floating point instructions used in the programs

Program (5 x 5)            Total    Arithmetic instr.     Data transfer instr.   Arithmetic  Data transfer
                           instr.   Fdiv   Fmul   Fsub    Fld (load) Fst (store) (%)         (%)
Gauss-Jordan, tridiagonal  168      14     22     22      60         50          34.52       65.48
Gauss-Jordan               438      20     70     69      160        119         36.30       63.70
Gaussian elimination       406      23     60     57      157        109         34.48       65.52

Table 2: Execution times and speedup of the programs on various machines
(time to execute the program once, in microseconds; speedup relative to the P IV 1.5 GHz desktop in brackets)

CPU                                               Gaussian elim.   Gauss-Jordan    Gauss-Jordan, tridiag.
Desktop, P IV 1.5 GHz (taken as reference)        117.25           43.484          4.09
Desktop, P IV 1.6 GHz                             94.59 (1.24)     35.41 (1.23)    2.72 (1.5)
Desktop, P IV 3 GHz                               51.66 (2.27)     18.74 (2.32)    1.86 (2.2)
Laptop, Core 2 Duo 1.6 GHz (on battery)           19.80 (5.92)     11.31 (3.84)    1.56 (2.62)
Laptop, Core 2 Duo 1.6 GHz (on mains)             21.38 (5.49)     11.06 (3.93)    1.17 (3.49)
Laptop, Core 2 Duo 2.6 GHz (on battery)           41.31 (2.84)     17.11 (2.54)    2.09 (1.96)
Laptop, Core 2 Duo, 2 GB RAM (on mains)           12.73 (9.21)     5.34 (8.14)     0.72 (5.70)
Server, Intel Xeon quad core, 2.66 GHz            16.21 (7.23)     6.81 (6.38)     0.83 (4.94)
Server, AMD Opteron 250 dual processor, 2.39 GHz  1.09 (107.28)    1.72 (25.31)    1.05 (3.91)
Author: Sridhar, K.; Abirami, B.; Rajaraman, A.; Sivaramakrishnan, T.R.
Publication: International Journal of Applied Engineering Research
Date: Jun 1, 2008