
Solution of linear equations and performance analysis on desktop computing systems.

Introduction

A number of real-life engineering applications can be modeled as a system of linear equations. For a small number of equations (n less than 4), linear equations can be solved readily by simple techniques. For four or more equations, however, solutions become arduous and computers must be used. Historically, the inability to solve all but the smallest sets of equations by hand limited the scope of problems addressed in many engineering applications [1]. Linear equations can be solved using a number of methods; the most popular among them are the Gauss elimination method and the Gauss-Jordan method. These methods were chosen for this work because of their extensive use in finite element applications and in other engineering applications.

Background

Consider a system of n linear equations in n unknowns. Such a system takes the form

AX = B

where A is a dense n x n matrix with elements A[i][j] = aij, the solution X is a vector of dimension n with components xi, and the free term B is a vector of dimension n with components bi. The system can be solved using the Gauss elimination method or the Gauss-Jordan method. It has been solved on several different platforms, and a complete analysis of the results has been carried out.

Gaussian Elimination Method

The Gaussian elimination method comprises two steps: a forward elimination phase and a backward substitution phase. Through a series of algebraic manipulations, the forward elimination phase reduces the original system of equations to an upper-triangular form U * X = Y, where U is an upper triangular matrix. In the second phase, called backward substitution, the triangular system is solved for the unknowns in reverse order.

Gauss-Jordan Method

The strategy for solving the system of linear equations is quite similar to that of the Gaussian elimination method explained previously, with a few modifications. When an unknown is eliminated in this method, it is eliminated from all equations rather than just the subsequent ones. In addition, each row is normalized by dividing it by its pivot element. The elimination procedure therefore produces an identity matrix rather than a triangular matrix, so no backward substitution step is needed. Detailed explanations of both algorithms are given in [1].

Measurements Methodology

Efficient software for solving a system of linear equations with the Gaussian elimination and Gauss-Jordan methods depends largely on exploiting the features of the underlying target machine architecture. Both methods were implemented using assembly language routines and tested on a variety of platforms. A detailed analysis was carried out with respect to the number of instructions required to complete the operations, the types of instructions required, the time complexity, and the speedup achieved. This analysis adds another dimension to the complexity analysis of these methods. Implementations of these algorithms for large matrices are available in the literature; two of them are summarized below.

Solving the linear equations on a shared memory system

The implementation of the Gauss elimination algorithm on several parallel machines using a shared memory design and a message passing programming model is reported in [2]. It is reported that as the number of processors in a shared memory design is scaled up, hot spots become a significant problem that must be carefully identified and avoided. It is also reported that the actual floating point performance obtained using 30 processors of the Sequent Symmetry (11 MFLOPS) and 48 processors of the BBN TC2000 (about 50 MFLOPS) is not impressive compared with the roughly 300 MFLOPS that a single processor of the Cray Y-MP can achieve on a linear system solve. The performance of these methods in a parallel processing environment is therefore not convincing. It is clear that the effective use of local memory, whether in the form of cache or explicit local memories, will be a critical issue in exploiting scalable machines in the future.

Solving the linear equations on a network of workstations

The system of linear equations is solved using the Gaussian elimination and Gauss-Jordan methods on a network of workstations in [3], a setup well suited to pipelined execution. There, the speedup is directly related to the number of workstations: with 4 workstations the speedup is about four, and so on. Performance evaluations of large-scale applications on cluster and grid platforms are given in [4] and [5].

Performance analysis

To solve a system with a large number of equations, one would normally turn to parallel machines or a network of clusters. The two studies above, however, show that solving the linear equations either on parallel machines or on a network of workstations yields no performance boost beyond the number of processors employed.

Our objective is to solve the system of linear equations on computers equipped with either single-core or multi-core processors and to compare the speedups achieved, since most of the academic and research community tends to use a dedicated computer to solve their problems. The programs for the Gaussian elimination and Gauss-Jordan methods were developed in assembly language, and the results were measured on different types of processors. A time complexity analysis was also carried out.

Analysis of results obtained using the Gaussian elimination and Gauss-Jordan methods

The solution of linear equations by the Gaussian elimination and Gauss-Jordan methods was implemented as programs written in C with embedded 80x87 instructions (since the coefficients are floating point numbers). For the analysis, a 5 x 5 matrix was taken, and each program was executed one million times to measure its average execution time. The details of the results are given in the following tables; the number of instructions in each group is given in Table 1.

From the analysis of the programs and Table 1, the following inferences were made:

* The number of arithmetic operations is higher in the Gauss-Jordan method (150) than in the Gauss elimination method (136), since the Gauss-Jordan method also eliminates unknowns from the rows above the pivot.

* But the total number of instructions is higher for the Gauss elimination method: the Gauss-Jordan program uses 438 assembly language instructions, while the Gauss elimination program uses 514. In Gauss-Jordan the first row is normalized at the outset, whereas in Gaussian elimination it is kept as it is; hence the latter needs more operations.

* The solution for a tridiagonal matrix takes approximately one third of the instructions required for a full matrix.

* Irrespective of the algorithm, arithmetic instructions account for only about one third of the total number of instructions, while the data transfer instructions that move data and intermediate values between memory and the processor account for about two thirds.

The execution times of these programs were also analyzed by running them on different machines. The time taken to execute a program once is given in Table 2, together with the relative speedup, shown in parentheses, over a 1.5 GHz desktop system.

With respect to the timing analysis, the following inferences were made:

* In general, the speedup is much higher on Core 2 Duo machines and on multi-core / multiprocessor servers.

* A laptop with an Intel Core 2 Duo 2.6 GHz processor executes the programs approximately 8 to 10 times faster (a speedup of 8 to 10) than the Intel Pentium IV 1.5 GHz reference system.

* The speedup was 5 to 10 on a server equipped with an Intel Xeon quad-core processor; it might be higher if other services were stopped.

* The speedup on a server based on two AMD Opteron 250 processors, which use the 64-bit AMD64 architecture, is approximately 25 for Gauss-Jordan and 107 for Gaussian elimination. This is the highest performance among all the machines on which the test was conducted.

Conclusion

From the above inferences it is clear that systems of linear equations can be solved efficiently on desktop machines equipped with Core 2 Duo or other multi-core processors. Compare this with the results for networks of clusters and clusters of workstations, where the speedup is at best proportional to the number of nodes, i.e., for an n-node cluster the speedup is at most n. In contrast, the speedup on Core 2 Duo processors is 8 to 10, and on systems with two AMD Opteron 250 (64-bit) processors it is 25 to 107. These experiments show that for computationally intensive applications, systems with multi-core processors are more efficient than networks of clusters or clusters of workstations built from single-core processors.

References

[1] Steven C. Chapra and Raymond P. Canale, Numerical Methods for Engineers, TMH, 2002.

[2] Karen H. Warren and Eugene D. Brooks III, "Gauss Elimination: A Case Study on Parallel Machines," Proceedings of IEEE Compcon Spring '91, March 1991.

[3] Gabriel Dimitriu and Felicia Ionescu, "Solving Systems of Linear Equations in a Network of Workstations," Proceedings of the Fifth International Symposium on Parallel and Distributed Computing (ISPDC'06), 2006.

[4] Lamine M. Aouad, Serge G. Petiton, and Mitsuhisa Sato, "Grid and Cluster Matrix Computation with Persistent Storage and Out-of-core Programming," IEEE International Conference on Cluster Computing, September 2005.

[5] Lamine M. Aouad and Serge G. Petiton, "Parallel Basic Matrix Algebra on the Grid'5000 Large Scale Distributed Platform," IEEE International Conference on Cluster Computing, September 2005.

K. Sridhar (*), B. Abirami (*), A. Rajaraman (**) and T.R. Sivaramakrishnan (***)

(*) School of Computing/SASTRA E-mail: ks@cse.sastra.edu, abiramisampath@gmail.com

(**) Visiting Professor, IIT, Madras, Arar

(***) School of EEE/SASTRA
Table 1 : Details of the floating point instructions used in the programs
(Fld = load, Fst = store)

Matrix                         Total     Arithmetic          Data transfer    % of total
                               instr.    Fdiv  Fmul  Fsub    Fld     Fst      Arith.  Data
Gauss-Jordan 5 x 5 tridiag.    168       14    22    22      60      50       34.52   65.48
Gauss-Jordan 5 x 5             438       20    70    69      160     119      36.30   63.70
Gaussian elimination 5 x 5     406       23    60    57      157     109      34.48   65.52

Table 2 : Execution times and speedup of the programs on various machines
(time to execute the program once, in microseconds; speedup over the
reference system in parentheses)

CPU                                        Gauss            Gauss-Jordan   Gauss-Jordan
                                           elimination      tridiagonal
Desktop, P IV 1.5 GHz (reference)          117.25           4.09           43.484
Desktop, P IV 1.6 GHz                      94.59  (1.24)    2.72  (1.5)    35.41  (1.23)
Desktop, P IV 3 GHz                        51.66  (2.27)    1.86  (2.2)    18.74  (2.32)
Laptop, Core 2 Duo 1.6 GHz (battery)       19.80  (5.92)    1.56  (2.62)   11.31  (3.84)
Laptop, Core 2 Duo 1.6 GHz (mains)         21.38  (5.49)    1.17  (3.49)   11.06  (3.93)
Laptop, Core 2 Duo 2.6 GHz (battery)       41.31  (2.84)    2.09  (1.96)   17.11  (2.54)
Laptop, Core 2 Duo, 2 GB RAM (mains)       12.73  (9.21)    0.72  (5.70)   5.34   (8.14)
Server, Intel Xeon quad-core 2.66 GHz      16.21  (7.23)    0.83  (4.94)   6.81   (6.38)
Server, two AMD Opteron 250, 2.39 GHz      1.09   (107.28)  1.05  (3.91)   1.72   (25.31)
COPYRIGHT 2008 Research India Publications

Author: Sridhar, K.; Abirami, B.; Rajaraman, A.; Sivaramakrishnan, T.R.
Publication: International Journal of Applied Engineering Research
Article Type: Report
Date: Jun 1, 2008