Multipliers based on Urdhva Tiryagbhyam algorithm: a survey.
Multiplication is of great significance in digital signal, bio-signal and image processing applications. As technology has advanced, researchers have developed many multiplication techniques to meet constraints such as high speed, low power consumption and reduced area. Most DSP algorithms perform multiplication and accumulation, which creates a demand for high speed multipliers. Speed is always an important constraint and determines the performance of a multiplier, so the number of steps in the multiplication process must be reduced (Premannda et al, 2013). The demand for high speed processing has been increasing as a result of expanding computer and signal processing applications, and the development of fast multipliers has been a subject of interest for more than a decade. In many DSP applications, reducing the time delay and power consumption is also essential (Booth, 1951). For an efficient multiplier, accuracy, speed, power and area are the important characteristics. This has led to the development of many multipliers such as the array multiplier, Booth multiplier, Wallace multiplier, Baugh-Wooley multiplier and Vedic multiplier.
Multiplier architectures can generally be categorized into three types (Kunchigi et al, 2012). The first is the serial multiplier, which emphasizes minimum area. The second is the parallel multiplier, which carries out high speed mathematical operations but consumes a larger chip area. The third is the serial-parallel multiplier, which serves as a tradeoff between the serial and parallel multipliers. The multiplication process has three main steps (Soniya and Suresh Kumar, 2013):
1. Partial Product generation
2. Partial Product reduction
3. Final addition
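The three steps above can be sketched in software as follows. This is only a minimal Python illustration for unsigned operands, not a hardware description and not taken from any of the surveyed papers. The function names (partial_products, carry_save, reduce_rows, multiply) are hypothetical, and the reduction step assumes a simple carry-save (3:2) scheme.

```python
def partial_products(a: int, b: int, n: int) -> list[int]:
    """Step 1: one shifted copy of a for every set bit of the n-bit multiplier b."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def carry_save(x: int, y: int, z: int) -> tuple[int, int]:
    """Compress three rows into a sum row and a carry row (x + y + z == s + c)."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def reduce_rows(rows: list[int]) -> tuple[int, int]:
    """Step 2: repeatedly compress three rows into two until two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        s, c = carry_save(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def multiply(a: int, b: int, n: int = 8) -> int:
    """Unsigned n-bit multiplication following the three steps."""
    rows = partial_products(a, b, n)   # 1. partial product generation
    s, c = reduce_rows(rows)           # 2. partial product reduction
    return s + c                       # 3. final (carry-propagate) addition


print(multiply(13, 11, 4))
```

In hardware the three steps map to an AND-gate array, a tree or array of adders or compressors, and a final fast adder; the multipliers surveyed below differ mainly in how they organize the second and third steps.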
This paper compares the performance of many multipliers and argues that the method based on the Urdhva Tiryagbhyam sutra (Vedic Mathematics) offers better results in terms of speed, area, power and delay. It is found to be more convenient than the conventional types of multipliers.
In an array multiplier the partial products are computed independently: they are generated in parallel and accumulated using an adder circuit. Nirlakalla Ravi, et al (2011) proposed a low power, low area array multiplier with CSA. It eliminates the final stage of the multiplier that is present in conventional parallel array multipliers, and its 16-transistor full adder shows good efficiency. In the 4 x 4 multiplier developed, the carry bits are added without using an RCA by feeding each carry to the input of the next column to the left, which saves about 56 transistors. The authors report that this design offers 13.91% less power, 34.09% more speed and 59.91% less energy consumption for the 0.18nm technology.
For 32nm MOSFET technology, an array multiplier with a 10T full adder is used. This design needs 96 fewer transistors than the array multiplier with the 16T full adder cell, and it offers 2.82% less total power and 13.24% more speed (Kripa Mathew, et al, 2013).
Many multiplication algorithms have been framed to minimize delay, of which the Array Multiplier with Compressors (Jasbir, et al, 2012) proved to be efficient. Higher-order multiplications use many adders to perform the partial product addition. Compressors, which are capable of adding 5, 6 or 7 bits, are used to reduce the number of partial products and therefore the number of stages. These adders are very fast compared to conventional adders. The authors compared the performance of the multipliers with the Cadence tool. For the 4 x 4 multiplier they used 4:2 and 5:2 compressors, whereas for the 8 x 8 multiplier they used 5:2, 6:2 and 7:2 compressors. The 6:2 and 7:2 compressors gave less delay and less power consumption.
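As an illustration of how a compressor reduces several bits of the same weight, the sketch below models a 4:2 compressor at the bit level in Python. It assumes the common textbook construction from two cascaded full adders; the cited paper may use a different internal structure, and the names full_adder and compressor_4_2 are hypothetical.

```python
def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Return (sum, carry) for three input bits."""
    s = x ^ y ^ z
    c = (x & y) | (y & z) | (x & z)
    return s, c


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """4:2 compressor built from two full adders (an assumed construction).

    Returns (sum, carry, cout) such that
    x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)       # first stage: cout does not depend on cin
    total, carry = full_adder(s1, x4, cin)  # second stage absorbs x4 and cin
    return total, carry, cout


# Four ones plus a carry-in of one: 5 == 1 + 2 * (1 + 1)
print(compressor_4_2(1, 1, 1, 1, 1))
```

Because cout is produced without waiting for cin, a row of such cells does not ripple a carry horizontally, which is one reason compressors shorten the reduction stage.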
Wallace Tree Multiplier:
Wallace tree multipliers are an extended version of tree multipliers. They use carry save adders in the addition process to reduce latency. An improved method reduces the latency further by using 4:2 and 5:2 compressors and a carry select adder (Sureka, et al, 2013). It provides 44% higher speed and 11% lower power. Another method reduces the latency by using the Sklansky adder (a divide and conquer adder) to perform the final stage addition (Dakupati, et al, 2013). A decrease of about 3.46% in delay and 11.6% in power consumption was observed, and the latency was reduced from 27 to 15. For a RISC processor, a Wallace tree multiplier using Sklansky adders was developed that is about 44.4% faster and consumes 11% less power at 200MHz in 320nm technology (Vinoth, et al, 2011).
The ripple carry adder and the Sklansky adder were designed and compared before designing a Wallace multiplier. The comparison showed that Sklansky adders offer high speed but occupy a larger area, whereas ripple carry adders offer less speed with a small area. Using 5:2 compressors with Sklansky adders, the multiplier achieves a reduced latency of 6 (Gopi Krishna, et al, 2013). In the compressors the XOR blocks are replaced with MUX blocks. The select lines to the multiplexer are available ahead of the inputs, so the path delay is minimized. At an operating frequency of 50MHz and 3.3V the power was found to be 1.436mW, a reduction of 4.57%. At 400MHz the power was found to be 11.402mW, a reduction of 6.36%.
Normally half adders and full adders are used for the addition process in Wallace multipliers. The half adder does not reduce the number of stages (number of partial products), so the complexity of the circuit increases considerably as more half adders are needed. Using full adders reduces both the complexity and the number of partial bits (Waters and Swartzlander, 2010).
In Booth multiplication, two numbers are multiplied using the 2's complement method (Booth, 1951). Neeta and Sindal (2013) proposed two methods of multiplication. The first is Radix-4 with 4:2 compressors and SQRT CSLA. The second is Radix-4 with 4:2 compressors and a modified SQRT CSLA. The former gives a greater delay reduction and the latter has reduced logic levels. The main drawback of both is a slight increase in area. The Booth algorithm is used for the reduction of partial products and the Wallace structure for the accumulation.
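The way Radix-4 Booth recoding halves the number of partial products can be sketched as follows. This is a generic Python illustration of the standard recoding rule, not the design of any cited paper; it assumes an even operand width n and signed operands in the range [-2^(n-1), 2^(n-1)), and the function names are hypothetical.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Recode the n-bit two's-complement pattern b into n/2 digits in {-2,-1,0,1,2}.

    Digit k is b[2k] + b[2k-1] - 2*b[2k+1], with b[-1] taken as 0.
    """
    digits = []
    prev = 0                           # implicit bit b[-1]
    for i in range(0, n, 2):
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, n: int = 8) -> int:
    """Signed multiplication using one partial product per radix-4 digit."""
    digits = booth_radix4_digits(b & ((1 << n) - 1), n)
    # Each partial product is a multiple of a in {-2a, -a, 0, a, 2a}, weighted by 4**k.
    return sum(d * a * (4 ** k) for k, d in enumerate(digits))


print(booth_radix4_digits(7, 4))   # 7 == -1 + 2 * 4
print(booth_multiply(-6, 7, 4))
```

An n-bit multiplier therefore yields n/2 partial products instead of n, and each one needs only a shift and an optional negation of the multiplicand.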
A new architecture based on a Radix-4 Booth multiplication algorithm was developed for high speed applications (K. Babulu and G. Parasuram, 2011). In this technique the accumulation is done with hybrid adders, each consisting of two carry look ahead adders and a multiplexer. With these adders the computation is performed twice, first with the carry initialized to zero and then with the carry initialized to one. The MUX is used to select the correct sum and carry. The propagation delay is 29.198ns for the 8 x 8 multiplier, which is less than that of the Radix-2 form. However, the area increases when hybrid adders are used.
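The idea of computing each block twice and letting a multiplexer pick the result can be sketched as below. This is a behavioural Python sketch under stated assumptions: the block adder is modelled with ordinary integer addition rather than carry look ahead logic, the width is assumed to be a multiple of the block size, and the names add_block and carry_select_add are hypothetical.

```python
def add_block(x: int, y: int, cin: int, width: int) -> tuple[int, int]:
    """Add two width-bit values with a carry-in; return (sum, carry-out)."""
    total = x + y + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, width: int = 8, block: int = 4) -> tuple[int, int]:
    """Add a and b block by block, selecting between precomputed results."""
    mask = (1 << block) - 1
    result, carry = 0, 0
    for shift in range(0, width, block):
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        s0, c0 = add_block(x, y, 0, block)           # result assuming carry-in 0
        s1, c1 = add_block(x, y, 1, block)           # result assuming carry-in 1
        s, carry = (s1, c1) if carry else (s0, c0)   # the "MUX" selects one pair
        result |= s << shift
    return result, carry


print(carry_select_add(0x9C, 0x7B))
```

Both candidate results of a block are available before the incoming carry arrives, so the carry only has to pass through one selection per block. The duplicated adders are the source of the area increase noted above.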
To reduce dynamic power and increase speed, glitches (1 to 0 transitions) and spikes (0 to 1 transitions) have to be reduced. SPST (Spurious Power Suppression Technique) adders can be used to meet these constraints. Compared with the array MAC, SPST adders give 5 less delay and 7% less power. The estimated delay and power consumption for this Radix-2 modified Booth algorithm are 39.69ns and 144mW (Addanki, et al, 2012).
The signed, unsigned modified Booth encoding (SUMBE) multiplier (Rajput and Swamy, 2012) is a multiplier that can multiply both signed and unsigned numbers. With this method only half of the partial products are generated. The sign bit of the operands is then extended, and the additional partial products are generated and accumulated using CSA and CLA. Since both signed and unsigned operations are performed in one multiplier, the chip area is reduced, which minimizes the cost of the system.
Multiplication Using Urdhva Tiryagbhyam:
Vedic Mathematics is the name given to the ancient system of calculation. Jagadguru (1986) constructed 16 sutras (formulae) and 16 Upa sutras (sub formulae) after extensive research in the Atharva Veda. According to him, Vedic mathematics is based on the sutras, or word-formulae, which are intended to describe the easiest way of calculating. Urdhva Tiryagbhyam is one of those 16 sutras; the Sanskrit phrase means "Vertical and Crosswise". The multiplication is based on the concept that the partial products are generated and added concurrently, in parallel (Harpreet, 2008). The generation of partial products and their summation in parallel using vertical and crosswise multiplication is explained in Fig.1.
To illustrate this multiplication, consider the multiplication of two decimal numbers (526 * 749). Initially the least significant digits of both numbers are multiplied and added to the previous carry, which is taken to be zero at the start. This generates one digit of the result and a carry. In the next step the least significant digit and the digit next to it are multiplied in a crosswise manner and the previous carry is added. The same process of vertical and crosswise multiplication is repeated to get the final answer. Based on this concept of vertical and crosswise multiplication, many researchers have developed multipliers for various applications. They compared the performance of the Vedic multiplier with other available conventional multipliers and reported it to be more efficient. The architecture of the 2 x 2 Vedic multiplier is shown in Fig.2.
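The vertical and crosswise procedure for the decimal example can be written out as a short Python sketch. The function name urdhva_multiply is hypothetical and the code assumes non-negative integers; it illustrates the column-by-column scheme described above and is not a hardware model.

```python
def urdhva_multiply(x: int, y: int) -> int:
    """Multiply two non-negative integers digit by digit, vertically and crosswise."""
    xs = [int(d) for d in str(x)][::-1]   # least significant digit first
    ys = [int(d) for d in str(y)][::-1]
    n = max(len(xs), len(ys))
    xs += [0] * (n - len(xs))             # pad both numbers to n digits
    ys += [0] * (n - len(ys))

    digits = []
    carry = 0                             # the initial carry is zero
    for k in range(2 * n - 1):
        # Column k collects every digit product whose positions add up to k.
        column = sum(xs[i] * ys[k - i] for i in range(n) if 0 <= k - i < n)
        total = column + carry
        digits.append(total % 10)         # one digit of the result
        carry = total // 10               # carry into the next column

    result = carry                        # the last carry forms the leading digits
    for d in reversed(digits):
        result = result * 10 + d
    return result


print(urdhva_multiply(526, 749))  # 393974
```

For 526 * 749 the column sums are 54, 42, 95, 34 and 35. With the carries added in, they produce the digits 4, 7, 9, 3 and 9 and a final carry of 3, giving 393974. All products within a column are independent of each other, which is what allows a hardware implementation to generate them in parallel.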
Kunchigi et al (2012) proposed a new multiplier based on Vedic algorithms that increases speed by reducing the delay and also reduces the area in sequential circuits. In the conventional method, n*4 AND gates are used to implement an n bit multiplication, whereas the proposed multiplier uses n AND gates for an n bit multiplication.
Nivedita et al (2013) developed a methodology for high speed multiplication that produces a 2n bit product of two n bit integers. It proved efficient because higher-order multipliers are built from lower-order multipliers, whereas other multiplication methods use a different logic for designing higher bit multipliers. The main operation in the multiplication process is the addition of partial products, for which they used a ripple carry adder that occupies less area. They gave a comparative study of the Wallace and Vedic multipliers in terms of delay in Table I. The Wallace multiplier consumes more power and its circuit is more complex than that of the Vedic multiplier.
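Building a higher-order multiplier from lower-order ones can be sketched in Python as follows. The 2 x 2 block uses the usual gate-level vertical and crosswise equations; the recursive composition assumes the operand width n is a power of two and models the adders that combine the four sub-products with ordinary integer addition. The names vedic_2x2 and vedic_multiply are hypothetical, and the cited designs may arrange their adders differently.

```python
def vedic_2x2(a: int, b: int) -> int:
    """2 x 2 bit Urdhva Tiryagbhyam multiplier written with gate-level operations."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0                 # vertical: least significant bits
    x, y = a1 & b0, a0 & b1      # crosswise products
    p1, c1 = x ^ y, x & y        # half adder
    z = a1 & b1                  # vertical: most significant bits
    p2, p3 = z ^ c1, z & c1      # half adder
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def vedic_multiply(a: int, b: int, n: int) -> int:
    """n x n multiplier composed of four n/2 x n/2 multipliers (n a power of two)."""
    if n == 2:
        return vedic_2x2(a, b)
    h = n // 2
    mask = (1 << h) - 1
    al, ah = a & mask, a >> h    # low and high halves of a
    bl, bh = b & mask, b >> h    # low and high halves of b
    ll = vedic_multiply(al, bl, h)   # vertical (low halves)
    lh = vedic_multiply(al, bh, h)   # crosswise
    hl = vedic_multiply(ah, bl, h)   # crosswise
    hh = vedic_multiply(ah, bh, h)   # vertical (high halves)
    # In hardware these additions would be done by adders such as ripple carry adders.
    return ll + ((lh + hl) << h) + (hh << n)


print(vedic_multiply(13, 11, 4))
```

The same 2 x 2 block is reused at every level, so moving from 4 x 4 to 8 x 8 or 16 x 16 changes only the width of the combining adders and not the multiplication logic.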
Anju (2013) made a comparative study of the Vedic multiplier and the Booth multiplier by implementing both on FPGA. The Urdhva multiplier is applied to binary multiplication and uses carry save adders. The author chose an 8 x 8 multiplier and compared the results in terms of area, power consumption and speed, which are shown in Table II. The author concluded that multiplication using Urdhva Tiryagbhyam has less delay than the Booth multiplier.
Neeraj and Subodh Wairya (2013) developed a low power 32 x 32 bit multiplier based on Vedic mathematics. In the proposed architecture the 32 bit multiplier and multiplicand are each grouped as 16 bit numbers, so the design is decomposed into 16 x 16 modules, which makes the VHDL coding easy to develop. The architecture was implemented on the FPGA device family Virtex 7 low power XC7V285TL, package FFG1925, speed grade -1L, and synthesis was performed using Xilinx ISE Design Suite 13.1. The functional verification of the VHDL code was done with the ModelSim simulator. The authors concluded that the method is optimized for speed and area. The synthesis result showed a delay of 29.256ns for computing the product of 32 x 32 bits.
Ankit and Arvind (2014) presented a Vedic multiplier that differs from the conventional multiplier. The most important aspect of the proposed architecture is the use of a carry look ahead adder, which gives room to break up the entire unit into smaller structures. The authors therefore preferred structural modeling in VHDL, since it is easy to design larger units from smaller units and the complexity of the design is reduced. The code was synthesized and simulated using Xilinx ISE 8.1I and downloaded to the Spartan2 FPGA device. In the ripple carry adder, the carry output from one adder block is fed to the next adder block to generate its sum and carry output. The carry look ahead adder instead generates the carry bits from the data inputs and does not depend on the carry bit generated by the previous stage. It is faster than the ripple carry adder because it does not wait for the carry to pass through the gates of each preceding stage. The authors showed that the Vedic multiplier using a carry look ahead adder is significant in terms of execution time when compared to conventional multipliers such as the array multiplier. The comparison of different multipliers in terms of path delay is given in Table III.
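The difference between the two adders can be made concrete with a small Python sketch. It uses the standard generate (g = a AND b) and propagate (p = a XOR b) signals and assumes a carry-in of zero; the function names are hypothetical and the code is a behavioural illustration, not the VHDL of the cited work.

```python
def ripple_carry_add(a: int, b: int, n: int) -> tuple[int, int]:
    """Each stage waits for the carry produced by the stage before it."""
    result, carry = 0, 0
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


def lookahead_carries(a: int, b: int, n: int) -> tuple[list[int], list[int]]:
    """Compute every carry directly from the data inputs (carry-in assumed 0)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]      # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]    # propagate
    carries = [0]
    for i in range(n):
        # c[i+1] = g[i] | p[i]g[i-1] | p[i]p[i-1]g[i-2] | ...
        c = 0
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        carries.append(c)
    return p, carries


def carry_lookahead_add(a: int, b: int, n: int) -> tuple[int, int]:
    """Form each sum bit from its propagate signal and its look ahead carry."""
    p, carries = lookahead_carries(a, b, n)
    result = 0
    for i in range(n):
        result |= (p[i] ^ carries[i]) << i
    return result, carries[n]


print(ripple_carry_add(11, 6, 4), carry_lookahead_add(11, 6, 4))
```

Both functions return the same sum and carry-out. The difference is structural: in lookahead_carries no carry expression refers to another carry, so in hardware all carries can be evaluated at the same time at the cost of wider gates.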
Based on ancient Vedic mathematics, a model was proposed for a 32 x 32 bit low power, high speed multiplier (Saha, et al 2011a, b). The architecture was implemented in Spice Spectre and compared with widely used multipliers: the Wallace tree multiplier (WTM), Modified Booth Multiplier (MBM), Baugh Wooley Multiplier (BWM) and Row Bypassing and Parallel Architecture (RBPA). The Vedic architecture showed 29%, 31%, 35% and 23% improvement in propagation delay and 17%, 26%, 29% and 21% improvement in power when compared with the WTM, MBM, BWM and RBPA respectively. The authors also reported a large reduction in the propagation delay for 16 x 16 bit numbers, where the delay is only 4ns and the power consumed is 6.5mW.
Kavita and Nilay (2006) presented a comparison of a pipelined floating point multiplier and an unpipelined multiplier using Vedic Mathematics. They designed both types of multiplier and concluded that the pipelined multiplier gave the best results in terms of device utilization, speed and power consumption. The authors concluded that the speed can be increased by increasing the number of pipeline stages.
Riya and Daruwala (2013) implemented a pipelined double precision floating point multiplier that provides low latency and high throughput. When compared with the simulated results of the available approaches, the proposed design showed a significant improvement in area and speed in addition to scalability. Concurrent processing, i.e. the pipelined architecture, shows a better response than sequential processing.
RESULTS AND DISCUSSIONS
The results of the survey are shown below. Different approaches to multiplication for various applications are discussed. The array multiplier has a regular structure and sequence of operation, but it consumes more power and has more delay than the Wallace tree multiplier. In the Wallace type the partial adders can be rearranged to reduce the number of cells, and it can be used for operands of n bits. Its drawback is an irregular structure, which makes layout difficult. The Booth multiplier gives high speed, and the modified Booth algorithm helps to save area because its computation stages are reduced by half. The Vedic multiplier is found to be the best in all constraints: it has less area and consumes less power because it has fewer devices, and it operates with greater speed and less delay than all the other multipliers.
From the survey of various multiplication techniques, Urdhva Tiryagbhyam (Vedic) multiplication gives better results in developing multipliers. The other multipliers, such as the array multiplier, Wallace tree multiplier and Booth multiplier, focus on only one or two of the constraints of speed, delay, power, area and efficiency. The strength of the Vedic multiplier is that it can satisfy all of these constraints, which are considered the most important factors in designing a multiplier. It also has the advantage that the number of bits can be increased to any size n, and it is very easy to design higher-order multipliers from lower-order multipliers. Across the surveyed work the Vedic multiplier shows better performance and is found to be the best technique in all kinds of applications.
Received 3 September 2014
Received in revised form 30 October 2014
Accepted 4 November 2014
Addanki Purna Ramesh, Dr. A.V.N. Tilak and Dr. A.M. Prasad, 2012. Efficient Implementation of 16 bit Multiplication Accumulator using Radix-2 Modified Booth Algorithm and SPST adder using Verilog, International Journal of VLSI Design and Communication Systems, 3(3): 107-118.
Anju, 2013. Performance Comparison of Vedic Multiplier and Booth Multiplier. International Journal of Engineering and Advanced Technology, 2(5).
Ankit Chouhan and Arvind Pratap Singh, 2014. Implementation of an Efficient Multiplier based on Vedic Mathematics using High Speed Adder. International Journal of Innovative Science, Engineering and Technology, 1(6).
Babulu, K. and G. Parasuram, 2011. FPGA Realization of Radix-4 Booth Multiplication Algorithm for High Speed Arithmetic Logics, International Journal of Computer Science and Information technologies, 2(5): 2102-2107.
Booth, A.D., 1951. A signed binary multiplication technique, Quarterly Journal of Mechanics and Applied Mathematics, 4(2): 236-240.
Dakupati, Ravi Sankar and Shaik Ashraf Ali, 2013. Design of Wallace Tree Multiplier by Sklansky Adder, International Journal of Engineering Research and Applications, 3(1): 1036-1040.
Ganesh Kumar, G. and V. Charisma, 2012. Design of High Speed Vedic Multiplier using Vedic Mathematics techniques. International Journal of Scientific Research Publications, 2(3).
Gopi Krishna, K., B. Santhosh and V. Sridhar, 2013. Design of Wallace Tree Multiplier using Compressors, International Journal of Engineering Sciences and Research, 2(9): 2249-2254.
Harpreet Singh Dhillon and Abhijit Mitra, A Digital Multiplier Architecture using Urdhva Tiryabhyam Sutra of Vedic Mathematics, Indian Institute of Technology, Guwahatti.
Harpreet Singh Dhillon and Abhijit Mitra, 2008. A Reduced-Bit Multiplication Algorithm for Digital Arithmetic, International Journal of Computational and Mathematical Sciences.
Jagadguru Swami Sri Bharati Krishna Tirthji Maharaja, 1986. Vedic Mathematics, Motilal Banarsidas, Varanasi, India.
Jasbir Kaur, Naveen Kr. Gahlan and Prabhat Shukla, 2012. Delay-Power Performance Comparison of Array Multiplier in VLSI Design, International Journal of Advanced Research in Computer Science and Electronics Engineering, 1(3): 41-44.
Kavita Khare, R.P. Singh and Nilay Khare, 2006. Comparison of pipelined IEEE 754 standard floating point multiplier with unpipelined multiplier, Journal of Scientific and Industrial Research, pp: 65.
Korra Tulasi Bai and J.E.N. Abhilash, 2013. A New Novel Low Power Floating Point Multiplier Implementation Using Vedic Multiplication Techniques. International Journal of Engineering Research and Applications, 3(4): 2248-9622.
Kripa Mathew, S. Asha Latha, T. Ravi and E. Logashanmugam, 2013. Design and analysis of an Array Multiplier using an Area Efficient Full Adder Cell in 32nm CMOS technology, The International Journal of Engineering and Science, 2(3): 8-16.
Neeraj Kumar Mishra and Subodh Wairya, 2013. Low Power 32 x 32 bit multiplier Architecture based on Vedic Mathematics using Virtex 7 Low Power Device, International Journal of Research Review in Engineering Science and Technology, 2(2).
Neeta Sharma and Ravi Sindal, 2013. Modified Booth Multiplier using Wallace Structure and Efficient Carry Select Adder, International Journal of Computer Applications, 68(13): 39-42.
Nirlakalla Ravi, A. Satish, T. Jayachandra Prasad and T. Subba Rao, 2011. A New Design for Array Multiplier with Trade off in Power and Area, International Journal of Computer Science, 8(3): 533-537.
Nivedita, A. Pande, Vaishali Niranjane and Anagha V. Choudhari, 2013. Vedic Mathematics for Fast Multiplication in DSP. International Journal of Engineering and Innovative Technology (IJEIT), 2(8).
Premannda, B.S., S. Samarth Pai, B. Shasank and S. Shashank Bhat, 2013. Design and Implementation of 8 bit Vedic Multiplier. International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, 2(12): 2278-8875.
Ravindra, Rajput, P. and M.N. Shanmukha Swamy, 2012. High Speed Modified Booth Encoder multiplier for signed and unsigned numbers. In: Proceedings of 14th International Conference on Modeling and Simulation, pp: 649-654.
Riya Saini, R.D. Daruwala, 2013. Efficient Implementation of Pipelined Double Precision Floating Point Multiplier. International Journal of Engineering Research and Applications (IJERA), 3(1): 2248-9622.
Saha, P., A. Banerjee and A. Dandapat, 2009. High Speed Low Power Complex Multiplier Design Using Parallel Adders and Subtractors, International Journal on Electronic and Electrical Engineering (IEEE), 7(2): 3846.
Saha, P., A. Banerjee, A. Dandapat and P. Bhattacharyya, 2011b. Vedic Mathematics Based 32-Bit Multiplier Design for High Speed Low Power Processor. International Journal on Smart Sensing and Intelligent Systems, 4(2).
Saha, P., A. Banerjee, A. Dandapat and P. Bhattacharyya, 2011a. High Speed ASIC Design of Complex Multiplier using Vedic Mathematics, IEEE International Conference.
Soniya and Suresh Kumar, 2013. A Review of Different Type of Multipliers and Multiplier-Accumulator Unit, International Journal of Emerging Trends and Technology in Computer Science, 2(4): 364-368.
Sureka, N., R. Porselvi and K. Kumuthapriya, 2013. An Efficient High Speed Wallace Tree Multiplier, In: Proceedings IEEE International Conference on Information Communication and Embedded Systems, pp: 1023-1026.
Vaijyanath Kunchigi, Linganagouda Kulkarni and Subhash Kulkarni, 2012. High Speed and Area Efficient Vedic Multiplier. In: Proceedings IEEE International Conference on Devices, Circuits and Systems, pp: 360-364.
Vinoth, C., V.S.K. Bhaaskaran, B. Brindha, 2011. A Novel Low Power and High Speed Wallace Tree Multiplier for RISC processor, In: Proceedings, IEEE International Conference on, 1: 330-334.
Waters, R.S. and E.E. Swartzlander, 2010. A Reduced Complexity Wallace Multiplier Reduction, IEEE Transactions, 59(8): 1134-1137.
P. Sakthi and P. Yuvarani
Assistant Professor, Department of EIE, M. Kumarasamy College of Engineering,
Corresponding Author: P. Sakthi, Assistant Professor, Department of EIE, M. Kumarasamy College of Engineering, Karur, India.
Tel: +91 98433 58512 E-mail: email@example.com
Table I: Comparison Between 4-Bit Wallace Multiplier And Vedic Multiplier

| Number of bits | Wallace Multiplier | Vedic Multiplier |
|---|---|---|
| 2-bit | 8.7 ns | 8.7 ns |
| 4-bit | 14.071 ns | 14.118 ns |
| 8-bit | 25.923 ns | 23.973 ns |

Table II: Comparison of Booth Multiplier And Vedic Multiplier

| FPGA Device package | Area in LUT's | Delay (ns) | Speed (MHz) | Memory (Kb) | Power (mW) |
|---|---|---|---|---|---|
| Booth Multiplier | 9 | 11.176 | 89.477 | 156440 | 11.30 |
| Urdhva Multiplier | 86 | 8.460 | 118.203 | 166744 | 9.29 |

Table III: Comparison of Different Multipliers in Terms of Delay.

| Device Spartan | Array Conventional Multiplier | Vedic Multiplier | Vedic Multiplier using Carry Look Ahead Adder |
|---|---|---|---|
| Path delay for n = 8 bit | 27.796ns | 27.463ns | 27.903ns |
| Path delay for n = 16 bit | 52.708ns | 50.299ns | 47.613ns |
| Path delay for n = 32 bit | 101.486ns | 98.067ns | 86.495ns |

Table IV: Performance Comparison of different Multipliers.

| Parameter | Array Multiplier | Wallace Tree Multiplier | Booth Multiplier | Vedic Multiplier (Urdhva Tiryagbhyam multiplier) |
|---|---|---|---|---|
| Operating Speed | Less | High | Higher | Highest as it has less devices |
| Delay | More | Less | Less | Very less |
| Power Consumption | Consume more power | Consume more power | Consume less power | Consume less power |
| Area | Greater area (many adders used) | Medium area (computation stages reduced) | Minimum area | Minimum area (less number of adders) |
| Structure | Regular | Irregular | Irregular | Regular |
| Complexity | Less complex | More complex | Most complex | Less complex (less logic devices) |
| Logic Levels | More | Less than array structure | Increases | Reduced to great extent |
| Type of operand | Unsigned, Signed | Signed and unsigned (reduces speed) | Signed, Unsigned (convert into signed) | Signed and unsigned |
| FPGA Implementation | Efficiency less | Not efficient | Most efficient | Most efficient |
Author: Sakthi, P.; Yuvarani, P.
Publication: Advances in Natural and Applied Sciences
Date: Nov 1, 2014