
Transform based speech enhancement using firefly algorithm.


Speech enhancement algorithms attempt to improve the performance of communication systems when their input or output signals are corrupted by noise. Background noise degrades both the quality and the intelligibility of speech. Here, quality refers to how a speaker conveys an utterance and includes attributes such as naturalness and speaker recognisability. Intelligibility is concerned with what the speaker said, that is, the meaning or information content behind the words. A noisy environment therefore reduces the ability of the speaker and the listener to communicate, and speech enhancement can be performed to reduce the impact of this problem. There are many ways to classify speech enhancement methods. It is usually difficult for a single algorithm to perform uniformly across all noise types, so a speech enhancement system is usually based on assumptions and constraints that depend on the application and the environment. Speech enhancement systems can be classified by the number of input channels (one, two or multiple), the domain of processing (time or frequency) and the type of algorithm (non-adaptive or adaptive).

There is a large body of literature on speech enhancement methods. The Particle Swarm Optimization (PSO) algorithm combined with the MMSE filter technique was compared with MMSE and BNMF for effective enhancement [1]. The Bat Algorithm (BA) was analyzed against standard PSO [2], and BA gave the better results. BA and PSO were also compared by using each to train a Radial Basis Function (RBF) network [3] on benchmark datasets in order to identify the more efficient algorithm. A comprehensive review of the Firefly algorithm, which is the technique proposed here, is given in [4].
Three metaheuristic algorithms, namely the firefly algorithm, the bat algorithm and the cuckoo search algorithm, were compared in a series of computational experiments to find optimal solutions, and the analysis showed that the firefly algorithm performs better than the bat algorithm and cuckoo search [5]. Comparative studies have been made of different window functions, such as the Hanning, Hamming, Blackman, Cosh, Kaiser and Exponential windows, and of transforms such as the discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete wavelet transform (DWT) and Ridgelet transform (RT) [6][7]. A hybrid Wiener spectrogram filter (HWSF) followed by multiblade filtering has been used for effective noise reduction [8].

The existing approaches are based on the PSO and BA algorithms, which have disadvantages such as complexity, functional difficulty and slow convergence speed. The present study therefore aims to overcome the disadvantages of the existing PSO and BA approaches, which use the MMSE filtering technique in the post-processing module. The Firefly algorithm (FA) is a metaheuristic, nature-inspired, stochastic optimization algorithm based on the light intensity and attractiveness of fireflies, and it uses randomization in searching for a set of solutions. In this work it is used with the cosh window in the pre-processing module and the multiblade filtering technique in the post-processing module.

In this paper, the Firefly algorithm is implemented to assess its effectiveness in speech enhancement. The rest of the paper is organised as follows. Section II describes the proposed speech enhancement algorithm. Section III describes the objective measures used to evaluate the algorithm and presents the results. Section IV gives the conclusions.

Methodology Used:

In this study, a pre-processing module, an optimization module and a spectral filtering module are carried out in sequence. The block diagram of the proposed technique is given in Fig. 1.


A. Pre-Processing Module:


Initially, framing is carried out: the input signal is divided into frames so that each frame can be treated as stationary. Each resulting frame is 20 ms long, which corresponds to n samples.

$n = t_{fl} \cdot f_s$ (1)

where $t_{fl}$ is the frame length (20 ms) and $f_s$ is the sampling frequency.

The speech is divided into four frames. The first part of each frame is shared with the previous frame and the last part with the next frame. The time frame step $t_{fs}$ indicates the start time of each frame. The overlap $t_{o}$ starts at the time where a new frame starts and ends where the current frame stops.

$t_{fl} = t_{fs} + t_{o}$ (2)

where $t_{fl}$ is the frame length.
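The framing step described above can be sketched as follows. This is a minimal illustration only: the 8 kHz sampling rate and the 10 ms frame step are assumed values that are not stated in the text, and the function name `split_into_frames` is hypothetical.

```python
def split_into_frames(signal, sample_rate_hz=8000, frame_len_s=0.020, frame_step_s=0.010):
    """Split a sequence of samples into overlapping frames.

    The sampling rate and the frame step are assumed values for illustration.
    The overlap follows Eqn. 2: frame length = frame step + overlap.
    """
    frame_len = int(round(frame_len_s * sample_rate_hz))    # n samples per frame
    frame_step = int(round(frame_step_s * sample_rate_hz))  # samples between frame starts
    overlap = frame_len - frame_step                         # samples shared by neighbours
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(list(signal[start:start + frame_len]))
        start += frame_step
    return frames, frame_len, overlap


samples = list(range(400))  # stand-in for 50 ms of speech at the assumed 8 kHz rate
frames, frame_len, overlap = split_into_frames(samples)
print(len(frames), frame_len, overlap)
```

With these assumed values, 50 ms of signal yields four frames of 160 samples, and each frame shares 80 samples with its neighbour.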

Cosh Window:

The shape of a window in the time domain decides the characteristics of the resulting filter in the frequency domain. Several window functions are available in the literature. For the present work, the three-parameter cosh window has been selected for truncation of the infinite impulse response. The cosh window is defined in terms of the hyperbolic cosine, which is given in Eqn. 4.

$\cosh(q) = \frac{e^{q} + e^{-q}}{2}$ (4)

The proposed window is derived in a similar way to the Kaiser window, but it has a computational cost advantage because its time-domain representation contains no power series expansion. After the spectrum design equations are obtained, the window is compared with the Kaiser and ultraspherical windows in terms of various spectral characteristics. Simulation results show that, for the same window length and normalized main-lobe width, the proposed window gives a better side-lobe roll-off ratio than the Kaiser window. It also provides a better ripple ratio and side-lobe roll-off ratio than the ultraspherical window. When combined with the Hamming window, it gives a better ripple ratio than the combination of the Kaiser and Hamming windows. The window is also relevant to FIR filter design: filters designed using the cosh window provide better stop-band attenuation than filters designed using the Kaiser window.
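As an illustration, a cosh window can be generated from the hyperbolic cosine as sketched below. The sketch assumes the commonly cited two-parameter form w(n) = cosh(alpha * sqrt(1 - (2n/(N-1))^2)) / cosh(alpha) for |n| <= (N-1)/2, which is not given in the text above; the value of the shape parameter `alpha` and the function name `cosh_window` are illustrative choices.

```python
import math


def cosh_window(length, alpha=2.0):
    """Return a cosh window of the given odd length.

    Assumed form (not taken from the text above):
        w(n) = cosh(alpha * sqrt(1 - (2n / (N - 1))**2)) / cosh(alpha)
    for n = -(N - 1)/2 ... (N - 1)/2.
    """
    if length < 3 or length % 2 == 0:
        raise ValueError("length must be an odd integer >= 3")
    half = (length - 1) // 2
    window = []
    for n in range(-half, half + 1):
        ratio = 2.0 * n / (length - 1)
        # max() guards against a tiny negative value caused by rounding
        arg = alpha * math.sqrt(max(0.0, 1.0 - ratio * ratio))
        window.append(math.cosh(arg) / math.cosh(alpha))
    return window


w = cosh_window(161)
print(w[80], w[0])  # centre value and edge value
```

Under this assumed form the window equals 1 at its centre and falls to 1/cosh(alpha) at both edges; a larger `alpha` gives a narrower window with lower side lobes at the cost of a wider main lobe.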

Fast Fourier Transform:

The FFT operates in three steps. First, the N-point time-domain signal is decomposed into N time-domain signals, each composed of a single point. Secondly, the N frequency spectra corresponding to these N time-domain signals are calculated. Lastly, the N spectra are synthesized into a single frequency spectrum.
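The three steps can be illustrated with a small recursive radix-2 FFT. This is a sketch for clarity rather than the implementation used in this work: it assumes the frame length is a power of two, and the function name `fft` and the test signal are illustrative.

```python
import cmath


def fft(samples):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(samples)
    if n == 0:
        raise ValueError("input must not be empty")
    if n == 1:
        # Step 2: the spectrum of a single point is the point itself.
        return [complex(samples[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    # Step 1: decompose into even-indexed and odd-indexed halves.
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    # Step 3: synthesize the two half-length spectra into one spectrum.
    spectrum = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        spectrum[k] = even[k] + twiddle
        spectrum[k + n // 2] = even[k] - twiddle
    return spectrum


# A cosine with a period of 4 samples, observed over 8 samples.
spectrum = fft([1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0])
magnitudes = [round(abs(c), 6) for c in spectrum]
print(magnitudes)
```

For this test signal all the energy falls in bins 2 and 6, as expected for a real cosine that completes two periods in eight samples.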

B. Optimization Module:

Firefly Algorithm:

Fireflies are unisex, so one firefly will be attracted to another regardless of sex. Attractiveness is proportional to brightness, so the less bright firefly will move towards the brighter one. Attractiveness decreases as the distance between them increases, and if no firefly is brighter than a particular firefly, it moves randomly.

There are two important issues:

* variation of light intensity, and

* formulation of attractiveness.

For simplicity, it can always be assumed that the attractiveness of a firefly is determined by its brightness, which is in turn associated with the objective function.

Light Intensity and Attractiveness:

For optimization problems, the brightness I of a firefly at location X can be expressed as

$I(X) \propto f(X)$ (5)

The attractiveness $\beta$ is relative and is determined by Eqn. 6.

$\beta(r) = \beta_0 e^{-\gamma r^2}$ (6)

where $\beta_0$ is the attractiveness at $r = 0$ and $\gamma$ is the light absorption coefficient. The distance $r_{ij}$ between any two fireflies i and j, located at $x_i$ and $x_j$ respectively, is given by Eqn. 7.

$r_{ij} = \lVert x_i - x_j \rVert$ (7)

Light intensity decreases with the distance from the source, and light is also absorbed by the medium, so the attractiveness varies with the degree of absorption. The global maxima and minima can be found simultaneously. To further improve the convergence speed of the algorithm, the randomization parameter $\alpha$ can be varied. The movement of firefly i, located at $x_i$, towards a brighter firefly j, located at $x_j$, is given by Eqn. 8.

$x_i(t+1) = x_i(t) + \beta_0 e^{-\gamma r_{ij}^2} (x_j - x_i)$ (8)
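The update rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: it minimizes a simple test function rather than a speech-related objective, it treats a lower cost as a brighter firefly, and it adds the random step alpha * (rand - 0.5) from the standard formulation of the algorithm, which does not appear in Eqn. 8. All parameter values and the names `firefly_minimize` and `sphere` are illustrative.

```python
import math
import random


def firefly_minimize(objective, dim, n_fireflies=15, n_iterations=60,
                     beta0=1.0, gamma=1.0, alpha=0.2, bounds=(-2.0, 2.0), seed=0):
    """Minimize `objective` with a basic firefly algorithm (illustrative sketch)."""
    rng = random.Random(seed)
    low, high = bounds
    swarm = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best_cost = min(cost)
    best_x = list(swarm[cost.index(best_cost)])
    for _ in range(n_iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter (lower cost) than i
                    # squared distance r_ij^2 between the two fireflies (Eqn. 7)
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness (Eqn. 6)
                    # move i towards j (Eqn. 8) plus an assumed random step,
                    # then clip the new position to the search bounds
                    swarm[i] = [
                        min(high, max(low, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_cost:
                        best_cost, best_x = cost[i], list(swarm[i])
        alpha *= 0.97  # gradually reduce the randomization parameter
    return best_x, best_cost


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_cost = firefly_minimize(sphere, dim=2)
print(best_x, best_cost)
```

Shrinking `alpha` over the iterations is one simple way of varying the randomization parameter; other schedules are equally possible.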

C. Spectral Filtering Module:

Multiblade Filtering:

In this post-processing step, residual noise is removed: the leftover non-speech and noise components are replaced by median values. It is observed from the spectrograms, and confirmed through listening, that musical tones appear randomly in the spectrum as isolated peaks or short ridges, whereas the speech content in the spectrograms exhibits distinguishing features. These observations prompted us to use the 2D spectrogram for enhancement. Because the spectrogram of a speech signal has distinct characteristics, different directions relative to a point of interest (POI) are examined for non-speech components. A total of 16 blades are used to indicate the different orientations. The first 6 blades cover different orientations within a 7-by-7 region with the POI at the centre. The other 10 comprise five left blades and five right blades. The properties along these orientations are investigated in order to classify the POI as a speech component or a non-speech component.
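The median replacement itself can be sketched as below. The blade-based classification is not reproduced: the sketch simply assumes that a boolean mask marking the non-speech points is already available, and it takes the median over the full 7-by-7 neighbourhood. The function name `median_replace` and the mask argument are hypothetical.

```python
from statistics import median


def median_replace(spectrogram, is_nonspeech, size=7):
    """Replace flagged spectrogram points by the median of their neighbourhood.

    `spectrogram` and `is_nonspeech` are lists of rows of equal shape.
    The mask is assumed to come from a separate classification step.
    """
    half = size // 2
    n_rows, n_cols = len(spectrogram), len(spectrogram[0])
    output = [row[:] for row in spectrogram]  # leave the input untouched
    for r in range(n_rows):
        for c in range(n_cols):
            if not is_nonspeech[r][c]:
                continue
            # collect the neighbourhood, truncated at the spectrogram borders
            neighbourhood = [
                spectrogram[i][j]
                for i in range(max(0, r - half), min(n_rows, r + half + 1))
                for j in range(max(0, c - half), min(n_cols, c + half + 1))
            ]
            output[r][c] = median(neighbourhood)
    return output


# An isolated peak, like a musical tone, in an otherwise flat 7-by-7 patch.
grid = [[1.0] * 7 for _ in range(7)]
grid[3][3] = 50.0
mask = [[False] * 7 for _ in range(7)]
mask[3][3] = True
cleaned = median_replace(grid, mask)
print(cleaned[3][3])
```

In this toy patch the isolated peak is pulled down to the level of its surroundings, which is the effect the post-processing step aims for on musical tones.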


Five clean speech sentences are selected randomly from the NOIZEUS database to evaluate the performance of the proposed system. This database contains 30 IEEE sentences produced by three male and three female speakers (five sentences per speaker), corrupted by eight different real-world noises at different SNRs. The noise signals were added to the speech signals at SNRs of 0, 5, 10 and 15 dB. The noisy signals from the NOIZEUS database are denoised using the FA algorithm at each of these Signal to Noise Ratio (SNR) levels.
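The general principle of adding noise at a chosen SNR can be sketched as follows. This shows only how a noise signal may be scaled to reach a target SNR and is not the procedure used to build the database; the function name `mix_at_snr` and the synthetic test signals are illustrative.

```python
import math
import random


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum(v * v for v in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be all zeros")
    # choose the scale so that clean_power / (scale^2 * noise_power) = 10^(SNR/10)
    scale = math.sqrt(clean_power / (noise_power * 10.0 ** (target_snr_db / 10.0)))
    return [s + scale * v for s, v in zip(clean, noise)]


# Synthetic stand-ins for a speech frame and a noise recording.
clean = [math.sin(2.0 * math.pi * 0.05 * n) for n in range(200)]
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
noisy = mix_at_snr(clean, noise, target_snr_db=5.0)
```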





Fig. 2 shows the waveform of the input noisy speech signal, and Fig. 3 shows its spectrogram; the signal is corrupted with babble noise at 0 dB SNR. The speech signal is separated into several frames. For this 0 dB input SNR, the output SNR increases when the corrupted signal is run through the FA algorithm. Fig. 4 shows the waveform of the enhanced speech signal. Fig. 5 gives the spectrogram of the enhanced speech for the proposed technique; the harmonic part of the signal is visible, which indicates that it has been restored and that the 0 dB babble noise has been reduced.

A. Objective Measures:

The objective measures used to evaluate the Firefly algorithm are the Signal to Noise Ratio (SNR) and the Perceptual Evaluation of Speech Quality (PESQ).

B. SNR Estimation:

SNR is one of the evaluation parameters for measuring the enhancement achieved by the algorithm. The improvement is the difference between the SNR of the enhanced speech and that of the noisy speech, where each SNR is calculated by

$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{n} x^{2}(n)}{\sum_{n} \left( x(n) - y(n) \right)^{2}}$ (9)

where x(n) is the clean speech and y(n) is the distorted speech. The measure can be obtained for different noises at various levels.
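Eqn. 9 translates directly into code. The sketch below is a plain transcription of the formula; the function name `snr_db` and the four-sample signals are illustrative and are not data from this study.

```python
import math


def snr_db(clean, processed):
    """SNR in dB of `processed` measured against the clean reference (Eqn. 9)."""
    signal_energy = sum(x * x for x in clean)
    error_energy = sum((x - y) ** 2 for x, y in zip(clean, processed))
    if error_energy == 0.0:
        return math.inf  # the processed signal is identical to the reference
    return 10.0 * math.log10(signal_energy / error_energy)


clean = [1.0, -1.0, 1.0, -1.0]
distorted = [1.1, -0.9, 1.1, -0.9]
print(round(snr_db(clean, distorted), 2))  # 20.0
```

Here the signal energy is 4 and the error energy is 0.04, so the ratio is 100 and the SNR is 20 dB.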

C. PESQ Estimation:

PESQ is computed by first equalising the clean signal and the degraded signal to a standard listening level and then passing them through a filter. Two disturbance measures are then obtained: the symmetric disturbance $D_{ind}$, which measures the absolute audible error, and the asymmetric (additive) disturbance $A_{ind}$, which measures the audible errors that are louder than the reference.

$\mathrm{PESQ} = 4.5 + a_1 D_{ind} + a_2 A_{ind}$ (10)

where $a_1 = -0.1$ and $a_2 = -0.0309$.
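The final mapping from the two disturbance values to a PESQ score is a simple linear combination, sketched below. The sketch assumes the weights -0.1 and -0.0309 are added to the constant 4.5, so that the score falls as the disturbances grow. The disturbance values in the example are hypothetical; in practice they come from the full perceptual model of ITU-T P.862 [14], which is not reproduced here.

```python
def pesq_score(d_ind, a_ind, a0=4.5, a1=-0.1, a2=-0.0309):
    """Combine the symmetric and asymmetric disturbances into a PESQ score."""
    return a0 + a1 * d_ind + a2 * a_ind


# Hypothetical disturbance values, chosen only to illustrate the arithmetic.
score = pesq_score(d_ind=20.0, a_ind=30.0)
print(round(score, 3))  # 1.573
```

With these hypothetical inputs the score is 4.5 - 2.0 - 0.927 = 1.573.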

Fig 10 shows the PESQ (out of 4.5) for different input noise levels in the Firefly algorithm.

Table 1 shows the output SNR values for babble, airport and car noise at input SNRs of 0, 5, 10 and 15 dB. The highest output SNR, 37.15 dB, is obtained for babble noise at 0 dB input SNR, and the quality of the speech is improved.


Fig. 6 shows the output SNR for airport, babble and car noise at input SNRs of 0, 5, 10 and 15 dB in the Firefly algorithm.

Table 2 shows the output PESQ values for babble, airport and car noise at input SNRs of 0, 5, 10 and 15 dB. The highest PESQ, 1.51 out of 4.5, is obtained for car noise at 0 dB input SNR, which is considered reliably good.


Fig. 7 shows the PESQ (out of 4.5) for airport, babble and car noise at input SNRs of 0, 5, 10 and 15 dB in the Firefly algorithm.


The Firefly algorithm offers a new direction for speech enhancement, as it exhibits advantages such as a high convergence rate and low computation. The complexity and difficulty of the functions had no effect on FA. FA also performs better in terms of speed of convergence, which is due to the generation of completely different random numbers in the iterative procedure of the algorithm. FA appears to be a potentially more powerful and favorable optimization tool because of the attractiveness function, which is unique to firefly behavior. FA performs local search well, although, as in some other algorithms, it cannot completely get rid of local search. FA remembers the history of better solutions so as to achieve a highly optimized solution.


[1.] Senthamizh Selvi, R., G.R. Suresh, 2015. 'Hybridization of Spectral Filtering with Particle Swarm Optimization for Speech Signal Enhancement', Research Journal of Applied Sciences Engineering and Technology

[2.] Prajna, K., G. Sasibhushana Rao, K.V.V.S. Reddy, R. Vma Maheswari, 2014. 'Application of Bat Algorithm in Dual Channel Speech Enhancement', IEEE.

[3.] Ruba Talal, 2014. 'Comparative Study between the (BA) Algorithm and (PSO) Algorithm to Train (RBF) Network at Data Classification',International Journal of Computer Applications.

[4.] Iztok Fister, Iztok Fister Jr., Xin-She Yang, Janez Brest, 2013. 'A comprehensive review of firefly algorithms', Elsevier.

[5.] Sankalap Arora, Satvir Singh, 2013. 'A Conceptual Comparison of Firefly Algorithm, Bat Algorithm and Cuckoo Search ', International Conference on Control, Computing, Communication and Materials (ICCCCM).

[6.] Verma, A.R., R.K. Singh, A. Kumar, 2013. 'An Improved Method for Speech Enhancement Based on Ridgelet Transform', IEEE.

[7.] Verma, R., R.K. Singh, A. Kumar, 2012. 'A Comparative Study Of Performance Of Different Window Functions for Speech Enhancement', Advances In Intelligent Systems And Computing, 236: 993-1002.

[8.] Huijun Ding, Ing Yann Soon, Soo Nee Koh, Chai Kiat Yeo, 2009. 'A spectral filtering method based on hybrid wiener filters for speech enhancement', Elsevier.

[9.] Xiao, X. and R.M. Nickel, 2010. Speech enhancement with inventory style speech resynthesis. IEEE T. Audio Speech 18(6): 1243-1257

[10.] Yang, X.S., 2010. Nature-Inspired Metaheuristic Algorithms, Luniver Press, 2nd edition.

[11.] Yang, X.S., S. Deb, S. Fong, 2011. Accelerated Particle Swarm Optimization and Support Vector Machine for Business Optimization and Applications, in: NDT2011, CCIS 136, Springer, 53-66.

[12.] Mohammadiha, N., P. Smaragdis and A. Leijon, 2013. "Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization," IEEE Trans. Audio, Speech, and Language Process., 21(10): 2140-2151.

[13.] Eberhart, R.C. and J. Kennedy, 1995. A new optimizer using particle swarm theory, International Symposium on Micro Machine and Human Science, Nagoya, Japan, 39-43.

[14.] International Telecommunications Union, 2001. Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. ITU-T Recommendation P.862.

[15.] Laleh Badri Asl and Vahid Mjid Nezhad, 2010. "Improved Particle Swarm Optimization for Dual-Channel Speech Enhancement," International Conference on Signal Acquisition and Processing, 13-17.

(1) Dhivya Rajeswari J., (2) Senthamizh Selvi R. and (3) Suresh G.R.

(1) Electronics and Communication Engineering, Easwari Engineering College, Chennai, India

(2) Electronics and Communication Engineering, Easwari Engineering College, Chennai, India

(3) Electronics and Communication Engineering, Easwari Engineering College, Chennai, India

Received 25 April 2016; Accepted 28 May 2016; Available 5 June 2016

Address For Correspondence:

Dhivya Rajeswari J., Electronics and Communication Engineering, Easwari Engineering College, Chennai, India

Table 1: Results of output SNR with different Input SNR.

NOISE TYPE          INPUT SNR       OUTPUT SNR (dB)

BABBLE NOISE        0 dB            37.15
                    5 dB            32.50
                    10 dB           32.82
                    15 dB           31.79

AIRPORT NOISE       0 dB            33.48
                    5 dB            33.32
                    10 dB           32.38
                    15 dB           32.97

CAR NOISE           0 dB            35.13
                    5 dB            31.41
                    10 dB           33.52
                    15 dB           33.75

Table 2: Results of PESQ with different Input SNR.

NOISE TYPE          INPUT SNR       PESQ (out of 4.5)

BABBLE NOISE        0 dB             1.32
                    5 dB             1.26
                    10 dB            1.30
                    15 dB            1.36

AIRPORT NOISE       0 dB             1.45
                    5 dB             1.37
                    10 dB            1.35
                    15 dB            1.40

CAR NOISE           0 dB             1.51
                    5 dB             1.41
                    10 dB            1.37
                    15 dB            1.39

COPYRIGHT 2016 American-Eurasian Network for Scientific Information
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2016 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Dhivya, Rajeswari J.; Senthamizh, Selvi R.; Suresh, G.R.
Publication:Advances in Natural and Applied Sciences
Date:Jun 15, 2016