
3D moving sound source localization via conventional microphones.


In this paper, the general process of sound source localization is examined. Sound source localization is the determination of the locations of sound sources using appropriate signal processing techniques [1], [2]. The location of a sound source can be determined in a 2D or 3D observation space [3], [4]. Generally, 2D location information is adequate for basic applications [5]. However, applications such as robotics and drones require 3D position information. Before discussing the concept of sound localization, we need to explain some of the basic parameters of sound signals. A sound signal is, by definition, a variation of air pressure over time. The pressure change can be converted to an electrical signal by a microphone. Microphones fall into three basic types depending on the application area [6]. The first type, the dynamic microphone, is based on the generator principle.

The vibration of the air is transformed into an electrical signal by a vibrating diaphragm and a coil. The second type is the capacitor microphone, in which changes in sound pressure produce changes in capacitance. The last type comprises special-purpose microphones designed for particular application areas. In this work, a basic dynamic microphone structure is used. Sound signals vary with time, and processing these variations is useful for signal processing applications such as sound recognition and localization, voice-based emotion detection, and trajectory estimation [7], [8]. The aim of this paper is to investigate the main sound localization techniques and the complete sound source localization process. Moreover, a novel sound database that can be used for developing moving sound source localization systems is presented. A part of the presented database is used to investigate several selected sound localization techniques.


In this section, some terms used in sound processing are explained. Sound localization is the determination of the positions of sound sources via signal processing techniques. In the literature, this concept is implemented in different ways. Generally, sound localization techniques are inspired by the human hearing system. The main terms related to sound localization are as follows:

1. Interaural Time Differences (ITD);

2. Interaural Level Differences (ILD).

The ITD parameter is commonly used in basic works [9]. Its drawbacks are its sensitivity to ambient noise and the fact that, on its own, it does not provide the exact location of the sound source; it only indicates the azimuth angle of the source. The ILD parameter is related to the sound pressure level [10]. It indicates the location of the sound source via the difference in sound level between receivers. However, this parameter is more sensitive to environmental and hardware conditions. The ITD and ILD parameters can be used together to determine the exact location of the sound source [11]. The ILD and ITD parameters are illustrated in Fig. 1 and Fig. 2, respectively.


In this section, we explain the basic parameters of our test database. In the presented research, one of the scenarios in our database is tested. In this scenario, one person walks between the locations while saying the ID number of each location. The data sets are obtained with four inductive-type microphones and a sound recording system. The sampling rate of the dataset is $f_s = 44100$ Hz, and a Scarlett18i8 desktop interface is used for sound recording. The 2D coordinates of the microphones and locations are shown in Fig. 3.

The second microphone is selected as the origin of the coordinate system, i.e., as the reference point. The coordinates of the microphones and locations are shown in Table I and Table II, respectively.

The 3D modelling of the test room and the environment is shown in Fig. 4.

The location of the speaker and the relevant distances are shown in Fig. 5. The distances between the ceiling and the floor, between the mount of the speaker and the microphone, and between the ceiling and the microphones are denoted by [], $d_z$, and [], respectively.

The front view of the recording environment used in sound source localization is shown in Fig. 6.


In this section, we explain some sound localization methods. The most common and basic one is ITD-based sound localization, which uses the time lag between the sound receivers. Several algorithms have been developed to estimate the time difference of arrival (TDOA) under ideal propagation conditions. The best known of these is time-domain cross-correlation [12]. In cross-correlation, one waveform is shifted relative to the other, and the shift that gives the maximum similarity is taken as the time lag between the signals. The cross-correlation formula for the continuous time domain is shown in (1), where t is the shift parameter of the continuous signal.


The test sound signal is multiplied by the reference sound signal and integrated, and the test signal is shifted in time to analyse the similarity. At the end of this process, the delay between the two signals is obtained. However, the time-domain cross-correlation algorithm is not sufficiently robust to environmental noise and echo effects. Hence, the generalized cross-correlation (GCC) algorithm is preferred by researchers for sound localization [13]. A brief explanation of the cross-correlation and GCC methods follows.

Assuming that there is only one (unknown) sound source in the field, the output of receiver $n$ ($n = 1, 2, \ldots, N$) can be written as

$$x_n(k) = \alpha_n s(k - D_n) + b_n(k), \tag{2}$$

where $\alpha_n$, which satisfies $0 \le \alpha_n \le 1$, is an attenuation factor due to propagation effects, $D_n$ is the propagation time from the unknown sound source to receiver $n$, and $s(k)$, which is often speech from either a talker or a loudspeaker, is broadband in nature. The term $b_n(k)$ represents Gaussian random noise, which is uncorrelated with both the source signal and the noise at the other sensors. The relative delay between two receivers is defined in (3), and the cross-correlation function between the two signals is given in (4):

$$p = D_2 - D_1, \tag{3}$$

$$r_{12}(p) = E[x_1(k)\, x_2(k + p)]. \tag{4}$$

The TDOA between the received sound signals can be calculated by


where $p \in [-\tau_{\max}, \tau_{\max}]$ and $\tau_{\max}$ is the maximum possible delay between the signals. A digital implementation of the TDOA calculation requires some approximations. Suppose that at time instant $t$ we have a set of observation samples of $x_n$, $\{x_n(t), x_n(t + 1), \ldots, x_n(t + k - 1), \ldots, x_n(t + K - 1)\}$, $n = 1, 2$. The corresponding cross-correlation function (CCF) can then be estimated by (6) and (7), respectively:



Another way to estimate the cross-correlation between the signals is to use the Discrete Fourier Transform (DFT) and the Inverse Discrete Fourier Transform (IDFT), as shown in (8) and (9):


$$w_{k'} = 2\pi k'/K, \tag{9}$$

where $k' = 0, 1, \ldots, K - 1$ and $w_{k'}$ is the angular frequency.
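As a hedged illustration of the time-domain estimate described above, the following Python sketch evaluates the sample cross-correlation for every lag in [-max_lag, max_lag] and returns the lag with the largest value. The function name `estimate_delay_cc`, the use of NumPy, and the unnormalised sum are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def estimate_delay_cc(x1, x2, max_lag):
    """Return the lag p in [-max_lag, max_lag] that maximises the sample
    cross-correlation r12(p) = sum_k x1[k] * x2[k + p].

    Assumes x1 and x2 have the same length K and max_lag < K.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    K = len(x1)
    lags = np.arange(-max_lag, max_lag + 1)
    r12 = np.empty(len(lags))
    for i, p in enumerate(lags):
        if p >= 0:
            # x2 is read p samples ahead of x1
            r12[i] = np.dot(x1[:K - p], x2[p:])
        else:
            # x2 is read |p| samples behind x1
            r12[i] = np.dot(x1[-p:], x2[:K + p])
    return int(lags[np.argmax(r12)])
```

With this convention, a positive result means the signal arrives later at the second receiver, which matches the sign of p = D2 - D1 in (3).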


In the frequency domain, the DFT of $x_n(k)$ is formed as [14]:



which is a Gaussian random variable. It can be illustrated by:



where $n = 1, 2, \ldots, N$. The power spectral densities of $s(k)$, $b_n(k)$, $x_1(k)$, and $x_2(k)$ are given in (15)-(18), respectively:

$$P_s(w_i) = E[S(w_i)\, S^{*}(w_j)], \tag{15}$$




The weighting functions commonly used in the GCC method are shown in Table III.
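The following Python sketch shows how a GCC delay estimate can be computed with the DFT and IDFT and a selectable weighting. Because the weighting expressions of Table III are not reproduced in this text, the Roth and SCOT weights below use their standard textbook definitions, which is an assumption; the function name `gcc_delay`, the small constant `eps`, and the single-frame spectral estimates are also illustrative choices rather than the authors' implementation.

```python
import numpy as np

def gcc_delay(x1, x2, max_lag, weighting="cc", eps=1e-12):
    """Estimate the delay (in samples) of x2 relative to x1 with GCC.

    weighting: "cc"   -> plain cross-correlation (weight 1)
               "roth" -> 1 / P_x1x1
               "scot" -> 1 / sqrt(P_x1x1 * P_x2x2)
    Requires max_lag >= 1.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n = len(x1) + len(x2)            # zero-pad so the correlation is linear
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = np.conj(X1) * X2         # its inverse DFT is r12(p)

    if weighting == "cc":
        w = 1.0
    elif weighting == "roth":
        w = 1.0 / (np.abs(X1) ** 2 + eps)
    elif weighting == "scot":
        # With single-frame spectra this reduces to a pure phase weighting.
        w = 1.0 / (np.abs(X1) * np.abs(X2) + eps)
    else:
        raise ValueError("unknown weighting: " + weighting)

    r = np.fft.irfft(w * cross, n)
    # Negative lags sit at the end of the IDFT output; reorder to
    # [-max_lag, ..., 0, ..., +max_lag] before searching for the peak.
    r = np.concatenate((r[-max_lag:], r[:max_lag + 1]))
    return int(np.argmax(r)) - max_lag
```

In practice the power spectra would usually be averaged over several frames before the weights are formed; a single frame is used here only to keep the sketch short.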

In this paper, the sound source to be localized is moving. An Excitation Source Information (ESI) based time-delay estimation algorithm is used to determine the time lag between sound channels [14], [15].

The best-performing GCC method for this database is determined experimentally. The error percentage is computed from the difference between the true sample delay and the estimated sample delay. We calculate the error percentage versus the signal-to-noise ratio (SNR) to assess the robustness of the GCC methods on our sound recording database. The result of this comparison is shown in Fig. 7. The ESI approach gives better results than the other GCC methods on our dataset. The mean error percentages for the different GCC approaches are given in Table IV.

The accuracy of the sample delay estimate is vital for determining the exact location of sound sources. The user only needs to know the distance between the microphones and the time difference between the received sounds [16], [17]. From this information, the azimuth angle of the sound source can be calculated easily. The ITD-based determination of the azimuth angle is illustrated in Fig. 8.

One of the most important parameters for sound localization is the speed of the sound. This parameter directly affects the performance of sound source localization.

The speed of sound is highly sensitive to environmental conditions. In generic applications of sound localization techniques, the speed of sound is taken as a constant ($V_s \approx 343$ m/s). Alternatively, it can be calculated as a function of the ambient temperature, as shown in (19). The estimation of the azimuth angle via ITD is given in (20). In this equation, the delay in samples, the sampling period, and the distance between receivers in metres are represented by $S_d$, $T_s$, and $d$, respectively [18]. The parameter $C_t$ represents the ambient temperature in degrees Celsius:

$$V_s = 20.05\sqrt{273.15 + C_t}, \tag{19}$$

$$\alpha = \arcsin\left(\frac{V_s S_d T_s}{d}\right). \tag{20}$$
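Equations (19) and (20) can be checked numerically with a short Python sketch. The function names, the default temperature of 20 degrees Celsius, and the clamping of the arcsine argument are assumptions added for illustration.

```python
import math

def speed_of_sound(celsius):
    """Speed of sound in m/s from the ambient temperature, as in (19)."""
    return 20.05 * math.sqrt(273.15 + celsius)

def azimuth_deg(sample_delay, fs, mic_distance_m, celsius=20.0):
    """Azimuth angle in degrees from a delay in samples, as in (20).

    sample_delay   : delay between the two receivers in samples (S_d)
    fs             : sampling frequency in Hz, so that T_s = 1 / fs
    mic_distance_m : distance between the receivers in metres (d)
    """
    t_s = 1.0 / fs
    ratio = speed_of_sound(celsius) * sample_delay * t_s / mic_distance_m
    # Noisy delay estimates can push the ratio slightly outside [-1, 1],
    # where arcsin is undefined, so it is clamped here (an added safeguard).
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

At 20 degrees Celsius, (19) gives a value close to the constant 343 m/s quoted in the text, so the two choices differ little at room temperature.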

The ITD-based sound localization is illustrated in Fig. 9. The intersection points of the azimuth directions give the location of the sound source. In practice, estimating the correct delay between sound channels is very hard due to environmental noise and echo effects. Hence, there will be a deviation between the estimated and actual locations, and its size depends on the accuracy of the delay estimation algorithm. In this approach, the user needs to know only the delays between the sound channels and the locations of the microphones.


Another approach to sound source localization is the time difference of arrival (TDOA). The basic idea of TDOA is to determine the relative differences in arrival time between receivers. This approach requires only the time differences between the receiving channels and the exact locations of the receivers [19], [20]. It is commonly used in military applications, sound localization, GSM, wireless sensor networks, etc. TDOA-based 3D sound source localization can be defined as an optimization problem [21].

The optimal estimated location is the value minimizing the expression in (21)


where $(x_i, y_i, z_i)$ is the location of microphone $i$, $d_i$ is the distance between the sound source and microphone $i$, $(x_s, y_s, z_s)$ is the location of the sound source, and $M$ is the number of microphones ($M \ge 3$).

This is a quadratic, unconstrained optimization problem [22]. The expanded form of (21) for four microphones is given in (22)-(25):





where [expression not reproduced] denotes the estimated location of the sound source.

In this work, four microphones are used for sound source localization, which gives six distinct microphone pairs. The distance differences for these pairs are given in (26):

$$d_{ij} = \sqrt{(d_i - d_j)^2}. \tag{26}$$

These distance differences can be converted to sample delay differences $S_{ij}$ between microphones using (27), where $V_s$ is calculated from the ambient temperature via (19). The sampling frequency $f_s$ is 44100 Hz in this study:

$$S_{ij} = d_{ij} f_s / V_s. \tag{27}$$
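A small Python sketch of (26) and (27) follows, using the microphone coordinates of Table I and a source position taken from Table II. Treating both tables as sharing one coordinate frame, converting centimetres to metres, and assuming 20 degrees Celsius are assumptions made for this illustration.

```python
import math
from itertools import combinations

# Microphone coordinates in cm, copied from Table I.
MICS_CM = {1: (0, 0, 0), 2: (530, 0, 0), 3: (0, 600, 0), 4: (530, 600, 0)}
FS = 44100  # sampling frequency in Hz

def speed_of_sound(celsius):
    """Speed of sound in m/s, as in (19)."""
    return 20.05 * math.sqrt(273.15 + celsius)

def pairwise_sample_delays(source_cm, celsius=20.0):
    """Return {(i, j): S_ij} for the six microphone pairs, as in (26)-(27)."""
    v_s = speed_of_sound(celsius)
    # Distance from the source to each microphone, converted to metres.
    dist_m = {i: math.dist(source_cm, mic) / 100.0
              for i, mic in MICS_CM.items()}
    return {(i, j): abs(dist_m[i] - dist_m[j]) * FS / v_s
            for i, j in combinations(sorted(dist_m), 2)}

# Location 1 of Table II, in cm.
delays = pairwise_sample_delays((370, 0, 153))
```

For this location, the delay between microphones 1 and 2 comes out at roughly 230 samples, which gives a sense of the scale of the delays that the time-delay estimator has to resolve in a room of this size.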

The TDOA approach allows the location of the sound source to be determined from the delays between the audio channels. Solving the positioning problem with the TDOA approach is equivalent to solving an optimization problem. In this work, the TDOA problem is solved with a modified Levenberg-Marquardt algorithm [22]-[24].
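To make the optimization view concrete, the sketch below fits a 3D source position to signed range differences with a basic Levenberg-style damped Gauss-Newton iteration. It is not the modified Levenberg-Marquardt method of [22]-[24]: the residual definition (signed range differences over all six pairs), the damping schedule, the starting point, and all function names are assumptions for illustration. Microphone positions are those of Table I, converted to metres.

```python
import numpy as np
from itertools import combinations

MICS_M = np.array([[0.0, 0.0, 0.0], [5.3, 0.0, 0.0],
                   [0.0, 6.0, 0.0], [5.3, 6.0, 0.0]])
PAIRS = list(combinations(range(len(MICS_M)), 2))

def range_differences(p):
    """Signed range differences d_i - d_j (metres) for every pair."""
    d = np.linalg.norm(MICS_M - p, axis=1)
    return np.array([d[i] - d[j] for i, j in PAIRS])

def residuals(p, measured):
    return range_differences(p) - measured

def jacobian(p):
    """Derivative of each range difference with respect to p."""
    diff = p - MICS_M
    unit = diff / np.linalg.norm(diff, axis=1)[:, None]
    return np.array([unit[i] - unit[j] for i, j in PAIRS])

def solve_tdoa_lm(measured, p0, iters=200, lam=1e-3):
    """Minimise the sum of squared residuals; return (position, cost)."""
    p = np.asarray(p0, dtype=float)
    cost = np.sum(residuals(p, measured) ** 2)
    for _ in range(iters):
        r = residuals(p, measured)
        J = jacobian(p)
        # Damped normal equations: (J^T J + lam * I) step = -J^T r
        step = np.linalg.solve(J.T @ J + lam * np.eye(3), -(J.T @ r))
        new_cost = np.sum(residuals(p + step, measured) ** 2)
        if new_cost < cost:
            # Accept the step and trust the quadratic model more.
            p, cost, lam = p + step, new_cost, max(lam * 0.1, 1e-9)
        else:
            # Reject the step and increase the damping.
            lam *= 10.0
    return p, cost
```

Measured range differences would be obtained from the estimated sample delays as delay times V_s divided by f_s, keeping the sign of each delay. Because all four microphones lie in one plane, a point and its mirror image about that plane produce identical range differences, so the height estimate depends on the starting point.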


In this section, we describe our approach to 3D moving sound source localization, which uses several data mining and signal processing methods. The process consists of several steps. In the first step, the signals are smoothed with a Savitzky-Golay filter [25] to increase the SNR, which gives a better signal representation and more accurate time delay estimation. A sample result of the Savitzky-Golay filter is shown in Fig. 10. As shown in the figure, the Savitzky-Golay filter is suitable for signal smoothing, since it suppresses noise without greatly distorting the original signal.
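A minimal NumPy sketch of Savitzky-Golay smoothing is given below: a polynomial is fitted by least squares to each window and the fitted value at the window centre is kept. The window length, polynomial order, and edge padding are illustrative assumptions, since the text does not state the filter settings used in the study.

```python
import numpy as np

def savgol_coeffs(window, order):
    """Weights that return the least-squares polynomial value at the
    centre of an odd-length window."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are 1, t, t^2, ..., t^order for the sample offsets t.
    A = np.vander(offsets, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps samples to the constant term,
    # which is the fitted value at t = 0.
    return np.linalg.pinv(A)[0]

def savgol_smooth(x, window=11, order=3):
    """Smooth x with a Savitzky-Golay filter (edges padded by repetition)."""
    if window % 2 == 0 or order >= window:
        raise ValueError("window must be odd and larger than order")
    coeffs = savgol_coeffs(window, order)
    half = window // 2
    padded = np.pad(np.asarray(x, dtype=float), half, mode="edge")
    # Reversing the weights turns the convolution into a sliding dot product.
    return np.convolve(padded, coeffs[::-1], mode="valid")
```

A polynomial of degree no higher than `order` passes through the filter unchanged away from the edges, which is why the filter smooths noise while preserving the overall shape of peaks.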

The second stage of the process is to increase the contrast between the sound source signal and the environmental noise. We apply a threshold to the filtered signal in order to make the location information clearer. Local maxima are determined by an adaptive thresholding method [26]. The filtered signal and the thresholded signal are shown in Fig. 11.
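The thresholding rule itself is not spelled out in the text, so the following Python sketch uses a hypothetical rule: the threshold is set to the mean of the rectified signal plus `k` times its standard deviation, and only local maxima above that level are kept. The function name and the parameter `k` are assumptions for illustration.

```python
import numpy as np

def local_maxima_above_threshold(x, k=1.0):
    """Return (indices of local maxima above the threshold, threshold).

    The threshold adapts to the signal through its own statistics:
    mean(|x|) + k * std(|x|). This rule is an illustrative assumption.
    """
    env = np.abs(np.asarray(x, dtype=float))
    threshold = env.mean() + k * env.std()
    mid = env[1:-1]
    is_peak = (mid > env[:-2]) & (mid >= env[2:]) & (mid > threshold)
    # +1 because mid starts at index 1 of the original signal.
    return np.flatnonzero(is_peak) + 1, threshold
```

Raising `k` keeps only the strongest peaks, while lowering it admits more candidate points for the clustering stage that follows.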

After this stage, a reference point must be determined in order to calculate the correct time delay between sound channels. K-medoids clustering is applied to the thresholded output of each sound channel [27]. Centroid points are determined for every sound channel, and the mean of these centroid points across channels is calculated. The k-medoids algorithm serves two purposes in this work: the first is to determine the reference point for all sound channels, and the second is to obtain the number of sound source locations.

The determination of the number of locations is shown in Fig. 12. This graph plots the number of locations (clusters) against the total distance between the medoids and the observation points, summed over all sound channels. As shown in the figure, there is an obvious elbow point, which indicates the number of locations adaptively. As expected, the number of sound locations is N = 10 in this study.
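The elbow curve described above can be sketched for a single channel with a simple one-dimensional k-medoids routine. The alternating assign-and-update scheme, the evenly spread initial medoids, and the function name are assumptions for illustration; the study sums the cost over all sound channels.

```python
import numpy as np

def k_medoids_1d(points, k, iters=50):
    """Cluster 1D points into k groups; return (medoids, total distance)."""
    pts = np.sort(np.asarray(points, dtype=float))
    # Deterministic start: medoids spread evenly over the sorted points.
    medoids = pts[np.linspace(0, len(pts) - 1, k).astype(int)]
    for _ in range(iters):
        # Assign every point to its nearest medoid.
        labels = np.argmin(np.abs(pts[:, None] - medoids[None, :]), axis=1)
        new = medoids.copy()
        for c in range(k):
            members = pts[labels == c]
            if len(members):
                # The medoid is the member with the smallest total
                # distance to the other members of its cluster.
                costs = np.abs(members[:, None] - members[None, :]).sum(axis=1)
                new[c] = members[np.argmin(costs)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = np.argmin(np.abs(pts[:, None] - medoids[None, :]), axis=1)
    return medoids, float(np.abs(pts - medoids[labels]).sum())

def elbow_curve(points, k_max):
    """Total medoid-to-point distance for k = 1 .. k_max."""
    return [k_medoids_1d(points, k)[1] for k in range(1, k_max + 1)]
```

Plotting the values returned by `elbow_curve` against k gives a curve of the kind shown in Fig. 12: the cost drops sharply until k reaches the true number of groups and flattens afterwards.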

Furthermore, the cluster centers obtained by clustering serve as a mask that provides a reference for every location in the sound signals. These reference points are calculated as shown in (28). The centroid points of the sound channels are defined as [expression not reproduced]. The number of sound channels is $c_n = 4$. The size of the sample windows can be selected from these reference points. As noted above, the number of locations is N = 10:



where $i = 1, 2, 3, \ldots, N - 1$. The selection of the sample window size ($W$) is given in (29). The differences between consecutive reference points are calculated, and a small safety margin is applied to avoid overlap between sample windows. The sample window size and the reference line are shown in Fig. 13.

The estimated reference lines and sample windows are also shown in Fig. 14.

In the proposed algorithm, 3D moving sound source sequence localization can be described by the following steps:

Step 1. Apply the Savitzky-Golay pre-filter to smooth the input sound signals.

Step 2. Determine the local maxima of the sound signals via the adaptive threshold algorithm.

Step 3. Apply the data clustering algorithm to the calculated local maxima to determine the optimal masking parameters.

Step 4. Determine the number of locations for the localization process.

Step 5. Obtain the time delay between sound channels via excitation source information based time-delay estimation.

Step 6. Apply the TDOA algorithm via modified Levenberg-Marquardt and determine the sound source location points in 3D space as coordinates $(x_s, y_s, z_s)$.

In this paper, only the ITD parameter is used for sound source localization, since it is much more robust and reliable than the ILD parameter. The differences between the outputs of our algorithm and the real coordinates are given in Table V.

The mean errors in 2D and 3D distance estimation are 59.814 cm and 87.902 cm, respectively. These results are acceptable, since the sound source moves during the whole process and the reference points for the time delay are estimated adaptively.
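The mean errors quoted above can be reproduced from the per-location distances listed in Table V. The short Python check below copies those values; the variable names are illustrative.

```python
from statistics import mean

# 2D and 3D distance errors in cm, copied row by row from Table V.
ERR_2D = [62.846, 29.392, 87.648, 42.151, 26.710,
          38.188, 58.525, 81.199, 78.118, 93.358]
ERR_3D = [77.792, 46.591, 107.306, 91.197, 54.154,
          59.702, 92.207, 98.872, 129.786, 121.419]

mean_2d = mean(ERR_2D)  # mean 2D error in cm
mean_3d = mean(ERR_3D)  # mean 3D error in cm
```

Both means agree with the reported 59.814 cm and 87.902 cm to within rounding in the last digit.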

The 2D visualization of sound source localization results is shown in Fig. 15. The 3D illustration of localization results is shown in Fig. 16.


In this paper, we presented an algorithm for localizing a moving sound source sequence with a static microphone system. The basic idea of this study is to define suitable observation windows via basic data mining methods and to apply them to the sound signals. An excitation source information based time-delay estimation algorithm is used to determine the time lag between sound channels. We described some basic terms of sound signals and sound source localization methods. TDOA-based sound source localization is used for sound sequence localization, and the TDOA problem is solved with a modified Levenberg-Marquardt algorithm. The results along the Z axis are not good enough, since all microphones lie in the same plane, whereas the 2D results are acceptable for moving sound sources. A clustering algorithm is used to obtain the number of sound source locations adaptively. In addition, we share a new database for sound source localization, separation, and determination under different types of scenarios, and explain its specifications. We expect this paper to be helpful for researchers working on sound processing and localization.

Manuscript received 19 November, 2016; accepted 4 May, 2017.


[1] A. N. Popper, R. R. Fay, Sound source localization. New York, USA: Springer, 2005.

[2] A. Deleforge, R. Horaud, "2D sound-source localization on the binaural manifold", IEEE Int. Workshop on Machine Learning for Signal Processing, 2012, pp. 1-6.

[3] D. Pavlidi, S. Delikaris-Manias, V. Pulkki, A. Mouchtaris, "3D localization of multiple sound sources with intensity vector estimates in single source zones", 23rd European Signal Processing Conf. (EUSIPCO), Nice, 2015, pp. 1556-1560.

[4] S. T. Birchfield, R. Gangishetty, "Acoustic localization by interaural level difference", IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 2005. [Online]. Available: ICASSP.2005.1416207

[5] M. C. Catalbas, M. Yildirim, A. Gulten, H. Kurum, S. Dobrisek, "Estimation of trajectory and location for mobile sound source", International Journal of Advanced Computer Science and Applications (IJACSA), vol. 7, no. 9, pp. 237-241, 2016.

[6] D. A. Boyd, C. Hardy, Understanding microphones. In D. Boyd, S. Cohen, B. Rakerd, & D. Rehberger (Eds.), Oral history in the digital age. Institute of Library and Museum Services, 2012. [Online]. Available: microphones

[7] T. Lukaszewicz, Z. Kidon, D. Kania, K. Pethe-Kania, "Postural symmetry evaluation using wavelet correlation coefficients calculated for the follow-up posturographic trajectories", Elektronika ir Elektrotechnika, vol. 22, no. 5, pp. 84-88, 2016.

[8] H. Ziegelwanger, P. Majdak, W. Kreuzer, "Numerical calculation of listener-specific head-related transfer functions and sound localization: Microphone model and mesh discretization", The Journal of the Acoustical Society of America, vol. 138, no. 1, pp. 208-222, 2015.

[9] B. Laback, "Sensitivity to interaural level and envelope time differences of two bilateral cochlear implant listeners using clinical sound processors", pp. 488-500, 2004.

[10] T. Hidaka, "Interaural cross-correlation, lateral fraction, and low-and high-frequency sound levels as measures of acoustical quality in concert halls", The Journal of the Acoustical Society of America, vol. 98, no. 2, pp. 988-1007, 1995.

[11] A. Pourmohammad, S. M. Ahadi, "N-dimensional N-microphone sound source localization", EURASIP Journal on Audio, Speech, and Music Processing, vol. 27, no. 1, pp. 1-19, 2013.

[12] T. Padois, F. Sgard, O. Doutres, A. Berry, "Acoustic source localization using a polyhedral microphone array and an improved generalized cross-correlation technique", Journal of Sound and Vibration, vol. 386, pp. 82-99, 2017.

[13] Z. S. Velickovic, V. D. Pavlovic, "The performance of the modified gcc technique for differential time delay estimation in the cooperative sensor network", Elektronika ir Elektrotechnika, vol. 19, no. 8, pp. 119-122, 2013. [Online]. Available: j01.eee.19.8.2445

[14] V. C. Raykar, R. Duraiswami, B. Yegnanarayana, S. R. Mahadeva Prasanna, "Tracking a moving speaker using excitation source information", European Conf. Speech Communication and Technology, pp. 69-72, 2003.

[15] V. C. Raykar, B. Yegnanarayana, S. R. M. Prasanna, R. Duraiswami, "Speaker localization using excitation source information in speech", IEEE Trans. Speech and Audio Processing, vol. 13, no. 5, pp. 751-761, 2005. [Online]. Available: 851907

[16] W. G. Gardner, 3-D audio using loudspeakers. Springer Science & Business Media, 1998.

[17] L. Calmes, "Biologically inspired binaural sound source localization and tracking for mobile robots", PhD dissertation, RWTH Aachen Univ., 2009.

[18] C. Rascon, H. Aviles, L. A. Pineda, "Robotic orientation towards speaker for human-robot interaction", Ibero-American Conf. Artificial Intelligence, 2010, pp. 10-19. [Online]. Available: 10.1007/978-3-642-16952-6_2

[19] F. Gustafsson, F. Gunnarsson, "Positioning using time-difference of arrival measurements", in Proc. Acoustics, Speech, and Signal Processing, (ICASSP 2003), 2003. [Online]. Available: https://doi.org/10.1109/ICASSP.2003.1201741

[20] S. Hamdoun, A. Rachedi, A. Benslimane, "Comparative analysis of RSSI-based indoor localization when using multiple antennas in Wireless Sensor Networks", in Int. Conf. Selected Topics in Mobile and Wireless Networking, (MoWNeT 2013), 2013, pp. 146-151.

[21] K. Shoda, M. Arakawa, M. Morikawa, T. Hisano, K. Matsumura, "A 3D location estimation method using the Levenberg-Marquardt method for real-time location system", WCSMO-10, 2013.

[22] J. E. Dennis, R. B. Schnabel, Numerical methods for unconstrained optimization and nonlinear equations. SIAM, 1996.

[23] L. Chen, "A modified Levenberg-Marquardt method with line search for nonlinear equations", Computational Optimization and Applications, vol. 65, no. 3, pp. 753-779, 2016.

[24] M. Balda, "LMFsolve.m: Levenberg-Marquardt-Fletcher algorithm for nonlinear least squares problems", 2009. [Online]. Available: 16063-lmfsolve-m--levenberg-marquardt-fletcher-algorithm-for-nonlinear-least-squares-problems

[25] R. W. Schafer, "What is a Savitzky-Golay filter? [Lecture notes]", IEEE Signal Processing Magazine, vol. 28, no. 4, pp. 111-117, 2011.

[26] T. O'Haver, "A pragmatic introduction to signal processing with applications in scientific measurement", University of Maryland at College Park, 2015.

[27] P. N. Tan, M. Steinbach, V. Kumar, Introduction to data mining. India: Pearson Education, 2006.

Mehmet Cem Catalbas (1), Simon Dobrisek (2)

(1) Faculty of Electrical and Electronics Engineering, Firat University, Elazig, Turkey

(2) Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia

Caption: Fig. 1. Interaural level differences (ILD).

Caption: Fig. 2. Interaural time differences (ITD).

Caption: Fig. 3. Location map.

Caption: Fig. 4. 3D illustration of test environment.

Caption: Fig. 5. Location of the speaker.

Caption: Fig. 6. Recording environment for sound source localization.

Caption: Fig. 7. Comparison of GCC methods.

Caption: Fig. 8. ITD and azimuth angle.

Caption: Fig. 9. ITD-based 2D sound localization.

Caption: Fig. 10. Signal smoothing via Savitzky-Golay filter.

Caption: Fig. 11. Filtered signal (a); Threshold applied signal (b).

Caption: Fig. 12. Determination of the exact number of locations.

Caption: Fig. 13. Sample window size and Reference line.

Caption: Fig. 14. Reference lines and optimal masking for sound signals.

Caption: Fig. 15. 2D sound source localization results.

Caption: Fig. 16. 3D sound source localization results.

Table I. Coordinates of microphones.

Microphone   X(cm)   Y(cm)   Z(cm)

1              0       0       0
2             530      0       0
3              0      600      0
4             530     600      0


Table II. Coordinates of locations.

Location   X(cm)   Y(cm)   Z(cm)

1           370      0      153
2           188      0      153
3           370     161     172
4           123     161     172
5            0      161     172
6           492     345     172
7           123     345     172
8            0      345     172
9           492     529     172
10          123     529     172
11          370     660     172
12          123     660     172


Table III. Weighting functions used in the GCC method.

Method Name               Weighting Function

Cross correlation         1
ROTH                      [expression not reproduced]
(n = scale factor)        [expression not reproduced]


Table IV. Mean error percentages of the GCC approaches.

Method Name         Mean Error (%)

Cross correlation       14.505
ROTH Filter             13.101
SCOT                    20.511
SCOT-Modified           15.052
ESI                     12.627


Table V. Differences between the algorithm outputs and the real coordinates.

Number of   Exact Position   2D distance (cm)   3D distance (cm)

1                 1               62.846             77.792
2                 2               29.392             46.591
3                 5               87.648            107.306
4                 8               42.151             91.197
5                 7               26.710             54.154
6                 6               38.188             59.702
7                 9               58.525             92.207
8                 10              81.199             98.872
9                 11              78.118            129.786
10                12              93.358            121.419
COPYRIGHT 2017 Kaunas University of Technology, Faculty of Telecommunications and Electronics
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2017 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Catalbas, Mehmet Cem; Dobrisek, Simon
Publication:Elektronika ir Elektrotechnika
Article Type:Report
Date:Apr 1, 2017