
Low Cost Skin Segmentation Scheme in Videos Using Two Alternative Methods for Dynamic Hand Gesture Detection Method.

1. Introduction

Dynamic hand gesture segmentation is the foundational step of a complete hand gesture tracking and recognition system. Since the quality of its results affects the follow-up procedures of the hand gesture recognition system [1], the dynamic hand gesture segmentation process must improve segmentation accuracy, achieve real-time segmentation, reduce the influence of lighting conditions, and precisely segment the moving hand gesture from a complicated environment or under different brightness conditions.

In previous years, skin colour segmentation has played a pivotal role in dynamic hand gesture segmentation, since skin colour is invariant to hand scale changes and posture variations [2]. Existing studies on skin colour segmentation have suggested using colour spaces to segment human skin with a particular skin colour threshold [3-5], because a threshold approach can easily segment skin features, including hand skin, from the background for use in dynamic hand gesture segmentation methods [6]. However, the skin colour threshold approach is limited by illumination variations, as well as by interference from background objects with skin-like colours [7]. Therefore, many researchers have proposed approaches that depend on a set of conditions derived from the skin region in 2D or 3D colour spaces; each such method tests its conditions on a given pixel to decide the class of that pixel. A number of methods have been developed to segment human skin colour, including hand skin. Mahmoodi et al. [8] combined ternary images with the outcome of a frame differencing technique in a skin-motion segmentation scheme, which improved the Bayesian classifier and feedback mechanism. However, they [8] stated that the initial seed generator based on skin-motion segmentation is crucial and requires further improvement. In addition, a mechanism is needed to cope with strongly illuminated areas.

Qiu-yu et al. [9] presented a method based on the YCbCr colour space and the K-means clustering algorithm. However, their method is unsuitable for complex scenes and is limited in real-time performance.

Tan et al. [10] developed a framework based on a smoothed 2D histogram and a Gaussian model. However, the success of their method relies on eye detector algorithms, and they [10] noted that the framework is still limited and subject to further enhancement.

Gupta and Chaudhary [11] employed an automated colour space switching method based on various colour spaces, the statistical mean of the skin pixel values in the image, and Bayesian approaches. Nonetheless, their model [11] has a high computational cost. Moreover, it is limited under varying lighting conditions and different backgrounds.

Yeo et al. [12] utilised the skin colour in the Y, Cb, and Cr colour space components, which were extracted, smoothed with a morphology operation to reduce clutter, and finally combined using a logical "AND" operation. Nonetheless, the algorithm was found to function better indoors under normal lighting and may degrade under very dark or very bright illumination.

Asaari et al. [13] used a generic thresholding skin segmentation algorithm based on skin thresholds in the Cb-Cr colour space, calculated from the mean and standard deviation of the pixels of cropped skin regions. Unfortunately, the obtained threshold factors showed a reduced capability to detect and segment skin colour features, including hand skin, under low illumination.

Kawulok et al. [14] introduced two strategies combined in a hybrid adaptation system: the first strategy uses the detected facial area, and the other uses a self-adaptive scheme in which the global model response creates a local skin colour model. The extracted local skin colour model was used to obtain seeds for the geodesic distance transform that defines the skin region boundaries. Although the introduced method substantially improves skin detection accuracy, the results are still unsatisfactory, and the algorithm needs further improvement to reduce the false positive rate and to become more adaptive.

Thus, to ensure correct, simple, and low-computation-time segmentation of skin colour, including hand skin, under varying illumination conditions and complex environments, this paper proposes an enhanced threshold selection technique within a skin colour segmentation scheme. This scheme includes two procedures for calculating generic threshold ranges in the Cb-Cr colour space. The first procedure performs skin segmentation using threshold values trained online from nose pixels of the face region, based on the Viola-Jones method and the Fast Marching Method (FMM) [15, 16]. The second procedure, named the offline training procedure, is executed as an alternative when the online training procedure fails to detect the face region under circumstances where the Viola-Jones algorithm degrades. The offline training procedure remains usable in such cases because it calculates the threshold range offline from skin samples using a weighted equation.

2. The Proposed Skin Segmentation Scheme

Skin colour is a significant feature for discriminating between skin and nonskin regions in an image, and skin colour information is robust against geometric variations caused by scaling, rotation, or translation. RGB is the default colour space for storing, processing, and displaying digital images. Other colour spaces can be derived from RGB through a linear or nonlinear transformation and converted back again, with the computational cost varying between colour spaces. The colour space transformation serves to reduce the overlap between skin and nonskin distributions, thus facilitating skin pixel classification and improving robustness against illumination influences. Prior studies have revealed that skin colours differ more in brightness than in chrominance (colour) components [17, 18].

In this study, the YCbCr colour space was adopted to represent skin colour since the transformation from RGB to YCbCr is less complicated than for other colour spaces, which makes YCbCr a favourable choice for a skin segmentation task [13, 17]. The YCbCr colour space is derived in such a way that the illumination (luminance) is concentrated in a single component (Y), while the colour (chrominance) is concentrated in the Cb and Cr components. The conversion from RGB to YCbCr is given by the linear relationship in

\[
\begin{aligned}
Y &= 0.299R + 0.587G + 0.11B, \\
Cb &= B - Y, \\
Cr &= R - Y.
\end{aligned}
\tag{1}
\]

Y represents the grey-level (luminance) information, whereas Cb and Cr describe the colour difference with respect to the blue and red colour channels.
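As a minimal sketch of (1), the conversion can be written for a single pixel in plain Python. The function name `rgb_to_ycbcr` and the `blue_weight` parameter are illustrative and do not come from the original work. The default follows the 0.11 weight printed in (1); ITU-R BT.601 specifies 0.114, which can be passed instead. Note that (1) yields signed, unscaled colour differences, whereas the thresholds reported in Figures 3 and 4 (values between 100 and 165) suggest an 8-bit encoding with an offset; that encoding is assumed to be applied by the image library in practice and is not reproduced here.

```python
def rgb_to_ycbcr(r, g, b, blue_weight=0.11):
    """Convert one RGB pixel to (Y, Cb, Cr) following equation (1)."""
    # Luminance: weighted sum of the three colour channels.
    y = 0.299 * r + 0.587 * g + blue_weight * b
    # Chrominance: blue and red differences from the luminance.
    cb = b - y
    cr = r - y
    return y, cb, cr


# A reddish, skin-like pixel: Cr comes out positive and Cb negative.
y, cb, cr = rgb_to_ycbcr(200, 140, 120)
```

Because only Cb and Cr are used for segmentation, a uniform brightness change that shifts Y has a reduced effect on the values being thresholded.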

To reduce the effects of brightness on skin colour regions, this study adopted the view of previous studies [13, 17, 19] that consider only the chrominance components (Cb and Cr) and discard the brightness component (Y) in the skin segmentation process. The brightness component is discarded because illumination can vary greatly over a skin region under various lighting effects, which makes it difficult to select the range of skin colour values.

To segment skin regions, the study in [13] used the mean and standard deviation of skin region samples in the Cb-Cr colour space to calculate the decision boundary (threshold values) for skin colour segmentation. However, the calculated thresholds gave weak or incorrect segmentation of skin regions under low lighting, including the misdetection of skin colour pixels under somewhat dim or strong illumination. This degraded the skin colour segmentation scheme, which directly affected the results of the hand gesture segmentation algorithm.

To cope with the illumination problem, this study proposes a new skin colour segmentation scheme. The scheme calculates a new decision boundary (threshold values) based on the maximum and minimum range values in the Cb-Cr colour space. The thresholds for range-based segmentation were calculated using either the online training procedure, from nose pixels of the face region, or the offline training procedure, from a number of skin samples. Figure 1 displays the flowchart of the skin colour segmentation scheme.

The proposed skin colour segmentation scheme begins with the online training process. In this process, the Viola-Jones algorithm [20, 21] was applied in the YCbCr colour space to detect the face region and then the nose region. The threshold values (minCr, maxCr, minCb, and maxCb) were obtained from the detected nose region by calculating the minimum and maximum values of the Cb and Cr chrominance components; Figure 2 shows a sample of online training thresholds. Moreover, to limit the computational cost, the online training procedure was applied only once. The face region was later discarded from the image. Consequently, the binary information of skin areas in the image was segmented via the thresholding operation in (4).

Furthermore, the Fast Marching Method (FMM) [15, 16] was applied to the skin features segmented in the previous step to correct the boundaries and fill the holes or missing pixels in the hands and other skin colour regions of the image frame, which are caused by the unequal brightness distribution over face parts and other skin colour regions in the frames of the video sequence. The FMM can track the boundary of moving objects and segment them from the image with low computation time [22], using the formula below:

\[
BW = \operatorname{FMM}(W, \mathrm{MASK}, \mathrm{THRESH}). \tag{2}
\]

BW represents the segmented image; W is the array of weights defined for every pixel of the input; MASK holds the seed locations; THRESH is a positive scalar in the interval [0, 1]. THRESH identifies the level at which the outcome of the FMM is thresholded to obtain the output binary image BW. Here, THRESH was set to 0.001 by experiments and observations.
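The roles of W, MASK, and THRESH can be illustrated with a small sketch. It is not a fast marching solver: it swaps in a Dijkstra-style shortest-path search on the 4-connected pixel grid as a simpler stand-in for the geodesic arrival times that the FMM computes. The function name `geodesic_segment`, the step cost of 1/W, and the normalisation by the largest finite arrival time are all assumptions made for illustration and are not taken from the original work.

```python
import heapq


def geodesic_segment(weights, seeds, thresh):
    """Grow a region from seed pixels; high weights are cheap to cross."""
    rows, cols = len(weights), len(weights[0])
    inf = float("inf")
    dist = [[inf] * cols for _ in range(rows)]
    heap = []
    for r, c in seeds:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and weights[nr][nc] > 0:
                # Assumed cost model: entering a pixel costs 1 / weight.
                nd = d + 1.0 / weights[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    finite = [v for row in dist for v in row if v != inf]
    largest = max(finite) if finite else 0.0
    scale = largest if largest > 0 else 1.0
    # Keep pixels whose normalised arrival time does not exceed thresh.
    return [[v != inf and v / scale <= thresh for v in row] for row in dist]


# One image row: the low-weight pixel in the middle acts as a barrier.
weights = [[1.0, 1.0, 0.01, 1.0, 1.0]]
bw = geodesic_segment(weights, seeds=[(0, 0)], thresh=0.05)
```

In this toy row the region grows from the seed across the high-weight neighbour and stops at the low-weight pixel, which mirrors how a seeded front fills connected skin-like pixels while leaving dissimilar ones outside.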

However, under low lighting conditions and face rotation, the Viola-Jones algorithm showed decreased performance in detecting the face and/or nose regions in the video frames, which caused the online training procedure to fail and stop. To deal with this issue, the offline training procedure was run using thresholds (value ranges) calculated separately, offline. In the offline procedure, the threshold values were calculated by the weighted equation (3) from a number of skin colour samples in the Cb-Cr colour space. Specifically, 11 skin samples were manually cropped from the face regions of 11 individuals, who were randomly selected from video files of the IBGHT dataset. The maximum and minimum values of Cb and Cr for every skin sample were extracted as threshold values; Figure 3 lists the skin colour samples. Finally, the threshold values for the Cb and Cr components required for skin colour segmentation were obtained using

\[
\mathrm{thresholdsvect} = \alpha \cdot \mathrm{thr2} + (1 - \alpha) \cdot \mathrm{thr1}. \tag{3}
\]

Here, alpha is the weight, set to 0.02 by experiments and observation; thr1 and thr2 are the first and second extracted threshold vectors, based on the minimum and maximum range values Cr_min, Cr_max, Cb_min, and Cb_max of the skin samples (Figure 3); and thresholdsvect represents the calculated thresholds for skin colour segmentation based on the Cr-Cb range. The equation is inspired by [23], where it is used for adaptive human motion feature extraction. Figure 4 shows the threshold values calculated for skin segmentation.
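The weighted equation (3) can be checked against the published numbers with a short sketch. The sample values are transcribed from Figure 3. Two points are assumptions rather than statements from the text: thr1 is taken as the most extreme value of each range across the eleven samples and thr2 as the second most extreme distinct value, and the result is truncated to an integer. These choices were made only because they reproduce the thr1, thr2, and final threshold rows of Figure 4; the helper names are hypothetical.

```python
import math

# Per-sample ranges transcribed from Figure 3 (eleven skin samples).
CB_MAX = [119, 115, 116, 117, 117, 121, 117, 125, 119, 127, 121]
CB_MIN = [117, 114, 113, 113, 113, 100, 114, 123, 115, 118, 116]
CR_MAX = [147, 147, 149, 149, 149, 148, 147, 141, 146, 148, 165]
CR_MIN = [143, 145, 145, 143, 142, 138, 145, 140, 143, 134, 140]

ALPHA = 0.02  # weight reported in the text


def two_extremes(values, largest):
    """Return the most extreme and second most extreme distinct values."""
    distinct = sorted(set(values), reverse=largest)
    return distinct[0], distinct[1]


def offline_thresholds(alpha=ALPHA):
    """Apply equation (3) in the order Cr_min, Cr_max, Cb_min, Cb_max."""
    pairs = [
        two_extremes(CR_MIN, largest=False),
        two_extremes(CR_MAX, largest=True),
        two_extremes(CB_MIN, largest=False),
        two_extremes(CB_MAX, largest=True),
    ]
    thr1 = [first for first, _ in pairs]
    thr2 = [second for _, second in pairs]
    # Assumed integer conversion: truncation of the weighted sum.
    final = [math.floor(alpha * t2 + (1 - alpha) * t1)
             for t1, t2 in zip(thr1, thr2)]
    return thr1, thr2, final


thr1, thr2, final = offline_thresholds()
```

With alpha = 0.02 the result stays very close to thr1, so the final range is almost the widest range observed in the samples, pulled slightly toward the second-widest one.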

Finally, image binarisation was performed to obtain the skin regions extracted from images by a thresholding operation depicted in

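\[
\mathrm{Skin}(x, y) =
\begin{cases}
1, & \text{if } minCr \le Cr(x, y) \le maxCr \text{ and } minCb \le Cb(x, y) \le maxCb, \\
0, & \text{otherwise}.
\end{cases}
\tag{4}
\]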
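The thresholding operation in (4) amounts to a per-pixel range test on the two chrominance planes. The sketch below is illustrative only: the function name `segment_skin`, the nested-list image representation, and the inclusive bounds are assumptions, and the threshold tuple uses the final thresholds of Figure 4 read in the order minCr, maxCr, minCb, maxCb.

```python
def segment_skin(cb_plane, cr_plane, thresholds):
    """Return a binary mask: 1 where both Cb and Cr fall inside the ranges."""
    min_cr, max_cr, min_cb, max_cb = thresholds
    return [
        [1 if (min_cr <= cr <= max_cr and min_cb <= cb <= max_cb) else 0
         for cb, cr in zip(cb_row, cr_row)]
        for cb_row, cr_row in zip(cb_plane, cr_plane)
    ]


# A 2 x 2 toy frame; only the top-left pixel satisfies both ranges.
cb = [[110, 90], [120, 130]]
cr = [[150, 150], [170, 140]]
mask = segment_skin(cb, cr, (134, 164, 100, 126))
```

A pixel is rejected as soon as either component leaves its range, which is why the other three pixels in the toy frame map to 0.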

3. The Experimental Results and Discussion

For performance evaluation, unfortunately, no standard dataset appropriate for video skin colour segmentation algorithms has yet been collected [8]. The Feeval dataset [24] is the only one available; however, it is of insufficient quality, with imprecise ground truths. Additionally, Mahmoodi et al. [8] used their self-made SDD dataset [25], which includes 33 videos, but this dataset was not made available to other researchers.

Therefore, video files of the IBGHT dataset [26] were used, as they contain the highlighted challenges for hand gesture segmentation and detection. The proposed skin segmentation scheme concentrates on skin feature segmentation, including hand gesture skin, in video frames for a dynamic hand gesture segmentation and detection method. The IBGHT video sequences were captured by a low-cost USB camera at an image resolution of 352 x 288 [26]. In addition, the IBGHT dataset comprises 60 video sequences with indoor and outdoor scenes.

However, the IBGHT dataset does not include ground truth for skin feature segmentation. Thus, the performance of the proposed scheme was subjectively compared with skin segmentation based on the thresholds of previous studies. In addition, computation time was compared with the skin segmentation scheme developed in [13].

As depicted in Figure 1, the proposed skin colour segmentation algorithm starts with the online training procedure, in which the input frame of the video sequence has already been converted into the YCbCr colour space. Thereafter, as shown in Figure 5, the Viola-Jones algorithm was applied to detect the face region; the output of face detection is the bounding box around the face (face_box). The face_box parameter is a numeric vector in which x and y give the corner point of the bounding box in the x and y directions, followed by the width and height of the box. Once the face region was successfully detected, the Viola-Jones algorithm was applied to detect the nose region. The output of nose detection is the bounding box around the nose region (nose_box), as shown in Figure 5.

As illustrated in Figures 5 and 6, the Cb and the Cr nose images were cropped based on the nose boundary box (nose_box) in terms of x, y, width, and height. After that, the threshold values for Cb and Cr were trained online by calculating the maximum and minimum value ranges inside each cropped Cb and Cr image of the detected nose region to be represented in the required (minCr, maxCr, minCb, and maxCb) parameters.
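The online training step described above can be sketched as a crop followed by a minimum and maximum over each chrominance plane. The function names are hypothetical, and the box convention is an assumption: `nose_box` is taken as (x, y, width, height) with a zero-based top-left corner, where x indexes columns and y indexes rows. The detector itself is outside the scope of this sketch.

```python
def crop(plane, box):
    """Cut the rectangle (x, y, width, height) out of a 2D plane."""
    x, y, width, height = box
    return [row[x:x + width] for row in plane[y:y + height]]


def train_thresholds_online(cb_plane, cr_plane, nose_box):
    """Return (minCr, maxCr, minCb, maxCb) measured inside the nose box."""
    cb_values = [v for row in crop(cb_plane, nose_box) for v in row]
    cr_values = [v for row in crop(cr_plane, nose_box) for v in row]
    return min(cr_values), max(cr_values), min(cb_values), max(cb_values)


# Toy 3 x 4 planes; the nose box covers the 2 x 2 block starting at (1, 1).
cb_plane = [[100, 101, 102, 103],
            [104, 105, 106, 107],
            [108, 109, 110, 111]]
cr_plane = [[140, 141, 142, 143],
            [144, 145, 146, 147],
            [148, 149, 150, 151]]
online = train_thresholds_online(cb_plane, cr_plane, nose_box=(1, 1, 2, 2))
```

Because the ranges come from the person in the frame, they adapt to that person's skin tone and the current lighting, at the cost of depending on a successful nose detection.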

As depicted in Figure 7, the binary information of skin regions was then segmented from the YCbCr input frame of the video sequence using the calculated thresholds in (4). However, unequal brightness distribution over face parts and other skin areas in the image resulted in lost boundaries and/or holes inside the segmented regions (inaccurate segmentation). As Figure 7 shows, this problem was handled by applying the Fast Marching Method (FMM) to the segmented skin image. Consequently, the FMM filled the holes and/or missing parts of skin for hands and other regions inside the image.

On the other hand, the offline training procedure is switched on under low illumination and face rotation, where the Viola-Jones algorithm fails to detect the face and nose regions required by the online training procedure. Figure 8 illustrates the skin colour segmentation results using offline training thresholds in the Cb-Cr chrominance components, where the threshold values are calculated by (3) from a number of skin colour samples taken from the IBGHT dataset, as depicted in Figures 3 and 4. Finally, the skin feature was segmented into a binary image using (4), where white pixels correspond to skin.

As shown in the first column of Figure 8, the Viola-Jones algorithm detected the face region but not the nose region, since the hand and arm partially occluded the face; this led to a failure in segmenting skin information using the online training procedure. Therefore, the offline training threshold procedure was switched on as an alternative to handle this problem.
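The switching rule between the two procedures can be stated compactly. In this sketch the function name `select_procedure` is hypothetical, and a failed detection is assumed to be represented by `None`; the source does not specify how the detector reports a failure.

```python
def select_procedure(face_box, nose_box):
    """Pick the threshold-training procedure for the current frame."""
    # Online training needs both detections; None marks a failed detection.
    if face_box is not None and nose_box is not None:
        return "online"
    return "offline"


# Face found but nose occluded by the hand: fall back to offline thresholds.
procedure = select_procedure(face_box=(120, 40, 90, 90), nose_box=None)
```

The case in Figure 8 corresponds to the second branch: a face box alone is not enough, because the thresholds are measured inside the nose box.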

Figure 9 demonstrates that, once the face region was detected, the skin colour segmentation algorithm removed the face and reduced the noise caused by unrelated objects.
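Removing the face from the skin mask can be sketched as clearing the pixels inside the face bounding box. The function name `remove_box` is hypothetical, and the box is again assumed to be (x, y, width, height) with a zero-based top-left corner; the source does not describe how the face region is discarded in detail.

```python
def remove_box(mask, box):
    """Return a copy of the mask with every pixel inside the box set to 0."""
    x, y, width, height = box
    cleared = [row[:] for row in mask]
    # Clamp the box to the image so a partly outside box is still handled.
    for r in range(max(y, 0), min(y + height, len(cleared))):
        for c in range(max(x, 0), min(x + width, len(cleared[r]))):
            cleared[r][c] = 0
    return cleared


# A 3 x 3 all-skin mask with a 2 x 2 face box in the top-left corner.
skin_mask = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
hands_only = remove_box(skin_mask, (0, 0, 2, 2))
```

After this step the remaining foreground pixels belong to hands or other skin-coloured regions, which is the input needed for hand gesture segmentation.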

For further evaluation, Figure 10 illustrates the comparison between the proposed skin segmentation scheme and other state-of-the-art approaches based on pretrained thresholds in the Cb-Cr range. The comparison displayed the difference in performance of the threshold values calculated here by the proposed skin segmentation scheme and that by previous studies [13, 27] upon the video frames of IBGHT dataset [26].

It can be seen from the images in the second column of Figure 10 that the proposed skin segmentation scheme performed better than the previous methods of [13, 27] and showed better potential under various lighting and background conditions. For example, the proposed scheme correctly segmented skin colour, including that of the hand region, whereas the skin segmentation procedure of [13] failed to detect the hand region. Moreover, the method of [27] detected skin regions with additional noise that affects hand gesture segmentation. Computational time was also compared against the skin segmentation method of [13], as shown in Table 1.

In summary, the results in Figure 10 and Table 1 indicate that the developed skin colour segmentation algorithm achieves a trade-off between accuracy and computational time: it segments the hand region more reliably at a cost of about 0.02 seconds more than the scheme of [13].

4. Conclusion

In this paper, a skin segmentation scheme has been proposed that classifies every pixel of randomly selected video frames into skin and nonskin classes. The proposed scheme includes two alternately running procedures based on threshold factors in the Cb-Cr colour space, trained using either an online or an offline training procedure. The online training procedure calculates the threshold factors from the nose pixels of the face: the Viola-Jones algorithm detects the face and then the nose region, from which the minimum and maximum values of Cb and Cr are calculated. However, under out-of-plane rotation and different lighting situations, the Viola-Jones algorithm may fail to detect the nose region; to prevent degradation and failure in that case, this paper proposed an offline training procedure. The offline training procedure segments skin information using skin sample data and a weighted equation to obtain the Cb-Cr threshold factors. Two calculated threshold ranges of max-min values for Cb-Cr were combined in the weighted equation using the alpha weight to make the process adaptive to skin colour variations under different circumstances. As a qualitative measurement, the proposed method was subjectively compared with previous studies, and computational time was also compared. The experimental results showed the ability of the proposed scheme to adapt to different illumination conditions and complicated environments while achieving a balance between computation time and segmentation adequacy. In future work, we intend to perform quantitative experiments based on precision and recall against previous studies, which may help to reduce the false positive rate for image regions that do not represent skin. In addition, the extracted skin feature is yet to be combined with other features for the dynamic hand gesture segmentation method.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.


References

[1] H. Li and M. Greenspan, "Model-based segmentation and recognition of dynamic gestures in continuous video streams," Pattern Recognition, vol. 44, no. 8, pp. 1614-1628, 2011.

[2] M. R. Mahmoodi and S. M. Sayedi, "Leveraging spatial analysis on homogonous regions of color images for skin classification," in Proceedings of the 4th International Conference on Computer and Knowledge Engineering (ICCKE '14), pp. 209-214, IEEE, Mashhad, Iran, October 2014.

[3] H. Duan and Y. Luo, "A method of gesture segmentation based on skin color and background difference method," in Proceedings of the 2nd International Conference on Computer Science and Electronics Engineering, Atlantis Press, Hangzhou, China, March 2013.

[4] A. S. Ramakrishnan and M. Neff, "Segmentation of hand gestures using motion capture data," in Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems 2013 (AAMAS '13), pp. 1249-1250, ACM, Saint Paul, Minn, USA, May 2013.

[5] C.-L. Hwang, K.-D. Lu, and Y.-T. Pan, "Segmentation of different skin colors with different lighting conditions by combining graph cuts algorithm with probability neural network classification, and its application," Neural Processing Letters, vol. 37, no. 1, pp. 89-109, 2013.

[6] E. Stergiopoulou, K. Sgouropoulos, N. Nikolaou, N. Papamarkos, and N. Mitianoudis, "Real time hand detection in a complex background," Engineering Applications of Artificial Intelligence, vol. 35, pp. 54-70, 2014.

[7] M. R. Mahmoodi and S. M. Sayedi, "A comprehensive survey on human skin detection," International Journal of Image, Graphics and Signal Processing (IJIGSP), vol. 8, no. 5, p. 1, 2016.

[8] M. R. Mahmoodi, S. M. Sayedi, and F. Karimi, "Color-based skin segmentation in videos using a multi-step spatial method," Multimedia Tools and Applications, pp. 1-17, 2016.

[9] Z. Qiu-yu, L. Jun-chi, Z. Mo-yi, D. Hong-xiang, and L. Lu, "Hand gesture segmentation method based on YCbCr color space and K-means clustering," Interaction, vol. 8, pp. 106-116, 2015.

[10] W. R. Tan, C. S. Chan, P. Yogarajah, and J. Condell, "A fusion approach for efficient human skin detection," IEEE Transactions on Industrial Informatics, vol. 8, no. 1, pp. 138-147, 2012.

[11] A. Gupta and A. Chaudhary, "Robust skin segmentation using color space switching," Pattern Recognition and Image Analysis, vol. 26, no. 1, pp. 61-68, 2016.

[12] H.-S. Yeo, B.-G. Lee, and H. Lim, "Hand tracking and gesture recognition system for human-computer interaction using low-cost hardware," Multimedia Tools and Applications, vol. 74, no. 8, pp. 2687-2715, 2015.

[13] M. S. M. Asaari, B. A. Rosdi, and S. A. Suandi, "Adaptive kalman filter incorporated eigenhand (AKFIE) for real-time hand tracking system," Multimedia Tools and Applications, vol. 74, no. 21, pp. 9231-9257, 2014.

[14] M. Kawulok, J. Kawulok, J. Nalepa, and B. Smolka, "Hybrid adaptation for detecting skin in color images," Intelligent Data Analysis, vol. 20, no. 1, pp. S121-S139, 2016.

[15] J. A. Sethian, Level Set Methods and Fast Marching Methods, vol. 3 of Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, UK, 2nd edition, 1999.

[16] J. A. Sethian, "A fast marching level set method for monotonically advancing fronts," Proceedings of the National Academy of Sciences of the United States of America, vol. 93, no. 4, pp. 1591-1595, 1996.

[17] J. Yang, W. Lu, and A. Waibel, "Skin-color modeling and adaptation," in Computer Vision--ACCV'98: Third Asian Conference on Computer Vision Hong Kong, China, January 8-10, 1998 Proceedings, Volume II, vol. 1352 of Lecture Notes in Computer Science, pp. 687-694, Springer, Berlin, Germany, 1997.

[18] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, "Face detection in color images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696-706, 2002.

[19] X. Zou, H. Wang, H. Duan, and Q. Zhang, "A hand model updating algorithm based on mean shift," in Proceedings of the International Conference on Information Computing and Applications, pp. 651-660, Springer, Singapore, August 2013.

[20] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, Kauai, Hawaii, USA, December 2001.

[21] R. Lienhart and J. Maydt, "An extended set of Haar-like features for rapid object detection," in Proceedings of the International Conference on Image Processing (ICIP '02), USA, September 2002.

[22] R. Monneau, "Introduction to the fast marching method," Tech. Rep., Centre International de Mathematiques Pures et Appliquees, Nice, France, 2010.

[23] J. Park, Y. Lee, and H. Ko, "Dynamic time warping based identification using gabor feature of adaptive motion model for walking humans," International Journal of Control, Automation and Systems, vol. 7, no. 5, pp. 817-823, 2009.

[24] R. Khan, A. Hanbury, J. Stottinger, and A. Bais, "Color based skin classification," Pattern Recognition Letters, vol. 33, no. 2, pp. 157-163, 2012.

[25] M. R. Mahmoodi, S. M. Sayedi, F. Karimi, Z. Fahimi, V. Rezai, and Z. Mannani, "SDD: a skin detection dataset for training and assessment of human skin classifiers," in Proceedings of the 2nd International Conference on Knowledge-Based Engineering and Innovation (KBEI '15), pp. 71-77, November 2015.

[26] M. S. M. Asaari, B. A. Rosdi, and S. A. Suandi, "Intelligent biometric group hand tracking (IBGHT) database for visual hand tracking research and development," Multimedia Tools and Applications, vol. 70, no. 3, pp. 1869-1898, 2014.

[27] D. Chai and K. N. Ngan, "Face segmentation using skin-color map in videophone applications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 4, pp. 551-564, 1999.

Eman Thabet, Fatimah Khalid, Puteri Suhaiza Sulaiman, and Razali Yaakob

Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia

Correspondence should be addressed to Eman Thabet;

Received 22 November 2016; Revised 4 January 2017; Accepted 26 January 2017; Published 2 March 2017

Academic Editor: Martin Reisslein

Caption: FIGURE 1: Skin colour segmentation scheme.

Caption: FIGURE 2: Using detected nose region of user in YCbCr to train skin threshold factors online.

Caption: FIGURE 5: The output of face and nose regions detection using Viola-Jones algorithm.

Caption: FIGURE 6: Sample of skin values extracted from the nose region: (a) threshold values derived from the nose region in the Cb image represented as minCb and maxCb and (b) threshold values derived from the nose region in the Cr image represented as minCr and maxCr.

Caption: FIGURE 7: Skin colour segmentation using online training procedure before and after applying FMM.

Caption: FIGURE 8: Experimental results of skin colour segmentation using offline training thresholds procedure.

Caption: FIGURE 9: Sample of skin segmentation after removing face region.

Caption: FIGURE 10: Comparison between different skin colour segmentation methods based on pretrained thresholds in Cb-Cr components.

TABLE 1: Comparison based on computational time for skin segmentation scheme.

Method                                      Time (seconds)

The proposed skin segmentation scheme       0.79149
The skin segmentation scheme of [13]        0.7724

FIGURE 3: Manually extracted skin colour samples as maximum and minimum values in each Cb and Cr component from face region of eleven users.

          1     2     3     4     5     6     7     8     9    10    11

Cb_max   119   115   116   117   117   121   117   125   119   127   121
Cb_min   117   114   113   113   113   100   114   123   115   118   116
Cr_max   147   147   149   149   149   148   147   141   146   148   165
Cr_min   143   145   145   143   142   138   145   140   143   134   140

FIGURE 4: The threshold factors obtained using offline training procedure for skin segmentation.

                   Cr_min   Cr_max   Cb_min   Cb_max

Final thresholds    134      164      100      126
thr1                134      165      100      127
thr2                138      149      113      125
COPYRIGHT 2017 Hindawi Limited

Article Details
Title Annotation:Research Article
Author:Thabet, Eman; Khalid, Fatimah; Sulaiman, Puteri Suhaiza; Yaakob, Razali
Publication:Advances in Multimedia
Article Type:Report
Date:Jan 1, 2017