
Salient object detection using a segmentation process.

INTRODUCTION

In recent years, images have gained attention on social media owing to the spread of smartphones and digital cameras. Rather than processing an entire image, identifying its informative objects is of high importance, and much research has been carried out on salient object detection and salient object segmentation. In the fields of image analysis, image retrieval, object recognition, computer vision and multimedia research, object segmentation is one of the most important and demanding problems. Existing methods [1-5], reviewed in Section II, suffer from limitations such as poor visual contrast, restricted applicability, missed regional features, limited detection accuracy, and failure on images containing long-range homogeneous salient regions. This paper aims to overcome these limitations.

According to their information processing mechanisms, existing saliency estimation algorithms can be broadly categorized as either bottom-up (stimuli-driven) or top-down (goal-directed) approaches. Without specific task guidance, saliency detection relies largely on the bottom-up model, which focuses mainly on low-level visual features such as intensity, pattern or orientation computed from pixels or regions. The most representative principle is centre-surround contrast, which measures the saliency of image regions by their distinctness from their surroundings. The contrast can be investigated from a local or global perspective according to the scale of the neighbourhoods. Additionally, since the size of the objects of interest is not known in advance, the contrast is usually computed at multiple scales.

On the contrary, the top-down model is closely related to a specific task and the resulting map indicates the possible position where the salient objects in an image are likely to occur.

Compared to the bottom-up model, which lacks high-level knowledge, the top-down model can quickly and efficiently find the salient object if basic properties of the target, such as colour or shape, are known beforehand. Nevertheless, the salient object class must be a particular species contained in the training images, which seriously restricts its application. Recently, works belonging to the bottom-up class have made significant progress, as it is easy and fast to select attentional regions for subsequent image processing. Consequently, an enormous number of computational models have been proposed to estimate, within different frameworks, the difference of a region from its neighbourhoods in an image.

The main objective of this paper is to develop a segmentation strategy that can effectively describe the boundaries of the Region of Interest (ROI). Two approaches are widely used in segmentation algorithms: bottom-up and top-down. This paper uses the bottom-up approach, the stimuli-driven mechanism, which focuses mainly on intensity, orientation and other visual features from pixels or regions. The contrast can be explored from a local or global outlook according to the scale of the neighbourhoods. Hence, instead of pixel-to-pixel mapping, region mapping is implemented in order to segment objects with well-defined boundaries.

The rest of this paper is organized as follows. Section II explains the related work and the contributions of this work for salient object detection. Section III presents the concept of the algorithm and the technique used. Experimental results are described in Section IV. Finally, conclusions are presented in Section V.

II. Related Work and Contributions:

A. Related Work:

Among the existing methods, Chanho Jung and Changick Kim [1] proposed a graph-cut-based segmentation algorithm that extracts the salient object from the scene using an improved spectral-domain saliency detection method; however, it suffers from poor visual contrast. Imen Karoui et al. [2] used a region-level variational approach for segmentation, whose drawback is that it does not carry over to other computer vision applications. Jin-Gang Yu et al. [3] used a maximal entropy random walk (MERW) model to measure saliency, which is more efficient than the previous methods but cannot detect more regional features. Hao Du et al. [4] proposed a saliency-guided colour-to-gray conversion method, which shows limited accuracy in the performance evaluation of saliency detection and object detection. Based on the relationship between saliency estimation and Markov absorption probability, Jingang Sun et al. [5] introduced an algorithm that separates the salient region from its neighbourhood; its only drawback is that it does not work well for images containing long-range homogeneous salient regions. Hence this paper aims at overcoming all these limitations.

B. Contributions:

The contributions of the proposed method include a modified Fuzzy C-Means (FCM) and an expectation-maximization clustering method, which segment the image with high accuracy. Features are extracted effectively by the HOG and DOG feature extraction processes. In the filtering stage, an enhanced adaptive bilateral filter is used, which effectively removes noise from the image. The SIFT transform is used to segment the object of interest from video files.

III. Technical Approach:

A. Representation:

A salient object is generally represented as a binary mask A = {a_x}. For each pixel x, a_x ∈ {0, 1} is a binary label indicating whether the pixel x belongs to the salient object or not.
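As a minimal illustration of this representation, a binary mask A can be obtained by thresholding a saliency map; the 4 x 4 map and the 0.5 threshold below are hypothetical values, not the paper's data:

```python
import numpy as np

# Hypothetical 4x4 saliency map with values in [0, 1].
saliency = np.array([
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.8, 0.9, 0.2],
    [0.0, 0.1, 0.2, 0.1],
])

# Threshold the map to obtain the binary label a_x for every pixel x:
# 1 marks pixels of the salient object, 0 marks background.
A = (saliency > 0.5).astype(np.uint8)
```

Here the four central high-saliency pixels become the object region and everything else is background.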

B. Work flow:

This section presents the workflow of the segmentation process, as represented in Fig. 1. The existing methods are based on the similarity of features: the background across the image is matched with the boundary of the image.

The input image is taken from the MSRA dataset, which contains thousands of images for salient object detection. The input image is converted to double precision in order to increase the precision of the computations; this conversion increases the size of the image. Then the row, column and dimension of the image, i.e. the RCD value, are calculated. The image then undergoes an enhancement process, in which an enhanced adaptive bilateral filter with a window size of 2 is used, followed by the segmentation process. The M x N segmented image then undergoes feature extraction.
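The enhancement step can be sketched as follows. This is a plain bilateral filter rather than the paper's enhanced adaptive variant, and the fixed sigma values and the noisy test image are illustrative assumptions:

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.2):
    """Plain bilateral filter over a (2*radius + 1) square window.

    Sketch only: the paper uses an *enhanced adaptive* bilateral filter;
    these fixed sigmas are illustrative, not the authors' settings.
    """
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Spatial (domain) Gaussian weights for the window, computed once.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
    padded = np.pad(img.astype(np.float64), radius, mode="edge")
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range kernel: down-weight pixels whose intensity differs
            # from the centre pixel, which is what preserves edges.
            range_w = np.exp(-((patch - img[i, j]) ** 2) / (2 * sigma_r ** 2))
            weights = spatial * range_w
            out[i, j] = (weights * patch).sum() / weights.sum()
    return out

# Noisy step edge: the flat sides get smoothed while the edge stays sharp.
gen = np.random.default_rng(0)
step = np.hstack([np.zeros((8, 8)), np.ones((8, 8))])
noisy = step + 0.05 * gen.standard_normal(step.shape)
smoothed = bilateral_filter(noisy)
```

The range kernel is what distinguishes this from plain Gaussian smoothing: pixels across the step edge receive negligible weight, so noise is removed without blurring the object boundary.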

Feature extraction involves the HOG and DOG processes. HOG feature extraction takes place as follows [7].

The generalized histogram equalization formula is given in equation (1) below:

h(v) = round( (cdf(v) - cdf_min) / ((M x N) - cdf_min) x (L - 1) )    (1)

where h(v) is the equalized intensity, normalized to [0, 255]; cdf_min denotes the minimum non-zero value of the cumulative distribution function; M x N gives the number of pixels in the image; and L gives the number of gray levels used. The gradient of the image is given as

∇f = [df/dx, df/dy]    (2)

where df/dx is the gradient in the x direction and df/dy is the gradient in the y direction.

Hence, by using equation (2), the gradient information of the image is extracted. The gradient direction can be calculated using equation (3) below:

θ = tan^-1( (df/dy) / (df/dx) )    (3)

Hence, by this process, the DOG feature is extracted.
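Equations (1)-(3) can be sketched as below. The function names `equalize_histogram` and `gradients` are illustrative, and a full HOG descriptor would additionally bin the orientations into cell histograms:

```python
import numpy as np

def equalize_histogram(img, levels=256):
    """Histogram equalization following equation (1):
    h(v) = round((cdf(v) - cdf_min) / ((M x N) - cdf_min) x (L - 1))."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()       # minimum non-zero CDF value
    total = img.size                    # M x N pixels
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * (levels - 1))
    return np.clip(lut, 0, levels - 1).astype(np.uint8)[img]

def gradients(img):
    """Gradients df/dx, df/dy of equation (2) and direction of equation (3)."""
    gy, gx = np.gradient(img.astype(np.float64))  # df/dy, df/dx
    magnitude = np.hypot(gx, gy)
    theta = np.arctan2(gy, gx)          # gradient direction in radians
    return magnitude, theta

# Intensity ramp along x: gradient magnitude 1, direction 0 (pointing in +x).
ramp = np.tile(np.arange(8, dtype=np.float64), (8, 1))
mag, theta = gradients(ramp)

# Two-level toy image: equation (1) stretches the occupied gray levels
# across the full [0, L-1] range.
toy = np.array([[0, 0], [128, 255]], dtype=np.uint8)
eq = equalize_histogram(toy)
```

On the ramp, every pixel has unit gradient magnitude and zero orientation, matching equations (2) and (3); on the toy image, the lowest occupied level maps to 0 and the highest to 255.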

After the feature extraction process, the clustering process takes place. The image is first separated into its RGB (red, green and blue) channels. The algorithm for the clustering process is given below.

Algorithm:

The algorithm for the modified FCM is as follows,

1. Initialize the class.

2. Specify the weight index for the clustering. Here it is given as c = 3.

3. Calculate the minima of the pixel (mipix).

4. Using the minima values, calculate the posterior probability using Bayes' rule.

5. Calculate the length of the matrix.

6. Estimate the maxima of the image pixel and then histogram of the maxima pixel by adding 1.

7. Estimate the intensity pixels by computing the histogram of each pixel.

8. Centre point is calculated using histograms.

9. Calculate the absolute centroid position between central point and the location of the intensity pixels.

10. Find the locations of classified pixels.

11. Calculate the mean of the classified pixels and then the absolute value of the image.

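The fuzzy C-means core on which the modified algorithm builds can be sketched as follows. This is only the standard FCM update on scalar intensities: the histogram and Bayes-rule steps listed above are not reproduced, and the quantile initialization is an illustrative assumption rather than the paper's procedure:

```python
import numpy as np

def fuzzy_c_means(values, c=3, m=2.0, iters=50):
    """Standard fuzzy C-means on scalar pixel intensities.

    Sketch of the clustering core only; c is the number of classes
    (c = 3 as in step 2 above) and m is the fuzziness exponent.
    """
    values = np.asarray(values, dtype=np.float64).ravel()
    # Initialize centres at spread-out quantiles of the data (an assumed
    # initialization, chosen for stability of this sketch).
    centers = np.quantile(values, np.linspace(0.05, 0.95, c))
    for _ in range(iters):
        dist = np.abs(values[None, :] - centers[:, None]) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0)           # membership of each pixel per class
        um = u ** m
        centers = (um @ values) / um.sum(axis=1)  # weighted-mean centre update
    return centers, u.argmax(axis=0)

# Three well-separated intensity groups should be recovered as three classes.
pixels = np.concatenate([np.full(50, 10.0),
                         np.full(50, 120.0),
                         np.full(50, 240.0)])
centers, labels = fuzzy_c_means(pixels, c=3)
```

Unlike hard k-means, every pixel keeps a graded membership in all classes until the final argmax, which is what lets FCM handle ambiguous boundary pixels gracefully.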

By applying this algorithm, the salient object is segmented with high accuracy. With the flourishing rise of video data, automatic and coherent extraction of the salient object from video is quite challenging and very important. The method used to segment the object in video files is explained here. The block diagram that describes the segmentation process is given in Fig. 2.

The blocks given in Fig. 2 describe the process involved in video image segmentation. The input is taken from the safari dataset, which consists of many videos for single-object segmentation. A few datasets for video image segmentation are given in Table 1. In the pre-processing step, the noise is reduced and global characteristics such as scene cuts and global motion are extracted. The texture analysis module plays a vital role in video analysis: the video is converted to frames, from which spatial features such as edges, regular textures and colour are recognized. The motion analysis module then uses the SIFT process, which performs the analysis in the temporal domain in order to provide the motion information. The frame with the maximum variation in motion is selected for the segmentation process. Finally, the object is segmented with well-defined boundaries.
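The selection of the maximum-motion frame can be illustrated as follows. As a simplification, motion here is scored by the mean absolute difference between consecutive frames rather than by SIFT-based temporal analysis, and the synthetic clip is a made-up example:

```python
import numpy as np

def select_max_motion_frame(frames):
    """Return the index of the frame with maximum inter-frame change.

    Simplified stand-in for the motion-analysis module: motion is scored
    by mean absolute difference between consecutive frames, not by SIFT.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # diffs[k] compares frames k and k+1 over all pixels.
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    return int(diffs.argmax()) + 1  # +1 so the index names the changed frame

# Synthetic 5-frame clip: a bright square drifts slowly, then jumps far
# between frames 1 and 2, which should be flagged as the max-motion frame.
frames = np.zeros((5, 16, 16))
for t, x in enumerate([2, 3, 10, 11, 12]):  # object column per frame
    frames[t, 6:10, x:x + 4] = 1.0
idx = select_max_motion_frame(frames)  # → 2
```

The frame right after the large jump scores the highest motion and would be the one passed to the segmentation stage.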

Thus the SIFT transform is used, which is an excellent descriptor and achieves higher accuracy than the previous existing methods.

IV. Results:

There are many databases for salient object detection, such as MSRA, CCSD and DUT-OMRON. The databases and their details are given in Table 1 [6]. For video object segmentation, safari and Freiburg-Berkeley are among the available datasets.

The curve in Fig. 4 is the Receiver Operating Characteristic (ROC) curve, plotted with the true positive rate on the Y axis and the false positive rate on the X axis. It shows the difference in accuracy between the existing and the proposed methods. For video object detection, the input video sequence is split into frames, as shown in Fig. 5(a).
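Each point of the ROC curve corresponds to one threshold of the saliency map. A single (TPR, FPR) point can be computed from a binarized prediction and the ground-truth mask as below; the 4-pixel masks are toy examples, not MSRA data:

```python
import numpy as np

def roc_point(pred_mask, truth_mask):
    """True/false positive rates for one binarized saliency map.

    Sweeping the saliency threshold and collecting these points
    traces out an ROC curve (TPR on the Y axis, FPR on the X axis).
    """
    pred = np.asarray(pred_mask).astype(bool).ravel()
    truth = np.asarray(truth_mask).astype(bool).ravel()
    tp = np.count_nonzero(pred & truth)     # salient pixels found
    fp = np.count_nonzero(pred & ~truth)    # background flagged salient
    fn = np.count_nonzero(~pred & truth)    # salient pixels missed
    tn = np.count_nonzero(~pred & ~truth)   # background correctly rejected
    return tp / (tp + fn), fp / (fp + tn)

# Toy example: the prediction catches 1 of 2 salient pixels and raises
# one false alarm among the 2 background pixels.
truth = np.array([1, 1, 0, 0])
pred = np.array([1, 0, 1, 0])
tpr, fpr = roc_point(pred, truth)  # → (0.5, 0.5)
```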

From the input sequence, the frame where the maximum change in motion occurs is identified and segmented, as in Fig. 5(b). Co-segmentation is then applied to segment the main object.

The final segmented output is given in Fig. 5(c). Thus the giraffe in the video is segmented with high accuracy.

V. Conclusion and Future Work:

The proposed method produces an accuracy of about 97.66%, which is higher than that of the previous existing methods, and the segmented object is obtained with a well-defined, detailed boundary. The method can also support video content to some extent. It holds good for the dataset images; future work involves adapting the algorithm to camera images for various applications with even higher accuracy and, for video object segmentation, developing co-segmentation of multiple video objects.

REFERENCES

[1.] Chanho Jung and Changick Kim, 2012. A Unified Spectral-Domain Approach for Saliency Detection and Its Application to Automatic Object Segmentation, IEEE Transactions on Image Processing, 21(3): 1272-1283.

[2.] Imen Karoui, Ronan Fablet, Jean-Marc Boucher and Jean-Marie Augustin, 2010. Variational Region-Based Segmentation Using Multiple Texture Statistics, IEEE Transactions on Image Processing, 19(12): 3146-3156.

[3.] Jin-Gang Yu, Ji Zhao, Jinwen Tian and Yihua Tan, 2014. Maximal Entropy Random Walk for Region-Based Visual Saliency, IEEE Transactions on Cybernetics, 44(9): 1661-1672.

[4.] Hao Du, Shengfeng He, Bin Sheng, Lizhuang Ma and Rynson W.H. Lau, 2015. Saliency-Guided Color-to-Gray Conversion Using Region-Based Optimization, IEEE Transactions on Image Processing, 24(1): 434-443.

[5.] Jingang Sun, Huchuan Lu and Xiuping Liu, 2015. Saliency Region Detection Based on Markov Absorption Probabilities, IEEE Transactions on Image Processing, 24(5): 1639-1649.

[6.] Ali Borji, 2015. What is a Salient Object? A Dataset and a Baseline Model for Salient Object Detection, IEEE Transactions on Image Processing, 24(2): 742-755.

(1) Anita Titus and (2) Sakthi Preethi R

(1, 2) Electronics and Communication Engineering, Easwari Engineering College, Chennai, India.

Received 7 June 2016; Accepted 12 October 2016; Available 20 October 2016

Address For Correspondence:

R. Thamizhselvan. Assistant Professor, Department of Electrical Engineering, Annamalai University, Tamil nadu -608002, India.

E-mail: tamil2012au@gmail.com, phone: 8015333735

Caption: Fig. 1: Overview of the proposed system

Caption: Fig. 2: Block for video object segmentation

Caption: Fig. 3: Saliency-based object detection. (a) Original image. (b) Filter output image. (c) Amount of noise removed from the original image. (d) Image enhancement output. (e) Edge detection in the object. (f) Result of clustering. (g) Ground truth.

The evaluation of the proposed algorithm is done using the MSRA database, which contains 10,000 images. Fig. 3 illustrates the saliency object detection process; the final ground truth is shown in Fig. 3(g). The output image obtained has an accuracy of about 97.66%, which is higher than the previous existing methods.

Caption: Fig. 4: ROC Curve

Caption: Fig. 5: (a) Input sequence of the video

Caption: Fig. 5: (b) Segmentation process

Caption: Fig. 5: (c) Final output

Table 1: Datasets for salient object detection

NAME OF THE DATABASE   NO. OF IMAGES   RESOLUTION OF THE IMAGES
MSRA10K                10,000          400 x 300
CCSD                   ~200            400 x 300
DUT-OMRON              5,172           400 x 400
ECSSD                  10,000          400 x 300
PASCAL                 ~850            Variable
UCSB                   700             800 x 600
SOD                    ~300            481 x 321
COPYRIGHT 2016 American-Eurasian Network for Scientific Information

Article Details
Author:Titus, Anita; Sakthi, Preethi R.
Publication:Advances in Natural and Applied Sciences
Article Type:Report
Date:Sep 15, 2016
Words:2354
