Complexity analysis of vision functions for comparison of wireless smart cameras.

1. Introduction

Vision systems implemented on wireless smart cameras have recently been the focus of research for many applications including surveillance [1], recognition [2], traffic monitoring, personal care [3], and industrial monitoring [4]. Often, a number of wireless smart cameras are spread over an area to form a network called a Wireless Vision Sensor Network (WVSN). In the WVSN context, the individual wireless smart camera is referred to as a Wireless Vision Sensor Node (VSN). Each VSN consists of an image sensor, an embedded processing platform, memory, battery or an alternative energy source, and a wireless link. Designing a VSN on an embedded platform is a challenging task because resources are limited compared to those for general purpose platforms. General purpose platforms offer greater design and implementation flexibility; however, these systems are often considered as being unsuitable for real time applications. Therefore, our focus is on vision systems implemented on embedded platforms. When designing a VSN for a particular application, the designers must firstly investigate the complexity of the design, and a failure to do this may result in both increased design costs and a longer developmental cycle. The vision functions can be implemented on microprocessors, dedicated hardware such as Application Specific Integrated Circuits (ASICs), or on programmable hardware such as Field Programmable Gate Arrays (FPGAs).

Microprocessors are widely used as General Purpose Processors (GPPs) and Application Specific Instruction-set Processors (ASIPs). GPPs target average performance and greater flexibility, while ASIPs, such as the Digital Signal Processor (DSP), target specific performance and specific flexibility. Both GPPs and ASIPs are software based, so it is easy to verify the correctness of the code and the system performance by means of software simulation. The performance of microprocessor based platforms is often limited compared with that of ASICs and reconfigurable hardware platforms because, in general purpose processors, every instruction must be fetched and decoded before execution [5].

An ASIC is customized for a specific application; it therefore provides better performance and lower power consumption, but the design cost is high because of the nonrecurring engineering (NRE) and manufacturing costs associated with small volumes. Programmable logic devices (PLDs), which include Field Programmable Gate Arrays (FPGAs), are replacing traditional logic circuits by offering small size, low power, and high performance without the disadvantages associated with custom ASICs. Reconfigurable computing allows the user to program at a low level and supports general purpose computing by virtue of reconfigurability [5]. The choice of processing platform for a particular system depends on the performance requirements and constraints of the particular application. Features such as lower cost for low volume products, reconfigurability, and parallelism make the FPGA a suitable platform for wireless smart cameras. Therefore, in this work we consider complexity estimation for wireless smart camera functions on FPGA based platforms.

Currently, the trend in WVSN is to propose specific solutions, and each solution requires a great deal of design and development effort and cost on FPGA based platforms. Proposing generic solutions that fulfil the needs of different applications would reduce this effort and cost. In this way, the effort spent on individual solutions can be diverted to the development of single optimized solutions. To propose generic solutions from the existing individual solutions, the investigated solutions need to be implemented on each other's architectures, which requires a great deal of design and development effort. Our goal is to propose a mechanism, in the form of a complexity model and a system taxonomy, which can assist the comparison of existing solutions and the development of generic solutions without the need for actual implementation. To the best of our knowledge, no model exists which can assist researchers and designers in the comparison and generalization of different smart camera solutions.

The comparison of vision solutions is necessary for the improvement of current research and for proposing generic solutions within this field. In the complexity analysis, the arithmetic complexity, memory requirement, and suitability of vision functions for vision systems are investigated. To illustrate the use of the complexity model, a number of actual systems are classified with the assistance of the system taxonomy and their resources are then estimated by using the complexity model. After the complexity estimation, the vision systems are compared and evaluated for implementation on a single generic architecture. It is worth mentioning that this paper is an extended version of [6]. The remainder of the paper is organized as follows: Section 2 presents related work, Section 3 describes the problem, Section 4 presents the complexity model, Section 5 presents the case study, Section 6 compares the vision systems, Section 7 outlines future challenges, and Section 8 provides the conclusion.

2. Related Work

Before describing the problem area, some of the related work published in the literature is presented in this section. Researchers have proposed different solutions for WVSN problems; currently, the focus is on particular solutions for particular problems. Hengstler and Aghajan [3] proposed an application oriented design methodology for a VSN. However, the authors consider only the specific case of tracking objects using a single camera and a stereovision VSN. Dieber et al. [7] presented a formulation and approximation method for camera selection and task assignment in order to achieve the required monitoring activities, and investigated the tradeoff between surveillance quality and resource utilization. The design aspects and software modelling for a ubiquitous real time camera system are investigated by Lin et al. [8]. The authors divide the design aspects into two categories, namely, the general and the application specific. A taxonomy for VSNs is proposed by Rinner et al. [9] based on platform capabilities, the degree of distributed processing, and system autonomy. Some of the existing vision systems implemented on VSNs are described and classified according to the proposed taxonomy. In addition, some of the challenges associated with VSNs are described. Tilak et al. [10] classified wireless microsensor networks from a communication protocol perspective. Different types of communication functions, data delivery models, and network dynamics are discussed for wireless sensor networks; however, there is no discussion in relation to camera based sensor systems. Generally, camera based sensors produce two-dimensional data, which requires greater processing capabilities, more memory, higher power consumption, and a higher communication bandwidth than traditional sensor networks. Therefore, the requirements of visual sensor networks are different.

3. Problem Statement and State of the Art

An investigation of the related work shows that the focus is on particular aspects of the vision systems. There is no common mechanism for complexity estimation and for comparison of different vision systems, which is necessary for proposing generic solutions and for the improvement of research within this field. When comparing two different vision systems, each vision system must be implemented on the other's architecture, as depicted in Figure 1. This requires a great deal of effort in relation to both design and development. In some cases, a researcher does not have access to another researcher's architecture. As the number of systems to be compared increases, it becomes less feasible to implement all the vision systems. This necessitates the building of a common tool which can assist researchers in both the benchmarking and the development of different classes of vision systems. In order to meet this demand, we have proposed an abstract model for complexity analysis with the assistance of a system taxonomy.

The mechanism that has been adopted in order to develop this model is shown in Figure 2. The problem space is identified, which is to propose a mechanism or at least an abstract model for comparison of different wireless smart camera systems. In relation to this, a number of published vision systems, wired and wireless, individual standalone vision systems, and vision nodes in Wireless Vision Sensor Networks (WVSNs) have been surveyed and a large number of functions were extracted. Similar functions were grouped together to make an abstraction of vision functions. The abstracted vision functions are then used to develop our proposed system taxonomy. The vision functions are used for building the taxonomy because the majority of vision applications in wireless smart cameras focus on target detection, analysis, and the recognition of objects present in the field of view [11]. The taxonomy building is an iterative process and it can be modified to accommodate future developments within the field so there are back and forth arrows. The taxonomy may not be exhaustive, but it does cover both the fundamental and the common vision functions which are required for a wide range of VSN vision systems. The study [12] showed that our proposed taxonomy covers 95 percent of the investigated systems. After the system taxonomy, the complexity in terms of arithmetic operations and memory of vision functions are investigated for the complexity model. The complexity model together with the taxonomy can be used for comparison and development of a generic architecture for different classes of VSN. The proposed taxonomy is shown in Figure 3. In this, some of the functions are labeled by capital letters, which are further expanded in Figures 4, 5, 6, 7, 8, 9, and 10.

The system taxonomy is grouped into 9 levels including data triggering type, data sources, data depth, learning, storage requirement, vision processing, postprocessing, output type, and feedback. A VSN is categorized into three types, based on how the system is triggered to start processing. In time driven systems, the processing is performed after a certain time duration. In event/query driven systems, processing starts when a system is triggered by a certain event or query. Periodic systems start processing after a fixed duration of time. The system can receive data using three types of sources including: area scan, line scan, or from another system. Images can be captured in binary, grey scale, or in colour format. Conversion from one format to another would require additional resources. Therefore, a better strategy is to capture the image in the relevant format. For some applications, the systems learn about the environment in order to adapt to it, while, in some applications, there is no need for learning. Similarly, there could be a requirement for storage in some systems in order to store frames, for example, for subtraction, for temporal filtering, or for template matching, while in others, there is no need of storage.

The vision processing level in Figure 3 shows the abstraction of vision functions which, while not exhaustive, include typical processing functions required for VSN. After the processing level, postprocessing occurs, which includes functions for the reformatting of data, that is, compression algorithms in order to make the output data suitable for communication purposes. Note that, in general, functions have an alternative path, which is able to circumvent them, in order to show that they might not be required for some VSNs. A VSN is also characterized by the output type it produces. The output can be a matrix, vector, scalar, or flag and can be sent directly to the user or can be used for feedback. The dashed line represents the system's flow for one experimental system, presented in Section 5 for illustration purposes. Following on from the system taxonomy, the complexity analysis of vision functions used in the system taxonomy is presented.

4. Complexity Model

The complexity analysis is a challenging task due to the large number of influencing factors such as the specific requirements of an application, the number of vision functions and the external environment. There is no standard definition in relation to measuring the vision system complexity.

However, in order to provide an abstract model, we have investigated both the arithmetic complexity and the memory requirements of the vision functions, employed in different classes of VSN systems with the assistance of the system taxonomy. The complexity analysis for some of the vision functions depends both on the situation and on the incoming data from the previous stage. Therefore, the absolute parameter for complexity measurements is a challenging task and it is not intuitive to draw quantitative conclusions. In these cases, we have analyzed and discussed the suitability of the functions for VSNs. Other parameters, such as registers and latency, are design dependent. The abstract model of complexity analysis is provided in Table 1. It is worth mentioning that for some tasks there are a number of algorithms, each with varying complexity. In this paper, we have investigated the complexity of functions that are commonly used in wireless smart camera systems [12]. However, in this case, the system taxonomy needs to be updated periodically in order to make it more exhaustive for the current systems. Nonetheless, this complexity model can be used together with the system taxonomy for classification, comparison, and complexity estimation of vision systems, implemented on VSNs. Equation (1) is used in Table 1 with the Hough transform, while (2) is used with labelling:

$$\rho(x, y) = x \cos\theta + y \sin\theta, \tag{1}$$

$$\mathrm{mem} = \mathrm{mem}_{BUF} + \mathrm{mem}_{LOOKUP} + \mathrm{mem}_{DATA}, \tag{2}$$
where $\mathrm{mem}_{BUF} = \log_2(CC_{max} + CC_{col} + 1) \times C$, $\mathrm{mem}_{LOOKUP} = \mathrm{mem}_{DATA}$, and $\mathrm{mem}_{DATA} = 2 \times \log_2(R)$.

In (2), $C$ is the number of columns, $R$ is the number of rows, $CC_{max}$ is the maximum number of connected components, $CC_{col}$ is the number of label collisions, and the $+1$ is needed because label 0 is already occupied [13]. Following this, the complexity on actual hardware is discussed.
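As an illustration of how Equation (1) turns into arithmetic operations, the following minimal sketch votes in a Hough accumulator and counts the multiplications and additions spent on evaluating (1). The function name `hough_votes`, the angular resolution `n_theta`, and the rounding of rho to integer bins are assumptions made for this example and are not taken from the paper or from Table 1.

```python
import math


def hough_votes(edge_pixels, n_theta=180):
    """Accumulate votes using rho = x*cos(theta) + y*sin(theta), Equation (1).

    n_theta and the integer rounding of rho are illustrative assumptions.
    """
    # Precompute cos/sin per angle so each evaluation of (1) costs
    # two multiplications and one addition.
    trig = [(math.cos(math.pi * k / n_theta), math.sin(math.pi * k / n_theta))
            for k in range(n_theta)]
    votes = {}
    mults = adds = 0
    for x, y in edge_pixels:
        for k, (c, s) in enumerate(trig):
            rho = round(x * c + y * s)
            mults += 2
            adds += 1
            votes[(k, rho)] = votes.get((k, rho), 0) + 1
    return votes, mults, adds


# Five edge pixels on the vertical line x = 3.
pixels = [(3, y) for y in range(5)]
votes, mults, adds = hough_votes(pixels)
print(votes[(0, 3)], mults, adds)  # prints: 5 1800 900
```

In this sketch the cost of evaluating (1) grows with the number of edge pixels times the number of angles, which is why the complexity of the Hough transform depends on the data arriving from the previous stage.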

4.1. Complexity on Different Processing Platforms. Early vision processing tasks, such as background subtraction and spatial filtering, have inherent parallelism. This parallelism can be exploited by using either a hardware platform or multicore processors. With the advancements in technology, multicore parallel machines have now spread from supercomputing to embedded computing, and their recent evolution has drawn the attention of researchers. There is great potential for multicore systems to be used for WVSN, as the raw performance increases come from the increasing number of cores instead of the frequency. This approach will result in low power consumption [14]. However, there are different challenges in exploiting the parallelism in this emerging technology. The challenges include the parallel programming techniques, compilers for these architectures, and the management of the memory hierarchy. The vision libraries available for uniprocessors must be tailored for multicore processors in order to exploit the parallel nature of image processing operations such as spatial filtering [5,15].

The implementation of vision processing on reconfigurable platforms offers performance that is competitive with ASICs while, at the same time, providing flexibility in relation to design changes. In WVSN, the application requirement is often to capture data for a particular time and then switch the node to a sleep mode so as to conserve energy. This low duty cycling approach is suitable when the platform has a small sleep power. With the advent of FLASH based FPGAs with a small sleep power consumption [4] and the development of techniques to use SRAM based FPGAs [16] effectively for duty cycled applications, reconfigurable platforms are the choice for data intensive tasks in WVSN. Uniprocessors such as embedded processors are commonly employed for vision processing because of the availability of ready-to-use libraries. In our previous work [17], a system has been implemented on both software and hardware platforms.

The functions of this system can be used to compare the processing complexity on the software and hardware platforms. By software, we mean a microcontroller implementation and, by hardware, an FPGA implementation. The vision functions used in the system include background subtraction, segmentation, morphology, filtering, labelling, feature extraction, and compression. For background subtraction, the background is stored in the FLASH memory at the initial stage and then subtracted from the current frame. After this, manual segmentation is applied in order to segment the objects from the background.

In morphology, a 3 x 3 erosion followed by dilation is applied and, in low pass spatial filtering, the previous binary frame is stored and then subtracted from the current frame to remove the unwanted objects. In labelling and feature extraction, objects are first assigned unique labels, after which feature information in the form of area and centre of gravity is calculated. The input sample image used in this experiment is shown in Figure 12(a). It has a size of 640 x 400 and contains real objects in the form of magnetic particles. These particles are used to predict failure in hydraulic machines. The processing time and energy consumed by each of the vision functions are given in Table 2. The processing times of these vision functions on the software and hardware platforms are represented by $T_{AVR32}$ and $T_{FPGA}$, respectively. The power consumption of each of the algorithms on the hardware platform is represented by $P_{FPGA}$, the energy consumption on the software platform by $E_{AVR32}$, and the energy consumption on the hardware platform by $E_{FPGA}$. Table 2 shows that, because of inherent parallelism, the vision functions require a shorter execution time on the hardware platform than on the software platform. This results in a smaller energy consumption on the hardware platform. It must be noted that implementing more vision functions on the hardware will require more design and development time.

In contrast, the software platform has a short design and development time because of the availability of ready-to-use libraries. Depending on the requirements and constraints of the application, either platform can be selected.

4.2. Energy Consumption. For battery operated wireless smart camera systems, the lifetime is an important consideration [3, 18, 19]. Battery lifetime can be extended by reducing energy consumption. The energy reduction can be achieved by reducing the processing time and/or the average power consumption. Processing time can be reduced by efficient implementation techniques and by introducing high performance embedded platforms. The power consumption in embedded platforms can be categorised into dynamic and static power. Dynamic power depends on the design and is related to switching signals from 0 to 1 or vice versa. Static power is related to power consumption when there is no switching on the signals and it is a function of the physical implementation. The dynamic power consumption is given by

$$P_{dynamic} = C f V_{dd}^{2}, \tag{3}$$

where $C$ is capacitance, $f$ is frequency, and $V_{dd}$ is voltage.

In some applications, peak performance is not always required, so the operating frequency can be reduced while the node is in a low performance mode [20]. This decreases the power consumption linearly, as is evident from the factor $f$ in (3). However, in some real time applications, the timing constraints of the system may be violated by lowering the frequency. When the frequency is reduced for a design that requires the same performance all the time, the reduction does not affect the energy consumption, because the design takes a proportionally longer time to execute at the lower frequency. Two factors, namely the design complexity and voltage scaling, can offer a reduction in energy consumption. The design complexity is related to the capacitance $C$ and voltage scaling is related to $V_{dd}$, as shown in (3).
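The argument above can be checked numerically with a minimal sketch. It applies (3) and multiplies by the execution time of a fixed workload, so the frequency cancels out of the energy. The capacitance, supply voltages, and cycle count below are hypothetical values chosen only for illustration, and the two frequencies are used simply as a convenient pair with a ratio of two.

```python
def dynamic_power(capacitance_f, frequency_hz, vdd_v):
    """Equation (3): P_dynamic = C * f * Vdd^2, in watts."""
    return capacitance_f * frequency_hz * vdd_v ** 2


def energy_for_workload(capacitance_f, frequency_hz, vdd_v, cycles):
    """Dynamic energy in joules for a fixed number of clock cycles."""
    execution_time_s = cycles / frequency_hz
    return dynamic_power(capacitance_f, frequency_hz, vdd_v) * execution_time_s


# Hypothetical values, not taken from the paper.
C_SWITCHED = 1e-9      # switched capacitance in farads
VDD = 1.2              # supply voltage in volts
CYCLES = 1_000_000     # clock cycles needed by the workload

p_full = dynamic_power(C_SWITCHED, 26.6e6, VDD)
p_half = dynamic_power(C_SWITCHED, 13.3e6, VDD)
e_full = energy_for_workload(C_SWITCHED, 26.6e6, VDD, CYCLES)
e_half = energy_for_workload(C_SWITCHED, 13.3e6, VDD, CYCLES)
e_low_voltage = energy_for_workload(C_SWITCHED, 26.6e6, 1.0, CYCLES)

print(p_half / p_full)          # power halves with frequency
print(e_half / e_full)          # energy per workload stays the same
print(e_low_voltage / e_full)   # lowering Vdd reduces energy quadratically
```

Under these assumptions, halving the frequency halves the power but leaves the energy per workload unchanged, whereas reducing the capacitance (design complexity) or the voltage lowers the energy itself.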

At the system level, the voltage and frequency parameters of the VSN architecture are fixed [2, 4, 19, 21] because components such as the interconnections between devices, lighting, and memory require a fixed voltage. However, other alternatives, such as a better duty cycling approach, in-node processing, and suitable devices with low active and sleep power consumption, can be investigated for extending the battery lifetime. For in-node processing, the complexity of the vision processing algorithms can be investigated with the help of the proposed complexity model. This complexity information can be used as input for preimplementation evaluation tools such as the Xilinx XPower Estimator [22] and the Altera Early Power Estimator [23] to estimate power and resource utilization before actual implementation.

Following this, we will investigate a number of systems as a case study in order to provide a comparison of the vision systems. For this purpose, we will use the system taxonomy in order to identify a common class of systems with respect to the experimental system. After classification, a complexity model is used for resource estimation and then the vision systems are compared for implementation on a single generic architecture. The description of the experimental system is now provided.

5. Case Study: Particle Detection

To demonstrate the use of the system taxonomy and complexity model, a vision system [4], which we have developed for failure prediction of industrial machines, is selected as a reference system. The main focus in this system is to develop image processing/analysis methods to automatically detect magnetic particles, which are detached from machines and then transmit the information of these particles over a wireless link to the user. The vision function flow is shown in Figure 11 and the images at different stages of the algorithm are shown in Figure 12. By employing the approach of partitioning the vision functions between the VSN and the server [4], the vision functions such as image capturing, background subtraction, segmentation, filtering, and post-processing function, that is, ITU-T G4 compression, are performed on the VSN.

The compressed data is transmitted to the server in order to process the remainder of the vision functions. This system is classified by using the system taxonomy, depicted by means of a dashed line in Figure 3. The extended functions are shown with labels such as A for storage and B for segmentation. After the system classification, the complexity of vision functions is analyzed with the assistance of the complexity model of Table 1 and the outcome of this analysis is concluded in Table 3. After the classification and complexity analysis of the reference system, the architecture for this system is presented. In Section 6, we will identify systems with similar requirements with the assistance of taxonomy and will evaluate the target architecture for their implementations. In this manner, a single generic architecture can be identified/proposed for systems with similar requirements.

5.1. Target Architecture. The target architecture is presented in Figure 13. It includes a Micron CMOS imaging sensor, the MT9V032, which can be programmed through an I2C interface for different resolutions in real time. The camera has a maximum clock frequency of 26.6 MHz and is able to produce 60 frames per second. For vision processing, the Xilinx Spartan6 XC6SLX9L FPGA [28] is selected, which has 5720 logics, 32 block RAMs of 18 Kbits each, and 90 Kbits of distributed RAM. The vision functions include capturing, background subtraction from a frame stored at the initial stage in the FLASH memory, segmentation, filtering, and ITU-T G4 compression.

A serial FLASH memory [29] of 64 Mbits is used for background storage. For handling transmission, the software platform SENTIO32 [4] is used, which has a 32 bit AVR microcontroller, the AT32UC3B0256 [30], and an IEEE 802.15.4 compliant transceiver (CC2520) [31]. In the proposed target architecture, the FPGA has a static power of 12.5 mW and a dynamic power of 16.92 mW for the design, which includes algorithms such as background subtraction, filtering to remove noise, segmentation, and compression. The AVR32 microcontroller has an active power of 77.5 mW, the camera 160 mW, and the radio transceiver 132 mW. By evaluating this architecture with a small static power consumption such as 5 µW, which is claimed for the FLASH based ACTEL AGL600V5 FPGA [32], a greater lifetime can be achieved for the VSN. It is concluded in [18] that a VSN with this architecture achieves a lifetime of approximately 5 years for a sample period of 1.5 minutes.
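The relation between duty cycling and lifetime can be sketched as follows. The component powers are the values quoted above; the sleep power, the active time per sample, and the battery energy are hypothetical placeholders, and the sketch further assumes that all components draw their active power for the whole active window. It is therefore not a reproduction of the lifetime analysis in [18].

```python
# Active power of each component in mW, as quoted for the target architecture.
ACTIVE_POWER_MW = {
    "fpga_static": 12.5,
    "fpga_dynamic": 16.92,
    "avr32": 77.5,
    "camera": 160.0,
    "radio": 132.0,
}


def average_power_mw(active_mw, sleep_mw, active_s, period_s):
    """Time-weighted average power over one sample period."""
    duty = active_s / period_s
    return duty * active_mw + (1.0 - duty) * sleep_mw


def lifetime_years(battery_joules, avg_power_mw):
    """Lifetime of an ideal battery drained at a constant average power."""
    seconds = battery_joules / (avg_power_mw * 1e-3)
    return seconds / (365 * 24 * 3600)


total_active_mw = sum(ACTIVE_POWER_MW.values())

# Hypothetical operating point: none of these three values come from the paper.
SLEEP_MW = 0.01
ACTIVE_S = 0.1
BATTERY_J = 30_000.0

results = {}
for period_s in (90.0, 900.0):  # 90 s corresponds to a 1.5 minute sample period
    avg = average_power_mw(total_active_mw, SLEEP_MW, ACTIVE_S, period_s)
    results[period_s] = (avg, lifetime_years(BATTERY_J, avg))
    print(period_s, round(avg, 4), round(results[period_s][1], 2))
```

The sketch only shows the trend: a longer sample period or a smaller sleep power lowers the average power and extends the lifetime.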

Tipping Points of Failure. It is important to know the conditions under which the architecture fails to offer the desired functionality. This happens when the resources available on the architecture cannot accommodate the expected design. Considering factors such as clock frequency, memory, logics, communication among devices, and field of view, it is apparent that the clock frequency is not a failure point in the architecture when there are no real time constraints. Changing the clock frequency simply raises or lowers the frame rate. For example, by lowering the clock frequency from 26.6 MHz to 13.3 MHz for the proposed target architecture in Figure 13, the frame rate is reduced from 60 to 30 frames per second. It is important to note that, in real time systems where the timing constraints are important, this could result in a failure of the system. The architecture will fail to offer the required functionality and performance when the available resources, that is, internal memory, external memory, and logics, are insufficient. Suppose that, in the experimental system, the image sensor is changed to a size of 2000 x 3000 and the vision function, low pass filtering of Figure 10, is moved from the server to the VSN. This function requires the storage of a complete binary image in the internal memory in order to perform the operation. The complete binary image at this resolution would require 5859.37 Kbits of internal memory, while the total internal memory available is 576 Kbits (32 block RAMs of 18 Kbits each). Similarly, when an RGB image must be stored for the background, it would require 137.32 Mbits, while the total available FLASH is 64 Mbits.
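The memory figures in this example can be reproduced with a few lines of arithmetic. The sketch assumes 1 Kbit = 1024 bits, 1 Mbit = 1024 Kbits, and 24 bits per RGB pixel, which are the conventions that match the numbers quoted above; the function names are chosen for this example only.

```python
KBIT = 1024
MBIT = 1024 * 1024


def frame_bits(rows, cols, bits_per_pixel):
    """Storage needed for one complete frame, in bits."""
    return rows * cols * bits_per_pixel


def scaled_frame_rate(base_fps, base_clock_hz, new_clock_hz):
    """Frame rate assumed to scale linearly with the pixel clock."""
    return base_fps * (new_clock_hz / base_clock_hz)


binary_kbits = frame_bits(2000, 3000, 1) / KBIT   # complete binary image
rgb_mbits = frame_bits(2000, 3000, 24) / MBIT     # RGB background (24 bpp assumed)

block_ram_kbits = 32 * 18   # 32 block RAMs of 18 Kbits each
flash_mbits = 64

print(binary_kbits, binary_kbits <= block_ram_kbits)   # 5859.375 False
print(round(rgb_mbits, 2), rgb_mbits <= flash_mbits)   # 137.33 False
print(scaled_frame_rate(60, 26.6e6, 13.3e6))           # 30.0
```

Both checks fail by a wide margin, which is what marks these two configurations as tipping points for the target architecture.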

The communication among different devices, that is, hardware, software, and transceiver, is important for a stable architecture because the devices run at different speeds. A buffer of suitable size must be selected in order to handle the communication between devices; otherwise, data overflow or underflow may occur between a pair of devices. In real time systems, this could cause system failures. One of the critical factors of a VSN is the coverage area of the image sensor. A VSN is able to monitor a limited number of objects within the field of view, and missing some of the objects may lead to a failure of the system, for example, in surveillance applications.

6. Comparison of Vision Systems

A comparison of different vision solutions is essential for the improvement of the current research work. In a traditional method of comparison, the systems under consideration must be implemented on each other's architecture. Suppose we have selected six sample systems (V1, V2, V3, V4, V5, and V6). The system V1, selected as the reference system, must be compared with the other five systems. In the traditional approach, depicted in Figure 14, five systems (V2, V3, V4, V5, and V6) must be implemented on the reference system's architecture. Similarly, the reference vision system must be implemented on the corresponding architectures of the five vision systems. This requires a significant amount of design effort and time. When the reference system is changed, the aforementioned process is repeated once more. This means that, for comparing $N$ systems, $(N - 1)^{2}$ implementations are required. By employing the system taxonomy and complexity model for comparison, the need for actual implementation can be circumvented. This approach is depicted in Figure 15. The six systems (V1, V2, V3, V4, V5, and V6) are first classified by using the system taxonomy to identify systems with similar functionality with regard to the reference system V1.

Systems with different functionality, such as V5 and V6, are dropped from any further investigation. In this way, a larger problem space is reduced to a smaller one. After classification, the complexity model is used to generate complexity parameters, namely arithmetic complexity, memory resources, and device selection (processing platform, radio transceiver, and microcontroller for control functions). After the comparison, a single generic architecture can be proposed or an existing architecture can be employed for real implementation.

6.1. Example for Comparison of Systems. In this example, an actual vision system V1 [4] is compared with the other five systems (V2 [2], V3 [26], V4 [27], V5 [21], and V6 [19]) by using our proposed approach of Figure 15. We need to identify the systems which have similar functionality with respect to system V1. After this, the complexity model is used to estimate the resources and then a single generic architecture is evaluated for these systems.

Systems Classification. The systems are first classified by using the system taxonomy, and systems having a common functionality are grouped into one class. By looking at all six systems, it is concluded that systems V2 [2], V3 [26], and V4 [27] have similar functionality with respect to V1 [4]. The common functionality is classification on binary data because the vision functions include background subtraction, segmentation, and spatial filtering. The system V3 [26] is placed in this group because its vision functions are similar to those of the reference system V1, although background storage is not used by the authors; this is, however, required for real time implementation. The vision systems V5 [21] and V6 [19] are dropped from further investigation because in V5 [21] the background is updated periodically and V6 [19] does not use a segmentation function, which converts data into binary format. The taxonomy showed that these systems do not involve classification on binary data. Therefore, these systems cannot utilize the architecture proposed in Section 5.1.
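The classification step can be pictured as a simple filter over the deviations each system shows from the reference system V1. In the sketch below, only the deviations named in the text are recorded; the string labels, the `DISQUALIFYING` set, and the function `classify` are hypothetical names introduced for this example and do not reproduce the full taxonomy of Figure 3.

```python
# Deviations from the reference system V1, as stated in the text.
DEVIATIONS = {
    "V1": set(),
    "V2": set(),
    "V3": {"no_background_storage"},
    "V4": set(),
    "V5": {"periodic_background_update"},
    "V6": {"no_segmentation"},
}

# Deviations that take a system out of the "classification on binary data" class.
DISQUALIFYING = {"periodic_background_update", "no_segmentation"}


def classify(deviations, disqualifying):
    """Split systems into the reference class and the dropped systems."""
    same_class, dropped = [], []
    for name in sorted(deviations):
        if deviations[name] & disqualifying:
            dropped.append(name)
        else:
            same_class.append(name)
    return same_class, dropped


same_class, dropped = classify(DEVIATIONS, DISQUALIFYING)
print(same_class)  # ['V1', 'V2', 'V3', 'V4']
print(dropped)     # ['V5', 'V6']
```

V3 stays in the class because its missing background storage does not change the type of data being processed, which mirrors the reasoning given above.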

Resource Estimation. Following the system classification, the resource requirements for the systems are estimated by using the complexity model. After this, the Xilinx synthesis tool [28] is used to generate the resource information, that is, logic and memory. The arithmetic operations are converted into logics per sample; for example, for subtraction with 8-bit pixels, 9 logics are required. The total logics of the design include logics for arithmetic, synchronization, and compression. The logics used for the synchronization of data are considered to be similar for systems whose pixel depth and line size are similar; otherwise, they are obtained separately from the synthesis tool. The line memory (L_MEM), frame memory (F_MEM), and logics are close to the resources of the reference system V1. Therefore, the systems are evaluated for implementation on a single architecture.

Evaluation on Single Generic Architecture. Following the resource estimation, the investigated systems are evaluated for implementation on a single generic architecture. We can either develop a new architecture or use the existing architecture of Section 5.1, which is suitable for all of these systems. These systems can adopt the approach of system V1, in which vision functions such as background subtraction, segmentation, spatial filtering, and binary compression are implemented on the VSN.

The compressed data is transmitted to the server for further processing. Table 4 gives the percentage of resources used by each system with respect to the resources of the architecture presented in Section 5.1. The proposed architecture has 5720 logics, 32 18-Kbit block RAMs, and 90 Kbits of distributed RAM. The systems V1 [4], V2 [2], V3 [26], and V4 [27] have resource requirements within the range of the architecture's resources and can therefore be implemented on the target architecture.
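The utilisation check described above can be sketched in a few lines. This is only an illustration with hypothetical names; it assumes that line memory is mapped onto the 90 Kbits of distributed RAM and that 1 Kbit is 1024 bits, which is the reading that matches the line-memory percentages in Table 4. Frame memory is left out because its reference capacity is not stated in this section.

```python
# Sketch: resource utilisation relative to the architecture of Section 5.1.
# The capacities are quoted in the text; the mapping of line memory onto
# distributed RAM and 1 Kbit = 1024 bits are assumptions.

ARCH_LOGIC = 5720
ARCH_DISTRIBUTED_RAM_BITS = 90 * 1024

# Logic and line-memory estimates taken from Table 4.
systems = {
    "V1": {"logic": 1873, "l_mem": 4480},
    "V2": {"logic": 1221, "l_mem": 1920},
    "V3": {"logic": 1824, "l_mem": 960},
    "V4": {"logic": 1221, "l_mem": 1920},
}


def utilisation(used, available):
    """Percentage of an available resource that is used."""
    return 100.0 * used / available


report = {
    name: {
        "l_mem_pct": utilisation(res["l_mem"], ARCH_DISTRIBUTED_RAM_BITS),
        "logic_pct": utilisation(res["logic"], ARCH_LOGIC),
    }
    for name, res in systems.items()
}

# A system fits if no resource exceeds 100 % of what is available.
fits = {name: all(pct <= 100.0 for pct in r.values())
        for name, r in report.items()}

for name, r in report.items():
    print(f"{name}: line memory {r['l_mem_pct']:.1f} %, "
          f"logic {r['logic_pct']:.1f} %, fits: {fits[name]}")
```

The logic percentages computed this way are slightly higher than the whole numbers printed in Table 4, which appear to be truncated rather than rounded.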

The performance parameters of the different systems, given in Table 4, show that system V1 has the highest pixel sampling frequency, 15.2 MHz. The pixel frequencies for the other systems are those of their respective older implementations, which can be scaled to 15.2 MHz when implemented on the architecture of Section 5.1. This demonstrates that, with our proposed approach, the system complexity can be easily estimated and a number of systems can be compared without the need for actual implementation. A single generic architecture can then be proposed for the same class of systems. We conclude that the taxonomy, together with the complexity model, can be used effectively to classify and compare solutions, which in turn helps in proposing generic solutions.
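To make the scaling argument concrete, the relation between frame rate and pixel frequency can be sketched as follows. The sketch assumes one pixel is processed per clock cycle and ignores blanking intervals; this simplification is ours, not a statement from the paper, and the function names are hypothetical.

```python
# Sketch: relation between image size, frame rate and pixel frequency.
# Assumption: one pixel per clock cycle, no blanking intervals.

def pixel_frequency_mhz(width, height, fps):
    """Pixel frequency in MHz needed to sustain `fps` frames per second."""
    return width * height * fps / 1e6


def max_fps(width, height, pixel_freq_mhz):
    """Frame rate reachable at a given pixel frequency in MHz."""
    return pixel_freq_mhz * 1e6 / (width * height)


# V1 as listed in Table 4: 640 x 400 at 59.5 fps.
v1_freq = pixel_frequency_mhz(640, 400, 59.5)

# Frame rates the other image sizes would reach at 15.2 MHz.
fps_640x480 = max_fps(640, 480, 15.2)   # image size of V2 and V4
fps_320x240 = max_fps(320, 240, 15.2)   # image size of V3

print(f"V1 pixel frequency: {v1_freq:.3f} MHz")
print(f"640 x 480 at 15.2 MHz: {fps_640x480:.1f} fps")
print(f"320 x 240 at 15.2 MHz: {fps_320x240:.1f} fps")
```

Under this simplification, the V1 figures in Table 4 are consistent with each other (about 15.2 MHz at 59.5 fps), and a 640 x 480 system scaled to 15.2 MHz would run at just under 50 fps.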

7. Future Research

With advances in technology, the use of multicore processors in embedded vision systems is expected to grow. There are a number of challenges in employing these parallel machines directly for WVSN applications. The vision functions available for software implementation were developed for single-processor platforms, so these algorithms must be redesigned to exploit parallelism. Early vision functions such as background subtraction and temporal filtering require data storage in external memory. These tasks require a processing architecture with faster storage, high-bandwidth memory interfaces, and data buses. To reduce memory latency, embedded multicore systems need Direct Memory Access (DMA) based data transfer, so that the cores execute the tasks while data transfers between modules such as cores, internal memory, and external memories are handled by DMA [15]. This will create new design challenges in writing parallel programs for multicore systems. To extend battery lifetime, suitable dynamic frequency scaling and voltage scaling techniques must be considered for the VSN architecture.

8. Conclusion

In this paper, we have presented an abstract model for the comparison and generalization of vision solutions for wireless smart cameras, with the assistance of a system taxonomy. To develop this model, we performed a detailed analysis of the vision functions used in the system taxonomy in order to predict their arithmetic complexity and memory requirements. To illustrate the use of the proposed model, we analyzed a number of published systems as a case study and classified them using the system taxonomy. The resources were then estimated with the complexity model. We have demonstrated that the proposed model helps to compare vision systems without requiring any actual implementation. After comparison, a single generic architecture can be proposed for the same class of vision systems. This work may not be exhaustive; however, it provides an abstract reference model for benchmarking and for the development of efficient generic solutions.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


[1] C. Leistner, P. M. Roth, H. Grabner, H. Bischof, A. Starzacher, and B. Rinner, "Visual on-line learning in distributed camera networks," in Proceedings of the 2nd ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC '08), pp. 1-10, Stanford, Calif, USA, September 2008.

[2] A. Orfy, A. El-Sayed, and M. ElHelw, "WASP: wireless autonomous sensor prototype for visual sensor networks," in Proceedings of the IFIP Wireless Days (WD '10), pp. 1-5, Venice, Italy, October 2010.

[3] S. Hengstler and H. Aghajan, "Application-oriented design of smart camera networks," in Proceedings of the 1st ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC '07), pp. 12-19, Vienna, Austria, September 2007.

[4] M. Imran, K. Khursheed, M. O'Nils, and N. Lawal, "Exploration of target architecture for a wireless camera based sensor node," in Proceedings of the 28th Norchip Conference (NORCHIP '10), pp. 1-4, Tampere, Finland, November 2010.

[5] N. K. Ratha and A. K. Jain, "Computer vision algorithms on reconfigurable logic arrays," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 1, pp. 29-43, 1999.

[6] M. Imran, K. Khursheed, N. Ahmad, M. A. Waheed, M. O'Nils, and N. Lawal, "Complexity analysis of vision functions for implementation of wireless smart cameras using system taxonomy," in Real-Time Image and Video Processing, vol. 8437 of Proceedings of SPIE, Brussels, Belgium, 2012.

[7] B. Dieber, C. Micheloni, and B. Rinner, "Resource-aware coverage and task assignment in visual sensor networks," IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 10, pp. 1424-1437, 2011.

[8] C. H. Lin, W. Wolf, A. Dixon, X. Koutsoukos, and J. Sztipanovits, "Design and implementation of ubiquitous smart cameras," in Proceedings of the IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing, vol. 1, pp. 32-39, Taichung, Taiwan, June 2006.

[9] B. Rinner, T. Winkler, W. Schriebl, M. Quaritsch, and W. Wolf, "The evolution from single to pervasive smart cameras," in Proceedings of the 2nd ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC '08), pp. 1-10, Stanford, Calif, USA, September 2008.

[10] S. Tilak, N. Abu-Ghazaleh, and W. Heinzelman, "A taxonomy of wireless micro-sensor network models," ACM SIGMOBILE Mobile Computing and Communications Review, vol. 6, no. 2, pp. 28-36, 2002.

[11] S. Hengstler, "Stereo vision in smart camera networks," in Stereo Vision, A. Bhatti, Ed., pp. 73-90, InTech, Rijeka, Croatia, 2008.

[12] M. Imran, K. Benkrid, K. Khursheed, N. Ahmad, M. O'Nils, and N. Lawal, "Analysis and characterization of embedded vision systems for taxonomy formulation," in Real-Time Image and Video Processing, vol. 8656 of Proceedings of SPIE, Burlingame, Calif, USA, 2013.

[13] R. Walczyk, A. Armitage, and T. D. Binnie, "Comparative study on connected component labeling algorithms for embedded video processing systems," in Proceedings of the International Conference on Image Processing, Computer Vision and Pattern Recognition (IPCV '10), p. 7, Las Vegas, Nev, USA, 2010.

[14] G. Blake, R. G. Dreslinski, and T. Mudge, "A survey of multicore processors: a review of their common attributes," IEEE Signal Processing Magazine, vol. 26, no. 6, pp. 26-37, 2009.

[15] S. Apewokin, Efficiently mapping high-performance early vision algorithms onto multicore embedded platforms [Ph.D. thesis], Georgia Institute of Technology, Savannah, Ga, USA, 2009.

[16] J. Valverde, A. Otero, M. Lopez, J. Portilla, E. de la Torre, and T. Riesgo, "Using SRAM based FPGAs for power-aware high performance wireless sensor networks," Sensors, vol. 12, no. 3, pp. 2667-2692, 2012.

[17] K. Khursheed, M. Imran, A. W. Malik, M. O'Nils, N. Lawal, and B. Thörnberg, "Exploration of tasks partitioning between hardware software and locality for a wireless camera based vision sensor node," in Proceedings of the 6th International Symposium on Parallel Computing in Electrical Engineering (PARELEC '11), pp. 127-132, Luton, UK, April 2011.

[18] M. Imran, K. Khursheed, N. Lawal, M. O'Nils, and N. Ahmad, "Implementation of wireless vision sensor node for characterization of particles in fluids," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 11, pp. 1634-1643, 2012.

[19] A. Kerhet, M. Magno, F. Leonardi, A. Boni, and L. Benini, "A low-power wireless video sensor node for distributed object detection," Journal of Real-Time Image Processing, vol. 2, no. 4, pp. 331-342, 2007.

[20] P. Pillai and K. G. Shin, "Real-time dynamic voltage scaling for low power embedded operating systems," in Proceedings of the 18th Symposium on Operating System Principles (SOSP '01), vol. 35, pp. 89-102, New York, NY, USA, 2001.

[21] M. Rahimi, R. Baer, O. I. Iroezi et al., "Cyclops: in situ image sensing and interpretation in wireless sensor networks," in Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys '05), San Diego, Calif, USA, 2005.

[22] Xilinx, Xpower estimator user guide, 2012, Xilinx power tools tutorial, 2010,

[23] Altera, Powerplay early power estimators, 2012, http://www

[24] A. Hornberg, Handbook of Machine Vision, John Wiley & Sons, Weinheim, Germany, 2006.

[25] Z.-H. Chen, A. W. Y. Su, and M.-T. Sun, "Resource-efficient FPGA architecture and implementation of Hough transform," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 8, pp. 1419-1428, 2011.

[26] A. C. Bianchi and A. H. Reali-Costa, "Implementing computer vision algorithms in hardware: an FPGA/VHDL-based vision system for a mobile robot," in RoboCup 2001, Lecture Notes in Computer Science, pp. 281-286, 2002.

[27] C. B. Margi, R. Manduchi, and K. Obraczka, "Energy consumption tradeoffs in visual sensor networks," in Proceedings of the 24th Brazilian Symposium on Computer Networks (SBRC '06), Curitiba, Brazil, 2006.

[28] Spartan-6 family overview, 2010,

[29] Micron, Numonyx Serial Flash Memory, 2007, http://www

[30] AT32UC3B0256, AVR32,

[31] CC2520 transceiver,

[32] IGLOO video kit, 2009,

Muhammad Imran, Khursheed Khursheed, Naeem Ahmad, Mattias O'Nils, Najeem Lawal, and Malik A. Waheed

Department of Electronics Design, Mid Sweden University, Holmgatan 10, 851 70 Sundsvall, Sweden

Correspondence should be addressed to Muhammad Imran;

Received 31 May 2013; Revised 7 December 2013; Accepted 12 December 2013; Published 9 January 2014

Academic Editor: Hongkai Xiong

TABLE 1: An abstract complexity model of vision functions for VSN.

                               Line memory          Frame memory

Data triggering                N.A.                 N.A.

Data sources                   W                    W

                               H                    H

Data depth                     b                    b

Colour transforms              mem = bits x 9,      N.A.
                               where bits are
                               coefficient bits
                               width and 9 are

Learning                       Application and      Application and
                               algorithm            algorithm
                               dependent.           dependent.

Frame storage                  N.A.                 mem = W x H x b

Frame subtraction              Size depends on      Frame storage is
                               the background       required.

  Thresholding                 N.A.                 N.A.

  Adaptive Thresholding        mem = [(n-1) x W x   N.A.
                               b] bits where n is
                               the number of rows
                               for a

  Edge based *                 mem = [(n-1) x W x   N.A.
                               b] bits.

  Region based                 Size depends on      Size depends on
                               the selection of     the selection of
                               seed region.         seed region.

  Watershed *                  Internal memory      mem = W x H x b
                               required depending   bits.
                               on technique used.

  Linear spatial               mem = [(n-1) x W x   N.A.
                               b] bits

  Order statistic              mem = [(n-1) x W x   N.A.
                               b] bits

  Binary morphology            mem = [(n -1) x W    N.A.
                               x b] bits

  Grey scale morphology        mem = [(n -1) x W    N.A.
                               x b] bits

Labelling                      mem_tot =            In classical
                               mem_BUF +            approach, mem = W
                               mem_LOOKUP +         x H x b bits.
                               (see (2))

Feature extraction             [mem.sub.area] =     N.A.
                               [N.sup.2] x no of
                               [mem.sub.cog] = (2
                               x no of objects.
                               [] =
                               [C.sub.i])) x no
                               of objects.

Classification                 Application and      Application and
                               machine learning     machine learning
                               algorithm            algorithm
                               dependent.           dependent

Recognition                    Depends on design.   Depends on the
                                                    number of objects,
                                                    training samples,
                                                    and algorithm.

Tracking                       Application and      Application and
                               learning algorithm   learning algorithm
                               and objects          and objects
                               dependent.           dependent

Intensity transforms           Histograms: mem =    N.A.
                               L x bw, where bw =
                               log2(W x H) is the
                               bits for W x H
                               pixels and L is
                               intensity levels.

Spatial transforms             Memory buffers       mem = W x H x b
                               required depending   bits. In case of
                               on the technique.    multiple rotation
                                                    and image

Maths transforms

  Fast Fourier transform       Memory buffers       mem = W x H x b
                               required depending   bits. For
                               on the design.       coefficients
                                                    storage mem = 2((W
                                                    x H)/2) x

  Discrete cosine transform    mem_coeff =          mem = W x H x b
                               bits x N x N where   bits. In the case
                               bits are             of parallel block
                               coefficient bits     processing.
                               and N x N is pixel
                               [] = N
                               x W x b

  Discrete wavelet transform   mem = L x N x b      mem = W x H x b
                               where L is the       bits.
                               number of rows and
                               N is the number of
                               pixels.

  Hough transform              Memory buffers       mem = W x H x b
                               required for         bits. W x H x r x
                               intermediate         l x sqrt(W^2 x
                               operations.          H^2) x K
                                                    where r is ratio
                                                    of nonzero pixels
                                                    in a binary image
                                                    and K is number of
                                                    angles [25].

  Lossy compression            Depends on the IP    Depends on the IP
                               cores being used.    cores being used.

  Lossless compression         Depends on the IP    Depends on the IP
                               cores being used.    cores being used.

                               Arithmetic           Comments

Data triggering                N.A.                 Depending on the
                                                    triggering can be

Data sources                   W                    W is width and H
                                                    is image height.

                               H                    Scaling of W x H
                                                    affects mem. and

Data depth                     b                    b is bits per
                                                    pixel. Scaling b
                                                    affects the

Colour transforms              RGB to YCrCb         Depending on the
                               conversion require   application,
                               W x H x 9            images can be
                               multiplications      captured in
                               and W x H x 6        binary, grey
                               additions/           scale, or colour
                               subtractions.        format.

Learning                       Application and      Offline classifier
                               algorithm            with online
                               dependent            training is
                                                    preferable [1],

Frame storage                  N.A.                 Memory size
                                                    depends on the
                                                    pixel and image

Frame subtraction              Technique            Static background
                               dependent. Simple    or generated using
                               frame subtraction    modelling
                               of size W x H,       technique is used.
                               require W x H

  Thresholding                 Complete image       Thresholding is
                               require W x H        used for simple
                               logical              cases.

  Adaptive Thresholding        A technique based    Complete image
                               on a neighbourhood   requires W x H x n
                               of n x n requires    x n add., W x H
                               n x n additions, 1   divs., and comp.
                               division and 1       required.
                               comparison is

  Edge based *                 Filter mask of m x   Filtering an image
                               n requires nm        W x H, with a mask
                               additions, nm        m x n, requires W
                               multiplications.     x H x m x n adds.
                                                    and W x H x m x n
                                                    mults.

  Region based                 Complexity depends   These techniques
                               on the               require much
                               selection of seed    computational
                               and order in which   time.
                               pixels and region
                               are examined.

  Watershed *                  Depends on the       Having efficient
                               technique such as    techniques but
                               immersion/           still not suitable
                               flooding based and   for real time.

  Linear spatial               The avg. filtering   For complete
                               mask requires        image, multiplied
                               weighted avg.        image size with
                               requires mults,      mask such as, for
                               adds., and div.      averaging W x H x
                                                    m x n adds., W x H

  Order statistic              The complexity of    The bit level
                               these filters        implementation is
                               depends on the       usually better for
                               selection of         hardware.
                               window size and
                               the sorting

  Binary morphology            Erosion and          The run time
                               dilation with a      complexity of
                               mask m x n require   dilation of two
                               m x n operations     images is O (WH).

  Grey scale morphology        Dilation and         Mathematical
                               erosion algorithms   morphology is
                               have runtime         extended to grey
                               complexity of        scale images which
                               O(WHn) where n is    is based on the
                               roughly the number   notion of minimum
                               of points on the     and maximum.
                               boundary of flat

Labelling                      Arithmetic           Single pass is
                               complexity depends   preferred for real
                               on the algorithm,    time component
                               image size, and      labelling [13].
                               number of objects
                               in the image.

Feature extraction             Complexity           Different types of
                               requirements         features include
                               involved in          position, width,
                               feature              height, bounding
                               calculation depend   box, area, and
                               on the number and    centre of gravity
                               size of objects.     [13, 24].

Classification                 This field is        SVM is suitable
                               still at an          for on-node
                               immature stage so    operation due to
                               it is a              its low arithmetic
                               challenging task     complexity [2].
                               to give absolute

Recognition                    The runtime          To avoid design
                               complexity depends   complexities, it
                               on the number of     is better to use
                               poses which can be   standard software
                               reduced to gain      packages.
                               the speed.

Tracking                       Tracking             Light weight
                               algorithms can be    algorithms are
                               simplified by        developed with
                               imposing             certain
                               constraints on the   constraints.
                               motion and
                               appearance of the

Intensity transforms           Complexity depends   These spatial
                               on the technique     domain processings
                               such as piecewise    are
                               linear transform     computationally
                               or histogram         efficient and
                               processing being     require less
                               selected.            resources for

Spatial transforms             The arithmetic       This transform
                               complexity depends   includes matrix
                               on the operations    transposes, image
                               performed.           rotation, scaling,
                                                    and registration.

Maths transforms

  Fast Fourier transform       Direct               Fast algorithms
                               implementation of    like radix-2^m,
                               DFT requires         FPA, and FHT are
                               (WH)^2 operations    good candidates in
                               while FFT reduces    real time systems.
                               it to WH log WH

  Discrete cosine transform    Direct               mem_coeff is
                               implementation of    memory for
                               2-D 8x8 DCT          precomputed
                               requires 1024        coefficients and
                               multiplications      [] is
                               and 896 additions.   memory for

  Discrete wavelet transform   The arithmetic       There are
                               complexity           different
                               depends on the       implementation
                               method being         methods. Lifting
                               selected.            based is a widely
                                                    used method.

  Hough transform              For calculating      The Hough
                               10% feature points   transform requires
                               by using (1), an     complex
                               image size of 256    computations. Some
                               x 256 requires 2.3   efficient
                               M multiplications    solutions include
                               and 1.1 M adders     line based,
                               with an angle of     multiplierless,
                               180° using           incremental [25],
                               a voting             and a Hough
                               algorithm.           transform using

  Lossy compression            Lossy compression    IP core is
                               is based on DCT      preferred because
                               and DWT which are    of the
                               discussed above.     availability.

  Lossless compression         Depends on the
                               algorithm and
                               image contents as
                               well as image

* The memory requirements for edge based segmentation are given for
techniques involving local processing. The memory requirements for
edge based segmentation involving global processing are given in the
column for Hough transforms. The memory requirements for watershed
segmentation are given for techniques involving morphological

TABLE 2: Vision functions on software and hardware platform.

Vision functions          T_AVR32 (ms)   T_FPGA (ms)   P_FPGA (mW)   E_AVR32 (mJ)   E_FPGA (µJ)

Background subtraction       332.5          19.91           0.34          25.7           6.76
Segmentation                  225           19.91           0.13          17.4           2.58
Morphology                    2327          19.97           1.14         180.3          22.76
Low pass filter              202.5          19.91           0.18          15.6           3.58
Labelling and features        2610          19.91           2.7          202.3          53.76
ITU-T G4 compression          345           19.97           1.42          26.7          28.35
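The energy figures in Table 2 can be cross-checked from execution time and power. The sketch below assumes that the third numeric column is the FPGA power in mW and that the FPGA energy column is in microjoules; these readings are inferred from the numbers, since the extracted column headers are damaged. The variable names are hypothetical.

```python
# Sketch: cross-checking Table 2 with energy = power x time.
# Assumptions: FPGA power in mW, FPGA energy in microjoules (ms x mW = uJ).

# function: (AVR32 time in ms, FPGA time in ms, FPGA power in mW,
#            AVR32 energy in mJ, FPGA energy in uJ as listed)
table2 = {
    "Background subtraction": (332.5, 19.91, 0.34, 25.7, 6.76),
    "Segmentation": (225.0, 19.91, 0.13, 17.4, 2.58),
    "Morphology": (2327.0, 19.97, 1.14, 180.3, 22.76),
    "Low pass filter": (202.5, 19.91, 0.18, 15.6, 3.58),
    "Labelling and features": (2610.0, 19.91, 2.7, 202.3, 53.76),
    "ITU-T G4 compression": (345.0, 19.97, 1.42, 26.7, 28.35),
}


def energy_uj(time_ms, power_mw):
    """Energy in microjoules from time in ms and power in mW."""
    return time_ms * power_mw


results = {}
for name, (t_avr, t_fpga, p_fpga, e_avr_mj, e_fpga_uj) in table2.items():
    results[name] = {
        "fpga_energy_uj": energy_uj(t_fpga, p_fpga),
        "listed_fpga_energy_uj": e_fpga_uj,
        "speedup": t_avr / t_fpga,                        # FPGA vs AVR32 time
        "avr32_power_mw": 1000.0 * e_avr_mj / t_avr,      # implied AVR32 power
    }

for name, r in results.items():
    print(f"{name}: {r['fpga_energy_uj']:.2f} uJ computed, "
          f"{r['listed_fpga_energy_uj']} uJ listed, "
          f"speed-up {r['speedup']:.1f}x")
```

Under these assumptions every computed FPGA energy agrees with the listed value to within rounding, and the AVR32 figures imply a roughly constant power of about 77 mW.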

TABLE 3: Arithmetic complexity and memory estimation of reference VSN.

Functions                 Line memory              Frame memory

Data triggering              N.A.                      N.A.

Data sources            W = 640 H = 400           W = 640 H = 400

Data depth                b = 8 bits.               b = 8 bits.

Frame storage                N.A.              mem = 640 x 400 x 8 =
                                                   2048000 bits.

Frame subtraction            N.A.                      N.A.

Segmentation                 N.A.                      N.A.

Filtering            mem = (3-1) x 640 x 1 =            N.A.
  Binary             1280 bits. mem = (3-1)
  Morphology        x 640 x 1 = 1280 bits.

Postprocessing      mem = 3 x 640 x 1 = 1920           N.A.
  lossless          bits. 3 line buffers for
  compression         coding and internal
                    memory are required for
                    storing 2 (27) + 2 (64)
                      + 13 Huffman codes.

Functions            Arithmetic operations           Comments

Data triggering              N.A.                  Time driven.

Data sources                 N.A.              Area scan, 640 x 400.

Data depth                b = 8 bits.           Grey scale, 8 bits.

Frame storage                N.A.                      N.A.

Frame subtraction     640 x 400 = 256000        Background image is
                    subtractions/additions.   stored in FLASH memory.

Segmentation          640 x 400 = 256000      Manual thresholding is
                    comparison operations.             used.

Filtering           640 x 400 x 9 = 2304000   For a mask of 3x3 being
  Binary             AND operations. 640 x     used, 2 line buffers
  Morphology         400 x 9 = 2304000 OR        are required for
                          operations.           dilation and 2 for

Postprocessing       The ITU-T G4 is used     Objects are in a white
  lossless              which includes            colour and the
  compression        arithmetic operations    background is black so
                     for run length coding       bilevel ITU-T G4
                         and entropy.          compression scheme is
                                                 used, which is a
                                               lossless compression

TABLE 4: Cost and performance parameters of different systems.

Systems/     Image size   L_MEM    F_MEM    Arithmetic   Logic
references                (bits)   (bits)   operations   used

V1 [4]       640 x 400    4480    2048000    5120000     1873
V2 [2]       640 x 480    1920    7372800     614400     1221
V3 [26]      320 x 240     960       0        384000     1824
V4 [27]      640 x 480    1920    7372800     614400     1221
V5 [21]      128 x 128    N.A.     N.A.        N.A.      N.A.
V6 [19]      640 x 480    N.A.     N.A.        N.A.      N.A.

Systems/     % age   % age   % age   Max.   Max. pixel
references   L_MEM   F_MEM   logic   fps    freq (MHz)

V1 [4]        4.9     3.2     32     59.5      15.2
V2 [2]        2.1    11.5     21      25       7.6
V3 [26]       1.0     0.0     31     N.A.      N.A.
V4 [27]       2.1    11.5     21     1.3       0.4
V5 [21]      N.A.    N.A.    N.A.    4.1       0.06
V6 [19]      N.A.    N.A.    N.A.    5.7       1.7

Systems/      Implementable on
references    proposed architecture

V1 [4]          Yes
V2 [2]          Yes
V3 [26]         Yes
V4 [27]         Yes
V5 [21]         No
V6 [19]         No
COPYRIGHT 2014 Sage Publications, Inc.
Title Annotation:Research Article
Author:Imran, Muhammad; Khursheed, Khursheed; Ahmad, Naeem; O'Nils, Mattias; Lawal, Najeem; Waheed, Malik A
Publication:International Journal of Distributed Sensor Networks
Article Type:Case study
Date:Jan 1, 2014