Complexity analysis of vision functions for comparison of wireless smart cameras.
Vision systems implemented on wireless smart cameras have recently been the focus of research for many applications, including surveillance, recognition, traffic monitoring, personal care, and industrial monitoring. Often, a number of wireless smart cameras are spread over an area to form a network called a Wireless Vision Sensor Network (WVSN). In the WVSN context, an individual wireless smart camera is referred to as a Wireless Vision Sensor Node (VSN). Each VSN consists of an image sensor, an embedded processing platform, memory, a battery or an alternative energy source, and a wireless link. Designing a VSN on an embedded platform is a challenging task because resources are limited compared to those of general purpose platforms. General purpose platforms offer greater design and implementation flexibility; however, they are often considered unsuitable for real time applications. Therefore, our focus is on vision systems implemented on embedded platforms. When designing a VSN for a particular application, the designers must first investigate the complexity of the design; a failure to do so may result in both increased design costs and a longer development cycle. The vision functions can be implemented on microprocessors, on dedicated hardware such as Application Specific Integrated Circuits (ASICs), or on programmable hardware such as Field Programmable Gate Arrays (FPGAs).
Microprocessors are widely used both as General Purpose Processors (GPPs) and as Application Specific Instruction-set Processors (ASIPs). General purpose processors are designed for average performance and broad flexibility, while ASIPs, such as Digital Signal Processors (DSPs), target the performance and flexibility requirements of a specific application domain. GPPs and ASIPs are software based, so it is easy to verify the correctness of the code and the system performance by means of software simulation. The performance of microprocessor based platforms is often limited compared to that of ASIC and reconfigurable hardware platforms because, in general purpose processors, every instruction must be fetched and decoded before execution.
An ASIC is customized for a specific application; it therefore provides better performance and lower power consumption, but the design cost is high due to the nonrecurring engineering (NRE) and manufacturing costs associated with small volumes, and the resulting solution cannot be reused for other applications. Programmable logic devices (PLDs), which include Field Programmable Gate Arrays (FPGAs), are replacing traditional logic circuits by offering advantages such as small size, low power, and high performance without the disadvantages associated with custom ASICs. Reconfigurable computing allows the user to program at a low level and supports general purpose computing by virtue of reconfigurability. The choice of processing platform for a particular system depends on the performance requirements and constraints of the application. Features such as lower cost for low volume products, reconfigurability, and parallelism make the FPGA a suitable platform for wireless smart cameras. Therefore, in this work we consider complexity estimation for wireless smart camera functions on FPGA based platforms.
Currently, the trend in WVSNs is to propose specific solutions, and each solution requires a great deal of design and development effort and cost on FPGA based platforms. Proposing generic solutions that fulfill the needs of several different systems would reduce this effort and cost; the effort currently spent on individual solutions could then be diverted to the development of a single optimized solution. Deriving a generic solution from the existing individual solutions would normally require each investigated solution to be implemented on every other solution's architecture, which demands a great deal of design and development effort. Our goal is to propose a mechanism, in the form of a complexity model and a system taxonomy, which can assist in the comparison of existing solutions and in the development of generic solutions without the need for actual implementation. To the best of our knowledge, no model exists which can facilitate researchers and designers in the comparison and generalization of different smart camera solutions.
The comparison of vision solutions is necessary for the improvement of current research and for proposing generic solutions within this field. For the complexity analysis, the arithmetic complexity, memory requirements, and suitability of vision functions for vision systems are investigated. To illustrate the use of the complexity model, a number of actual systems have been classified with the assistance of the system taxonomy and their resources are then estimated by using the complexity model. After the complexity estimation, the vision systems are compared and evaluated for implementation on a single generic architecture. It is worth mentioning that this paper is an extended version of our earlier work. Following this, Section 2 presents related work, Section 3 describes the problem, Section 4 presents the complexity model, Section 5 presents the case study, Section 6 shows the comparison of the vision systems, Section 7 illustrates the future challenges, and Section 8 provides the conclusion.
2. Related Work
Before describing the problem area, some of the related work published in the literature is presented in this section. Researchers have proposed different solutions for WVSN problems; currently, the focus is on particular solutions for particular problems. Hengstler and Aghajan proposed an application oriented design methodology for a VSN; however, the authors consider only the specific case of tracking objects using a single camera and a stereovision VSN. Dieber et al. presented a formulation and approximation method for camera selection and task assignment in order to achieve the required monitoring activities, and investigated the tradeoff between surveillance quality and resource utilization. The design aspects and software modelling for a ubiquitous real time camera system are investigated by Lin et al., who divide the design aspects into two categories, namely, the general and the application specific. A taxonomy for VSNs is proposed by Rinner et al. based on platform capabilities, the degree of distributed processing, and system autonomy; some existing vision systems implemented on VSNs are described and classified according to this taxonomy, and some of the challenges associated with VSNs are discussed. Tilak et al. classified wireless microsensor networks from a communication protocol perspective, discussing different types of communication functions, data delivery models, and network dynamics; however, there is no discussion of camera based sensor systems. Generally, camera based sensors produce two-dimensional data, which requires greater processing capability, more memory, higher power consumption, and higher communication bandwidth than traditional sensor networks. Therefore, the requirements of visual sensor networks are different.
3. Problem Statement and State of the Art
An investigation of the related work shows that the focus is on particular aspects of vision systems. There is no common mechanism for complexity estimation and for the comparison of different vision systems, which is necessary for proposing generic solutions and for the improvement of research within this field. When comparing two different vision systems, each vision system must be implemented on the other's architecture, as depicted in Figure 1. This requires a great deal of design and development effort, and in some cases a researcher does not have access to another researcher's architecture. As the number of systems to be compared increases, it becomes less feasible to implement all the vision systems. This necessitates a common tool which can facilitate researchers in both the benchmarking and the development of different classes of vision systems. In order to meet this demand, we have proposed an abstract model for complexity analysis with the assistance of a system taxonomy.
The mechanism that has been adopted in order to develop this model is shown in Figure 2. The identified problem space is to propose a mechanism, or at least an abstract model, for the comparison of different wireless smart camera systems. To this end, a number of published vision systems, wired and wireless, both individual standalone vision systems and vision nodes in Wireless Vision Sensor Networks (WVSNs), have been surveyed and a large number of functions were extracted. Similar functions were grouped together to form an abstraction of vision functions, which was then used to develop our proposed system taxonomy. The vision functions are used for building the taxonomy because the majority of vision applications in wireless smart cameras focus on target detection, analysis, and the recognition of objects present in the field of view. Taxonomy building is an iterative process and the taxonomy can be modified to accommodate future developments within the field, hence the back-and-forth arrows in the figure. The taxonomy may not be exhaustive, but it does cover both the fundamental and the common vision functions required for a wide range of VSN vision systems; a previous study showed that our proposed taxonomy covers 95 percent of the investigated systems. After the system taxonomy, the complexity of the vision functions, in terms of arithmetic operations and memory, is investigated for the complexity model. The complexity model together with the taxonomy can be used for the comparison, and for the development of a generic architecture, for different classes of VSN. The proposed taxonomy is shown in Figure 3, in which some of the functions are labeled by capital letters and are further expanded in Figures 4, 5, 6, 7, 8, 9, and 10.
The system taxonomy is grouped into nine levels: data triggering type, data sources, data depth, learning, storage requirement, vision processing, postprocessing, output type, and feedback. A VSN is categorized into three types based on how the system is triggered to start processing: in time driven systems, the processing is performed after a certain time duration; in event/query driven systems, processing starts when the system is triggered by a certain event or query; and periodic systems start processing after a fixed duration of time. The system can receive data from three types of sources: an area scan sensor, a line scan sensor, or another system. Images can be captured in binary, grey scale, or colour format; conversion from one format to another requires additional resources, so a better strategy is to capture the image in the relevant format. For some applications the system learns about the environment in order to adapt to it, while in other applications there is no need for learning. Similarly, some systems require storage, for example to store frames for subtraction, temporal filtering, or template matching, while others have no need of storage.
The vision processing level in Figure 3 shows the abstraction of vision functions which, while not exhaustive, include typical processing functions required for VSN. After the processing level, postprocessing occurs, which includes functions for the reformatting of data, that is, compression algorithms in order to make the output data suitable for communication purposes. Note that, in general, functions have an alternative path, which is able to circumvent them, in order to show that they might not be required for some VSNs. A VSN is also characterized by the output type it produces. The output can be a matrix, vector, scalar, or flag and can be sent directly to the user or can be used for feedback. The dashed line represents the system's flow for one experimental system, presented in Section 5 for illustration purposes. Following on from the system taxonomy, the complexity analysis of vision functions used in the system taxonomy is presented.
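As an illustration, the classification of a system along the nine taxonomy levels can be captured in a simple data structure. The sketch below uses hypothetical field names and values (they are not the taxonomy's exact labels) and shows how two systems could be checked for membership of the same taxonomy class.

```python
# Hypothetical sketch of a taxonomy classification record for a VSN.
# Field names and values are illustrative, not the taxonomy's exact labels.
particle_detection_node = {
    "triggering": "time-driven",
    "data_source": "area scan",
    "data_depth": "grey scale",
    "learning": False,
    "storage": "background frame",
    "vision_processing": ["background subtraction", "segmentation",
                          "spatial filtering"],
    "post_processing": "ITU-T G4 compression",
    "output_type": "matrix",
    "feedback": False,
}

def same_class(a, b, keys=("triggering", "data_depth", "vision_processing")):
    """Two systems fall into the same taxonomy class when they agree on
    the compared levels (the choice of levels to compare is up to the
    designer)."""
    return all(a[k] == b[k] for k in keys)
```

A classification step like this is what reduces the problem space before any resource estimation is attempted.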
4. Complexity Model
The complexity analysis is a challenging task due to the large number of influencing factors, such as the specific requirements of an application, the number of vision functions, and the external environment. There is no standard definition of how to measure vision system complexity.
However, in order to provide an abstract model, we have investigated both the arithmetic complexity and the memory requirements of the vision functions employed in different classes of VSN systems, with the assistance of the system taxonomy. The complexity of some of the vision functions depends both on the situation and on the incoming data from the previous stage. In these cases, defining an absolute parameter for complexity measurement is challenging and it is not possible to draw quantitative conclusions; instead, we have analyzed and discussed the suitability of the functions for VSNs. Other parameters, such as registers and latency, are design dependent. The abstract model of complexity analysis is provided in Table 1. It is worth mentioning that for some tasks there are a number of algorithms, each with varying complexity; in this paper, we have investigated the complexity of functions that are commonly used in wireless smart camera systems. Consequently, the system taxonomy needs to be updated periodically in order to keep it exhaustive with respect to current systems. Nonetheless, this complexity model can be used together with the system taxonomy for the classification, comparison, and complexity estimation of vision systems implemented on VSNs. Equation (1) is used in Table 1 with the Hough transform, while (2) is used with labelling:
ρ(x, y) = x cos θ + y sin θ. (1)
mem_total = mem_BUF + mem_LOOKUP + mem_DATA, where mem_BUF = ⌈log₂(CC_max + CC_col + 1)⌉ × C, mem_LOOKUP = mem_DATA, and mem_DATA = 2⌈log₂(R)⌉. (2)
In (2), C represents the number of columns, R the number of rows, CC_max is the maximum number of connected components, CC_col is the number of label collisions, and the +1 accounts for label 0 being a preoccupied label (reserved for the background). Following this, the complexity on actual hardware is discussed.
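Under the assumption that the bit widths in (2) are rounded up to integers (a detail the notation leaves implicit), the two equations can be sketched as follows.

```python
import math

def hough_rho(x, y, theta):
    """Equation (1): rho(x, y) = x*cos(theta) + y*sin(theta)."""
    return x * math.cos(theta) + y * math.sin(theta)

def labelling_memory_bits(R, C, cc_max, cc_col):
    """Equation (2): total labelling memory in bits.

    R, C   : image rows and columns
    cc_max : maximum number of connected components
    cc_col : number of label collisions
    The +1 accounts for label 0 being reserved for the background.
    The ceilings are an assumption, since bit widths must be integers.
    """
    mem_data = 2 * math.ceil(math.log2(R))
    mem_lookup = mem_data                  # lookup mirrors the data width
    mem_buf = math.ceil(math.log2(cc_max + cc_col + 1)) * C
    return mem_buf + mem_lookup + mem_data
```

For example, a 640 x 400 image with at most 100 components and 27 collisions needs a 7-bit label per column in the buffer, since ⌈log₂(128)⌉ = 7.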
4.1. Complexity on Different Processing Platforms. Early vision processing tasks, such as background subtraction and spatial filtering, exhibit inherent parallelism. This parallelism can be exploited by using either a hardware platform or processors with multiple cores. With advances in technology, parallel machines with multiple cores have spread from supercomputing to embedded computing, and these recent developments have drawn the attention of researchers. There is great potential for multicore systems in WVSNs because raw performance increases now come from an increasing number of cores rather than from higher clock frequencies, which results in lower power consumption. However, there are challenges in exploiting the parallelism of this emerging technology, including parallel programming techniques, compilers for these architectures, and the management of the memory hierarchy. The vision libraries available for uniprocessors need to be tailored for multicore processors in order to exploit the parallel nature of image processing operations such as spatial filtering [5, 15].
The implementation of vision processing on reconfigurable platforms offers performance competitive with ASICs while providing flexibility with regard to design changes. In WVSNs, the application requirement is often to capture data at particular times and then switch the node to a sleep mode so as to conserve energy. This low duty cycling approach is suitable when the platform has a small sleep power. With the advent of FLASH based FPGAs with small sleep power consumption, and the development of techniques to use SRAM based FPGAs effectively for duty cycled applications, reconfigurable platforms have become the platform of choice in WVSNs for data intensive tasks. Uniprocessors such as embedded processors are commonly employed for vision processing because of the availability of ready-to-use libraries. In our previous work, a system was implemented on both software and hardware platforms.
The functions of this system can be used here to provide a comparison of the processing complexity on the software and hardware platforms; by software we mean a microcontroller implementation and by hardware an FPGA implementation. The vision functions used in the system include background subtraction, segmentation, morphology, filtering, labelling, feature extraction, and compression. For background subtraction, the background is stored in the FLASH memory at the initial stage and then subtracted from the current frame. After this, manual segmentation is applied in order to segment the objects from the background.
In morphology, a 3 x 3 erosion followed by a dilation is applied and, in low pass spatial filtering, the previous binary frame is stored and then subtracted from the current frame to remove unwanted objects. In labelling and feature extraction, objects are first assigned unique labels, after which feature information in the form of area and centre of gravity is calculated. The input sample image used in this experiment, shown in Figure 12(a), has a size of 640 x 400 and contains real objects in the form of magnetic particles; these particles are used to predict failure in hydraulic machines. The processing time and energy consumed by each of the vision functions are given in Table 2. The processing times of the vision functions on the software and hardware platforms are represented by T_AVR32 and T_FPGA, respectively; the power consumption of each algorithm on the hardware platform is represented by P_FPGA; and the energy consumption on the software and hardware platforms is represented by E_AVR32 and E_FPGA, respectively. Table 2 shows that, because of the inherent parallelism, the vision functions require a shorter execution time on the hardware than on the software platform, which results in a smaller energy consumption on the hardware platform. It must be noted, however, that implementing more vision functions on the hardware will require a longer design and development time.
On the contrary, the software platform has a short design and development time because of the availability of ready-to-use libraries. Depending on the requirements and constraints of the application, either platform may be selected.
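As a rough software illustration of the function chain just described (background subtraction, segmentation by thresholding, a 3 x 3 erosion followed by a dilation, and feature extraction), the following pure-Python sketch processes a greyscale frame. The threshold value and the simple all/any window tests are illustrative assumptions, not the implementation used on the AVR32 or the FPGA.

```python
def process_frame(frame, background, threshold=40):
    """Sketch of the vision chain: subtraction, segmentation,
    3x3 erosion then dilation, and area / centre-of-gravity features.
    `frame` and `background` are lists of lists of grey values;
    the threshold is an illustrative assumption."""
    H, W = len(frame), len(frame[0])
    # Background subtraction + segmentation to a binary image.
    binary = [[1 if abs(frame[y][x] - background[y][x]) > threshold else 0
               for x in range(W)] for y in range(H)]

    def window_all(img, y, x):  # 1 only if the whole 3x3 window is 1
        return all(0 <= y + dy < H and 0 <= x + dx < W and img[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1))

    def window_any(img, y, x):  # 1 if any pixel in the 3x3 window is 1
        return any(0 <= y + dy < H and 0 <= x + dx < W and img[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1))

    eroded = [[1 if window_all(binary, y, x) else 0 for x in range(W)]
              for y in range(H)]
    dilated = [[1 if window_any(eroded, y, x) else 0 for x in range(W)]
               for y in range(H)]

    # Feature extraction: area and centre of gravity of the foreground.
    coords = [(y, x) for y in range(H) for x in range(W) if dilated[y][x]]
    area = len(coords)
    cog = (sum(y for y, _ in coords) / area,
           sum(x for _, x in coords) / area) if area else None
    return dilated, area, cog
```

On a frame containing a single bright square, the erosion/dilation pair removes isolated noise pixels while leaving the object's area and centre of gravity intact.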
4.2. Energy Consumption. For battery operated wireless smart camera systems, the lifetime is an important consideration [3, 18, 19]. Battery lifetime can be extended by reducing energy consumption. The energy reduction can be achieved by reducing the processing time and/or the average power consumption. Processing time can be reduced by efficient implementation techniques and by introducing high performance embedded platforms. The power consumption in embedded platforms can be categorised into dynamic and static power. Dynamic power depends on the design and is related to switching signals from 0 to 1 or vice versa. Static power is related to power consumption when there is no switching on the signals and it is a function of the physical implementation. The dynamic power consumption is given by
p_dynamic = C · f · V_dd², (3)
where C is the capacitance, f is the frequency, and V_dd is the supply voltage.
In some applications, peak performance is not always required, so the operating frequency can be reduced during the time in which the node is in a low performance mode. This linearly decreases the power consumption, as is evident from the factor f in (3). However, in some real time applications, lowering the frequency may violate the timing constraints of the system. Moreover, for a design requiring the same performance all the time, a frequency reduction does not affect the energy consumption, because the design takes a correspondingly longer execution time at the lower frequency. Two factors, namely, the design complexity and voltage scaling, can offer a reduction in energy consumption: the design complexity is related to the capacitance C and voltage scaling is related to V_dd, as shown in (3).
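The argument above (frequency scaling lowers power linearly but leaves the energy of a fixed workload unchanged, while voltage scaling reduces energy quadratically) can be checked numerically with a small sketch; the capacitance and voltage values used below are illustrative only.

```python
def dynamic_power(C, f, vdd):
    """Equation (3): p_dynamic = C * f * Vdd^2."""
    return C * f * vdd ** 2

def dynamic_energy(C, f, vdd, cycles):
    """Energy for a fixed workload of `cycles` clock cycles.

    Halving f halves the power but doubles the execution time, so the
    energy C * cycles * Vdd^2 is independent of f; lowering Vdd reduces
    it quadratically.
    """
    return dynamic_power(C, f, vdd) * (cycles / f)
```

For example, dropping from 26.6 MHz to 13.3 MHz halves the dynamic power but leaves the energy of a fixed cycle count unchanged.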
At the system level, the voltage and frequency parameters of a VSN's architecture are fixed [2, 4, 19, 21] because components such as the interconnections between the devices, the lighting, and the memory require a fixed voltage. However, other alternatives, such as a better duty cycling approach, in-node processing, and the selection of devices with low active and sleep power consumption, can be investigated for extending the battery lifetime. For in-node processing, the complexity of the vision processing algorithms can be investigated with the help of the proposed complexity model. This complexity information can be used as input to preimplementation evaluation tools, such as the Xilinx XPower Estimator and the Altera Early Power Estimators, for power measurement and resource utilization estimates before actual implementation.
Following this, we will investigate a number of systems as a case study in order to provide a comparison of the vision systems. For this purpose, we will use the system taxonomy in order to identify a common class of systems with respect to the experimental system. After classification, a complexity model is used for resource estimation and then the vision systems are compared for implementation on a single generic architecture. The description of the experimental system is now provided.
5. Case Study: Particle Detection
To demonstrate the use of the system taxonomy and complexity model, a vision system which we have developed for the failure prediction of industrial machines is selected as the reference system. The main focus of this system is to develop image processing and analysis methods to automatically detect magnetic particles that have detached from machines, and then to transmit the information about these particles over a wireless link to the user. The vision function flow is shown in Figure 11 and the images at different stages of the algorithm are shown in Figure 12. By employing the approach of partitioning the vision functions between the VSN and the server, vision functions such as image capturing, background subtraction, segmentation, filtering, and the postprocessing function, that is, ITU-T G4 compression, are performed on the VSN.
The compressed data is transmitted to the server in order to process the remainder of the vision functions. This system is classified by using the system taxonomy, depicted by means of a dashed line in Figure 3. The extended functions are shown with labels such as A for storage and B for segmentation. After the system classification, the complexity of vision functions is analyzed with the assistance of the complexity model of Table 1 and the outcome of this analysis is concluded in Table 3. After the classification and complexity analysis of the reference system, the architecture for this system is presented. In Section 6, we will identify systems with similar requirements with the assistance of taxonomy and will evaluate the target architecture for their implementations. In this manner, a single generic architecture can be identified/proposed for systems with similar requirements.
5.1. Target Architecture. The target architecture is presented in Figure 13. It includes a Micron MT9V032 CMOS image sensor, which can be programmed through an I2C interface for different resolutions in real time; the camera has a maximum clock frequency of 26.6 MHz and is able to produce 60 frames per second. For vision processing, the Xilinx Spartan-6 XC6SLX9L FPGA is selected, which has 5720 logic cells, 32 block RAMs of 18 Kbits each, and 90 Kbits of distributed RAM. The vision functions include capturing, background subtraction from a frame stored at the initial stage in the FLASH memory, segmentation, filtering, and ITU-T G4 compression.
A serial FLASH memory of 64 Mbits is used for background storage. For handling transmission, the SENTIO32 software platform is used, which has a 32-bit AVR microcontroller, the AT32UC3B0256, and an IEEE 802.15.4 compliant transceiver (CC2520). In the proposed target architecture, the FPGA has 12.5 mW static power and a dynamic power of 16.92 mW for the design, which includes algorithms such as background subtraction, filtering to remove noise, segmentation, and compression; the AVR32 microcontroller has 77.5 mW active power, the camera 160 mW, and the radio transceiver 132 mW. By evaluating this architecture with a smaller static power consumption, such as the 5 μW claimed for the FLASH based ACTEL AGL600V5 FPGA, a greater lifetime can be achieved for the VSN. It is concluded that a VSN with this architecture achieves a lifetime of approximately 5 years for a sample period of 1.5 minutes.
Tipping Points of Failure. It is important to know the conditions under which the architecture fails to offer the desired functionality. This happens when, at some point, the resources available on the architecture cannot accommodate the expected design. Looking at a number of factors, such as clock frequency, memory, logics, communication among devices, and field of view, it is apparent that the clock frequency cannot be a failure point in the architecture when there are no real time constraints; the clock frequency merely increases or decreases the frame rate. For example, lowering the clock frequency from 26.6 MHz to 13.3 MHz for the proposed target architecture in Figure 13 reduces the frame rate from 60 to 30. In real time systems, however, where the timing constraints are important, this could result in a failure of the system. The architecture will fail to offer the required functionality and performance when the available resources, that is, internal memory, external memory, and logics, are exceeded. Suppose that, in the experimental system, the image sensor is changed to a resolution of 2000 x 3000 and the vision function low pass filtering of Figure 10 is moved from the server to the VSN. This function requires the storage of a complete binary image in the internal memory in order to perform its operation. The complete binary image at this resolution would require 5859.37 Kbits of internal memory, while the total internal memory available is 576 Kbits (32 block RAMs of 18 Kbits each). Similarly, if an RGB image were required to be stored as the background, it would require 137.32 Mbits, while the total available FLASH is 64 Mbits.
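The memory tipping points quoted above can be reproduced with a short calculation; the binary-prefix conversions (1 Kbit = 1024 bits, 1 Mbit = 1024² bits) are an assumption inferred from the figures in the text.

```python
def binary_frame_kbits(rows, cols):
    """Internal memory needed for one binary frame (1 bit per pixel)."""
    return rows * cols / 1024

def rgb_frame_mbits(rows, cols, bits_per_channel=8):
    """FLASH needed for one RGB background frame (3 channels)."""
    return rows * cols * 3 * bits_per_channel / 1024 ** 2

# Worked numbers from the text: a 2000 x 3000 sensor against the
# XC6SLX9's block RAM (32 RAMs of 18 Kbits) and a 64 Mbit FLASH.
BLOCK_RAM_KBITS = 32 * 18          # 576 Kbits of internal memory
FLASH_MBITS = 64

fits_internally = binary_frame_kbits(2000, 3000) <= BLOCK_RAM_KBITS
fits_in_flash = rgb_frame_mbits(2000, 3000) <= FLASH_MBITS
```

Both checks fail for the 2000 x 3000 sensor, matching the tipping points identified above.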
The communication among the different devices, that is, the hardware, the software, and the transceiver, is important for a stable architecture because the devices run at different speeds. A suitably sized buffer must be selected to handle device communication, failing which data overflow or underflow may occur in a paired device; in real time systems, this could cause system failure. Another critical factor of a VSN is the coverage area of the image sensor: a VSN is able to monitor only a limited number of objects within the field of view, and missing some of the objects may lead to a failure of the system, for example, in surveillance applications.
6. Comparison of Vision Systems
A comparison of different vision solutions is essential for the improvement of the current research work. In a traditional method of comparison, the systems under consideration must be implemented on each other's architecture. Suppose we have selected six sample systems (V1, V2, V3, V4, V5, and V6), with V1 selected as the reference system to be compared with the other five. In the traditional approach, depicted in Figure 14, the five systems (V2, V3, V4, V5, and V6) must be implemented on the reference system's architecture, and, similarly, the reference vision system must be implemented on the corresponding architectures of the five vision systems. This requires a significant amount of design effort and time, and when the reference system is changed, the aforementioned process is repeated once more; it means that, for comparing N systems, (N − 1)² implementations are required. By employing the system taxonomy and complexity model for comparison, the need for actual implementation can be circumvented. This approach is depicted in Figure 15. The six systems (V1, V2, V3, V4, V5, and V6) are first classified by using the system taxonomy to identify systems with similar functionality with regard to the reference system V1.
Systems such as V5 and V6, which have different functionality, are dropped from further investigation; in this way, the larger problem space is reduced to a smaller one. After classification, the complexity model is used to generate complexity parameters, that is, arithmetic complexity, memory resources, and device selection, that is, the processing platform, the radio transceiver, and a microcontroller for control functions. After the comparison, a single generic architecture can be proposed, or an existing architecture can be employed, for real implementation.
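The difference between the two comparison strategies can be sketched as follows; the class key used for filtering is a hypothetical stand-in for the full taxonomy classification.

```python
def traditional_implementations(n):
    """Traditional approach: (N - 1)^2 cross-implementations for N
    systems, as stated in the text."""
    return (n - 1) ** 2

def taxonomy_filter(systems, reference, class_key):
    """Proposed approach: classify first and keep only the systems in
    the reference system's taxonomy class; no implementation is needed
    before the complexity model is applied."""
    return [s for s in systems if class_key(s) == class_key(reference)]

# Illustrative system records (class labels are hypothetical).
systems = [
    {"name": "V1", "class": "binary classification"},
    {"name": "V2", "class": "binary classification"},
    {"name": "V5", "class": "background update"},
]
candidates = taxonomy_filter(systems, systems[0], lambda s: s["class"])
```

For six systems, the traditional approach costs 25 implementations, whereas the taxonomy approach costs none before the final target implementation.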
6.1. Example for Comparison of Systems. In this example, an actual vision system V1 is compared with five other systems (V2, V3, V4, V5, and V6) by using our proposed approach of Figure 15. We first need to identify the systems which have similar functionality with respect to system V1. After this, the complexity model is used to estimate the resources, and then a single generic architecture is evaluated for these systems.
Systems Classification. The systems are first classified by using the system taxonomy, and those systems having a common functionality are grouped into one class. Looking at all six systems, it is concluded that systems V2, V3, and V4 have similar functionality with respect to V1. The common functionality is classification on binary data, because the vision functions include background subtraction, segmentation, spatial filtering, and compression. The system V3 is placed in this group because its vision functions are similar to those of the reference system V1, although background storage is not used by its authors; this storage is, however, required for a real time implementation. The vision systems V5 and V6 are dropped from further investigation because in V5 the background is updated periodically and V6 does not use a segmentation function to convert the data into binary format. The taxonomy showed that these systems do not involve classification on binary data; therefore, they cannot utilize the architecture proposed in Section 5.1.
Resource Estimation. Following the system classification, the resource requirements of the systems are estimated by using the complexity model. After this, the Xilinx synthesis tool is used to generate the resource information, that is, logic and memory. The arithmetic operations are converted into logics per sample; for example, for subtraction with 8-bit pixels, 9 logics are required, and the total logics of a design include the logics for arithmetic, synchronization, and compression. The logics used for the synchronization of data are considered to be similar for systems in which the pixel depth and line size are similar; otherwise, they were obtained separately from the synthesis tool. The line memory (L_MEM), frame memory (F_MEM), and logics are close to the resources of the reference system V1. Therefore, the systems are evaluated for implementation on a single architecture.
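The per-sample logic estimate can be sketched as below. The n + 1 rule for an n-bit subtractor is an assumption generalized from the single figure quoted in the text (9 logics for 8-bit pixel subtraction); real counts depend on the synthesis tool.

```python
def subtractor_logics(pixel_bits):
    """Assumed rule of thumb: an n-bit subtractor costs about n + 1
    logics, consistent with the 9 logics quoted for 8-bit pixel
    subtraction. Actual figures come from the synthesis tool."""
    return pixel_bits + 1

def design_logics(arithmetic, synchronization, compression):
    """Total design logics = arithmetic + synchronization + compression,
    following the breakdown described in the text."""
    return arithmetic + synchronization + compression
```

Estimates obtained this way can then be compared against the 5720 logic cells of the target FPGA before any implementation work starts.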
Evaluation on a Single Generic Architecture. Following the resource estimation, the investigated systems are evaluated for implementation on a single generic architecture. We can develop a new architecture or use the existing architecture of Section 5.1, which is suitable for all of these systems. For these systems, the approach used by system V1 can be employed, in which vision functions such as background subtraction, segmentation, spatial filtering, and binary compression are implemented on the VSN.
The compressed data is transmitted to the server for further processing. In Table 4, the percentage of resources used by each system is given with respect to the resources of the architecture presented in Section 5.1. The proposed architecture has 5720 logics, 32 block RAMs of 18 Kbits each, and 90 Kbits of distributed RAM. The systems V1, V2, V3, and V4 have resource requirements within the range of the architecture's resources and can therefore be implemented on the target architecture.
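The fit test against the architecture's budget reduces to simple comparisons of each system's estimates against the available logic and distributed RAM. The figures below are taken from Table 4; the helper function and its field names are our own illustration.

```python
# Hedged sketch: checking each system's estimated resources against the
# target architecture of Section 5.1. Figures from Table 4; names are ours.

ARCH_LOGICS = 5720
ARCH_LINE_MEM_BITS = 90 * 1024   # 90 Kbits of distributed RAM

systems = {
    "V1": {"logics": 1873, "line_mem": 4480},
    "V2": {"logics": 1221, "line_mem": 1920},
    "V3": {"logics": 1824, "line_mem": 960},
    "V4": {"logics": 1221, "line_mem": 1920},
}

def fits(res):
    """A system fits if both its logic and line-memory estimates fit."""
    return res["logics"] <= ARCH_LOGICS and res["line_mem"] <= ARCH_LINE_MEM_BITS

for name, res in systems.items():
    share = 100 * res["logics"] / ARCH_LOGICS
    print(name, "fits" if fits(res) else "does not fit", f"({share:.0f}% logic)")
```

All four systems pass the check, which is why they can share the single generic architecture.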
The performance parameters of the different systems, given in Table 4, show that system V1 has the highest pixel sampling frequency, 15.2 MHz. The pixel frequencies for the other systems are those of their respective older implementations, which can be scaled to 15.2 MHz when implemented on the architecture of Section 5.1. This demonstrates that, with the assistance of our proposed approach, the system complexity can be easily estimated and a number of systems can be compared without the need for actual implementation. After this, a single generic architecture can be proposed for the same class of systems. This shows that the taxonomy, together with the complexity model, can be used effectively for the classification and comparison of solutions, which in turn helps in proposing generic solutions.
7. Future Research
With advances in technology, multicore processors are expected to become increasingly common in the field of embedded vision systems. There are a number of challenges in employing these parallel machines directly for WVSN applications. The vision functions available for software implementation were developed for single-processor platforms; these algorithms must be redesigned to exploit parallelism. Early vision functions, such as background subtraction and temporal filtering, require data storage in an external memory. These tasks require a processing architecture with fast storage and high-bandwidth memory interfaces and data buses. To reduce memory latency, embedded multicore systems should use Direct Memory Access (DMA) based data transfer, so that the cores execute the tasks while data transfers between modules such as cores, internal memory, and external memories are handled by the DMA. This creates new design challenges in relation to writing parallel programs for multicore systems. To extend the battery lifetime, suitable dynamic frequency scaling and voltage scaling techniques must be considered for the VSN architecture.
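The DMA-based overlap of transfer and computation described above is essentially double buffering: while the core processes one buffer, the DMA engine fills the other. The pure-Python simulation below is only a sketch of that idea (a real system would program a DMA controller; all names here are illustrative).

```python
# Hedged sketch of double buffering: a "DMA" thread fills one buffer
# while the consumer (the core) processes the other. Illustrative only.
import threading
import queue

def dma_engine(source_rows, empty, filled):
    """Simulated DMA: copy each row into a free buffer, hand it to the core."""
    for row in source_rows:
        buf = empty.get()          # wait for a free buffer
        buf[:] = row               # "transfer" the data
        filled.put(buf)            # hand it to the core
    filled.put(None)               # end-of-stream marker

rows = [[i] * 4 for i in range(8)]     # fake image rows
empty, filled = queue.Queue(), queue.Queue()
for _ in range(2):                     # two buffers -> double buffering
    empty.put([0] * 4)

t = threading.Thread(target=dma_engine, args=(rows, empty, filled))
t.start()
total = 0
while (buf := filled.get()) is not None:
    total += sum(buf)                  # the core "processes" the row
    empty.put(buf)                     # recycle the buffer for the DMA
t.join()
print(total)                           # 112: sum of all row values
```

Because the DMA only reuses buffers the core has returned, transfer and processing never touch the same buffer at the same time, which is the property a real DMA double-buffering scheme relies on.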
8. Conclusion
In this paper, we have presented an abstract model for the comparison and generalization of vision solutions for wireless smart cameras, with the assistance of a system taxonomy. To develop this model, we have performed a detailed analysis of the vision functions used in the system taxonomy in order to predict the arithmetic complexity and memory requirements. To illustrate the use of the proposed model, we have analyzed a number of published systems as a case study and classified them by using the system taxonomy. Following this, the resources have been estimated with the assistance of the complexity model. It has been demonstrated that the proposed model helps in the comparison of vision systems without the requirement for any actual implementation. After comparison, a single generic architecture can be proposed for the same class of vision systems. This work may not be exhaustive; however, it provides an abstract reference model for benchmarking and for the development of efficient generic solutions.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
 C. Leistner, P. M. Roth, H. Grabner, H. Bischof, A. Starzacher, and B. Rinner, "Visual on-line learning in distributed camera networks," in Proceedings of the 2nd ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC '08), pp. 1-10, Stanford, Calif, USA, September 2008.
 A. Orfy, A. El-Sayed, and M. ElHelw, "WASP: wireless autonomous sensor prototype for visual sensor networks," in Proceedings of the IFIP Wireless Days (WD '10), pp. 1-5, Venice, Italy, October 2010.
 S. Hengstler and H. Aghajan, "Application-oriented design of smart camera networks," in Proceedings of the 1st ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC '07), pp. 12-19, Vienna, Austria, September 2007.
 M. Imran, K. Khursheed, M. O'Nils, and N. Lawal, "Exploration of target architecture for a wireless camera based sensor node," in Proceedings of the 28th Norchip Conference (NORCHIP '10), pp. 1-4, Tampere, Finland, November 2010.
 N. K. Ratha and A. K. Jain, "Computer vision algorithms on reconfigurable logic arrays," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 1, pp. 29-43, 1999.
 M. Imran, K. Khursheed, N. Ahmad, M. A. Waheed, M. O'Nils, and N. Lawal, "Complexity analysis of vision functions for implementation of wireless smart cameras using system taxonomy," in Real-Time Image and Video Processing, vol. 8437 of Proceedings of SPIE, Brussels, Belgium, 2012.
 B. Dieber, C. Micheloni, and B. Rinner, "Resource-aware coverage and task assignment in visual sensor networks," IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 10, pp. 1424-1437, 2011.
 C. H. Lin, W. Wolf, A. Dixon, X. Koutsoukos, and J. Sztipanovits, "Design and implementation of ubiquitous smart cameras," in Proceedings of the IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing, vol. 1, pp. 32-39, Taichung, Taiwan, June 2006.
 B. Rinner, T. Winkler, W. Schriebl, M. Quaritsch, and W. Wolf, "The evolution from single to pervasive smart cameras," in Proceedings of the 2nd ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC '08), pp. 1-10, Stanford, Calif, USA, September 2008.
 S. Tilak, N. Abu-Ghazaleh, and W. Heinzelman, "A taxonomy of wireless micro-sensor network models," ACM SIGMOBILE Mobile Computing and Communications Review, vol. 6, no. 2, pp. 28-36, 2002.
 S. Hengstler, "Stereo vision in smart camera networks," in Stereo Vision, A. Bhatti, Ed., pp. 73-90, InTech, Rijeka, Croatia, 2008.
 M. Imran, K. Benkrid, K. Khursheed, N. Ahmad, M. O'Nils, and N. Lawal, "Analysis and characterization of embedded vision systems for taxonomy formulation," in Real-Time Image and Video Processing, vol. 8656 of Proceedings of SPIE, Burlingame, Calif, USA, 2013.
 R. Walczyk, A. Armitage, and T. D. Binnie, "Comparative study on connected component labeling algorithms for embedded video processing systems," in Proceedings of the International Conference on Image Processing, Computer Vision and Pattern Recognition (IPCV '10), p. 7, Las Vegas, Nev, USA, 2010.
 G. Blake, R. G. Dreslinski, and T. Mudge, "A survey of multicore processors: a review of their common attributes," IEEE Signal Processing Magazine, vol. 26, no. 6, pp. 26-37, 2009.
 S. Apewokin, Efficiently mapping high-performance early vision algorithms onto multicore embedded platforms [Ph.D. thesis], Georgia Institute of Technology, Atlanta, Ga, USA, 2009.
 J. Valverde, A. Otero, M. Lopez, J. Portilla, E. de la Torre, and T. Riesgo, "Using SRAM based FPGAs for power-aware high performance wireless sensor networks," Sensors, vol. 12, no. 3, pp. 2667-2692, 2012.
 K. Khursheed, M. Imran, A. W. Malik, M. O'Nils, N. Lawal, and B. Thörnberg, "Exploration of tasks partitioning between hardware software and locality for a wireless camera based vision sensor node," in Proceedings of the 6th International Symposium on Parallel Computing in Electrical Engineering (PARELEC '11), pp. 127-132, Luton, UK, April 2011.
 M. Imran, K. Khursheed, N. Lawal, M. O'Nils, and N. Ahmad, "Implementation of wireless vision sensor node for characterization of particles in fluids," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 11, pp. 1634-1643, 2012.
 A. Kerhet, M. Magno, F. Leonardi, A. Boni, and L. Benini, "A low-power wireless video sensor node for distributed object detection," Journal of Real-Time Image Processing, vol. 2, no. 4, pp. 331-342, 2007.
 P. Pillai and K. G. Shin, "Real-time dynamic voltage scaling for low power embedded operating systems," in Proceedings of the 18th Symposium on Operating System Principles (SOSP '01), vol. 35, pp. 89-102, New York, NY, USA, 2001.
 M. Rahimi, R. Baer, O. I. Iroezi et al., "Cyclops: in situ image sensing and interpretation in wireless sensor networks," in Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys '05), San Diego, Calif, USA, 2005.
 Xilinx, XPower Estimator User Guide, 2012; Xilinx Power Tools Tutorial, 2010, http://www.xilinx.com.
 Altera, PowerPlay Early Power Estimators, 2012, http://www.altera.com/support/devices/estimator/pow-powerplay.jsp.
 A. Hornberg, Handbook of Machine Vision, John Wiley & Sons, Weinheim, Germany, 2006.
 Z.-H. Chen, A. W. Y. Su, and M.-T. Sun, "Resource-efficient FPGA architecture and implementation of Hough transform," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 8, pp. 1419-1428, 2011.
 A. C. Bianchi and A. H. Reali-Costa, "Implementing computer vision algorithms in hardware: an FPGA/VHDL-based vision system for a mobile robot," in RoboCup 2001, Lecture Notes in Computer Science, pp. 281-286, 2002.
 C. B. Margi, R. Manduchi, and K. Obraczka, "Energy consumption tradeoffs in visual sensor networks," in Proceedings of the 24th Brazilian Symposium on Computer Networks (SBRC '06), Curitiba, Brazil, 2006.
 Xilinx, Spartan-6 Family Overview, 2010, http://www.xilinx.com.
 Micron, Numonyx Serial Flash Memory, 2007, http://www.micron.com.
 Atmel, AT32UC3B0256 AVR32 Microcontroller, http://www.atmel.com/.
 Texas Instruments, CC2520 Transceiver, http://www.ti.com/.
 Actel, IGLOO Video Kit, 2009, http://www.actel.com/.
Muhammad Imran, Khursheed Khursheed, Naeem Ahmad, Mattias O'Nils, Najeem Lawal, and Malik A. Waheed
Department of Electronics Design, Mid Sweden University, Holmgatan 10, 851 70 Sundsvall, Sweden
Correspondence should be addressed to Muhammad Imran; firstname.lastname@example.org
Received 31 May 2013; Revised 7 December 2013; Accepted 12 December 2013; Published 9 January 2014
Academic Editor: Hongkai Xiong
TABLE 1: An abstract complexity model of vision functions for a VSN. For each function, the line memory, frame memory, arithmetic operations, and comments are listed.

- Data triggering. Line memory: N.A. Frame memory: N.A. Arithmetic: N.A. Comments: depending on the application, the triggering method can be selected.
- Data sources. Line and frame memory: determined by W and H. Comments: W is the image width and H is the image height; scaling W x H affects the memory and the arithmetic operations.
- Data depth. Line and frame memory: determined by b. Comments: b is the number of bits per pixel; scaling b affects the complexity.
- Colour transforms. Line memory: mem = bits x 9, where bits is the coefficient bit width and 9 is the number of coefficients. Frame memory: N.A. Arithmetic: RGB to YCrCb conversion requires W x H x 9 multiplications and W x H x 6 additions/subtractions. Comments: depending on the application, images can be captured in binary, grey scale, or colour format.
- Learning. Line memory, frame memory, and arithmetic: application and algorithm dependent. Comments: an offline classifier with online training is preferable.
- Frame storage. Line memory: N.A. Frame memory: mem = W x H x b bits. Comments: memory size depends on the pixel depth and image resolution.
- Frame subtraction. Line memory: size depends on the background modelling technique. Frame memory: frame storage is required. Arithmetic: subtracting a static background of size W x H requires W x H additions/subtractions. Comments: a simple frame subtraction or a background modelling technique is used.
- Segmentation:
  - Thresholding. Line and frame memory: N.A. Arithmetic: a complete image requires W x H logical operations. Comments: thresholding is used for simple cases.
  - Adaptive thresholding. Line memory: mem = [(n - 1) x W x b] bits, where n is the number of rows of the neighbourhood. Frame memory: N.A. Arithmetic: an n x n neighbourhood requires n x n additions, 1 division, and 1 comparison per pixel; a complete image requires W x H x n x n additions, W x H divisions, and W x H comparisons.
  - Edge based*. Line memory: mem = [(n - 1) x W x b] bits. Frame memory: N.A. Arithmetic: filtering with an m x n mask requires m x n additions and m x n multiplications per pixel, that is, W x H x m x n additions and W x H x m x n multiplications for a complete image.
  - Region based. Line and frame memory: size depends on the selection of the seed region. Arithmetic: complexity depends on the selection of the seed and the order in which pixels and regions are examined. Comments: these techniques require much computational time.
  - Watershed*. Line memory: internal memory required, depending on the technique used. Frame memory: mem = W x H x b bits. Arithmetic: depends on the technique, such as immersion/flooding based or toboggan. Comments: efficient techniques exist but are still not suitable for real time.
- Filtering:
  - Linear spatial. Line memory: mem = [(n - 1) x W x b] bits. Frame memory: N.A. Arithmetic: a weighted-average mask requires multiplications, additions, and divisions; for a complete image, averaging with an m x n mask requires W x H x m x n additions and W x H divisions.
  - Order statistic. Line memory: mem = [(n - 1) x W x b] bits. Frame memory: N.A. Arithmetic: complexity depends on the selection of the window size and the sorting algorithm. Comments: a bit-level implementation is usually better for hardware.
  - Binary morphology. Line memory: mem = [(n - 1) x W x b] bits. Frame memory: N.A. Arithmetic: erosion and dilation with an m x n mask require m x n operations (AND/OR) per pixel. Comments: the run-time complexity of the dilation of two images is O(WH).
  - Grey scale morphology. Line memory: mem = [(n - 1) x W x b] bits. Frame memory: N.A. Arithmetic: dilation and erosion algorithms have run-time complexity O(WHn), where n is roughly the number of points on the boundary of the flat structuring element. Comments: mathematical morphology is extended to grey scale images based on the notion of minimum and maximum.
- Labelling. Line memory: mem_tot = mem_BUF + mem_LOOKUP + mem_DATA (see (2)). Frame memory: in the classical approach, mem = W x H x b bits. Arithmetic: complexity depends on the algorithm, the image size, and the number of objects in the image. Comments: a single pass is preferred for real-time component labelling.
- Feature extraction. Line memory: mem_area = log2(N^2) x number of objects; mem_cog = (2 x s_num_bits + s_din_bits) x number of objects; mem_box = log2(max(R_i, C_i)) x number of objects. Frame memory: N.A. Arithmetic: the complexity of feature calculation depends on the number and size of the objects. Comments: features include position, width, height, bounding box, area, and centre of gravity [13, 24].
- Classification. Line and frame memory: application and machine learning algorithm dependent. Arithmetic: this field is still at an immature stage, so giving an absolute complexity is a challenging task. Comments: SVM is suitable for on-node operation due to its low arithmetic complexity.
- Recognition. Line memory: depends on the design. Frame memory: depends on the number of objects, training samples, and algorithm. Arithmetic: the run-time complexity depends on the number of poses, which can be reduced to gain speed. Comments: to avoid design complexities, it is better to use standard software packages.
- Tracking. Line and frame memory: dependent on the application, the learning algorithm, and the objects. Arithmetic: tracking algorithms can be simplified by imposing certain constraints. Comments: lightweight algorithms are developed with constraints on the motion and appearance of the objects.
- Intensity transforms (histograms). Line memory: mem = L x bw bits, where bw = log2(W x H) is the bit width for W x H pixels and L is the number of intensity levels. Frame memory: N.A. Arithmetic: depends on the technique selected, such as piecewise linear transforms or histogram processing. Comments: these spatial domain operations are computationally efficient and require few resources for implementation.
- Spatial transforms. Line memory: memory buffers required depending on the technique, for example, in the case of multiple rotations and image registration. Frame memory: mem = W x H x b bits. Arithmetic: depends on the operations performed. Comments: this class includes matrix transposes, image rotation, scaling, and registration.
- Maths transforms:
  - Fast Fourier transform. Line memory: memory buffers required depending on the design; for coefficient storage, mem = 2 x ((W x H)/2) x bit_coeff. Frame memory: mem = W x H x b bits. Arithmetic: direct implementation of the DFT requires (WH)^2 operations, while the FFT reduces this to WH log WH operations. Comments: fast algorithms such as radix-2^m, FPA, and FHT are good candidates for real-time systems.
  - Discrete cosine transform. Line memory: mem_coeff = bits x N x N, where bits is the coefficient bit width and N x N is the pixel block; mem_pro = N x W x b. Frame memory: mem = W x H x b bits, in the case of parallel block processing. Arithmetic: direct implementation of a 2-D 8 x 8 DCT requires 1024 multiplications and 896 additions. Comments: mem_coeff is the memory for precomputed coefficients and mem_pro is the memory for processing.
  - Discrete wavelet transform. Line memory: mem = L x N x b, where L is the number of rows and N is the number of pixels. Frame memory: mem = W x H x b bits. Arithmetic: the complexity depends on the method selected. Comments: there are different implementation methods; lifting based is widely used.
  - Hough transform. Line memory: memory buffers required for intermediate operations; W x H x r x sqrt(W^2 x H^2) x K, where r is the ratio of nonzero pixels in a binary image and K is the number of angles. Frame memory: mem = W x H x b bits. Arithmetic: calculating 10% feature points by using (1) for a 256 x 256 image with an angle range of 180 degrees requires 2.3 M multiplications and 1.1 M additions. Comments: the Hough transform requires complex computations; efficient solutions include line based, multiplierless, incremental, and convolution-based voting approaches.
- Postprocessing:
  - Lossy compression. Line and frame memory: depend on the IP cores being used. Arithmetic: lossy compression is based on the DCT and DWT, discussed above. Comments: an IP core is preferred because of its availability.
  - Lossless compression. Line and frame memory: depend on the IP cores being used. Arithmetic: depends on the algorithm and the image contents as well as the image size.

*The memory requirements for edge based segmentation are given for techniques involving local processing; for techniques involving global processing, they are given in the row for the Hough transform. The memory requirements for watershed segmentation are given for techniques involving morphological watersheds.

TABLE 2: Vision functions on software (AVR32) and hardware (FPGA) platforms.

Vision function        | Th AVR32 (ms) | Th FPGA (ms) | P FPGA (mW) | E AVR32 (mJ) | E FPGA (uJ)
Background subtraction | 332.5         | 19.91        | 0.34        | 25.7         | 6.76
Segmentation           | 225           | 19.91        | 0.13        | 17.4         | 2.58
Morphology             | 2327          | 19.97        | 1.14        | 180.3        | 22.76
Low pass filter        | 202.5         | 19.91        | 0.18        | 15.6         | 3.58
Labelling and features | 2610          | 19.91       | 2.7         | 202.3        | 53.76
ITU-T G4 compression   | 345           | 19.97        | 1.42        | 26.7         | 28.35

TABLE 3: Arithmetic complexity and memory estimation of the reference VSN.

- Data triggering. Line and frame memory: N.A. Arithmetic: N.A. Comments: time driven.
- Data sources. W = 640, H = 400. Comments: area scan, 640 x 400.
- Data depth. b = 8 bits. Comments: grey scale, 8 bits.
- Frame storage. Frame memory: mem = 640 x 400 x 8 = 2,048,000 bits.
- Frame subtraction. Arithmetic: 640 x 400 = 256,000 subtractions/additions. Comments: the background image is stored in FLASH memory.
- Segmentation. Arithmetic: 640 x 400 = 256,000 comparison operations. Comments: manual thresholding is used.
- Filtering and binary morphology. Line memory: mem = [(3 - 1) x 640 x 1] = 1280 bits for filtering and 1280 bits for morphology. Arithmetic: 640 x 400 x 9 = 2,304,000 AND operations for filtering and 640 x 400 x 9 = 2,304,000 OR operations for morphology. Comments: for a 3 x 3 mask, 2 line buffers are required for dilation and 2 for erosion.
- Postprocessing, lossless compression. Line memory: mem = [3 x 640 x 1] = 1920 bits (3 line buffers for coding); internal memory is required for storing 2(27) + 2(64) + 13 Huffman codes. Arithmetic: the ITU-T G4 scheme includes arithmetic operations for run length coding and entropy coding. Comments: the objects are white and the background is black, so the bilevel ITU-T G4 lossless compression scheme is used.

TABLE 4: Cost and performance parameters of different systems.

System | Image size | L_MEM (bits) | F_MEM (bits) | Arithmetic ops | Logic used | % LMem | % FMem | % logic | Max fps | Max pixel freq (MHz) | Fits proposed architecture?
V1     | 640 x 400  | 4480         | 2,048,000    | 5,120,000      | 1873       | 4.9    | 3.2    | 32      | 59.5    | 15.2                 | Yes
V2     | 640 x 480  | 1920         | 7,372,800    | 614,400        | 1221       | 2.1    | 11.5   | 21      | 25      | 7.6                  | Yes
V3     | 320 x 240  | 960          | 0            | 384,000        | 1824       | 1.0    | 0.0    | 31      | N.A.    | N.A.                 | Yes
V4     | 640 x 480  | 1920         | 7,372,800    | 614,400        | 1221       | 2.1    | 11.5   | 21      | 1.3     | 0.4                  | Yes
V5     | 128 x 128  | N.A.         | N.A.         | N.A.           | N.A.       | N.A.   | N.A.   | N.A.    | 4.1     | 0.06                 | No
V6     | 640 x 480  | N.A.         | N.A.         | N.A.           | N.A.       | N.A.   | N.A.   | N.A.    | 5.7     | 1.7                  | No
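The FPGA energy figures in Table 2 follow from E = P x t (power times execution time), for example, 0.34 mW x 19.91 ms for background subtraction gives roughly 6.77 uJ, matching the table to within rounding. A quick numerical check (all names ours):

```python
# Hedged check of Table 2's FPGA energy column: E (uJ) = P (mW) * t (ms).
table2 = {
    "Background subtraction": (19.91, 0.34, 6.76),
    "Segmentation":           (19.91, 0.13, 2.58),
    "Morphology":             (19.97, 1.14, 22.76),
}
for name, (t_ms, p_mw, e_uj_table) in table2.items():
    e_uj = t_ms * p_mw                  # energy = power * time
    print(name, round(e_uj, 2), "uJ (table:", e_uj_table, "uJ)")
```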
Published as a research article (case study) in the International Journal of Distributed Sensor Networks, January 2014.