A Perception-Driven Transcale Display Scheme for Space Image Sequences.

1. Introduction

With the rapid development of computer technology, spatial rendezvous and docking requires a large amount of image information to be exchanged between navigation information systems. Space image sequences are an important form of this information: they support tasks such as docking control, docking mechanism design, and interactive track control, and they must be transmitted between navigation information systems to measure the relative position, relative velocity, and relative attitude of spacecraft [1-4]. An important feature of space image sequences is that they are transcale, so the multimedia processing methods they demand differ from those of traditional image processing. The study of space image sequences is still at an early stage, and two problems in particular remain unresolved. The first is how to monitor space targets with high quality; for example, space targets need sustained attention because details of their movements are otherwise lacking. The second is how to improve image processing capability: smooth motion reproduction, accurate trajectory description, and different display effects should all be rendered well to enhance the awareness of multimedia processing. To address these problems, transcale display methods [5-7] have emerged as a new direction for image processing. The main challenges are as follows.

(1) The change of the attention scale: Existing image processing methods use only the inherent characteristics of the image sequences and ignore the attention perceived by observers. The description needs to shift from the whole sequence to the space targets, so as to better capture the fine details of space image sequences.

(2) The change of the scale of the frame rate: With existing frame interpolation methods, the details of the interpolated frames are unclear when the frame rate is changed, which affects the smoothness of space image sequences. New methods therefore need to be studied to obtain the optimal interpolated frames.

(3) The change of the resolution scale: Because of the limited screen sizes of different devices, the same image sequence frequently has to be displayed at different sizes. However, existing resizing methods do not adapt the scale of the important contents, so new methods need to be proposed that improve the definition of the important contents while preserving the global resizing effect.

Facing the above challenges, we propose a Perception-driven Transcale Display Scheme (PTDS) to achieve high-quality space motion reproduction. The main contributions are threefold. First, we construct a transcale display framework from a different perspective, which significantly improves the awareness of multimedia processing. This framework contains two important modules: transcale description based on visual saliency, and perception-driven display of space image sequences. Second, the transcale description module, which is the core of PTDS, is presented to solve the transcale problems. Finally, the perception-driven display module is proposed to realize trajectory description and movement display for space targets under changing scales. In sum, PTDS could serve navigation information systems.

The rest of the paper is structured as follows. Section 2 discusses the framework of PTDS. Sections 3 and 4 describe in detail the formulation of PTDS, namely, transcale description and perception-driven display. Section 5 presents experimental work carried out to demonstrate the effectiveness of PTDS. Section 6 concludes the paper.

2. The Framework of PTDS

Figure 1 shows the framework of PTDS, which contains two important modules: transcale description and perception-driven display. The first module improves the definition of the important contents perceived by observers when scales change, including the attention scale, the frame rate scale, and the resolution scale. Determining the attention regions from the observers' viewpoint therefore becomes a key issue. Recently, the visual saliency technique has been used more and more widely in the multimedia field [8-11]. There are three reasons for using this technique in image processing. First, it provides a selection capability for image description and strengthens the description of moving-target details, so that computing resources can be allocated by priority to the desired image analysis and synthesis. Second, it helps construct a flexible description scheme by discriminating between salient and non-salient regions. Finally, it can improve cognitive and decision-making ability for space image sequences. Therefore, we adopt the visual saliency technique to address the above key issue. This module contains the following three parts:

(1) Attention region computing: It focuses on the problem of the change of the attention scale and then captures the visual attention regions for space image sequences. The calculation process mainly includes two aspects: spatial attention regions and temporal attention regions.

(2) Frame rate conversion: It focuses on the problem of the change of the frame rate scale and improves the motion smoothness of space sequences. The calculation process mainly includes three steps: transform of attention blocks, acquisition of neighboring blocks and prediction of unknown pixels.

(3) Image resolution resizing: It focuses on the problem of the change of the resolution scale and improves the definition of space sequences with different resolutions under the guarantee of the global visual effects. The calculation process mainly includes two steps: partition interpolation computing and seam carving operation.

On the basis of transcale description, the perception-driven display module is presented to achieve high-quality spatial motion display under changing scales. This module is an application of the first module and could serve the space navigation information processing system. It also contains the following three parts:

(1) Target trajectory computing: It could adaptively display the motion trajectory of space targets according to the setting of scale parameters. The calculation process mainly includes five steps: attention region computing, frame rate conversion, boundary computing, trajectory coordinate calculation, and motion trajectory display.

(2) Space transcale display: It obtains clear motion details under different time scales and spatial scales. The calculation process mainly includes five steps: key frame computing, display map calculation, thumbnail computing, transcale display, and thumbnail resizing.

(3) Space movement display: It realizes the motion overview of the space targets in a narrative way. The calculation process mainly includes four steps: transition map setting, transition pixel calculation, transition region acquisition, and display map calculation.

3. The Module of Transcale Description

In this section, we will go over the individual parts of transcale description for space image sequences, including attention region computing, frame rate conversion, and image resolution resizing.

3.1. Attention Region Computing. Attention region computing captures the important contents perceived by observers, i.e., the visual attention regions, which comprise spatial attention regions and temporal attention regions; its flow chart is shown in Figure 2. Spatial attention regions are obtained from the characteristics of color and image signature, and temporal attention regions are calculated by visual tracking. Specifically, let I[1, n] = {[I.sub.t]}(1 [less than or equal to] t [less than or equal to] n) be a space image sequence defined on the 3D space. We use {T, VA, KT} to represent the attention region, where T denotes the collection of time stamps, VA denotes the attention regions, containing the spatial attention region V[A.sub.S] and the temporal attention region V[A.sub.T], and KT denotes the transfer mark of attention regions. The computation of VA is shown in

[mathematical expression not reproducible] (1)

The computation of spatial attention regions is based on research in biological vision, which shows that the human visual system is very sensitive to the contrast of the visual signal [12, 13]. We adopt the histogram color contrast method [14] to compute the spatial saliency value of any pixel in [I.sub.t], shown as follows:

[mathematical expression not reproducible] (2)

[mathematical expression not reproducible] (3)

In (2) and (3), [[xi].sub.xi,yi] and [[xi].sub.xj,yj] denote any two pixels in [I.sub.t], [mathematical expression not reproducible] denotes the spatial saliency value of [[xi].sub.xi,yi], D([[xi].sub.xi,yi], [[xi].sub.xj,yj]) denotes the color distance function, [p.sub.n] denotes the total number of pixels in [I.sub.t], [c.sub.n] denotes the number of distinct pixel colors in [I.sub.t], [v.sub.c] denotes the color value of [[xi].sub.xi,yi], and [[phi].sub.[omega]] denotes the probability of [v.sub.c] in [I.sub.t]. In addition, G = [[summation].sup.near.sub.u=1]D([v.sub.c], [v.sub.u]), in which near denotes the number of colors that are nearest neighbors of [v.sub.c].
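The idea behind histogram-based contrast can be illustrated with a minimal sketch. Because (2) and (3) are not reproduced here, the code below is not the paper's formula: it assumes gray levels instead of colors, uses the absolute difference as the distance D, weights each distinct value by its probability in the frame, and omits the nearest-neighbor smoothing term G. The function name `histogram_contrast` is hypothetical.

```python
from collections import Counter

def histogram_contrast(pixels):
    """Map each distinct gray level to a contrast-based saliency value.

    Assumption: saliency of a level = sum over all distinct levels of
    (probability of that level) * (absolute difference between the levels).
    """
    counts = Counter(pixels)
    total = len(pixels)
    saliency = {}
    for level in counts:
        saliency[level] = sum(
            (n / total) * abs(level - other) for other, n in counts.items()
        )
    return saliency

# A mostly dark frame with a small bright target (flattened to a list).
frame = [10, 10, 10, 10, 10, 10, 200, 200]
sal = histogram_contrast(frame)
```

In this toy frame the rare bright value receives the larger saliency, which matches the intuition that pixels contrasting with most of the image attract attention.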

We then select the pixels [[xi].sub.xj,yj] that have larger spatial saliency values and put them into the set SA[L.sup.t] = {[[xi].sub.xj,yj]}. Because such a pixel may lie in a very small, relatively unimportant region, it is necessary to eliminate it in that case. There are three elimination rules, shown as follows.

Rule 1. For [for all][epsilon], if [mathematical expression not reproducible] and [mathematical expression not reproducible], then [mathematical expression not reproducible] is eliminated from SA[L.sup.t], where [epsilon] = 1,2, ... s.

Rule 2. For [for all][epsilon]' and [for all][epsilon], if [mathematical expression not reproducible] and [mathematical expression not reproducible], then [mathematical expression not reproducible] is eliminated from SA[L.sup.t].

Rule 3. For [for all][epsilon]" and [for all][epsilon], if [mathematical expression not reproducible] and [mathematical expression not reproducible], then [mathematical expression not reproducible] is eliminated from SA[L.sup.t].

Here [epsilon]' = 1, 2, ..., ([w.sub.initial] - j), [epsilon]" = 1, 2, ..., ([h.sub.initial] - j), and [w.sub.initial] and [h.sub.initial] are the initial resolution of the image sequences. On this basis, the boundary values of V[A.sub.S]([I.sub.t]) can be acquired by using the bounding box technique.
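As a small illustration of the bounding box step, the sketch below takes the salient pixels that survive elimination and returns the four boundary values. The representation of pixels as (x, y) tuples and the function name `bounding_box` are assumptions made for this example.

```python
def bounding_box(points):
    """Return (left, right, top, bottom) of a non-empty set of (x, y) pixels."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), max(xs), min(ys), max(ys)

# Hypothetical salient pixels remaining after the elimination rules.
salient = {(4, 7), (5, 7), (5, 8), (9, 3)}
box = bounding_box(salient)
```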

The computation of temporal saliency regions is based on the observations of human attention continuity [15, 16]. According to the obtained spatial saliency regions, the temporal saliency regions are computed using visual tracking. Since the attention regions could vary over time, we use KT to describe the change extent of attention regions, shown as follows:

[mathematical expression not reproducible] (4)

where [[delta].sub.SR] denotes a threshold value. Each time t that satisfies the above equation is added in turn to the set KTS. In our experiments, we set [[delta].sub.SR] to 0.75.

We utilize Bayesian target tracking to compute V[A.sub.T]([I.sub.t]). Let [[gamma].sub.t] = [[[x.sub.t], [y.sub.t], [dx.sub.t], [dy.sub.t]].sup.T] be the state of V[A.sub.T]([I.sub.t]), in which [[[x.sub.t], [y.sub.t]].sup.T] denotes the position vector and [[[dx.sub.t], [dy.sub.t]].sup.T] denotes the velocity vector. The set of spatial saliency regions O[S.sub.1 ... k] = [{V[A.sub.S]([I.sub.t])}.sub.t=j1 ... j2] is taken as the observation set of V[A.sub.T]([I.sub.t]), in which [j.sub.1] and [j.sub.2] are adjacent values in KTS and [j.sub.1] < [j.sub.2]. Within the Bayesian inference framework, we estimate the posterior probability function p([g.sub.k] | O[S.sub.1 ... k]) to compute the target state [g.sub.k] for V[A.sub.T]([I.sub.t]). The calculation of p([g.sub.k] | O[S.sub.1 ... k]) includes prediction and updating, shown as follows:

[mathematical expression not reproducible] (5)

[mathematical expression not reproducible] (6)

where [g.sub.k] and O[S.sub.k] denote the state and observation value of the k-th frame, respectively, p([g.sub.k] | [g.sub.k-1]) denotes the state transition model, which can be solved using an affine transformation, and p(O[S.sub.k] | [g.sub.k]) denotes the state measurement model, which can be computed using the structural local sparse appearance method, shown as follows:

[mathematical expression not reproducible] (7)

where d denotes a patch of 32x32 pixels divided from V[A.sub.S]([I.sub.m]) (in which 1 [less than or equal to] m [less than or equal to] k), N denotes the number of divided patches, [[sigma].sub.i] denotes the sparse coding of d, M denotes the normalization term, and [[xi].sub.i] denotes the weighted sum of [[sigma].sub.i], which is placed into the set O. Dig(x) is the function that computes the diagonal elements.
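The prediction and updating steps of Bayesian tracking can be sketched on a tiny discrete state space. This is only an illustration of the general recursion, not the paper's tracker: the three-cell state space, the transition table, and the likelihood values are hypothetical stand-ins for the affine transition model and the sparse appearance model.

```python
def predict(belief, transition):
    """Prediction: push the previous posterior through the transition model.

    transition[i][j] is the probability of moving from state i to state j.
    """
    n = len(belief)
    return [sum(transition[i][j] * belief[i] for i in range(n)) for j in range(n)]

def update(prior, likelihood):
    """Updating: weight the prediction by the observation likelihood, then normalise."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnormalised)
    return [u / z for u in unnormalised]

# Three candidate positions; the target tends to move one cell to the right.
belief = [1.0, 0.0, 0.0]
transition = [[0.2, 0.8, 0.0],
              [0.0, 0.2, 0.8],
              [0.0, 0.0, 1.0]]
likelihood = [0.1, 0.7, 0.2]  # hypothetical measurement model values per position

prior = predict(belief, transition)
posterior = update(prior, likelihood)
estimate = max(range(len(posterior)), key=posterior.__getitem__)
```

The state with the highest posterior is taken as the estimate; a real tracker would work with continuous states and sampled candidates instead of a three-entry table.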

3.2. Frame Rate Conversion. To enhance the motion smoothness of space image sequences, frame rate conversion is performed. Its core is to compute high-quality intermediate frames, i.e., to calculate the pixel values of the intermediate frames. A partition method is used for this calculation. For non-attention regions, the pixel values are left unchanged. For attention regions, the pixel values are computed accurately in three main steps: transform of attention blocks, acquisition of neighboring blocks, and prediction of unknown pixels, as shown in Figure 3. Specifically, the attention regions in any two consecutive frames, i.e., VA([I.sub.t]) and VA([I.sub.t+1]), are divided into [mu] x [mu] overlapping image blocks, and each block is defined as an attention block (VAB for short). We tried three values for [mu], namely, 8, 16, and 32, and found from the experimental results that 16 is the appropriate value. The set of VABs is denoted as VA[B.sup.t] = {[b.sup.t.sub.k]}, and the computation procedure for the pixels they contain is as follows.

Firstly, each VAB is projected using Walsh-Hadamard kernels [17], as shown below, and the results are stored in the temporary set WP = {W[P.sub.j] | 1 [less than or equal to] j [less than or equal to] [phi]}:

[mathematical expression not reproducible] (8)

Then each transformed result is assigned a hash value to accelerate VAB projection and is stored in the corresponding hash table T[B.sub.m][[h.sub.m]([b.sub.z])](1 [less than or equal to] m [less than or equal to] [phi]).
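To make the projection-and-hashing idea concrete, here is a minimal sketch under several assumptions: blocks are flattened to one-dimensional lists of power-of-two length, the transform is a plain fast Walsh-Hadamard transform in natural order, and the hash key is formed by quantizing the first few coefficients. The names `fwht` and `block_hash` and the parameters `n_coeffs` and `bin_width` are hypothetical; the kernels and hash functions used in [17] and in (8) may differ.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform (natural order) of a power-of-two-length list."""
    out = list(values)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out

def block_hash(block, n_coeffs=3, bin_width=16):
    """Quantise the first few transform coefficients into a hashable key."""
    coeffs = fwht(block)[:n_coeffs]
    return tuple(c // bin_width for c in coeffs)

flat = fwht([1, 1, 1, 1])  # a constant block has energy only in the first coefficient

# Similar blocks should fall into the same bucket, dissimilar ones into another.
blocks = {"a": [10, 12, 11, 13], "b": [10, 12, 11, 14], "c": [200, 10, 190, 5]}
table = {}
for name, block in blocks.items():
    table.setdefault(block_hash(block), []).append(name)
```

Blocks that share a bucket become candidate matches, so only a few comparisons are needed instead of a search over every block.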

Secondly, we establish spatial and temporal expansion rules to obtain the expansion blocks for each VAB. For [for all][b.sup.t.sub.k] ([b.sup.t.sub.k] [member of] VA[B.sup.t]), each hash table T[B.sub.m] is processed according to the expansion rules to produce the corresponding expansion block [b.sup.t+1.sub.l] ([b.sup.t+1.sub.l] [member of] VA[B.sup.t+1]), which is put into the matching set. On this basis, we determine the nearest blocks from the obtained expansion blocks using free search, shown as follows:

[b.sup.n+1.sub.near] = [b.sup.n.sub.k] + [omega][[alpha].sup.i][R.sup.i] (9)

In (9), [R.sup.i] [member of] [-1,1] x [-1,1], [alpha] is a constant set to 0.5, [omega] is the search radius, and [b.sup.n+1.sub.near] is the best block for [b.sup.n.sub.k]. Then, for each hash table, the following is used to compare and update the nearest blocks:

[b.sup.n+1.sub.near] = ['] if disc ([b.sup.n.sub.k], []) < disc ([b.sup.n.sub.k], [b.sup.n+1.sub.near]) (10)
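The search in (9) and the comparison in (10) can be sketched as a random search with an exponentially shrinking radius. This is a sketch under assumptions: block positions are top-left corners, the distance is the sum of squared differences, candidates are sampled around the current best position, and the loop stops when the radius drops below one pixel. The helper names `dist`, `block_at`, and `random_search` are hypothetical.

```python
import random

def dist(a, b):
    """Sum of squared differences between two equally sized flattened blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def block_at(frame, x, y, size):
    """Flatten the size x size block whose top-left corner is (x, y)."""
    return [frame[y + dy][x + dx] for dy in range(size) for dx in range(size)]

def random_search(source, frame, start, size, radius, alpha=0.5, seed=0):
    """Refine a match by sampling offsets omega * alpha**i * R_i, R_i in [-1, 1]^2."""
    rng = random.Random(seed)
    height, width = len(frame), len(frame[0])
    best = start
    best_d = dist(source, block_at(frame, best[0], best[1], size))
    i = 0
    while radius * alpha ** i >= 1:
        step = radius * alpha ** i
        x = int(round(best[0] + step * rng.uniform(-1, 1)))
        y = int(round(best[1] + step * rng.uniform(-1, 1)))
        x = min(max(x, 0), width - size)   # keep the candidate inside the frame
        y = min(max(y, 0), height - size)
        d = dist(source, block_at(frame, x, y, size))
        if d < best_d:                     # keep the candidate only if it is closer
            best, best_d = (x, y), d
        i += 1
    return best, best_d

# Synthetic 12 x 12 frame; look for the block taken from position (6, 5).
frame = [[(x * 7 + y * 13) % 50 for x in range(12)] for y in range(12)]
source = block_at(frame, 6, 5, 2)
start = (2, 2)
start_d = dist(source, block_at(frame, start[0], start[1], 2))
best, best_d = random_search(source, frame, start, size=2, radius=8)
```

Because a candidate replaces the current match only when its distance is smaller, the returned distance can never be worse than that of the starting position.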

Finally, the predicted value of the pixels in the attention regions can be computed through a smoothing operation of the obtained nearest blocks (for detailed algorithm, see Algorithm 1).
Algorithm 1: Frame rate conversion algorithm.

Input: A spatial image sequence I
Output: Changed image sequence I'
(1) Compute the frame number of I as len.
(2) Take the first frame of I as the current frame [I.sub.1].
(3) Take the frame following [I.sub.1] as [I.sub.2] and calculate their attention regions V[A.sub.1] and V[A.sub.2] according to the method in Section 3.1.
(4) Divide V[A.sub.1] and V[A.sub.2] into image blocks ST of size [mu] x [mu].
(5) Compute the transformed value of each ST using equation (8) and create hash tables.
(6) FOR i = 1 ... the number of hash tables
          FOR each transformed ST in V[A.sub.1]
               Obtain the expansion blocks according to spatial and temporal expansion rules;
               Compute the nearest blocks using equations (9) and (10);
               Compute the value of pixels in V[A.sub.1];
          END FOR
    END FOR
(7) Merge all the transformed ST.
(8) Compute interpolated frame [I'.sub.1] by combining the attention region with the non-attention region.
(9) Take the frame following [I.sub.2] as [I.sub.3] and advance the frame pair: [I.sub.2] [right arrow] [I.sub.1], [I.sub.3] [right arrow] [I.sub.2].
(10) Repeat steps (3) to (9) until the frame number of the current frame is (len-1).
(11) Synthesize the interpolated frames and the original frames into I'.
Return I'
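The outer loop of Algorithm 1, which walks over consecutive frame pairs and interleaves interpolated frames with the originals, can be sketched as follows. The per-pixel average used in `interpolate_pair` is only a placeholder for the block-matching interpolation of steps (3) to (8), and both function names are hypothetical.

```python
def interpolate_pair(frame_a, frame_b):
    """Placeholder interpolator: per-pixel average of two equally sized frames."""
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

def convert_frame_rate(sequence, interpolate=interpolate_pair):
    """Insert one interpolated frame between every pair of consecutive frames."""
    if not sequence:
        return []
    result = [sequence[0]]
    for current, following in zip(sequence, sequence[1:]):
        result.append(interpolate(current, following))
        result.append(following)
    return result

# Three tiny 1 x 2 frames become five frames after conversion.
frames = [[[0, 0]], [[10, 20]], [[30, 40]]]
doubled = convert_frame_rate(frames)
```

Passing a different `interpolate` callable is how an attention-aware interpolator could be plugged into the same loop.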

3.3. Image Resolution Resizing. Image resizing has gained significant importance because of rapid increases in the diversity and versatility of display devices. However, existing methods do not resize results from the viewpoint of the observers. In this section, we present a new resizing method to improve the resizing quality of the important contents perceived by observers. To accomplish this, two main issues need to be addressed. The first is the need to determine the important contents from the observers' viewpoint. The second is the need to improve the definition of the important contents during resizing. For the first issue, we adopt the method of attention region computing (see Section 3.1) and, for the second one, we introduce a method of partition interpolation with an architecture that is illustrated in Figure 4, containing two steps: partition interpolation computing and seam carving operation. Specifically, we resize [I.sub.t] from [h.sub.initial] x [w.sub.initial] to [] x [] and a resizing factor [2.sup.s] is

[mathematical expression not reproducible] (11)

The idea of partition interpolation computing is to adopt different calculation approaches for different regions of original image sequences. Naturally, for any pixel i(x, y) in [I.sub.t], the interpolated pixel [v.sub.p] is computed using

[mathematical expression not reproducible] (12)

where V[A.sub.t] is the simplified symbol of VA([I.sub.t]), which is computed in Section 3.1, B[R.sub.t] denotes the general region (namely, the non-attention region), V[A.sub.t] [union] B[R.sub.t] = [I.sub.t], and A[R".sub.t] (2x+1, 2y+1) and B[R'.sub.t] (2x+1, 2y+1) denote the interpolated pixels in V[A.sub.t] and B[R.sub.t], respectively.

The computation of A[R".sub.t] (2x+1, 2y+1) contains two steps: initial estimation and energy modification. The initial value is calculated using

[mathematical expression not reproducible] (13)

In (13), [j.sub.1] and [j.sub.2] are the determination functions, [j.sub.1] = [Z.sub.1] - 3[Z.sub.2] + [Z.sub.3], [j.sub.2] = -3[Z.sub.1] + [Z.sub.2] + [Z.sub.4], and [Z.sub.i] (i = 1, 2, 3, 4) is computed as follows:

[mathematical expression not reproducible] (14)

For any pixel in general regions, the following is used to compute B[R'.sub.t] (2x+1, 2y+1):

B[R'.sub.t] (2x+1, 2y+1) = 0.25B[R.sub.t] (2x, 2y) + 0.25B[R.sub.t] (2x + 2, 2y) + 0.25B[R.sub.t] (2x, 2y + 2) + 0.25B[R.sub.t] (2x + 2, 2y + 2) (15)
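Equation (15) averages the four diagonal neighbors of an unknown pixel on the enlarged grid. A minimal sketch, assuming the original pixels are placed at the even coordinates of a grid of size (2h-1) x (2w-1) and that only the odd-odd positions are filled here, is given below; the function name `fill_diagonal_pixels` is hypothetical.

```python
def fill_diagonal_pixels(image):
    """Place pixels on the even grid and fill each (2x+1, 2y+1) position
    with the average of its four diagonal neighbours, as in (15)."""
    h, w = len(image), len(image[0])
    grid = [[None] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(h):
        for x in range(w):
            grid[2 * y][2 * x] = image[y][x]
    for y in range(h - 1):
        for x in range(w - 1):
            grid[2 * y + 1][2 * x + 1] = 0.25 * (
                grid[2 * y][2 * x] + grid[2 * y][2 * x + 2]
                + grid[2 * y + 2][2 * x] + grid[2 * y + 2][2 * x + 2]
            )
    return grid

up = fill_diagonal_pixels([[0, 4], [8, 12]])
```

The remaining positions are left as `None` in this sketch because the text above gives a formula only for the (2x+1, 2y+1) pixels.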

Then we use the seam carving method to preserve the contents of attention regions. That is to say, the pixels in the different regions are computed through an energy function, shown as follows:

[mathematical expression not reproducible] (16)

where H(V[A.sub.t](x, y)) represents the oriented gradient histogram of the pixels in the attention region and [[zeta].sub.l], [[zeta].sub.r], [[zeta].sub.t], and [[zeta].sub.b] are the boundary values of V[A.sub.t].
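Since (16) is not reproduced here, the following sketch shows only the generic seam carving idea with a protected region, not the paper's energy function. It assumes a simple gradient-magnitude energy instead of the oriented gradient histogram, adds a large constant inside the attention box so that seams avoid it, and finds the minimum-energy vertical seam by dynamic programming. The names `energy_map`, `min_vertical_seam`, and the `bonus` parameter are hypothetical.

```python
def energy_map(image, protected, bonus=1e6):
    """Gradient-magnitude energy; pixels inside the protected box get a large bonus.

    protected = (left, right, top, bottom), inclusive column and row bounds.
    """
    h, w = len(image), len(image[0])
    left, right, top, bottom = protected
    energy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            dy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            energy[y][x] = abs(dx) + abs(dy)
            if left <= x <= right and top <= y <= bottom:
                energy[y][x] += bonus
    return energy

def min_vertical_seam(energy):
    """Return, for each row, the column of the minimum-energy connected vertical seam."""
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    for y in range(1, h):
        for x in range(w):
            cost[y][x] += min(cost[y - 1][max(x - 1, 0):min(x + 2, w)])
    seam = [min(range(w), key=cost[-1].__getitem__)]
    for y in range(h - 2, -1, -1):
        x = seam[-1]
        candidates = range(max(x - 1, 0), min(x + 2, w))
        seam.append(min(candidates, key=cost[y].__getitem__))
    return seam[::-1]

# Columns 2 and 3 hold the textured target and are marked as the attention region.
image = [[5, 5, 9, 1, 5, 5],
         [5, 5, 1, 9, 5, 5],
         [5, 5, 9, 1, 5, 5]]
seam = min_vertical_seam(energy_map(image, protected=(2, 3, 0, 2)))
```

Removing or duplicating pixels along such a seam changes the width by one column while leaving the protected region untouched.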

4. The Module of Perception-Driven Display

In this section, we describe in detail the formulation of the proposed perception-driven display module, which contains three algorithms, namely, target trajectory computing, space transcale display, and space movement display. To some extent, all of them are applications of the transcale description module and could be directly applied in navigation information systems.

(1) Target Trajectory Computing. To display the motion trajectories of space targets under different scales, the approach of target trajectory computing is presented and it contains the following five steps:

Step 1. For each frame in [I.sub.t] (1 [less than or equal to] t [less than or equal to] n), V[A.sub.t] is computed by utilizing the method of attention region computing.

Step 2. Using the method of frame rate conversion, I[1, n] is changed into HI[1, N'] = {H[I.sub.t]}(1 [less than or equal to] t [less than or equal to] N' and N' > n), in which N' is the number of frames after conversion, and V[A.sub.t] is updated in H[I.sub.t].

Step 3. The four boundary values are computed for each V[A.sub.t], including left boundary B[V.sub.L], right boundary B[V.sub.R], top boundary B[V.sub.T], and bottom boundary B[V.sub.B]:

[mathematical expression not reproducible] (17)

Step 4. For each V[A.sub.t], trajectory coordinates are calculated, including the value of horizontal coordinate V[A.sub.row] and the value of vertical coordinate V[A.sub.column]:

V[A.sub.row] = B[V.sub.L] + (B[V.sub.R] - B[V.sub.L])/2,

V[A.sub.column] = B[V.sub.T] + (B[V.sub.B] - B[V.sub.T])/2 (18)

Step 5. The motion trajectory A[T.sub.v] at scale v is computed using the following formula:

[mathematical expression not reproducible] (19)
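Steps 3 to 5 can be sketched in a few lines. The center computation follows (18) directly. The sampling rule for scale v is an assumption, since (19) is not reproduced here: the sketch simply keeps the center of every v-th frame, which agrees with the later observation that larger scales give fewer description points. The function names `centre` and `trajectory` are hypothetical.

```python
def centre(box):
    """Centre of a (left, right, top, bottom) box, following (18)."""
    left, right, top, bottom = box
    return left + (right - left) / 2, top + (bottom - top) / 2

def trajectory(boxes, v):
    """Assumed sampling rule: keep the attention-region centre of every v-th frame."""
    return [centre(box) for box in boxes[::v]]

# A target drifting one pixel to the right per frame over 20 frames.
boxes = [(10 + t, 30 + t, 20, 40) for t in range(20)]
fine = trajectory(boxes, 1)
coarse = trajectory(boxes, 10)
```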

(2) Space Transcale Display. The aim of transcale display is not only to show motion pictures at different time scales but also to display them at different resolution scales. This algorithm mainly includes the following five steps:

Step 1. For each [I.sub.t], the key frame [K.sub.j] (j = 1 ... m) is computed and then the corresponding attention region VA[K.sub.j] is obtained.

Step 2. The transcale display map KM is captured by merging the attention regions under the condition of different key frames.

Step 3. For each VA[K.sub.j] (j = 1 ... m), the corresponding thumbnail V[U.sub.j] is computed by reducing the original image to the same resolution.

Step 4. The transcale display map KM' for [I.sub.t] is computed:

KM' = KM [union] V[U.sub.1] [union] V[U.sub.2] [union] ... [union] V[U.sub.m] (20)

Step 5. Using the method of image resolution resizing, V[U.sub.j] can also be displayed with high quality at different resolutions.

(3) Space Movement Display. To outline the motion process of space targets, a movement display algorithm is proposed, which focuses on the smooth transition of space movement. Specifically, let [K.sub.p] and [K.sub.q] be consecutive key frames in [I.sub.t] and let their transition regions be K[R.sub.p](u, v) and K[R.sub.q](u', v'), respectively, in which u, u' [member of] [1, height], v [member of] [width-r+1, width], v' [member of] [1, r], r denotes the transition radius, and height and width are the current resolution of the key frames. The algorithm consists of the following four steps:

Step 1. Both K[R.sub.p] and K[R.sub.q] are resized to the target resolution height x 2r, and the transition region maps [TMap.sub.i] (i = p, q) are obtained.

Step 2. The value of pixel KR(u, v) in the transition region is computed:

KR (x, y) = [c.sub.1] x [TMap.sub.p] (x, y) + [c.sub.2] x [TMap.sub.q] (x, y) (21)

in which [c.sub.1] and [c.sub.2] are transition parameters, [c.sub.1] = 1 - [c.sub.2], [c.sub.2] = v/2r, and x and y are the position coordinates.
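The weighting in (21) is a linear cross-fade across the transition region. The sketch below assumes that both transition maps have already been resized to the same height x 2r shape and that v is the 1-based column index within that region, so that [c.sub.2] grows from 1/2r to 1 from left to right. The function name `blend_transition` is hypothetical.

```python
def blend_transition(tmap_p, tmap_q):
    """Blend two equally sized transition maps with column-dependent weights."""
    width = len(tmap_p[0])  # equals 2r after resizing
    blended = []
    for row_p, row_q in zip(tmap_p, tmap_q):
        row = []
        for v, (p, q) in enumerate(zip(row_p, row_q), start=1):
            c2 = v / width  # c2 = v / 2r
            c1 = 1 - c2
            row.append(c1 * p + c2 * q)
        blended.append(row)
    return blended

# One-row example with r = 2: fade from a bright key frame into a dark one.
kr = blend_transition([[100, 100, 100, 100]], [[0, 0, 0, 0]])
```

The left key frame dominates near the left edge and the right key frame near the right edge, which avoids a visible straight seam between the two.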

Step 3. Repeat Steps 1 and 2 until all the key frame transition regions are calculated, and then each new key frame K' is recorded.

Step 4. All the obtained key frames are merged to form display map MD:

MD = [[summation].sup.[mu].sub.j=1] [K'.sub.j] (22)

where [mu] is the total number of key frames.

5. Experimental Results and Discussion

In this section, we conduct quantitative and qualitative experiments to evaluate the performance of PTDS, including the transcale performance (see Sections 5.1, 5.2, and 5.3) and the display performance (see Sections 5.4, 5.5, and 5.6). We downloaded space videos from Youku and segmented them into video clips. We then converted these clips to image sequences, which form our data set shown in Table 1. In all our experiments, we used the MATLAB platform and a PC with a 2.60 GHz Intel(R) Pentium(R) Dual-Core CPU processor with 1.96 GB of main memory.

5.1. Evaluation of Attention Scale. Figure 5 shows the attention regions obtained using our method, in which each row exhibits the results of a different space image sequence and each column exhibits sample frames of each sequence. In addition, each subfigure consists of three parts: the purple-bordered region on the left marks the attention regions, the blue-bordered region in the upper right marks the spatial attention region, and the green-bordered region in the lower right marks the temporal attention region. Specifically, the first row shows the results for frames 100, 128, and 134 from S1, and the second row shows the results for frames 119, 121, and 135 from S2. Observing these rows, we can see that the spatial attention regions capture only part of the space targets, for example, the result in the first row and first column. This means that the calculations of both spatial and temporal attention regions are necessary for obtaining accurate attention regions, especially for image sequences with large motion amplitude. Moreover, the third row shows the results for frames 10, 24, and 30 from S3, and the last row shows the results for frames 8, 45, and 88 from S4. These sequences have a relatively complicated background, which causes great interference to the detection of attention regions. However, the proposed method still obtains good results, and these results lay the foundation for transcale display.

Figure 6 shows the quantitative comparison with other methods, including image signature detection (ISD) [18] and global contrast detection (GCD) [14]. Here we adopt the overlap score [19] as a quantitative indicator, which is defined by the following formula:

score = ([R.sub.S] [intersection] [R.sub.G])/([R.sub.S] [union] [R.sub.G]) (23)

where [R.sub.S] denotes the obtained attention result and [R.sub.G] denotes the corresponding ground truth bounding box. The larger the overlap score is, the more accurate the result is. The figure gives the average overlap scores of the three methods. Our method achieves a larger score than the other methods, which shows that the attention regions obtained using our method are closer to the true significant regions of the original image sequences.
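For axis-aligned bounding boxes, the overlap score in (23) is the ratio of the intersection area to the union area. A minimal sketch, assuming boxes are given as (left, top, right, bottom) tuples and using the hypothetical helper names `area` and `overlap_score`:

```python
def area(box):
    """Area of a (left, top, right, bottom) box; degenerate boxes have area 0."""
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)

def overlap_score(box_s, box_g):
    """Intersection area divided by union area of two boxes."""
    inter_box = (max(box_s[0], box_g[0]), max(box_s[1], box_g[1]),
                 min(box_s[2], box_g[2]), min(box_s[3], box_g[3]))
    inter = area(inter_box)
    union = area(box_s) + area(box_g) - inter
    return inter / union if union else 0.0

# Two 2 x 2 boxes sharing a 1 x 1 corner: intersection 1, union 7.
score = overlap_score((0, 0, 2, 2), (1, 1, 3, 3))
```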

5.2. Evaluation of Frame Rate Scale. The quality of interpolated frames is a key factor in evaluating the change of the frame rate scale. In the experiments, we removed one in every two consecutive frames of the original image sequence and reconstructed the removed frame with different methods, including three-step search (TSS), adaptive rood pattern search (ARPS), Horn and Schunck (H&S), CNF [20], CSH [21], and our proposed method. Figures 7 and 8 show the frame interpolation results for sequences S5 and S6; the visual differences among the algorithms can be seen in the red-bordered region of each subfigure. TSS and ARPS exhibit a poor interpolation effect, H&S and CNF introduce a suspension effect, CSH produces disappointing effects, and PTDS shows the clear details of the spacecraft. These figures indicate that the proposed method performs comparatively better in terms of visual quality.

Figure 9 shows the average quantitative values of each image sequence obtained with the different methods. The left part gives the average PSNR values, which are a traditional quantitative measure of accuracy, and the right part gives the MSSIM results, which assess image visibility quality from the viewpoint of image formation, under the assumption of a correlation between human visual perception and image structural information. In this figure, our method achieves the highest average values.
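PSNR compares a reconstructed frame with the removed original frame through the mean squared error. A minimal sketch for 8-bit images stored as nested lists is shown below; the peak value of 255 and the function name `psnr` are assumptions for this example.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    pairs = [(a, b)
             for row_a, row_b in zip(reference, estimate)
             for a, b in zip(row_a, row_b)]
    mse = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

# One pixel out of four differs by 10 gray levels, so the MSE is 25.
value = psnr([[100, 100], [100, 100]], [[100, 100], [100, 110]])
```

Higher values mean the interpolated frame is closer to the frame that was removed.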

5.3. Evaluation of Resolution Scale. We resize the original image sequences by using the different methods, including scaling, the best cropping, improved seam carving (ISC) [22], and our method, and then the obtained results are compared to evaluate the performance on the change of resolution scales.

Figure 10 shows the comparison results for resizing sequence S7 from 320x145 to 500x230. Figure 10(a) shows the original frames, whose frame numbers are 131, 138, 147, 151, 174, 179, 189, 203, and 208. Figure 10(b) shows that with the scaling method, the launch base becomes blurrier than before. Figure 10(c) shows that with the best cropping method, the launch base is only partly displayed, so part of the original information is lost. Figure 10(d) shows that with the improved seam carving method, the prominent part of the launch base becomes smaller than before, indicating that ISC is not suitable for image enlarging. Figure 10(e) shows that our method clearly displays the prominent objects of the original frames and ensures a global visual effect when the resolution scales are changed. Similarly, the resizing results of sequence S8 are shown in Figure 11.

5.4. Evaluation of Target Trajectory Computing. Figure 12 shows the target trajectory of sequence S2 at scales v = 1, v = 10, and v = 20. As the value of v increases, the movement amplitude of the space targets becomes greater and the description of the trajectory becomes coarser, which clearly characterizes the change of the movement of automatic homing. Similarly, the target trajectory of sequence S5 is shown in Figure 13.

Figure 14 shows the detailed positions of the target trajectories corresponding to Figure 12, which are described by computing the position of the center pixel of the attention regions. The larger the scale v is, the fewer the description points are and the rougher the position trajectory of the center point is, and vice versa. Similarly, Figure 15 shows the target trajectories corresponding to Figure 13. In short, PTDS exhibits good performance in displaying space target trajectories under different scales, which can lay a foundation for space tasks such as condition monitoring and motion tracking.

5.5. Evaluation of Space Transcale Display. Figure 16 gives the transcale displays for sequences S5 and S9, in which the attention regions are shown in the left part of each subfigure. Figure 16(a) shows both the whole process of detecting from near to far and the motion details of the space targets. Similarly, the process of movement from far to near is given in Figure 16(b). From this figure, we can see that the proposed space transcale display algorithm can clearly exhibit the motion process of the space targets at different scales. By observing the accompanying attention regions, motion details of the space targets, such as target pose and operating status, are displayed at different distances and times, which reflects the transcale characteristics of the PTDS scheme.

Figure 17 summarizes the objective evaluation values of the space transcale display corresponding to Figure 16, including average gradient, edge intensity, spatial frequency, image definition, and quality score. As the number of key frames increases, the values of these evaluation indices also increase, indicating that the space transcale display algorithm can obtain a high-quality transcale display map.

5.6. Evaluation of Space Movement Display. Figure 18 shows the space movement display for space image sequences, in which Figures 18(a), 18(c), and 18(e) correspond to S3, S4, and S6, respectively. By adopting the narrative diagram, the complete movement of the space targets is clearly exhibited. Figures 18(b), 18(d), and 18(f) show the corresponding transition area details, comparing the direct fusion method (left) with our method (right). In the left part, a straight line clearly separates the two key frames, whereas in the right part the transition areas are smooth and delicate, giving a seamless representation of the movement process.
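The visible seam produced by direct fusion, and the way a gradual transition removes it, can be illustrated on a single pixel row. The sketch below uses a simple linear cross-fade as a stand-in. It is not the fusion method of the paper, whose details are not given in this section, and all function names are hypothetical.

```python
from typing import List

Row = List[float]  # one row of grayscale pixels


def direct_fusion(left: Row, right: Row, seam: int) -> Row:
    """Hard cut: columns before `seam` come from left, the rest from right."""
    if len(left) != len(right):
        raise ValueError("rows must have equal length")
    return left[:seam] + right[seam:]


def feathered_fusion(left: Row, right: Row, start: int, end: int) -> Row:
    """Linear cross-fade from left to right over columns start..end-1."""
    if len(left) != len(right):
        raise ValueError("rows must have equal length")
    out = []
    for x, (a, b) in enumerate(zip(left, right)):
        if x < start:
            w = 0.0
        elif x >= end:
            w = 1.0
        else:
            w = (x - start + 1) / (end - start + 1)  # weight of the right frame
        out.append((1.0 - w) * a + w * b)
    return out


def max_jump(row: Row) -> float:
    """Largest intensity step between neighboring pixels."""
    return max(abs(q - p) for p, q in zip(row, row[1:]))


left = [100.0] * 8   # toy key frame A
right = [20.0] * 8   # toy key frame B
hard = direct_fusion(left, right, seam=4)
soft = feathered_fusion(left, right, start=2, end=6)
```

In this toy case the hard cut has a single step of 80 intensity levels at the seam, while the cross-fade spreads the same change over several columns in steps of about 16, which is the kind of difference visible between the left and right parts of the transition area details.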

6. Conclusions

In this paper, we focus on the transcale problem of space image sequences and propose a novel display scheme that improves the awareness of multimedia processing. The contribution of this scheme lies in two aspects. On the one hand, space targets receive sustained attention through visual saliency technology, so that the details of the targets' movement are enhanced. On the other hand, the motion in space image sequences can be smoothly reproduced and effectively shown on display devices with different sizes and resolutions. Experimental results show that the proposed method outperforms the representative methods in terms of image visualization and quantitative measures. In the future, we will further study the transcale characteristics of space image sequences and explore the mutual influence between the different scales. On this basis, we will also establish a robust transcale processing mechanism to better serve spatial rendezvous and docking.

Data Availability

The data used to support the findings of this study have not been made available because the data is confidential.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.


Acknowledgments

This work was supported by the National Natural Science Foundation of China (61702241, 61602227), the Foundation of the Education Department of Liaoning Province (LJYL019), and the Doctoral Starting Up Foundation of Science Project of Liaoning Province (201601365).


References

[1] W. Wang and Y. N. Hu, "Accuracy performance evaluation of Beidou navigation satellite system," Acta Astronomica Sinica, vol. 58, no. 2, 2017.

[2] C. Shi, Q. Zhao, M. Li et al., "Precise orbit determination of Beidou Satellites with precise positioning," Science China Earth Sciences, vol. 55, no. 7, pp. 1079-1086, 2012.

[3] J. Luo, S. Wu, S. Xu, J. Jiao, and Q. Zhang, "A cross-layer image transmission scheme for deep space exploration," in Proceedings of the 86th Vehicular Technology Conference (VTC-Fall '17), pp. 1-5, IEEE, September 2017.

[4] P. O'Driscoll, E. Merenyi, and R. Grossman, "Using spatial characteristics to aid automation of SOM segmentation of functional image data," in Proceedings of the 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM), pp. 1-8, June 2017.

[5] D. Meng, Y. Jia, K. Cai, and J. Du, "Transcale average consensus of directed multi-vehicle networks with fixed and switching topologies," International Journal of Control, vol. 90, no. 10, pp. 2098-2110, 2017.

[6] L. Zhao and Y. Jia, "Transcale control for a class of discrete stochastic systems based on wavelet packet decomposition," Information Sciences, vol. 296, no. 1, pp. 25-41, 2015.

[7] L. Zhao, Y. Jia, J. Yu, and J. Du, "H∞ sliding mode based scaled consensus control for linear multi-agent systems with disturbances," Applied Mathematics and Computation, vol. 292, pp. 375-389, 2017.

[8] B. J. White, D. J. Berg, J. Y. Kan, R. A. Marino, L. Itti, and D. P. Munoz, "Superior colliculus neurons encode a visual saliency map during free viewing of natural dynamic video," Nature Communications, vol. 8, article 14263, 2017.

[9] J. Yang and M.-H. Yang, "Top-down visual saliency via joint CRF and dictionary learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 576-588, 2017.

[10] S. Bhattacharya, K. S. Venkatesh, and S. Gupta, "Visual saliency detection using spatiotemporal decomposition," IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 1665-1675, 2018.

[11] V. Ramanishka, A. Das, J. Zhang, and K. Saenko, "Top-down visual saliency guided by captions," in Proceedings of the 30th Conference on Computer Vision and Pattern Recognition, CVPR '17, IEEE, 2017.

[12] S. I. Cho, S.-J. Kang, and Y. H. Kim, "Human perception-based image segmentation using optimising of colour quantisation," IET Image Processing, vol. 8, no. 12, pp. 761-770, 2014.

[13] M. Eickenberg, A. Gramfort, G. Varoquaux, and B. Thirion, "Seeing it all: convolutional network layers map the function of the human visual system," NeuroImage, vol. 152, pp. 184-194, 2017.

[14] M. M. Cheng, G. X. Zhang, N. J. Mitra, X. Huang, and S. Hu, "Global contrast based salient region detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 409-416, Providence, RI, USA, June 2011.

[15] X. Jia, H. Lu, and M. Yang, "Visual tracking via adaptive structural local sparse appearance model," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 1822-1829, June 2012.

[16] G. Mehraei, B. Shinn-Cunningham, and T. Dau, "Influence of spatial and non-spatial feature continuity on cortical alpha oscillations," The Journal of the Acoustical Society of America, vol. 141, no. 5, pp. 3634-3635, 2017.

[17] M. T. Hamood and S. Boussakta, "Fast Walsh-Hadamard-Fourier transform algorithm," IEEE Transactions on Signal Processing, vol. 59, no. 11, pp. 5627-5631, 2011.

[18] X. Hou, J. Harel, and C. Koch, "Image signature: highlighting sparse salient regions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 1, pp. 194-201, 2012.

[19] M. Everingham, L. van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The pascal visual object classes (VOC) challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010.

[20] D. Sun, S. Roth, and M. J. Black, "Secrets of optical flow estimation and their principles," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 2432-2439, IEEE, California, Calif, USA, June 2010.

[21] S. Korman and S. Avidan, "Coherency sensitive hashing," in Proceedings of the 2011 IEEE International Conference on Computer Vision, ICCV 2011, pp. 1607-1614, Spain, November 2011.

[22] D. D. Conger, M. Kumar, R. L. Miller, J. Luo, and H. Radha, "Improved seam carving for image resizing," in Proceedings of the 2010 IEEE Workshop on Signal Processing Systems, SiPS 2010, pp. 345-349, USA, October 2010.

Lingling Zi (1), Xin Cong (1), Yanfei Peng (1), and Pei Yang (2)

(1) School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China

(2) College of Science, Liaoning Technical University, Fuxin 123000, China

Correspondence should be addressed to Xin Cong;

Received 22 May 2018; Accepted 19 September 2018; Published 4 October 2018

Academic Editor: Deepu Rajan

Caption: Figure 1: The framework of PTDS.

Caption: Figure 2: The flowchart of the module of attention region computing.

Caption: Figure 3: The computation of pixels in attention regions.

Caption: Figure 4: The method of partition interpolation.

Caption: Figure 5: The attention regions obtained using PTDS from sample frames of different image sequences (from top to bottom, sequences S1, S2, S3, and S4). In each subfigure, the purple-bordered regions mark the attention regions, the blue-bordered regions mark the spatial attention regions, and the green-bordered regions mark the temporal attention regions.

Caption: Figure 6: Comparison results of the average overlap score.

Caption: Figure 7: Interpolation frame 38 of sequence S5.

Caption: Figure 8: Interpolation frame 4 of sequence S6.

Caption: Figure 9: Comparison results of average values in terms of PSNR and MSSIM.

Caption: Figure 10: Comparison results of sequence S7 using four methods when the resolution is resized from 320 x 145 to 500 x 230. (a) The original frames, (b) scaling, (c) the best cropping, (d) ISC, and (e) PTDS.

Caption: Figure 11: Comparison results of sequence S8 using four methods when the resolution is resized from 576 x 432 to 600 x 800. (a) The original frames, (b) scaling, (c) the best cropping, (d) ISC, and (e) PTDS.

Caption: Figure 12: Target trajectory display of sequence S2.

Caption: Figure 13: Target trajectory display of sequence S5.

Caption: Figure 14: Target trajectory of the center pixel corresponding to sequence S2. From left to right, v = 1, v = 10, and v = 20.

Caption: Figure 15: Target trajectory of the center pixel corresponding to sequence S5. From left to right, v = 1, v = 10, and v = 20.

Caption: Figure 16: Space transcale display of space image sequences.

Caption: Figure 17: Objective evaluation.

Caption: Figure 18: Space movement display for S3, S4, and S6.

Table 1: Experimental data.

Sequence ID   Space image sequence         Initial resolution   Number of frames

S1            Aircraft automatic control   320 x 240            204
S2            Aircraft automatic homing    320 x 240            230
S3            Aircraft lift                640 x 480            238
S4            Aircraft docking             640 x 480            277
S5            Spacecraft detection         352 x 240            157
S6            Spacecraft launch            352 x 240            101
S7            Launch base                  320 x 145            208
S8            Spacecraft flight            576 x 432            161
S9            Aircraft flight              1024 x 436           130
COPYRIGHT 2018 Hindawi Limited
No portion of this article can be reproduced without the express written permission from the copyright holder.

Article Details
Title Annotation:Research Article
Author:Zi, Lingling; Cong, Xin; Peng, Yanfei; Yang, Pei
Publication:Advances in Multimedia
Article Type:Report
Date:Jan 1, 2018