# A study-based ranking of LiDAR data visualization schemes aided by georectified aerial images.

Introduction

Recent developments in sensors and technology have enabled capturing high resolution geospatial data. Ancillary information or metadata associated with these data sets help produce quality digital terrain and elevation models, which find applications in decision-making, emergency response, and disaster management. Models for three dimensional visualization of geoinformation have also been brought forward by various researchers (Zlatanova 2000; Zlatanova and Verbree 2000; Lattuada 2006; Lin and Zhu 2006; van Oosterom, Stoter, and Jansen 2006).

One of the newer technologies that has attained the level of an industry standard technique for topographic data collection is Light Detection and Ranging (LiDAR). This technology captures dense, accurate, and precise three dimensional topographic information in the form of data points. Based on the principle of laser ranging, LiDAR is available in terrestrial and airborne modes. In this paper, we confine our discussion to airborne altimetric LiDAR.

LiDAR data sets are large even for very small areas. Thus, systems for data quality assessment, decision making, etc. require a visualization process, or pipeline, to view LiDAR data or the terrain it represents. With respect to geospatial data sets, a good visualization process is one in which the features on the terrain are perceived as they appear in reality. It is therefore necessary to compare the various visualization pipelines on the basis of user feedback and rank them. We have been working in the direction of studying and developing pipelines for LiDAR data visualization aided by georectified aerial images. In this paper, we first design an experiment to compare the various visualization pipelines studied by us, and then use statistical tools to rank them.

Objective

This paper targets two research objectives: (a) design of an experiment to compare the various visualization methods developed by us (described in section "Visualization pipelines studied by us"); (b) description of the statistical methodologies used to draw conclusions from the data obtained through the experiment. Thus, the paper statistically ranks the various visualization schemes in order of their effectiveness in terms of feature recognition and depth perception.

Data sets and software APIs

In 2004, Optech Inc., Canada, conducted a LiDAR flight over the environs of Niagara Falls. The flight was conducted at an average height of 1200 m above ground level. Along with an ALTM 3100 instrument, the airplane was also equipped with a high resolution aerial camera to photograph the terrain. Five different subsets of 100 m x 100 m each were cut out from the data set. The number of data points in each of these subsets is given in Table 1. Each of these subsets contains buildings, trees, roads, and vehicles.

The aerial images corresponding to each of the data sets were georectified using the TerraSolid package. In our studies, the OpenSceneGraph display engine with its C++ based API is used for the visualization of the LiDAR data sets and derived products with the georectified aerial images. This visualization engine can render scenes in mono and stereo modes. For stereoscopic visualization, anaglyph glasses were used. The programming was done on an Ubuntu Linux 11.04 64-bit platform with 4 GB RAM and a 1 TB 5400 rpm HDD.

Visualization pipelines studied by us

MacEachren and Kraak (2001) pointed out four different research challenges in the field of geovisualization. Of these, the second agenda item concerns the development of new methods and tools as well as a "fundamental effort in theory building for representation and management of geographic knowledge gathered from large data sets" (Dykes, MacEachren, and Kraak 2005).

Visualization engines capable of displaying three dimensional data are typically based on OpenGL (Open Graphics Library) or Microsoft DirectX. Advanced visualization engines are based on Graphical Processing Units (GPUs), which have the algorithms for three dimensional display embedded in them. In principle, graphics engines understand the language of points, lines, and triangles, which in the language of mathematics are known as simplices. In addition, these simplices can be given color or texture attributes. In order to visualize LiDAR data, the data have to be translated into the language of the visualization engine, i.e. converted into a combination of points, lines, and triangles, through a certain process. This process is known as a pipeline.

Several approaches for processing and visualizing LiDAR data are found in the literature; for preparing 3D maps, they can be grouped into the following classes: (a) direct visualization of the LiDAR point cloud, (b) a manual, time-consuming process of classification, or (c) a process of segmentation. In the cases of segmentation and classification, the extracted features are first generalized and then visualized. Out-of-core computation algorithms have also been reported, which handle and visualize large point cloud data sets.

We have been studying and developing systems for visualization of LiDAR data using simplices (points, triangles, and tetrahedra) (Ghosh and Lohani 2007a, 2007b) and have also developed a heuristic to process, extract, and generalize LiDAR data for quick and effective visualization (Ghosh and Lohani 2011). In the following paragraphs, we present a brief summary of the various visualization pipelines for 3D visualization of LiDAR data aided by georectified aerial images.

0-simplex-based visualization

Kreylos, Bawden, and Kellogg (2008) visualized large point data sets in a head-tracked and stereoscopic visualization mode, using a multiresolution rendering scheme supporting billions of 3D points at 48-60 stereoscopic frames per second. When used in a CAVE-based immersive environment, this method has been reported to give better results than other analysis methods. Massive point clouds were rendered using an out-of-core real time visualization method employing a spatial data structure, on a GPU-based system with 4 GB main memory and a 5400 rpm hard disk (Richter and Döllner 2010). In our study, we name this process PTS. The mono mode of 0-simplex-based or point-based visualization is named as PL-PTS and the corresponding anaglyph mode is termed as AN-PTS (Table 2).

2-simplex-based visualization

2-simplices or triangles can be generated from data points either by using a triangulation algorithm or by generating tetrahedra and then exploding them to their respective facets. We generated Delaunay triangulations and tetrahedralizations by using the Quickhull algorithm (Barber, Dobkin, and Huhdanpaa 1996) provided in the QHULL tool (http://www.qhull.org).

We first generated a TIN from LiDAR data and then draped it with the georectified texture (Ghosh and Lohani 2007a). The LiDAR data, the TIN, and the georectified texture were sent to the visualization engine for visualization. We name this process DTRI. The mono mode of visualization is named as PL-DTRI and the anaglyph mode is termed as AN-DTRI (Table 2).

The TIN generated from LiDAR data through the Delaunay Triangulation was found to have triangles with small areas but very large edges. Such triangles are "culled" using a certain threshold. The remaining triangles along with the point data and georectified texture are sent to the visualization engine (Ghosh and Lohani 2007b). We name this process as TDTRI. The mono mode of visualization is named as PL-TDTRI and the anaglyph mode is termed as AN-TDTRI (Table 2).
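The culling step described above amounts to a simple filter over the triangle list. The sketch below is illustrative Python rather than the paper's C++/OpenSceneGraph implementation; the function name, the toy data, and the 2 m threshold are our assumptions.

```python
import math

def cull_long_triangles(points, triangles, max_edge):
    """Keep only triangles whose longest edge is <= max_edge."""
    def edge(a, b):
        return math.dist(points[a], points[b])
    kept = []
    for i, j, k in triangles:
        if max(edge(i, j), edge(j, k), edge(k, i)) <= max_edge:
            kept.append((i, j, k))
    return kept

# Toy data: a small triangle and one with a 9 m edge (hypothetical threshold: 2 m)
pts = {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0), 2: (0.0, 1.0, 0.0), 3: (10.0, 0.0, 0.0)}
tris = [(0, 1, 2), (1, 2, 3)]
print(cull_long_triangles(pts, tris, max_edge=2.0))  # [(0, 1, 2)]
```

The second triangle is dropped because its 9 m edge exceeds the threshold, which is how "sliver" triangles with small areas but very long edges are removed.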

A Delaunay tetrahedralization is generated using the LiDAR data points. The tetrahedra generated by the process are exploded into their respective triangular facets (Ghosh and Lohani 2007b). The triangular facets containing long edges are removed using a threshold. The remaining triangles, the georectified texture, and the points are sent to the visualization engine. We name this process as TDTET. The mono mode of visualization is named as PL-TDTET and the anaglyph mode is termed as AN-TDTET (Table 2).

Heuristic based visualization

In the case of Delaunay triangulation and subsequent trimming, we noted that certain dome-shaped terrain features, e.g. very sparse trees, get deleted, whereas planar features are well represented. In the case of Delaunay tetrahedralization, the sparse trees were better represented; however, owing to the complexity of the tetrahedralization process, the planar features are over-represented and the rendering is comparatively slow. It is therefore clear that the LiDAR point cloud needs a hybrid mode of processing, in which both triangulation and tetrahedralization can be used. We therefore developed a heuristic-based method.

Ghosh and Lohani (2013) identified that density-based methods are suitable for extracting features from LiDAR data sets. The authors concluded that Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al. 1996) was well suited to extracting terrain features from LiDAR data sets. We first extract clusters of points using the DBSCAN algorithm; we then observe that there are four kinds of clusters, namely sparse, flat-and-wide, dome-shaped, and those potentially containing planes. We developed strategies for treating each of these cluster types separately using heuristics (Ghosh and Lohani 2011). In this paper, we processed the data using two variants of the algorithm presented in Ghosh and Lohani (2011). In the first variant, which we name MDL, the dome-shaped clusters are generalized as described in Ghosh and Lohani (2011). In the second variant, which we name PMDL, the dome-shaped clusters are displayed as points. The mono modes of visualization for the two processes are named as PL-MDL and PL-PMDL. The stereoscopic modes of visualization are named as AN-MDL and AN-PMDL (Table 2).
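A minimal sketch of the clustering step is given below. This is not the authors' implementation: it is a plain Python rendering of the textbook DBSCAN algorithm (Ester et al. 1996) with a naive O(n^2) neighbour search, whereas production code would use a spatial index; the `eps` and `min_pts` values and the 2D toy data are illustrative.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):  # naive linear scan; a spatial index would be used in practice
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        cluster += 1                    # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise reclaimed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:      # j is also a core point: expand further
                queue.extend(nb)
    return labels

# Two tight clusters and one isolated (noise) point
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The resulting clusters would then be classified as sparse, flat-and-wide, dome-shaped, or plane-bearing and treated accordingly.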

Classification-based visualization

Conventionally, feature extraction from LiDAR data has been achieved through the process of classification. We classify the various features present in the LiDAR data using TerraScan and then generate a CAD model. TerraScan is used to generate a video of the CAD model. This process is named as CAD. Since CAD is only possible in the mono mode, the mode of visualization is named as PL-CAD (Table 2).

In a previous study (Ghosh and Lohani 2012), we compared 12 visualization pipelines by using Friedman's test, subsequent post hoc analysis, and Page's trend test. In this paper, we describe the design of experiment for comparing 13 visualization pipelines (Table 2), and develop a comprehensive statistical background for their ranking and selection.

Methodology

This section presents the experiment designed in order to obtain the ratings from the participants for the 13 visualization schemes presented in Table 2.

Design of experiment

The experiment was designed to examine a user's perception of various features, like buildings, trees, roads, and vehicles, in addition to perception of depth. By perception of various features it was meant that the user would be able to identify the type of a feature and give a score on the quality of its rendering. It was necessary that the participants were brought to the same level before participating in the experiment. Therefore, the experiment was conducted in three phases: (a) orientation, (b) acquaintance with three dimensional visualization and depth perception, and (c) rating of the features perceived from the various schemes presented to the participant.

The rating scale

Researchers have opined differently on selecting the scales of ratings to be adopted for surveys and experiments. While Worcester and Burns (1975) argued that scales without mid points pushed the respondents to the positive end of the scale, Garland (1991) found the opposite. Dawes (2008) found that scores based on 5 point or 7 point scales might produce slightly higher mean scores relative to the highest possible attainable score, compared to a 10 point scale. Dawes (2008) also showed that this difference was statistically significant. Therefore, a 10 point scale was selected for this study.

Phases of the experiment

The orientation phase introduced the participant to the experiment through a LibreOffice Impress presentation. The scheme of rating and the meaning of depth perception were explained to the user in this phase. It was emphasized that the rating of the scenes presented was to be done on an absolute scale of 1 to 10, where 1 means "very bad" and 10 means "very good".

The second phase of the experiment was designed to acquaint the participant with 3D perception through anaglyph glasses and to test their ability of stereo-vision. Three different anaglyph photos were presented before the participant in an increasing order of complexity in terms of the depth cue. The participants were asked objective questions which involved either counting the number of objects in a scene, ordering the objects in terms of distance from the participant, or identifying the type of the objects.

The third phase involved the rating of the features perceived from the different visualization schemes presented to the participant in the experiment. Since the focus is on the perception of the various features on the terrain and depth cue, the following questions were formulated for obtaining the ratings:

(1) How well do you perceive depth in the presented scene?

(2) How well is your overall perception of features in the presented scene? Features mean buildings, trees, roads, and vehicles.

(3) How well do you perceive buildings in the presented scene?

(4) How well do you perceive trees in the presented scene?

(5) How well do you perceive roads in the presented scene?

(6) How well do you perceive vehicles in the presented scene?

Since there are 13 visualization schemes to be evaluated by the users, there could be 13! ways of presenting the outputs from the schemes to the participants. The process of confounding is used to limit the number of possible ways in which the visualization schemes are presented to the participant (Finney 1980). Vajta et al. (2007) proposed a method of determining the order of presentation of the visualization schemes using Hamiltonian paths. A Hamiltonian path ensures that all the visualization schemes are presented before a participant. It is observed from Table 2 that the presentation methods are either 2.5D or 3D (immersive) and the pre-processing schemes are point-based, triangulation-based, tetrahedralization-based, and heuristic-based. The pre-processing schemes are laid out on the x-axis and the presentation schemes on the y-axis. The confounding is done by ensuring that any movement is made one step at a time, either in the x-direction or the y-direction. Thus, the valid paths from one point to another are shown in Figure 1. The Hamiltonian paths are then computed using the Mathematica package.
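The enumeration of Hamiltonian paths was done in Mathematica; the underlying idea can be sketched with a small backtracking search. The toy 2 x 2 grid below merely stands in for the graph of Figure 1 (which pairs pre-processing and presentation schemes); the actual graph used in the study differs.

```python
def hamiltonian_paths(adj):
    """Enumerate all Hamiltonian paths of a graph given as {node: set_of_neighbours}."""
    n, paths = len(adj), []

    def extend(path, visited):
        if len(path) == n:              # every node visited exactly once
            paths.append(list(path))
            return
        for nxt in sorted(adj[path[-1]]):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)   # depth-first backtracking
                path.pop()
                visited.discard(nxt)

    for start in adj:
        extend([start], {start})
    return paths

# Toy 2 x 2 grid of (pre-processing, presentation) nodes; a move changes one coordinate
grid = {(0, 0): {(0, 1), (1, 0)}, (0, 1): {(0, 0), (1, 1)},
        (1, 0): {(0, 0), (1, 1)}, (1, 1): {(0, 1), (1, 0)}}
print(len(hamiltonian_paths(grid)))  # 8
```

Each enumerated path is one admissible presentation order; one path per participant removes ordering bias while still showing every scheme.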

Design of the user interface

C++ and the OpenSceneGraph API were used in combination to develop command line utilities to render the outputs of the various visualization pipelines (see section "Visualization pipelines studied by us"). The outputs can be viewed from multiple directions, and a user can pan, zoom in and out, and rotate the scene using the mouse. The rendering of the classification-based pipeline PL-CAD was done as a walk-through video using Bentley Microstation and TerraSolid software. A user interface designed using Qt/C++ alternately presented a rendered scene and the questionnaire to the participant (Figure 2). The questionnaire enabled the participant to score the rendered scene on a scale of 1 to 10. The scores were recorded on the hard disk. Each participant was presented with renderings from one of the data sets in the order determined by one of the Hamiltonian paths. Screenshots of the various schemes presented to a participant are shown in Figure 3. Recorded videos of the users' interaction with outputs from the different pipelines have been uploaded to YouTube and the links are listed in the Appendix.

Conducting the experiment

Sixty third-year undergraduate students of the Department of Civil Engineering, Indian Institute of Technology Kanpur, agreed to participate in the experiments through appointment slots of 45 minutes each. During each experiment, it was ensured that the lighting and cooling were kept constant and the noise level was kept to the minimum possible.

Statistical analysis of data

In the discussion given earlier, there are 13 visualization schemes or processes and 60 participants. To make the statistical discussion simpler, let $k = 13$ and $n = 60$. The score given by the $i$th participant for the $j$th process is denoted by $s_{ij}$ (Table 3). The null hypothesis is framed as $H_0$: the processes are all similar. The alternate hypothesis is that they are not. In this section, we outline the statistical ranking and selection process used in this paper.

Friedman's test

Pre-analysis. For each of the $n$ participants, the scores in a row are ranked in decreasing order. Tied ranks are replaced by averaged ranks (Table 4). Friedman's test (Friedman 1937, 1939) checks the validity of the null hypothesis.

The following notations are introduced:

* $r^f_{ij}$: the rank of $s_{ij}$ in the list $\{s_{ij} \mid j = 1, \ldots, k\}$, for all $i = 1, \ldots, n$;

* $R_j := \sum_{i=1}^{n} r^f_{ij}$;

* $S := \sum_{j=1}^{k} \left[ R_j - \frac{n(k+1)}{2} \right]^2$.

The formula for the $Q$-statistic of Friedman's test is given by (Gibbons and Chakraborti 2010)

$$Q := \frac{12(k-1)S}{nk(k^2 - 1) - \sum\sum t(t^2 - 1)}. \qquad (1)$$

$Q$ denotes the test statistic for Friedman's test. The double sum in Equation (1) extends over all sets of $t$ tied ranks in each of the $n$ rows. For large values of $k$ ($k > 7$), the probability distribution of $Q$ is approximated by the central $\chi^2$ distribution with $k - 1$ degrees of freedom. If the $P$-value, given by $P(\chi^2_{k-1} \geq Q)$, is smaller than the level of significance $\alpha$, then the null hypothesis is rejected.
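As an illustration of Equation (1), the sketch below ranks each participant's row with averaged ties and computes $Q$. It is written in Python for clarity rather than in the paper's tooling, and the function names are ours.

```python
from itertools import groupby

def rank_row(row):
    """Rank one participant's scores; rank 1 = highest score, ties get averaged ranks."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        avg = (pos + 1 + end + 1) / 2          # average of the tied positions
        for m in range(pos, end + 1):
            ranks[order[m]] = avg
        pos = end + 1
    return ranks

def friedman_Q(scores):
    """Tie-corrected Friedman statistic for an n x k score matrix."""
    n, k = len(scores), len(scores[0])
    ranks = [rank_row(row) for row in scores]
    R = [sum(ranks[i][j] for i in range(n)) for j in range(k)]   # column rank sums
    S = sum((Rj - n * (k + 1) / 2) ** 2 for Rj in R)
    ties = 0
    for row in ranks:
        for _, grp in groupby(sorted(row)):
            t = len(list(grp))
            ties += t * (t ** 2 - 1)             # tie term of Equation (1)
    return 12 * (k - 1) * S / (n * k * (k ** 2 - 1) - ties)

print(friedman_Q([[3, 2, 1], [3, 2, 1]]))  # 4.0 (two identical rankings of 3 processes)
```

With no ties the formula reduces to the familiar $12S/[nk(k+1)]$, which the tiny example above reproduces.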

Post hoc analysis and ranking. In case Friedman's test (FT) indicates that the null hypothesis cannot be accepted, i.e. the results from the $k$ processes are not similar, a post hoc analysis has to be conducted. The processes $i$ and $j$ are termed different at a significance level $\alpha$ if (Gibbons and Chakraborti 2010)

$$\left| R_i - R_j \right| > z_{\alpha/k(k-1)} \sqrt{\frac{nk(k+1)}{6}} \qquad (2)$$

where $z_{\alpha/k(k-1)}$ is the upper standard normal quantile of order $\alpha/k(k-1)$. Since $R_j$, $j = 1, \ldots, k$, are real numbers, indices $j_1, j_2, \ldots, j_k$ can be found such that $R_{j_1} \geq R_{j_2} \geq \cdots \geq R_{j_k}$. Thus, all the processes can be given hypothetical ranks from 1 to $k$. In case two processes are found to be similar (see Equation (2)), they are allotted the same hypothetical rank as per the standard competition ranking scheme.
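The right hand side of Equation (2) is a single critical value; a Python sketch (function name ours) using the standard normal quantile from the standard library:

```python
from statistics import NormalDist

def friedman_critical_difference(n, k, alpha=0.05):
    """Least significant difference between two Friedman rank sums (Equation (2))."""
    z = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))  # upper alpha/k(k-1) quantile
    return z * (n * k * (k + 1) / 6) ** 0.5

# With this study's n = 60 participants and k = 13 processes:
print(round(friedman_critical_difference(60, 13), 1))
```

Two processes whose rank sums differ by less than this value are declared tied and share a competition rank.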

Kruskal-Wallis test

Pre-analysis. In the case of the Kruskal-Wallis test, all of the scores obtained from the experiment are ranked from 1 to $n \times k$ (Table 5), as per the standard competition ranking scheme. Tied ranks are replaced by averaged ranks.

The following notations are introduced:

* $r^K_{ij}$: the rank of $s_{ij}$ amongst $\{s_{11}, s_{12}, \ldots, s_{1k}, \ldots, s_{i1}, \ldots, s_{ik}, \ldots, s_{nk}\}$;

* $N = k \cdot n$;

* $S := \sum_{j=1}^{k} \left[ R^K_j - \frac{n(N+1)}{2} \right]^2$, where $R^K_j := \sum_{i=1}^{n} r^K_{ij}$.

The Kruskal-Wallis statistic H is defined as (Gibbons and Chakraborti 2010),

$$H := \frac{12S / \left[ nN(N+1) \right]}{1 - \sum t(t^2 - 1)/(N^3 - N)} \qquad (3)$$

where the summation $\sum t(t^2 - 1)$ is calculated over all cases of $t$ tied ranks. For large values of $n$, the probability distribution of $H$ is approximated by the central $\chi^2$ distribution with $k - 1$ degrees of freedom. If the $P$-value, given by $P(\chi^2_{k-1} \geq H)$, is smaller than the level of significance $\alpha$, then the null hypothesis is rejected.
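Equation (3) can likewise be computed directly from the raw score matrix. The following Python sketch (names ours) ranks all $n \times k$ scores jointly, accumulates the column rank sums and the tie term, and returns $H$:

```python
def kruskal_wallis_H(scores):
    """Tie-corrected Kruskal-Wallis statistic (Equation (3)) for an n x k score matrix."""
    n, k = len(scores), len(scores[0])
    N = n * k
    flat = sorted(((scores[i][j], j) for i in range(n) for j in range(k)),
                  key=lambda p: -p[0])     # joint ranking, rank 1 = highest score
    R = [0.0] * k                          # column rank sums R^K_j
    tie_term = 0.0
    pos = 0
    while pos < N:
        end = pos
        while end + 1 < N and flat[end + 1][0] == flat[pos][0]:
            end += 1
        t = end - pos + 1                  # size of this tie group
        tie_term += t * (t ** 2 - 1)
        avg = (pos + 1 + end + 1) / 2      # averaged rank for the group
        for m in range(pos, end + 1):
            R[flat[m][1]] += avg
        pos = end + 1
    S = sum((Rj - n * (N + 1) / 2) ** 2 for Rj in R)
    return (12 * S / (n * N * (N + 1))) / (1 - tie_term / (N ** 3 - N))

print(kruskal_wallis_H([[6, 4, 2], [5, 3, 1]]))  # ~4.5714 for this tiny no-tie example
```

The essential contrast with Friedman's test is the joint ranking of all $N$ scores rather than row-by-row ranking.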

Post hoc analysis and ranking. In case the Kruskal-Wallis test (KW) indicates that the null hypothesis cannot be accepted, a post hoc analysis has to be conducted. The processes $i$ and $j$ are termed different at a significance level $\alpha$ if (Gibbons and Chakraborti 2010)

$$\left| R^K_i - R^K_j \right| > z_{\alpha/k(k-1)} \sqrt{\frac{nN(N+1)}{6}}. \qquad (4)$$

Since $R^K_j$, $j = 1, \ldots, k$, are real numbers, indices $j_1, j_2, \ldots, j_k$ can be found such that $R^K_{j_1} \geq R^K_{j_2} \geq \cdots \geq R^K_{j_k}$. Thus, all the processes can be given hypothetical ranks from 1 to $k$. In case two processes are found to be similar (see Equation (4)), they are allotted the same hypothetical rank as per the standard competition ranking scheme.

Page's test for ordered alternatives

Page (1963) developed an $L$-statistic to test the null hypothesis that the $k$ processes are identical against the alternate hypothesis that the processes follow a given ranking. Since the column sums are real numbers, the $R_j$'s can be arranged in decreasing/increasing order. If $Y_j$ denotes the hypothetical rank of the $j$th column, obtained through some process, then the $L$-statistic is given by

$$L := \sum_{j=1}^{k} Y_j R_j \qquad (5)$$

The value $\chi^2_L$ defined by Page (1963) was

$$\chi^2_L := \frac{\left[ 12L - 3nk(k+1)^2 \right]^2}{nk^2(k^2 - 1)(k+1)} \qquad (6)$$

which approximately follows the central $\chi^2$ distribution with one degree of freedom. However, Gibbons and Chakraborti (2010) apply a continuity correction and define $Z$ as

$$Z = \frac{12(L - 0.5) - 3nk(k+1)^2}{k(k+1)\sqrt{n(k-1)}} \sim N(0, 1). \qquad (7)$$

The variable $Z$ follows the standard normal distribution, and hence $Z^2$ follows the central $\chi^2$ distribution with one degree of freedom. Therefore, if $P[\chi^2_1 \geq Z^2] < \alpha$, where $\alpha$ is the level of significance, then the null hypothesis is rejected and the hypothetical ranking of the processes is accepted.
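Equations (5) and (7) reduce to a few lines of arithmetic. The Python sketch below (function name ours) takes the column rank sums $R_j$, the hypothesised column ranks $Y_j$, and the number of participants $n$:

```python
def page_L_and_Z(R, Y, n):
    """Page's L (Equation (5)) and continuity-corrected Z (Equation (7)).

    R: column rank sums; Y: hypothesised column ranks; n: number of participants."""
    k = len(R)
    L = sum(y * r for y, r in zip(Y, R))
    Z = (12 * (L - 0.5) - 3 * n * k * (k + 1) ** 2) / (k * (k + 1) * (n * (k - 1)) ** 0.5)
    return L, Z

# One participant, k = 3, hypothesised order matching the observed rank sums exactly
L, Z = page_L_and_Z([1, 2, 3], [1, 2, 3], n=1)
print(L, round(Z, 4))  # L = 14 is the maximum possible for this configuration
```

A large positive $Z$ (equivalently, a small $P[\chi^2_1 \geq Z^2]$) supports the hypothesised ordering over the null of identical processes.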

Results and discussion

The experiment intends to present all the visualization schemes from the different subsets. In order to avoid biases, the scenes are not presented in a fixed sequence but through a process of confounding. With the nodes and edges as defined in Figure 1, 12 Hamiltonian paths were calculated using Mathematica (Table 6). Sixty participants agreed to take part in the experiment; for each participant, outputs from one of the five data subsets were chosen and shown as per one of the available Hamiltonian paths.

Data preparation

The user interface designed for the experiment stored the scores received from each participant in a separate file. Each of these files was read into a corresponding LibreOffice Calc sheet. A LibreOffice BASIC code was then used to collate the results for each of the questions into a single sheet. Since there were six questions, six different sheets were created through the BASIC code. Each sheet thus contained a 60 x 13 matrix.

Data analysis

In order to determine whether parametric methods like ANOVA or MANOVA could be applied, it was first checked whether the data obtained through the experiment followed the normal distribution. This was done using Pearson's $\chi^2$ goodness-of-fit test and histograms. The results showed that the data were not normally distributed; therefore, it was decided that nonparametric statistics be used. With respect to the formulas and tables described earlier, we have $k = 13$ different processes and $n = 60$ participants. Three different tests were applied to examine which methods performed the best in terms of feature and depth perception. For all purposes, we set the level of significance $\alpha$ to 0.05.
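The normality check can be illustrated as follows. This Python sketch computes a Pearson goodness-of-fit statistic against a normal distribution fitted to the sample, using equiprobable bins; the binning choice and function name are our assumptions, not necessarily those used in the study.

```python
from statistics import NormalDist, mean, stdev

def chi2_normality_stat(data, bins=5):
    """Pearson goodness-of-fit statistic of `data` against a fitted normal,
    using bins that are equiprobable under the fitted N(mean, sd)."""
    nd = NormalDist(mean(data), stdev(data))
    edges = [nd.inv_cdf(i / bins) for i in range(1, bins)]  # equiprobable bin edges
    observed = [0] * bins
    for x in data:
        observed[sum(x > e for e in edges)] += 1            # count of edges below x
    expected = len(data) / bins
    return sum((o - expected) ** 2 / expected for o in observed)

# Evenly spread data lands two points in each of the five bins -> statistic 0
print(chi2_normality_stat(list(range(10))))  # 0.0
```

The statistic is then compared against a $\chi^2$ critical value with bins $- 1 - 2$ degrees of freedom (two parameters estimated); large values reject normality.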

Friedman's test

Friedman's test (Friedman 1937, 1939) is first used to test the null hypothesis. The results of Friedman's test for each of the questions are presented in Table 7. With the level of significance $\alpha = 0.05$, it is seen that the null hypothesis is rejected for all of the questions.

Kruskal-Wallis test

The Kruskal-Wallis test (Kruskal and Wallis 1952) and the subsequent post hoc analysis (Kvam and Vidakovic 2007) are used to determine performance differences in terms of depth and feature perception for the different visualization schemes. The results of the Kruskal-Wallis test for each of the questions are given in Table 7. With the level of significance $\alpha = 0.05$, it is seen that the null hypothesis is rejected for all of the questions.

Post hoc analysis and question wise rankings

The post hoc analyses of Friedman's test and the Kruskal-Wallis test are used to determine the performance differences in terms of depth and feature perception for the different visualization schemes (Gibbons and Chakraborti 2010). The hypothetical rankings determined through the post hoc analysis are confirmed using Page's trend test (Page 1963). The hypothetical rankings from the post hoc analysis and their confirmation using Page's trend test are presented in Table 8. The ties in this scenario are determined using Equation (2) for Friedman's test and Equation (4) for the Kruskal-Wallis test.

It is seen that the rankings for both tests and for all the questions are consistent, as shown by Spearman's $\rho$, Kendall's $\tau$, and Goodman and Kruskal's $\gamma$ rank correlation coefficients. For all the questions, the $P$-values of Page's trend test are smaller for the rankings obtained from the post hoc analysis of Friedman's test than for those of the Kruskal-Wallis test. It can therefore be said that the rankings from Friedman's test are more acceptable at a significance level of $\alpha = 0.05$.

Overall rankings

In the process of question-wise ranking, each row was ranked for conducting Friedman's test (Table 4). To obtain the overall rankings using Friedman's test for the various processes listed in Table 2, the question-wise rankings are used. The same null hypothesis test and post hoc analysis are conducted again on the question-wise rankings. It is found that the $Q$-statistic for this test is 819.6610 and the $P$-value is 1.0047E-167. It is, however, seen that Friedman's test in this case is not able to give a very good ranking in spite of the processes being different, because the right hand side of Equation (2) is a large number, giving rise to too many ties. Therefore, the Kruskal-Wallis test is used to obtain the overall rankings for the processes.

The question-wise average sums of ranks are used to obtain the overall rankings (Table 5). The question-wise average sums for all the questions are laid out row-wise and ranked independent of the process, as presented in Table 9. The Kruskal-Wallis test and the post hoc analysis are subsequently conducted. It is seen that the processes are different at a significance level $\alpha = 0.05$, the $H$-statistic being 253.4417 and the $P$-value 2.6195E-047. When a post hoc analysis is conducted and ranking performed, it is seen that the heuristic-based model viewed in the anaglyph mode (AN-MDL) gets the same rank as the PL-CAD process. Thus, it can be said that in terms of feature recognition and depth perception, the heuristic-based process (described in Ghosh and Lohani (2011)) has performed almost equal to the PL-CAD model, which was obtained through the process of manual classification and feature generalization using professional software.

Discussion on question-wise and overall rankings

The experiment collected scores from 60 third-year undergraduate students of the Department of Civil Engineering, Indian Institute of Technology Kanpur, who agreed to participate in the experiment. None of the participants was previously acquainted with LiDAR technology. For a lay user, a good visualization scheme is one which represents the various features on the terrain as they appear in the real world.

The features being perceived by the participants were buildings, trees, roads, and vehicles. Before analyzing the scores of the experiment, it was expected that PL-CAD would perform the best in terms of depth and feature perception, since features like buildings were manually reconstructed and the trees present on the terrain were replaced by Rich Photorealistic Content (RPC) cells. These tasks were performed using TerraScan, a module of the TerraSolid software.

It can be seen from Table 8 that anaglyph-based visualization schemes have generally received better rankings. Almost every participant expressed appreciation for the realistic representation of the data in the anaglyph mode of visualization. Table 8 also reveals that the point-based visualization in the anaglyph mode has received lower rankings.

Since laymen are not used to seeing the features of the terrain in point cloud format, low ratings were received by the point-based visualization schemes, namely PL-PTS and AN-PTS. It was also observed during the experiment that the participants usually zoomed out while visualizing the point-based schemes.

The triangulation-based schemes, namely PL-DTRI and AN-DTRI, give rise to pseudo walls of the buildings which are not vertical; however, the building roofs are rendered properly. PL-TDTRI and AN-TDTRI have the building roofs hanging in space and therefore have poor ratings. Further, PL-TDTET and AN-TDTET have smoother building roofs compared to the triangulation-based schemes, but these are still hanging in space. The buildings in the terrain are reconstructed manually through the process of classification for PL-CAD. Thus, buildings are represented best in the PL-CAD scheme, which also receives the best scores and ranks from the participants, closely followed by AN-MDL and AN-PMDL.

The trees present in the terrain reflect the laser beams not only from the periphery of their respective canopies but also from within them. The Delaunay triangulation considers only one of the points which have the same x and y coordinates but different z coordinates. As a result, there is a loss of 3D information when a triangulation is generated from the raw data sets. Therefore, PL-DTRI and AN-DTRI do not render the trees properly. Also, the culling of the long triangles in PL-TDTRI and AN-TDTRI leads to the disappearance of the trees on the terrain. Consequently, these visualization schemes receive lower ranks for the perception of trees. On the other hand, the visualization pipelines PL-TDTET and AN-TDTET were designed to address the problem faced by the triangulation algorithm in rendering trees. However, the canopies of the trees are found suspended in space. On the contrary, the trees are replaced by RPC models in PL-CAD. Effectively, the best ranks for trees are obtained by PL-CAD, followed by AN-TDTET.

The 2-simplex-based and the heuristic-based schemes are able to render the vehicles satisfactorily. On the other hand, the TerraScan module used for the process of classification did not have any specific routine for extracting vehicles; therefore, the vehicles do not appear in 3D in PL-CAD. Eventually, PL-CAD received lower rankings for vehicles. It is to be noted here that although no separate treatment was given to vehicles in the development of the heuristics, vehicles are rendered satisfactorily as they are grouped as a part of the terrain during the clustering process. The roads were clearly visible in most of the visualization schemes, and high scores were received from the participants for all of them.

Through Friedman's test and the subsequent post hoc analysis, the overall rankings in terms of feature and depth perception are best for AN-MDL, followed by PL-CAD and AN-TDTET in second and third place. Through the Kruskal-Wallis test and its post hoc analysis, on the other hand, the overall rankings are best for PL-CAD, followed by AN-MDL and AN-TDTET in second and third place. These rankings have also been verified statistically using Page's trend test. The ranks assigned by Friedman's test and the Kruskal-Wallis test are found to be consistent under the rank correlation measures of Spearman (1987), Kendall (1938), and Goodman and Kruskal (1954). However, Table 8 shows that the P-values associated with the rankings from Friedman's test and its post hoc procedure are smaller than those obtained from the Kruskal-Wallis test and its post hoc analysis. The χ²_L statistic computed from the Friedman rankings therefore rejects the null hypothesis more strongly than the χ²_L statistic computed from the Kruskal-Wallis rankings, and the rankings obtained from Friedman's test and its post hoc analysis are more acceptable.
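Both omnibus tests used above are available in SciPy. The sketch below applies them to a hypothetical participant-by-scheme score matrix (the data and group sizes are assumptions, not the study's data) and then checks the agreement of the two implied rank orders with Spearman's ρ and Kendall's τ, mirroring the consistency check performed in the paper:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical score matrix: 20 participants (rows) rate 4 schemes
# (columns); later columns are constructed to score higher on average.
scores = rng.normal(loc=[3.0, 4.0, 5.0, 6.0], scale=0.5, size=(20, 4))

# Friedman's test: participants are blocks, schemes are related samples.
q_stat, p_friedman = stats.friedmanchisquare(*scores.T)
# Kruskal-Wallis test: the schemes are treated as independent samples.
h_stat, p_kruskal = stats.kruskal(*scores.T)

# Scheme ranks implied by each view: mean within-participant rank (FT)
# versus mean pooled rank over all scores (KW).
ft_ranks = np.apply_along_axis(stats.rankdata, 1, scores).mean(axis=0)
kw_ranks = stats.rankdata(scores.ravel()).reshape(scores.shape).mean(axis=0)

# Consistency of the two rank orders.
rho, _ = stats.spearmanr(ft_ranks, kw_ranks)
tau, _ = stats.kendalltau(ft_ranks, kw_ranks)
print(f"Q={q_stat:.1f} (p={p_friedman:.2e}); H={h_stat:.1f} (p={p_kruskal:.2e})")
print(f"rank agreement: rho={rho:.2f}, tau={tau:.2f}")
```

With clearly separated column means, both tests reject the null hypothesis and the two rank orders agree, as they do for the rankings in Table 8.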

The overall rankings are calculated using the Kruskal-Wallis test and its corresponding post hoc analysis, as the post hoc analysis of Friedman's test does not give comprehensive information for an overall ranking. The question-wise column averages from the earlier Kruskal-Wallis tests are used as the input for the overall ranking, and the results are presented in Table 9. From the rankings obtained, it can be said that the heuristic-based pipeline AN-MDL (Ghosh and Lohani 2011) performs almost on par, in terms of feature and depth perception, with the PL-CAD pipeline, which involves manual classification and reconstruction.
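Page's trend test, used above to confirm the rankings, tests for a monotone ordering of treatments across blocks. SciPy provides it as `scipy.stats.page_trend_test` (SciPy ≥ 1.7); a minimal sketch on hypothetical data whose columns are arranged in the hypothesized order of preference:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical participant x scheme matrix; columns are arranged in the
# order hypothesized to be increasingly preferred.
data = rng.normal(loc=[2.0, 3.0, 4.0, 5.0, 6.0], scale=0.8, size=(15, 5))

# Page's trend test checks for a monotone trend across the ordered columns.
res = stats.page_trend_test(data)
print(f"L = {res.statistic:.0f}, p = {res.pvalue:.3e}")
# A small p-value supports the hypothesized monotone ranking.
```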

A good choice for a processing pipeline

Based on the rankings obtained through the statistical experiments and the analysis of the data, AN-MDL and PL-CAD are almost equivalent pipelines in terms of feature and depth perception. The heuristic-based processing pipeline AN-MDL (Ghosh and Lohani 2011) takes about 600 seconds (10 minutes) of computational time, whereas the PL-CAD processing method takes about 10,800 seconds (3 hours). The heuristic-based processing and visualization method AN-MDL is therefore a good choice amongst those tested in this study.

Limitations with the study and the experiment

Size of the data set

The five data sets used in the study contain various terrain features, such as buildings, trees, roads, and vehicles, in addition to the ground. Each data set has an areal extent of 100 m × 100 m. Given the limited hardware available, the algorithms studied in Ghosh and Lohani (2007a, 2007b) and Ghosh and Lohani (2011) can be extended to larger data sets by applying specialized data structures and tiling. The data sets could be tiled so that one iteration of the algorithm fits in the memory of the computer.
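The tiling suggested here can be sketched as grouping points into fixed-size square tiles, each small enough to process independently. The tile size, the helper name `tile_points`, and the point array below are illustrative assumptions, not part of the original pipelines:

```python
import numpy as np
from collections import defaultdict

def tile_points(points, tile_size=100.0):
    """Group an (N, 3) array of LiDAR points into square tiles of
    tile_size x tile_size metres, keyed by integer (col, row) indices."""
    keys = np.floor(points[:, :2] / tile_size).astype(int)
    tiles = defaultdict(list)
    for key, point in zip(map(tuple, keys), points):
        tiles[key].append(point)
    # Each tile can now be triangulated / processed independently, one
    # iteration of the algorithm per tile, bounding peak memory use.
    return {k: np.asarray(v) for k, v in tiles.items()}

# Hypothetical points spread over a 200 m x 200 m extent.
pts = np.random.default_rng(2).uniform(0.0, 200.0, size=(1000, 3))
tiles = tile_points(pts, tile_size=100.0)
print(len(tiles), "tiles,", sum(len(v) for v in tiles.values()), "points")
```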

Participants in the experiment

The participants in the experiment formed an almost homogeneous group: third-year undergraduate students of the Department of Civil Engineering, IIT Kanpur. The participants had neither prior exposure to viewing LiDAR data nor experience in visualizing scenes in stereo modes. The survey-based comparison studied and evaluated in this paper would have been more insightful had a more diverse population been included. Studies comparing visualization methods have designed experiments based on tasks assigned to the participants and on grouping the participants into categories, e.g. laymen, slightly experienced, very experienced, and experts. To develop more insight into the comparison of visualization schemes for LiDAR data, future studies should therefore involve participants from diverse fields as well as task-based experiments (see Tobon (2005)).

Coloring element in the scenes presented to the participants

All the visualization schemes studied in this paper drape the aerial photograph over the terrain. In future studies, the design of the experiment could be extended to include height-based or uniform coloring schemes as well. In addition, parameters such as lighting and shadow could be added.

Conclusion

The literature shows that several methods of visualizing LiDAR data and derived products have been proposed by the scientific community. This paper has presented a method for comparing various visualization pipelines for LiDAR data and ranking them; to our knowledge, it is the first study to design an experiment for comparing LiDAR visualization pipelines. This was done through the design of an experiment and subsequent data analysis. Friedman's test (FT), the Kruskal-Wallis (KW) test, and their respective post hoc analyses are used repeatedly for ranking according to individual parameters, and these ranks are then used to determine the overall rankings. Before the experiment was conducted, the PL-CAD model was expected to perform best in terms of feature recognition and depth perception, and this is indeed borne out. Beyond that, the novel pipeline developed in Ghosh and Lohani (2011) performs almost on par with the PL-CAD model: it automatically transforms a LiDAR data set covering a 100 m × 100 m area into a format suitable for visualization in 600 seconds, whereas generating the PL-CAD video from the same raw data set takes about 3 hours (10,800 seconds). Since the data obtained from the experiment are multivariate, studies using multivariate nonparametric methods for the ranking and selection of visualization schemes will be presented in future. The limitations presented in the section "Limitations with the study and the experiment" will also be addressed in future research.

http://dx.doi.org/10.1080/15230406.2014.880071

Appendix

The YouTube links for the various outputs from the five data sets listed earlier are given at http://home.iitk.ac.in/~blohani/Datadetails.html

Acknowledgements

The authors are grateful to Optech Inc., Canada for providing the data used for this research work. The authors also express their gratitude to the two anonymous reviewers for suggesting improvements to this paper.

References

Barber, C. B., D. P. Dobkin, and H. Huhdanpaa. 1996. "The Quickhull Algorithm for Convex Hulls." ACM Transactions on Mathematical Software 22 (4): 469-483.

Dawes, J. 2008. "Do Data Characteristics Change According to the Number of Scale Points Used? An Experiment Using 5-Point, 7-Point and 10-Point Scales." International Journal of Market Research 50 (1): 61-77.

Dykes, J., A. M. MacEachren, and M. J. Kraak. 2005. Exploring Geovisualization. Oxford: Elsevier Academic Press. Chap. 1.

Ester, M., H.-P. Kriegel, J. Sander, and X. Xu. 1996. "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise." In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), edited by E. Simoudis, J. Han, and U. Fayyad, 226-231. Menlo Park, CA: AAAI Press.

Finney, D. J. 1980. Statistics for Biologists. London: Chapman and Hall.

Friedman, M. 1937. "The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance." Journal of the American Statistical Association 32 (200): 675-701.

Friedman, M. 1939. "A Correction: The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance." Journal of the American Statistical Association 34 (205): 109.

Garland, R. 1991. "The Mid-Point on a Rating Scale: Is It Desirable?" Marketing Bulletin 2: 66-70. Research Note 3.

Ghosh, S., and B. Lohani. 2007a. "Development of a System for 3D Visualization of LiDAR Data." In Map World Forum, edited by H. Samanta, Hyderabad. Noida: Geospatial World.

Ghosh, S., and B. Lohani. 2007b. "Near-Realistic and 3D Visualisation of LiDAR Data." International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXVI-4/W45, Joint Workshop "Visualization and Exploration of Geospatial Data", Stuttgart, June 26-29.

Ghosh, S., and B. Lohani. 2011. "Heuristical Feature Extraction from LiDAR Data and Their Visualization." In Proceedings of the ISPRS Workshop on Laser Scanning 2011, Vol. XXXVIII, August, edited by D. D. Lichti, and A. F. Habib. University of Calgary, Canada: International Society of Photogrammetry and Remote Sensing.

Ghosh, S., and B. Lohani. 2012. "Experimental Evaluation of LiDAR Data Visualization Schemes." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences 1-2: 135-140.

Ghosh, S., and B. Lohani. 2013. "Mining LiDAR Data with Spatial Clustering Algorithms." International Journal of Remote Sensing 34 (14): 5119-5135.

Gibbons, J. D., and S. Chakraborti. 2010. Nonparametric Statistical Inference. 5th ed. Boca Raton, FL: Taylor and Francis Group.

Goodman, L. A., and W. H. Kruskal. 1954. "Measures of Association for Cross Classifications." Journal of the American Statistical Association 49 (268): 732-764.

Kendall, M. 1938. "A New Measure of Rank Correlation." Biometrika 30 (1-2): 81-89.

Kreylos, O., G. Bawden, and L. Kellogg. 2008. "Immersive Visualization and Analysis of LiDAR Data." In Advances in Visual Computing, edited by G. Bebis, R. Boyle, B. Parvin, D. Koracin, P. Remagnino, F. Porikli, J. Peters, J. Klosowski, L. Ares, Y. Chun, T. M. Rhyne, and L. Monroe, Vol. 5358 of Lecture Notes in Computer Science, 846-855. Berlin: Springer. doi:10.1007/978-3-540-89639-5.

Kruskal, W. H., and W. A. Wallis. 1952. "Use of Ranks in One-Criterion Variance Analysis." Journal of the American Statistical Association 47 (260): 583-621.

Kvam, P. H., and B. Vidakovic. 2007. Nonparametric Statistics with Applications to Science and Engineering. Hoboken, NJ: John Wiley & Sons.

Lattuada, R. 2006. "Three-Dimensional Representations and Data Structures in GIS and AEC." In Large-Scale 3D Data Integration: Challenges and Opportunities, edited by S. Zlatanova and D. Prosperi. Boca Raton, FL: Taylor & Francis, Inc. Chap. 3.

Lin, H., and Q. Zhu. 2006. "Virtual Geographic Environments." In Large-Scale 3D Data Integration: Challenges and Opportunities, edited by S. Zlatanova and D. Prosperi. Boca Raton, FL: Taylor & Francis, Inc. Chap. 9.

MacEachren, A. M., and M. J. Kraak. 2001. "Research Challenges in Geovisualization." Cartography and Geographic Information Science 28 (1): 3-12.

Page, E. B. 1963. "Ordered Hypotheses for Multiple Treatments: A Significance Test for Linear Ranks." Journal of the American Statistical Association 58 (301): 216-230.

Richter, R., and J. Dollner. 2010. "Out-of-Core Real-Time Visualization of Massive 3D Point Clouds." In Proceedings of the 7th International Conference on Computer Graphics, Virtual Reality, Visualisation and Interaction in Africa (AFRIGRAPH '10), Franschhoek, South Africa, 121-128. New York, NY: ACM.

Spearman, C. 1987. "The Proof and Measurement of Association between Two Things." The American Journal of Psychology 100 (3-4): 441-471.

Tobon, C. 2005. "Evaluating Geographic Visualization Tools and Methods: An Approach Based Upon User Tasks." In Exploring Geovisualization, edited by J. Dykes, A. M. MacEachren, and M. J. Kraak, 645-666. Oxford: Elsevier Academic Press. Chap. 34.

Vajta, L., T. Urbancsek, F. Vajda, and T. Juhasz. 2007. "Comparison of Different 3D (Stereo) Visualization Methods--Experimental Study." In Proceedings of the 6th EUROSIM Congress on Modelling and Simulation, Ljubljana, Slovenia, September 9-13. ISBN 978-3-901608-32-2.

van Oosterom, P., J. Stoter, and E. Jansen. 2006. "Bridging the Worlds of CAD and GIS." In Large-Scale 3D Data Integration: Challenges and Opportunities, edited by S. Zlatanova and D. Prosperi. Boca Raton, FL: Taylor & Francis, Inc. Chap. 1.

Worcester, R. M., and T. R. Burns. 1975. "A Statistical Examination of the Relative Precision of Verbal Scales." Journal of Market Research Society 17 (3): 181-197.

Zlatanova, S. 2000. "3D GIS for Urban Development." PhD Thesis, ITC, Netherlands.

Zlatanova, S., and E. Verbree. 2000. "A 3D Topological Model for Augmented Reality." In Proceedings of the Second International Symposium on Mobile Multimedia Systems and Applications, 19-26, November 9-10, Delft, The Netherlands. Accessed December 31, 2013. http://www.gdmc.nl/zlatanova/thesis/html/refer/ps/sz_ev_mmsa.pdf

Suddhasheel Ghosh (a) *, Bharat Lohani (b) and Neeraj Misra (c)

(a) Department of Civil Engineering, MGM's Jawaharlal Nehru Engineering College, Aurangabad, India; (b) Department of Civil Engineering, Indian Institute of Technology Kanpur, Kanpur, India; (c) Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, Kanpur, India

(Received 2 September 2013; accepted 31 December 2013)

* Corresponding author. Email: suddhasheel@gmail.com

Table 1. Description of subsets used in the study.

| ID | Number of points |
|----|------------------|
| 1  | 49,666 |
| 2  | 48,637 |
| 3  | 48,254 |
| 4  | 43,822 |
| 5  | 47,644 |

Table 2. Processing methods and their notations.

| Process | 2.5D | 3D |
|---------|------|----|
| Point-based visualization | PL-PTS | AN-PTS |
| Delaunay triangulation | PL-DTRI | AN-DTRI |
| Trimmed Delaunay triangulation | PL-TDTRI | AN-TDTRI |
| Delaunay tetrahedralization followed by exploding and trimming | PL-TDTET | AN-TDTET |
| Heuristic-based processing with dome-shaped clusters displayed as points | PL-PMDL | AN-PMDL |
| Heuristic-based processing with full generalization | PL-MDL | AN-MDL |
| CAD model generated using TerraScan | PL-CAD | -- |

Table 3. Scores given by n participants.

| Participant | Process 1 | Process 2 | ... | Process k |
|-------------|-----------|-----------|-----|-----------|
| 1 | s_11 | s_12 | ... | s_1k |
| 2 | s_21 | s_22 | ... | s_2k |
| ... | ... | ... | ... | ... |
| n | s_n1 | s_n2 | ... | s_nk |

Table 4. Friedman ranks for each of the n participants.

| Participant | Process 1 | Process 2 | ... | Process k |
|-------------|-----------|-----------|-----|-----------|
| 1 | r^f_11 | r^f_12 | ... | r^f_1k |
| 2 | r^f_21 | r^f_22 | ... | r^f_2k |
| ... | ... | ... | ... | ... |
| n | r^f_n1 | r^f_n2 | ... | r^f_nk |
| Sums | R_1 | R_2 | ... | R_k |
| Means | r̄_1 | r̄_2 | ... | r̄_k |

Table 5. Kruskal-Wallis ranks for each of the n participants.

| Participant | Process 1 | Process 2 | ... | Process k |
|-------------|-----------|-----------|-----|-----------|
| 1 | r^K_11 | r^K_12 | ... | r^K_1k |
| 2 | r^K_21 | r^K_22 | ... | r^K_2k |
| ... | ... | ... | ... | ... |
| n | r^K_n1 | r^K_n2 | ... | r^K_nk |
| Sums | R^K_1 | R^K_2 | ... | R^K_k |
| Means | r̄^K_1 | r̄^K_2 | ... | r̄^K_k |

Table 6. Hamiltonian paths calculated using Mathematica.
| Path-ID | Sequence of nodes |
|---------|-------------------|
| 1 | 1 8 9 2 3 10 11 4 5 12 13 6 7 |
| 2 | 3 2 1 8 9 10 11 4 5 12 13 6 7 |
| 3 | 5 4 3 2 1 8 9 10 11 12 13 6 7 |
| 4 | 7 6 5 4 3 2 1 8 9 10 11 12 13 |
| 5 | 7 6 13 12 5 4 3 2 1 8 9 10 11 |
| 6 | 7 6 13 12 5 4 11 10 3 2 1 8 9 |
| 7 | 7 6 13 12 5 4 11 10 3 2 9 8 1 |
| 8 | 7 6 13 12 5 4 11 10 9 8 1 2 3 |
| 9 | 7 6 13 12 11 10 9 8 1 2 3 4 5 |
| 10 | 9 8 1 2 3 10 11 4 5 12 13 6 7 |
| 11 | 11 10 9 8 1 2 3 4 5 12 13 6 7 |
| 12 | 13 12 11 10 9 8 1 2 3 4 5 6 7 |

Table 7. Q and H statistics and corresponding P-values for all the questions.

| Q. no. | Friedman's Q | P-value | Kruskal-Wallis H | P-value |
|--------|--------------|---------|------------------|---------|
| 1 | 3022.5772 | ≈0.00E+000 | 141.9547 | 2.4132E-024 |
| 2 | 2662.1177 | ≈0.00E+000 | 133.8197 | 1.0541E-022 |
| 3 | 2543.9386 | ≈0.00E+000 | 131.3701 | 3.2757E-022 |
| 4 | 2338.7483 | ≈0.00E+000 | 126.9111 | 2.5692E-021 |
| 5 | 2444.8502 | ≈0.00E+000 | 106.1568 | 3.4340E-017 |
| 6 | 1610.1333 | ≈0.00E+000 | 71.4586 | 1.7070E-010 |

Table 8. Question-wise ranks for Friedman's test (FT) and the Kruskal-Wallis test (KW), their confirmation using Page's trend test, and their consistencies using Spearman's ρ, Kendall's τ, and Goodman and Kruskal's γ.
| Scheme | Q1 FT | Q1 KW | Q2 FT | Q2 KW | Q3 FT | Q3 KW | Q4 FT | Q4 KW | Q5 FT | Q5 KW | Q6 FT | Q6 KW |
|--------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| PL-PTS | 13 | 13 | 13 | 13 | 13 | 13 | 12 | 12 | 13 | 13 | 13 | 13 |
| PL-DTRI | 8 | 7 | 8 | 8 | 9 | 8 | 10 | 9 | 8 | 8 | 7 | 7 |
| PL-TDTRI | 12 | 12 | 11 | 11 | 12 | 12 | 13 | 13 | 10.5 | 11 | 11 | 11 |
| PL-TDTET | 10.5 | 10 | 11 | 10 | 11 | 10 | 9 | 8 | 10.5 | 10 | 10 | 9 |
| PL-PMDL | 10.5 | 11 | 7 | 7 | 6 | 6 | 11 | 11 | 9 | 7 | 8 | 8 |
| PL-MDL | 9 | 9 | 9 | 9 | 7 | 7 | 8 | 10 | 7 | 9 | 9 | 10 |
| PL-CAD | 2.5 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 3 | 1 | 6 | 6 |
| AN-PTS | 7 | 8 | 12 | 12 | 10 | 11 | 6 | 7 | 12 | 12 | 12 | 12 |
| AN-DTRI | 4 | 3 | 5 | 4 | 4 | 4 | 4 | 4 | 2 | 3 | 3 | 3 |
| AN-TDTRI | 6 | 6 | 6 | 6 | 8 | 9 | 5 | 6 | 6 | 6 | 5 | 5 |
| AN-TDTET | 2.5 | 4 | 2 | 3 | 5 | 5 | 2 | 2 | 5 | 5 | 2 | 2 |
| AN-PMDL | 5 | 5 | 4 | 5 | 2 | 3 | 7 | 5 | 4 | 4 | 4 | 4 |
| AN-MDL | 1 | 1 | 3 | 2 | 3 | 2 | 3 | 3 | 1 | 2 | 1 | 1 |
| χ²_L | 8.49E3 | 2.15E2 | 8.48E3 | 1.94E2 | 8.51E3 | 1.87E2 | 8.49E3 | 1.53E2 | 8.49E3 | 1.67E2 | 8.51E3 | 1.19E2 |
| P-value | 0.00E+00 | 1.2E-48 | 0.00E+00 | 4.3E-44 | 0.00E+00 | 1.20E-42 | 0.00E+00 | 3.4E-35 | 0.00E+00 | 3.7E-38 | 0.00E+00 | 8.5E-28 |
| ρ | 0.9835 | | 0.9890 | | 0.9835 | | 0.9670 | | 0.9601 | | 0.9945 | |
| τ | 0.8974 | | 0.9487 | | 0.9231 | | 0.8974 | | 0.8461 | | 0.9743 | |
| γ | 0.8974 | | 0.9487 | | 0.9231 | | 0.8974 | | 0.8461 | | 0.9743 | |

Table 9. Overall rankings using the average column ranks for all questions and all the visualization schemes. Rankings are confirmed through Page's trend test.

| Scheme | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Sum | Avg | Rank |
|--------|----|----|----|----|----|----|-----|-----|------|
| PL-PTS | 76 | 77 | 75 | 69 | 78 | 74 | 449 | 74.83 | 13 |
| PL-DTRI | 56 | 44 | 45 | 52 | 43 | 33 | 273 | 45.50 | 7.5 |
| PL-TDTRI | 67 | 66 | 71 | 70 | 53 | 58 | 385 | 64.17 | 10.5 |
| PL-TDTET | 63 | 60 | 62 | 51 | 49 | 48 | 333 | 55.50 | 10.5 |
| PL-PMDL | 64 | 38 | 35 | 61 | 40 | 36 | 274 | 45.67 | 7.5 |
| PL-MDL | 59 | 50 | 37 | 55 | 47 | 54 | 302 | 50.33 | 10.5 |
| PL-CAD | 5 | 3 | 2 | 1 | 9 | 32 | 52 | 8.67 | 1.5 |
| AN-PTS | 57 | 68 | 65 | 42 | 73 | 72 | 377 | 62.83 | 10.5 |
| AN-DTRI | 10 | 17 | 23 | 30 | 18 | 22 | 120 | 20.00 | 4 |
| AN-TDTRI | 31 | 34 | 46 | 41 | 27 | 28 | 207 | 34.50 | 6 |
| AN-TDTET | 13 | 15 | 29 | 14 | 26 | 21 | 118 | 19.67 | 4 |
| AN-PMDL | 16 | 19 | 7 | 39 | 20 | 24 | 125 | 20.83 | 4 |
| AN-MDL | 4 | 8 | 6 | 25 | 12 | 11 | 66 | 11.00 | 1.5 |

H = 253.4417, P-value = 2.6195E-047 (null hypothesis rejected); χ²_L = 59.4170, P[χ²_1 ≥ χ²_L] = 1.2756E-014.

Publication: Cartography and Geographic Information Science, March 1, 2014.