# Random Forests-Based Operational Status Perception Model in Extra-Long Highway Tunnels with Longitudinal Ventilation: A Case Study in China.

1. Introduction

By the end of 2016, 815 extra-long highway tunnels with a total length of 3622.7 km had been built in China [1]. Owing to the influence of traffic volume and fleet composition, vehicle emissions accumulate progressively. These emissions are difficult to disperse, especially in extra-long highway tunnels with high traffic loads and frequent traffic congestion. Tunnel ventilation has therefore become the primary problem during operation.

For road tunnels, there exist several very different approaches to ventilation [2]. They share two competing objectives: (a) keeping pollution levels within admissible margins and (b) minimizing the energy that the ventilation facilities consume in fulfilling objective (a). Under some circumstances, it is difficult to meet both objectives concurrently by using simple ventilation control algorithms [3]. Thus, many advanced control methodologies have been proposed in recent decades.

Appropriate and accurate ventilation control systems can not only decrease energy consumption and operating cost but also provide drivers with a comfortable and safe driving environment. Early automatic ventilation control schemes applied standard linear feedback control such as PI or PID. However, these conventional control schemes reach their limits of applicability as soon as nonlinear effects become dominant. Funabashi et al. (1991) and Koyama et al. (1993) proposed ventilation control systems for longitudinally ventilated road tunnels that apply nonlinear programming and fuzzy control [4, 5]. Chen et al. (1998) designed a fuzzy logic control model for predicting pollutant concentrations and adjusting jet fans [6]. Chu et al. (2008) combined a genetic algorithm with fuzzy control to maintain an adequate level of pollutants and minimize power consumption [7]. Bogdan et al. (2008) developed a model predictive and fuzzy control algorithm for a longitudinal ventilation system [8]. The predictive controller estimates fresh air requirements (depending on traffic and weather conditions) and calculates the number of necessary jet fans, while the fuzzy controller compares measured and admissible levels of pollutants and adjusts the predicted number of jet fans to keep the pollutant levels within predefined boundaries. Euler-Rolle et al. (2017) applied model-based nonlinear dynamic feedforward control to longitudinal tunnel ventilation to enhance standard feedback control and improve the closed-loop behavior [9]. However, all these contributions focused on the control of specific pollutants rather than on the overall control and dynamic characteristics of the in-tunnel operational status. An unchanging ventilation mode and an unreasonable control strategy lead to enormous energy consumption and economic loss [10].

The in-tunnel operational status can be considered the result of the combined action of four transportation elements: the driver, vehicle, road, and environment. Li et al. (2015) focused on the diffusion properties of CO, NO, and [PM.sub.2.5] as influenced by in-tunnel traffic force [11]. Yamada et al. (2016) and Martin et al. (2016) concentrated on the impact of the in-tunnel environment (e.g., N[O.sub.2] level and particle number concentrations) on the driver and the passenger [12, 13]. To date, quantified segmentation criteria for evaluating the operational status in extra-long highway tunnels have not been established. Meanwhile, the analysis and mining of the in-tunnel operational status by closely combining real-time traffic flow and environmental information have seldom been studied.

The decision tree is a classical classification algorithm, which is essentially a recursive data partitioning process based on a series of rules. Since a single decision tree has some drawbacks, such as low precision and overfitting, ensemble learning, which combines simple machine learning algorithms to produce better predictive performance than could be achieved by the most sophisticated single solutions, has become popular in machine learning research. Practitioners have created various ways to improve a decision tree by replicating it many times and averaging the results. For a classification task, the ensemble can be used as a voting system that outputs the most frequent response class among all its replications.

To find the best way to replicate the trees in an ensemble, Breiman (1996) tested the effects of bootstrap sampling (sampling with replacement), which not only leaves out some noise but also creates more variation in the ensemble, improving the results. This technique is called "bootstrap aggregating," abbreviated as bagging [14]. Noticing that the results of an ensemble of trees improved when the trees differed significantly from each other, Breiman (2001) proposed a new ensemble model, Random Forests (RF), which adds a layer of randomness to bagging [15, 16]. In addition to constructing each tree from a different bootstrap sample of the data, Random Forests change how the classification or regression trees are constructed: each node is split using the best among a randomly chosen subset of the predictors. This strategy turns out to perform very well compared with many other classifiers, including discriminant analysis, support vector machines, and neural networks, and is robust against overfitting [17].

The main goal of this work was to fuse the in-tunnel traffic flow data (such as fleet segmentation and traffic volume) and ambient air data (such as the concentrations of toxic gases and particulate matter and the air velocity) using big data technology and to build a Random Forests-based perception model that accurately predicts the in-tunnel operational status.

2. Material and Methods

2.1. Operational Monitoring Data. The Xi'an-Hanzhong Expressway (Xihan Expressway) is one of the most critical sections of the G5 Beijing-Kunming Expressway (a part of the China National Expressway Network, commonly known as the Jingkun Expressway), which connects north and southwest China, in Shaanxi province. A critical controlling project in the Xihan Expressway, the Qin Mountains tunnel group (Figure 1), comprises three extra-long highway tunnels, No. 1 tunnel, No. 2 tunnel, and No. 3 tunnel, passing through the Qin Mountains. The mountains are the most important geographical entities that divide northern and southern China.

No. 1 tunnel is a twin-bore tunnel with unidirectional traffic in each bore. The tunnel comprises southbound (SB) and northbound (NB) tunnels, with each direction having two lanes for motor vehicles. Figure 2 depicts the overall structure of the SB tunnel. In total, 11 lay-bys (emergency parking bays), numbered from ESA-1 to ESA-11, have been built along the length of the tunnel. The ventilation mode is longitudinal and is powered by 30 jet fans; an inclined shaft is reserved, but the air supply and exhaust system with additional axial fans had not yet been installed. Since the tunnel was opened to traffic in 2007, the traffic has consistently increased. Among all vehicles, heavy goods vehicles (HGV) have shown the most notable increase.

In this study, four lay-bys (ESA-1, ESA-4, ESA-8, and ESA-11) were selected as the monitoring or data-collection sites in the driving direction. A real-time monitoring experiment for the operational environment was performed from Nov 27, 2016, to Dec 3, 2016. Through the experiment, raw monitoring data of the operational status were obtained. The details of the data and monitoring instruments are listed in Table 1.

Raw monitoring data were preprocessed at statistical or resampling intervals of 15 min. That is, traffic flow data were converted to cumulative values every 15 min. The other monitoring data were calculated as average values for each 15 min interval. Finally, the statistical dataset of the operational environment was obtained.
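The 15 min resampling described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame `raw`, its columns (`time`, `hgv`, `co`), and all values are hypothetical and are not taken from the monitoring dataset.

```r
# Hypothetical raw records (not the study's data): one row per reading.
raw <- data.frame(
  time = as.POSIXct("2016-11-27 00:00:00", tz = "UTC") +
    c(60, 300, 840, 960, 1500, 1740),
  hgv = c(1, 1, 1, 1, 1, 1),               # one HGV passage per record
  co  = c(4.0, 6.0, 5.0, 8.0, 10.0, 9.0)   # CO reading in ppm
)

# Label each record with the start of its 15 min (900 s) interval.
interval_start <- floor(as.numeric(raw$time) / 900) * 900
raw$interval <- format(
  as.POSIXct(interval_start, origin = "1970-01-01", tz = "UTC"),
  "%Y-%m-%d %H:%M"
)

# Traffic flow: cumulative count per interval; pollutant: interval mean.
hgv_15min <- tapply(raw$hgv, raw$interval, sum)
co_15min  <- tapply(raw$co, raw$interval, mean)

resampled <- data.frame(
  interval = names(hgv_15min),
  hgv      = as.vector(hgv_15min),
  co_mean  = as.vector(co_15min)
)
print(resampled)
```

Counts are summed while concentrations are averaged, which mirrors the distinction the text draws between traffic flow data and the other monitoring data.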

The proportions of passenger cars (PC), light-duty vehicles (LDV), and HGV were 29.46%, 3.21%, and 67.32%, respectively. LDV had the lowest proportion, which smoothly changed; hence, its impact on the in-tunnel operational status could be ignored. The Pearson correlation coefficients indicated that PC had weak positive correlations with CO and N[O.sub.2]. Consequently, the impact of PC on the in-tunnel operational status could also be ignored. Finally, only HGV was retained in the traffic flow data. Profiles of pollutant concentration in the driving direction exhibited a triangular distribution characteristic, increasing consistently from the tunnel entrance to the tunnel exit; this characteristic is consistent with the conventional wisdom of longitudinal ventilation systems. In conclusion, five types of data collected from ESA-11, the monitoring site with the highest degree of pollution, were selected as the sample dataset; these data were the CO, N[O.sub.2], air velocity, [PM.sub.2.5], and HGV data.

2.2. Clustering Method. A five-dimensional space was obtained from the sample dataset. Clustering analysis for the operational status is the task of grouping the sample dataset in such a way that status data in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other clusters. In centroid-based clustering, the task can be summarized as finding the cluster centers and assigning each sample to its nearest cluster center such that the squared distances from the samples to their cluster centers are minimized, thus yielding a classification method for multiclass operational statuses.

Fuzzy C-Means (FCM) clustering is a fuzzy clustering algorithm based on an objective function; this algorithm was developed by Dunn [18] and improved by Bezdek [19]. Given its advantages in big data applications, FCM clustering was chosen in this study. Let the ith sample [x.sub.i] = ([x.sub.i1], [x.sub.i2], [x.sub.i3], [x.sub.i4], [x.sub.i5]) denote a five-dimensional monitoring result, namely, the values of CO, N[O.sub.2], air velocity, [PM.sub.2.5], and HGV. The sample dataset containing N measured values is denoted by X. Then X can be expressed as an N x 5 matrix, as shown in the following:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{15} \\ x_{21} & x_{22} & \cdots & x_{25} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{N5} \end{pmatrix} \qquad (1)$$

The FCM aims to minimize the following objective function:

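$$\sum_{i=1}^{N} \sum_{v=1}^{k} u_{iv}^{2} \, \lVert x_i - m_v \rVert^{2} = \sum_{i=1}^{N} \sum_{v=1}^{k} u_{iv}^{2} \sum_{f=1}^{5} (x_{if} - m_{vf})^{2} \qquad (2)$$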

where k is the preset number of operational statuses, i.e., the number of clusters; v is the sequence number of a cluster; [m.sub.v] is the center of cluster v; [u.sub.iv] stands for the unknown membership of sample [x.sub.i] in cluster v, and the membership exponent 2 in [u.sup.2.sub.iv] determines the level of cluster fuzziness; [[parallel][x.sub.i] - [m.sub.v][parallel].sup.2] denotes the squared Euclidean distance between [x.sub.i] and [m.sub.v]; f is the sequence number of a dimension of the five-dimensional space; and [m.sub.v1], [m.sub.v2], [m.sub.v3], [m.sub.v4], and [m.sub.v5] represent the values of cluster center [m.sub.v] corresponding to CO, N[O.sub.2], air velocity, [PM.sub.2.5], and HGV, respectively. Cluster center [m.sub.vf] can be calculated by the following equation:

$$m_{vf} = \frac{\sum_{i=1}^{N} u_{iv}^{2} \, x_{if}}{\sum_{i=1}^{N} u_{iv}^{2}}, \quad f = 1, \ldots, 5 \qquad (3)$$

Kaufman and Rousseeuw (2008) proposed a new fuzzy clustering algorithm FANNY based on FCM [20]. The FANNY algorithm has some definite advantages over FCM: lower sensitivity to outliers or otherwise erroneous data and better recognition of nonspherical clusters. In the FANNY algorithm, the following equation is derived from (2):

$$\min \sum_{v=1}^{k} \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} u_{iv}^{2} \, u_{jv}^{2} \, d(x_i, x_j)}{2 \sum_{j=1}^{N} u_{jv}^{2}} \qquad (4)$$

where d([x.sub.i], [x.sub.j]) represents the given distance (or dissimilarity) between samples [x.sub.i] and [x.sub.j]; the Euclidean distance is commonly used. Each pair is counted twice because d([x.sub.j], [x.sub.i]) also occurs, and the factor 2 in the denominator compensates for this duplication. The membership function is subject to the following constraints:

$$u_{iv} \ge 0, \quad i = 1, \ldots, N; \qquad \sum_{v=1}^{k} u_{iv} = 1, \quad i = 1, \ldots, N. \qquad (5)$$

Solving the optimization problem in (4) yields the membership coefficients [u.sub.iv] (1 [less than or equal to] i [less than or equal to] N, 1 [less than or equal to] v [less than or equal to] k) of all samples in every cluster and each cluster center [m.sub.v]. Each sample is then assigned to the cluster in which it has the largest membership, which completes the fuzzy clustering.
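As a small illustration of two steps described above, the following base R sketch computes membership-weighted cluster centers in the spirit of (3) and then performs the final hard assignment by largest membership. It is a sketch under stated assumptions, not the FANNY optimization itself: the data matrix `X` (two features instead of five) and the membership matrix `U` are hypothetical values chosen only to satisfy the constraints in (5).

```r
# Toy data: 4 samples x 2 features (the study uses 5 features).
X <- matrix(c( 0,  0,
               1,  0,
               9, 10,
              10, 10), ncol = 2, byrow = TRUE)

# Hypothetical membership matrix U (rows: samples, columns: clusters).
# Each row is non-negative and sums to 1, as required by (5).
U <- matrix(c(0.9, 0.1,
              0.8, 0.2,
              0.2, 0.8,
              0.1, 0.9), ncol = 2, byrow = TRUE)

# Cluster centers: means weighted by squared memberships (exponent 2).
W <- U^2
centers <- t(W) %*% X / colSums(W)   # row v is the center of cluster v

# Hard assignment: each sample joins the cluster of largest membership.
assignment <- max.col(U, ties.method = "first")

print(centers)
print(assignment)
```

In practice the memberships would come from the optimizer rather than being typed in by hand; the sketch only shows how memberships translate into centers and crisp labels.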

2.3. Perception Model

Definition 1 (the perception of in-tunnel operational status). Given a training set T = {([x.sub.1], [y.sub.1]), ..., ([x.sub.N], [y.sub.N])} [member of] [([X.sup.5] x Y).sup.N], [x.sub.i] [member of] [X.sup.5] is the ith sample in the training set and it includes the values of CO, N[O.sub.2], air velocity, [PM.sub.2.5], and HGV; [y.sub.i] [member of] Y = {[c.sub.1], [c.sub.2], [c.sub.3], [c.sub.4]} corresponds to one of the four operational statuses of the ith sample--lightly polluted, moderately polluted, heavily polluted, and severely polluted; and i = 1, ..., N is the serial number of the training set. According to algorithmic modeling [21], the target is to find a function f(x) : [X.sup.5] [right arrow] Y--an algorithm that operates on [X.sup.5] to predict the responses of in-tunnel operational status Y.

The combination of the Random Forests with clustering analysis is shown in Figure 3, in which the clustering results of the operational status are taken as inputs to the Random Forests-based perception model. In the perception model, [n.sub.tree] bootstrap samples are first drawn with replacement from the training set, forming a new series of training subsets (the bagging technique). Then, a complete tree is grown on each bootstrap sample; whenever a node is split, a random subset of the features is selected and the best split variable is chosen from it. Next, the performance of each tree is computed using the examples that were not chosen in the bootstrap phase (out-of-bag data). Finally, once all the trees in the ensemble are complete, they vote on new cases, and the winning class is declared as the prediction for each case.
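The final voting step can be illustrated with a few lines of base R. The votes below are hypothetical (five trees, three new samples, invented class labels); the sketch shows only the majority vote, not tree construction.

```r
# Hypothetical votes: each row is one tree, each column one new sample.
votes <- rbind(
  tree1 = c("light", "heavy",  "severe"),
  tree2 = c("light", "heavy",  "heavy"),
  tree3 = c("heavy", "heavy",  "severe"),
  tree4 = c("light", "severe", "severe"),
  tree5 = c("light", "heavy",  "severe")
)

# Majority vote per sample: the most frequent class among the trees wins.
majority_vote <- function(v) names(which.max(table(v)))
prediction <- apply(votes, 2, majority_vote)
print(prediction)
```

Note that `which.max` breaks ties by taking the first class in alphabetical order, which is an arbitrary choice made for this sketch.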

2.4. Modeling Approach. Two parameters are crucial in Random Forests modeling, namely, [n.sub.tree] and [m.sub.try]:

(1) [n.sub.tree]--the number of trees to grow;

(2) [m.sub.try]--the number of variables randomly sampled as candidates at each split.

Herein, [n.sub.tree] determines the overall scale of the whole Random Forests, and [m.sub.try] defines the structure of a single decision tree. In other words, [n.sub.tree] and [m.sub.try] determine the construction of the Random Forests at the macroscopic and microscopic levels, respectively.

In R, the randomForest package provides an interface to Breiman and Cutler's Fortran programs for Random Forests, and the randomForest() function implements the algorithm for classification and regression [22]. The function prototype is as follows:

randomForest (formula, data, mtry, ntree, na.action)

in which formula describes the model to be fitted; data is a data frame containing the variables in the model; mtry is the number of variables randomly sampled; ntree is the number of decision trees; na.action specifies the action to be taken if NAs are found.

Since the bootstrap draws N times with replacement from a training set of size N, the probability that a given training sample is never drawn, i.e., that it becomes an out-of-bag (OOB) sample, is [(1 - 1/N).sup.N]. For large N, the number of OOB samples is therefore expected to be a fraction [e.sup.-1] [approximately equal to] 0.368 of the training set. This means that each decision tree is grown by using approximately 1 - [e.sup.-1] [approximately equal to] 63.2% of the training samples, leaving [e.sup.-1] [approximately equal to] 36.8% as the OOB samples. Since the OOB part of the data has not been used in tree construction, it can be used to estimate the ensemble prediction performance in the following way.
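The out-of-bag fraction quoted above can be checked numerically. The sketch below, in base R, compares the closed-form probability with its limit and with a small simulation; the sample size N = 1000, the seed, and the 200 repetitions are arbitrary choices made for illustration.

```r
N <- 1000                      # assumed training set size (illustrative)

# Probability that a given sample is never drawn in N draws with replacement.
p_oob <- (1 - 1 / N)^N

# Simulation: fraction of samples left out of each bootstrap sample.
set.seed(1)
oob_fraction <- replicate(200, {
  drawn <- sample(N, size = N, replace = TRUE)
  1 - length(unique(drawn)) / N
})

print(c(closed_form = p_oob,
        limit       = exp(-1),
        simulated   = mean(oob_fraction)))
```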

Let [D.sup.OOB.sub.b] be the OOB part of the data for the bth tree. Then use the bth tree to predict [D.sup.OOB.sub.b]. Since each training sample [x.sub.i] is in an OOB sample set approximately [e.sup.-1] [approximately equal to] 36.8% of the time on average, the ensemble prediction $\hat{y}^{\mathrm{OOB}}(x_i)$ can be calculated by aggregating only its OOB predictions. An estimate of the error rate (ER) for classification is then calculated by

$$\mathrm{ER} \approx \mathrm{ER}^{\mathrm{OOB}} = \frac{1}{N} \sum_{i=1}^{N} I\left(\hat{y}^{\mathrm{OOB}}(x_i) \ne y_i\right) \qquad (6)$$

where I(x) is the indicator function.

2.5. Evaluation Metric. A status set Y = {[c.sub.1], [c.sub.2], [c.sub.3], [c.sub.4]} is used to denote the four classes of the in-tunnel operational statuses--lightly polluted, moderately polluted, heavily polluted, and severely polluted. Then the confusion matrix (as shown in Table 2) is chosen to describe the classification performance.

In Table 2, [n.sub.i,j] denotes the number of samples with actual status [c.sub.i] that are identified as [c.sub.j] by the classification model. The confusion matrix reflects the distribution of status set Y, in which the jth column reflects the precision of [c.sub.j] and the ith row reflects the recall (also known as sensitivity) of [c.sub.i]. Thus, for a particular operational status, e.g., [c.sub.j], the precision ([P.sub.j]) and recall ([R.sub.j]) are calculated by the following.

$$P_j = \frac{n_{j,j}}{\sum_{i=1}^{4} n_{i,j}} \qquad (7)$$

$$R_j = \frac{n_{j,j}}{\sum_{i=1}^{4} n_{j,i}} \qquad (8)$$

In addition, the F-measure (F), the harmonic mean of the precision and recall, is used as an evaluation metric. With [P.sub.j] and [R.sub.j] denoting the precision and recall of status [c.sub.j], it is calculated as follows:

$$F_j = \frac{2 \, P_j \, R_j}{P_j + R_j} \qquad (9)$$
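The three metrics can be computed directly from a confusion matrix, as the following base R sketch shows. The 3 x 3 matrix `cm` is hypothetical (it is not Table 2 or Table 3), with rows taken as actual statuses and columns as predicted statuses.

```r
# Hypothetical confusion matrix: rows = actual, columns = predicted.
cm <- matrix(c(8, 1, 1,
               0, 9, 1,
               2, 0, 8), nrow = 3, byrow = TRUE,
             dimnames = list(actual    = c("c1", "c2", "c3"),
                             predicted = c("c1", "c2", "c3")))

precision <- diag(cm) / colSums(cm)   # column-wise: correct / predicted as c_j
recall    <- diag(cm) / rowSums(cm)   # row-wise: correct / actually c_j
f_measure <- 2 * precision * recall / (precision + recall)

print(round(rbind(precision, recall, f_measure), 3))
```

Averaging each row of the printed table over the classes gives macro-averaged metrics; whether the averages reported later in the paper were computed this way is an assumption.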

3. Results

3.1. Classification of Operational Status. Determining the optimal number of clusters is a fundamental issue in clustering analysis. In this study, this value was estimated by the optimum average silhouette width [23]. Suppose a dataset is partitioned into k clusters; the silhouette width of sample [x.sub.i] is then defined as

$$S_k(x_i) = \frac{B(x_i) - A(x_i)}{\max\{A(x_i), B(x_i)\}} \qquad (10)$$

where A([x.sub.i]) is the average dissimilarity between [x.sub.i] and all other samples in the cluster to which [x.sub.i] belongs. Similarly, B([x.sub.i]) is the minimum, taken over all clusters to which [x.sub.i] does not belong, of the average dissimilarity between [x.sub.i] and the samples of that cluster. The average silhouette method computes the average silhouette width ([S.sub.k]) of all N samples for different values of k:

$$S_k = \frac{1}{N} \sum_{i=1}^{N} S_k(x_i), \quad k = 1, 2, 3, \ldots \qquad (11)$$

The optimal number of clusters k is the one that maximizes the average silhouette width over a range of possible values for k.
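The silhouette computation can be sketched in base R as follows. The helper `silhouette_widths()`, the toy data, and the two candidate labelings are hypothetical and serve only to illustrate the definitions of A, B, and the average width; the function assumes at least two clusters and, by a convention adopted for this sketch, assigns width 0 to a sample that is alone in its cluster.

```r
# Silhouette width of every sample, given a data matrix and cluster labels.
silhouette_widths <- function(X, cluster) {
  D <- as.matrix(dist(X))              # pairwise Euclidean dissimilarities
  sapply(seq_len(nrow(D)), function(i) {
    own <- cluster == cluster[i]
    own[i] <- FALSE                    # exclude the sample itself
    if (!any(own)) return(0)           # singleton cluster (sketch convention)
    a <- mean(D[i, own])               # A(x_i): mean dissimilarity within cluster
    others <- setdiff(unique(cluster), cluster[i])
    b <- min(sapply(others, function(v) mean(D[i, cluster == v])))  # B(x_i)
    (b - a) / max(a, b)
  })
}

# Toy data with two well-separated groups (illustrative values only).
X <- matrix(c( 0,  0,   0,  1,   1,  0,
              10, 10,  10, 11,  11, 10), ncol = 2, byrow = TRUE)

good <- c(1, 1, 1, 2, 2, 2)            # labeling that matches the groups
bad  <- c(1, 2, 1, 2, 1, 2)            # labeling that mixes the groups

S_good <- mean(silhouette_widths(X, good))
S_bad  <- mean(silhouette_widths(X, bad))
print(c(S_good = S_good, S_bad = S_bad))
```

The labeling with the larger average width is preferred; scanning candidate values of k and keeping the maximizer follows the same logic.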

The average silhouette widths with k = 1, 2, 3, 4, 5 for this study are shown in Figure 4. The silhouette plot shows that k = 3 corresponded to the maximum width, so the optimal number of in-tunnel operational statuses is 3 for the actual monitoring dataset. One of the four in-tunnel operational statuses did not appear in the experiment, so the next step is to verify which status was missing.

By applying the FANNY algorithm to the preprocessed data, three cluster centers were obtained, as shown in the following equation.

[mathematical expression not reproducible] (12)

In (12), the five elements in each row represent the values of the cluster centers in the following order: CO (ppm), N[O.sub.2] (ppm), air velocity (m/s), [PM.sub.2.5] (mg/[m.sup.3]), and HGV (veh/15 min). The number of HGV is 0 in the first cluster, and the N[O.sub.2] concentrations in the second and third clusters exceed the PIARC standard (1 ppm) [24]. Thus, the three rows represent the cluster centers of the lightly polluted, heavily polluted, and severely polluted statuses. The moderately polluted status did not appear when the traffic volume increased slightly.

3.2. Modeling of Status Perception

3.2.1. Optimal Combined Parameter. Before tuning the parameters of the Random Forests, the in-tunnel operational dataset was divided into a training set and a test set in a ratio of 7:3. The former was used for parameter tuning and variable importance calculation, and the latter for model evaluation. The minimum OOB ER principle was taken as the criterion for optimizing the combination of parameters [n.sub.tree] and [m.sub.try]. The R implementation, run on a desktop PC with Windows 10, a 3.6 GHz Intel i7 quad-core CPU, and 16 GB RAM, is shown in Algorithm 1.

With [n.sub.tree] = 10, 20, ..., 500 and [m.sub.try] = 1, 2, 3, 4, 5, the 250 combinations of [n.sub.tree] and [m.sub.try] were run iteratively, and the relation between the combined parameters and the OOB estimate of ER was obtained, as shown in Figure 5.

Figure 5 shows that the OOB ER was largely influenced by parameter [n.sub.tree]; the error decreased with increasing [n.sub.tree], making the perception results more accurate. Meanwhile, the time consumed for each iteration remained on the order of milliseconds, and, hence, the calculation time could be ignored. When [n.sub.tree] > 200, the OOB ERs tended to converge. The parameter [m.sub.try] had less impact on the OOB ER. The results further validated that the Random Forests are unlikely to overfit and that the classification error converges with an increasing number of decision trees. Considering the classification accuracy, the optimal combined parameter was identified as [n.sub.tree] = 500 and [m.sub.try] = 1, corresponding to an unbiased OOB ER estimate of 4.55%. The R code of status perception modeling is shown below.

    Status.rf <- randomForest(Status ~ ., data = trainingset, mtry = 1, ntree = 500, na.action = na.omit)
    print(Status.rf)

3.2.2. Importance Measurement of Variables. Another important feature of the Random Forests is the measurement of variable importance, which allows variables to be ranked by importance and the variable subset to be optimized, thus avoiding the problems created by high dimensionality and reducing the computational complexity. Two indexes measure variable importance: mean decrease accuracy (MDA) and mean decrease Gini-index (MDG). The former is defined as the difference between the percentage of votes for the correct class in the untouched OOB data and the percentage of votes for the correct class in the variable-permuted OOB data, averaged over all trees. The latter is defined as the total decrease in the Gini-index from splitting on the variable, averaged over all trees [22, 25]. The larger the MDA and MDG, the more important the variable. The optimal combined parameter was applied to the training set, and the importance indexes of each variable were calculated, as shown in Figure 6.

Algorithm 1

    # division of training set and test set
    set.seed(100)
    ind <- sample(2, nrow(Mydataset), replace = TRUE, prob = c(0.7, 0.3))
    trainingset <- Mydataset[ind == 1, ]  # training set accounts for 70%
    testset <- Mydataset[ind == 2, ]      # test set accounts for 30%

    # combined parameters in Random Forests
    library(randomForest)
    library(caret)
    M <- ncol(trainingset)
    ntree <- 10 * c(1:50)
    result <- data.frame()

    # model training
    set.seed(100)
    for (m in 1:(M - 1)) {
      for (n in ntree) {
        fit.rf <- randomForest(Status ~ ., data = trainingset, mtry = m,
                               ntree = n, na.action = na.omit)
        # drop the class.error column (column 4) before computing accuracy
        OOB.ER <- 1 - confusionMatrix(as.table(fit.rf$confusion[, c(-4)]))$overall["Accuracy"]
        result <- rbind(result, data.frame(m, n, OOB.ER))
      }
    }
    print(result)

As seen from Figure 6, during the dynamic evolution process of in-tunnel operational status, the importance order of variables from largest to smallest is as follows: N[O.sub.2], CO, HGV, [PM.sub.2.5], and air velocity. N[O.sub.2] and CO, the two main types of gaseous pollutants, are the primary factors that affect the changes in the in-tunnel operational status.

3.3. Perception Results of Operational Status. In this study, the Naive Bayes, Support Vector Machine (SVM), and Random Forests-based perception model were applied to predict operational status in the test set. Evaluation metrics for these three models are listed in Table 3.

The Naive Bayes classifier assumes that the value of a particular feature is independent of the value of any other feature, an assumption that rarely holds in practice. This naive design and apparently oversimplified assumption affect the classification performance of Naive Bayes. SVM can efficiently perform nonlinear classification using the so-called kernel trick, implicitly mapping its inputs into high-dimensional feature spaces. However, a significant practical question, the selection of the kernel function parameters, is still not entirely solved. Table 3 indicates that the precision, recall, and F-measure for the Random Forests-based model were better than those for the Naive Bayes or SVM model. Averaged over the operational statuses, the precision, recall, and F-measure of the Naive Bayes model were 94.85%, 86.79%, and 90.19%, respectively. For the SVM-based model, the average precision, recall, and F-measure were 96.20%, 90.83%, and 93.24%, respectively. In contrast, the average precision, recall, and F-measure of the Random Forests-based model were 98.83%, 95.52%, and 97.07%, respectively. The results validated that the Random Forests-based perception model offers the best performance among the three models, indicating its better adaptability to the dynamic changes of the operational status in extra-long highway tunnels.

4. Discussion

4.1. Optimal Number of Clusters. Determining the number of clusters in a dataset, a quantity often labeled k, is a frequent problem in data clustering and is a distinct issue from the process of actually solving the clustering problem. The correct choice of k is often ambiguous, with interpretations depending on the shape and the scale of distribution of points in a dataset and the desired clustering resolution of the user.

In this study, the silhouette method was chosen for assessing the natural number of operational statuses. Admittedly, the determination of k = 3 was somewhat subjective; the average silhouette widths for neighboring values of k were, after all, only slightly smaller. Consequently, long-term accumulation of in-tunnel operational monitoring data is crucial for the rational classification of the operational status.

4.2. Management and Control Strategy for Ventilation and Traffic Flow. The perception model can be used to determine the real-time in-tunnel operational status by using a combination of pollutant concentration and traffic volume monitoring results. In the SB direction of Qin Mountains No. 1 tunnel, the percentages of the operational environment with heavily polluted and severely polluted statuses were 59% and 31%, respectively. The lightly polluted status contributed less than 10% of the operational environment. The box-plots presenting the distributions of CO, N[O.sub.2], air velocity, [PM.sub.2.5], and HGV under different statuses are shown in Figures 7(a)-7(e).

Although the moderately polluted status did not appear, the fluctuation ranges of CO (Figure 7(a)), N[O.sub.2] (Figure 7(b)), and [PM.sub.2.5] (Figure 7(d)) exhibited a tendency to intensify with the deterioration of the in-tunnel operational status, which was basically the same as the tendency for HGV. Figure 7(e) shows that there was a minimal number of HGV under the lightly polluted status. The natural ventilation mode was used during those periods, and, hence, the air velocities fluctuated significantly under the influence of vehicle movement (piston effect). For the heavily and severely polluted statuses, all jet fans were turned on, and the air velocities were relatively stable (Figure 7(c)). Even so, the concentration of N[O.sub.2] still exceeded the PIARC standard.

The classification of in-tunnel operational statuses provides a scientific way to develop strategies for intelligent ventilation and traffic management and control. For the lightly polluted status, consider switching off the fans and depending only on natural ventilation. For the moderately polluted status, consider switching on the fans with a variable-frequency drive (VFD) to reduce energy consumption. For the heavily polluted status, consider operating the jet fans at full capacity and activating the axial fans in the inclined shaft in a timely manner. For the severely polluted status, the in-tunnel air quality is very poor and the tunnel is filled with smog and smoke, threatening driving safety; therefore, all jet fans and axial fans should be fully operated. If the tunnel is operated under the severely polluted status for extended periods of time, temporary traffic control measures should be executed to ensure driving safety [26], for instance, limiting the number of diesel-powered HGV passing through the tunnel or diverting them upstream of the tunnel.

4.3. Impact on Ecology and Environmental Management. The ecological and environmental impact of transportation is significant because transportation is a major consumer of energy and burns most of the world's petroleum. According to the annual report of the Chinese Ministry of Environmental Protection, more than 246 million vehicles emitted 45.47 million tons of pollutants in China in 2014 [27]. Vehicle emissions have become one of the principal sources of air pollution and a significant cause of dust-haze and photochemical smog. Reducing transportation emissions will have considerable positive effects on air quality, acid rain, smog, and climate change. Although stricter vehicle emission standards have been implemented, a vast number of old vehicles are still on the road, exceeding the emission limits several times over. Consequently, effective measures should be taken to accelerate the elimination of aging automobiles or to retrofit them with approved pollutant control devices.

5. Conclusions

In this study, the operational monitoring data in an extra-long highway tunnel were analyzed in detail using big data technology. By combining monitoring results of CO, N[O.sub.2], air velocity, [PM.sub.2.5], and HGV, a data-driven model for in-tunnel operational status perception was constructed. The major conclusions are as follows.

By applying the FANNY algorithm, the optimal number of clusters for obtaining the in-tunnel operational status was determined following the principle of maximum average silhouette width. Owing to the restriction of the total experimental duration, the clustering results did not contain all four operational statuses. Unfortunately, the moderately polluted status was not observed. The next step is to perform long-term monitoring of the in-tunnel operational environment and obtain massive data, thus realizing more scientific and reasonable classification of in-tunnel operational statuses.

A Random Forests-based perception model was built for determining the in-tunnel operational status. With perception accuracy as the primary consideration, an optimal combined parameter of the Random Forests was identified. Prediction results indicated that the proposed model outperformed the comparison models and had better adaptability to dynamic changes of the operational status in extra-long highway tunnels, thus realizing accurate predictions.

The distributions of the individual variables under different operational statuses were analyzed. The management and control strategies for ventilation and traffic flow under the lightly polluted, heavily polluted, and severely polluted statuses were discussed. These strategies could help improve the operation and management of extra-long highway tunnels and provide a scientific method for saving energy and reducing emissions.

https://doi.org/10.1155/2018/5056284

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant no. 51678063 and Fundamental Research Funds for the Central Universities of Ministry of Education of China under Grants nos. 310832161006 and 310821173102. The authors thank the administration center of Qin Mountains tunnel group of Shaanxi provincial expressway construction group corporation Xihan branch for assistance with the experiments. They also thank graduate students Penglei Sun, Yuhui Zhai, Wei Li, Tongzhan Liu, and Xiang Ji from Chang'an University who participated in the experiments.

References

[1] Ministry of Transport of China, Statistical Bulletin: Transportation Industry in 2016, Beijing, China, 2017.

[2] PIARC, "Road tunnels: operational strategies for emergency ventilation," La Defense, 2011.

[3] S. Bogdan and B. Birgmajer, "Model Predictive Fuzzy Control of Longitudinal Ventilation System in a Road Tunnel," Automatika: Journal for Control, Measurement, Electronics, Computing and Communications, vol. 47, pp. 39-48, 2006.

[4] M. Funabashi, I. Aoki, M. Yahiro, and H. Inoue, "A fuzzy model based control scheme and its application to a road tunnel ventilation system," in Proceedings IECON '91 International Conference on Industrial Electronics, Control and Instrumentation, Kobe, Japan, 1991.

[5] K. Toshihiro, Y. Tatsuro, W. Takahiro, S. Masanori, M. Miyako, and E. Hisashi, "Road Tunnel Ventilation Control Based on Nonlinear Programming and Fuzzy Control," IEEJ Transactions on Industry Applications, vol. 113, no. 2, pp. 160-168, 1993.

[6] P.-H. Chen, J.-H. Lai, and C.-T. Lin, "Application of fuzzy control to a road tunnel ventilation system," Fuzzy Sets and Systems, vol. 100, no. 1-3, pp. 9-28, 1998.

[7] B. Chu, D. Kim, D. Hong et al., "GA-based fuzzy controller design for tunnel ventilation systems," Automation in Construction, vol. 17, no. 2, pp. 130-136, 2008.

[8] S. Bogdan, B. Birgmajer, and Z. Kovacic, "Model predictive and fuzzy control of a road tunnel ventilation system," Transportation Research Part C: Emerging Technologies, vol. 16, no. 5, pp. 574-592, 2008.

[9] N. Euler-Rolle, M. Fuhrmann, M. Reinwald, and S. Jakubek, "Longitudinal tunnel ventilation control. Part 1: Modelling and dynamic feedforward control," Control Engineering Practice, vol. 63, pp. 91-103, 2017.

[10] C. Guo, M. Wang, L. Yang, Z. Sun, Y. Zhang, and J. Xu, "A review of energy consumption and saving in extra-long tunnel operation ventilation in China," Renewable & Sustainable Energy Reviews, vol. 53, pp. 1558-1569, 2016.

[11] Q. Li, C. Chen, Y. Deng et al., "Influence of traffic force on pollutant dispersion of CO, NO and particle matter (PM2.5) measured in an urban tunnel in Changsha, China," Tunnelling and Underground Space Technology, vol. 49, pp. 400-407, 2015.

[12] H. Yamada, R. Hayashi, and K. Tonokura, "Simultaneous measurements of on-road/in-vehicle nanoparticles and NOx while driving: Actual situations, passenger exposure and secondary formations," Science of the Total Environment, vol. 563-564, pp. 944-955, 2016.

[13] A. N. Martin, P. G. Boulter, D. Roddis et al., "In-vehicle nitrogen dioxide concentrations in road tunnels," Atmospheric Environment, vol. 144, pp. 234-248, 2016.

[14] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.

[15] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.

[16] L. Breiman and A. Cutler, "Random Forests," 2001, https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm.

[17] A. Liaw and M. Wiener, "Classification and regression by randomforest," The R Journal, vol. 2, no. 3, pp. 18-22, 2002.

[18] J. C. Dunn, "Well-separated clusters and optimal fuzzy partitions," Journal of Cybernetics, vol. 4, no. 1, pp. 95-104, 1974.

[19] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Springer, New York, NY, USA, 1981.

[20] L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley & Sons, Hoboken, NJ, USA, 2008.

[21] L. Breiman, "Statistical modeling: the two cultures," Statistical Science. A Review Journal of the Institute of Mathematical Statistics, vol. 16, no. 3, pp. 199-231, 2001.

[22] A. Liaw and M. Wiener, "Breiman and Cutler's Random Forests for Classification and Regression," 2015, https://CRAN.R-project.org/package=randomForest.

[23] P. J. Rousseeuw, "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis," Journal of Computational and Applied Mathematics, vol. 20, pp. 53-65, 1987.

[24] PIARC, "Road tunnels: vehicle emissions and air demand for ventilation," La Defense, 2012.

[25] L. Breiman, "Manual On Setting Up, Using, And Understanding Random Forests V3.1," 2002, https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf.

[26] PIARC, "Integrated approach to road tunnel safety," La Defense, 2007.

[27] Ministry of Environmental Protection of China, China Vehicle Emission Control Annual Report, Beijing, China, 2015.

Chao Qian (1), Jianxun Chen (2), Yanbin Luo (2), and Shuguang Li (1)

(1) School of Electronic & Control Engineering, Chang'an University, Xi'an 710064, China

(2) School of Highway, Chang'an University, Xi'an 710064, China

Correspondence should be addressed to Jianxun Chen; chenjx1969@chd.edu.cn

Received 1 November 2017; Accepted 10 June 2018; Published 5 July 2018

Academic Editor: Juan C. Cano

Caption: Figure 1: Qin Mountains tunnel group.

Caption: Figure 2: Overall structure of Qin Mountains No. 1 tunnel in the southbound direction.

Caption: Figure 3: Diagram of Random Forests combining with clustering analysis.

Caption: Figure 4: Optimal number of clusters; higher average silhouette widths are preferred.

Caption: Figure 5: Influence of the combined parameters ($n_{tree}$ and $m_{try}$) on OOB error.

Caption: Figure 6: Comparison of importance of different variables.

Caption: Figure 7: Distributions of particular variables under different statuses.

Table 1: Monitoring instruments and data.

| Data category | Data name | Instrument and model | Unit | Statistical interval |
|---|---|---|---|---|
| Ambient air | CO | ThermoFisher Model 48i CO Analyzer | ppm | 15-min average value |
| Ambient air | NO$_2$ | ThermoFisher Model 42i NO$_x$ Analyzer | ppm | 15-min average value |
| Ambient air | Air velocity | AZ Instrument 9871 | m/s | 15-min average value |
| Ambient air | PM$_{2.5}$ | ThermoFisher Model 5030i Particulate Monitor | mg/m$^3$ | 15-min average value |
| Traffic flow | PC | Laser vehicle detector | vehicle/15 min | 15-min accumulative value |
| Traffic flow | LDV | Laser vehicle detector | vehicle/15 min | 15-min accumulative value |
| Traffic flow | HGV | Laser vehicle detector | vehicle/15 min | 15-min accumulative value |

Table 2: Confusion matrix for in-tunnel operational status (rows: actual status; columns: perception status).

| Actual status | Lightly polluted ($c_1$) | Moderately polluted ($c_2$) | Heavily polluted ($c_3$) | Severely polluted ($c_4$) | Total |
|---|---|---|---|---|---|
| Lightly polluted ($c_1$) | $n_{1,1}$ | $n_{1,2}$ | $n_{1,3}$ | $n_{1,4}$ | $N_{1,\cdot}$ |
| Moderately polluted ($c_2$) | $n_{2,1}$ | $n_{2,2}$ | $n_{2,3}$ | $n_{2,4}$ | $N_{2,\cdot}$ |
| Heavily polluted ($c_3$) | $n_{3,1}$ | $n_{3,2}$ | $n_{3,3}$ | $n_{3,4}$ | $N_{3,\cdot}$ |
| Severely polluted ($c_4$) | $n_{4,1}$ | $n_{4,2}$ | $n_{4,3}$ | $n_{4,4}$ | $N_{4,\cdot}$ |
| Total | $N_{\cdot,1}$ | $N_{\cdot,2}$ | $N_{\cdot,3}$ | $N_{\cdot,4}$ | $N$ |

Table 3: Comparison of evaluation metrics for different perception models.

| Operational status | Naive Bayes P (%) | Naive Bayes R (%) | Naive Bayes F (%) | SVM P (%) | SVM R (%) | SVM F (%) | Random Forests P (%) | Random Forests R (%) | Random Forests F (%) |
|---|---|---|---|---|---|---|---|---|---|
| Lightly polluted | 100 | 80 | 88.89 | 100 | 85 | 91.89 | 100 | 90.00 | 94.74 |
| Heavily polluted | 88.71 | 98.21 | 93.22 | 92.44 | 98.21 | 95.24 | 96.49 | 100 | 98.21 |
| Severely polluted | 95.83 | 82.14 | 88.64 | 96.15 | 89.29 | 92.59 | 100 | 96.55 | 98.25 |
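As a hedged illustration of how the metrics in Table 3 relate to a confusion matrix laid out as in Table 2, the sketch below computes precision (P), recall (R), and F per status. It assumes P is the diagonal count divided by the column total, R is the diagonal count divided by the row total, and F is the harmonic mean of P and R, which is consistent with the P, R, and F values reported in Table 3. The function names and the example matrix are hypothetical and are not the authors' data.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(confusion):
    """Return {class_index: (P, R, F)} in percent.

    confusion[i][j] counts samples whose actual status is i and whose
    perceived status is j (rows = actual, columns = perceived).
    """
    size = len(confusion)
    metrics = {}
    for k in range(size):
        correct = confusion[k][k]
        column_total = sum(confusion[i][k] for i in range(size))  # perceived as k
        row_total = sum(confusion[k])                             # actually k
        precision = 100.0 * correct / column_total if column_total else 0.0
        recall = 100.0 * correct / row_total if row_total else 0.0
        metrics[k] = (precision, recall, f_measure(precision, recall))
    return metrics


# Hypothetical 3-status confusion matrix (illustrative counts only).
example = [
    [18, 2, 0],
    [0, 55, 1],
    [0, 3, 25],
]
metrics = per_class_metrics(example)

# Consistency check against one reported row of Table 3:
# Naive Bayes, lightly polluted, P = 100 and R = 80.
table3_f = round(f_measure(100, 80), 2)
```

Applied to the Naive Bayes row for the lightly polluted status (P = 100, R = 80), `f_measure` gives 88.89 after rounding, matching the F value listed in Table 3.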

Author: Qian, Chao; Chen, Jianxun; Luo, Yanbin; Li, Shuguang

Publication: Journal of Advanced Transportation

Article Type: Case study


Date: Jan 1, 2018
