An Approach to Integrating Tactical Decision-Making in Industrial Maintenance Balance Scorecards Using Principal Components Analysis and Machine Learning.
In business and engineering, decision-making approaches and models are developed in response to the uncertainty of technological and demand conditions. In business, it is possible to identify a strategic or operational approach or a more particularly focused approach for suppliers; in the engineering field, it is possible to identify cases regarding manufacturing conditions, product design, or aspects relating to civil engineering. In the context of the current market, in which delivery times are continually reduced and, more importantly, responses to orders are increasingly immediate, the production response in the industrial environment is faster, and quality and time affect both the complexity and the flexibility of the system. Because machine capacity is limited, we treat the productive areas with identified bottlenecks as the strategic productive areas of the factory. Taking this capacity as an invariant value, the system attempts to maintain the maximum availability of the machines that comprise a strategic productive area. Moreover, if production is continuous in this context, as in papermaking, downtime caused by damage is irrecoverable.
Another characteristic of the market context is a wider range of products, which has shifted manufacturing from mass production to flexible production; this versatility leads to greater wear and fatigue on machines because of the high rate of configuration changes, potentially resulting in a loss of reliability. It is therefore necessary to consider more demanding measures for both the prediction and the anticipation of failure. Thus, predictive maintenance engineering has developed and perfected technologies for condition monitoring and for predicting failures before breakage occurs [8-10]. Although this approach is more operational and requires more resources and investment than following the scheme, it cannot be established in an entire strategic productive area without critical equipment, facilities, or machine parts. To respond to this problem, a methodology is proposed for a productive area designated as strategic that offers knowledge extraction and the prediction of availability indicators. Thus, the maintenance department can provide a timely response with minimal resources to maintain the required reliability.
In the maintenance field, when decision-making concerns long-term strategies or policies, it is convenient to account for fuzzy uncertainty. Thus, the literature review carried out by Mardani et al. on fuzzy multiple criteria decision-making techniques found that, in maintenance environments, the fuzzy approach is used in long-term strategic frameworks, for example, in the selection of the maintenance strategy [13-15] or the maintenance policy. This could be extended to projects [17, 18] or to civil engineering environments, where there is more uncertainty due to the different conditions of each event. However, in industrial environments with continuous processes such as papermaking, where the same machines are used in manufacturing despite product variety, the risks in the predictions are lower.
The integration of Principal Component Analysis (PCA) and Machine Learning (ML) techniques can facilitate decision-making in these environments. PCA is a very efficient method for finding the attributes that explain most of the variation in a data set characterized by many explanatory variables and many records. This algorithm is used extensively in the literature, particularly for predictive maintenance, as a dimensionality reduction method. According to Alpaydin, ML is a branch of artificial intelligence whose goal is to programmatically automate a computer's learning process, similar to how humans and animals naturally learn through experience; ML algorithms employ the data directly, without previously establishing an equation as a model. In addition, these algorithms improve their efficiency as the quantity of data used as examples increases during learning. ML finds natural patterns in the data and helps make better decisions and establish predictions. Because of its versatility, ML has been used in many fields, including construction. However, PCA and ML techniques are not habitually combined. The grouping of data facilitated by PCA allows a better interpretation of complex systems such as those to which ML is applied; this interpretability is considered a desirable characteristic to achieve with ML methods.
This work forms part of a global and modular framework for Maintenance Decision Support Systems, whose general objective is to propose a system that assists an expert in decision-making when designing customized maintenance programs for a productive plant. This system begins with the alignment of the company's strategic objectives, followed by the tactical and operational maintenance levels.
This paper addresses the tactical goal. Its objective is to provide better knowledge and predictions by projecting reliability behavior into the medium-term future (a yearly horizon taken over monthly intervals), integrating this new functionality into the classic Balance Scorecard (BSC), and extending its current function of measuring the present situation with a new capability: predicting evolution based on historical data. For this objective, techniques such as PCA and ML are used.
In the proposed Custom Balance Scorecard design, Matlab© is used to integrate an exploratory data phase using PCA algorithms and a discovery and prediction phase using ML, for which Artificial Neural Network (ANN) algorithms are used. The starting data used to evaluate the results were obtained from the records of a productive area composed of two main papermaking machines, coded in the Computerized Maintenance Management System (CMMS) as M1 and M2, respectively. The data have been divided into two parts. The first part, used in the exploratory phase, reflects the maintenance work orders received in the productive area in one year. The other part, used in the analysis phase, represents production values and machine responses as efficiency variables and failure times; this part also covers a period of one year. Because of the continuous improvement process that characterizes the papermaking industry, the maintenance function's influence on productive efficiency and sustainability is more sensitive than in other types of industrial plants [29-31]; therefore, this study focuses on two indicators: overall equipment efficiency (OEE) and mean time to failure (MTTF).
The PCA algorithm has been used in the exploratory phase. In the analysis phase using ML techniques, ANN is used for its versatility, since it provides algorithms for both supervised and unsupervised learning, and for its suitable behavior compared with other ML techniques used for prediction. In unsupervised learning, two types of algorithms are used to extract knowledge of the data structure through clustering: hierarchical clustering and a Self-Organizing Map (SOM) neural network. Both algorithms identify groups of individuals with similar behavior from individual data and have been used effectively both to identify the stages of wear in industrial environments and to characterize the energy in electrical supply networks. Hierarchical clustering shows the natural grouping structure of the data as a function of the metric set as the proximity criterion, whereas SOM decomposes the data into a set number of groups. Supervised learning will use ANN regression algorithms because of their suitable predictive behavior for machine maintenance variables.
The management system of the production plant has a clear division between maintenance and production, which also applies to its databases; therefore, there is no single database in which all the information can be accessed jointly. Because of this, maintenance and production data must be accessed separately, giving two distinct tables: dataWO, Figure 1(a), corresponding to the maintenance database, and dataWOF, Figure 1(b), corresponding to the production database. Both tables are defined in more depth later; for now, to clarify the two phases that follow, dataWO contains the input data for the PCA and clustering algorithms (the unsupervised learning technique), while dataWOF serves as input for the regression algorithm (the supervised learning technique). The separation of the maintenance and production departments at the level of database management leads to separate analyses and to the use of different techniques in the knowledge extraction process.
2.1. Exploratory Data Phase. There is a first preparatory data step in which the starting data correspond to the work orders (WOs) received by the papermaking machines, M1 and M2, during a calendar year. These data have been extracted from a CMMS database; Figure 1(a) shows the treatment of these data once they have been imported, and they are defined in the table dataWO. After a prior filtering, the data comprise 46 attributes and 1080 instances of WOs. The work order is a document whose original format has 46 fields, representing the 46 original attributes (see Table 1). These can be grouped into descriptive fields for the problem and its resolution, which hold free alphanumeric text, and numerically coded categorical variables that describe the kind of work order, such as order type, requester, repair shop, repair type, urgency type, asset condition, and implication of failure. Other numerically coded categorical variables that hold classes are order status, homogeneous family, section and installation, type of fixed assets, type of work, and operative sequences. The remaining qualitative variables are of date type and record the dates and times of the request, the scheduling of the intervention, and its completion. Rather than performing PCA on all 46 attributes, it is applied only to the 7 numerical ones shown in Table 1 (4 associated costs: total, order, parts, and workforce; 2 repair times; and the number of operators), which are the input variables for the PCA. The rest of the original variables are discarded because they provide context information, documenting the problem and its physical location; they therefore serve as prefilter variables for the location and situation of the maintenance intervention.
In the exploratory data phase, the statistical technique of PCA has been used to reduce the dimension of the data and find the principal axes that best represent their variation. These axes are orthogonal to each other and are calculated through a linear change of basis, choosing a new coordinate system for the original data set in which the largest variance of the data set is captured on the first axis (called the first component), the second-largest variance on the second axis, and so on. This methodology reduces to an eigenvalue and eigenvector problem on the covariance matrix of the data, which reduces the dimensionality of the data to those axes that contribute most to its overall variance; therefore, as many principal axes are used as are needed for their sum to represent approximately 80% of the variation of the original data.
PCA starts from a data set tabulated such that each row represents an observation, instance, or individual and each column represents an attribute or variable. Consider a data set consisting of n observations with k attributes. In matrix notation, we write $A \in \mathbb{R}^{n \times k}$, where $A$ is the matrix representing the table, with coefficient $a_{ij}$ being the ith observation of the jth variable; hence, the matrix of observations $A$ is formed by k variable vectors, $\bar{x}_j$, sorted by columns, and each vector has n components corresponding to its n observations, as shown in
$$A = (a_{ij}) = [\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k] = \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nk} \end{pmatrix}. \tag{1}$$
To reduce the number of variables, one must find another vector subspace aligned with the directions that involve the most variation, and form a basis in which these components are represented in an orthogonal, and therefore linearly independent, system. This problem reduces to finding the vectors, $\bar{v}$, that represent the variation of the data, that is, a system in which (2) is satisfied:
$$A \bar{v} = \lambda \bar{v}. \tag{2}$$
However, in this case, the variation is not reduced but is used to find the principal components and axes, that is, the eigenvalues and eigenvectors of the data. In accordance with this philosophy, we will attempt to find those components and principal axes that explain the maximum variation of the data. Thus, instead of matrix $A$, its covariance matrix, $C$, is obtained from the data normalized to a mean of zero and a standard deviation of one. The result is then used to obtain the eigenvalues and eigenvectors of $C$, that is, to solve the following equations:
$$\det(C - \lambda I) = 0, \tag{3}$$
$$(C - \lambda_i I)\, \bar{v}_i = \bar{0}. \tag{4}$$
Once the principal components, $\lambda_i$, are obtained along with the principal axes, $\bar{v}_i$, which together explain the variation of the data, they are ordered as in a Pareto diagram, and only the set of p components that explain at least 80% of the variation in the data is selected. Thus, the dimension of the data is reduced from k original variables to p < k variables. In general, the matrix of observations projected onto the principal axes, $Y$, contains the n observations of the new variables and is obtained by (5), where $V$ is the matrix whose columns are the eigenvectors $\bar{v}_i$ obtained from (4) and $A$ is the normalized matrix of observations.
$$Y = A V \tag{5}$$
This transformation expresses the original data in axes that coincide with the natural variation. One aspect to be considered in this analysis is that this transformation is linear; therefore, it is not suitable for representing nonlinear problems. In cases of nonlinearity, it is advisable to use ML clustering algorithms, as will be observed later.
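The procedure of this section can be sketched in a few lines. The following is a minimal illustration in Python with NumPy, not the Matlab implementation used in this work; the function name `pca`, the synthetic columns, and the random seed are assumptions introduced only for the example. It standardizes the columns, solves the eigenvalue problem on the covariance matrix, and keeps the fewest components that reach the 80% threshold.

```python
import numpy as np

def pca(data, variance_target=0.80):
    """Return eigenvalues, eigenvectors, explained ratios, p, and scores.

    Columns are standardized, the covariance matrix is eigendecomposed, and
    the fewest components whose cumulative explained variance reaches
    variance_target are kept. Assumes no column is constant.
    """
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # mean 0, std 1
    cov = np.cov(z, rowvar=False)                     # k x k covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)            # symmetric solver, ascending
    order = np.argsort(eigvals)[::-1]                 # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    cumulative = np.cumsum(explained)
    p = min(int(np.searchsorted(cumulative, variance_target)) + 1, len(eigvals))
    scores = z @ eigvecs[:, :p]                       # projection onto p axes
    return eigvals, eigvecs, explained, p, scores

# Synthetic stand-in for a work-order table (NOT the data of this study):
# the third column is the sum of the first two, so it is linearly dependent.
rng = np.random.default_rng(0)
parts = rng.gamma(2.0, 100.0, size=200)
labor = rng.gamma(2.0, 50.0, size=200)
hours = labor / 25.0 + rng.normal(0.0, 1.0, size=200)
table = np.column_stack([parts, labor, parts + labor, hours])

eigvals, eigvecs, explained, p, scores = pca(table)
print("explained variance ratios:", np.round(explained, 3))
print("components kept:", p, "| smallest eigenvalue:", float(eigvals[-1]))
```

Because one synthetic column is an exact linear combination of two others, the smallest eigenvalue is numerically zero, which is how linearly dependent input variables show up in this kind of analysis.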
2.2. Phase of Analysis through Machine Learning. In this phase, the preparatory data step uses as input, in addition to the previous data, the production values and their responses as efficiency variables and failure times over an operational year for both papermaking machines (M1 and M2). Data are extracted and grouped from two databases: maintenance and production. The data obtained comprise 35 attributes and 12 instances, one per month, for each machine. Figure 1(b) shows the treatment of these data once they have been imported; they are defined in the table dataWOF. The manufacturing report is a document whose original format has 35 fields representing the original attributes (see Table 2). These can be grouped into alphanumeric fields identifying the machine in question and a numerically coded categorical variable recording the calendar month; the remaining attributes are numeric and record the values of time, cost, interventions, and production. For each papermaking machine, 3 predictor variables associated with production parameters, shown in Table 2, are selected (daily production in tons of paper per day, average paper weight in grams per square meter, and average speed in meters per second), with the objective of obtaining two simultaneous predicted responses as targets (OEE, as the percentage of machine utilization, and MTTF). These responses, provided for both machines, evaluate the performance of the maintenance function applied to the productive area. The three predictor variables are those that, from experience, best characterize production. Although a PCA could be performed as in the exploratory phase for dataWO, it was not considered on this occasion because of the few instances, 12, available for each machine.
Since PCA is a statistical analysis, so few instances were considered insufficient to carry it out, and the selection was instead based on experience. However, as more production data and more instances are obtained, a PCA can be performed on all or some of the production attributes of Table 2 to reduce the dimension and select those with the most influence on the variation of the data.
ML is divided into two techniques: supervised learning, which trains a model on known input and output data to predict future outputs, and unsupervised learning, which finds hidden patterns and intrinsic structures in the input data. Figure 2 provides an illustration of ML. For each technique, different algorithms can be used, and the most suitable one is chosen by trial and error.
Supervised learning uses classification and regression techniques to develop predictive models. The difference between these techniques is that classification predicts responses in discrete or categorical variables, whereas regression predicts responses in a continuous variable. Unsupervised learning uses clustering, a technique commonly used in exploratory data analysis to find hidden patterns as clusters in the data.
Among the ML algorithms (Support Vector Machine, Discriminant Analysis, Naive Bayes, Nearest Neighbor, Decision Trees, K-means, Hierarchical, Gaussian Mixture, Hidden Markov Model, and ANN), we will use algorithms modeled with ANN because of their versatility for both unsupervised techniques (clustering) and supervised techniques (regression). The former group the input data to recognize patterns and define the natural groups present in the data; the purpose of the latter is to establish a correspondence or mapping between the input values, or predictors, and the output variables, or targets to predict. Thus, the model is trained (or adjusted) on a knowledge base formed by historical examples of known inputs and outputs.
According to Rumelhart et al., the back-propagation training process of an ANN is used for regression as a supervised learning technique. This process is divided into two stages: forward and backward propagation. A multilayer perceptron network configuration, as shown in Figure 3, is used, with an activation function whose output range is [0,1] for an input x, expressed by
$$y(x) = \frac{e^{x}}{1 + e^{x}}. \tag{6}$$
In the forward propagation stage, we select an input data set for training, $(x_1, x_2, \ldots, x_{I-1})$, and apply it to the network to obtain the outputs $y_d$. For each neuron j of the hidden layer, the value of its nucleus $n_j$ is given by
$$n_j = \sum_{i=1}^{I} (W_{j,i} \times a_i) + b, \tag{7}$$
where each input value of the previous layer, $a_i$, is weighted by $W_{j,i}$, and the output of the hidden layer is expressed by
$$a_j = f(n_j). \tag{8}$$
Here f is the nucleus activation function; the calculation is performed iteratively, layer by layer, until the final output, $y_d$, as in
$$y_d = f(n_{yd}) = f\left(\sum_{k=1}^{H} (W_{yd,k} \times a_k)\right). \tag{9}$$
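The forward stage of (6)-(9) can be illustrated with a short sketch. This is plain Python written for this explanation, not the Matlab network of the study; the weights, biases, layer sizes, and function names below are hypothetical, and an output bias is included as an assumption even though (9) is written without one.

```python
import math

def logistic(x):
    """Equation (6), e^x / (1 + e^x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def layer_forward(inputs, weights, biases):
    """a_j = f(n_j) with n_j = sum_i W[j][i] * a_i + b_j, as in (7) and (8)."""
    return [logistic(sum(w * a for w, a in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, output_w, output_b):
    """Propagate x through one hidden layer and then the output layer."""
    hidden = layer_forward(x, hidden_w, hidden_b)
    return hidden, layer_forward(hidden, output_w, output_b)

# Hypothetical weights: 3 inputs, 2 hidden neurons, 2 outputs.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
output_w = [[1.0, -1.0], [0.5, 0.5]]
output_b = [0.0, 0.0]
hidden, outputs = forward([0.2, 0.7, 0.5], hidden_w, hidden_b, output_w, output_b)
print(hidden, outputs)
```

Every activation stays strictly between 0 and 1 for moderate inputs, which matches the output range stated for the activation function.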
The backward propagation stage consists of measuring the error committed as the difference between the calculated value, $y_d$, and the real value, $y_r$. We recalculate the weights w in the reverse direction, attempting to minimize the error, first obtaining the new weights of the output layer, $w^{n}_{yd,k}$, from the old ones, $w^{o}_{yd,k}$:
$$w^{n}_{yd,k} = w^{o}_{yd,k} + a_k \cdot [y_d (1 - y_d)(y_r - y_d)], \tag{10}$$
and later the new weights of the hidden layer, $w^{n}_{j,i}$:
$$w^{n}_{j,i} = w^{o}_{j,i} + w^{o}_{j,i} \cdot a_i \cdot [a_j (1 - a_j)(w^{n}_{k,j} - \delta_j)], \tag{11}$$
where $\delta_j$ is obtained by applying (9) on the following equation:
$$[y_d (1 - y_d)(y_r - y_d)] = \delta_{yd}. \tag{12}$$
This process is repeated for the n observations until a predetermined acceptable value of the error is achieved, usually measured by the mean square error (MSE), which is defined in (13). To ensure rapid convergence of the iterative method, mathematical optimization methods are usually used; in this case, we use Bayesian Regularization.
$$\mathrm{MSE} = \frac{1}{n} \sum_{d=1}^{n} (y_r - y_d)^2 \tag{13}$$
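As an illustration of the backward stage and of the error measure in (13), the sketch below performs one weight update on a tiny network. It uses the textbook delta rule for logistic units with plain gradient descent, which is an assumption made for the example: it may differ in detail from the update equations as printed above, and it is not the Bayesian Regularization optimizer used in this work. The dictionary layout, the `rate` parameter, and the single training sample are hypothetical.

```python
import math

def logistic(x):
    """Logistic activation with range (0, 1), stable for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def predict(x, net):
    """Forward pass: hidden activations and the single output y_d."""
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["hidden_w"], net["hidden_b"])]
    y = logistic(sum(w * a for w, a in zip(net["out_w"], hidden)) + net["out_b"])
    return hidden, y

def train_step(x, y_real, net, rate=1.0):
    """One back-propagation update (textbook delta rule, learning rate `rate`)."""
    hidden, y = predict(x, net)
    delta_out = y * (1.0 - y) * (y_real - y)          # output-layer delta
    # Hidden deltas use the OLD output weights, before they are updated.
    delta_hidden = [a * (1.0 - a) * w * delta_out
                    for a, w in zip(hidden, net["out_w"])]
    net["out_w"] = [w + rate * a * delta_out for w, a in zip(net["out_w"], hidden)]
    net["out_b"] += rate * delta_out
    for j, d in enumerate(delta_hidden):
        net["hidden_w"][j] = [w + rate * xi * d
                              for w, xi in zip(net["hidden_w"][j], x)]
        net["hidden_b"][j] += rate * d

def mse(samples, net):
    """Mean square error over (input, real output) pairs, as in (13)."""
    return sum((y_real - predict(x, net)[1]) ** 2 for x, y_real in samples) / len(samples)

# Hypothetical one-input, one-hidden-neuron network with all weights at zero.
net = {"hidden_w": [[0.0]], "hidden_b": [0.0], "out_w": [0.0], "out_b": 0.0}
samples = [([1.0], 1.0)]
before = mse(samples, net)
train_step([1.0], 1.0, net)
after = mse(samples, net)
print(before, after)
```

Starting from zero weights, the network outputs 0.5, so the initial squared error is 0.25; a single update moves the output toward the real value and lowers the error.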
It is possible to integrate new functionalities into a custom control panel of the industrial plant. In this case, predictive analysis was added for the expected availability response of a productive area considered the main core of the industrial plant; thus, it is possible to anticipate the information. Knowing the future availability of both machines in the medium term (at monthly intervals) allows the maintenance department to correct possible out-of-tolerance deviations before they occur, improving its response. In the first phase, which is exploratory, we use PCA to discover the smallest number of dimensions that explain the variation of the data. Applying PCA to the set of WOs of productive area 2 (composed of M1 and M2), the principal components or axes, PCi, are found; these are sufficient to explain the variation of the original data contained in the WOs of the productive area. As a result of the PCA, 5 components are identified that explain 100% of the variation; therefore, 2 linearly dependent vectors are detected among the 7 input variables, reducing the original dimension by two. Moreover, of these 5 principal components, the first 3 account for 78.6% of the variation in the data. This work focuses on the number of interventions, costs, and maintenance times and represents the results of the PCA on the first 3 principal components. Figure 4(a) shows that the first three principal components represent 78.6% of the variation of the data, which is why the data are represented using these three components as principal axes; this is visualized in Figure 4(b), with projections of the original data onto the three principal components. The data were previously normalized to a mean of 0 and a standard deviation of 1.
Table 3 shows the values obtained in the PCA of the maintenance metrics, that is, the projections onto the three principal components of the numerical maintenance variables selected from Table 1 (total, order, parts, and workforce costs, estimated repair time, repair time, and number of operators). Of the seven maintenance variables studied, those with the greatest relevance on the principal axes are, in decreasing order, order cost, total cost, and parts cost; relevance is measured by the Euclidean norm (modulus) of each variable over the three principal axes, which is shown in the last column of Table 3. The four remaining variables have a broadly similar amplitude, so they are considered of equal importance.
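The ranking criterion described here, the Euclidean norm of each variable over the three principal axes, can be sketched as follows. The coefficients in the example are placeholders chosen for easy arithmetic and are not the values of Table 3; the function name is also hypothetical.

```python
import math

def rank_by_loading_norm(loadings):
    """Sort variables by the Euclidean norm of their coordinates on the axes."""
    norms = {name: math.sqrt(sum(c * c for c in coords))
             for name, coords in loadings.items()}
    return sorted(norms.items(), key=lambda item: item[1], reverse=True)

# Placeholder coordinates on (PC1, PC2, PC3); NOT the values of Table 3.
example_loadings = {
    "repair time": (0.3, 0.4, 0.0),
    "order cost": (0.6, 0.0, 0.8),
    "operators": (0.0, 0.0, 0.2),
}
ranking = rank_by_loading_norm(example_loadings)
for name, norm in ranking:
    print(f"{name}: {norm:.2f}")
```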
In the second phase, ML, the clustering technique is used to discover patterns hidden in the data, such as their natural grouping. For this technique, only two attributes of the data stored in the dataWO table shown in Figure 1(a) are used: total cost and repair time. Both attributes can be key indicators for the maintenance department, and the repair time can also be key for the production department because of downtime. Therefore, once their influence on the principal components is known, it is important to examine their relationship in more depth.
For the clustering technique, two algorithms have been used. The first, hierarchical clustering, allows the creation of a dendrogram, a tree diagram that shows the number of natural groups, or clusters, depending on the distance criterion fixed between data. By setting a distance value on the ordinate axis, the tree is trimmed by a horizontal line that cuts the dendrogram at as many intersections as there are natural groups. In this case, Figure 5(a), it is observed that, for Euclidean mean distances of 6000 to 7000, the tree presents two natural groups; from 3000 to 5500, three groups; from 2000 to 3000, four groups; and below 1000, the number of groups increases considerably. Because of this compression, a value of 900 is used, which prunes the tree into 7 natural groups; the different groups of data are illustrated in color in Figure 5(b). As can be seen, the hierarchical clustering technique gives an overview of the number of clusters that can be obtained as a function of the chosen distance value. Several distance metrics are available, and a custom one can even be defined; in this case, the Euclidean distance has been used.
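The effect of cutting the dendrogram at a chosen distance can be sketched with a small agglomerative routine. This is an illustrative pure-Python version that assumes average linkage with Euclidean distance (one reading of the "Euclidean mean distance" above); it is not the Matlab routine used in this work, and the five (cost, time) points are invented for the example.

```python
import math
from itertools import combinations

def average_distance(c1, c2, points):
    """Mean Euclidean distance between all pairs drawn from two clusters."""
    total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def cut_tree(points, cutoff):
    """Merge the closest clusters until the next merge would exceed cutoff."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        dist, a, b = min(
            (average_distance(clusters[a], clusters[b], points), a, b)
            for a, b in combinations(range(len(clusters)), 2))
        if dist > cutoff:
            break
        clusters[a].extend(clusters[b])   # a < b, so index a stays valid
        del clusters[b]
    return clusters

# Invented (total cost, repair time) pairs, for illustration only.
points = [(100.0, 1.0), (120.0, 1.5), (900.0, 4.0), (950.0, 5.0), (5000.0, 30.0)]
for cutoff in (200.0, 1000.0, 10000.0):
    print(cutoff, cut_tree(points, cutoff))
```

Lowering the cutoff yields more groups and raising it yields fewer, which mirrors moving the horizontal line down or up the dendrogram. Because the two variables are used in raw units here, the cost axis dominates the distance; scaling the variables first would be a separate design choice.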
The clustering technique is then applied again with a second algorithm, an SOM ANN, on the subset formed by total cost and repair time, the variables chosen to reflect the costs and times of the plant's maintenance interventions. An SOM, or Kohonen network, consists of a competitive layer that can classify a set of vector data with any number of dimensions into as many classes as the layer has neurons [39-41]. The neurons are arranged in a two-dimensional topology that represents the data set. The network trained with 2 variables and 1080 input data is shown in Figure 5(c); its two-dimensional topology with data hits is shown in Figure 5(d).
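A competitive layer of this kind can be sketched in pure Python. The version below is a simplified online SOM with a Gaussian neighborhood and linearly decaying rate and radius; these training choices, the default parameters, and the ten pre-scaled (cost, time) pairs are assumptions for illustration and do not reproduce the Matlab network or the 1080 work orders of the study.

```python
import math
import random

def best_unit(sample, weights):
    """Index of the neuron whose weight vector is closest to the sample."""
    return min(range(len(weights)), key=lambda k: math.dist(sample, weights[k]))

def train_som(data, rows=2, cols=4, epochs=40, rate=0.5, radius=1.5, seed=0):
    """Train a rows x cols self-organizing map and return its weight vectors."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(rng.choice(data)) for _ in grid]   # start on data points
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs                   # shrinks from 1 toward 0
        sigma = radius * decay                         # neighborhood width
        for sample in rng.sample(data, len(data)):     # shuffled pass
            winner = grid[best_unit(sample, weights)]
            for k, position in enumerate(grid):
                grid_dist = math.dist(position, winner)
                influence = math.exp(-grid_dist ** 2 / (2.0 * sigma ** 2))
                step = rate * decay * influence
                weights[k] = [w + step * (s - w)
                              for w, s in zip(weights[k], sample)]
    return weights

# Invented (cost, time) pairs already scaled to [0, 1]; the last one is an outlier.
data = [(0.02, 0.05), (0.03, 0.04), (0.10, 0.12), (0.12, 0.10), (0.20, 0.22),
        (0.22, 0.25), (0.40, 0.38), (0.42, 0.45), (0.60, 0.62), (0.95, 0.10)]
weights = train_som(data)
hits = [0] * len(weights)
for sample in data:
    hits[best_unit(sample, weights)] += 1
print("hits per neuron:", hits)
```

The hit counts play the role of the data-hits view of the trained map: each sample is assigned to its closest neuron, and the number of neurons fixes the maximum number of groups.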
The network is configured with a two-dimensional 2 x 4 topology, discovering a pattern of 8 natural groups in the data; most of these are distributed with a clear linear relationship between them. In addition, there are discrepant data that have no linear relationship and reveal an unconventional repair; this was an extraordinary repair carried out on one of the machines that was not cataloged as a normal repair. This finding reveals an error in the entry of the information into the CMMS. This event was also revealed by the green dot (single group) of the hierarchical clustering figure (see Figure 5(c)). Figure 6(a) shows the original data in the two variables (cost, time), and Figure 6(b) shows the 7 natural groups and the linear relationship between them. This figure also shows that group 8 has no linear relationship, as previously discussed. From the maintenance point of view, the distribution of the groups along this linear relationship shows that the first four groups, for costs between 0 and 1000 €, are very close to each other, while the centroids at 2000 €, 3000 €, and 5000 € are farther apart. From this, it is inferred that the majority of the interventions cost less than 1000 €, with a smaller number of interventions at intervals of 1000 €.
Finally, the regression technique enables prediction of the future availability values of both main papermaking machines (M1 and M2) through the OEE indicator of each machine and its MTTF, that is, its average runtime before failure. These values are calculated simultaneously in the trained ANN model. The neural network is trained with input data (predictors) that combine the three production variables and with the two target output variables, the overall efficiency (OEE) and the mean time to failure (MTTF) of each machine, measured over the 12 months of the year. For this technique, only three production attributes of the data stored in the dataWOF table shown in Figure 1(b) are used: daily production in tons of paper per day, average paper weight in grams per square meter, and average speed in meters per second.
For each of the papermaking machines M1 and M2, a feed-forward network with 3 input variables, a hidden layer of 10 neurons, and an output layer of 2 linear neurons is used; such a network can fit multidimensional mapping problems arbitrarily well, given consistent data and sufficient neurons in its hidden layer. The network is trained with 70% of the data using the back-propagation algorithm with Bayesian Regularization; 15% of the data is used for validation, and the remaining 15% is used for testing. An MSE performance function is used, as shown in Figures 7(a) and 8(a) for M1 and M2, respectively. The quality of the trained network is indicated by the regression fit coefficients shown in Figures 7(b) and 8(b) for M1 and M2, respectively.
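To make the 70/15/15 division concrete for 12 monthly observations, the following sketch shuffles the month indices and splits them. The rounding rule, the seed, and the function name are assumptions; the division routine of the Matlab toolbox may round or assign samples differently.

```python
import random

def split_indices(n, train=0.70, validation=0.15, seed=0):
    """Shuffle range(n) and split it into train, validation, and test parts.

    The test part receives whatever remains after rounding the other two.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(train * n)
    n_val = round(validation * n)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

train_idx, val_idx, test_idx = split_indices(12)
print(len(train_idx), len(val_idx), len(test_idx))
```

With 12 instances, this rounding leaves 8 months for training and only 2 each for validation and testing, which shows how little data is available to check the fit.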
For each machine in the productive area, the fit is assessed by comparing the measured output variables OEE and MTTF (in blue) with the output variables predicted by the model, OEEp and MTTFp (in red). The results for M1 are shown in Figure 9(a), and those for M2 are shown in Figure 9(b). In short, this network can predict the maintenance behavior of the production area of the plant in terms of availability (efficiency and operating times) when the model is fed with the forecast production values. As more instances of input/output data are introduced, the network will be retrained and become more reliable, since it will be fitted with a greater number of real examples; a greater quantity of data thus leads to the acquisition of more experience and knowledge. In maintenance terms, predictions of availability and operating times are needed from three characteristic production values, and the model established in this way makes it possible to predict OEE and MTTF, which show nonlinear behavior over time, as shown in Figure 9. For M1, the OEE oscillates around an average value of 94% with a low dispersion of ±0.66%, whereas the MTTF has a mean of 81.07 h with a high dispersion of ±25.82 h. For M2, the OEE has an average value of 95.52% with a low dispersion of ±1.12%, whereas the MTTF has a mean of 118.03 h with a very high dispersion of ±59.35 h. It is concluded that the regression model can accurately predict values with both high- and low-amplitude oscillations over time.
The fit is acceptable for training, as indicated by the values of the global regression coefficient R shown in Figures 7(b) and 8(b) for M1 and M2, respectively.
For machine M1, discrepant values are observed in the validation set for month 10 for both the OEE and MTTF indicators; there are two possible reasons for this. First, overfitting can occur when the data have not been prepared well and contain erroneous or poorly conditioned input information. Second, the number of observations, that is, the historical information, may be too small. In this paper, it has been verified that the input data for the ANN were not poorly conditioned; therefore, the overfitting problem is discarded, and the few observations available are considered the main cause of the discordance in the validation inputs. The network was trained and validated with few data because more data were unavailable for reasons of good performance in the industrial plant; more data would undoubtedly improve the learning of the network and therefore its efficiency and accuracy. However, this fact highlights another very interesting aspect of the ANN: the network is easily adaptable and configurable given a low number of observations. Here, this adaptability makes it possible to predict 11 of the 12 monthly values accurately, a success rate of 91.67% in this case.
For machine M2, the fit is nearly complete for the OEE indicator but not for the MTTF indicator, for which there is a clear discrepancy in the prediction in month 6. As with M1, the minimal data (observations) used for validation yield the same precision as for M1.
Another relevant aspect of the result is the acceptable precision in the prediction of the availability indicators, which are based exclusively on time, using only three production variables as predictors (daily paper mass, paper surface density, and machine speed). In addition, the two output variables, the OEE and MTTF indicators, are predicted simultaneously; this reflects an additional value of this type of network and greater computational efficiency, since a single simulation yields the prediction of more than one target variable.
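The idea of predicting two targets in a single pass from three inputs can be illustrated with a minimal sketch of a multilayer perceptron trained by back-propagation. This is not the authors' implementation: the data are synthetic stand-ins (not plant data), the hidden-layer size, learning rate, and all names are assumptions, and plain stochastic gradient descent is used here for simplicity.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

N_IN, N_HID, N_OUT = 3, 5, 2  # 3 production inputs, 2 availability outputs


def make_synthetic(n):
    """Synthetic stand-in data (assumption): NOT plant data, illustration only."""
    xs, ys = [], []
    for _ in range(n):
        x = [random.uniform(-1, 1) for _ in range(N_IN)]
        y = [math.sin(x[0]) + 0.5 * x[1], x[1] * x[2]]
        xs.append(x)
        ys.append(y)
    return xs, ys


def init(rows, cols):
    """Small random initial weights."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W1, b1 = init(N_HID, N_IN), [0.0] * N_HID
W2, b2 = init(N_OUT, N_HID), [0.0] * N_OUT


def forward(x):
    """Return hidden activations and both outputs from one forward pass."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]
    return h, y


def mse(xs, ys):
    """Mean squared error over all samples and both outputs."""
    total = 0.0
    for x, t in zip(xs, ys):
        _, y = forward(x)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return total / (len(xs) * N_OUT)


def train(xs, ys, epochs=200, lr=0.05):
    """Plain stochastic gradient descent with back-propagation."""
    for _ in range(epochs):
        for x, t in zip(xs, ys):
            h, y = forward(x)
            err = [yi - ti for yi, ti in zip(y, t)]  # gradient of 0.5 * squared error
            # Back-propagate to the hidden layer before W2 is updated.
            dh = [(1 - h[j] ** 2) * sum(err[k] * W2[k][j] for k in range(N_OUT))
                  for j in range(N_HID)]
            for k in range(N_OUT):
                for j in range(N_HID):
                    W2[k][j] -= lr * err[k] * h[j]
                b2[k] -= lr * err[k]
            for j in range(N_HID):
                for i in range(N_IN):
                    W1[j][i] -= lr * dh[j] * x[i]
                b1[j] -= lr * dh[j]


xs, ys = make_synthetic(60)
loss_before = mse(xs, ys)
train(xs, ys)
loss_after = mse(xs, ys)
_, prediction = forward(xs[0])  # one pass yields both outputs at once
print(f"MSE before training: {loss_before:.4f}, after: {loss_after:.4f}")
```

Because the two outputs share the hidden layer, a single forward pass returns both predictions, which is the computational advantage described above.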
A PCA-ML model has been developed that can be integrated into traditional BSC-based scorecards, thereby adding a tactical layer to the longer-term strategic approach of the BSC. This extension allows better control over the maintenance function of an industrial plant in the medium term, at monthly intervals, by measuring selected indicators in those productive areas previously designated as strategic. The model, which combines PCA with an ML algorithm based on ANNs, can be integrated very easily into any traditional control panel by converting the developed source code into packages for different programming languages and including them in a library, to be used as a function in a spreadsheet or as a standalone executable application. In addition, the ML component gives the control panel the ability to discover structures and behavior patterns that are relatively hidden in WOs. A clustering technique determines natural groups in the cost and maintenance workforce variables, and the availability of the productive area is predicted through the OEE and MTTF indicators. The scorecard model has thus been validated on a paper production plant.
As possible future work, this methodology could be applied to civil engineering, in which case fuzzy uncertainty would be incorporated owing to the particular characteristics of that sector.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
The authors want to thank the Department of Construction Engineering and Manufacturing and the College of Industrial Engineers of UNED for their support through Projects 2017-IFC08 and 2017-IFC09.
Nestor Rodriguez-Padial, Marta Marin, and Rosario Domingo
Department of Construction and Manufacturing Engineering, Universidad Nacional de Educacion a Distancia (UNED), C/Juan del Rosal 12, 28040 Madrid, Spain
Correspondence should be addressed to Rosario Domingo; email@example.com
Received 26 May 2017; Revised 26 July 2017; Accepted 29 August 2017; Published 12 October 2017
Academic Editor: Romualdas Bausys
Caption: FIGURE 1: Imported data from CMMS: (a) table dataWO; (b) dataWOF.
Caption: FIGURE 2: Machine learning techniques.
Caption: FIGURE 3: ANN regression: multilayer perceptron trained with back-propagation.
Caption: FIGURE 4: (a) Principal components variance; (b) projected data on principal components.
Caption: FIGURE 5: (a) Dendrogram; (b) data; (c) ANN SOM trained; (d) SOM topology.
Caption: FIGURE 6: (a) Artificial Neural Network SOM trained; (b) SOM topology sample hits.
Caption: FIGURE 7: (a) ANN regression performance for M1; (b) regression fit.
Caption: FIGURE 8: (a) ANN regression performance for M2; (b) regression fit.
Caption: FIGURE 9: Output variables measured (blue) versus predicted (red): (a) M1; (b) M2.
TABLE 1: Original attributes of the maintenance work order.

Attribute                    Data type
STATE_WO                     Numerical categorical
TYPE-WO                      Word categorical
WO_NUMBER                    Alpha numerical
REQUESTING_DEPT              Word categorical
REQUESTING_DEPT              Date
REQUESTING_HOUR              Date
WORKSHOP                     Word categorical
URGENCY                      Word categorical
TERM_DATE                    Date
WORK_DESCRIPT_1              Alpha numerical
WORK_DESCRIPT_2              Alpha numerical
HOMOGENEOUS_GROUP            Numerical categorical
LOCATION                     Alpha numerical
ELECTRIC_CODE                Alpha numerical
SECTION                      Numerical categorical
INSTALLATION                 Numerical categorical
DEVICE_DESCRIPTION           Alpha numerical
PRINCIPAL_WO                 Alpha numerical
ENROLLMENT                   Alpha numerical
TYPE_REPAIR                  Word categorical
SEQUENCE_NUMBER              Numerical categorical
TYPE_INMOBILIZED             Numerical categorical
OPERATORS_NUMBER             Numerical
ESTIMATED_REPAIR_TIME        Numerical
ASSET_CONDITION              Word categorical
TYPE_WORK                    Numerical categorical
SCHEDULED_DATE               Date
SCHEDULED_HOUR               Date
WORK_DESCRIPT_1_FINAL_1      Alpha numerical
WORK_DESCRIPT_1_FINAL_2      Alpha numerical
IMPLICATION_FAULT            Word categorical
ELEMENT_FAULT                Alpha numerical
CAUSE_FAULT                  Alpha numerical
SUBSTITUTION                 Alpha numerical
ID_SUBSTITUTE_ENROLLMENT     Alpha numerical
ID_SUBSTITUTED_ENROLLMENT    Alpha numerical
START_DATE                   Date
START_HOUR                   Date
FINAL_DATE                   Date
FINAL_HOUR                   Date
REPAIR_TIME                  Numerical
WORKFORCE_COST               Numerical
PARTS_COST                   Numerical
ORDER_COST                   Numerical
TOTAL_COST                   Numerical
REPAIR_DATE                  Date

TABLE 2: Original attributes of the manufacturing report (production database).

Attribute                      Data type
ID_MACHINE                     Alpha numerical
MONTH                          Numerical categorical
NON_PLASTERED_PRODUCTION       Numerical
PLASTERED_PRODUCTION           Numerical
VOLUME_PLASTERED_PRODUCTION    Numerical
TOTAL_PRODUCTION               Numerical
WORKS_DAYS                     Numerical
DAILY_PRODUCTION_TON           Numerical
CUTOUT_PRODUCTION              Numerical
DAILY_CUTOUT_PRODUCTION_TON    Numerical
DECREASE_MACHINE_%             Numerical
AVAILABLE_HOURS                Numerical
IDLE_TIME_EXTERNAL_CAUSES      Numerical
MAINTENANCE                    Numerical
SCHEDULED                      Numerical
BREAKS                         Numerical
PRODUCTIONREST                 Numerical
TOTAL_IDLE                     Numerical
DAILY_IDLE_TIME                Numerical
OEE                            Numerical
START_NUMBERS                  Numerical
CUTOUT_TIME_CHANGES            Numerical
AVERAGE_PAPER_WEIGHTS          Numerical
AVERAGE_REAL_WIDTH             Numerical
AVERAGE_SPEED                  Numerical
AVERAGE BUDGETED WIDTH         Numerical
WIDTH_DECREASE_CMS             Numerical
WIDTH_DECREASE-TON             Numerical
CAPE_PRODUCTION                Numerical
COST                           Numerical
COST/TON                       Numerical
INTERVENTIONS_NUMBER           Numerical
MEAN TIME BETWEEN FAILURES     Numerical
MEAN TIME TO REPAIR            Numerical
MEAN TIME TO FAILURE           Numerical

TABLE 3: PCA results. Principal components values.

Metrics                  PC1       PC2       PC3       Metrics modulus
TOTAL_COST               0.4696    0.476     0.036     0.6697
ORDER-COST               0.1598    0.5714    0.6089    0.8502
PARTS_COST               0.2323    0.3913    -0.5737   0.7323
WORKFORCE_COST           0.5234    -0.2668   -0.0987   0.5957
REPAIR_TIME              0.5235    -0.2668   -0.0987   0.5957
ESTIMATED_REPAIR_TIME    0.3793    -0.2661   0.2076    0.5077
OPERATORS_NUMBER         0.09      -0.2839   0.4861    0.5701
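The "Metrics modulus" column of Table 3 appears consistent with the Euclidean norm of each metric's three principal-component values. The short check below, whose names are illustrative and whose norm interpretation is an assumption inferred from the numbers rather than stated in the table, recomputes that norm from the tabulated values and compares it with the reported modulus.

```python
import math

# Values (PC1, PC2, PC3) and reported modulus, copied from Table 3.
table3 = {
    "TOTAL_COST": ((0.4696, 0.476, 0.036), 0.6697),
    "ORDER-COST": ((0.1598, 0.5714, 0.6089), 0.8502),
    "PARTS_COST": ((0.2323, 0.3913, -0.5737), 0.7323),
    "WORKFORCE_COST": ((0.5234, -0.2668, -0.0987), 0.5957),
    "REPAIR_TIME": ((0.5235, -0.2668, -0.0987), 0.5957),
    "ESTIMATED_REPAIR_TIME": ((0.3793, -0.2661, 0.2076), 0.5077),
    "OPERATORS_NUMBER": ((0.09, -0.2839, 0.4861), 0.5701),
}


def modulus(components):
    """Euclidean norm of the principal-component values (assumed definition)."""
    return math.sqrt(sum(v * v for v in components))


# Largest difference between the recomputed norm and the reported modulus.
max_gap = max(abs(modulus(pcs) - reported) for pcs, reported in table3.values())

for name, (pcs, reported) in table3.items():
    print(f"{name}: recomputed {modulus(pcs):.4f}, reported {reported:.4f}")
```

All seven recomputed values agree with the reported moduli to within rounding of the tabulated figures.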
|Title Annotation:||Research Article|
|Author:||Rodriguez-Padial, Nestor; Marin, Marta; Domingo, Rosario|
|Date:||Jan 1, 2017|