
A comparison of neural networks and econometric discrete dependent variable models in prediction of occupational attainment.


ABSTRACT

A number of recent studies have compared the performance of neural networks to a variety of statistical techniques for classification problems. In this paper, we compare the prediction of occupational attainment using a backpropagation neural network model and a multinomial logit model. Both techniques use variables related to education, experience, minority status, disability status, marital status, sex, and geographic region as inputs to perform the prediction. The neural network training and performance evaluation are also discussed in detail. Although a comparison of the predictive ability of both models showed similar results, this paper presents neural networks as a more robust alternative for occupational attainment prediction.

1. INTRODUCTION

During the last two decades, several research studies have focused on investigating the determinants of occupational attainment, as well as the race and sex differences in occupational attainment for the determination of occupational segregation (Broom, Jones, McDonell, and Williams, 1980; Brown, Moon, and Zoloth, 1980a; Gabriel, Williams, and Schmitz, 1990; Meng and Miller, 1995; Miller and Volker, 1985; Schmidt and Strauss, 1975). As defined by Brown, Moon, and Zoloth (1980a), "an individual's occupational attainment is a function of employer's willingness to hire that person and of the individual's desire to work in specific occupations. Willingness of employers to hire an individual depends on human capital. The individual's desire for a particular occupation can be expressed by at least three of the arguments in a utility function: income, taste for the work involved and family size. The interaction of these supply and demand factors leads to the individual's employment in a particular occupation." The most common technique employed in occupational attainment prediction involves the estimation of parameters using statistical models with discrete dependent variables, namely the multinomial logit and ordered probit models. Although these qualitative response techniques are widely used in the social sciences, it is well known that the predictive power of these models is rather poor (Greene, 1993).

Neural networks offer a useful and robust approach for solving applied business and engineering problems for which no mathematical model is available or applicable. They are also known as artificial neural networks, connectionist models, parallel distributed processing models, and neuromorphic systems. Neural networks are developed as generalizations of mathematical models of human cognition, based on the following assumptions (Fausett, 1994): (1) information processing occurs at many simple elements called neurons or neural processing elements (NPEs), (2) signals are passed between neurons over connection links, (3) each connection link has an associated weight, which, in a typical neural net, multiplies the signal transmitted, and (4) each neuron applies an activation function (usually nonlinear) to its net input (sum of weighted input signals) to determine its output signal. Since neural networks utilize an architecture and information processing manner similar to the brain, some of the networks show characteristics that are associated with the brain, for example, the ability to learn from examples, to generalize from situations, to classify examples into categories, and to self-organize information. It has been determined that while under ideal statistical conditions the neural network can perform as well as a multiple regression model, under less than ideal conditions (e.g., when the model is not correctly specified, or in the presence of missing data, outliers, heteroscedasticity, or autocorrelation), neural networks' performance is superior (Denton, 1995).
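Assumptions (3) and (4) above can be made concrete with a short sketch. The code below is illustrative only and is not taken from the study: the signal and weight values are hypothetical, and the hyperbolic tangent is used as one common choice of nonlinear activation function.

```python
import math

def neuron_output(inputs, weights):
    """Output signal of one neural processing element (NPE).

    Each incoming signal is multiplied by the weight of its connection
    link, the products are summed to form the net input, and a nonlinear
    activation function (here tanh) is applied to that sum.
    """
    net_input = sum(w * x for w, x in zip(weights, inputs))
    return math.tanh(net_input)

# Hypothetical signals and connection weights, for illustration only.
signals = [1.0, 0.0, 0.5]
weights = [0.4, -0.7, 0.2]
print(neuron_output(signals, weights))
```

Here the net input is 0.4 + 0.0 + 0.1 = 0.5, so the neuron emits tanh(0.5).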

The utility of neural networks in practical applications for a wide variety of problems has been documented extensively in the engineering, management science, and financial management literature (Sharda, 1994). This paper describes how we developed a neural network model to predict occupational attainment. We also compare the accuracy of the neural network prediction with that obtained by a multinomial logit model. Utilizing a neural network software package called NeuralWorks Professional II/PLUS (NeuralWare, 1993), we designed, developed, and validated a neural network that makes use of data from the Current Population Survey (CPS).

Section 2 is a review of literature detailing several practical applications of neural networks in business and engineering. Section 3 describes the various studies conducted in the area of occupational attainment. We also discuss in this section the motivation to use a neural network for the occupational attainment prediction problem. Section 4 explains the issues involving the multinomial logit model, and provides the basic architecture, training, and performance evaluation of the neural network. Results are discussed in Section 5, followed by conclusions in Section 6.

2. LITERATURE REVIEW: NEURAL NETWORKS

A wide range of interesting applications of neural networks motivated and helped us in our research. Lu, Chen, Kim, and Hwang (1996) compared the effectiveness of neural networks and the multinomial logit model, and concluded that ANNs perform better than logit regressions in franchising decision-making. Fletcher and Goss (1993) used a back-propagation (BP) neural network model to predict the bankruptcy of a firm, and reported improvement in prediction accuracy over the logit model. Salchenberger, Cinar, and Lash (1992) used a neural network and the logit model to forecast bank failures using the bank's financial ratios as inputs. The neural network performed as well as or better than the logit model. Wu, Fang, King, and Nuttle (1995) applied neural network technology to the decision surface modeling of apparel retail operations. Denton (1995) compared neural networks to linear regression forecasting models and used the mean square error of their forecasts as a measurement of their relative performance. The analysis was performed (1) with an ordinary linear regression, (2) in the presence of an outlier, (3) in the presence of multicollinearity, and (4) with a misrepresentation of the model. The results showed no significant difference in performance in the first case, but the neural network performed better than the regression model in the last three cases. Kuo and Reitsch (1995) compared the forecasting performance of neural networks and several conventional forecasting models. Neural networks showed superior performance in almost all cases.

Numerous successful attempts have been made to use neural networks for managerial decision making. Hansen, McDonald, and Stice (1992) applied neural networks to two audit problems. In the first audit problem, the authors predicted the audit opinion rendered for specific firms from financial ratios and other predictors. The second audit problem investigated the neural network's ability to predict the likelihood of litigation presented to an auditor by a specific client. Tam and Kiang (1992) discussed a BP neural network application in predicting future bank bankruptcy based on financial ratios. Dutta and Shekhar (1988) applied neural networks to a generalization problem of predicting corporate bond ratings. They used a BP network with one hidden layer and simultaneous weight adjustments to learn the weights. The authors reported that the neural network resulted in a successful prediction rate of 88.3% compared to a success rate of 64.7% for the linear multivariate model. Surkan and Singleton (1990) later suggested that networks trained with two hidden layers outperform a network with only one hidden layer containing a comparable number of neurons. Successful application of various neural network training algorithms in forecasting potable water demand is reported by Thind (1994). Chiang, Urban, and Baldridge (1996) discussed a BP neural network approach to mutual fund net asset value forecasting.

Hu, Zhang, and Haiyang (2004) found that the ANN performed better than logistic regression in the modeling of foreign equities. Collins, Ghosh, and Scofield (1988) built a neural network to make mortgage underwriting judgments. Their application made use of a Multiple Neural Network Learning System (MNNLS) to replicate the decisions made by mortgage insurance underwriters. The MNNLS used an array of coupled Restricted Coulomb Energy (RCE) sub-networks. Using a variant of the nonlinear least squares method, neural networks have also been used to search for and decode nonlinear regularities in asset price movements (White, 1988). Kimoto, Asakawa, Yoda, and Takeoda (1990) applied modular neural networks to develop a buying and selling timing prediction system for stocks on the Tokyo Stock Exchange. They developed a high-speed learning method called supplementary learning that is based on error backpropagation. Odom and Sharda (1990) developed a neural network model using backpropagation for prediction of bankruptcy. They claimed that their model performed better than discriminant analysis, which is generally used for this class of problems. A study by Kattan (1993) applied the BP neural network to predict the self-reported usage of a popular graphics software package.

3. LITERATURE REVIEW AND BACKGROUND: OCCUPATIONAL ATTAINMENT

Gabriel, Williams, and Schmitz (1990) used the multinomial logit model to make comparisons of occupational attainment among ethnic and gender groups. They obtained logit coefficients for each variable and used the coefficients for white males in minority groups to make inferences about occupational segregation, and found the logit model appropriate for predicting occupational attainment. Schmidt and Strauss (1975) analyzed occupational attainment using the multinomial logit model. Meng and Miller (1995) applied the multinomial logit and ordered probit models to the estimation of occupational attainment based on education, job experience, firm tenure, and three regional dummies. Based on this study, the authors reported that the intricate linkages between independent variables and occupational outcomes are better captured by the multinomial logit model, as it estimates a larger set of parameters, one set for each occupation. Miller and Volker (1985) compared the performance of the multinomial logit and ordered probit models to determine occupational attainment and mobility in a random sample of Australian males. They found that, while the ordered probit model is more consistent with previous notions of job hierarchies and requires less computational time, the multinomial logit model performed better in the prediction of the occupational distributions. Brown et al. (1980a) modeled occupational attainment with the multinomial logit model using data from the National Longitudinal Survey. The model allowed them to simulate the occupational distribution of women that would exist if they were treated as men. Brown et al. (1980b) used the multinomial logit model to estimate segregation by sex in occupational attainment, tested the robustness of the model with multiple discriminant analysis, and reported that both approaches provide similar results.

As discussed above, despite the widespread use of statistical models in the domain of occupational attainment, several limitations of these models exist. The estimation of models with discrete dependent variables is a complicated statistical problem. Because of heteroskedasticity in the error terms and the need to keep probabilities within the 0-1 range, the ordinary least squares model is not appropriate, forcing modelers to use maximum likelihood estimation. Although more appropriate than ordinary regression models, maximum likelihood estimations have the following drawbacks:

(a) Independence of irrelevant alternatives property

According to Kennedy (1996), a limitation of the multinomial logit model is that when several alternatives with very similar utility are introduced in the model, the probabilities of choosing the other alternatives will be affected. This makes the model inappropriate when several alternatives are close substitutes of each other.

(b) Assumptions about error terms

In statistical models we must make some assumptions about the error terms. In the multinomial logit model, we assume that the error terms follow a logistic distribution. In the ordered probit model, we assume that the error terms are normally distributed. We do not need to make any assumptions about error terms in the neural network.

(c) Determination of the functional form

In multinomial logit models some restrictive assumptions must be made about the functional form of the relationship between the explanatory variables and the probabilities of each of the alternative outputs. A Box-Cox transformation can be applied to the probit and logit models to determine if the regressors should be included in natural form or in logarithmic form. Greene, Greene, and Seaks (1995) described the effect of the Box-Cox transformation in a probit model. A neural network has the capability to select the parameters that best approximate the underlying functional form of the data. Therefore, a neural network can simulate the appropriate functional form with the data provided, resulting in the best forecast of the occupational attainment.
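Drawback (a) above can be illustrated numerically. The sketch below is not part of the original study: the occupation labels and utility values are hypothetical, and the probabilities use the standard multinomial logit (softmax) form. It shows that adding a close substitute for one alternative leaves the ratio between two existing probabilities unchanged, so the unrelated alternative loses probability in the same proportion as the one that was duplicated.

```python
import math

def logit_probabilities(utilities):
    """Multinomial logit choice probabilities for a dict of utilities."""
    denominator = sum(math.exp(v) for v in utilities.values())
    return {name: math.exp(v) / denominator for name, v in utilities.items()}

# Hypothetical utilities: first two alternatives, then a near-duplicate
# of "sales" is added as a third alternative.
two = logit_probabilities({"clerical": 1.0, "sales": 0.5})
three = logit_probabilities({"clerical": 1.0, "sales": 0.5, "sales_clone": 0.5})

# The odds of clerical versus sales are the same in both choice sets.
ratio_before = two["clerical"] / two["sales"]
ratio_after = three["clerical"] / three["sales"]
print(ratio_before, ratio_after)

# The probability of the unrelated alternative nevertheless falls.
print(two["clerical"], three["clerical"])
```

Intuitively, a clone of "sales" should draw its probability mostly from "sales" itself, but the logit form forces "clerical" to give up probability as well.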

4. MODELS

4.1 Multinomial logit model

If we assume that individual workers select their occupations according to a number of individual-specific variables (e.g., personal characteristics, taste, economic status, regional labor market), we can define the goodness of their occupation $Y_i$ as:

$$Y_i = a + X_i'\beta + \varepsilon, \qquad \varepsilon \sim N(0,1)$$

where $X_i$ is a vector of personal characteristics of individual $i$ and $\beta$ is the corresponding vector of coefficients. $Y_i$ is not observed. Instead, we can observe whether or not the individual adopted a particular occupation. If we assume that the probability that an individual $i$ is employed in the $j$th occupation follows the cumulative distribution of a logistic random variable, then it can be expressed as:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

where $P_{ih}$ is the conditional probability that individual $i$ adopts occupation $h$, and $\beta_h$ is the vector of coefficients that relate the characteristics $X$ to the log probability ratio. This model can be estimated by maximizing the likelihood function of a sample of $T$ independent observations on individual choices $Y_i$. The likelihood function can be written as:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

where $\Lambda(\cdot)$ is the logistic cumulative distribution function.
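The two expressions omitted above cannot be recovered from this copy of the article. For orientation only, the textbook form of the multinomial logit model with $J$ occupations is sketched below. This is an assumption about what the omitted expressions convey, not a reproduction of them: the indicator $d_{ih}$ (equal to 1 if individual $i$ is observed in occupation $h$, and 0 otherwise) is introduced here for illustration, and the original writes the likelihood in terms of $\Lambda(\cdot)$, so its notation may differ.

```latex
% Choice probabilities, with the coefficient vector of the base
% occupation normalized to zero for identification
P_{ih} = \frac{\exp(X_i'\beta_h)}{\sum_{j=1}^{J} \exp(X_i'\beta_j)},
\qquad h = 1, \dots, J

% Log-likelihood of a sample of T independent observations
\ln L = \sum_{i=1}^{T} \sum_{h=1}^{J} d_{ih} \, \ln P_{ih}
```

Under this form, $\ln(P_{ih}/P_{ib}) = X_i'\beta_h$ for base occupation $b$, which is the log probability ratio that the coefficients $\beta_h$ are said to describe.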

4.2 Neural Network Model

This section describes the neural network model constructed to predict occupational attainment. The inputs to the network are 10 decision variables (see Table I) that influence the decision regarding occupational attainment. The output of the network is the classification of the occupation into one of the 9 categories (see Table III) that is best suited for the individual worker. Neural network model development starts with selecting an appropriate neural network architecture. Different neural architectures are employed for different problems. We initially considered several neural network architectures such as Standard Backpropagation (BP), Fast Backpropagation, Quickprop, Cascade-Correlation, and Learning Vector Quantization (LVQ). We decided to use the Standard BP network, as it is practical, easy to implement, and the most popular choice among management scientists for various research and application efforts (Sharda, 1994). The BP neural network was implemented using NeuralWorks Professional II/PLUS software (NeuralWare, 1993).

4.3 Data description

The data needed to train the BP neural network as well as the multinomial logit model were obtained from the 1996 edition of the Current Population Survey (CPS). Each case consisted of 10 input values, one for each decision variable, and a specific output corresponding to one of the nine actual occupational categories. These 10 variables are specified and described in Table I, and a statistical summary is provided in Table II. The same set of decision variables was used in the multinomial logit model and the BP neural network. A total of 4,982 cases were available to train the neural network, and a hold-out sample of 1,452 cases was used for testing the network performance. To achieve the best results, the training data set was selected in such a way that (1) the data were evenly divided among the various outcomes, and (2) it reasonably represented the entire universe.

Since the purpose of the models tested was to predict the occupation of workers, observations within the age group 16-65 working for the private sector or for federal, state, or local government were selected. Both self-employed and Armed Forces-related occupations were excluded from the sample. The categorical variables (minority, disability, sex, geographic region, and marital status) were recoded into multiple dummy fields, mapped to 1 or 0. The education variable (highest grade achieved) was remapped to a continuous variable reflecting the years of education completed.

All subjects not belonging to the white race, Mexican American, Chicano, Mexican, Puerto Rican, Cuban, Central or South American, or other Spanish descent, were considered to be minorities. Experience was computed as age minus years of education minus five. Respondents who reported that they "have a health problem or a disability which prevents work or which limits the kind or amount of work" were considered disabled. Marital status was coded as "married" if so declared and "single" if never married, widowed, divorced, or separated.
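The coding rules above can be summarized in a short sketch. The function and field names are hypothetical and are not taken from the CPS files or from the study; only the rules themselves (experience as age minus years of education minus five, and 1/0 dummies for marital and disability status) come from the text.

```python
def code_respondent(age, years_of_education, marital_status, work_limiting_disability):
    """Derive model inputs for one respondent (hypothetical field names)."""
    return {
        # Potential experience: age minus years of education minus five.
        "experience": age - years_of_education - 5,
        # Only respondents who declare themselves married are coded 1;
        # never married, widowed, divorced, and separated are coded 0.
        "married": 1 if marital_status == "married" else 0,
        # 1 if a health problem or disability prevents or limits work.
        "disabled": 1 if work_limiting_disability else 0,
    }

# A 40-year-old divorced respondent with 16 years of education.
print(code_respondent(40, 16, "divorced", False))
```

For this example respondent, experience is 40 - 16 - 5 = 19 years, and both dummies are 0.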

Table III describes the classification of occupations used in the models. The nine possible outputs were recoded to dichotomous variables containing the value 1 if the individual had the occupation represented by that variable or 0 if the individual had a different occupation. Therefore, the mean represents the proportion of observations corresponding to each occupation.
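A small sketch shows why the mean of each dichotomous variable equals the share of observations in that occupation. The sample of occupation codes below is hypothetical, and numbering the nine categories 1 through 9 is an assumption made for illustration.

```python
def one_hot(occupation, n_categories=9):
    """Recode an occupation code (1..n_categories) as 0/1 indicators."""
    return [1 if k == occupation else 0 for k in range(1, n_categories + 1)]

# Hypothetical occupation codes for four individuals.
sample = [1, 3, 3, 9]
indicators = [one_hot(code) for code in sample]

# Column means of the indicator matrix are the occupation proportions.
proportions = [sum(column) / len(sample) for column in zip(*indicators)]
print(proportions)
```

Two of the four individuals hold occupation 3, so the mean of the third indicator is 0.5, and the nine means sum to one.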

4.4 Neural Network Architecture and Training

Relevant design parameters for a BP network include the number of hidden layers, the number of nodes in the hidden layer, the number of training cycles, the epoch size (number of times examples are presented before the weights are updated), and the initial and final learning rate and momentum terms. These are important decisions in modeling the network and can significantly affect its performance. There exists almost no sound theoretical basis for selecting model parameters. Therefore, modelers must rely on heuristic guidelines that seem to provide good performance (Jain and Nag, 1995). For example, the literature suggests that while at least one hidden layer is mandatory, there is no evidence of any advantage from adding multiple hidden layers (Dutta and Shekhar, 1988; Salchenberger et al., 1992). In our application we experimented with various combinations of the design parameters. The final basic architecture of the neural network model is shown in Figure 1. For simplicity, some of the links and processing elements have been omitted in Figure 1.

[FIGURE 1 OMITTED]

The system is a feedforward backpropagation neural network with 10 input nodes or Neural Processing Elements (NPEs), one hidden layer with 15 NPEs, and an output layer with 9 NPEs, each corresponding to one of the decision categories. The network is presented a training file containing records corresponding to the 4,982 observations of the training data set. These observations were presented to the network in random order, and the network was trained for 50,000 cycles, at which point the error had stabilized and additional training did not yield any improvement in performance (refer to Gupta, 1995; Wasserman, 1989; and Wu et al., 1995 for background, details, and mathematical equations for the BP neural network).
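The 10-15-9 layout can be sketched as a forward pass in plain Python. This is an illustration of the architecture described above, not the NeuralWorks implementation: the weight range of -0.1 to 0.1, the random seed, the constant input vector, and the omission of bias terms are all assumptions made to keep the example short.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 10, 15, 9

def random_weights(n_rows, n_cols, rng):
    """One row of connection weights per processing element in the layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_cols)] for _ in range(n_rows)]

def layer(inputs, weights):
    """Apply tanh to the weighted sum feeding each processing element."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(inputs, w_hidden, w_output):
    """Propagate one case through the hidden layer and the output layer."""
    return layer(layer(inputs, w_hidden), w_output)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
w_hidden = random_weights(N_HIDDEN, N_INPUT, rng)
w_output = random_weights(N_OUTPUT, N_HIDDEN, rng)

outputs = forward([0.5] * N_INPUT, w_hidden, w_output)
predicted = outputs.index(max(outputs)) + 1  # category with the largest output
print(len(outputs), predicted)
```

With untrained random weights the predicted category is arbitrary; training adjusts the two weight matrices so that the largest output corresponds to the actual occupation.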

A hyperbolic tangent function was chosen as the transfer function. This choice was straightforward, since the hyperbolic tangent is the most popular transfer function for classification problems. An initial set of random weights was generated, and Gaussian noise was assumed at the input stage. A weighted sum of the inputs was computed for each processing element of the next layer, and the resulting values were again transformed by the transfer function and weighted until the data reached the output layer. After the resulting values had been compared to the desired ones and the weight corrections calculated, the connection weights were updated. The process of updating network weights constitutes the basis of the learning process.

Before a connection weight is adjusted, the new increment is multiplied by a learning rate or learning coefficient. The adjustment of the learning rates is a critical step in network training. The learning rates were selected through trial and error, starting with a relatively high rate of 0.2. The initial learning rate was reduced until the network was able to learn without saturated cells. A cell saturates when the value received by a processing element is beyond the transfer function range. When a cell is saturated, it always yields the same output and becomes unable to learn from additional training. As the learning process progressed, the learning rate was reduced from 0.2 to 0.15 after 10,000 cycles and to 0.10 from 20,000 to 50,000 cycles. This allowed the network to continue learning and slowly achieve stability, minimizing its mean squared error. A higher learning coefficient would have allowed for faster learning, but the fluctuations would have made it more difficult to reach convergence. To minimize the effect of oscillations in the weight adjustments, a momentum factor was included in their calculation. The momentum factor is a proportion of the previous weight adjustment that is added to each weight adjustment. A momentum factor increases stability in the learning process and allows the network to advance faster toward convergence. A momentum rate of 0.5 proved most effective in avoiding oscillations at the beginning, and it was also reduced as training progressed.
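The learning rate schedule and the momentum term can be written as a short sketch. The exact cycle boundaries are an interpretation of the description above, and the momentum factor is held at its initial value of 0.5 because the text does not give the schedule by which it was reduced. The update rule shown is the usual gradient step with momentum, which may differ in detail from the rule implemented in the software used in the study.

```python
def learning_rate(cycle):
    """Stepwise schedule: 0.20, then 0.15, then 0.10 (boundaries assumed)."""
    if cycle < 10_000:
        return 0.20
    if cycle < 20_000:
        return 0.15
    return 0.10

def weight_update(gradient, previous_update, cycle, momentum=0.5):
    """Weight adjustment: a step against the error gradient plus a
    proportion (the momentum factor) of the previous adjustment."""
    return -learning_rate(cycle) * gradient + momentum * previous_update

# Hypothetical gradient values for a single connection weight.
first = weight_update(gradient=1.0, previous_update=0.0, cycle=0)
second = weight_update(gradient=1.0, previous_update=first, cycle=1)
print(first, second)
```

When successive gradients point in the same direction, the momentum term enlarges the step; when they alternate in sign, it damps the oscillation.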

5. RESULTS

The multinomial logit model yields a coefficient for each variable and each occupation. These coefficients express how the independent variable affects the natural logarithm of the likelihood that an individual adopts a particular occupation over the executive/managerial (base) occupation. Logit coefficients are shown in Table IV.
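To illustrate how coefficients of this kind translate into occupation probabilities, the sketch below converts log-ratio equations of the form ln(Pj/P0) into probabilities, with the base occupation's index fixed at zero. The function name and the two-equation coefficient set are hypothetical and are not the estimates in Table IV.

```python
import math

def logit_probabilities(x, coefs):
    """x: regressors with a leading 1.0 for the constant.
    coefs[j]: coefficients of the equation ln(P_{j+1} / P_0).
    Returns [P_0, P_1, ...], where P_0 is the base occupation."""
    # The base occupation has a linear index of zero by construction
    log_ratios = [0.0] + [sum(b * v for b, v in zip(beta, x)) for beta in coefs]
    peak = max(log_ratios)  # subtracting the maximum keeps exp() well behaved
    exps = [math.exp(z - peak) for z in log_ratios]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical model: a constant and years of education, two non-base occupations
coefs = [[-4.0, 0.25], [2.0, -0.125]]
x = [1.0, 12.0]
p = logit_probabilities(x, coefs)
```

The probabilities sum to one, and the logarithm of `p[1] / p[0]` recovers the linear index of the first equation, which is the interpretation given to the coefficients above.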

The chi-square statistic of 3561.87 is significant at the α = 0.01 level, meaning that the multinomial logit model is a good predictor of occupational attainment. Since this goodness-of-fit test cannot be performed for the neural network, we chose two procedures to compare the forecasting performance of both models: (1) the percentage of cases correctly forecasted by the model, and (2) the mean squared error. The percentage of cases correctly forecasted is a good measure because the ultimate objective of the models is to accurately forecast occupational attainment. It was calculated by counting the cases for which the occupation with the highest calculated probability was the actual occupation adopted. In the neural network, the probability matrix is extracted as an output file from the software. In the multinomial logit model, the probability matrix is calculated by applying the coefficients estimated with the training data set to the test data set.
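The percentage of correct forecasts described above can be computed from any probability matrix, whichever model produced it. The following sketch assumes one row of probabilities per individual and integer occupation codes; the function name and the toy data are hypothetical.

```python
def percent_correct(prob_rows, actual_codes):
    """Share of cases (in percent) where the occupation with the highest
    forecasted probability is the occupation actually adopted."""
    hits = 0
    for probs, actual in zip(prob_rows, actual_codes):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == actual:
            hits += 1
    return 100.0 * hits / len(actual_codes)

# Toy probability matrix: three individuals, three occupations
toy_probs = [
    [0.50, 0.30, 0.20],
    [0.10, 0.20, 0.70],
    [0.40, 0.35, 0.25],
]
toy_actual = [0, 2, 1]
pct = percent_correct(toy_probs, toy_actual)
```

In the toy data, the first two individuals are forecasted correctly and the third is not, so two of the three cases count as correct.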

The mean squared error (MSE) is another good measure of performance because both models minimize MSE; it is therefore a standard measurement of the performance of forecasting techniques. The MSE was calculated with the formula:

\[ MSE = \frac{1}{T \cdot H} \sum_{h=1}^{H} \sum_{i=1}^{T} (A_{ih} - F_{ih})^{2} \]

where A_{ih} is the actual value of the occupation h dummy variable for individual i (1 or 0), F_{ih} is the forecasted probability that individual i adopts occupation h, T is the number of individuals, and H is the number of occupations. The forecast errors are squared to avoid offsetting effects of errors with different signs. The results of both tests are shown in Table V.
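The MSE formula above can be sketched as follows, again assuming one row of forecasted probabilities per individual and integer occupation codes for the actual outcomes. The function name and the uniform toy forecast are hypothetical.

```python
import math

def mean_squared_error(prob_rows, actual_codes):
    """Average squared gap between the 0/1 occupation dummies and the
    forecasted probabilities, over all T individuals and H occupations."""
    n_individuals = len(prob_rows)      # T in the formula
    n_occupations = len(prob_rows[0])   # H in the formula
    total = 0.0
    for probs, actual in zip(prob_rows, actual_codes):
        for h, forecast in enumerate(probs):
            dummy = 1.0 if h == actual else 0.0
            total += (dummy - forecast) ** 2
    return total / (n_individuals * n_occupations)

# Naive benchmark: equal probability for each of nine occupations
uniform = [[1.0 / 9.0] * 9 for _ in range(4)]
actual = [0, 3, 5, 8]
mse = mean_squared_error(uniform, actual)
rmse = math.sqrt(mse)
```

A forecast that assigns 1/9 to each of nine occupations gives an MSE of 8/81, or about 0.0988, regardless of the actual outcomes. That figure is a convenient baseline to set beside the values in Table V.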

The column labeled SQR(MSE) shows the square root of the mean squared error, which is useful because it is expressed in the same units as the actual observations (probability). The root-mean-square deviation between forecasted and actual probabilities is thus very close to 0.30 (30%) in both cases. The percentage of cases correctly forecasted is also very close to 30% in both cases.

6. CONCLUSIONS

The review of the literature shows several studies of occupational attainment using statistical models, as well as applications of neural networks to other forecasting problems, such as linear regression. This study is a first attempt to systematically apply an ANN to the problem of forecasting occupational attainment. We found that the neural network is an alternative to the discrete dependent variable models. Since the neural network is capable of emulating virtually any functional form, it can perform better in cases where the errors are not normally distributed, or where there are several occupations with very similar utility.

Training a neural network can be, however, a very expensive procedure, and it is difficult, in general, to know how good the parameters generated in the training are until the network has been tested. Because of the lack of theoretical foundations, training a neural network requires a long trial and error process, experimenting with different combinations of learning rates, momentum terms, transfer functions, and network architectures. The determination of the learning rates and other network parameters is fundamental to training the network successfully.

As previously reported, genetic algorithms can reduce the arbitrary nature of the determination of the training parameters, improving the training process and, therefore, the forecasting performance of the network. The results of this study can be extended to many other classification problems, especially to those with discrete dependent variables.

The results of the study show that, although the training of the network was less than optimal, there was no significant difference in the number of accurate forecasts between the neural network and the multinomial logit model. Likewise, there was no significant difference in mean squared error between the two models. A network trained by specialized and experienced neural network engineers can yield better results than the statistical model, allowing more accurate forecasts and fast re-estimations with different populations, from different geographic areas or different sampling criteria. However, a limitation of the ANN is that it does not allow for the statistical analysis often performed with multinomial logit estimates. Standard errors are not available in neural networks, making it impossible to construct confidence intervals or to perform tests of significance for variables, tests for differences between data sets, etc.

Most occupational attainment models are used to compare among ethnic groups, ages, male and female, or other criteria. The usual approach is to estimate the statistical model for a group and then use the coefficients in another group to yield the expected occupational attainment. The difference between the predicted and the real attainment can be attributed to occupational segregation, geography or other factors. The use of neural networks easily allows training the network with a group, e.g., males, and testing with a different group, e.g., females, to make inferences about occupational segregation. Once the training algorithm is established, the neural network is in general easier to use than the standard statistical software packages.

The promising results of this study suggest that the performance of the neural network can be greatly improved with advanced training techniques such as genetic algorithms. This good performance can be generalized to other discrete dependent variable forecasting problems. In terms of future work in this area, it might be worthwhile to consider how other types of neural network architectures might be used in the application domain. It would also be interesting to optimize the neural network using a genetic algorithm and to test the use of the neural network and the multinomial logit model in a combined approach for predictive reinforcement.

TABLE I. DESCRIPTION OF NEURAL NETWORK INPUT VARIABLES

Neural Network         Description
Input Variables

[X.sub.1] (EDU)        Educational attainment, transformed to
                       equivalent years
[X.sub.2] (MINO)       Minority Status (not white, or Mexican, Chicano,
                       Puerto Rican, Cuban, Central or South American,
                       other Spanish): 1=minority, 0=not a minority.
[X.sub.3] (EXP)        Experience (Age minus EDU minus 5).
[X.sub.4] (DISA)       Respondent has a health problem or a disability
                       that prevents work or limits the kind or amount
                       of work: 1=disabled, 0=not disabled.
[X.sub.5] (SEX)        Self declared sex: 1=female, 0=male.
[X.sub.6] (NEAST)      Geographic region: 1=northeast, 0 otherwise.
[X.sub.7] (MIDW)       Geographic region: 1=Midwest, 0 otherwise.
[X.sub.8] (SOUTH)      Geographic region: 1=South, 0 otherwise.
[X.sub.9] (WEST)       Geographic region: 1=West, 0 otherwise.
[X.sub.10] (MARITL)    Marital status: 1=married, 0=single.

TABLE II. DESCRIPTIVE STATISTICS: INDEPENDENT VARIABLES

Variable    Mean     Std Deviation    Minimum    Maximum    N

DISA         0.02     0.15             0          1         4982
NEAST        0.20     0.40             0          1         4982
MIDW         0.24     0.43             0          1         4982
WEST         0.26     0.44             0          1         4982
MINO         0.28     0.45             0          1         4982
SOUTH        0.30     0.46             0          1         4982
SEX          0.45     0.50             0          1         4982
MARITL       0.60     0.49             0          1         4982
EDU         12.99     2.99             0         21         4982
EXP         19.38    11.76            -1         58         4982

TABLE III. OUTPUT VARIABLES (OCCUPATIONAL CLASSIFICATIONS)

Occupation                                        Mean    Std. Dev.

O[C.sub.1] (Executive, Admin, and Managerial)     0.11    0.31
O[C.sub.2] (Professional Specialty)               0.12    0.32
O[C.sub.3] (Technicians and Related Support)      0.11    0.32
O[C.sub.4] (Sales)                                0.11    0.31
O[C.sub.5] (Admin. Support Occupations, Incl.
  Clerical)                                       0.12    0.33
O[C.sub.6] (Services)                             0.11    0.32
O[C.sub.7] (Precision Prod., Craft and Repair)    0.10    0.31
O[C.sub.8] (Machine Operator, Assembler,
  Inspector)                                      0.10    0.30
O[C.sub.9] (Farming, Forestry and Fishing)        0.11    0.32

Occupation                                        Min    Max    Code

O[C.sub.1] (Executive, Admin, and Managerial)     0      1      0
O[C.sub.2] (Professional Specialty)               0      1      1
O[C.sub.3] (Technicians and Related Support)      0      1      2
O[C.sub.4] (Sales)                                0      1      3
O[C.sub.5] (Admin. Support Occupations, Incl.
  Clerical)                                       0      1      4
O[C.sub.6] (Services)                             0      1      5
O[C.sub.7] (Precision Prod., Craft and Repair)    0      1      6
O[C.sub.8] (Machine Operator, Assembler,
  Inspector)                                      0      1      7
O[C.sub.9] (Farming, Forestry and Fishing)        0      1      8

TABLE IV. MULTINOMIAL LOGIT ESTIMATES OF OCCUPATIONAL ATTAINMENT
(STANDARD ERRORS IN PARENTHESES)

Variable     ln(P1/P0)    ln(P2/P0)      ln(P3/P0)      ln(P4/P0)

Constant      -3.996        2.365          5.811          5.067
              (0.499)      (0.479)        (0.502)        (0.506)
EDUC           0.267       -0.126         -0.351         -0.379
              (0.028)      (0.0286)       (0.315E-01)    (0.0315E-01)
MINORI         0.271        0.452          0.319          0.636
              (0.169)      (0.162)        (0.166)        (0.159)
EXPER         -0.018       -0.318E-02     -4.12E-01      -0.262E-01
              (0.006)      (0.589E-02)    (0.604E-02)    (0.579E-02)
DISA           0.600E-01    0.389          0.663          0.33419
              (0.521)      (0.474)        (0.46019)      (0.475)
SEX            0.622        0.202         -0.164          1.022
              (0.128)      (0.125)        (0.127)        (0.134)
MIDW           0.488E-02    0.198          0.207          0.118
              (0.181)      (0.181)        (0.190)        (0.187)
SOUTH         -0.181       -0.226          0.269E-01      0.601E-01
              (0.172)      (0.174)        (0.178)        (0.173)
WEST          -0.699E-02    0.937E-02      0.106         -0.376E-02
              (0.183)      (0.181)        (0.189)        (0.187)
MARITL        -0.315E-02   -0.139         -0.311         -0.400E-01
              (0.140)      (0.136)        (0.139)        (0.136)

Variable     ln(P5/P0)     ln(P6/P0)     ln(P7/P0)     ln(P8/P0)

Constant       9.375         9.662        10.815        12.355
              (0.518)       (0.530)       (0.524)       (0.535)
EDUC          -0.661        -0.659        -0.758        -0.901
              (0.335E-01)   (0.342E-01)   (0.338E-01)   (3.42E-01)
MINORI         1.037         0.243         0.859         0.378
              (0.161)       (0.178)       (0.170)       (0.177)
EXPER         -0.297E-01    -0.109E-01    -0.122E-01    -0.368E-01
              (0.597E-02)   (0.624E-02)   (0.616E-02)   (0.642E-02)
DISA           0.560         0.663         0.845        -0.135
              (0.476)       (0.475)       (0.465)       (0.554)
SEX            0.999E-01    -2.642        -1.507        -1.854
              (0.132)       (0.186)       (0.147)       (0.158)
MIDW           0.280         0.198         0.408         1.026
              (0.195)       (0.200)       (0.199)       (0.235)
SOUTH         -0.184        -0.185        -0.271         0.578
              (0.184)       (0.189)       (0.191)       (0.225)
WEST           0.252E-01    -0.828E-01    -0.284         1.165
              (0.194)       (0.201)       (0.205)       (0.229)
MARITL        -0.513         0.352E-01    -0.451        -0.447
              (0.141)       (0.154)       (0.149)       (0.153)

Chi-squared                  3561.87

TABLE V. PERCENTAGES ACCURATELY FORECASTED AND MEAN SQUARED ERRORS

                    % correct forecasts   MSE        SQR(MSE)

Multinomial Logit   31.25%                0.089792   0.299653
Neural Network      30.04%                0.090417   0.300694


REFERENCES

Broom, L.; Jones, F. L.; McDonell, P.; and Williams, T., The Inheritance of Inequality, Routledge, London, 1980.

Brown, R.; Moon, M.; and Zoloth, B. "Incorporating occupational attainment in studies of male-female earnings differentials," The Journal of Human Resources, 15, 1980a, 3-28.

Brown, R.; Moon, M.; and Zoloth, B., "Occupational attainment and segregation by sex," Industrial and Labor Relations Review, 33, 1980b, 506-517.

Chiang, W.; Urban, T.L.; and Baldridge, G.W., "A neural network approach to mutual fund net asset value forecasting," Omega, 24, 1996, 205-215.

Collins, E.; Ghosh, S.; and Scofield, C., "An Application of a Multiple Neural Networks Learning System to Emulation of Mortgage Underwriting Judgments," IEEE International Conference on Neural Networks, 2, San Diego, California, 1988, 459-466.

Denton, J. W., "How good are neural networks for causal forecasting?" Journal of Business Forecasting, 14, 1995, 17-20.

Dutta, S.; and Shekhar, S., "Bond Ratings: A Non-Conservative Application of Neural Networks," IEEE International Conference on Networks, 2, San Diego, California, 1988, 443-450.

Fausett, L., Fundamentals of Neural Networks, Prentice-Hall, Englewood Cliffs, New Jersey, 1994.

Fletcher, D.; and Goss, E., "Forecasting with neural networks: An application using bankruptcy data," Information and Management, 24, 1993, 159-167.

Gabriel, P. E.; Williams, D. R.; and Schmitz, S., "The Relative Occupational Attainment of Young Blacks, Whites, and Hispanics," Southern Economic Journal, 57, 1990, 35-46.

Greene, W., Econometric Analysis, Macmillan, New York, 1993.

Greene, W.; Greene, L.; and Seaks, T., "Estimating the functional form of the independent variables in probit models," Applied Economics, 27, 1995, 193-196.

Gupta, V. K., "Development of a neural network model and a fuzzy expert system: application to managerial decision making", PhD dissertation, Department of Industrial Engineering, University of Houston, Houston, TX.

Hansen, J.; McDonald, J. and Stice, J. "Artificial Intelligence and Generalized Qualitative-Response Model: An Empirical Test on Two Audit Decision-Making Domains," Decision Sciences, 23, 1992, 708-723.

Hu, M. Y.; Zhang, G. P.; Haiyang, C. "Modeling foreign equity control in Sino-foreign joint ventures with neural networks". European Journal of Operational Research, 159 (3), 2004, 729-740.

Jain, B. A.; and Nag, B. N., "Artificial neural network models for pricing initial public offerings," Decision Sciences, 26, 1995, 283-302.

Kattan, M. W., "A Model for Explaining and Predicting the Effectiveness of Machine Learning Techniques", Unpublished Ph.D. Dissertation, College of Business Administration, University of Houston, Houston, Texas, 1993.

Kennedy, P., A Guide to Econometrics, MIT Press, Cambridge, Massachusetts, 1996.

Kimoto, T.; Asakawa, K.; Yoda, M.; and Takeoda, M., "Stock Market Prediction System with Modular Neural Networks," Proceedings of the International Joint Conference on Neural Networks, 1, San Diego, California, 1990, 1-6.

Kuo, C., and Reitsch, A., "Neural Networks vs. Conventional methods of forecasting," Journal of Business Forecasting, 14, 1995/1996, 17-22.

Lu, L-C.; Chen, W-H; Kim, D.; and Hwang, C-P., "Artificial neural systems improve franchising decision making," International Journal of Management, 13, 1996, 25-32.

Meng, X.; and Miller, P. W., "Occupational segregation and its impact on gender wage discrimination in China's rural industrial sector", Oxford Economic Papers, 47, 1995, 136-155.

Miller, P. W.; and Volker, P. A., "On the determination of occupational attainment and mobility," Journal of Human Resources, 20, 1985.

NeuralWare, Inc., Neural Computing: A Technology Handbook for Professional II/PLUS and NeuralWorks, Pittsburgh, Pennsylvania, 1993.

NeuralWare, Inc., Reference Guide: Software Reference for Professional II/PLUS and NeuralWorks, Pittsburgh, Pennsylvania, 1993.

Odom, M. D., and Sharda, R., "A Neural Network Model for Bankruptcy Prediction," Proceedings of the International Joint Conference on Neural Networks, 2, San Diego, California, 1990, 163-168.

Salchenberger, L. M.; Cinar, E. M.; and Lash, N. A., "Neural Networks: A New Tool for Predicting Thrift Failures," Decision Sciences 23, 1992, 899-916.

Schmidt, P. J., and Strauss, R. P., "The Prediction of Occupation Using Multiple Logit Models," International Economic Review, June, 1975, 471-486.

Sharda, R., "Neural networks for the MS/OR analyst: An application bibliography," Interfaces, 24, 1994, 116-130.

Surkan, A. J.; and Singleton, J. C. "Neural Networks for Bond Rating Improved by Multiple Hidden Layers," Proceedings of the International Joint Conference on Neural Networks, 2, San Diego, California, 1990, 157-162.

Tam, K. Y.; and Kiang, M. Y. "Managerial applications of neural networks: The case of bank failure predictions," Management Science, 38, 1992, 926-947.

Thind, H. S., "Forecasting Nonlinear Time Series Using Artificial Neural Network: An Application to Daily Municipal Water Demand Forecasting," Unpublished Masters Thesis, Department of Industrial Engineering, University of Houston, Houston, TX, 1994.

Wasserman, P. D., Neural Computing, Van Nostrand Reinhold, New York, 1989.

White, H., "Economic Prediction using Neural Networks: The Case of IBM Daily Stock Returns," Proceedings of the International Conference on Neural Networks, 2, San Diego, California, 1988, 451-458.

Wu, P.; Fang, S-C.; King, R. E.; and Nuttle, H. "Decision Surface Modeling of Apparel Retail Operations Using Neural Network Technology," International Journal of Operations and Quantitative Management, 1(1), 1995, 33-47.

Author Profile

Dr. Jose V. Gavidia earned his Ph.D. at the University of Texas--Pan American in 2001. Currently, he teaches global management of technology and international business at the College of Charleston in Charleston, South Carolina.

Dr. Vipul Gupta earned his Ph.D. from the University of Houston in 1995. He teaches decision support systems at Saint Joseph's University, in Philadelphia, Pennsylvania.
COPYRIGHT 2004 International Academy of Business and Economics
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:Gupta, Vipul K.
Publication:Journal of Academy of Business and Economics
Geographic Code:1USA
Date:Jan 1, 2004