
Building predictive models for election results in India--an application of classification trees and neural networks.


ABSTRACT

The 2002 judgment of the Supreme Court of India paved the way for compulsory disclosure of information about the background of candidates in elections. This information includes assets and liabilities as well as criminal antecedents, if any. The general elections held in 2004 were the first set of elections after the implementation of the Supreme Court ruling. Thus, a fairly large amount of data on the candidates' backgrounds became available for the first time. This data was used to build predictive models for forecasting the results of the Legislative Assembly elections of the state of Karnataka. Two different data mining techniques, namely classification trees and artificial neural networks, were used to build the predictive models. The prediction accuracy ranged between 90 and 98 percent.

Keywords: Predictive Models, Classification Trees, Artificial Neural Networks, Elections, Data Mining

1. INTRODUCTION

The Indian general elections of 2004 were unique for more than one reason. The National Democratic Alliance (NDA) government led by the Bharatiya Janata Party (BJP), ruling at the center, was so positive about its prospects of reelection that it advanced the elections by a few months. Similarly, the state government in Andhra Pradesh, ruled by the Telugu Desam party, advanced the elections to the state assembly in order to cash in on the positive feeling of the electorate. So was the case with the State of Karnataka, where the ruling party was the Indian National Congress. The emergence of Janata Dal (S), which was led by the former prime minister of India, Mr. Deve Gowda, and the failure of the ruling Indian National Congress were rather unexpected. The final results of the elections surprised many analysts. Even the exit polls turned out to be so close that some of the pollsters decided not to draw any conclusions. In any general elections, such results are not uncommon. But the results of the Karnataka Legislative Assembly elections of 2004 for the 12th Assembly were a major surprise. Table 1 shows the radical shift in the party positions in the 2004 assembly elections, for the 12th Assembly of Karnataka.

The general elections of 2004 are generally considered a landmark election for other reasons as well. It was the first time that the entire country voted using electronic voting machines. The Election Commission used more than a million electronic voting machines, and the results of the votes cast by more than 400 million voters were announced in less than 8 hours (Election Watch, 2004).

The general elections of 2004 could be considered unique for another very important reason. In 2002, the Supreme Court of India delivered a judgment following a public interest litigation filed in the High Court of Delhi by the Association for Democratic Reforms asking for disclosure of candidates' background at the time of filing the nomination forms. The purpose was to make sure that voters have sufficient information about the candidates to make an informed choice while casting their votes. The Delhi High Court delivered its judgment upholding the petition in 2000, directing the Election Commission of India to collect information about the candidates using the police and other such agencies of the government, assess their suitability for holding public office, and give wide publicity to such information. Consequently, the Government of India filed a Special Leave Petition in the Supreme Court of India in 2001 against the judgment of the Delhi High Court. Interestingly, several political parties became interveners in the case, opposing the Delhi High Court judgment. The judgment pronounced by the Supreme Court in 2002 directed the Election Commission of India to ask for the following information from the candidates by way of an affidavit to be filed along with the nomination form:

* Whether the candidate is convicted/acquitted/discharged of any criminal offence in the past; if any, whether he/she is punished with imprisonment or fine or both?

* Whether the candidate, six months prior to filing the nomination, is accused in any pending case of any offence punishable with imprisonment of two years or more, and in which charge is framed or cognizance is taken by the court of law.

* The assets (movable, immovable, bank balances etc.) of not only the candidate but also of his/her spouse and the dependents.

* Liabilities, if any, particularly to any public financial institutions or government

* Educational qualifications of the candidate

When the Election Commission issued an order in June 2002 implementing the order of the Supreme Court, it created a flurry of activity among all the political parties. Twenty-one political parties unanimously decided in an all-party meeting that the order of the Election Commission could not be allowed to be implemented. An amendment to the Representation of the People Act (which governs electoral issues) was to be introduced in the Parliament in the Monsoon session of 2002. The bill retained the disclosure of pending cases but deleted the disclosure requirements for the assets and liabilities of the candidates. When the amendment could not be introduced in the Parliament for various reasons, the government issued an ordinance which maintained that "no candidate shall be liable to disclose or furnish any information which is not required to be disclosed under the proposed bill, not withstanding anything contained in any judgment, decree or order of any court or any direction, order or any other instruction issued by the Election Commission". This ordinance led to a flurry of Writ Petitions in the Supreme Court. The Supreme Court delivered its judgment in March 2003, holding the amended act illegal, null and void. It restored the earlier judgment of 2002 and also declared that the judgment had attained finality.

Thus, the 2004 General Elections were the first elections in which the candidates were required to make complete disclosures of their antecedents as well as their assets and liabilities. It was also decided to extend the provisions of the Supreme Court judgment beyond the parliament and state assembly elections to local (village panchayat level) elections as well.

Thus, a lot of information about the candidates was available to the voters, which could enable them to make an informed decision while casting their vote. It is interesting to see whether the information thus made available made any difference to the outcome of the elections. It is also important to see whether the information can be used to predict or forecast the results of the elections. Such a forecast is very different from forecasts based on "exit polls". Here, the attempt is not only to build predictive models in order to predict the possible outcomes, but also to identify the various aspects of information which could influence the final results of the elections.

Thus the objectives of this research paper are to

1. Develop predictive models which could be used for predicting the outcomes of the election

2. Identify various aspects of information made available consequent to the Supreme Court judgment and the subsequent orders of the Election Commission and

3. Evaluate the relative importance of these aspects of information in predicting the election outcomes.

2. METHODOLOGY

The elections for the lower house of the Parliament (Lok Sabha) and the legislative assemblies of three states, namely Andhra Pradesh, Karnataka and Orissa, were held simultaneously in 2004. Subsequent to the order of the Election Commission, details of the antecedents of the candidates along with their assets and liabilities became available. The Association for Democratic Reforms had formed Election Watch Committees in the states to collate the information from the nomination papers and the appended affidavits. Data on the candidates for the Karnataka State Legislative Assembly was obtained from the Election Watch Committee of Karnataka. In addition to the data collected from Election Watch, other information available in the public domain was used to complete the information on each of the candidates. The election result of each candidate (win or loss) is used as the dependent variable for the predictive models. This is treated as a binary categorical variable. In addition, a number of variables on which information was available are used as independent variables. These variables included

* Age of the candidate (binned into 6 categories)

* Number of contestants in the specific constituency (binned into 4 categories)

* Movable assets (binned into 3 categories)

* Immovable assets (binned into 3 categories)

* Total Assets (binned into 3 categories)

* Liabilities (binned into 3 categories)

* Ownership of commercial buildings (binned into three categories including unknown)

* Ownership of residential buildings (binned into three categories including unknown)

* Whether the candidate belongs to the ruling party or not

* Revenue Division of the state (all the districts in the state are grouped into 4 revenue divisions)

* The areas to which the districts of Karnataka originally belonged (Karnataka state was formed by taking some of the districts of old Bombay and Madras Provinces, Princely States of Mysore and Nizam). This, along with the revenue divisions, is expected to represent the demographic characteristics and the developmental differences across different districts of the state.

* Whether the constituency was reserved for the scheduled caste and scheduled tribe candidates

* Type of political party (binned into 6 categories including independents)

* Whether the candidate is an incumbent member of the legislative assembly

* Whether the candidate belongs to the incumbent party in the specific constituency

* Gender

* Educational level (binned into 6 categories)

* Whether the candidate had any criminal record

* Whether the candidate owns any agricultural land

* Whether the candidate has any liabilities to financial institutions

* Whether the candidate has any liabilities to government

Since all the variables are categorical in nature, the usual predictive models that revolve around regression techniques could not be used for prediction in this specific case. On the other hand, other predictive models such as classification trees, neural networks, classification and regression trees etc. would be ideal for handling these types of variables. These techniques, which fall in the broad categorization of data mining techniques, are used for developing the predictive models.
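The binning of the independent variables listed above can be illustrated with a short sketch. The paper reports only the number of categories for each variable, not the cut points, so the age boundaries and labels below (`AGE_EDGES`, `AGE_LABELS`) are purely hypothetical, as are the helper names `bin_value` and `one_hot`.

```python
from bisect import bisect_right

# Hypothetical bin edges: the paper says age is binned into 6 categories
# but does not give the cut points, so these are illustrative only.
AGE_EDGES = [30, 40, 50, 60, 70]  # 5 edges -> 6 bins
AGE_LABELS = ["<30", "30-39", "40-49", "50-59", "60-69", "70+"]


def bin_value(value, edges, labels):
    """Map a numeric value to the label of the bin it falls into."""
    return labels[bisect_right(edges, value)]


def one_hot(label, labels):
    """Encode a category label as a 0/1 indicator list."""
    return [1 if label == candidate else 0 for candidate in labels]


age_bin = bin_value(45, AGE_EDGES, AGE_LABELS)
age_vector = one_hot(age_bin, AGE_LABELS)
```

Indicator encoding of this kind is one common way of presenting categorical inputs to a neural network; whether the original study encoded its variables this way is not stated.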

3. CANDIDATE PROFILE

The general profile of the candidates is presented in Table 2. There are a total of 224 constituencies for which the elections were held in 2004. Of these 224 constituencies, data on all the candidates was available for 195 constituencies.

About 70 percent of the candidates have studied beyond high school level. Forty-five percent of them are either graduates or post graduates. Only 86 of the candidates are female, reflecting the dominance of male candidates in the elections. The candidates appear to be relatively young, with about 57 percent belonging to the age group of 30 to 50 years. Less than two percent of the candidates are more than 70 years old.

More than three-fourths of the members of the previous assembly (11th Karnataka Assembly) are again contesting in the 2004 elections. 151 out of the 195 constituencies have candidates who are incumbents. Similarly, 88 percent of the assembly constituencies had candidates from the incumbent party. All the national as well as regional parties had more or less equal distribution in terms of number of candidates contesting the elections.

More than 11 percent of the candidates have declared a criminal record. Very few candidates declared any dues to the government or financial institutions, whereas those who have dues to banks accounted for 43 percent. About one-third of the candidates have declared ownership of commercial buildings and a similar number declared ownership of residential buildings. About 55 percent of them own agricultural land. In general, the candidates are predominantly male, with better educational qualifications and younger in age. The ownership of assets is mainly in agricultural land, residential and commercial buildings.

4. RESULTS

Two of the most commonly used classification techniques are classification trees and artificial neural networks. A brief description of the two techniques is given below.

4.1 Classification Trees

A classification tree is a predictive model that takes the form of a tree. Each branch of the tree is a classification question, and the leaves of the tree are partitions of the data set. The tree divides the data on each branch without losing any of the data. The technique picks predictors (independent variables) and the appropriate values for branching on the basis of the gain in information that the branching provides. The information gain can be defined as the difference between the amount of information that is needed to correctly predict the outcome before and after the split (branching) has been made. This difference is measured by the extent of entropy, the Gini coefficient or a simple Chi-square analysis. Classification trees provide rules for prediction that are easy to understand and implement, and hence they are used very frequently for building predictive models (Nagadevara and Tara, 2004).
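The notion of information gain described above can be made concrete with a minimal sketch. It computes entropy and the Gini measure (in its usual tree-building form, the Gini impurity) for a set of class labels, and the information gain of a candidate split. The toy labels and the function names are illustrative assumptions and do not come from the election data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, branches):
    """Entropy of the parent minus the size-weighted entropy of its branches."""
    total = len(parent)
    weighted = sum(len(b) / total * entropy(b) for b in branches)
    return entropy(parent) - weighted


# Toy data (not from the paper): 4 winners and 4 losers split on a
# hypothetical yes/no attribute.
parent = ["win"] * 4 + ["lose"] * 4
left = ["win"] * 3 + ["lose"] * 1
right = ["win"] * 1 + ["lose"] * 3
gain = information_gain(parent, [left, right])
```

A tree-building procedure would evaluate such a gain for every candidate predictor and branch on the one with the largest value; a split that separates the classes perfectly attains the full parent entropy as its gain.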

4.2 Artificial Neural Networks (ANN)

The artificial neural networks (ANN) are generally based on the concepts of the human (or biological) neural network consisting of neurons, which are interconnected by the processing elements. The ANNs are composed of two main structures, namely the nodes and the links. The nodes correspond to the neurons and the links correspond to the links between neurons. The ANN accepts the values of inputs into what are called input nodes. This set of nodes is also referred to as the input layer, as shown in Figure 1.

[FIGURE 1 OMITTED]

These input values are then multiplied by a set of numbers (also called weights) that are stored in the links. These values, after multiplication, are added together to become inputs to the set of nodes that are to the right of the input nodes. This layer of nodes is usually referred to as the hidden layer. Many ANNs contain multiple hidden layers, each feeding into the next layer. Finally, the values from the last hidden layer are fed into an output node, where a special mapping or thresholding function is applied and the resulting number is mapped to the prediction. The ANN is created by presenting the network with inputs from many records whose outcome is already known. For example, the data on age, income and occupation of the first customer (first record) are fed into the input layer. These values are fed into the hidden layer and, after processing (by combining these values using appropriate weights), the prediction is made at the output layer. If the prediction made by the ANN matches the actual known status of the customer (either Loyal or Hopper), then the prediction is good and the ANN proceeds to the next record. If the prediction is wrong, then the extent of error (expressed in numerical values) is apportioned back into the links and the hidden nodes. In other words, the values of the weights at each link are modified based on the extent of error in prediction. This process is referred to as backward propagation. Artificial neural networks are found to be effective in detecting unknown relationships. ANNs have been applied in many service industries such as health (to identify the length of stay and hospital expenses) (Nagadevara, 2004), hospitality (Nagadevara, 2005), airlines (Chatfield, 1998) etc.
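The forward pass and the backward propagation of error described above can be sketched for a network with a single hidden layer. This is a minimal illustration under stated assumptions, not the network used in the study: the sigmoid activation, the squared-error loss, the learning rate, the layer sizes and the toy record are all choices made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, w_out):
    """Weighted sums flow from input to hidden to output nodes."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def train_step(inputs, target, w_hidden, w_out, rate=0.5):
    """One backward-propagation update for a squared-error loss."""
    hidden, output = forward(inputs, w_hidden, w_out)
    # Error signal at the output node.
    delta_out = (output - target) * output * (1.0 - output)
    # Share of the error apportioned to each hidden node.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]
    # Adjust the weights on the links in proportion to the error.
    for j in range(len(hidden)):
        w_out[j] -= rate * delta_out * hidden[j]
        for i in range(len(inputs)):
            w_hidden[j][i] -= rate * delta_hidden[j] * inputs[i]
    # Squared error measured before this update was applied.
    return (output - target) ** 2


random.seed(0)
n_inputs, n_hidden = 3, 2
w_hidden = [[random.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
w_out = [random.uniform(-1, 1) for _ in range(n_hidden)]

record, known_outcome = [1.0, 0.0, 1.0], 1.0  # toy record, not from the paper
errors = [train_step(record, known_outcome, w_hidden, w_out) for _ in range(200)]
```

Repeating the update on the same record drives the squared error down, which is the behaviour the text describes as modifying the weights according to the extent of error; a real training run would cycle through all records rather than one.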

4.3 Skewed Distributions

The above two techniques, namely classification trees and artificial neural networks were applied to predict the results of the Karnataka Legislative Assembly elections of 2004. The entire data with respect to the 1641 candidates was used to train the models as well as for testing the effectiveness of the prediction. Both the techniques resulted in interesting predictions. The predictions of these two models are given in Table 3.

The overall misclassification with respect to the neural network was only 3.84 percent. At the same time, the misclassification for the classification tree was about 11 percent. Both the models are very effective in predicting the "loser" category (the misclassification is less than two percent). On the other hand, both the models are rather ineffective in predicting the "winner" category. For this category, the accuracy level of the classification tree was only 21.54 percent whereas that of the neural network was 77.44 percent. In neither case is the accuracy for this category anywhere near that of the "loser" category. This type of problem in training the model, with the classification tree as well as the neural network, is not uncommon with skewed data sets. The data set consisted of 195 constituencies and consequently the total number of winners is only 195. On the other hand, the total number of candidates was 1641. The proportion of winners among the total candidates was less than 12 percent. Thus, if the model predicts that "everyone is a loser", the model would be misclassifying a maximum of 12 percent of the cases, resulting in an accuracy level of 88 percent! The model tends to predict more cases as losers since the data set is heavily skewed in favor of losers. In such cases standard classifiers tend to be overwhelmed by the large class and ignore the small or minority class (Chawla, 2002; Chawla, 2003). This problem of skewed data sets with "minority classes" can be handled with different approaches. At the algorithmic level, solutions include adjusting the costs to counter the class imbalance and, in the case of classification trees, adjusting the probabilistic estimate at the tree leaf. At the data level, the solutions include random over sampling and under sampling as well as directed over sampling.
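The arithmetic behind the "everyone is a loser" argument can be checked with a short sketch. It uses only the two counts reported in the text (1641 candidates, 195 winners); the function name `class_accuracies` and the label strings are assumptions made for the illustration.

```python
def class_accuracies(actual, predicted, positive="winner"):
    """Return (overall, positive-class, negative-class) accuracy."""
    pairs = list(zip(actual, predicted))
    overall = sum(a == p for a, p in pairs) / len(pairs)
    pos = [a == p for a, p in pairs if a == positive]
    neg = [a == p for a, p in pairs if a != positive]
    return overall, sum(pos) / len(pos), sum(neg) / len(neg)


# Counts reported in the text: 1641 candidates, 195 of them winners.
actual = ["winner"] * 195 + ["loser"] * (1641 - 195)
# A degenerate model that labels every candidate a loser.
predicted = ["loser"] * 1641

overall, winner_acc, loser_acc = class_accuracies(actual, predicted)
print(f"overall={overall:.4f} winner={winner_acc:.4f} loser={loser_acc:.4f}")
```

The degenerate model is right for 1446 of the 1641 candidates (about 88 percent) while identifying no winner at all, which is why overall accuracy alone is a misleading yardstick on skewed data and the per-class accuracies need to be reported separately.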

The basic idea of over sampling or under sampling is to eliminate or minimize the imbalance or rarity by altering the distribution of training examples (Weiss, 2004). Typically the class distribution is altered to reduce the problems associated with rare classes. Under sampling eliminates majority class examples. This is achieved by taking a smaller sample of the large class and combining it with the entire training set of the smaller class. In order to avoid any sampling bias, a number of such random samples are drawn from the large class and the models are trained using these different samples. Ultimately, the classification rules obtained from the different samples are combined or bagged to evolve a single set of classification rules. Over sampling duplicates minority class examples. Over sampling can lead to over-fitting because it introduces a larger number of exact copies (Chawla, 2003 and Drummond and Holte, 2003). Over sampling does not actually make new data available and, consequently, could be ineffective in improving the predictability of the minority class (Drummond and Holte, 2003). At the same time, over sampling has given better classification in other studies (Japkowicz and Stephen, 2002). In the case of the election data of Karnataka, it is felt that under sampling will not be appropriate because the elimination of some of the losing candidates from the database will result in a loss of important information required for classification. Consequently, it was decided to use over sampling by replicating the minority class, thereby increasing the total number of records to 2226. The minority class in the over-sampled data set constituted about 35 percent of the total, thereby avoiding the pitfalls involved in classifying skewed data sets.
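Over sampling by replication, as described above, can be sketched as follows. The paper does not state the replication factor; four copies of each winner is an inference from the reported totals, since 1446 + 4 x 195 = 2226 and 780/2226 is about 35 percent. The record contents, the function name `oversample` and the shuffling step are assumptions made for the illustration.

```python
import random


def oversample(records, label_of, minority, copies):
    """Return the records with each minority record appearing `copies` times."""
    majority = [r for r in records if label_of(r) != minority]
    minority_rows = [r for r in records if label_of(r) == minority]
    return majority + minority_rows * copies


# Stand-in records: only the class sizes (195 winners, 1446 losers)
# come from the text; the record contents are placeholders.
records = [("winner", i) for i in range(195)] + [("loser", i) for i in range(1446)]

# Assumption: each winner appears 4 times in total, which reproduces
# the 2226 records reported in the text.
balanced = oversample(records, lambda r: r[0], "winner", copies=4)
random.Random(0).shuffle(balanced)  # avoid feeding the classes in blocks

share = sum(r[0] == "winner" for r in balanced) / len(balanced)
```

Because the added records are exact copies, a model evaluated on the same over-sampled set sees each winner several times, which is the over-fitting risk the cited studies warn about.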

4.4 Final Results

Two techniques of classification, namely classification trees and neural networks are used for classifying and predicting the winners and losers of the election using the data set with over sampling. The misclassification matrix for the two techniques is presented in Table 4.

The overall misclassification for the neural network is about 2.02 percent as compared to 11.01 percent for the classification tree. There was no significant improvement in the overall misclassification after over-sampling the data set. At the same time, there is a significant increase in the accuracy levels of the predictions with respect to the winners. The accuracy levels have improved to 92.82 percent in the case of classification tree and to 96.92 percent in the case of neural network. This improvement in the accuracy levels is significant when compared to the results obtained earlier without over sampling (21.54 percent for the classification tree and 77.44 percent for the neural network). The improvement in the case of the classification tree is through a trade off in the accuracy levels of the predictions with respect to the losers. The prediction accuracy of this category had come down to 86.93 percent as compared to 98.34 percent earlier. The classification tree is presented in Figure 2. It can be seen that the variable "Party Type" is at the top of the tree indicating that it is the first variable over which the branching is carried out. The criminal record and the movable assets appear in the next level.

[FIGURE 2 OMITTED]

Neural networks do not arrange the variables into levels in the way a classification tree does. Nevertheless, they do attach a numerical value to each variable indicating how sensitive the prediction is to a change in the value of that variable. Table 5 presents the sensitivity values of the variables, grouped into five categories, namely demographic characteristics, ownership details, extent of liabilities, political factors and others.
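The text does not state how the sensitivity values were computed. One common approach, shown here purely as an assumed illustration, is to perturb one input at a time and record the average change in the predicted probability. The toy logistic model, its weights, the sample rows and the step size `delta` are all hypothetical and are not taken from the study.

```python
import math

def predict(x, weights, bias):
    """Toy stand-in for a trained model: logistic function of a weighted sum."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def sensitivity(rows, weights, bias, delta=0.1):
    """Mean absolute change in the prediction when each input is raised by `delta`."""
    scores = []
    for j in range(len(weights)):
        change = 0.0
        for x in rows:
            bumped = list(x)
            bumped[j] += delta  # perturb only the j-th input
            change += abs(predict(bumped, weights, bias) - predict(x, weights, bias))
        scores.append(change / len(rows))
    return scores

# Hypothetical inputs: three scaled features for three candidates.
rows = [[0.2, 0.5, 1.0], [0.8, 0.1, 0.0], [0.5, 0.5, 1.0]]
weights = [0.3, -0.2, 2.0]  # assumed weights; the third feature dominates
scores = sensitivity(rows, weights, bias=-1.0)
print([round(s, 4) for s in scores])
```

Under this scheme a variable with a larger score moves the prediction more per unit change, which is the sense in which a value such as the one reported for Party Type would be read.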

Among the demographic characteristics, age and education appear to be predominant in predicting the outcomes. It may be recalled that most of the candidates in this election belong to a younger age group with a fairly good educational background. On the assets and liabilities side, total assets and total liabilities appear to be important, followed by dues to the government and ownership of agricultural land and commercial buildings. Among the political factors, the results are most sensitive to Party Type. Incumbency does not appear to be of high importance, while the number of candidates in the fray appears to be more important. Finally, the results are not very sensitive to the criminal record of the candidate. The revenue divisions of the state, which are incidentally also indicators of the level of development within the state, are more important. One reason criminal record is not important could be that only 11 percent of the candidates have a criminal record. In the southern state of Karnataka, the criminal antecedents of candidates are not as predominant an issue as in some other states. Data for other state assemblies are not readily available for comparison.

Nevertheless, an analysis of the candidates from different states for the 2004 Lok Sabha elections reveals that only 9.8 percent of the candidates from Karnataka were known to have criminal antecedents, whereas the corresponding figures were 23.3 percent in West Bengal, 20.1 percent in Bihar and 19.6 percent in Uttar Pradesh (Election Watch).

5. CONCLUSIONS

The Supreme Court judgment of 2002 on the disclosure of the background of candidates for elections in India resulted in voters being provided with sufficient information. While this information was primarily meant to enable voters to make a well-informed choice, its availability also made it possible to build effective predictive models for forecasting election results. Two techniques, namely classification trees and artificial neural networks, were used to build predictive models for the Karnataka Assembly elections. Over-sampling was used to eliminate the predictive biases introduced by the skewness of the data set. The overall accuracy of the predictive models varied from 90 to 98 percent. The important variables in predicting the election outcomes were age, education, ownership of assets, liabilities, type of (political) party, and the number of candidates in the fray. The extent of economic development, as indicated by the revenue divisions of the state, was also found to be an important predictor. The real test of the predictive models would be to apply them to the data on candidates in the next general elections and validate the results.

REFERENCES:

Chatfield, C. (1998) 'Time Series Forecasting with Neural Networks: A Comparative Study using the Airlines Data', Applied Statistics 47, Part 2, pp. 231-250

Chawla N, Bowyer K, Hall L, Kegelmeyer W. "SMOTE: Synthetic Minority Over-sampling Technique", Journal of Artificial Intelligence Research, Morgan Kaufmann Publishers, pp 321-357, 2002

Chawla N. "C4.5 and Imbalanced Data Sets: Investigating the Effects of Sampling Method, Probability Estimate, and Decision Tree Structure", in Workshop on Learning from Imbalanced Data Sets II, ICML, Washington DC, USA, 2003

Drummond C. and Holte R. C. "C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling", in Workshop on Learning from Imbalanced Data Sets II, International Conference on Machine Learning, Washington DC, USA, 2003.

Election Watch, Association for Democratic Reforms, Ahmedabad, India, 2004

Japkowicz N. and Stephen S. "The class imbalance problem: a systematic study". Intelligent Data Analysis, 6(5): 429-450, 2002.

Nagadevara, V and Tara S. N., "Improving the Effectiveness of Post Literacy Programme Through Data Mining Techniques", Towards E-Government Management Challenges, Ed. MP Gupta, Tata McGraw Hill Publishing Company, New Delhi, 2004

Nagadevara, V "Application of Neural Prediction Models in Healthcare", Proceedings of the 2nd International Conference on e-Governance, Nov 29-Dec 1, 2004, Colombo, Sri Lanka, pp 139-148.

Nagadevara V., "Improving the Effectiveness of Hotel Loyalty Programmes through Data Mining", Proceedings of the International Conference on Services Management, Mar 11-12, 2005, New Delhi, India

Weiss G. M. "Mining with rarity: a unifying framework", ACM SIGKDD Explorations Newsletter, Special issue on learning from imbalanced datasets, Volume 6, Issue 1, June 2004

Vishnuprasad Nagadevara, Indian Institute of Management Bangalore, INDIA

Dr. Vishnuprasad Nagadevara earned his Ph.D. from Iowa State University, Ames, Iowa. He is currently Professor in the Quantitative Methods and Information Systems Area at the Indian Institute of Management Bangalore. His current research interests are Data Mining, Application of Management Techniques to Education and Entrepreneurship.
TABLE 1. PARTY-WISE POSITIONS IN THE 11TH AND 12TH
LEGISLATIVE ASSEMBLY OF KARNATAKA

                              11TH       12TH
Party                       Assembly   Assembly

Indian National Congress      133         66
Bharatiya Janata Party         43         79
Janata Dal (S)                 10         58
Janata Dal (U)                 18         5
Independents                   19         12
Others                         1          4
Total                         224        224

TABLE 2. PROFILE OF THE CANDIDATES--
ASSEMBLY ELECTIONS OF KARNATAKA

Category                         Frequency   Percent

Education

Unknown                             272       16.58
Primary School                      43        2.62
High School                         353       21.51
Pre-University                      231       14.08
Graduate                            447       27.24
Post Graduate                       294       17.92

Gender

Female                              86        5.24
Male                               1555       94.76

Belongs to Incumbent Party in the Constituency

Does not belong                    1469       89.52
Belongs                             172       10.48

Incumbent

Not incumbent                      1490       90.80
Incumbent Candidate                 151       9.20

Type of Party

Unknown                             387       23.58
BJP                                 177       10.79
Congress                            196       11.94
Other National Party                210       12.80
JD (S)                              193       11.76
Other Regional Party                295       17.98
Independent                         183       11.15

Ownership of Houses

Does Not Own                        592       36.08
Owns                                590       35.95
Unknown                             459       27.97

Ownership of Commercial Buildings

Does Not Own                        600       36.56
Owns                                573       34.92
Unknown                             468       28.52

Belongs to Ruling Party

Does not Belong                    1445       88.06
Belongs                             196       11.94

Age

Less than 30                        110       6.70
30 to 40                            428       26.08
40 to 50                            507       30.90
50 to 60                            405       24.68
60 to 70                            146       8.90
More than 70                        22        1.34

Has Government Dues

Unknown                             437       26.63
No Dues to Government              1140       69.47
Owes dues to Government             64        3.90

Has Dues to Financial Institutions

Unknown                             526       32.05
No Dues to FIs                      917       55.88
Owes Dues to FIs                    198       12.07

Has Dues to Banks

Unknown                             396       24.13
No Dues to Banks                    532       32.42
Owes Dues to Banks                  713       43.45

Owns Agricultural Land

Unknown                             379       23.10
No Agricultural Land                368       22.43
Owns Agricultural Land              894       54.48

Criminal Record

Does not have Criminal Record      1455       88.67
Has Criminal Record                 186       11.33

No. of Candidates in the Constituency

<=6                                 427       26.02
7 to 9                              457       27.85
10 to 12                            344       20.96
>12                                 413       25.17

Reserved Constituency

Does not belong                    1391       84.77
Belongs                             250       15.23

TABLE 3. PREDICTIONS BASED ON THE TWO MODELS
BEFORE ELIMINATING THE IMBALANCE

                         Prediction

                 Neural Network

                   Lose       Win     Total

Actual   Lose      1427        19     1446
                  98.69%     1.31%
         Win        44        151      195
                  22.56%     77.44%
         Total     1471       170     1641

Error                     3.84%

                        Prediction

                 Classification Tree

                   Lose       Win     Total

Actual   Lose      1422        24     1446
                  98.34%     1.66%
         Win        153        42      195
                  78.46%     21.54%
         Total     1575        66     1641

Error                     10.79%

TABLE 4. PREDICTIONS BASED ON THE TWO MODELS
AFTER ELIMINATING THE IMBALANCE

                       Prediction

                 Neural Network

                 Lose      Win      Total

Actual   Lose     1425       21     1446
                 98.55%    1.45%
         Win       24       756      780
                  3.08%    96.92%
         Total    1449      777     2226
Error             2.02%

                        Prediction

                 Classification Tree

                 Lose      Win      Total

Actual   Lose     1257      189     1446
                 86.93%    13.07%
         Win       56       724      780
                  7.18%    92.82%
         Total    1313      913     2226
Error            11.01%

TABLE 5. ARTIFICIAL NEURAL NETWORK--SENSITIVITY VALUES

Variable                         Sensitivity

Demographic Characteristics

Age                                  7.4
Education                            7.2
Gender                               1.3

Ownership

Agricultural Land                    4.1
Commercial Buildings                 4.1
Residential Property                 2.9
Immovable Assets                     2.5
Movable Assets                       3.6
Total Assets                         4.7

Liabilities

Bank Loan                            3.9
Loans from FIs                       4.3
Liabilities to Govt.                 5.1
Total Liabilities                    5.3

Political Factors

No. of Candidates                    4.7
Incumbent Party                      3.0
Incumbent                            1.7
Party Type                           19
Ruling Party                         1.9

Others

Criminal Record                      2.4
Original Divisions                   3.8
Reserved                             1.2
Revenue Divisions                    4.9
COPYRIGHT 2005 International Academy of Business and Economics
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2005, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Author:Nagadevara, Vishnuprasad
Publication:Journal of Academy of Business and Economics
Geographic Code:9INDI
Date:Mar 1, 2005
Words:4907