Printer Friendly
The Free Library
14,599,499 articles and books
Member login
User name  
Password 
 
Join us Forgot password?

Evolutionary stability in the infinitely repeated prisoners' dilemma played by two-state Moore machines.


I. Introduction

A large literature which attempts to explain how cooperative outcomes be supported in the repeated Prisoners' Dilemma (RPD RPD Rapid
RPD Radiation Protection Dosimetry
RPD Rapid Product Development
RPD Rochester Police Department
RPD Recurrent Pattern Detection (Commtouch anti-spam engine)
RPD Relative Percent Difference
RPD Removable Partial Denture
) has emerged in recent years. An interesting branch of this literature has analyzed an·a·lyze  
tr.v. an·a·lyzed, an·a·lyz·ing, an·a·lyz·es
1. To examine methodically by separating into parts and studying their interrelations.

2. Chemistry To make a chemical analysis of.

3.
 the infinite RPD as a game in which two "metaplayers" choose a strategy which is implemented by a finite finite - compact  automation, or Moore machine In the theory of computation, a Moore machine is a finite state automaton where the outputs are determined by the current state alone (and do not depend directly on the input). The state diagram for a Moore machine will include an output signal for each state. . This line of research has been especially interesting because it allows us to capture the notion of "bounded" rationality in the players. Also, the use of these machines as devices to implement strategies permits us to quantify Quantify - A performance analysis tool from Pure Software.  the notion of strategic complexity. We can then apply the idea that complexity is costly to the analysis, and the results have proven very interesting.

So far, the work in this area has applied the concepts of Nash equilibrium Noun 1. Nash equilibrium - (game theory) a stable state of a system that involves several interacting participants in which no participant can gain by a change of strategy as long as all the other participants remain unchanged  and evolutionary stability in determining what reasonable outcomes are in these games. However, it has frequently been that pointed out these equilibrium concepts may be either too restrictive or not restrictive enough to be useful in analyzing the infinitely repeated Prisoners' Dilemma. That is, if we use the Nash equilibrium as the appropriate equilibrium concept, the set of equilibrium outcomes is infinitely large. On the other hand, if we require the equilibrium to be evolutionarily stable, the set of such equilibrium outcomes is frequently empty. It has been suggested that the requirement for evolutionary stability may be too stringent a criterion.

The purpose of this essay is to examine what happens in these games in an evolutionary framework under various conditions. The results of the simulations I performed and report here indicate certain combinations of strategies may exist which, although not evolutionary stable in the sense of Maynard Smith [14], prove to be invasion-proof against permissible per·mis·si·ble  
adj.
Permitted; allowable: permissible tax deductions; permissible behavior in school.



per·mis
 mutant (programming) mutant - Microsoft's term for a mutex which is generally used in user mode but can also be used in kernel mode. According to this terminology a mutex is only used in kernel mode.

["Microsoft Windows NT Workstation Resource Kit"].
 strategies. This is akin to saying a set of possible strategies may exist which cannot be invaded, but neither the individual strategies nor the mixed strategy represented by the population is evolutionarily stable. We can imagine the mix strategies changing as various mutants attempt to invade in·vade  
v. in·vad·ed, in·vad·ing, in·vades

v.tr.
1. To enter by force in order to conquer or pillage.

2.
 the population but the set of strategies remaining the same as the one with which we started. In fact, it may well be the case that this set of strategies will not last forever, but it may last for a very long time. I formalize this idea later in the essay. In order to test the robustness of the hypothesis, a number of different mutation mutation, in biology, a sudden, random change in a gene, or unit of hereditary material, that can alter an inheritable characteristic. Most mutations are not beneficial, since any change in the delicate balance of an organism having a high level of adaptation to its  schemes were simulated to examine what sort of outcome we should expect to find in a population similar to the one I explore.

I begin by reviewing the applicable literature and discussing the philosophy underlying this exercise. Then I describes the set of automata automata - automaton  I consider as well as the various evolutionary model I simulated. As always, the results we obtain depend upon the assumptions of the model. The assumptions in this case include the set of permissible strategy implementing machines and the rules for how these strategic evolve over time. This line of research differs from other attempts to capture the results of an evolutionary story in two important ways. First,since the strategies I consider are all those which can be implemented by an automation with not more than two states, there is no predisposition predisposition /pre·dis·po·si·tion/ (-dis-po-zish´un) a latent susceptibility to disease that may be activated under certain conditions.

pre·dis·po·si·tion
n.
1.
, either intentional in·ten·tion·al  
adj.
1. Done deliberately; intended: an intentional slight. See Synonyms at voluntary.

2. Having to do with intention.
 or unintentional, toward a given outcome. The claim can be made that the strategies which were entered in Robert Axelrod's now famous Prisoners' Dilemma tournament [3] were in some sense biased toward the TIT-FOR-TAT tit-for-tat
Adjective

done in return or retaliation for a similar act: a spate of tit-for-tat killings [earlier tip for tap]
 (TFT (Thin Film Transistor) The term typically refers to active matrix screens on laptop computers. Active matrix LCD provides a sharper screen display and broader viewing angle than does passive matrix. See LCD and thin film.

TFT - Thin Film transistor
) type strategies because Axelrod identified TFT as the most successful strategy in a preliminary round of the tournament. The simulations I performed are not predisposed pre·dis·pose  
v. pre·dis·posed, pre·dis·pos·ing, pre·dis·pos·es

v.tr.
1.
a. To make (someone) inclined to something in advance:
 toward a particular outcome except through the evolutionary processes I use, and these are made explicit so we can either them or reject them on more reasonable grounds. Second, I implement different schemes for mutation which seem plausible in a sense I will explain later. Using Axelrod's tournament as an example again, we can imagine mutation entering his dynamic ecological ecological

emanating from or pertaining to ecology.


ecological biome
see biome.

ecological climax
the state of balance in an ecosystem when its inhabitants have established their permanent relationships with each
 process. Such mutant strategies could easily effect the final population in significant ways. However, what a reasonable mutant looks like in Axelrod's framework is not clear. In this essay, I formalize how the strategies are modeled so we can then evaluate the results based on the underlying assumptions.

II. Cooperation and the Infinite RPD

The Prisoner's Dilemma prisoner's dilemma

Imaginary situation employed in game theory. One version is as follows. Two prisoners are accused of a crime. If one confesses and the other does not, the one who confesses will be released immediately and the other will spend 20 years in prison.
 is well known bimatrix game which has been used to study such economic problems as oligopolistic collusion An agreement between two or more people to defraud a person of his or her rights or to obtain something that is prohibited by law.

A secret arrangement wherein two or more people whose legal interests seemingly conflict conspire to commit Fraud
, international trade, and public goods provision. The players in the game must simultaneously choose between " Cooperate" (C) and "Defect: (D). The payoffs I will use in this analysis are described in Figure I. Regardless of what one's opponent does, a player does better by defecting. This means (D,D) is the only Nash equilibrium in the one-shot game. The dilemma is that if the player could be induced induced /in·duced/ (in-dldbomacst´)
1. produced artificially.

2. produced by induction.

induced,
adj artificially caused to occur.


induced

induction.
 to cooperate they would both do better than if they receive the equilibrium payoff. In fact, any finitely fi·nite  
adj.
1.
a. Having bounds; limited: a finite list of choices; our finite fossil fuel reserves.

b. Existing, persisting, or enduring for a limited time only; impermanent.
 repeated version of the Prisoners' Dilemma has defection at every stage as the only perfect equilibrium outcome. It is not until we consider the infinitely repeated game that cooperation can be sustained. The "Folk Theorem Folk theorem may refer to:
  • Ethno-cultural studies of mathematics.
  • Mathematical folklore, theorems that are widely known to mathematicians but can't be traced back to an individual.
" of repeated games tells us any individually rational payoff vector can be the outcome of a perfect equilibrium of an infinitely repeated game when the payoffs are calculated as "the limit of the mean."(1)

John Maynard Smith Professor John Maynard Smith,[1] F.R.S. (6 January 1920 – 19 April 2004) was a British evolutionary biologist and geneticist. Originally an aeronautical engineer during the Second World War, he then took a second degree in genetics under the well-known biologist J.  and G. R. Price [15] are among those credited with the introduction of evolutionary game theory Evolutionary game theory (EGT) is the application of population genetics-inspired models of change in gene frequency in populations to game theory. It differs from classical game theory by focusing on the dynamics of strategy change more than the properties of strategy equilibria. , but perhaps the most well known of the theoretical works on the subject is Maynard Smith's 1082 book Evolution and the Theory of Games theory of games
n.
See game theory.

Noun 1. theory of games - (economics) a theory of competition stated in terms of gains and losses among opposing players
game theory
. Axelrod's [3] analysis of the success of cooperation in the repeated Prisoners' Dilemma (RPD) is probably the most famous of many studies of the game in an evolutionary framework. Evolutionary game theory has frequently been used to study coordination games In game theory, coordination games are a class of games with multiple pure strategy Nash equilibria in which players choose the same or corresponding strategies. For a classic example of a coordination game, consider the 2-player, 2-strategy game, with the payoff matrix shown on , common interest games, and the Prisoners' Dilemma. Since its introduction many people in a broad range of disciplines have applied it to the study of natural and social science problems. Here I provide a brief review of some of those studies, especially those analyses which deal with the Prisoners' Dilemma.

Robert Axelrod
For the actor, see Robert Axelrod (actor).


Robert Axelrod (born 1943) is a Professor of Political Science and Public Policy at the University of Michigan. He has appointments in the Department of Political Science and the Gerald R.
, in his 1984 book The Evolution of Cooperation, suggested cooperation is evolutionary stable in the RPD. However, the definition of evolutionary stability he employed differs from that used by Maynard Smith [4]. Specifically, Axelrod applied a concept he called "collective stability" which requires only that a strategy be a Nash equilibrium when it plays itself. In choosing this definition, Axelrod disallowed the possibility an alternate best reply strategy, which earns an equal payoff against both the indigenous and invading in·vade  
v. in·vad·ed, in·vad·ing, in·vades

v.tr.
1. To enter by force in order to conquer or pillage.

2.
 strategies, would be able to successfully infiltrate infiltrate /in·fil·trate/ (in-fil´trat)
1. to penetrate the interstices of a tissue or substance.

2. the material or solution so deposited.


in·fil·trate
v.
1.
 the native population.

This idea is embodied em·bod·y  
tr.v. em·bod·ied, em·bod·y·ing, em·bod·ies
1. To give a bodily form to; incarnate.

2. To represent in bodily or material form:
 in the formal definition of an evolutionary stable strategy (ESS (1) (Electronic Switching System) A large-scale computer from Lucent used to route telephone calls in a telephone company office. The 5ESS is a Class 5 central office switch, and the 4ESS is a Class 4 tandem office switch. ). A strategy [Sigma SIGMA - A scientific visual programming environment from NASA.

http://fi-www.arc.nasa.gov/fia/projects/sigma/.
], which can be either a pure or mixed strategy, is an ESS of a symmetric game In game theory, a symmetric game is a game where the payoffs for playing a particular strategy depend only on the other strategies employed, not on who is playing them. If one can change the identities of the players without changing the payoff to the strategies, then a game is  if and only if: u([Sigma], [Sigma]) [is greater than or equal to] u([Sigma]', [Sigma]) and [Mathematical Expression A group of characters or symbols representing a quantity or an operation. See arithmetic expression.  Omitted] for all [Sigma]' [is not equal to] [Sigma], where u([[Sigma].sub.1] [Sigma].sub.2]) represents the payoff to a player if he plays strategy [Sigma].sub.1] and his opponent chooses strategy [Sigma].sub.2]. In words, these conditions require an ESS be a best reply to itself, and if there are alternate best replies, the ESS must do strictly better against an alternate best reply than the alternate best reply does against itself. The first requirement above means any ESS is a Nash equilibrium. However, the converse (logic) converse - The truth of a proposition of the form A => B and its converse B => A are shown in the following truth table:

A B | A => B B => A ------+---------------- f f | t t f t | t f t f | f t t t | t t
 is not true because of the second condition, often called the "stability condition," which Axelrod eliminated as part of his definition of evolutionary stability [3, 217]. Therefore, it is in this weaker sense that TFT is stable. There were a number of possible strategies which met this requirement but did not do well in Axelrod's

The above definition of an ESS is actually a characterization A rather long and fancy word for analyzing a system or process and measuring its "characteristics." For example, a Web characterization would yield the number of current sites on the Web, types of sites, annual growth, etc.  of a more basic requirement which holds in pairwise random matching models.(2) Assuming von Neumann-Morgenstern utility functions, the following must hold for some sufficiently small sufficiently small - suitably small  proportion of invaders Generically speaking, invaders are those who participate in an invasion, often in a militaristic context. Other uses of the word include:
  • Invaders (comics), a Marvel Comics group of World War II superheroes created in 1975 by Roy Thomas.
, [Epsilon 1. (language) EPSILON - A macro language with high level features including strings and lists, developed by A.P. Ershov at Novosibirsk in 1967. EPSILON was used to implement ALGOL 68 on the M-220. ] > O: [Mathematical Expression Omitted] This requires the expected payoff from playing [Sigma] in a population with proportion of [Epsilon] of [Sigma]' is strictly greater than the expected utility of those playing [Sigma]'. Loosely speaking, then, an ESS is a strategy which cannot be invaded by an arbitrary small proportion of potential invading strategies.

Robert Boyd The name Robert Boyd is shared by:
  • Robert Boyd, 1st Lord Boyd
  • Robert Boyd, 4th Lord Boyd
  • Robert Boyd, 5th Lord Boyd
  • Robert Boyd, 7th Lord Boyd (1595–1628)
  • Robert Boyd, 8th Lord Boyd (c.
 and Jeffrey Lorberbaum [7] have shown no pure strategy is evolutionary stable in this more restricted sense in the infinite RPD. Specifically, they show a population of TFT players can be invaded by the appropriate mixture of Suspicious TIT-FOR-TAT, which is the strategy which defects on the first move and then reciprocates the opponent's move, and TIT-FOR-TWO-TATS (TF2T), which reciprocates defection only after two consecutive defections by the opponent. Moreover, they show that if every strategy is possible through mutation, no pure strategy is immune to invasion by some mixture of strategies. However, they suggest every strategy will not be possible through mutation, so cooperation based on reciprocity reciprocity

In international trade, the granting of mutual concessions on tariffs, quotas, or other commercial restrictions. Reciprocity implies that these concessions are neither intended nor expected to be generalized to other countries with which the contracting parties
 may indeed flourish This article is about magic term. For 2006 film, see Flourish (film).

A Flourish is a visual display of skill performed with playing cards to show the skill or ability of the performer.
. Joseph Farrell and Roger Ware [8] extended the work of Boyd and Lorberbaum to show that for any evolutionary stable mixture of strategies in the infinite RPD, every finite history must occur with positive probability. The implication of this result is no finite mixture of strategies is evolutionary stable in the infinite RPD. They use this result as evidence the ESS concept is too stringent a criterion to be useful.

More recently, Yong-Gwan Kim [12] showed that an ESS in the infinite RPD does not exist unless we allow perturbations. Then he applied Reinhard Selten's [20] concept of "limit ESS." This concept expands the set of evolutionary stable outcomes by allowing perturbations which may put some minimum probability on other strategies in the game. In this way, ties which arise may be broken to allow for more equilibria.(3) Kim was able to prove a "Folk Theorem" of sorts which states we can find a limit ESS outcome which is arbitrarily close to any convex combination A convex combination is a linear combination of data points (which can be vectors, scalars, or more generally points in an affine space) where all coefficients are non-negative and sum up to 1.  of the payoffs from two purely defecting strategies and two purely cooperating strategies.

If we limit the set of possible strategies to those which can be modeled as finite automata Finite Automata - Finite State Machine , or Moore machines, we can include measures of complexity in the players' utility functions. This has led to interesting results and has only recently been used to make evolutionary arguments. It was Robert Aumann Yisrael Robert John Aumann (Hebrew: ישראל אומן‎) (born June 8, 1930) is an Israeli mathematician and a member of the United States National Academy of Sciences.  [2] who first suggested modeling strategies as machines with a finite number of states as a way to handle bounded rationality Many models of human behavior in the social sciences assume that humans can be reasonably approximated or described as "rational" entities (see for example rational choice theory). . Abraham Neyman [17] showed if we model strategies this way and limit the number of states exogenously, we can find equilibrium machines which cooperate at every stage of the game if the number of states is less than the number of iterations of the game. The intuition intuition, in philosophy, way of knowing directly; immediate apprehension. The Greeks understood intuition to be the grasp of universal principles by the intelligence (nous), as distinguished from the fleeting impressions of the senses.  here is easy to see. Suppose the Prisoners' Dilemma will be repeated one hundred times. As long as the machines which implement the strategies have fewer than one hundred states, cooperation can be maintained by the GRIM GRIM Grassroots Impeachment Movement
GRIM Glass Refractive Index Measurement
GRIM Guidance, Rationale, and Interoperability Modalities
GRiM Good Riflemen in Motion (gaming clan)
GRIM Global Real-Time Interactive Map (NASA) 
 strategy.(4) In order to beat the GRIM strategy, an opponent must be able to identify the last stage of the game so it can defect. Since it requires one hundred states to count that high, any machine with fewer than one hundred states cannot improve on the payoff to playing GRIM. Roy Radner [18] used a similar argument to show how cooperation can be maintained using strategies which are implemented by finite automata which are similarly bounded in size.

Dilip Abreu and Ariel Rubinstein Ariel Rubinstein (born April 13, 1951) is an Israeli economist who works in game theory. He was educated at the Hebrew University of Jerusalem, 1972-1979, in both mathematics and economics.  [1] have studied models where complexity, which in their model is defined to be the number of states in the automation, is endogenously en·dog·e·nous  
adj.
1. Produced or growing from within.

2. Originating or produced within an organism, tissue, or cell: endogenous secretions.
 determined. The metagame strategies are implemented by finite automata, and the complexity of the machine enters the players' decisions by making the metagame payoffs depend positively on stage game payoffs and negatively on complexity. By this I mean that if two machines yield the same stage game payoffs, the machine with fewer states has higher metagame payoffs. One model they analyzed had complexity enter the metagame payoffs lexicographically. Abreu and Rubinstein were able to reduce the set of equilibrium outcomes to the rational payoff vectors on the main and alternate diagonals of the set of feasible outcomes which provide each player with more than his security level. The interpretation Abreu and Rubinstein put on their model is that of a rational decision maker who must choose someone to play the game for him. However, the player who implements the strategy can only carry out simple instructions. We can think of these decision rules as "rules of thumb" which evolve over time. Another way to think of this is to assign some evolutionary process the roll of the metaplayer which selects the strategies. It is this interpretation which I will explore.

Ken Binmore and Larry Samuelson [5] have done interesting work recently using the model developed by Abreu and Rubinstein. They show that if we consider the same type of utility functions used by Abreu and Rubinstein, any evolutionary stable strategy has both players earning the cooperative, or utilitarian,(5) payoff. In other words Adv. 1. in other words - otherwise stated; "in other words, we are broke"
put differently
, Abreu and Rubinstein refine the set of equilibrium payoffs by considering complexity of the implementing machine, and Binmore and Samuelson further refine this set to one outcome with an evolutionary stability argument. This result depends crucially on the definition of complexity we choose. Jeffrey Banks and Rangarajan Sundaram [4] have shown if we use preferences which are lexicographic lex·i·cog·ra·phy  
n.
The process or work of writing, editing, or compiling a dictionary.



[lexico(n) + -graphy.
 in complexity and use the number of transitions in the Moore machine as the measure of complexity, the only Nash equilibrium machine defects always.

These ideas are summarized in Figure 2. The equilibrium payoffs allowed by the "Folk Theorem" are all those vectors in the shaded region. The equilibrium payoff set from the Abreu/Rubinstein models contains the rational on the main and alternate diagonals which assure each player a payoff greater than one. Binmore and Samuelson's evolutionary model reduced the set to the unique point (3,3), and the Banks/Sundaram model yields the one equilibrium outcome in which both players always defect and receive the payoff vector (1,1).

Selecting the "right" equilibrium is frequently the result of some ad hoc For this purpose. Meaning "to this" in Latin, it refers to dealing with special situations as they occur rather than functions that are repeated on a regular basis. See ad hoc query and ad hoc mode.  decision rule. For example, many economists and game theorists This is a list of notable economists, mathematicians, political scientists, and computer scientists whose work has added substantially to the field of game theory.
  • Derek Abbott - Quantum game theory and Parrondo's games
 argue that when multiple equilibria exist the solution to the game should have the players achieving an efficient outcome. Using the terminology of Harsanyi [10] and Harsanyi and Selten [11], the players should choose this outcome based on payoff dominance (6) if possible. This intuition is especially appealing for repeated games since there is recurrent recurrent /re·cur·rent/ (re-kur´ent) [L. recurrens returning]
1. running back, or toward the source.

2. returning after remissions.


re·cur·rent
adj.
1.
 interaction between the players. The argument for selecting payoff dominant equilibrium outcomes is especially strong when some sort of coordinating device is present. For example, if we allow preplay communication between players, we should expect the equilibrium they agree upon will be payoff dominant if possible. Even in the absence of such a coordinating device, many game theorists and economists expect to see the payoff dominant equilibrium outcome. In the infinitive infinitive: see mood; tense.  RPD, of course, this means we should expect to see a cooperative outcome However, if such selection criteria are to be meaningful, they should be based on first principles and not chosen because they give nice results.

This area of game theory has just recently become the target of significant research. The results so far have proven interesting but not conclusive Determinative; beyond dispute or question. That which is conclusive is manifest, clear, or obvious. It is a legal inference made so peremptorily that it cannot be overthrown or contradicted. . The work I described by Binmore and Samuelson is the first research in this area. They prove that any symmetric equilibrium In game theory, a symmetric equilibrium is an equilibrium where both players use the same strategy (possibly mixed) in the equilibrium. In the Prisoner's Dilemma game pictured to the right, the only Nash equilibrium is (D, D).  strategy which yields less than the cooperative payoff can be invaded by a strategy which does not well, in stage game payoffs, as a native does if it plays a native, but it does better against itself than it does against a native. The invading machines are able to send a signal to their opponents. If they are not playing their own type, it makes no difference in the metagame payoffs because payoffs are computed as the limit of the mean, and in the limit they get the same payoff as the native strategy. However, they earn greater average payoffs than the native strategy because they do better when playing strategies like themselves. The idea behind signalling to opponents is nothing new. Arthur Robson [19] speaks of a "secret handshake A secret handshake is a series of hand gestures that indicate loyalty to a club, clique, or subculture. The purpose of the secret handshake is to identify exclusive group members, and consequently to prevent inclusion of outsiders. " which identifies members of a certain group to each other.

Drew Fudenberg and Eric Maskin Eric Stark Maskin (born December 12, 1950) is a American economist and co-winner, along with Leonid Hurwicz and Roger Myerson, of the 2007 Nobel Prize in Economics "for having laid the foundations of mechanism design theory." Biography
Eric Maskin is the Albert O.
 [9] have shown, independently of Binmore and Samuelson, that cooperation in RPD games is evolutionary stable in an environment where only strategies of finite complexity can be used, metagame payoffs are calculated as the limit of the mean, and noise exists. In their model, the term "noise" refers to some chance an action may be misperceived by a player's opponent. John Miller [16] performed interesting simulations of the evolution of automata which were modeled as a string of digits. The impact of noise on the model was one of the factors he examined. He simulated the evolution of population playing a finite RPD in which there were strictly positive probabilities of errors in the transition of information about what a player's opponent did on the previous move. (7) His results are difficult to characterize in a sentence or two; however, the methodology he used is interesting and the results indicate the effects of noise on a model like this are not negligible This article or section is written like a personal reflection or and may require .
Please [ improve this article] by rewriting this article or section in an .
.

There is no definitive theoretical result explaining why a certain outcome should emerge from some evolutionary process. We are caught between arguments like those of Farell and Ware, which indicate no strategy or mix of strategies is evolutionarily stable in the infinite RPD, and those which identify one particular strategy as the only reasonable result of an evolutionary process. With this in mind, I intend to explore the evolution of strategies which can implemented by two-state Moore Machines in the infinitive RPD. Axelrod's work has been standard in this area. However, as I pointed out earlier, his results may have been part of a self-fulfilling prophecy self-fulfilling prophecy, a concept developed by Robert K. Merton to explain how a belief or expectation, whether correct or not, affects the outcome of a situation or the way a person (or group) will behave.  in which TFT was identified as the most successful strategy. Another problem with Axelrod's work is that the next generation's population was determined only by the replication In database management, the ability to keep distributed databases synchronized by routinely copying the entire database or subsets of the database to other servers in the network.

There are various replication methods.
 dynamic process, and new strategies were not allowed to enter. In other words, the strategies were assumed to be perfect in their ability to breed true. In the next section, I will describe the basics of the simulations I performed. I described how I modeled the strategies, how I chose to model possible mutations, and how I attempted to capture the idea of strategic complexity.

III. The Model

In this essay, I consider only two-player games. The underlying game is the version of the Prisoners' Dilemma described in Figure 1. More formally, I will consider the game G which is specified by the four-tuple <[S.sub.1], [S.sub.2], [[Pi].sub.1], [[Pi].sub.2]> where [S.sub.i] = {C,D} the strategy set and [[Pi].sub.i] : [S.sub.1] X [S.sub.2] [right arrow] R is the payoff function. These elements are summarized in the bimatrix form of the game.

The infinitely repeated game [G.sup.[infinity infinity, in mathematics, that which is not finite. A sequence of numbers, a1, a2, a3, … , is said to "approach infinity" if the numbers eventually become arbitrarily large, i.e. ]] = <[R.sub.1], [R.sub.2], [P.sub.1], [P.sub.2]> is constructed with G as the stage game. Each player's strategy space, [R.sub.i], is the set of functions which map any history of play into [S.sub.i]. That is, [R.sub.i] ={f : [h.sub.t] [right arrow] [S.sub.i]} where [h.sub.t] is the history of the map up to and including time t. The payoff functions [P.sub.1] and [P.sub.2] are defined as the limit of the mean of the stage payoffs. This is always defined because we are considering only strategies which can be implemented using finite automata. Since the machines must eventually enter a cycle, we know this limit exists. Therefore, we have [Mathematical Expression Omitted] where [r.sub.i] [Epsilon] [R.sub.i].

I also use Abreu and Rubinstein's definition of an automation selection game. Here G# is the automation selection game defined as <[A.sub.1], [A.sub.2], [U.sub.1], [U.sub.2]>. The strategy space [A.sub.i] is the set of Moore machines which I will describe more completely later. The functions [U.sub.i] are the true utility, or profit, functions of the players. At various times during their precise form will vary. However, these functions basically adjust the payoff functions [P.sub.i] for complexity. In this automation selection game we can think of automation a [Epsilon] [A.sub.i] as actually being Player i in [G.sup.[infinity]]. In a sense, this automation is Player i's agent who follows the simple instructions represented by the Moore machine.

Now I will define more formally the kind of machine I have in mind as implementing these simple instructions. A Moore machine is described by a four-tuple <Q, [q.sub.0], [Lambda], [Mu]>, where Q = {[q.sub.0], [q.sub.1], [q.sub.2],..., [q.sub.m]} is a finite set In mathematics, a set is called finite if there is a bijection between the set and some set of the form where n is a natural number. (The value n = 0 is allowed; that is, the empty set is finite.) An infinite set is a set which is not finite.  of states, [q.sub.0] [Epsilon] Q is the initial state, [Lambda] : Q [right arrow] {C, D} is the output function which maps the state into a strategic choice, and [Mu] : Q x {C,D} [right arrow] Q is te transition function which maps the current state and the opponent's choice into a state (not necessarily different). Moore machines not only allow us to model strategies repeated games concisely con·cise  
adj.
Expressing much in few words; clear and succinct.



[Latin conc
 and conveniently, they let s analytically an·a·lyt·ic   or an·a·lyt·i·cal
adj.
1. Of or relating to analysis or analytics.

2. Dividing into elemental parts or basic principles.

3.
 calculate payoffs for infinite games.

These machines can be represented quite easily as demonstrated in Figure 3. Each circle in a machine represents a possible state of the machine, and the state on the left is the initial state. I depict de·pict  
tr.v. de·pict·ed, de·pict·ing, de·picts
1. To represent in a picture or sculpture.

2. To represent in words; describe. See Synonyms at represent.
 the action taken when the machine is in that state by the upper case letter inside the circle. The arrows indicate how the machine transitions after a play of the stage game, and the transition depends on the opponent's choice indicated by lower case letters. The strategies which are "nice" by Axelrod's definition are those which begin by cooperating and continue cooperating as long as the opponent does. An example of a nice strategy is GRIM or TFT.

Determining a reasonable mutation scheme is not without problems. One alternative is to assume it is possible for one strategy to mutate mu·tate  
intr. & tr.v. mu·tat·ed, mu·tat·ing, mu·tates
To undergo or cause to undergo mutation.



[Latin m
 to any other strategy with equal likelihood. I call this "uniform mutation." However, this is not a very satisfying way to proceed. If we are thinking of these machines as decision rules evolving over time, we must develop some idea of "closeness" in the strategies and only allow strategies to mutate to those which are close. Another idea we should try to capture is that simplifying mutations are more likely than complicating com·pli·cate  
tr. & intr.v. com·pli·cat·ed, com·pli·cat·ing, com·pli·cates
1. To make or become complex or perplexing.

2. To twist or become twisted together.

adj.
1.
 mutations. I will use this idea later as a way to pick up the costs of complexity.

In order to capture these notions of "closeness" and simplification, I developed a mutation scheme based on how these strategies would appear if they were machines in the physical sense and subject to breakdowns. We can think of these machines as consisting of transmitters and the appropriate wires hooked up to receivers to control which signal is transmitted. The simplest machines are those which need only transmit either "cooperate" or "defect." These correspond to the single state machines which always defect or always cooperate. Two-state machines are made up of the two transmitters which send the signals corresponding to cooperating and defection. I chose to model these strategies so the machines attempt to move to the next state, or change the signal it sends, after taking an action. However, the transition may be blocked for one of two reasons. First, there may be no circuit from one state to the next. In this case a machine will continually con·tin·u·al  
adj.
1. Recurring regularly or frequently: the continual need to pay the mortgage.

2.
 send the same response because it can reach no other state. Another way the transition can be inhibited in·hib·it  
tr.v. in·hib·it·ed, in·hib·it·ing, in·hib·its
1. To hold back; restrain. See Synonyms at restrain.

2. To prohibit; forbid.

3.
 is by perceiving an appropriate action from the opponent to prevent the transition. We can think of these inhibitors as open switches which keep the state from changing. For example, consider the GRIM strategy which I have modeled in Figure 4. The GRIM machine will begin cooperating and attempt to switch to the defecting state unless the machine receives the signal that the opponent cooperated. The receiver, indicated by a boxed "c," will keep the circuit open as long as it receives the signal that its opponent cooperated in the last stage. If the machine does not detect that its opponent cooperated, the switch will close, completing the circuit which allows the machine to make the transition to the defecting state.

This stylized styl·ize  
tr.v. styl·ized, styl·iz·ing, styl·iz·es
1. To restrict or make conform to a particular style.

2. To represent conventionally; conventionalize.
 model of how strategies are implemented offers an opportunity to consider what sensible mutations look like. For example, we can assume a reasonable mutation involves a wire in the machine breaking. We can imagine a strategy being misinterpreted by the player and some transition being missed. If the mutant machine continues to function as well or better in the environment than the machine from which it evolved, we can expect the mutant machine will be copied.

In this essay I will consider "breakdowns" of the machines, or possible mutations in which a wire breaks. Also, I assume it is possible for the signal to get reversed in the machine so the automaton automaton: see robot; robotics  sends C when it should send D and vice versa VICE VERSA. On the contrary; on opposite sides. . This is not the only way to model these strategies as mechanical or electrical devices. However, this is one way which is relatively simple and yields reasonable mutation patterns. I will not examine increasing complexity through mutation. This model of the strategies allows us one opportunity to capture the cost of complexity. The more complex a machine is, the more likely it is to break down into other simpler machines. Therefore, if complexity adds nothing to the payoffs, it will not endure in the population.

Another way I try to capture the costs of complexity is to model parts of the machines as having some cost. This can be applied to the evolutionary dynamic process in one of two ways which closely follow the notions of complexity described by Abreu/Rubinstein and Banks/Sundaram. To capture the idea that states are costly, I adjust the payoffs so the single state machines receive a premium in the play of the game. It turns out the size of the premium affects the result in an evolutionary framework. Also, I impose a cost for maintaining each of these stylized wires/transitions in the machines. Again, the payoff matrix is modified to reflect this idea.

The model of Abreu and Rubinstein does not lend itself to evolutionary simulations. Operationalizing the idea of lexicographic preferences Lexicographic preferences (lexicographical order based on the order of amount of each good) describe comparative preferences where an economic agent infinitely prefers one good (X) to another (Y).  is not straightforward in an evolutionary context. In the model I have in mind, the proportion of the population which plays a particular strategy at time t + 1 depends on how well that strategy does at time relative to the average member of the population. Lexicographic preferences do not allow such simple averaging of payoffs, but we can handle the idea that preferences depend on complexity by imposing costs on the metaplayers.

The evolutionary process I employ in the simulations has two parts. The first attempts to capture the idea that successful strategies or rules of thumb will tend to be copied. The second part reflects the idea that over time these strategies will not be perfectly duplicated, or will not breed true in biological terms. First I will describe the replication dynamic, and then I will describe how the strategies mutate.

To see how the replication dynamic process works, imagine a game in which there is a strategy space A with n strategies or automata, A = {[a.sub.1], [a.sub.2], ..., [a.sub.n]. The RPD is then played at time t, and each strategy earns a payoff depending on the strategy with which it is matched. These payoffs can be represented in a n X n matrix B with element [b.sub.ij] being the payoff to a player who uses machine [a.sub.i] if he is matched against a player who uses [a.sub.j]. Then we have [Mathematical Expression Omitted] Also, let p(t) = [([p.sub.1] (t), [p.sub.2] (t), ..., [p.sub.n] (t))].sup.T] be the vector of proportions of each type of player in the population at time t [Epsilon] {0, 1, 2, ...}. We can then define the expected payoff to a player of strategy [a.sub.i] at time t as [(Bp(t)).sub.i]. This is the ith element of the vector Bp(t). Also, the expected payoff for a member of the population at time t is [p(t).sup.T] Bp(t). The process we are considering has the proportion of strategy i at time t + 1 equal to its proportion at time t multiplied mul·ti·ply 1  
v. mul·ti·plied, mul·ti·ply·ing, mul·ti·plies

v.tr.
1. To increase the amount, number, or degree of.

2. Mathematics To perform multiplication on.
 by the ratio of its own expected payoff to the expected payoff of all players. Then we have the following dynamic process: [p.sub.i] (t + 1) = [[p.sub.i] (t)[[Bp(t)).sub.i]/[(p(t).sup.T] Bp(t))]. It is easy to see whether a strategy's proportion in the population gets larger or smaller depends on whether or not it is doing better or worse than average. The proportion of a strategy in the next generation depends not only on its own relative performance, but also on its proportion in the current population. This captures the idea that a strategy must be both successful and observed by other strategies to be copied.

The dynamic process described above is based on the idea that some machines or rules of thumb would be so unsuccessful they would probably not be used in the future, while the superior machines would be imitated. We can also think of the payoffs from playing the RPD as fitness in the biological sense. That is, strategies with higher payoffs are able to reproduce re·pro·duce
v.
1. To produce a counterpart, an image, or a copy of something.

2. To bring something to mind again.

3. To generate offspring by sexual or asexual means.
 (asexually a·sex·u·al  
adj.
1. Having no evident sex or sex organs; sexless.

2. Relating to, produced by, or involving reproduction that occurs without the union of male and female gametes, as in binary fission or budding.

3.
) more successfully than strategies with lower payoffs. In either case, we would expect to see the best strategies flourish and the worst strategies die out. This process allows us to evaluate how well a strategy will do when the less effective ones cease being important, and only the best strategies remain to play each other. The dynamic process I used simulates evolution with an infinitely large population and random matching. Essentially, when a player of a certain strategy type plays the game, he is playing against a mixed strategy which is represented by the different proportions in the population.

In order to capture the notion that strategies mutate in the evolutionary process, I used a matrix M to represent the rate at which strategies change into each other. A typical element [m.sub.ij] is the probability a strategy of type j mutates Mutates
Undergoes a spontaneous change in the make-up of genes or chromosomes.

Mentioned in: Antiretroviral Drugs
 into a type i player. Therefore, after the reproduction dynamic process described above has taken place, the strategies undergo a mutation process which is described by M. Then, we can find the proportions of the population, p*(t + 1), which play the infinite RPD in period t + 1 with the equation p*(t = 1) = Mp(t + 1) where p(t + 1) is the vector of proportions which are the result of the dynamic replication process described above.

Some very simple examples may help see what the mutation matrix does. For example, if we assume one strategy can change into any other strategy with equal probability, [Mu], then the matrix M has form [m.sub.ij] = [Mu] for i [is not equal to] j and [m.sub.ij = 1 - (n - 1) [Mu] if i = j and there are n total strategies. On the other hand, if no mutation takes place the above is true with [Mu] = 0. The only restrictions we need to place on this matrix are (1) [Mathematical Expression Omitted] and (2) [m.sub.ij] [Epsilon] [0, 1] [Mathematical Expression Omitted], j [Epsilon] {1, 2, ..., n}. These restrictions are completely reasonable and merely require an individual either stay the same or change. We do not allow one machine of type a to turn into some other number of type b machines.

With the stylized machines we can derive a mutation matrix. Rather than list the entire matrix, I will just show how one strategy mutates. Consider the strategy which implements the cooperative outcome in the Abreu and Rubinstein automaton selection game. A diagram diagram /di·a·gram/ (di´ah-gram) a graphic representation, in simplest form, of an object or concept, made up of lines and lacking pictorial elements.  of the stylized machine is depicted de·pict  
tr.v. de·pict·ed, de·pict·ing, de·picts
1. To represent in a picture or sculpture.

2. To represent in words; describe. See Synonyms at represent.
 at the top of Figure 5. Each of the possible mutations is indicated by an arrow beginning at the original machine. The machines are identified by two letters enclosed en·close   also in·close
tr.v. en·closed, en·clos·ing, en·clos·es
1. To surround on all sides; close in.

2. To fence in so as to prevent common use: enclosed the pasture.
 in a box. The 26 possible two-state machines are depicted in Figure 6. It is easy to see what happens as each of the imagined wires breaks. For example, if the wire which attempts to change actions from D to C (this represents the transition from the defecting state to the cooperating state in the diagram of the Moore machine) breaks, the machine will only be able to send the D signal. If, however, the wire from the receiver on the top of the diagram were to malfunction mal·func·tion
v.
1. To fail to function.

2. To function improperly.

n.
1. Failure to function.

2. Faulty or abnormal functioning.
, this machine would mutate into AC. If the signals got reversed, this CC machine would become cc. We need not go any further to see the CC machine can mutate into DD, AC, CA, CD, or cc if only one of the wires breaks. For simplicity I assume only one wire can break in any generation. We can think of this as approximating independent mutations where the mutation rates In genetics, the mutation rate is the chance of a mutation occurring in an organism or gene in each generation (or, in the case of multicellular organisms, cell division). See Luria-Delbrück experiment.  are so small the probability of two mutations in a given generation is negligible.

Finally, before actually discussing the results of the simulations, I will describe the twenty-six strategies I used. These strategies can be represented by all of the possible one and two state Moore machines. These simple strategies can be thought of as decision rules which are very easily implemented. It is not correct to say these machines are restricted to a memory of only one period. In a sense, the GRIM strategy has a memory of infinite length. However, the strategies I describe here are unable to allow for strategies which need to count to any number greater than one. It is interesting to note the most successful strategy in Axelrod's RPD tournament, TFT, is represented here. Nearly all the strategies which enjoyed some degree of success in the tournament used TFT as the main rule.(8) All of the strategies are described in Figure 6.

Eight of the possible twenty-six strategies form a Nash equilibrium when they play against a machine identical to themselves. These equilibrium strategies include all of the "nice" strategies except "always cooperate." Using the notation notation: see arithmetic and musical notation.


How a system of numbers, phrases, words or quantities is written or expressed. Positional notation is the location and value of digits in a numbering system, such as the decimal or binary system.
 here, these are ca, cb, cc, and cd. In addition, the utility maximizing equilibrium strategy from the Abreu and Rubinstein model,(9) CC, is an equilibrium strategy when it plays itself. The three other strategies which form equilibrium when they play themselves are the "always defect" strategy (DD), aa, and AC. The aa machine cooperates on the first move and then begins defecting forever. The AC machine begins by defecting and then switches to a cooperating state where it stays until the opponent defects. An opponent's defection will move this machine back to the initial state. Once in the initial state again, it repeats the same sequence of play by defecting once and then moving to the cooperating state until it detects another defection.

IV. Simulation Results

In this section I will briefly describe the results of the simulations I performed. I elaborate on the result only briefly in this section. I save most of the discussion for the next section where I analyze which strategies do well under the various mutation schemes and draw conclusions about what characteristics the successful strategies have. These will not be in strict agreement with the characteristics identified by Axelrod. There are some circumstances CIRCUMSTANCES, evidence. The particulars which accompany a fact.
     2. The facts proved are either possible or impossible, ordinary and probable, or extraordinary and improbable, recent or ancient; they may have happened near us, or afar off; they are public or
 in which "niceness"(10) appears to be a successful characteristic, but under other circumstances it may not be.

The simulations were accomplished on a Zenith zenith, in astronomy, the point in the sky directly overhead; more precisely, it is the point at which the celestial sphere is intersected by an upward extension of a plumb line from the observer's location.  248 computer with programs written in Turbo Pascal An early Pascal compiler for DOS from Borland used in a wide variety of applications from accounting to complex commercial products. Turbo Pascal for Windows provides an object-oriented programming environment for Windows development. . In the evolutionary dynamic simulations Dynamic Simulation is similar to a physics engine, the technology used in many powerful computer graphics software programs, like 3ds Max, Maya, Lightwave, and many others to simulate physical characteristics. , each machine was assigned as·sign  
tr.v. as·signed, as·sign·ing, as·signs
1. To set apart for a particular purpose; designate: assigned a day for the inspection.

2.
 a fitness based on how well it does against the other strategies and other machines like itself. The proportion in the next generation, before mutation, is the proportion in the current generation times the ratio of the individual strategy's fitness to the average fitness in the population. After evolutionary process takes place, a strategy changes into another with a probability determined by the appropriate mutation scheme.

Evolution without Mutation

I performed one hundred simulations of one thousand generations without any mutation as a control to determine which, if any, particular strategies are the likely results of a strictly ecological process. These simulations began from randomly selected starting points Noun 1. starting point - earliest limiting point
terminus a quo

commencement, get-go, offset, outset, showtime, starting time, beginning, start, kickoff, first - the time at which something is supposed to begin; "they got an early start"; "she knew from the
 in the unit simplex in R(26). One thousand generations of the evolutionary dynamic process described above were then simulated. These simulations were repeated one hundred times. The significance of the ecological simulations is that once a strategy nearly dies out, there are no trembles trembles

porcine congenital tremor syndrome.
 to increase its proportions again. This ecological simulation is very similar to what Axelrod did with the results of his RPD tournament.

I graphically describe the average of the proportions of the population for the final generation in each of the simulations in Figure 7. It is clear the cooperative outcomes proved the most evolutionarily fit in this environment. In fact, we can see the only strategies to survive this dynamic process are the "nice" ones, and all of them survived.(11) It is interesting that the strategy which is most exploitable in some sense, "always cooperate, survives in this ecological process. This is because it does well against those other strategies which do well. Specifically, this strategy does well against the other "nice" strategies, and those "nice" strategies drive the "nasty" strategies to near extinction extinction, in biology, disappearance of species of living organisms. Extinction occurs as a result of changed conditions to which the species is not suited.  quickly enough so the nicest and infinitely forgiving strategy, ALL C, can survive.

Evolution with Uniform Mutation

I then performed similar simulations using uniform mutation with the same one hundred random starting points I used in the previous simulation. I chose a mutation rate of 0.0001. I present the data in Figure 8 in the same format I used for the evolution without mutation. The same set of strategies that did well in the last simulation are successful in this simulation.(12) The two strategies which appear to do significantly differently are TFT and ALL C. TIT-FOR-TAT does not do as well in the presence of mutation, and ALL C performs better. The intuition for why TFT does not do as well seems relatively clear TFT is not as aggressive in exploiting poor strategies as GRIM. Hence, when these poor strategies appear, GRIM does relatively better than TFT. Hence, TFT's proportion of the population diminishes over time.

Note that in order for a strategy which is a large proportion of the population to keep from becoming smaller, it must do well relative to the others because of the mutation process. The larger the proportion of a given strategy there is in the population, the better that strategy must do to keep from shrinking in its proportion. If all strategies did equally well, this mutation scheme would equilibrate e·quil·i·brate  
v. e·quil·i·brat·ed, e·quil·i·brat·ing, e·quil·i·brates

v.intr.
To be in or bring about equilibrium.

v.tr.
To maintain in or bring into equilibrium.
 when all strategies are in equal proportion. However, the periodic introduction of poor strategies allows GRIM to do better than TFT. The improvement of ALL C seems to be the result of the mutation scheme. Although ALL C doesn't do very well against "nasty" strategies, there are not many of them around. Since ALL C is a small proportion of the population, it is a net gainer when the mutation process works. Additionally, since most of the population is nice it does not do very badly in terms of payoffs, so over time it grows in size until it attains the levels found in this simulation. We can think of strategies like GRIM as "enforcers" in this game. That is, GRIM keeps the proportions of the "nasty" strategies so ALL C can survive.

Evolution with Stylized Mutation

The work of Abreu and Rubinstein hypothesizes less complex strategies are better than more complex strategies. The notion I attempt to capture in this simulation is that more complex strategies will tend to break down more often than less complex strategies. Hence, unless the extra complexity results in increased stage game payoffs, the more complex strategies will tend to be replaced by less complex strategies through evolution. We can think of these "rules of thumb" mutating to simpler rules. If making a rule less complex does not reduce its payoffs in the stage games, we should expect the simpler rule to be copied more often. Again, which rules are successful will depend on the environment.

I simulated one hundred evolutionary processes of one thousand generations once again, and I started at the same random starting points as in the previous two simulations with the same mutation rate of 0.0001. I summarize sum·ma·rize  
intr. & tr.v. sum·ma·rized, sum·ma·riz·ing, sum·ma·riz·es
To make a summary or make a summary of.



sum
 the results in Figure 9. Again, the GRIM strategy does much better than any of the others. The intuition is slightly different here. Not only does the GRIM strategy exploit irrational ir·ra·tion·al
adj.
Not rational; marked by a lack of accord with reason or sound judgment.


irrational adjective Unreasonable, illogical
 strategies, but it is one of the least complex two state machines in terms of the model I chose because it only needs to perceive one deviation DEVIATION, insurance, contracts. A voluntary departure, without necessity, or any reasonable cause, from the regular and usual course of the voyage insured.
     2.
 and perform one transition. Hence, it will increase in proportion as a result of mutation from other successful strategies as well as its own strategic fitness.

Mutants Who Enter as Groups (Uniform Mutation)

Next, I attempted to capture the possibility that strategies can grow in their own small communities and then attempt to invade the population all at once. The biological story which goes with this simulation has a certain group of animals which are separated from the rest of their species when the land they are on is cut off from the main land mass by water The animals live in isolation and develop their own characteristics. At some point, a land bridge forms, and this relatively large group of mutants enters the population. I attempt to capture the essence of this thought experiment by allowing one percent of the population to become a mutant type and attempt to invade the population. Then I allow the population dynamics Population dynamics is the study of marginal and long-term changes in the numbers, individual weights and age composition of individuals in one or several populations, and biological and environmental processes influencing those changes.  to settle down again. Once the mutant strategy's influence is absorbed by the population, another band of mutants enters. This process continues.

I performed this simulation one time for thirty thousand generations. I chose to use the mid-point of the unit simplex in R(26) as the initial distribution. That is, I began with every strategy as 1/26 of the population. Also, I chose to regard a particular strategy as being extinct once its proportion of the population falls below [10.sup.-5]. This allows us to more easily identify when the population stabilizes. In this exercise, I allow all of the strategies to enter with equal probability. That is, I assume uniform mutation. I show the final population in Figure 10. It is interesting to note that after the population stabilizes from the initial starting point, it changes very little. Specifically, after initial stabilization Stabilization

The action undertakes a country when it buys and sells its own currency to protect its exchange value.
Actions registered competitive traders undertake by on the NYSE to meet the exchange requirement that 75% of their traded be stabilizing, meaning that sell orders
 and before mutants attempt to invade the population the "nice" strategies are in these proportions: ca, 0.575; cb, 0.152; cc. 0.164; cd, 0.105; and dd, 0.005. Here again, the "nice" strategies do very well.

We cannot say these strategies are immune from invasion in this model because there is a very small probability the "always cooperate" strategy will be the invading mutant for a sufficient number of consecutive generations so the population can be successfully invaded by "always defect." Since this probability, although extremely small, is greater than zero, the event will take place if the process continues long enough. However, we can say this set of strategies is nearly invasion-proof in the above sense. This idea is akin to the concept formalized for·mal·ize  
tr.v. for·mal·ized, for·mal·iz·ing, for·mal·iz·es
1. To give a definite form or shape to.

2.
a. To make formal.

b.
 by Binmore and Samuelson [6] which they call a Payoff-equivalent, Polymorphous polymorphous /poly·mor·phous/ (-mor´fus) polymorphic.

polymorphous

polymorphic.
 Modified ESS (PPMESS) without the complexity criteria. The idea behind this concept is certain groups of strategies may exist which cannot be successfully invaded The individual proportions of each strategy will change as various type of potential invaders appear, but the population cannot be invaded.

It is somewhat surprising that the ALL C strategy survives in such a large proportion. The reason for this is clear. Although the ALL C strategy can be exploited, it does not exist in large enough proportions to allow a "nasty" strategy to invade. It is not difficult to see that if only the GRIM, ALL C, and ALL D strategies were possible, ALL D would not be able to invade a "nice" population until ALL C accounted for at least 2/3 of the population. However, this is extremely unlikely because as ALL D appears through mutation it keeps the proportion of ALL C low.

Evolution with Mutants Who Enter as Groups (Stylized Mutation)

In this simulation I attempted to capture the same elements as in the last simulation except I use the mutation scheme from our stylized strategy implementing machines. In this example the strategies which attempt to invade in a group must come from the set of mutants which are possible from one of the strategies in the population. The probability a strategy mutates in this simulation is its proportion of the population. Then any possible mutant from that strategy is equally likely. Here we require those who are "stranded strand 1  
n.
The land bordering a body of water; a beach.

v. strand·ed, strand·ing, strands

v.tr.
1. To drive or run ashore or aground.

2.
 on the island" and then become the invading mutants be reasonably similar to the population from which they came. This simulation was performed like the previous one. The final population is described in Figure 11.

The success of GRIM and ALL C is evident here. These two strategies, which appeared successful in the previous simulations, are the only two survivors of this process. The intuition here again relates to their relative simplicity. Here we seem to have captured the idea that simple strategies will be more evolutionary fit. Although GRIM is more complex than ALL C, the fact that ALL D enters periodically as a mutant invader keeps ALL C in small enough proportions so ALL D cannot successfully invade the population. The same sort of symbiotic relationship symbiotic relationship (sim´bīot´ik),
n in implantology, that relationship assumed by an implant and the natural teeth to which it has been splinted.
 described by Binmore and Samuelson is present again. I will discuss the reason why the "stable" group of strategies in this model are different than those in Binmore and Samuelson's model in the next section.

Evolution with Costly States

In this subsection subsection
Noun

any of the smaller parts into which a section may be divided

Noun 1. subsection - a section of a section; a part of a part; i.e.
 the automation represent costly rationality in addition to bounded rationality. We can think of this as increased strategic complexity requiring more time and resources. Rather than deal with different types of mutation schemes, I will allow mutation in these simulations from our stylized model of the strategies with a mutation rate of 0.0001. Here I want to focus primarily on the effects of costly complexity on the evolutionary dynamic process.

In order to capture the idea that states are costly, I supplement the payoffs of the one state machines (ALL C and ALL D) by 0.05 and 0.1. I simulated five thousand generations of this dynamic process which began from the center of the unit simplex in R(26). I present the results in Figures 12 and 13. These diagrams show how the population changes over time. I have chosen to represent only the GRIM, ALL C, and ALL D strategies because their dynamic behavior accounts for all of the interesting phenomena.

Again, the GRIM and ALL C strategies play an important role in the evolutionary process. When having states is less costly, we can see the ALL D strategy cannot successfully invade the population on a permanent basis. However, when ALL C gets very large, which will occur through both mutation and the supplemented payoffs, the ALL D strategy nearly invades but then dies out before it can complete the invasion. However, if the cost of maintaining a state is sufficiently large In mathematics, the phrase sufficiently large is used in contexts such as:
is true for sufficiently large
, ALL D can successfully invade. Since it is one of the simples machines, there will not be any mutation which can threaten its existence. Note, however, this result is dependent on specific values for complexity costs.

We can see in Figures 12 and 13 that if states are sufficiently costly, ALL D will be the evolutionary outcome. The population will initially tend toward

being all "nice." Since the "nasty" strategies are in very small proportions in the population, the ALL C strategy does very well because of the higher payoff. When it becomes a sufficiently large population, it is invaded by ALL D. If the supplemental payoff is not great enough, ALL D will enjoy a brief period of success, but will again be displaced displaced

see displacement.
 by "nice" strategies, especially GRIM, and the cycle starts again.

Evolution with Costly Transitions

In these simulations I attempt to capture the idea that maintaining transitions is costly. That is, any time a decision must be made by the person implementing these rules, there is an associated cost. The major difference in this simulation relative to the previous ones is that in these simulations GRIM is more successful than any of the other "nice" two state machines. The reason is that GRIM is less complex than some of other strategies. That is, it is an easier rule to apply than rules like TFT, but clearly it is more complex than ALL C. Again, GRIM is able to take advantage of any of the poor strategies and reap the benefits of cooperation.

The evolutionary results are depicted in Figures 14 and 15. Here I use penalties for complexity of 0.01 and 0.1 per "wire" in the stylized machines. We can see that as long as the penalty is not too great, the population will tend to cycle. It begins by evolving to nearly all GRIM. However, since ALL C earns a higher payoff because it is less complex, it eventually becomes a very large portion of the population. When the proportion of ALL C in the population is large enough, ALL D's proportion grows rapidly. Then, unless the penalty is too large, GRIM emerges as the dominant strategy, and the cycle begins once more. If the penalty is large enough, though, ALL D will eventually dominate the population.

V. Discussion

These simulations suggest a number of things. The most conspicuous con·spic·u·ous  
adj.
1. Easy to notice; obvious.

2. Attracting attention, as by being unusual or remarkable; noticeable. See Synonyms at noticeable.
 result is the evolutionary success of the GRIM strategy. This is surprising since it has not appeared as the result of any previous evolutionary simulation I am aware of. The argument against the GRIM strategy being successful is that it is not forgiving. This is not a problem in these simulations because two-state Moore machines cannot probe for weakness in the strategies periodically. However, there is something to be learned from this result. One possible reason for GRIM's success it that it can exploit poor strategies. It is not difficult to see that TFT can never do strictly better than its opponent because it merely mirrors its opponent. GRIM can exploit irrational play better than TFT. When GRIM and TFT play the other strategies, GRIM is usually better at taking advantage of bad play. Of the twenty-four other strategies, GRIM earns more than TFT against fifteen of them. On the other hand TFT does better than GRIM against only two of them.

The results of these simulations fall far short of suggesting GRIM is evolutionarily superior to all other strategies in the RPD, but they do show a profound weakness in the TFT strategy. A strategy which just copies its opponent cannot take advantage of poor play which will be present through such processes as mutation. These simulations do, however, suggest the most successful strategies will be those able to take advantage of irrational play in an evolutionary situation where mutants enter over time.

It is obvious the GRIM strategy doesn't exploit all irrational play because the ALL C strategy survived in positive proportions in all of the simulations without complexity costs. This is because the GRIM strategy cannot identify the ALL C strategy when they play each other. Binmore and Samuelson [6] suggested the possibility irrational strategies and rational strategies could exist together as the consequence of an evolutionary dynamic process. In these simulations there may be, in the long run, a substantial number of irrational players. However, the fact that they look exactly like GRIM when playing it keeps them from being exploited. Also, any strategy which attempts to exploit these irrational strategies is forced to take the noncooperative payoff in the game most of the time. Therefore, the "nice" strategy GRIM protects, in a sense, the irrational players. I refer to it as na "enforcer" strategy which keeps the "nasty" strategies from invading the population. If for some reason the number of ALL C players were to increase, ALL D could invade the population. In the simulations I performed that was an extremely unlikely event. Generally, whenever the proportion of ALL C grew relative to other strategies, the ALL D and other "nasty" strategies would force it to become smaller. However, because nearly all of the strategies in the population were "nice," ALL C survived in small proportions while the "nasty" strategies earned very small payoffs relative to the cooperative strategies and, hence, died off rapidly.

The "nice" strategies in this simulation form something akin to Binmore and Samuelson's Payoff-equivalent Polymorphous MESS (PPMESS). However, it is clear they are not a PPMESS in the sense of Binmore and Samuelson. The reason for this gets to the heart of the differences between the Abreu/Rubinstein and Binmore/Samuelson models and the evolutionary processes I simulated here.

The simulations I performed have a number of different strategies entering over time either simultaneously or as a single strategy type in a group. In other words, my simulations have the populations being repeatedly perturbed per·turb  
tr.v. per·turbed, per·turb·ing, per·turbs
1. To disturb greatly; make uneasy or anxious.

2. To throw into great confusion.

3.
. However, most studies in this area analyze the stability of populations against a single perturbation perturbation (pŭr'tərbā`shən), in astronomy and physics, small force or other influence that modifies the otherwise simple motion of some object. The term is also used for the effect produced by the perturbation, e.g. . Here, though, we look at how population mixtures fare while being repeatedly attacked by possible mutant strategies.

A discussion of Binmore and Samuelson's results will help clarify the issue. They suggest a mixture of the cc, cd, CC, and CA machines form a PPMESS. That is, an appropriate mixture of these strategies cannot be successfully invaded by a mutant strategy. This is true because they earn equal payoffs against each other and have equal complexity. There is a symbiotic relationship among these strategies which protects them from invasion. However, this is true only if we consider these perturbations as one time events Certainly this mix of strategies is immune to a small proportion of invading mutants. Although the exact mix of strategies may change after a mutant invasion is thwarted thwart  
tr.v. thwart·ed, thwart·ing, thwarts
1. To prevent the occurrence, realization, or attainment of: They thwarted her plans.

2.
, the same set of strategies will survive. However, in repeated simulations I found these machines are not immune to invasion when mutants repeatedly attempt to enter the population because the GRIM strategy will be able to invade after some number of unsuccessful tries. When a GRIM machines plays the Abreu/Rubinstein utility maximizing equilibrium machine (UTIL UTIL Utility
UTIL Utilities
UTIL Utilization
) it earns higher payoffs against UTIL than UTIL earns against it. Hence, we have a situation where cc, cd, and CA attempt to invade the population over time through mutation and are successful because they earn the cooperative payoff when they play themselves and UTIL. However, after these strategies enter the population, GRIM can successfully invade if it appears often enough. Each time GRIm attempts to invade, the proportion of UTIl remaining decreases relative to cc and cd (both "nice" strategies) because GRIM earns the cooperative payoff against them and does better against UTIL than UTIL does against GRIM. Also, GRIM earns the maximum possible payoff against CA. Therefore, after enough attempts GRIM will successfully invade the population.

The ideas of perfect equilibria and trembles are important to these simulations. Reinhard Selten Reinhard Selten (October 5, 1930) is a German economist.

Selten was born in Breslau (Wrocław) in Lower Silesia, now in Poland. For his work in game theory, Selten won the 1994 Nobel Memorial Prize in Economics (shared with John Harsanyi and John Nash).
 [20] and others have used the idea of trembles in the play of a game to select among multiple equilibria. He also applies this idea to evolutionary stability to expand the set of evolutionarily viable strategies. However, applying the idea of trembles to the play of the automation selection game is not straightforward. We cannot allow a machine to "misplay mis·play  
n. Sports & Games
A mistaken or unskillful play.

tr.v. mis·played, mis·play·ing, mis·plays
To make a misplay of.

Noun 1.
" during an infinite RPD because there appears to be no sensible way to define such a tremble. If a machine were to change strategies with positive during the game, the change in payoffs could be discontinuous discontinuous /dis·con·tin·u·ous/ (dis?kon-tin´u-us)
1. interrupted; intermittent; marked by breaks.

2. discrete; separate.

3. lacking logical order or coherence.
 because we calculate the payoffs as the limit of the mean. To see this, suppose two GRIM strategies were playing the RPD, and with some arbitrarily small positive probability one changes to ALL D during the game. We know the change will occur in finite time. Hence, the payoffs will switch to the defecting payoff for both machines. Such a discontinuity dis·con·ti·nu·i·ty  
n. pl. dis·con·ti·nu·i·ties
1. Lack of continuity, logical sequence, or cohesion.

2. A break or gap.

3. Geology A surface at which seismic wave velocities change.
 makes this sort of tremble unusable. Instead, I consider trembles in the evolutionary process in these simulations. That is, what happens if the strategies do not breed true. Another way of looking at the problem is to imagine what happens if there are perturbations to the payoff matrix.

These trembles point out another difference between these simulations and the work by Abreu/Rubinstein and Binmore/Samuelson. In this analysis, there is no such thing as an unused state. That is, because mutant strategies can appear over time, the idea of unused states means nothing here. This goes back to the fact that in other evolutionary models only one strategy or a mix of strategies enters at a time. Or, perhaps more precisely, we only analyze how a strategy or mix of strategies does against a potential mix of invaders. Without the aid of computer simulations it is very difficult to capture the interactions between strategies as well as the impact of a certain mutation scheme.

It is also interesting to note that if the penalty for complexity is sufficiently small, the "nice" strategies will remain immune to successful invasion. It is not until the cost of complexity gets sufficiently large that it affects the qualitative results of the evolutionary simulation. We can think of complexity as being the cost of monitoring your opponent and responding accordingly. If the cost of monitoring actions increases enough, the defecting outcome will prevail.

Finally, I will comment on the obvious success of the GRIM strategy in these simulations. We must keep in mind we considered only two-state machines. This is a severe limitations on the complexity we allow. In fact, a strategy essentially identical to GRIM was submitted to Axelrod's RPD tournament and finished fifty-second out of sixty-three strategies. However, the success of GRIM in this tournament captures something which has frequently been overlooked in this sort of model. For a strategy to survive over time, it should be able to exploit possible bad strategies. The idea that TFT should be in some sense the "best" strategy in an infinite RPD has the significant failing that TFT doesn't do strictly better than any strategy it meets. It seems reasonable to expect a strategy which succeeds over time should be able to take advantage of bad players who appear.

VI. Summary

This essay reports the results of a number of computer simulations of infinite RPDs. The results are somewhat surprising because the GRIM strategy is clearly the most successful of the cooperative strategies. I attribute this to the fact that GRIM is able to both exploit poor strategies and earn the cooperative payoff against other "nice" strategies. The results do not suggest GRIM would be the "best" strategy in all environments.

This work supports various results of Binmore and Samuelson's recent work. That is, the idea of a PPMESS seems valid in evolutionary models. The differences in our results stem from the fact that the strategies is my simulations are subject to repeated perturbations, and they analyze a game which is more tranquil TRANQUIL - 1966. ALGOL-like language with sets and other extensions, for the Illiac IV. "TRANQUIL: A Language for an Array Processing Computer", N.E. Abel et al, Proc SJCC 34 (1969).  since they look at what strategies can invade a population if the intruders appear only once either one at a time or in certain mixes. The reason for performing these simulations is to capture what happens when many things are happening at once. Analyzing these problems analytically is intractable intractable /in·trac·ta·ble/ (in-trak´tah-b'l) resistant to cure, relief, or control.

in·trac·ta·ble
adj.
1. Difficult to manage or govern; stubborn.

2.
 because of the complex interrelationships among the strategies and the dynamic process. We are able to see how all of these forces affect the final population under these specific conditions through the use of computer simulations. (1)Robert Aumann [2] provides a proof and further discussion. (2)Maynard Smith [14] offers an excellent discussion of this point. (3)The idea is similar to Selten's concept of perfect equilibrium. For an excellent discussion, see Samuelson [19]. (4)The GRIM strategy begins by cooperating, but if either player defects it defects forever. (5)By this they mean the sum of the payoff is maximized. (6)A payoff dominant equilibrium outcome is one in which both players obtain a higher payoff than they would get in any other equilibrium outcome. Obviously these do not always exist. (7)Specifically, he considered error rates of 1 percent and 5 percent. (8)Linster [13] discusses Axelrod's prisoners' dilemma tournament with alternate payoff schemes. (9)I follow Binmore and Samuelson [6] and call this "UTIL." (10)Axelrod [3] referred to strategies which are not the first to defect as "nice." (11)By "surviving" I mean they were represented in the population by a proportion greater than [10.sup.10]. (12)Here I refer to those strategies which survived in a proportion at least three times the mutation rates as doing well.

References

[1]Abreu, Dilip and Ariel Rubinstein, "The Structure of Nash Equilibrium in Repeated Games with Finite Automata." Econometrica, November 1988, 1259-81. [2]Aumann, Robert J., "Survey of Repeated Games," in Essays in Game Theory and Mathematical Economics Mathematical economics refers to the application of mathematical methods to represent economic theory or analyze problems posed in economics. Expositors maintain that it allows formulation and derivation of key relationships in the theory with clarity, generality, rigor, and  in Honor of Oskar Morgenstern Oskar Morgenstern (January 24, 1902 - July 26, 1977) was a German born Austrian economist who, working with John von Neumann, helped found the mathematical field of game theory (see Neumann-Morgenstern utility).

Morgenstern was born in Görlitz, Germany.
, symposium symposium

In ancient Greece, an aristocratic banquet at which men met to discuss philosophical and political issues and recite poetry. It began as a warrior feast. Rooms were designed specifically for the proceedings.
 volume. Mannheim: Bibliographisches Institut The German publishing company "Bibliographisches Institut" was founded 1826 in Gotha by Joseph Meyer, moved 1828 to Hildburghausen and 1874 to Leipzig. Its production over the years includes such well-known titles as "Meyers Lexikon" , 1981, pp. 11-42. [3]Axelrod, Robert. The Evolution of Cooperation. New York New York, state, United States
New York, Middle Atlantic state of the United States. It is bordered by Vermont, Massachusetts, Connecticut, and the Atlantic Ocean (E), New Jersey and Pennsylvania (S), Lakes Erie and Ontario and the Canadian province of
: Basic Books, 1984. [4]Banks, Jeffrey and Rangarajan Sundaram, "Repeated Games, Finite Automata, and Complexity." Games and Economic Behavior Games and Economic Behavior (GEB) is a journal of game theory published by Elsevier.[1] First published in 1989, it is considered to be the leading journal of game theory and one of the top journals in economics.  June 1990, 97-117. [5]Binmore, Kenneth G. and Larry Samuelson. "Notes on Evolutionary Stable Strategies in Repeated Games Played Games played (most often abbreviated as G or GP) is a statistic used in team sports to indicate the total number of games in which a player has participated (in any capacity); the statistic is generally applied irrespective of whatever portion of the game is contested.  by Finite Automata." Unpublished manuscript, The University of Michigan (body, education) University of Michigan - A large cosmopolitan university in the Midwest USA. Over 50000 students are enrolled at the University of Michigan's three campuses. The students come from 50 states and over 100 foreign countries. , 1989. [6]-- and --. "Evolutionary Stability in Repeated Games Played by Finite Automata." Unpublished manuscript, The University of Michigan, 1990. [7]Boyd, Robert and Jeffrey Lorberbaum, "No Pure Strategy is Evolutionary Stable in the Repeated Prisoner's Dilemma Game." Nature. May 1987, 58-59. [8]Farrell, Joseph and Roger Ware, "Evolutionary Stability in the Repeated Prisoner's Dilemma." Theoretical Population Biology Population biology is a study of biological populations of organisms, especially in terms of biodiversity, evolution, and environmental biology.

Malthus can almost be considered an early population biologist, even though his training was in economics and the term population
. 1989, 61-66. [9]Fudenberg, Drew and Eric Maskin. "Evolution and Cooperation in Noisy Noisy is the name or part of the name of six communes of France:
  • Noisy-le-Grand in the Seine-Saint-Denis département
  • Noisy-le-Roi in the Yvelines département
  • Noisy-le-Sec in the Seine-Saint-Denis département
 Repeated Games." Harvard Institute for Economic Research Discussion Paper 1469, Harvard University Harvard University, mainly at Cambridge, Mass., including Harvard College, the oldest American college. Harvard College


Harvard College, originally for men, was founded in 1636 with a grant from the General Court of the Massachusetts Bay Colony.
, 1990. [10]Harsanyi, John. Rational Behavior and Bargaining Equilibrium in Games and Social Situations. Cambridge: Cambridge University Press Cambridge University Press (known colloquially as CUP) is a publisher given a Royal Charter by Henry VIII in 1534, and one of the two privileged presses (the other being Oxford University Press). , 1977. [11]-- and Reinhard Selten. A General Theory of Equilibrium Selection Equilibrium selection is a concept from game theory which seeks to address reasons for players of a game to select a certain equilibrium over another. The concept is especially relevant in evolutionary game theory, where the different methods of equilibrium selection respond to  in Games. Cambridge, Mass.: MIT MIT - Massachusetts Institute of Technology  Press, 1989. [12]Kim, Yong-Gwan, "Evolutionary Stable Strategies in the Repeated Prisoner's Dilemma." Unpublished manuscript, The University of California, San Diego UCSD is consistently ranked among the top ten public universities for undergraduate education in the United States by U.S. News & World Report.[3] It is a Public Ivy. [1] For graduate studies, most of UCSD's Ph.D. , 1989. [13]Linster, Bruce G. "Essays on Cooperation and Competition." Ph.D. Dissertation dis·ser·ta·tion  
n.
A lengthy, formal treatise, especially one written by a candidate for the doctoral degree at a university; a thesis.


dissertation
Noun

1.
, The University of Michigan, 1990. [14]Maynard Smith, John. Evolution and the Theory of Games. Cambridge: Cambridge University Press. 1982. [15]--and George R. Price George R. Price (1922 – January 6, 1975) was an American population geneticist. Originally a physical chemist and later a science journalist, he moved to London in 1967, where he worked in theoretical biology at the Galton Laboratory, making three important contributions: , "The Logic of Animal Conflict." Nature, November 1973, 15-18. [16]Miller, John. "The Evolution of Automata in the Repeated Prisoner's Dilemma." Unpublished manuscript. The University of Michigan, 1987. [17]Neyman, Abraham, "Bounded Complexity Justifies Cooperation in the Finitely Repeated Prisoner's Dilemma." Economic Letters, 1985, 227-29. [18]Radner, Roy. "Can Bounded Rationality Resolve the Prisoner's Dilemma?", in Essays in Honor of General Debreu. New York: North-Holland, 1986. [19]Robson, Arthur J. "Efficiency in Evolutionary Games: Darwin, Nash, and the Secret Handshake." University of Western Ontario Western is one of Canada's leading universities, ranked #1 in the Globe and Mail University Report Card 2005 for overall quality of education.[2] It ranked #3 among medical-doctoral level universities according to Maclean's Magazine 2005 University Rankings.  Department of Economics Working Paper, University of Western Ontario, 1989. [20]Selten, Reinhard, "Evolutionary Stability in Extensive Two-Person Games." Mathematical Social Sciences, 1983, 269-363.
COPYRIGHT 1992 Southern Economic Association
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 1992, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

 Reader Opinion

Title:

Comment:



 

Article Details
Printer friendly Cite/link Email Feedback
Author:Linster, Bruce G.
Publication:Southern Economic Journal
Date:Apr 1, 1992
Words:10655
Previous Article:The eleven principles of economics. (Kenneth G. Elzinga's presidential address evaluating Paul Samuelson's book entitled Economics) (Transcript)
Next Article:On the privatization of excludable public goods.
Topics:



Related Articles
Beyond chaos: ultimate unpredictability.
Cooperation evolves in computer tourney.
Recent Developments in Game Theory.
Return of the group.
Game Theory and the Social Contract, Vol. 1, Playing Fair.
Evolutionary Game Theory.
Coordination.(Teaching Economics with Classroom Experiments: A Symposium)(coordination problems in economic games)
Quantum Games.(game theory applied to quantum mechanics)
The evolution of fuzzy rules as strategies in two-player games.
Generous players: game theory explores the Golden Rule's place in biology.(prisoner's dilemma)

Terms of use | Copyright © 2009 Farlex, Inc. | Feedback | For webmasters | Submit articles