# Can subgame perfect equilibrium threats foster cooperation? An experimental test of finite-horizon folk theorems.

I. INTRODUCTION

In this paper, we study the effect of equilibrium punishment threats on cooperation in a finitely repeated prisoners' dilemma (PD) game. To this end, we extend the standard PD stage game, with its strategies "cooperate" and "defect," by an additional strategy, as in the studies by Schwartz, Young, and Zvinakis (2000) and Feinberg and Snyder (2002). Mutual play of this strategy constitutes a second, payoff-dominated equilibrium in the stage game. Given this extension and an adequate choice of payoff parameters, we prove a folk theorem in the spirit of Benoit and Krishna (1985), according to which cooperative subgame perfect outcomes are possible despite the finite horizon of interaction. Without the extension, in contrast, backward induction predicts universal defection as the unique subgame perfect equilibrium of the finitely repeated game. Given these theoretical arguments, we expect cooperation rates in the extended game to be higher than in the standard game.

In addition to introducing equilibrium punishment, we also vary its strategic stability. Depending on whether the additional strategy is weakly dominated or undominated, the additional stage game equilibrium is either weak or strict. While the folk theorem predicts higher cooperation rates for both extended games, a refinement concept (called strictly perfect or proper equilibrium) predicts higher cooperation rates only for the strict case. We test these competing theoretical predictions experimentally to answer the following research question: in a finitely repeated PD game, is it sufficient to introduce an equilibrium punishment threat to increase cooperation, or is it necessary for the punishment threat to be strictly self-enforcing?

Subjects in our experiment play the supergames repeatedly, similar to Selten and Stoecker (1986) and Bereby-Meyer and Roth (2006). In a between-subjects design we explore six distinct treatments. Treatments differ in the type of stage game (standard PD, weak game, or strict game) and in the time horizon of interaction with the same partner (long or short). In the long horizon treatments, subjects are rematched once after 16 rounds of play so that the incentives to cooperate are large. In the short horizon treatments, subjects play eight supergames with four rounds each. The horizon variation serves as a robustness check.

VERA ANGELOVA, LISA V. BRUTTEL, WERNER GÜTH and ULRICH KAMECKE*

*We gratefully acknowledge the helpful comments of two anonymous referees.

Angelova: Research Fellow, Max Planck Institute of Economics, Strategic Interaction Group, Kahlaische Str. 10, 07745 Jena, Germany. Phone +49 (0)3641 686 637, Fax +49 (0)3641 686 667, E-mail popova@econ.mpg.de

Bruttel: Assistant Professor for Behavioral Economics, Department of Economics, University of Konstanz, Box 131, 78457 Konstanz, Germany. Phone +49 (0)7531 88 3214, Fax +49 (0)7531 88 2145, E-mail lisa.bruttel@uni-konstanz.de

Güth: Director of the Strategic Interaction Group, Max Planck Institute of Economics, Kahlaische Str. 10, 07745 Jena, Germany. Phone +49 (0)3641 686 620, Fax +49 (0)3641 686 667, E-mail gueth@econ.mpg.de

Kamecke: Professor for Competition Policy, Department of Business and Economics, Humboldt-University Berlin, Spandauer Str. 1, 10099 Berlin. Phone +49 (0)30 2093 5895, Fax +49 (0)30 2093 5787, E-mail kamecke@wiwi.hu-berlin.de

ABBREVIATIONS
EPD: Extended Prisoners' Dilemma
ORSEE: Online Recruitment System for Economic Experiments
PD: Prisoners' Dilemma

doi: 10.1111/j.1465-7295.2011.00421.x

Our results support the prediction of the proper equilibrium. The strict extension generates a significant increase in cooperation rates, while behavior in the weak extension is indistinguishable from behavior in the standard PD game. These results are stable across horizons.

Our paper relates to the literature on the effect of punishment on cooperation. Generally, voluntary cooperation can be enhanced by nonequilibrium punishment or equilibrium punishment. In both cases punishment is costly to both the punishing and the punished party. In studies with nonequilibrium punishment, as described by Fehr and Gächter (2000) or Ostrom, Walker, and Gardner (1992), parties can impose damages on each other after interacting, but they would never do so if they were rational selfish agents. Nevertheless, these damages seem effective in increasing cooperation as long as they hurt the punisher less than the punished. (1) With equilibrium punishment, parties do not have to cause damages to each other: the mere presence of the punishment possibility (i.e., the threat) suffices to stabilize cooperation. The difference between the two types of punishment is that equilibrium punishment can increase cooperation without dropping the assumption of common opportunism (i.e., that individuals care only for monetary payoffs), whereas explaining nonequilibrium punishment requires giving up this assumption. We focus on equilibrium punishment. Here, the question is whether punishment is a best reply to others' behavior and whether it suffices to discourage deviation from equilibrium play.

The remainder of the paper is organized as follows. In Section II, we describe the stage games and provide theoretical and behavioral predictions about behavior in the repeated games. The experimental protocol is described in Section III. Section IV presents the main findings and Section V concludes.

II. MODEL ANALYSIS

A. The Stage Games

Let i = 1, 2 denote the players in the one-shot game. In the baseline treatment PD each player has two actions, C ("cooperate") and D ("defect"). In the two other treatments, they additionally have action A ("avoid"). (2) A pair of actions is denoted by a = (a_1, a_2), where the action of player 1 is listed first and that of player 2 second. We distinguish three symmetric payoff matrices of the one-shot interaction: a standard PD game, an extended PD game with a strict additional equilibrium (EPD_s), and an extended PD game with a weak additional equilibrium (EPD_w).

PD (player 1 chooses the row, player 2 the column; each cell lists the payoffs of players 1 and 2):

|   | C      | D     |
|---|--------|-------|
| C | 18, 18 | 0, 21 |
| D | 21, 0  | 9, 9  |

EPD_s:

|   | C      | D     | A    |
|---|--------|-------|------|
| C | 18, 18 | 0, 21 | 0, 0 |
| D | 21, 0  | 9, 9  | 0, 0 |
| A | 0, 0   | 0, 0  | 3, 3 |

and EPD_w:

|   | C      | D     | A    |
|---|--------|-------|------|
| C | 18, 18 | 0, 21 | 3, 3 |
| D | 21, 0  | 9, 9  | 3, 3 |
| A | 3, 3   | 3, 3  | 3, 3 |

The only pure strategy (3) equilibria of the extended stage games are (D, D) and (A, A). (4) However, (A, A) is an equilibrium in weakly dominated actions in EPD_w, whereas it is strict, and therefore in undominated actions, in EPD_s. (5)
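These stage game properties can be verified mechanically. The following sketch (our own helper code; only the payoff numbers are taken from the matrices above) enumerates the pure strategy equilibria and checks the dominance claims:

```python
from itertools import product

ACTIONS = ['C', 'D', 'A']
# payoff[a1][a2] = (payoff of player 1, payoff of player 2)
EPD_s = {'C': {'C': (18, 18), 'D': (0, 21), 'A': (0, 0)},
         'D': {'C': (21, 0),  'D': (9, 9),  'A': (0, 0)},
         'A': {'C': (0, 0),   'D': (0, 0),  'A': (3, 3)}}
EPD_w = {'C': {'C': (18, 18), 'D': (0, 21), 'A': (3, 3)},
         'D': {'C': (21, 0),  'D': (9, 9),  'A': (3, 3)},
         'A': {'C': (3, 3),   'D': (3, 3),  'A': (3, 3)}}

def pure_equilibria(game):
    """Map each pure strategy Nash equilibrium to True iff it is strict."""
    eqs = {}
    for a1, a2 in product(ACTIONS, ACTIONS):
        u1, u2 = game[a1][a2]
        dev1 = max(game[b][a2][0] for b in ACTIONS if b != a1)
        dev2 = max(game[a1][b][1] for b in ACTIONS if b != a2)
        if u1 >= dev1 and u2 >= dev2:
            eqs[(a1, a2)] = u1 > dev1 and u2 > dev2
    return eqs

def weakly_dominated(game, player, action):
    """True iff `action` is weakly dominated by another pure action of `player`."""
    def u(own, other):
        return game[own][other][0] if player == 0 else game[other][own][1]
    return any(all(u(b, o) >= u(action, o) for o in ACTIONS) and
               any(u(b, o) > u(action, o) for o in ACTIONS)
               for b in ACTIONS if b != action)

# (D, D) and (A, A) are the only pure equilibria; (A, A) is strict only in EPD_s.
assert pure_equilibria(EPD_s) == {('D', 'D'): True, ('A', 'A'): True}
assert pure_equilibria(EPD_w) == {('D', 'D'): True, ('A', 'A'): False}
# A is weakly dominated (by D) in EPD_w but undominated in EPD_s.
assert weakly_dominated(EPD_w, 0, 'A') and not weakly_dominated(EPD_s, 0, 'A')
```

The same check confirms that (D, D) is strict in both extended games, so the treatments differ only in the status of (A, A).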

Correspondingly, we experimentally distinguish between W-treatments, where subjects play EPD_w, and S-treatments, where they play EPD_s, repeatedly with the same partner. S-treatments capture situations where the alternative payoff of 3 can only be obtained by both players coordinating on "avoid." W-treatments capture situations where 3 is the conflict payoff resulting when at least one party uses "avoid," that is, where the "avoid" outcome does not require coordination. A common feature of both repeated games is that the threat to continue with the payoff (3, 3) instead of (9, 9) can discourage myopically profitable deviations from mutual cooperation.

B. Subgame Perfect Equilibrium Outcomes in the Repeated Games

Let T ≥ 2 denote the number of rounds of repeated play of either EPD_w or EPD_s. In each game, we observe histories h_t ∈ H_t up to round t (a vector of length 2 × (t − 1) which, assuming appropriate information feedback between rounds, specifies all previous actions of the two players). A behavioral strategy profile a : H → {C, D, A} × {C, D, A} specifies actions a(h_t) = (a_1, a_2)(h_t) for all histories h_t of all rounds t.

Constant play of actions (A, A) or (D, D) is obviously a subgame perfect equilibrium outcome of both repeated games, EPD_s and EPD_w. Another subgame perfect equilibrium is the (grim) strategy constellation (a^grim, a^grim) for T-supergames with

(1)  a_i^grim(h_t) = C  if t < T and h_t = ((C, C), ..., (C, C)),
                     D  if t = T and h_t = ((C, C), ..., (C, C)),
                     A  otherwise.

That is, players cooperate in all rounds except the last one and defect in the last round, as long as both have cooperated. Otherwise they switch to A and keep playing it until the end of the game. The proof is straightforward. After playing (C, C) in all rounds τ < T, it does not pay to deviate unilaterally from (a^grim, a^grim) in the last round t = T, as (D, D) is a strict equilibrium of the one-shot interaction. Similarly, deviating from (a^grim, a^grim) in round t < T after some violation of "(C, C) in all τ < t" does not pay, as constant play of (A, A) is a subgame perfect equilibrium of the supergame. Deviating unilaterally from (a^grim, a^grim) in round t after "(C, C) in all τ < t" does not pay because the highest amount a player can gain from such a deviation is an additional payoff of 3, which results in a periodic payoff of 3 rather than 18 or 9 in all later rounds. Even in the case t = T − 1, the additional gain of 3 in round T − 1 would cost 6 in round T. (6) Thus, (a^grim, a^grim) is a subgame perfect equilibrium of the T ≥ 2 supergame with stage game EPD_w and EPD_s, respectively.
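The no-deviation argument can also be checked by brute force. The sketch below (our own code; the helper names are ours) simulates a single unilateral deviation by player 1 at every round along the grim path, for both stage games and several horizons, and confirms that no such deviation pays:

```python
from itertools import product

# Stage game payoffs: payoff[a1][a2] = (payoff of player 1, payoff of player 2).
EPD_s = {'C': {'C': (18, 18), 'D': (0, 21), 'A': (0, 0)},
         'D': {'C': (21, 0),  'D': (9, 9),  'A': (0, 0)},
         'A': {'C': (0, 0),   'D': (0, 0),  'A': (3, 3)}}
EPD_w = {'C': {'C': (18, 18), 'D': (0, 21), 'A': (3, 3)},
         'D': {'C': (21, 0),  'D': (9, 9),  'A': (3, 3)},
         'A': {'C': (3, 3),   'D': (3, 3),  'A': (3, 3)}}

def grim(history, t, T):
    """Grim action: C before round T, D in round T, A after any violation of (C, C)."""
    clean = all(pair == ('C', 'C') for pair in history)
    if clean:
        return 'C' if t < T else 'D'
    return 'A'

def play(game, T, dev_round=None, dev_action=None):
    """Total payoffs when both follow grim, except one deviation by player 1."""
    history, total = [], [0, 0]
    for t in range(1, T + 1):
        a1 = a2 = grim(history, t, T)
        if t == dev_round:
            a1 = dev_action
        u1, u2 = game[a1][a2]
        total[0] += u1
        total[1] += u2
        history.append((a1, a2))
    return total

for game in (EPD_s, EPD_w):
    for T in (2, 4, 16):
        on_path = play(game, T)[0]  # equals (T - 1) * 18 + 9
        for t, d in product(range(1, T + 1), 'CDA'):
            assert play(game, T, t, d)[0] <= on_path  # no single deviation pays
```

For example, with T = 4 in EPD_w, defecting in round 3 yields 18 + 18 + 21 + 3 = 60 against 63 on the equilibrium path.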

In the first section of Appendix 1, we show that the same argument justifies a large number of equilibrium outcomes. In rounds preceding outcome (D, D), the equilibrium payoffs are only restricted by two conditions: feasibility and individual rationality (i.e., over the r = T − t rounds still to be played, each player must be guaranteed the maximin payoffs r · (3, 3)). This condition imposes a restriction only on the occurrence of the outcomes (0, 0), (21, 0), and (0, 21), so that we can conclude:

Folk Theorem-Like Result: For T → ∞, the set of average payoffs in a subgame perfect equilibrium of the finite supergames with T (< ∞) commonly known rounds of play and stage game EPD_w or EPD_s converges to a dense subset of the individually rational attainable average payoffs:

(2)  Ω = {(π_1, π_2) | (π_1, π_2) ≥ (3, 3) and (π_1, π_2) ∈ conv((3, 3), (21, 0), (18, 18), (0, 21))}.

In particular, the subgame perfect equilibrium strategies (a^grim, a^grim) predict that both players "cooperate" in all rounds except the last, when they both "defect," so that the average periodic payoffs converge to (18, 18) as T → ∞.

C. Refining Rationality

To equilibrate cooperation, players have to condition their equilibrium continuation on the history of the game; that is, they must threaten to continue the game with (A, A) instead of (D, D) after some "unwarranted" experience. Such reactions, used to prove the folk theorem, are eliminated by a simple dominance argument in the last round of EPD_w but not of EPD_s. Using this dominance argument iteratively eliminates all but the defective strategies in EPD_w, because all strategies employing C or A after some history h_t are dominated by a strategy using a_i(h_t) = D once all strategies satisfy a_i(h_τ) = D for all τ > t.
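The stage game analogue of this elimination argument is easy to verify. In the sketch below (our own code; it removes weakly dominated pure actions in repeated passes, which is one of several possible elimination orders), EPD_w collapses to defection alone, while in EPD_s both D and A survive:

```python
# Stage games as before: payoff[a1][a2] = (payoff of player 1, payoff of player 2).
EPD_s = {'C': {'C': (18, 18), 'D': (0, 21), 'A': (0, 0)},
         'D': {'C': (21, 0),  'D': (9, 9),  'A': (0, 0)},
         'A': {'C': (0, 0),   'D': (0, 0),  'A': (3, 3)}}
EPD_w = {'C': {'C': (18, 18), 'D': (0, 21), 'A': (3, 3)},
         'D': {'C': (21, 0),  'D': (9, 9),  'A': (3, 3)},
         'A': {'C': (3, 3),   'D': (3, 3),  'A': (3, 3)}}

def iterate_weak_dominance(game):
    """Repeatedly drop pure actions weakly dominated against the surviving set."""
    acts = {0: {'C', 'D', 'A'}, 1: {'C', 'D', 'A'}}
    def u(player, own, other):
        return game[own][other][0] if player == 0 else game[other][own][1]
    changed = True
    while changed:
        changed = False
        for i in (0, 1):
            opponent = acts[1 - i]
            for a in list(acts[i]):
                dominated = any(
                    all(u(i, b, o) >= u(i, a, o) for o in opponent) and
                    any(u(i, b, o) > u(i, a, o) for o in opponent)
                    for b in acts[i] if b != a)
                if dominated:
                    acts[i].discard(a)
                    changed = True
    return acts

assert iterate_weak_dominance(EPD_w) == {0: {'D'}, 1: {'D'}}            # only defection
assert iterate_weak_dominance(EPD_s) == {0: {'D', 'A'}, 1: {'D', 'A'}}  # A survives
```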

However, the result of iterated elimination of dominated strategies may depend on the order in which the strategies are eliminated. (7) We can, however, use a backward elimination argument to prove two alternative results for EPD_w which do not hold for EPD_s. In the second section of Appendix 1, we show that "always defect" is the unique strictly perfect and the unique proper equilibrium (8) of the repeated EPD_w. These game theoretic arguments suggest less cooperation in EPD_w than in EPD_s.

Strong Rationality Refinement: The set of strictly perfect equilibrium outcomes as well as the set of proper equilibrium outcomes satisfy the "folk theorem-like result" above in a repeated EPD_s, while they both contain only defective play, a_i(h_t) = D for all h_t ∈ H_t, in a repeated EPD_w.

D. Behavioral Predictions

Theoretically we established two results. According to the folk theorem, both extended games will trigger higher cooperation than the standard PD game. According to the strong rationality refinement, only the strict game will increase cooperation rates. In this section we provide an intuitive explanation for the latter result. After that we discuss our behavioral expectation about the usage of the additional action A.

Exploiting the other by playing D can have more serious consequences for the exploiter in the strict than in the weak game. If the other chooses to punish the exploiter by playing A, the punishment in the weak game leads to a minimal payoff of 3, regardless of whether both manage to coordinate on mutual play of A or not. In the strict game, however, obtaining the payoff of 3 requires coordination. Unilateral use of A leads to a payoff of 0 to both. Hence, provoking a punishment and failing to coordinate in the strict game is more costly than in the weak game for the defector but also for the punisher. If subjects realize the former but not the latter, defections will be less frequent in the strict game. If subjects realize the latter, defections will be less frequent in the weak game. Which reasoning will prevail? Most probably the former because the latter requires higher levels of deliberation (Nagel 1995).

Moreover, in the strict game it appears to be much harder to coordinate on mutual cooperation once the partner plays A to punish defection. This is because choosing C unilaterally while the other is still playing A is more costly in the strict than in the weak game. And this is an implication of the fact that A is self-enforcing in the strict but not in the weak game. Anticipating these extra costs, subjects in the strict game are more likely to abstain from defecting in the first place.

Theoretically, A is expected to be used only off the equilibrium path. Therefore, we expect subjects to choose A rarely. Even without observing A being played, we can still measure its effectiveness as the difference in cooperation rates between each extended game and the standard PD. In the rare cases in which we do expect to see A, we can only speculate that its purpose will be to punish defection. It is worth exploring the reaction to such a sanction.

III. EXPERIMENTAL DESIGN AND PROCEDURES

In a between-subjects design, we study the six treatments 16PD, 16W, 16S, 4PD, 4W, and 4S, which differ as to

* whether the stage game is PD, EPD_w (W), or EPD_s (S) and

* whether the number T of rounds is T = 16 or T = 4.

Subjects play 32 rounds of either PD, EPD_w, or EPD_s. In 16PD, 16W, and 16S they play two supergames of 16 rounds each, with two different partners. In 4PD, 4W, and 4S they play eight successive supergames with four rounds each. (9) We use a random strangers design: subjects form matching groups of four; each subject is matched with another subject from her/his own matching group for the next supergame. We guarantee that no subject faces the same other subject in two successive supergames. We performed two sessions per treatment with 32 subjects per session. As one matching group of four subjects corresponds to one independent observation, we obtained 16 independent observations per treatment. In total, we recruited 384 undergraduate students from the University of Jena, using the online recruitment system for economic experiments (ORSEE) (Greiner 2004). On average, subjects earned 8.80 euros and spent 1 hour (15 minutes thereof on the instructive part) in the laboratory of the Max Planck Institute of Economics in Jena, Germany.

Upon arrival in the laboratory, subjects were randomly assigned to a cubicle, where they individually read the instructions. (10) After the instructions were also read aloud, subjects were able to familiarize themselves with the experiment during two or three (depending on the treatment) simulated test rounds. (11) Subsequently, they answered a questionnaire so that we could check their understanding of the game rules. After that, they participated in the computerized (12) experiment. During the experiment eye contact was not possible. Although subjects saw each other at the entrance to the lab, there was no way for them to guess with which person(s) from the crowd of 32 students they would be matched later on. Most subjects already had some previous experience with other experiments: more than 90% of the subjects had previously participated in at least one experiment.

We introduced two different time horizons into our experiment. Backward induction predicts the same cooperation rates in the PD game independent of the number of repetitions. However, previous experiments demonstrate that in long repeated PD games subjects play more cooperatively than in short ones (Dal Bó 2005). Any difference between the simple and the extended PD game and between the two extended games might disappear when varying the horizon. The supergames with the shorter time horizon therefore provide a robustness check for our results.

IV. RESULTS

A. Cooperation Rates across Treatments

Average cooperation rates (13) by treatment are depicted in Figure 1. For both the long and the short horizon, cooperation rates increase from PD over W to S. However, only the strict game triggers significantly more cooperation compared to the standard PD, while the difference between the weak game and the standard PD is not statistically significant. Apparently, multiplicity of equilibria alone does not lead to more cooperation. Across horizons, for all game types (PD, W, S), cooperation is significantly higher for the longer horizon, confirming the results of Dal Bó (2005). The results of all pairwise comparisons of treatment means using a Wilcoxon rank-sum test are summarized in Table 1.

Figure 2 captures the evolution of cooperation over time in supergames with 16 rounds.

In round 17, subjects were assigned to a new partner (note the usual end game effect at the end of each supergame). The figure shows that the results above hold not only at the aggregate level but also for most rounds. When performing Wilcoxon rank-sum tests based on pairwise comparisons between treatments for each round, in almost all rounds cooperation rates in the strict game with the long horizon lie above those in the other two games (Appendix Table A1). Cooperation rates in 16PD and 16W are never significantly different. In the short games, results are very similar, except that here we observe a significant difference between the strict game and the other two games even in the last round of interaction with the same partner (see Figure 3 and Appendix Table A1).

Result 1 An additional strict equilibrium significantly increases cooperation rates compared to the prisoner's dilemma game.

In our view, this is not a very surprising result. Although participants may not reason as presupposed by the subgame perfect equilibria underlying the folk theorem, they seem to understand qualitatively the preventive effect of the A-option. Thus, what is far more surprising is that this prevention only "works" when (A, A) is a strict equilibrium of the base game.

Result 2 An additional weak equilibrium does not increase cooperation rates compared to the prisoner's dilemma game.

Result 1 is in line with the theoretical prediction of the "strong rationality refinement," that is, the additional threat only increases the willingness to cooperate if it is strictly self-enforcing. The surprising finding that an additional weak equilibrium of the stage game does not change behavior compared to the PD game further suggests that a punishment option A whose mutual use does not qualify as a strict equilibrium is not very "preventive."

Table 2 reveals, via transition probabilities between the different strategy profiles, two sources of higher cooperation rates in the treatments with an additional strict equilibrium. On the right-hand side, one can see the transition frequency from outcome (D, D) in round t to each outcome (except for outcomes including action A) in t + 1. Given that outcome (D, D) is prominent (compare the left-hand side of Table 2), reactions to it are important. First, (D, D) becomes less likely to be followed by (D, D) when passing from PD over W to S. Second, the percentage of players willing to unilaterally cooperate after a mutual defection is highest in the strict treatments.

B. Actual Use of the Additional Action A

The main purpose of action A is to discourage deviations from cooperation, as the additional action is predicted to occur mainly off the equilibrium path. (14) Hence, its actual use should be rare or at least become rare with experience. A's relative frequency is less than 3% in both the strict and the weak games (independent of the horizon), supporting the idea that action A is not supposed to be used.

Table 3 illustrates when players use action A and how they react when their partner has used A in the previous round. (15) A is mostly selected after the other has played D in the previous round, that is, after the outcomes (C, D), (D, D), or (A, D). Some subjects choose A after having played D themselves while the other played C, that is, after the outcome (D, C), probably in response to an expected punishment by the other player. Option A seems to be an (actual and expected) punishing device.

Do "punished" subjects become more cooperative in the next round? In fact, only a few start cooperating. Most subjects continue playing D. It seems the use of action A is ineffective in coordinating subjects on (C, C) in the subsequent round.

It is worth mentioning that the frequencies observed in Table 3 are due to just a few subjects who repeatedly use A. For example, in 16S only 10 out of 64 subjects employ A, and in 16W this number is 18.

Finally, the theoretical difference between [EPD.sub.s] and [EPD.sub.w] is based on the robustness of the stage game equilibrium (A, A). The frequency of this equilibrium is, however, very low. (A, A) appears once in 16S and twice in 16W. So (A, A) seems to be a coincidence rather than a systematic choice. The results are again very similar for the supergames with short horizon.

Result 3 The additional action A is seldom used, mainly to punish defection in the previous round. Subsequent to this punishment, however, only a few partners switch to C.

V. CONCLUSION

Folk theorem results do not require an infinite horizon. Multiple equilibria in the stage game suffice for subgame perfect equilibrium cooperation in all rounds except the commonly known last round of interaction. Refining subgame perfection allows us to distinguish between games with an additional strict versus an additional weak equilibrium of the stage game. Experimentally, we show that only the strict additional equilibrium has a measurable effect on cooperation rates. This result is robust to variations in the time horizon of interaction.

We justified our behavioral prediction of more cooperation in the strict than in the weak game with the intuition that defecting in the strict game may be more costly for the defector than defecting in the weak game. Further, we argued that players at least partly ignore that, for the same reason, punishing in the strict game may be more costly also for the punisher and, thus, less credible. An experimental analysis of the intermediate game with the bimatrix representation below may sharpen our understanding of which of these arguments is the relevant one. In this matrix, A is undominated for player 1 and dominated for player 2. If punishment in the strict game is perceived as more costly only for the defector and therefore more credible, then in the intermediate game player 1 will defect less often than player 2. If, to the contrary, punishment in the strict game is perceived as costly for both the defector and the punisher and therefore less credible, player 1 will defect more often than player 2. We leave this for future research.

The intermediate game (player 1 chooses the row, player 2 the column):

|   | C      | D     | A    |
|---|--------|-------|------|
| C | 18, 18 | 0, 21 | 0, 0 |
| D | 21, 0  | 9, 9  | 0, 0 |
| A | 3, 3   | 3, 3  | 3, 3 |

Finally, let us resume our discussion started in Section I about punishment as a cooperation enhancing device. Our control experiment PD confirms previous findings that there is voluntary cooperation even when punishment cannot be equilibrated. (16) However, when punishment can be equilibrated by an additional strict equilibrium in the base game, voluntary cooperation increases significantly. Our general conclusion is that being able to equilibrate people's reciprocity inclination definitely strengthens their willingness to reciprocate and their anticipation of others' willingness to reciprocate. Both types of punishment, equilibrium and nonequilibrium, rely on reciprocity incentives. However, incentives seem stronger when they are equilibrated by an additional strict equilibrium.

APPENDIX 1: EQUILIBRIUM PREDICTIONS

The Folk Theorem

Let us give up symmetry and explore the payoff space that can be supported by some pure strategy subgame perfect equilibrium, as is usual when establishing folk theorems.

Suppose EPD_w is repeated T times, and let r = T, ..., 1 denote the number of rounds left to be played. In each round both players i select a stage game action a_i ∈ {C, D, A}.

In the last round (r = 1), there are two pure strategy equilibria, (D, D) and (A, A), with additional subgame payoffs (9, 9) and (3, 3), respectively.

In the second last round (r = 2), these continuations are sufficient to discourage deviations from cooperation, so that (C, C) is supported as an equilibrium outcome with continuation (D, D) and the threat to select (A, A) instead after a deviation. For the same reason, the (less interesting) action pairs (D, A) and (A, D) can occur in equilibrium. Thus, the set of payoffs Π_2 = {(27, 27), (18, 18), (12, 12), (6, 6)} can be realized in the remaining two rounds by the corresponding subgame equilibrium strategies. In this round, the coordination failures (C, D) and (D, C) and the action pairs (C, A) and (A, C) are never chosen in equilibrium, because the corresponding potential gains of 9 and 15 after a deviation cannot be compensated by equilibrium retribution in the last round.

One round earlier (r = 3), this is no longer valid, because the threat to continue with (18, 18), (12, 12), or (6, 6) instead of (27, 27), or with (6, 6) instead of (18, 18), discourages such deviations. Similarly, the remaining asymmetric action pairs (C, A) and (A, C) can be stabilized in this round if the players continue with the equilibrium payoff (6, 6) instead of (27, 27). Thus, all nine action combinations may occur in an equilibrium in rounds r ≥ 3 if the feasible continuations are restricted as described earlier. The resulting asymmetric and symmetric additional subgame payoffs are

Π_3 = {(48, 27), (27, 48), (39, 18), (18, 39), (45, 45), (36, 36), (30, 30)} ∪ {(27, 27), (21, 21), (15, 15), (9, 9)}.

The resulting restrictions on the action combinations can be summarized as follows: (17)

THEOREM 1 A combination of actions (a_1, a_2) in round r of the repeated weak extended prisoners' dilemma (EPD) game is compatible with a subgame perfect equilibrium if and only if r ≥ 3, or r = 2 and (a_1, a_2) ∈ {(C, C), (D, D), (D, A), (A, D), (A, A)}, or r = 1 and (a_1, a_2) ∈ {(D, D), (A, A)}.

Also in earlier rounds (r = 4, 5, ...), all nine action combinations are allowed by the equilibrium strategies. The corresponding set of equilibrium payoffs is generated by adding the potential stage payoffs {(18, 18), (21, 0), (0, 21), (9, 9), (3, 3)} to the set of equilibrium payoffs of the following round whenever the difference to the strongest potential punishment is sufficient to deter deviations. To construct the equilibrium payoffs in round r, we therefore have to identify the subgame equilibrium payoffs in round r − 1 that allow punishing both players by at least 3 (to reach (18, 18) in round r), those that allow punishing the second player by at least 9 (to reach (21, 0) in round r), and those that allow punishing the first player by at least 9 (to reach (0, 21) in round r). These three sets are defined by

Π^1_(r−1) = {π ∈ Π_(r−1) | π_1 − min_(π′ ∈ Π_(r−1)) π′_1 ≥ 3 and π_2 − min_(π′ ∈ Π_(r−1)) π′_2 ≥ 3},
Π^2_(r−1) = {π ∈ Π_(r−1) | π_2 − min_(π′ ∈ Π_(r−1)) π′_2 ≥ 9},
Π^3_(r−1) = {π ∈ Π_(r−1) | π_1 − min_(π′ ∈ Π_(r−1)) π′_1 ≥ 9}.

In order to obtain the set of equilibrium payoffs in round r we finally add the corresponding equilibrium continuation payoffs as follows:

Π_r = (Π_(r−1) + {(9, 9), (3, 3)}) ∪ (Π^1_(r−1) + {(18, 18)}) ∪ (Π^2_(r−1) + {(21, 0)}) ∪ (Π^3_(r−1) + {(0, 21)}).

For r = 4 (after eliminating duplicate elements) this yields the set of possible additional subgame payoffs

Π_4 = {(57, 36), (51, 30), (36, 57), (30, 51), (48, 27), (27, 48), (36, 36)}
∪ {(30, 30), (24, 24), (18, 18), (12, 12)}
∪ {(63, 63), (54, 54), (48, 48), (45, 45), (39, 39), (33, 33)}
∪ {(69, 27), (60, 18), (66, 45), (57, 36), (51, 30), (48, 27), (42, 21)}
∪ {(27, 69), (18, 60), (45, 66), (36, 57), (30, 51), (27, 48), (21, 42)}.
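The recursion is easy to implement. The sketch below (our own code; it assumes that the harshest available punishment of a player in round r − 1 is the smallest corresponding coordinate in Π_(r−1)) reproduces the sets Π_2 and Π_3 stated earlier:

```python
def next_payoffs(P):
    """One backward step of the recursion for the weak game's equilibrium payoff sets."""
    m1 = min(p[0] for p in P)   # harshest punishment of player 1 available in P
    m2 = min(p[1] for p in P)   # harshest punishment of player 2 available in P
    P1 = {p for p in P if p[0] - m1 >= 3 and p[1] - m2 >= 3}  # can support (C, C)
    P2 = {p for p in P if p[1] - m2 >= 9}                     # can support (21, 0)
    P3 = {p for p in P if p[0] - m1 >= 9}                     # can support (0, 21)
    nxt = {(p[0] + s[0], p[1] + s[1]) for p in P for s in ((9, 9), (3, 3))}
    nxt |= {(p[0] + 18, p[1] + 18) for p in P1}
    nxt |= {(p[0] + 21, p[1]) for p in P2}
    nxt |= {(p[0], p[1] + 21) for p in P3}
    return nxt

Pi = {1: {(9, 9), (3, 3)}}          # last round: (D, D) or (A, A)
for r in range(2, 5):
    Pi[r] = next_payoffs(Pi[r - 1])

assert Pi[2] == {(27, 27), (18, 18), (12, 12), (6, 6)}
assert Pi[3] == {(48, 27), (27, 48), (39, 18), (18, 39), (45, 45), (36, 36),
                 (30, 30), (27, 27), (21, 21), (15, 15), (9, 9)}
```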

It is straightforward that a folk theorem holds for the resulting average equilibrium outcomes:

THEOREM 2 The set of average equilibrium payoffs Π_r/r converges to a dense subset of the individually rational attainable average payoffs in

Ω = {(π_1, π_2) | (π_1, π_2) ≥ (3, 3) and (π_1, π_2) ∈ conv((3, 3), (21, 0), (18, 18), (0, 21))}.

To show this limit result, we approximate every interior point of Ω as a rational convex combination α · (3, 3) + β · (21, 0) + γ · (18, 18) or α · (3, 3) + β · (0, 21) + γ · (18, 18). Let us choose T so that αT, βT, γT ∈ ℕ and αT ≥ 4, and let the players play (αT − 1) rounds of (3, 3), βT rounds of (21, 0) (or (0, 21), respectively), γT rounds of (18, 18), and (9, 9) once. This payoff scheme can be realized as an equilibrium outcome because it ends with equilibrium payoffs and because individual rationality, (π_1, π_2) ≥ (3, 3), restricts the use of (21, 0) sufficiently to allow the necessary threat for equilibrating such behavior.
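As a concrete illustration (our own numbers: α = 1/4, β = 1/4, γ = 1/2, T = 16, so that αT = 4), the constructed path differs from the target convex combination only through the single (9, 9) round, which shifts each player's average payoff by exactly 6/T:

```python
from fractions import Fraction as F

# Hypothetical weights for one interior point, chosen so that alpha * T = 4.
alpha, beta, gamma, T = F(1, 4), F(1, 4), F(1, 2), 16
target = tuple(alpha * a + beta * b + gamma * c
               for a, b, c in zip((3, 3), (21, 0), (18, 18)))  # (15, 39/4)

# Path: (alpha*T - 1) rounds of (3,3), beta*T of (21,0), gamma*T of (18,18), (9,9) once.
path = ([(3, 3)] * (int(alpha * T) - 1) + [(21, 0)] * int(beta * T) +
        [(18, 18)] * int(gamma * T) + [(9, 9)])
assert len(path) == T

average = tuple(F(sum(p[i] for p in path), T) for i in (0, 1))
# Swapping one (3, 3) for the terminal (9, 9) raises each average payoff by 6/T.
assert all(average[i] - target[i] == F(6, T) for i in (0, 1))
```

As T grows, the 6/T correction vanishes, which is the sense in which the average payoffs become dense in Ω.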

In the strict extended prisoners' dilemma game EPD_s, the situation is more complicated, as the additional payoff (0, 0) is more difficult to reach in an equilibrium. However, as this payoff is below the maximin stage game payoff, (18) this additional cell is only of limited relevance. There is no difference in the last round, as the set of equilibrium payoffs coincides with that in the weak game. In the second last round (r = 2), the stage payoffs (0, 0) can be added to (9, 9) and (18, 18), because the threat to move on with (3, 3) instead is sufficient to stabilize the corresponding actions. Thus, Π_2,strict = {(27, 27), (18, 18), (12, 12), (9, 9), (6, 6)} contains one more element than Π_2,weak. This allows the gaps between the average payoffs to be filled faster, but it does not change the limit result in Theorem 2, as it never affects the smallest average payoff (3, 3). The equilibrium actions are a little more restricted than above:

THEOREM 3 A combination of actions (a_1, a_2) in round r of the repeated strict extended prisoners' dilemma game is compatible with a subgame perfect equilibrium if and only if r ≥ 4, or r = 3 and (a_1, a_2) ∉ {(A, C), (C, A)}, or r = 2 and (a_1, a_2) ∈ {(C, C), (D, D), (A, A)}, or r = 1 and (a_1, a_2) ∈ {(D, D), (A, A)}.

Properness as a Refinement

In this section, we need additional notation. By α_i we denote mixed behavioral strategies with probabilities α_ia(h) for the three possible actions a ∈ {C, D, A} after history h ∈ H. The normal form pure (mixed) strategies are denoted by s (σ); σ^ε is a sequence of completely mixed strategies with ε → 0 and with σ^ε_i(s_i) ≥ ε for every pure strategy s_i; (σ | s_i) is the strategy vector in which the (mixed) strategy σ_i is replaced by the (pure) strategy s_i; π(σ) = (π_1(σ), π_2(σ)) is the expected payoff realized if the strategies σ are played; and π^h_i(σ) is the expected payoff of a mixed strategy conditional on the fact that history h has been reached (which is always well defined for completely mixed strategies).

The folk theorem holds in EPD_s also if we restrict attention to strictly perfect equilibria, that is, to equilibria which are stable with respect to every perceivable tremble. In EPD_w the situation is drastically different. There the simple argument holds only for the defective equilibrium in which the players select the strategies a_i(h_t) = D for all t = 1 ... T. All other equilibria can be excluded by backward induction in the form of the repeated elimination of dominated strategies sketched above: let players tremble such that future mistakes matter much less than present ones. From the requirement that (agent normal form) perfect equilibria must not use dominated behavioral strategies, it follows that players will not use dominated actions in the last round. So assume that the players select only behavioral strategies a_i(h_t) = D for all t > t̂; then a_i(h_t̂) = D is the unique best reply after any h_t̂ ∈ H_t̂ to a completely mixed strategy, because potential gains from future trembles are negligible compared to the present gains. Thus, we can conclude that this tremble structure justifies only the defective action as equilibrium behavior:

THEOREM 4 Defective play a_i(h_t) = D for all h_t ∈ H is the unique strictly perfect equilibrium in the agent normal form of EPD_w.

The concept of strict perfection is not generally accepted in the literature, because it requires so much stability that equilibria satisfying it do not always exist (van Damme 1987, 29). We therefore use the concept of proper equilibrium (Myerson 1978) to defend the uniqueness of defective play in EPD_w. To impose rationality in decision nodes that are never reached, we restrict our attention to normal form proper equilibria in behavioral strategies which are approximated by ε-proper equilibria as proposed by van Damme (1987, 119). So we can prove

THEOREM 5 Defective play a_i(h_t) = D for all h_t ∈ H is the unique strategy which can be approximated by the normal form strategies of an ε-proper equilibrium of EPD_w.

We show that the behavioral strategies concentrate on an "always defect" continuation, that is, α^ε_ia(h) ≤ ε · α^ε_iD(h) for both actions a ∈ {C, A} and for all histories h ∈ H, so that "always defect" is the unique proper equilibrium which can be approximated by corresponding induced ε-proper behavioral equilibrium strategies.

We proceed by backward induction. Suppose the claim is true for all rounds ρ < r. Take an arbitrary history h_r ∈ H_r of length T − r and assume that the outcomes in round r are not concentrated on (D, D), that is, the limit distribution induced by α^ε(h_r) on the nine states puts a positive limit probability on some state (a_1, a_2) ≠ (D, D) as ε → 0. Let D_i(h_r) be the set of player i's defective continuations of h_r, that is, the pure strategies which follow h_r and select defect (D) in all subsequent decision nodes. Our induction assumption implies that both players concentrate on defective continuations d_i ∈ D_i(h_r, (a'_1, a'_2)), no matter which state (a'_1, a'_2) is realized after h_r. So if (a_1, a_2)(h_r) ≠ (A, A), (at least) one of the players can improve his payoff by a deviation to a d_i ∈ D_i(h_r): if a_j ≠ A, player i ≠ j receives a larger payoff after h_r, while the outcomes in the rest of the game are (9, 9) with probability (1 − ε)^{T−r} → 1. Thus, the expected outcome difference in the remaining rounds becomes negligible in comparison to the gain after h_r.

So let (a_1, a_2) = (A, A) and compare player 1's payoff π^h_1(σ^ε | d_1) after a defective continuation d_1 ∈ D_1(h_r) and player 1's payoff π^h_1(σ^ε | ŝ_1) after any pure strategy ŝ_1 which continues with at least one later deviation from D after some history (h_r, (A, A), ...). Both payoffs contain the same constant value which is realized up to h_r. In round r, player 1 receives more than 3 if player 2 trembles to C or D, while (σ^ε | ŝ_1) leads to a payoff of 3. In later rounds player 1 may gain or lose from further trembles if he continues with (σ^ε | d_1), while (σ^ε | ŝ_1) generates a loss of at least 6 against player 2's regular strategy at least once. Thus, we get π^h_1(σ^ε | d_1) − π^h_1(σ^ε | ŝ_1) ≥ 6(1 − ε) − 21(T − r)ε > 0 for ε sufficiently small, so that the requirement of ε-proper trembles implies that player 1's trembles satisfy σ^ε_1(ŝ_1) ≤ ε · σ^ε_1(d_1).
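The final inequality can be sanity-checked numerically. The sketch below is our own illustration, not part of the proof; the helper `payoff_difference_bound` is hypothetical. It evaluates the lower bound 6(1 − ε) − 21(T − r)ε, where 6 is the one-round gain against the regular strategy and 21 bounds the per-round loss from a tremble, and confirms the bound is positive once ε is small relative to the remaining horizon.

```python
# Lower bound on the payoff difference pi(sigma|d1) - pi(sigma|s1_hat):
# a sure gain of at least 6*(1 - eps) in round r, against a worst-case
# loss of 21 per remaining round, each realized only with probability eps.

def payoff_difference_bound(eps, remaining_rounds):
    """Evaluate 6*(1 - eps) - 21 * remaining_rounds * eps."""
    return 6 * (1 - eps) - 21 * remaining_rounds * eps

# For any fixed horizon the bound turns positive once eps is small enough.
for T_minus_r in (1, 10, 100):
    eps = 1 / (100 * (T_minus_r + 1))   # a sufficiently small tremble
    assert payoff_difference_bound(eps, T_minus_r) > 0
print("bound positive for sufficiently small eps")
```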

Finally, we use this condition to compare player 2's defective continuations d_2 ∈ D_2(h_r) with the pure strategy ŝ_2 ∈ D_2(h_r, (A, A)) on which he is supposed to concentrate. The defective continuation d_2 yields a higher payoff than ŝ_2 (π^h_2(σ^ε | d_2) > π^h_2(σ^ε | ŝ_2)) if player 1 trembles after h_r, and it may give less (or more) if player 1 trembles in later rounds. The condition σ^ε_1(ŝ_1) ≤ ε · σ^ε_1(d_1) implies that later trembles are much less likely; the resulting expected present gains dominate potential future losses for small ε. This implies that strategies which put a positive weight on the action a_i(h_r) = A cannot be best replies to σ^ε and are therefore, by van Damme's (1987, 30) Lemma 2.3.2, incompatible with an ε-proper equilibrium.

APPENDIX 2: INSTRUCTIONS

Welcome and thank you for participating in this experiment. Please read the instructions carefully. From now on we ask you to remain seated and to stop communicating with other participants. If you have any questions, please raise your hand. We will come to your place and answer your questions in private. It is very important that you follow these rules. Any violation will lead to your exclusion from the experiment and any payment.

The instructions are identical for all participants.

You will participate in the following sub-experiment two [eight] (19) times. Every sub-experiment consists of 16 [four] rounds. Within the same sub-experiment you will be interacting with the same participant. Whenever a sub-experiment is finished, the other participant will be replaced. [It is possible that you interact with a participant you have already interacted with. However, it is impossible that you interact with the same other participant for two consecutive sub-experiments.]

In each round, you and the other participant will be simultaneously asked to choose one of three {two} (20) alternatives A, B, or C (21) {A, or B}. Depending on your own decision and the decision of the other participant, your earnings are given by the following table. (22,23)

For example, if you choose A while the other participant chooses B, you will earn 0 ECU and the other will earn 21 ECU. If you choose B and the other chooses A, you will receive 21 ECU and the other 0 ECU. At the end of each round, you will be informed about

* your own decision

* the decision of the other participant

* your earnings from the current round

* your total earnings from the current sub-experiment.

Your earnings from the two [eight] sub-experiments will be added up and paid to you in cash at the end of the experiment. The exchange rate is 66 ECU per 1 euro. Additionally, you will receive a show-up fee of 2.50 euros.

After reading these instructions, you can familiarize yourself with the experiment during three {two} test rounds. The test rounds are not relevant for your earnings. Then you will be asked to answer some control questions. After this, the experiment will start. At the end, we will ask you to fill in a brief questionnaire.

(1.) Ahlert, Cruger, and Guth (2001) show that equal damages for punishers and punished do not work.

(2.) In the experimental instructions we used a neutral frame for the strategies: "cooperate" was called "A," "defect" "B" and "avoid" "C."

(3.) In EPD_s a mixed strategy equilibrium exists according to which both players use "defect" with probability 1/4 and "avoid" with probability 3/4.
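This mixed equilibrium can be verified directly from the stage game payoffs. The sketch below is our own check (the helper `expected_payoff` is hypothetical; the payoff table is transcribed from Appendix 2, with off-diagonal cells involving "avoid" paying 0 in the strict game): against the mixture, "defect" and "avoid" both earn 9/4 while "cooperate" earns 0, so the mixture is a best reply to itself.

```python
from fractions import Fraction

# Stage payoffs of the strict extended PD for the row player, with
# C = cooperate, D = defect, A = avoid (from the earnings table in
# Appendix 2; off-diagonal cells involving A pay 0 in EPD_s).
PAYOFF = {
    ('C', 'C'): 18, ('C', 'D'): 0, ('C', 'A'): 0,
    ('D', 'C'): 21, ('D', 'D'): 9, ('D', 'A'): 0,
    ('A', 'C'): 0,  ('A', 'D'): 0, ('A', 'A'): 3,
}

def expected_payoff(action, opponent_mix):
    """Expected stage payoff of a pure action against a mixed opponent."""
    return sum(prob * PAYOFF[(action, b)] for b, prob in opponent_mix.items())

mix = {'C': Fraction(0), 'D': Fraction(1, 4), 'A': Fraction(3, 4)}

# D and A both earn 9/4 against the mixture, while C earns 0, so the
# mixture is a symmetric mixed equilibrium of the stage game.
for a in 'CDA':
    print(a, expected_payoff(a, mix))
```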

(4.) Of course, the only pure strategy equilibrium of the PD game is (D, D).

(5.) Feinberg and Snyder (2002) use a workhorse that is very similar to our EPD_w. However, the focus of their paper differs substantially from ours: they study the effect of imperfect information about the actions of the other player on collusion in repeated duopoly markets, while we are interested in comparisons between standard and extended PD games that allow for equilibrium punishment.

(6.) In our experiment and the corresponding theoretical analysis, we concentrate on supergames in which the cumulative payoffs of all T rounds are paid after the last round. However, our arguments are equally valid if the players discount their payoffs with some sufficiently large δ < 1. The cooperative outcome, for example, is obtained whenever δ > 1/2.
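The δ > 1/2 threshold can be traced to a one-shot deviation argument. The sketch below rests on our own reading, which the footnote does not spell out: a deviation from (C, C) is punished by switching the next-round continuation from (D, D) to (A, A), so the deviation gain 21 − 18 = 3 is weighed against the discounted loss δ(9 − 3) = 6δ.

```python
# One-shot deviation check (an illustrative sketch, not the authors'
# derivation): deviating from (C, C) gains 21 - 18 = 3 today and loses
# 9 - 3 = 6 in the next round, discounted by delta.

def deviation_profitable(delta):
    """True iff the one-shot deviation gain covers the discounted loss."""
    return 3 >= 6 * delta

assert deviation_profitable(0.4)       # too impatient: cooperation fails
assert not deviation_profitable(0.6)   # delta > 1/2 sustains cooperation
print("cooperation is sustained whenever delta > 1/2")
```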

(7.) Even though our backward elimination employs only "nice" weak dominance in the sense of Marx and Swinkels (1997), we cannot apply their result, because their condition of "Transference of Decision Maker Indifference" does not hold for our repeated game.

(8.) Strict perfection (Okada 1981) and properness (Myerson 1978) are refinements of trembling hand perfectness (Selten 1975). Strict perfection requires stability with respect to every tremble, while (the weaker) properness imposes a payoff dependent structure on the trembles. Both question the idea of uniform trembles (Harsanyi and Selten 1988). In Appendix 1 we use both trembling hand perfectness and properness.

(9.) Imposing the same total number of rounds across treatments keeps the stakes constant across treatments and also the effects of fatigue, etc. We preferred this over controlling for the number of repetitions of supergames across treatments. Nevertheless, we can compare learning between treatments by focusing on the first repetition only.

(10.) For a translation of the instructions from the German, see Appendix 2.

(11.) In the test rounds subjects did not interact with another subject but with the computer.

(12.) We used the software z-Tree (Fischbacher 2007).

(13.) Results of the following analysis do not substantially change when we consider medians instead of means. Therefore, we do not discuss them separately.

(14.) However, the folk theorem shows that there are equilibria in which the players use action A earlier on the equilibrium path.

(15.) The last actions are, of course, triggered by second last ones, which suggests assessing also the conditioning on earlier than last behavior. Our attempts to explore such conditioning were, however, inconclusive.

(16.) Here we abstract from incomplete information that is not induced experimentally (Kreps et al. 1982) and from weaker notions of rationality (Radner 1980).

(17.) Of course, not all actions can be combined to an equilibrium history so that there are further restrictions on the set of equilibrium strategies.

(18.) If one chooses, for example, D with probability 1/4 and A with probability 3/4, one is sure to receive an expected payoff of at least 9/4, regardless of what the other does.
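The maximin bound can be verified by enumeration. This sketch is our own check (the helper `guaranteed` is hypothetical; the strict game payoffs follow the earnings table in Appendix 2): the stated mixture guarantees 9/4 against every pure reply, which is strictly above the additional payoff 0.

```python
from fractions import Fraction

# Strict game stage payoffs for the row player (C = cooperate,
# D = defect, A = avoid), as in the earnings table of Appendix 2.
PAYOFF = {
    ('C', 'C'): 18, ('C', 'D'): 0, ('C', 'A'): 0,
    ('D', 'C'): 21, ('D', 'D'): 9, ('D', 'A'): 0,
    ('A', 'C'): 0,  ('A', 'D'): 0, ('A', 'A'): 3,
}

def guaranteed(mix):
    """Worst-case expected payoff of a mixed action over all pure replies."""
    return min(sum(prob * PAYOFF[(a, b)] for a, prob in mix.items())
               for b in 'CDA')

mix = {'C': Fraction(0), 'D': Fraction(1, 4), 'A': Fraction(3, 4)}
print(guaranteed(mix))  # the mixture guarantees 9/4 > 0
```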

(19.) In square brackets: short horizon.

(20.) In curly brackets: PD treatments.

(21.) Here, "A" corresponds to "cooperate," "B" to "defect," and "C" to "avoid."

(22.) In round brackets: weak game. The table for the PD game consists only of the cells that include choices A and B.

(23.) ECU, experimental currency units.

Earnings table from the instructions:

| My Decision | Decision of the Other Participant | My Earnings in ECU | Earnings of the Other Participant in ECU |
|---|---|---|---|
| A | A | 18 | 18 |
| A | B | 0 | 21 |
| A | C | 0 (3) | 0 (3) |
| B | A | 21 | 0 |
| B | B | 9 | 9 |
| B | C | 0 (3) | 0 (3) |
| C | A | 0 (3) | 0 (3) |
| C | B | 0 (3) | 0 (3) |
| C | C | 3 | 3 |

TABLE A1 Round-wise Wilcoxon Rank-Sum p Values for Means

Long horizon:

| Round | 16S versus 16W | 16S versus 16PD | 16W versus 16PD |
|---|---|---|---|
| 1 | 0.09 | 0.01 | 0.41@ |
| 2 | 0.14@ | 0.01 | 0.39@ |
| 3 | 0.04 | 0.02 | 0.69@ |
| 4 | 0.02 | 0.01 | 0.77@ |
| 5 | 0.06 | 0.44@ | 0.27@ |
| 6 | 0.16@ | 0.07 | 0.57@ |
| 7 | 0.05 | 0.04 | 0.77@ |
| 8 | 0.03 | 0.01 | 0.69@ |
| 9 | 0.04 | 0.00 | 0.30@ |
| 10 | 0.02 | 0.06 | 0.82@ |
| 11 | 0.02 | 0.01 | 0.80@ |
| 12 | 0.04 | 0.02 | 0.59@ |
| 13 | 0.03 | 0.03 | 0.84@ |
| 14 | 0.03 | 0.03 | 0.89@ |
| 15 | 0.19@ | 0.19@ | 0.92@ |
| 16 | 0.36@ | 0.86@ | 0.43@ |

Short horizon:

| Round | 4S versus 4W | 4S versus 4PD | 4W versus 4PD |
|---|---|---|---|
| 1 | 0.09 | 0.05 | 0.56@ |
| 2 | 0.08 | 0.02 | 0.40@ |
| 3 | 0.13@ | 0.17@ | 0.64@ |
| 4 | 0.01 | 0.05 | 0.77@ |

Notes: Wilcoxon rank-sum p values; tests conducted by round; null hypothesis: the two independent samples are from populations with the same distribution; 16 independent observations per treatment; nonsignificant values are marked with @.

REFERENCES

Ahlert, M., A. Cruger, and W. Guth. "How Paulus Becomes Saulus. An Experimental Study of Equal Punishment Games." Homo Oeconomicus, 18, 2001, 303-18.

Benoit, J. P., and V. Krishna. "Finitely Repeated Games." Econometrica, 53(4), 1985, 905-22.

Bereby-Meyer, Y., and A. E. Roth. "The Speed of Learning in Noisy Games: Partial Reinforcement and the Sustainability of Cooperation." American Economic Review, 96(4), 2006, 1029-42.

Bruttel, L. V., W. Guth, and U. Kamecke. "Finitely Repeated Prisoners' Dilemma Experiments without a Commonly Known End." International Journal of Game Theory, forthcoming, DOI: 10.1007/s00182-0110272-z.

Dal Bo, P. "Cooperation under the Shadow of the Future: Experimental Evidence from Infinitely Repeated Games." American Economic Review, 95(5), 2005, 1591-604.

Fehr, E., and S. Gachter. "Cooperation and Punishment in Public Goods Experiments." American Economic Review, 90(4), 2000, 980-94.

Feinberg, R., and C. Snyder. "Collusion with Secret Price Cuts: An Experimental Investigation." Economics Bulletin, 3(6), 2002, 1-11.

Fischbacher, U. "z-Tree: Zurich Toolbox for Ready-Made Economic Experiments." Experimental Economics, 10(2), 2007, 171-78.

Greiner, B. "An Online Recruitment System for Economic Experiments," in Forschung und Wissenschaftliches Rechnen 2003, GWDG Bericht 63, edited by K. Kremer and V. Macho. Goettingen: Ges. fuer Wiss. Datenverarbeitung, 2004, 79-93.

Harsanyi, J. C., and R. Selten. A General Theory of Equilibrium Selection in Games. Cambridge, MA: MIT Press, 1988.

Kreps, D. M., P. Milgrom, J. Roberts, and R. Wilson. "Rational Cooperation in the Finitely Repeated Prisoners' Dilemma." Journal of Economic Theory, 27, 1982, 245-52.

Marx, L. M., and J. M. Swinkels. "Order Independence for Iterated Weak Dominance." Games and Economic Behavior, 18, 1997, 219-45.

Myerson, R. B. "Refinements of the Nash Equilibrium Concept." International Journal of Game Theory, 7, 1978, 73-80.

Nagel, R. "Unraveling in Guessing Games: An Experimental Study." American Economic Review, 85(5), 1995, 1313-26.

Okada, A. "On Stability of Perfect Equilibrium Points." International Journal of Game Theory, 10(2), 1981, 67-73.

Ostrom, E., J. Walker, and R. Gardner. "Covenants With and Without a Sword: Self-Governance Is Possible." American Political Science Review, 86(2), 1992, 404-17.

Radner, R. "Collusive Behavior in Noncooperative Epsilon-equilibria of Oligopolies with Long but Finite Lives." Journal of Economic Theory, 22(2), 1980, 136-54.

Schwartz, S., R. A. Young, and K. Zvinakis. "Reputation without Repeated Interaction: A Role for Public Disclosures." Review of Accounting Studies, 5, 2000, 351-75.

Selten, R. "Reexamination of the Perfectness Concept for Equilibrium Points in Extensive Games." International Journal of Game Theory, 4(1), 1975, 25-55.

Selten, R., and R. Stoecker. "End Behavior in Sequences of Finite Prisoner's Dilemma Supergames." Journal of Economic Behavior and Organization, 7, 1986, 47-70.

van Damme, E. Stability and Perfection of Nash Equilibria. Berlin, Heidelberg: Springer-Verlag, 1987.

TABLE 1 Wilcoxon Rank-Sum p Values for Means

| Treatment | 16S versus 16W | 16S versus 16PD | 16W versus 16PD |
|---|---|---|---|
| p values | .036 | .030 | .706 |

| Treatment | 4S versus 4W | 4S versus 4PD | 4W versus 4PD |
|---|---|---|---|
| p values | .065 | .048 | .650 |

| Treatment | 16PD versus 4PD | 16S versus 4S | 16W versus 4W |
|---|---|---|---|
| p values | .004 | .001 | .002 |

Notes: Null hypothesis: the two independent samples are from populations with the same distribution; 16 independent observations per treatment.

TABLE 2 Relative Frequencies of Outcomes and Transition Frequencies from Outcome (D, D) in Round t to Outcomes (D, D), (C, D), (D, C), (C, C) in Round t + 1

| Average Frequency of | 16PD (%) | 16W (%) | 16S (%) |
|---|---|---|---|
| (D, D) | 32 | 26 | 13 |
| (C, D) or (D, C) | 16 | 13 | 12 |
| (C, C) | 52 | 55 | 74 |

| Frequency After (D, D) of | 16PD (%) | 16W (%) | 16S (%) |
|---|---|---|---|
| (D, D) | 84 | 79 | 77 |
| (C, D) or (D, C) | 14 | 11 | 18 |
| (C, C) | 2 | 1 | 1 |

Notes: The numbers for each treatment do not sum up to 100% because outcomes including A were excluded. To abstract from the usual endgame effect, we do not count outcomes in rounds 16 and 32, as well as reactions to (D, D) in rounds 16 and 32. Only reactions to (D, D) within an interaction with the same partner are analyzed, that is, reactions to (D, D) in round 17 were not counted.

TABLE 3 Absolute Frequency of Decision A after a Given Outcome in the Previous Round and Absolute Frequency of Decisions C, D, or A after Outcomes (C, A), (D, A), or (A, A) by Treatment; Subjects per Treatment = 64, Observations per Treatment = 2,048

Absolute frequency of A-choices after ...

| Previous-Round Outcome | 16S | 16W |
|---|---|---|
| Own C and other's C | 0 | 0 |
| Own C and other's D | 9 | 9 |
| Own C and other's A | 0 | 0 |
| Own D and other's C | 3 | 6 |
| Own D and other's D | 5 | 23 |
| Own D and other's A | 1 | 1 |
| Own A and other's C | 0 | 0 |
| Own A and other's D | 5 | 16 |
| Own A and other's A | 0 | 0 |

Reaction after the other has played A:

| Reaction | 16S | 16W |
|---|---|---|
| C after own C and other's A | 4 | 4 |
| C after own D and other's A | 1 | 5 |
| C after own A and other's A | 1 | 3 |
| D after own C and other's A | 1 | 1 |
| D after own D and other's A | 9 | 38 |
| D after own A and other's A | 1 | 1 |
| A after own C and other's A | 0 | 0 |
| A after own D and other's A | 1 | 1 |
| A after own A and other's A | 0 | 0 |

Author: Angelova, Vera; Bruttel, Lisa V.; Guth, Werner; Kamecke, Ulrich

Publication: Economic Inquiry


Date: Apr 1, 2013
