8 Repeated Games
Recall the Prisoner’s dilemma game:
|   | \(C\) | \(S\) |
|---|---|---|
| \(C\) | \((-5,-5)\) | \((0,-20)\) |
| \(S\) | \((-20,0)\) | \((-1,-1)\) |
Imagine that the criminals are being arrested repeatedly. Can they somehow use their experience in order to play “better”? In this chapter, we consider strategic-form games played repeatedly:
- for finitely many rounds, where the final payoff of each player is the average of the payoffs from all rounds;
- for infinitely many rounds, where we consider a discounted sum of payoffs and the long-run average payoff.
We will also analyze Nash and subgame-perfect equilibria.
We stick with pure strategies only!
8.1 Finitely Repeated Games
Let \(G = (\left\{1,2\right\}, (S_1, S_2), (u_1, u_2))\) be a finite strategic-form game of two players. We shall use \(S^t\) to denote \(S^t = \underbrace{S \times \cdots \times S}_{t\text{ times}} = \prod_{i = 1}^t S\).
Definition 8.1 A \(T\)-stage game \({G}_{T{\text -}\mathrm{rep}}\) based on \(G\) proceeds in \(T\) stages so that in each stage \(t \geq 1\), players choose a strategy profile \(\boldsymbol{s}^t = (s_1^t, s_2^t)\). After \(T\) stages, each player \(i\) collects the average payoff \(\sum_{t = 1}^T u_i(\boldsymbol{s}^t)/T\).
Definition 8.2 A history of length \(0 \leq t \leq T\) is a sequence \(h = \boldsymbol{s}^1 \cdots \boldsymbol{s}^t \in S^t\) of \(t\) strategy profiles. Denote by \(H(t)\) the set of all histories of length \(t\) and let \(\epsilon\) represent the empty history.
A pure strategy for player \(i\) in a \(T\)-stage game \({G}_{T{\text -}\mathrm{rep}}\) is a function \[ \tau_i : \bigcup_{t = 0}^{T - 1}H(t) \to S_i, \] which for every possible history chooses a next step for player \(i\).
Every strategy profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) in \({G}_{T{\text -}\mathrm{rep}}\) induces a sequence of pure strategy profiles \(w_{\boldsymbol{\tau}} = \boldsymbol{s}^1 \cdots \boldsymbol{s}^T\) in \(G\) so that \(s_i^t = \tau_i(\boldsymbol{s}^1 \cdots \boldsymbol{s}^{t-1})\). Given a pure strategy profile \(\boldsymbol{\tau}\) in \({G}_{T{\text -}\mathrm{rep}}\) such that \(w_{\boldsymbol{\tau}} = \boldsymbol{s}^1 \cdots \boldsymbol{s}^T\), define the payoffs \(u_i(\boldsymbol{\tau}) = \sum_{t = 1}^T u_i(\boldsymbol{s}^t)/T\).
Example 8.1 Consider again the Prisoner’s dilemma:
|   | \(C\) | \(S\) |
|---|---|---|
| \(C\) | \((-5,-5)\) | \((0,-20)\) |
| \(S\) | \((-20,0)\) | \((-1,-1)\) |
This time, we shall assume it is a 3-stage game. Examples of histories are \[ \epsilon, (C,S), (C,S)(S,S), (C,S)(S,S)(C,C). \] Since the game is \({G}_{3{\text -}\mathrm{rep}}\), the last of these histories is terminal. It is induced, for example, by the strategy profile \(\boldsymbol{\tau} = (\tau_1, \tau_2)\) with \[\begin{align*} \tau_1(\epsilon) &= C, & \tau_1((C,S)) &= S, & \tau_1((C,S)(S,S)) &= C, \\ \tau_2(\epsilon) &= S, & \tau_2((C,S)) &= S, & \tau_2((C,S)(S,S)) &= C. \end{align*}\]
Thus \(w_{\boldsymbol{\tau}} = (C,S)(S,S)(C,C)\) and \[\begin{align*} u_1(\boldsymbol{\tau}) &= (0 + (-1) + (-5))/3 = -2, \\ u_2(\boldsymbol{\tau}) &= (-20 + (-1) + (-5))/3 = - \frac {26} 3. \end{align*}\]
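For concreteness, this computation can be replayed mechanically. The following Python sketch (the helper names such as `play` are ours, not part of the formal development) unrolls a \(T\)-stage game from history-based strategies and reproduces \(w_{\boldsymbol{\tau}}\) and the average payoffs of Example 8.1:

```python
from fractions import Fraction

# Payoff table of the Prisoner's dilemma: profile -> (u_1, u_2).
U = {("C", "C"): (-5, -5), ("C", "S"): (0, -20),
     ("S", "C"): (-20, 0), ("S", "S"): (-1, -1)}

def play(tau1, tau2, T):
    """Unroll the T-stage game: each tau_i maps a history (a tuple of
    profiles) to a pure strategy of G; returns w_tau and average payoffs."""
    h = ()
    for _ in range(T):
        h = h + ((tau1(h), tau2(h)),)
    avg = [Fraction(sum(U[s][i] for s in h), T) for i in (0, 1)]
    return h, avg

# The strategies tau_1, tau_2 of Example 8.1 as lookup tables over histories.
tau1 = {(): "C", (("C", "S"),): "S", (("C", "S"), ("S", "S")): "C"}.__getitem__
tau2 = {(): "S", (("C", "S"),): "S", (("C", "S"), ("S", "S")): "C"}.__getitem__

w, u = play(tau1, tau2, 3)
print(w)  # (('C', 'S'), ('S', 'S'), ('C', 'C'))
print(u)  # [Fraction(-2, 1), Fraction(-26, 3)]
```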
8.1.1 Finitely Repeated Games in Extensive-Form
Every \(T\)-stage game \({G}_{T{\text -}\mathrm{rep}}\) can be defined as an imperfect-information extensive-form game \({{G}_{\mathrm{imp}}^{\mathrm{rep}}} = ({{G}_{\mathrm{perf}}^{\mathrm{rep}}}, \boldsymbol{I})\) such that \({{G}_{\mathrm{perf}}^{\mathrm{rep}}} = (\left\{1,2\right\}, A, H, Z, \chi, \rho, \pi, h_0, \boldsymbol{u})\) where
- \(A = S_1 \cup S_2\);
- \(H = (S_1 \times S_2)^{< T} \cup (S_1 \times S_2)^{< T} \cdot S_1\) (intuitively, elements of \((S_1 \times S_2)^{< T}\) are possible histories and elements of \((S_1 \times S_2)^{< T} \cdot S_1\) are used to simulate the simultaneous play of \(G\) by letting player 1 choose first and player 2 second);
- \(Z = (S_1 \times S_2)^T\);
- \(\chi(\epsilon) = S_1\) and \(\chi(h \cdot s_1) = S_2\) for \(s_1 \in S_1\), and \(\chi(h \cdot (s_1, s_2)) = S_1\) for \((s_1, s_2) \in S\);
- \(\rho(\epsilon) = 1\) and \(\rho(h \cdot s_1) = 2\) and \(\rho(h \cdot (s_1, s_2)) = 1\);
- \(\pi(\epsilon, s_1) = s_1\) and \(\pi(h \cdot s_1, s_2) = h \cdot (s_1, s_2)\) and \(\pi(h \cdot (s_1, s_2), s'_1) = h \cdot (s_1, s_2) \cdot s'_1\);
- \(h_0 = \epsilon\) and \(u_i((s_1^1, s_2^1)(s_1^2, s_2^2) \cdots (s_1^T, s_2^T)) = \sum_{t = 1}^T u_i(s_1^t, s_2^t)/T\).
Now the collection of information sets is defined as follows. Let \(h \in H_1\) be a node of player 1; then
- there is exactly one information set of player 1 containing \(h\) as the only element,
- there is exactly one information set of player 2 containing all nodes of the form \(h \cdot s_1\) where \(s_1 \in S_1\).
Intuitively, in every round, player 1 has complete information about the results of past plays, thus player 1 chooses a pure strategy \(s_1 \in S_1\), but player 2 is not informed about \(s_1\), but still has complete information about results of all previous rounds. Hence player 2 chooses a pure strategy \(s_2\) and both players are informed about the result.
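The construction is easy to carry out mechanically. Below is a minimal Python sketch (all names are ours) that enumerates the nodes and information sets of \({{G}_{\mathrm{imp}}^{\mathrm{rep}}}\) for the 2-stage Prisoner’s dilemma:

```python
from itertools import product

S1, S2 = ["C", "S"], ["C", "S"]
T = 2

# Player 1's nodes: sequences of fewer than T complete profiles.
H1 = [h for t in range(T) for h in product(product(S1, S2), repeat=t)]
# Player 2's nodes: a player-1 node extended by player 1's (hidden) move s1.
H2 = [h + (s1,) for h in H1 for s1 in S1]
# Terminal nodes: exactly T complete profiles.
Z = list(product(product(S1, S2), repeat=T))

# Information sets: every player-1 node is a singleton (player 1 sees all
# finished rounds), while an information set of player 2 collects all
# extensions h . s1 of a fixed player-1 node h (s1 is invisible to player 2).
I1 = [[h] for h in H1]
I2 = [[h + (s1,) for s1 in S1] for h in H1]

print(len(H1), len(H2), len(Z))  # 5 10 16 for T = 2
```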
8.1.2 Equilibria
Definition 8.3 A strategy profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) in a \(T\)-stage game \({G}_{T{\text -}\mathrm{rep}}\) is a Nash equilibrium if for every \(i \in \left\{1,2\right\}\) and every \(\tau_i'\) we have \[ u_i(\tau_i, \boldsymbol{\tau}_{-i}) \geq u_i(\tau'_i, \boldsymbol{\tau}_{-i}). \]
To define a subgame-perfect equilibrium we use the following notation. Given a history \(h = \boldsymbol{s}^1 \cdots \boldsymbol{s}^t\) and a strategy \(\tau_i\) of player \(i\), we define strategy \(\tau_i^h\) in \((T - t)\)-stage game based on \(G\) by \[ \tau_i^h(\bar{\boldsymbol{s}}^1 \cdots \bar{\boldsymbol{s}}^{\bar t}) = \tau_i(\overbrace{\boldsymbol{s}^1 \cdots \boldsymbol{s}^t}^h \bar{\boldsymbol{s}}^1 \cdots \bar{\boldsymbol{s}}^{\bar t}) \] for every sequence \(\bar{\boldsymbol{s}}^1 \cdots \bar{\boldsymbol{s}}^{\bar t}\) (i.e. \(\tau_i^h\) behaves as \(\tau_i\) after \(h\)).
This can be loosely interpreted as a closure (in the terminology of functional programming languages).
Definition 8.4 A strategy profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) in a \(T\)-stage game \({G}_{T{\text -}\mathrm{rep}}\) is a subgame-perfect Nash equilibrium (SPE) if for every history \(h\) the profile \((\tau_1^h, \tau_2^h)\) is a Nash equilibrium in the \((T - |h|)\)-stage game based on \(G\).
Example 8.2 Consider now a \(T\)-stage game based on the Prisoner’s dilemma:
|   | \(C\) | \(S\) |
|---|---|---|
| \(C\) | \((-5,-5)\) | \((0,-20)\) |
| \(S\) | \((-20,0)\) | \((-1,-1)\) |
Surely, playing \((C,C)\) (the Nash equilibrium of the strategic-form game) in every stage \(t \leq T\) gives us a SPE in the \(T\)-stage game. But is this the only SPE in this game?
Theorem 8.1 Let \(G\) be an arbitrary finite strategic-form game. If \(G\) has a unique Nash equilibrium, then playing this equilibrium every time is the unique SPE (but not necessarily a unique Nash equilibrium) in the \(T\)-stage game based on \(G\).
Proof. By backward induction, players have to play the Nash equilibrium in the last stage. As the behavior in the last stage does not depend on the behavior in the \((T-1)\)-th stage, they have to play the NE also in the \((T-1)\)-th stage. Then the same holds in the \((T-2)\)-th stage, etc.
8.1.2.1 Nash Equilibria
As Theorem 8.1 states, there is only one subgame-perfect equilibrium in a \(T\)-stage game based on the Prisoner’s dilemma, but what are the other Nash equilibria (if there are any)?
|   | \(C\) | \(S\) |
|---|---|---|
| \(C\) | \((-5,-5)\) | \((0,-20)\) |
| \(S\) | \((-20,0)\) | \((-1,-1)\) |
To simplify our discussion, we use the following notation: \(X - YZ\), where \(X,Y,Z \in \left\{C,S\right\}\) denotes the following strategy:
- in the first phase, play \(X\);
- in the second phase, play \(Y\) if the opponent played \(C\) in the first phase, otherwise play \(Z\).
It can be shown that the 2-stage game has exactly 4 Nash equilibria, as there are exactly four profiles that lead to the history \((C,C)(C,C)\) – in these profiles, each player plays either \(C-CC\) or \(C-CS\).
Now we shall turn our attention to the underlying strategic-form game for a moment. There, the strategy \(C\) strictly dominates \(S\). But in the 2-stage game based on the Prisoner’s dilemma, if player 2 plays \(S-CS\), then the best responses of player 1 are \(S-CC\) and \(S-SC\). On the other hand, if player 2 plays \(S-SC\), then the best responses are \(C-CC\) and \(C-SC\), so there is no strictly dominant strategy for player 1 (a strategy that would be among the best responses to all strategies of player 2).
The strategy \(S-CS\) is usually called “tit-for-tat”.
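These claims are easy to verify by brute force. The following sketch (the function names are ours) enumerates all \(8 \times 8\) profiles of \(X-YZ\) strategies in the 2-stage game, recovers the four Nash equilibria, and confirms the best responses against \(S-CS\):

```python
from itertools import product

U = {("C", "C"): (-5, -5), ("C", "S"): (0, -20),
     ("S", "C"): (-20, 0), ("S", "S"): (-1, -1)}
STRATS = ["".join(p) for p in product("CS", repeat=3)]  # "XYZ" encodes X-YZ

def payoff(a, b):
    # Stage 1: both play their X; stage 2: react to the opponent's X.
    s1 = (a[0], b[0])
    react = lambda me, other_first: me[1] if other_first == "C" else me[2]
    s2 = (react(a, b[0]), react(b, a[0]))
    return tuple((U[s1][i] + U[s2][i]) / 2 for i in (0, 1))

def best_responses(i, other):
    vals = {s: (payoff(s, other)[0] if i == 0 else payoff(other, s)[1])
            for s in STRATS}
    m = max(vals.values())
    return {s for s, v in vals.items() if v == m}

ne = [(a, b) for a in STRATS for b in STRATS
      if a in best_responses(0, b) and b in best_responses(1, a)]
print(ne)                        # the 4 combinations of 'CCC' and 'CCS'
print(best_responses(0, "SCS"))  # {'SCC', 'SSC'} -- cf. the discussion above
```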
8.1.2.2 Subgame-Perfect Equilibria
Let \(\boldsymbol{s} = (s_1, s_2)\) be a Nash equilibrium in \(G\). Define a strategy profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) in \({G}_{T{\text -}\mathrm{rep}}\) where
- \(\tau_1\) chooses \(s_1\) in every stage;
- \(\tau_2\) chooses \(s_2\) in every stage.
Lemma 8.1 With \(\boldsymbol{s}\) and \(\boldsymbol{\tau}\) defined as above, \(\boldsymbol{\tau}\) is a subgame-perfect equilibrium in \({G}_{T{\text -}\mathrm{rep}}\) for every \(T \geq 1\).
Proof. Apparently, changing \(\tau_i\) in some stage(s) may only result in the same or a worse payoff for player \(i\), since the other player \(-i\) always plays the equilibrium strategy \(s_{-i}\) independently of the choices of player \(i\).
Lemma 8.1 may be generalized by allowing players to play different equilibria in particular stages, i.e., consider a sequence of Nash equilibria \(\boldsymbol{s}^1, \boldsymbol{s}^2, \dots, \boldsymbol{s}^T\) in \(G\) and assume that in stage \(l\) player \(i\) plays \(s^l_i\).
But this still does not cover all possible subgame-perfect equilibria in finitely repeated games! Consider the following game \(G\):
|   | \(m\) | \(f\) | \(r\) |
|---|---|---|---|
| \(M\) | \((4,4)\) | \((-1,5)\) | \((0,0)\) |
| \(F\) | \((5,-1)\) | \((1,1)\) | \((0,0)\) |
| \(R\) | \((0,0)\) | \((0,0)\) | \((3,3)\) |
The Nash equilibria in this strategic form game \(G\) are \((F,f)\) and \((R,r)\). Now consider a 2-stage game \({G}_{2{\text -}\mathrm{rep}}\) and strategies \(\tau_1, \tau_2\) where
- \(\tau_1\): chooses \(M\) in stage 1. In stage 2 plays \(R\) if \((M,m)\) was played in the first stage, and plays \(F\) otherwise;
- \(\tau_2\): chooses \(m\) in stage 1. In stage 2 plays \(r\) if \((M,m)\) was played in the first stage, and plays \(f\) otherwise.
Although the players do not play a Nash equilibrium of \(G\) in the first stage, the profile is still a subgame-perfect equilibrium. The idea is that both players agree to play a Pareto-optimal profile. If both comply, then a favorable Nash equilibrium is played in the second stage. If one of them betrays, then a “punishing” Nash equilibrium is played.
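This can again be checked mechanically: the prescribed continuation after every first-stage outcome must be a Nash equilibrium of \(G\), and no first-stage deviation may pay off given the punishment that follows. A small sketch (names are ours):

```python
U = {("M", "m"): (4, 4), ("M", "f"): (-1, 5), ("M", "r"): (0, 0),
     ("F", "m"): (5, -1), ("F", "f"): (1, 1), ("F", "r"): (0, 0),
     ("R", "m"): (0, 0), ("R", "f"): (0, 0), ("R", "r"): (3, 3)}
A1, A2 = "MFR", "mfr"

def is_ne(s):
    # s is a Nash equilibrium of G iff no player gains by a unilateral change.
    return (U[s][0] == max(U[(a, s[1])][0] for a in A1) and
            U[s][1] == max(U[(s[0], a)][1] for a in A2))

cont = lambda s: ("R", "r") if s == ("M", "m") else ("F", "f")
# Stage-2 requirement of subgame perfection: the continuation after every
# first-stage outcome is a Nash equilibrium of G.
assert all(is_ne(cont((a, b))) for a in A1 for b in A2)

# Stage-1 requirement: compare the on-path average payoff with the best
# first-stage deviation followed by the punishing equilibrium (F, f).
on_path = [(U[("M", "m")][i] + U[("R", "r")][i]) / 2 for i in (0, 1)]
dev1 = max((U[(a, "m")][0] + U[("F", "f")][0]) / 2 for a in "FR")
dev2 = max((U[("M", a)][1] + U[("F", "f")][1]) / 2 for a in "fr")
print(on_path, dev1, dev2)  # [3.5, 3.5] 3.0 3.0 -> deviating does not pay
```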
8.2 Infinitely Repeated Games
Definition 8.5 Let \(G = (\left\{1,2\right\}, (S_1, S_2), (u_1, u_2))\) be a finite strategic-form game of two players. An infinitely-repeated game \(G_{\mathrm{irep}}\) based on \(G\) proceeds in stages so that in each stage, say \(t\), players choose a strategy profile \(\boldsymbol{s}^t = (s_1^t, s_2^t)\).
Recall that a history of length \(t \geq 0\) is a sequence \(h = \boldsymbol{s}^1 \cdots \boldsymbol{s}^t \in S^t\) of \(t\) strategy profiles. Denote again by \(H(t)\) the set of all histories of length \(t\).
Definition 8.6 A pure strategy for player \(i\) in the infinitely repeated game \(G_{\mathrm{irep}}\) is a function \[ \tau_i : \bigcup_{t = 0}^\infty H(t) \to S_i, \] which for every possible history chooses a next step for player \(i\).
Every pure strategy profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) in \(G_{\mathrm{irep}}\) induces a sequence of pure strategy profiles \(w_{\boldsymbol{\tau}} = \boldsymbol{s}^1 \boldsymbol{s}^2 \cdots\) in \(G\) so that \(s_i^t = \tau_i(\boldsymbol{s}^1 \cdots \boldsymbol{s}^{t-1})\). Here for \(t = 1\) we have that \(\boldsymbol{s}^1 \cdots \boldsymbol{s}^{t-1} = \epsilon\), which again denotes the empty history.
8.2.1 Discounted Payoff
Let \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) be a pure strategy profile in \(G_{\mathrm{irep}}\) such that \(w_{\boldsymbol{\tau}} = \boldsymbol{s}^1 \boldsymbol{s}^2 \cdots\)
Definition 8.7 Given \(0 < \delta < 1\), we define \(\delta\)-discounted payoff of player \(i\) by \[ u_i^\delta(\boldsymbol{\tau}) = (1 - \delta) \sum_{t = 0}^\infty \delta^t \cdot u_i(\boldsymbol{s}^{t+1}). \]
Given a strategic-form game \(G\) and \(0 < \delta < 1\), we denote by \(G_{\mathrm{irep}}^\delta\) the infinitely repeated game based on \(G\) together with the \(\delta\)-discounted payoffs.
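For plays that are eventually constant – which covers all grim trigger computations below – the \(\delta\)-discounted payoff has a simple closed form, since \((1-\delta)\sum_{t \geq k} \delta^t c = \delta^k c\). A small sketch (the function name is ours); the second call anticipates the grim trigger computation of Section 8.2.2, which breaks even exactly at \(\delta = 1/5\):

```python
def discounted(prefix, tail, d):
    """delta-discounted payoff (1-d) * sum_t d^t * a_t of a play whose stage
    payoffs are `prefix` followed by the constant `tail` forever; the tail
    contributes (1-d) * sum_{t>=k} d^t * tail = d^k * tail exactly."""
    k = len(prefix)
    return (1 - d) * sum(d**t * a for t, a in enumerate(prefix)) + d**k * tail

print(discounted([], -1.0, 0.3))     # -1.0: playing (S,S) forever
print(discounted([0.0], -5.0, 0.2))  # -1.0: one deviation, then punished
```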
Definition 8.8 A strategy profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) is a Nash equilibrium in \(G_{\mathrm{irep}}^\delta\) if for both \(i \in \left\{1,2\right\}\) and every \(\tau_i'\) we have that \[ u_i^\delta(\tau_i, \boldsymbol{\tau}_{-i}) \geq u_i^\delta(\tau_i', \boldsymbol{\tau}_{-i}). \]
Given a history \(h = \boldsymbol{s}^1 \cdots \boldsymbol{s}^t\) and a strategy \(\tau_i\) of player \(i\), we define a strategy \(\tau_i^h\) in the infinitely repeated game \(G_{\mathrm{irep}}\) by \[ \tau_i^h(\bar{\boldsymbol{s}}^1 \cdots \bar{\boldsymbol{s}}^{\bar t}) = \tau_i(\boldsymbol{s}^1 \cdots \boldsymbol{s}^t \bar{\boldsymbol{s}}^1 \cdots \bar{\boldsymbol{s}}^{\bar t}) \] for every sequence \(\bar{\boldsymbol{s}}^1 \cdots \bar{\boldsymbol{s}}^{\bar t}\) (i.e. \(\tau_i^h\) behaves as \(\tau_i\) after \(h\)).
Definition 8.9 Now \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) is a subgame-perfect equilibrium in \(G_{\mathrm{irep}}^\delta\) if for every history \(h\) we have that \((\tau_1^h, \tau_2^h)\) is a Nash equilibrium.
Note that \((\tau_1^h, \tau_2^h)\) must be a NE also for all histories \(h\) that are not visited when the profile \((\tau_1, \tau_2)\) is used.
Example 8.3 Consider the infinitely repeated game \(G_{\mathrm{irep}}\) based on the Prisoner’s dilemma:
|   | \(C\) | \(S\) |
|---|---|---|
| \(C\) | \((-5,-5)\) | \((0,-20)\) |
| \(S\) | \((-20,0)\) | \((-1,-1)\) |
Our goal now will be to calculate Nash and subgame-perfect equilibria in \(G_{\mathrm{irep}}^\delta\) for a given discount factor \(\delta\). Consider the pure strategy profile \((\tau_1, \tau_2)\) where \(\tau_i(h) = C\) for every history \(h\) and both \(i \in \left\{1,2\right\}\). Just as this was a subgame-perfect equilibrium in the finitely repeated Prisoner’s dilemma, it stays a SPE in the infinitely repeated Prisoner’s dilemma too (as it maximizes the payoff of each stage separately – as if it were a one-shot game).
8.2.2 Grim Trigger and Simple Folk Theorem
Now consider the infinitely repeated Prisoner’s dilemma from Example 8.3 and the grim trigger profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) where \[ \tau_i(\boldsymbol{s}^1 \cdots \boldsymbol{s}^T) = \begin{cases} S, & T = 0, \\ S, & \boldsymbol{s}^l = (S,S) \quad \forall l \in \left\{1, \dots, T\right\}, \\ C, & \text{otherwise}. \end{cases} \]
Notice that \((S,S)\) provides better payoffs than the NE of \(G\) in each stage of \(G_{\mathrm{irep}}\). Thus it seems logical to remain silent as long as the other player cooperates, but to keep confessing forever once he betrays us. Suppose that player \(i\) starts with this strategy and considers deviating in stage \(k\) to receive a payoff of 0 instead of \(-1\). Thereafter, his opponent chooses \(C\), and so he will also choose \(C\) in the remainder of the game. The grim trigger strategies therefore define a Nash equilibrium iff the equilibrium payoff of \(-1\) is at least as large as the payoff from deviating in stage \(k\) and playing \(C\) ever after (deviating to \(S\) against \(C\) is surely suboptimal): \[\begin{align*} (1 - \delta) \sum_{t = 0}^\infty \delta^t (-1) &\geq (1 - \delta) \left( \sum_{t = 0}^{k-1} \delta^t (-1) + \underbrace{\delta^k \cdot 0}_0 + \sum_{t = k+1}^\infty \delta^t (-5) \right) \\ \sum_{t = 0}^\infty \delta^t (-1) &\geq \sum_{t = 0}^{k-1} \delta^t (-1) + \sum_{t = k+1}^\infty \delta^t (-5) \\ \sum_{t = k}^\infty \delta^t (-1) &\geq \sum_{t = k+1}^\infty \delta^t (-5) \\ &\iff \\ \sum_{t = 0}^\infty \delta^t (-1) &\geq \sum_{t = 1}^\infty \delta^t (-5) \\ \frac {-1} {1 - \delta} &\geq 5 + \sum_{t = 0}^\infty \delta^t (-5) = 5 + \frac {-5}{1 - \delta} \\ \frac 4 {1 - \delta} &\geq 5 \\ \delta &\geq \frac 1 5. \end{align*}\]
In general, let \(G = (\left\{1,2\right\}, (S_1, S_2), (u_1, u_2))\) be a two-player strategic-form game where \(u_1, u_2\) are bounded on \(S = S_1 \times S_2\) (but \(S\) may be infinite) and let \(\boldsymbol{s}^*\) be a Nash equilibrium in \(G\). Let \(\boldsymbol{s}\) be a strategy profile in \(G\) satisfying \(u_i(\boldsymbol{s}) > u_i(\boldsymbol{s}^*)\) for both \(i \in \left\{1,2\right\}\). Consider the following grim trigger for \(\boldsymbol{s}\) using \(\boldsymbol{s}^*\): the strategy profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) in \(G_{\mathrm{irep}}\) where \[ \tau_i(\boldsymbol{s}^1 \cdots \boldsymbol{s}^T) = \begin{cases} s_i, & T = 0, \\ s_i, & \boldsymbol{s}^l = \boldsymbol{s} \quad \forall l \in \left\{1, \dots, T\right\}, \\ s_i^*, & \text{otherwise}. \end{cases} \] Then for \[ \delta \geq \max_{i \in \left\{1,2\right\}} \frac {\max_{s'_i \in S_i} u_i(s'_i, \boldsymbol{s}_{-i}) - u_i(\boldsymbol{s}) } {\max_{s'_i \in S_i} u_i(s'_i, \boldsymbol{s}_{-i}) - u_i(\boldsymbol{s}^*)} \] we have that \((\tau_1, \tau_2)\) is a subgame-perfect equilibrium in \(G_{\mathrm{irep}}^\delta\) and \(u_i^\delta(\boldsymbol{\tau}) = u_i(\boldsymbol{s})\).
8.2.2.1 Examples
Example 8.4 Consider the infinitely repeated game \(G_{\mathrm{irep}}\) based on the following game \(G\):
|   | \(m\) | \(f\) | \(r\) |
|---|---|---|---|
| \(M\) | \((4,4)\) | \((-1,5)\) | \((3,0)\) |
| \(F\) | \((5,-1)\) | \((1,1)\) | \((0,0)\) |
| \(R\) | \((0,3)\) | \((0,0)\) | \((2,2)\) |
There is exactly one Nash equilibrium in \(G\), namely \((F,f)\). Consider the grim trigger for \((M,m)\) using \((F,f)\), i.e., the profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) in \(G_{\mathrm{irep}}\) where
- \(\tau_1\): plays \(M\) in a given stage if \((M,m)\) was played in all previous stages, and plays \(F\) otherwise;
- \(\tau_2\): plays \(m\) in a given stage if \((M,m)\) was played in all previous stages, and plays \(f\) otherwise.
Then this strategy profile \(\boldsymbol{\tau}\) is a subgame-perfect equilibrium in \(G_{\mathrm{irep}}^\delta\) for all \(\delta \geq \frac 1 4\), and \(u_i^\delta(\boldsymbol{\tau}) = 4\) for both \(i \in \left\{1,2\right\}\). There is also a grim trigger for \((R,r)\) using \((F,f)\), which is a SPE in \(G_{\mathrm{irep}}^\delta\) for \(\delta \geq \frac 1 2\).
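The threshold formula from the simple folk theorem above is straightforward to evaluate. The following sketch (the function name is ours) reproduces \(\delta \geq 1/5\) for the Prisoner’s dilemma as well as the thresholds \(1/4\) and \(1/2\) of this example:

```python
def trigger_threshold(U, A1, A2, s, s_star):
    """Smallest delta for which the grim trigger for s using s_star is an SPE,
    per the displayed bound; U maps profiles to payoff pairs."""
    def ratio(i):
        best = max(U[(a, s[1]) if i == 0 else (s[0], a)][i]
                   for a in (A1 if i == 0 else A2))
        return (best - U[s][i]) / (best - U[s_star][i])
    return max(ratio(0), ratio(1))

PD = {("C", "C"): (-5, -5), ("C", "S"): (0, -20),
      ("S", "C"): (-20, 0), ("S", "S"): (-1, -1)}
G = {("M", "m"): (4, 4), ("M", "f"): (-1, 5), ("M", "r"): (3, 0),
     ("F", "m"): (5, -1), ("F", "f"): (1, 1), ("F", "r"): (0, 0),
     ("R", "m"): (0, 3), ("R", "f"): (0, 0), ("R", "r"): (2, 2)}

print(trigger_threshold(PD, "CS", "CS", ("S", "S"), ("C", "C")))   # 0.2
print(trigger_threshold(G, "MFR", "mfr", ("M", "m"), ("F", "f")))  # 0.25
print(trigger_threshold(G, "MFR", "mfr", ("R", "r"), ("F", "f")))  # 0.5
```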
Example 8.5 (Tacit Collusion) Consider the Cournot duopoly game model (see Section 3.3.3) \(G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})\) where
- \(N = \left\{1,2\right\}\);
- \(S_i = [0, \kappa]\);
- \(u_1(q_1, q_2) = q_1(\kappa - q_1 - q_2) - q_1 c_1 = (\kappa - c_1) q_1 - q_1^2 - q_1 q_2\) and
\(u_2(q_1, q_2) = q_2(\kappa - q_1 - q_2) - q_2 c_2 = (\kappa - c_2) q_2 - q_2^2 - q_1 q_2\).
For simplicity we assume that \(c_1 = c_2 = c\) and denote \(\theta= \kappa - c\).
If the firms signed a binding contract to produce only \(\theta/4\) each, their profit would be \(\theta^2/8\), which is higher than the profit \(\theta^2/9\) for playing the Nash equilibrium \((\theta/3, \theta/3)\) in \(G\). However, such contracts are forbidden in many countries (including the US). Is it then still possible that the firms behave selfishly (i.e., only maximize their profits) and still obtain such payoffs? In other words, is there a SPE in the infinitely repeated game based on \(G\) (with a discount factor \(\delta\)) which gives the payoffs \(\theta^2/8\)?
Consider the grim trigger profile for \((\theta/4, \theta/4)\) using the Nash equilibrium \((\theta/3, \theta/3)\). Then player \(i\) will
- produce \(q_i = \theta/4\) whenever all profiles in history are \((\theta/4, \theta/4)\);
- whenever one of the players deviates, produce \(\theta/3\) from that moment on.
Assume that \(\kappa = 100\) and \(c = 10\) (which gives \(\theta= 90\)). Then this is a SPE in \(G_{\mathrm{irep}}^\delta\) for \(\delta \geq 9/17 = 0.5294\cdots\), and it results in the play \((\theta/4, \theta/4)(\theta/4, \theta/4) \cdots\) with discounted payoffs \(\theta^2/8\).
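The threshold can be checked numerically by maximizing the one-shot deviation payoff against \(\theta/4\) over a grid of quantities (a sketch with our names; the maximum sits at \(q_i = 3\theta/8\)):

```python
theta = 90.0  # kappa = 100, c = 10
u = lambda qi, qj: theta * qi - qi ** 2 - qi * qj  # u_i(q_i, q_j), c_1 = c_2

collude = u(theta / 4, theta / 4)  # theta^2 / 8 = 1012.5
nash    = u(theta / 3, theta / 3)  # theta^2 / 9 = 900.0
# Best one-shot deviation against q_{-i} = theta / 4, found on a fine grid.
best_dev = max(u(k / 100, theta / 4) for k in range(10001))
print((best_dev - collude) / (best_dev - nash))  # 0.52941... = 9/17
```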
8.3 Long-Run Average Payoff
In this section, we assume that all payoffs in the game \(G\) are positive and that \(S\) is finite!
Let \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) be a strategy profile in the infinitely repeated game \(G_{\mathrm{irep}}\) such that \(w_{\boldsymbol{\tau}} = \boldsymbol{s}^1 \boldsymbol{s}^2 \cdots\)
Definition 8.10 We define a long-run average payoff for player \(i\) by \[ u^{\mathrm{avg}}_i(\boldsymbol{\tau}) = \limsup_{T \to \infty} \frac 1 T \sum_{t = 1}^T u_i(\boldsymbol{s}^t). \]
The long-run average payoff \(u^{\mathrm{avg}}_i(\boldsymbol{\tau})\) is well-defined if the limit \(u^{\mathrm{avg}}_i(\boldsymbol{\tau}) = \lim_{T \to \infty} \frac 1 T \sum_{t = 1}^T u_i(\boldsymbol{s}^t)\) exists.
Here, \(\limsup\) is necessary in general because the limit of the average payoffs need not exist (however, since payoffs are bounded, the sequence of averages is bounded in \(\mathbb{R}\), so we can always select a convergent subsequence and the \(\limsup\) is well-defined).
Given a strategic-form game \(G\), we denote by \(G_{\mathrm{irep}}^{\mathrm{avg}}\) the infinitely repeated game based on \(G\) together with the long-run average payoff.
Definition 8.11 A strategy profile \(\boldsymbol{\tau}\) is a Nash equilibrium in \(G_{\mathrm{irep}}^{\mathrm{avg}}\) if \(u^{\mathrm{avg}}_i(\boldsymbol{\tau})\) is well-defined for both \(i \in \left\{1,2\right\}\), and for every \(i\) and every \(\tau'_i\) we have that \[ u^{\mathrm{avg}}_i(\tau_i, \boldsymbol{\tau}_{-i}) \geq u^{\mathrm{avg}}_i(\tau_i', \boldsymbol{\tau}_{-i}). \]
Note that we demand the existence of the defining limit of \(u^{\mathrm{avg}}_i(\tau_i, \boldsymbol{\tau}_{-i})\) but the limit does not have to exist for \(u^{\mathrm{avg}}_i(\tau_i', \boldsymbol{\tau}_{-i})\).
Moreover, \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) is a subgame-perfect equilibrium in \(G_{\mathrm{irep}}^{\mathrm{avg}}\) if for every history \(h\) the profile \((\tau_1^h, \tau_2^h)\) is a Nash equilibrium.
Example 8.6 Consider the infinitely repeated game based on the Prisoner’s dilemma with long-run average payoff:
|   | \(C\) | \(S\) |
|---|---|---|
| \(C\) | \((-5,-5)\) | \((0,-20)\) |
| \(S\) | \((-20,0)\) | \((-1,-1)\) |
The grim trigger profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) where \[ \tau_i(\boldsymbol{s}^1 \cdots \boldsymbol{s}^T) = \begin{cases} S, & T = 0, \\ S, & \boldsymbol{s}^l = (S,S) \quad \forall l \in \left\{1, \dots, T\right\}, \\ C, & \text{otherwise}. \end{cases} \] is a SPE that gives the long-run average payoff \(-1\) to each player.
The intuition behind the grim trigger works just like for the discounted payoff – whenever a player \(i\) deviates, the player \(-i\) starts playing \(C\), for which the best response of player \(i\) is also \(C\). So we obtain a play of the form \((S,S) \cdots (S,S)(X,Y)(C,C)(C,C)\cdots\) (here \((X,Y)\) is either \((C,S)\) or \((S,C)\), depending on who deviates). The long-run average payoff of such a play is \(-5\) for both players, which is worse than \(-1\), so no deviation pays off.
However, other payoffs can be supported by Nash equilibrium. Consider e.g. a strategy profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) such that
- both players cyclically play as follows:
  - 9 times \((S,S)\);
  - once \((S,C)\);
- if one of the players deviates, then, from that moment on, both play \((C,C)\) forever.
Then \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) is also a SPE. Indeed, \(u^{\mathrm{avg}}_1(\boldsymbol{\tau}) = \frac 9 {10} (-1) + \frac {-20} {10} = - \frac {29} {10}\) and \(u^{\mathrm{avg}}_2(\boldsymbol{\tau}) = \frac 9 {10}(-1) + 0 = -\frac {9} {10}\). Hence player 2 gets a better payoff than from the “best” profile \((S,S)\).
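Since the play is periodic, the defining limit exists and equals the average over one cycle, which a few lines of Python confirm (names are ours):

```python
from fractions import Fraction

U = {("C", "C"): (-5, -5), ("C", "S"): (0, -20),
     ("S", "C"): (-20, 0), ("S", "S"): (-1, -1)}
cycle = [("S", "S")] * 9 + [("S", "C")]

# For a periodic play the long-run average is the average over one cycle.
avg = [Fraction(sum(U[s][i] for s in cycle), len(cycle)) for i in (0, 1)]
print(avg)  # [Fraction(-29, 10), Fraction(-9, 10)]
```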
8.4 Folk Theorems
The previous examples suggest that other (possibly all?) convex combinations of payoffs may be obtained by means of Nash equilibria. This observation forms a basis for a bunch of theorems, collectively referred to as Folk Theorems.
No author is listed since these theorems had been known in the games community long before they were formalized.
In this section, we prove several versions of Folk Theorem concerning achievable payoffs for repeated games. We consider the following variants:
- Long-run average payoffs and SPE;
- Long-run average payoffs and Nash equilibria.
Similar theorems can also be proven for the discounted payoff.
8.4.1 Feasible Payoffs
Definition 8.12 We say that a vector of payoffs \(\boldsymbol{v} = (v_1, v_2) \in \mathbb{R}^2\) is feasible if it is a convex combination of payoffs for pure strategy profiles in \(G\) with rational coefficients, i.e., if there are rational numbers \(\beta_{\boldsymbol{s}}\), for \(\boldsymbol{s} \in S\), satisfying \(\beta_{\boldsymbol{s}} \geq 0\) and \(\sum_{\boldsymbol{s} \in S} \beta_{\boldsymbol{s}} = 1\) such that for both \(i \in \left\{1,2\right\}\) holds \[ v_i = \sum_{\boldsymbol{s} \in S} \beta_{\boldsymbol{s}} \cdot u_i(\boldsymbol{s}). \] We assume that there is a common denominator \(m \in \mathbb{N}\) such that each \(\beta_{\boldsymbol{s}}\) can be written in the form \(\beta_{\boldsymbol{s}} = \gamma_{\boldsymbol{s}} / m\) with \(\gamma_{\boldsymbol{s}} \in \mathbb{N}_0\).
The following theorems can be extended to a notion of feasible payoffs using arbitrary, possibly irrational, coefficients \(\beta_{\boldsymbol{s}}\) in the convex combination. Roughly speaking, this follows from the fact that each real number can be approximated with rational numbers up to an arbitrary error. However, the proofs are technically more involved.
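For instance, the payoff vector \((-\frac{29}{10}, -\frac{9}{10})\) from Example 8.6 is feasible; a sketch (names are ours) with the witnessing coefficients \(\beta_{(S,S)} = 9/10\), \(\beta_{(S,C)} = 1/10\) and common denominator \(m = 10\):

```python
from fractions import Fraction

U = {("C", "C"): (-5, -5), ("C", "S"): (0, -20),
     ("S", "C"): (-20, 0), ("S", "S"): (-1, -1)}
beta = {("S", "S"): Fraction(9, 10), ("S", "C"): Fraction(1, 10)}

assert sum(beta.values()) == 1  # a convex combination with m = 10
v = [sum(b * U[s][i] for s, b in beta.items()) for i in (0, 1)]
print(v)  # [Fraction(-29, 10), Fraction(-9, 10)]
```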
Theorem 8.2 Let \(\boldsymbol{s}^*\) be a pure strategy Nash equilibrium in \(G\) and let \(\boldsymbol{v} = (v_1, v_2)\) be a feasible vector of payoffs satisfying \(v_i \geq u_i(\boldsymbol{s}^*)\) for both \(i \in \left\{1,2\right\}\). Then there is a strategy profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) in \(G_{\mathrm{irep}}\) such that
- \(\boldsymbol{\tau}\) is a subgame-perfect equilibrium in \(G_{\mathrm{irep}}^{\mathrm{avg}}\);
- \(u^{\mathrm{avg}}_i(\boldsymbol{\tau}) = v_i\) for \(i \in \left\{1,2\right\}\).
Proof. Consider a strategy profile \(\boldsymbol{\tau}\) in \(G_{\mathrm{irep}}\) which gives the following behavior:
1. unless one of the players deviates, the players play cyclically all profiles \(\boldsymbol{s} \in S\) so that each \(\boldsymbol{s}\) is always played for \(\gamma_{\boldsymbol{s}}\) rounds;
2. whenever one of the players deviates, then, from that moment on, each player \(i\) plays \(s^*_i\).
Trivially, \(\boldsymbol{u}^{\mathrm{avg}}(\boldsymbol{\tau}) = \boldsymbol{v}\). We shall now verify that \(\boldsymbol{\tau}\) is a subgame-perfect equilibrium. Consider a fixed history \(h\); we show that \(\boldsymbol{\tau}^h = (\tau_1^h, \tau_2^h)\) is a Nash equilibrium in \(G_{\mathrm{irep}}^{\mathrm{avg}}\).
- If \(h\) does not contain a deviation from the cyclic behavior (1.), then \(\boldsymbol{\tau}^h\) continues according to (1.), thus \(u^{\mathrm{avg}}_i(\boldsymbol{\tau}^h) = v_i\).
- If \(h\) contains a deviation from (1.), then \[ w_{\boldsymbol{\tau}^h} = \boldsymbol{s}^* \boldsymbol{s}^* \cdots \] and thus \(u^{\mathrm{avg}}_i(\boldsymbol{\tau}^h) = u_i(\boldsymbol{s}^*)\).
- Now if a player \(i\) deviates from \(\tau_i^h\) to \(\bar \tau^h_i\) in \(G_{\mathrm{irep}}^{\mathrm{avg}}\), then \[ w_{(\bar \tau^h_i, \boldsymbol{\tau}^h_{-i})} = \alpha (s_i^1, \boldsymbol{s}'_{-i})(s_i^2, \boldsymbol{s}^*_{-i})(s_i^3, \boldsymbol{s}^*_{-i})\cdots \] where \(\alpha\) is a sequence of profiles following the cyclic behavior (1.), \(s_i^1, s_i^2, \dots\) are strategies from \(S_i\) and \(\boldsymbol{s}'_{-i}\) is a strategy from \(S_{-i}\). However, then \(u^{\mathrm{avg}}_i(\bar \tau^h_i, \boldsymbol{\tau}^h_{-i}) \leq u_i(\boldsymbol{s}^*) \leq v_i\) since \(\boldsymbol{s}^*\) is a Nash equilibrium and thus \(u_i(s_i^k, \boldsymbol{s}^*_{-i}) \leq u_i(\boldsymbol{s}^*)\) for all \(k \geq 1\).
Intuitively said, player \(-i\) punishes player \(i\) by playing \(\boldsymbol{s}^*_{-i}\).
8.4.2 Individual Rationality
Definition 8.13 A tuple \(\boldsymbol{v} = (v_1, v_2) \in \mathbb{R}^2\) is individually rational if for both \(i \in \left\{1,2\right\}\) holds \[ v_i \geq \min_{\boldsymbol{s}_{-i}\in S_{-i}} \max_{s_i \in S_i} u_i(s_i, \boldsymbol{s}_{-i}). \]
That is, \(v_i\) is at least as large as the value that player \(i\) may secure by playing a best response to the most hostile behavior of player \(-i\).
Note that here, player \(-i\) chooses his “most hostile behavior” before player \(i\), who then reacts to it by maximizing his payoff in this unfavorable situation, i.e., he minimizes the harm done to him by player \(-i\). Hence player \(-i\) is able to hurt player \(i\) less than if the order of minimization and maximization were the other way around.
Example 8.7 Consider a game given by the following table:
|   | \(L\) | \(R\) |
|---|---|---|
| \(U\) | \((-2,2)\) | \((1,-2)\) |
| \(M\) | \((1,-2)\) | \((-2,2)\) |
| \(D\) | \((0,1)\) | \((2,3)\) |
Here any \(\boldsymbol{v} = (v_1, v_2)\) such that \(v_1 \geq 1\) and \(v_2 \geq 2\) is individually rational.
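The two thresholds can be computed directly from Definition 8.13 (a sketch, names are ours):

```python
U = {("U", "L"): (-2, 2), ("U", "R"): (1, -2),
     ("M", "L"): (1, -2), ("M", "R"): (-2, 2),
     ("D", "L"): (0, 1), ("D", "R"): (2, 3)}
A1, A2 = "UMD", "LR"

# Player i's threshold: the most hostile opponent move is fixed first, then
# player i best-responds (min over s_{-i} of max over s_i).
v1 = min(max(U[(a1, a2)][0] for a1 in A1) for a2 in A2)
v2 = min(max(U[(a1, a2)][1] for a2 in A2) for a1 in A1)
print(v1, v2)  # 1 2: exactly the bounds v_1 >= 1 and v_2 >= 2 claimed above
```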
Theorem 8.3 Let \(\boldsymbol{v} = (v_1, v_2)\) be a feasible and individually rational vector of payoffs. Then there is a strategy profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) in \(G_{\mathrm{irep}}\) such that
- \(\boldsymbol{\tau}\) is a Nash equilibrium in \(G_{\mathrm{irep}}^{\mathrm{avg}}\);
- \(\boldsymbol{u}^{\mathrm{avg}}(\boldsymbol{\tau}) = \boldsymbol{v}\).
Proof. We will use a slightly modified strategy profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) in \(G_{\mathrm{irep}}\) from the proof of Theorem 8.2:
- unless one of the players deviates, the players play cyclically all profiles \(\boldsymbol{s} \in S\) so that each \(\boldsymbol{s}\) is always played for \(\gamma_{\boldsymbol{s}}\) rounds;
- whenever a player \(i\) deviates, the opponent \(-i\) starts playing a strategy \(\boldsymbol{s}^{\min}_{-i}\in \mathop{\mathrm{argmin}}_{\boldsymbol{s}_{-i}\in S_{-i}} \max_{s_i \in S_i} u_i(s_i, \boldsymbol{s}_{-i})\).
It is trivial to see that \(\boldsymbol{u}^{\mathrm{avg}}(\boldsymbol{\tau}) = \boldsymbol{v}\). Now if a player \(i\) deviates, then his long-run average payoff cannot be higher than \(\min_{\boldsymbol{s}_{-i}\in S_{-i}} \max_{s_i \in S_i} u_i(s_i, \boldsymbol{s}_{-i}) \leq v_i\), so \(\boldsymbol{\tau}\) is a Nash equilibrium.
Theorem 8.4 If a strategy profile \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) is a Nash equilibrium in \(G_{\mathrm{irep}}^{\mathrm{avg}}\), then \(\left( u^{\mathrm{avg}}_1(\boldsymbol{\tau}), u^{\mathrm{avg}}_2(\boldsymbol{\tau}) \right)\) is individually rational.
Proof. Suppose that \(\left( u^{\mathrm{avg}}_1(\boldsymbol{\tau}), u^{\mathrm{avg}}_2(\boldsymbol{\tau}) \right)\) is not individually rational. Without loss of generality, assume that \(u^{\mathrm{avg}}_1(\boldsymbol{\tau}) < \min_{s_2 \in S_2} \max_{s_1 \in S_1} u_1(s_1, s_2)\). Now let us consider a new strategy \(\bar \tau_1\) such that for every history \(h\) the pure strategy \(\bar \tau_1(h)\) is a best response to \(\tau_2(h)\). But then, for every history \(h\), we have \[ u_1(\bar \tau_1(h), \tau_2(h)) \geq \min_{s_2 \in S_2} \max_{s_1 \in S_1} u_1(s_1, s_2) > u^{\mathrm{avg}}_1(\boldsymbol{\tau}). \]
So clearly \(u^{\mathrm{avg}}_1(\bar \tau_1, \tau_2) > u^{\mathrm{avg}}_1(\boldsymbol{\tau})\) which contradicts the fact that \(\boldsymbol{\tau}= (\tau_1, \tau_2)\) is a Nash equilibrium.
Note that if irrational convex combinations are allowed in the definition of feasibility, see Definition 8.12, then vectors of payoffs for Nash equilibria in \(G_{\mathrm{irep}}^{\mathrm{avg}}\) are exactly feasible and individually rational vectors of payoffs. Indeed, the coefficients \(\beta_{\boldsymbol{s}}\) in the definition of feasibility are exactly frequencies with which the individual profiles of \(S\) are played in the Nash equilibria.
8.4.3 Summary
We have proven that “any reasonable” (i.e. feasible and individually rational) vector of payoffs can be justified as payoffs for a Nash equilibrium in \(G_{\mathrm{irep}}^{\mathrm{avg}}\) (where the future has “an infinite weight”). Concerning subgame-perfect equilibria, we have proven that any feasible vector of payoffs dominating a Nash equilibrium in \(G\) can be justified as payoffs for subgame-perfect equilibrium in \(G_{\mathrm{irep}}^{\mathrm{avg}}\).
This result can be generalized to arbitrary feasible and strictly individually rational payoffs by means of a more demanding construction.
For discounted payoffs, one can prove that an arbitrary feasible vector of payoffs dominating a Nash equilibrium in \(G\) can be approximated using payoffs for subgame-perfect equilibrium in \(G_{\mathrm{irep}}^\delta\) as \(\delta\) goes to 1, and even this result can be extended to feasible and strictly individually rational payoffs.
For a very detailed discussion of Folk Theorems see “A Course in Game Theory” by M. J. Osborne and A. Rubinstein.
8.5 Summary of Extensive-Form Games
We have considered extensive-form games (i.e., games on trees)
- with perfect information;
- with imperfect information.
We have considered pure strategies, mixed and behavioral strategies (see Kuhn’s Theorem 7.1). We have also studied Nash equilibria and subgame-perfect equilibria in pure strategies.
For finite perfect-information games, we have shown that
- there always exists a pure strategy SPE;
- SPE can be computed using backward induction in polynomial time.
On the other hand, for imperfect information games, the following holds:
- the backward induction can be used to propagate values through “perfect information nodes”, but “imperfect information parts” have to be solved by different means;
- solving imperfect information games is at least as hard as solving games in strategic form; however, even in the zero-sum case, most decision problems are NP-hard.
Finally, we discussed repeated games. We considered both finitely and infinitely repeated games. For finitely repeated games we considered the average payoff and discussed the existence of pure strategy Nash and subgame-perfect equilibria with respect to the existence of Nash equilibria in the original strategic-form game.
For infinitely repeated games we considered both
- discounted payoff: we have formulated and applied a simple folk theorem: “grim trigger” strategy profiles can be used to implement any vector of payoffs strictly dominating payoffs for a Nash equilibrium in the original strategic-form game;
- long-run average payoff: we have proven that all feasible and individually rational vectors of payoffs can be achieved by Nash equilibria (a variant of the grim trigger).