Game Theory (Part 2)
Evolutionarily Stable Strategies
Payoff: fitness of individuals (the ability to pass on their genes to the next generation).
Pure Strategy: a certain behavior determined by the genes.
Mixed Strategy:
(i) a random mechanism that triggers each of the behaviors, or
(ii) the proportion of the population with a certain behavior gene.
Consider a large population of players, all playing strategy p.
A mutation occurs, producing a small number of players with a mutant strategy q, q ≠ p.
The proportion of the population with this mutant strategy is ε.
Then for a player using p, the expected payoff is (1 − ε)E(p, p) + εE(p, q), where E(x, y) denotes the expected payoff to a player using x against a player using y.
For a player using q, the expected payoff is (1 − ε)E(q, p) + εE(q, q).
We want the p-players to do strictly better than the q-players; for all sufficiently small ε > 0 this is guaranteed if, for every q ≠ p:
(i) E(p, p) ≥ E(q, p), and
(ii) if E(p, p) = E(q, p), then E(p, q) > E(q, q).
A strategy p satisfying (i) and (ii) is called an Evolutionarily Stable Strategy (ESS).
Theorem: Suppose the payoff matrix is A = (a_{ij}). Suppose the k-th column of A is such that its diagonal element, a_{kk}, is strictly the largest element of that column. Then playing the pure strategy k is an ESS.
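As a quick illustration of this column-dominance test, here is a minimal sketch (the matrix, payoff values and function name are my own choices, not from the notes) that reports which pure strategies of a symmetric game satisfy the condition; for a Prisoner's Dilemma matrix it singles out Defect.

```python
import numpy as np

def pure_ess_by_column_dominance(A):
    """Return the pure strategies k whose diagonal entry a_kk is strictly
    the largest entry of column k; by the theorem above, each such k is an ESS."""
    ess = []
    for k in range(A.shape[0]):
        others = np.delete(A[:, k], k)
        if A[k, k] > others.max():
            ess.append(k)
    return ess

# Symmetric Prisoner's Dilemma payoffs for the row player, with T > R > P > S;
# strategy 0 = Cooperate, strategy 1 = Defect.
T_, R_, P_, S_ = 5, 3, 1, 0
A = np.array([[R_, S_],
              [T_, P_]])
print(pure_ess_by_column_dominance(A))   # [1]: Defect is an ESS
```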
Theorem: Let (A, Aᵀ) be a given bimatrix game, where A is an n × n matrix. Let p = (p₁, ..., pₙ) be a mixed strategy. Then p is an ESS iff
(*) p_i > 0 implies strategy i is a best-response (BR) row to p, and
(**) letting B be the matrix such that b_{ij} = a_{ij} if i and j are BR rows to p and b_{ij} = 0 otherwise, we have xᵀBx < 0 for every nonzero x satisfying (i) x_i = 0 if i is not a BR row to p and (ii) Σ_i x_i = 0.
Theorem: Let (A, Aᵀ) be a symmetric bimatrix game. There exists a mixed strategy p such that <p, p> is a SE.
Theorem: Let A be the Scissors-Rock-Paper payoff matrix. Then (A, Aᵀ) does not have an ESS.
The Evolution of Cooperation
Prisoner's Dilemma:
I \ II | Cooperate | Defect
---|---|---
Cooperate | R, R | S, T
Defect | T, S | P, P
T: Temptation, R: Reward, P: Punishment, S: Sucker's payoff
T > R > P > S ensures <D, D> is a PSE.
2R > S + T is necessary for cooperation (otherwise the players would do better by alternating <C, D> and <D, C> than by always cooperating).
Repeated Prisoner's Dilemma: the Prisoner's Dilemma is played an infinite number of times. With a discount factor β, 0 < β < 1, $x received n stages in the future is worth $β^n x now, so the total discounted payoff is finite and it makes sense to compute the total payoff.
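To make the discounting concrete, the sketch below (a made-up illustration; the payoff values and β = 0.9 are assumptions) compares the total discounted payoff of permanent mutual cooperation with that of defecting against an opponent who then retaliates by defecting forever (the Permanent Retaliation strategy defined below).

```python
# Total discounted value of a constant stream x, x, x, ... starting now:
# x + beta*x + beta^2*x + ... = x / (1 - beta).
def discounted_constant(x, beta):
    return x / (1 - beta)

T_, R_, P_, S_ = 5, 3, 1, 0   # assumed payoffs with T > R > P > S
beta = 0.9                    # assumed discount factor

mutual_cooperation = discounted_constant(R_, beta)                  # R, R, R, ...
defect_and_be_punished = T_ + beta * discounted_constant(P_, beta)  # T, then P, P, ...

print(mutual_cooperation, defect_and_be_punished)   # 30.0 vs 14.0: cooperation pays here
```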
Finite Automata:
Input: some “history” of the game, namely Player II's action at the previous stage.
Output: Player I's action at the current stage.
A finite automaton can only remember a finite number of things.
1-memory automaton: The player can only remember what happened in the previous move and makes a decision based on that. We have to specify the first move. Then, the player will make a move based on the 4 possibilities: CC, CD, DC, DD, where the first letter denotes the move of the player concerned and the second letter denotes the move of the opponent.
Then, a pure strategy with 1-memory is given by f: {CC, CD, DC, DD} → {C, D}.
Examples:
All D: Defect all the time. First move=D, f(CC)=D, f(CD)=D, f(DC)=D, f(DD)=D.
PR: Permanent Retaliation. Cooperate until, if ever, the opponent defects; then defect forever. First move=C, f(CC)=C, f(CD)=D, f(DC)=D, f(DD)=D.
Tit-for-Tat (TFT): Cooperate first, then copy your opponent's previous move. First move=C, f(CC)=C, f(CD)=D, f(DC)=C, f(DD)=D.
AltDC: Start with D and then alternately play C and D. First move=D, f(CC)=D, f(CD)=D, f(DC)=C, f(DD)=C.
Expressing Mixed Strategies: When both players are using 1-memory automata, we can use a transition matrix to express the strategies. Each row of the matrix is a probability vector giving the probability of transiting to each of the 4 possible states (CC, CD, DC, DD).
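For the deterministic strategies above, the transition structure can be simulated directly. The sketch below (variable names and the ten-round horizon are my own choices) represents a 1-memory strategy by its first move and its map on the four states, and plays TFT against AltDC.

```python
# A deterministic 1-memory strategy: a first move plus a map from the state
# "own previous move + opponent's previous move" to the next move.
TFT   = ('C', {'CC': 'C', 'CD': 'D', 'DC': 'C', 'DD': 'D'})
ALTDC = ('D', {'CC': 'D', 'CD': 'D', 'DC': 'C', 'DD': 'C'})

def play(strat1, strat2, rounds=10):
    """Return the sequence of joint states (Player I's move, Player II's move)."""
    (m1, f1), (m2, f2) = strat1, strat2
    history = [(m1, m2)]
    for _ in range(rounds - 1):
        a, b = history[-1]
        # Each player reads the state as (own previous move, opponent's previous move).
        history.append((f1[a + b], f2[b + a]))
    return history

print(play(TFT, ALTDC))
# [('C', 'D'), ('D', 'C'), ('C', 'D'), ('D', 'C'), ...]: the pair cycles between CD and DC
```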
Theorem 1: <All D, All D> is a SE.
Theorem 2: <PR, PR> is a SE if β is large enough (assume T > R > P > S).
Theorem 3: When β is large enough, <TFT, TFT> is a SE (assume T > R > P > S and 2R > T + P).
A strategy is said to be
• Nice: It starts by cooperating and is never the first to defect.
• Retaliatory: It should reliably punish defection by its opponent.
• Forgiving: Having punished defection, it should be willing to try to cooperate again.
• Clear: Its pattern of play should be consistent and easy to predict.
Theorem 4: Let S be a nice strategy such that <S, S> is a SE. Then β must be sufficiently large.
Theorem 5: Let S be a nice strategy such that <S, S> is a SE. Then S must be provocable by the defection of the opponent (when the opponent plays D, S must retaliate by playing D at some later move).
Cooperative Games
The cooperative theory is divided into two classes of problems depending on whether or not there is a mechanism for transfer of utility from one player to the other: Transferable utility (TU) or Nontransferable utility (NTU).
Feasible Sets of Payoff Vectors: For any game it is important to know the set of feasible payoff vectors for the players. For a noncooperative game, this set is called the noncooperative feasible set.
The set of payoff vectors that the players can achieve if they cooperate is called the Cooperative Feasible Set.
Definition: The NTU feasible set is the convex hull of the mn points (a_{ij}, b_{ij}), for i = 1, ..., m and j = 1, ..., n.
The distinguishing feature of the TU case is that the players may make side payments of utility as part of the agreement: by making a side payment of s, the payoff vector (a_{ij}, b_{ij}) can be changed to (a_{ij} − s, b_{ij} + s).
Definition: The TU feasible set is the convex hull of the set of vectors of the form (a_{ij} − s, b_{ij} + s) for i = 1, ..., m and j = 1, ..., n and for arbitrary real numbers s.
Definition: A feasible payoff vector (u, v) is said to be Pareto optimal if for any feasible payoff vector (u′, v′), u′ ≥ u and v′ ≥ v imply (u′, v′) = (u, v).
Cooperative Games with Transferable Utility
The TU Solution: If the players come to an agreement, then rationality implies that they will agree to play to achieve the largest possible total payoff, σ = max_{i,j} (a_{ij} + b_{ij}), as the payoff to be divided between them. That is, they will jointly agree to use some row i₀ and column j₀ such that a_{i₀j₀} + b_{i₀j₀} = σ. Such a joint choice <i₀, j₀> is called their cooperative strategy.
Suppose now that the players have selected their threat strategies, say p for Player I and q for Player II. Then if agreement is not reached, Player I receives pᵀAq and Player II receives pᵀBq. The resulting payoff vector is D(p, q) = (D₁, D₂) = (pᵀAq, pᵀBq).
This point is in the NTU feasible set and is called the disagreement point or threat point. They will split evenly the excess of σ over D₁ + D₂. This is called the Egalitarian Principle.
The resolution point is then φ = (D₁ + (σ − D₁ − D₂)/2, D₂ + (σ − D₁ − D₂)/2). To select the threat optimally, Player I wants to maximize D₁ − D₂ = pᵀ(A − B)q and Player II wants to minimize it. This is in fact a zero-sum game with matrix A − B.
Let p* and q* denote optimal strategies of the 0-sum game A − B for Players I and II respectively, and let δ denote its value, δ = p*ᵀ(A − B)q*. When these strategies are used, the disagreement point or threat point becomes D* = (D₁*, D₂*) = (p*ᵀAq*, p*ᵀBq*). Since D₁* − D₂* = δ, we have as the TU solution: φ = ((σ + δ)/2, (σ − δ)/2).
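The following sketch runs this recipe on a hypothetical 2×2 bimatrix game (the matrices are made up for illustration, and the solver assumes the 2×2 threat game either has a pure saddle point or a fully mixed solution): it computes σ, the value δ of the zero-sum threat game A − B, and the TU solution φ = ((σ + δ)/2, (σ − δ)/2).

```python
import numpy as np

def solve_2x2_zero_sum(M):
    """Value and optimal row strategy of a 2x2 zero-sum game: check for a pure
    saddle point first, otherwise use the standard 2x2 mixed-strategy formulas."""
    maximin = max(M.min(axis=1))
    minimax = min(M.max(axis=0))
    if np.isclose(maximin, minimax):            # pure saddle point exists
        i = int(M.min(axis=1).argmax())
        return float(maximin), np.eye(2)[i]
    (a, b), (c, d) = M
    denom = a + d - b - c
    value = (a * d - b * c) / denom
    p1 = (d - c) / denom
    return value, np.array([p1, 1 - p1])

# Hypothetical bimatrix game used only for illustration.
A = np.array([[3.0, 0.0], [1.0, 2.0]])     # Player I's payoffs
B = np.array([[1.0, 4.0], [2.0, 1.0]])     # Player II's payoffs

sigma = (A + B).max()                      # largest achievable total payoff
delta, p_star = solve_2x2_zero_sum(A - B)  # value and Player I's optimal threat

phi = ((sigma + delta) / 2, (sigma - delta) / 2)
print(sigma, delta, phi)                   # 4.0, -0.25, (1.875, 2.125)
```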
Cooperative Games with Non-Transferable Utility
Nash Bargaining Model: A bargaining problem is given by (S, u*, v*) such that
(i) S is a compact (i.e. bounded and closed), convex set in the plane, and
(ii) (u*, v*) ∈ S, called the threat point, disagreement point or status-quo point.
Nash Solution to all bargaining problems: In the approach of Nash, “fair and reasonable” is defined by a few axioms:
(1) Feasibility. f(S, u*, v*) = (ū, v̄) ∈ S.
(2) Pareto Optimality. (ū, v̄) is Pareto optimal in S.
(3) Symmetry. If S is symmetric about the 45° line u = v, and if u* = v*, then ū = v̄.
(4) Independence of irrelevant alternatives. If T is a closed convex subset of S, and if the threat point (u*, v*) ∈ T and (ū, v̄) ∈ T, then f(T, u*, v*) = (ū, v̄).
(5) Invariance under change of location and scale. If T = {(u′, v′) : u′ = α₁u + β₁, v′ = α₂v + β₂ for some (u, v) ∈ S}, where α₁ > 0, α₂ > 0 and β₁, β₂ are given numbers, then f(T, α₁u* + β₁, α₂v* + β₂) = (α₁ū + β₁, α₂v̄ + β₂).
Then it is shown that these axioms lead to a unique solution, denoted f(S, u*, v*) = (ū, v̄), for any bargaining problem (S, u*, v*).
Theorem: There exists a unique function f satisfying the Nash axioms. Moreover, if there exists a point (u, v) ∈ S such that u > u* and v > v*, then f(S, u*, v*) is that point of S that maximizes (u − u*)(v − v*) among points of S such that u ≥ u* and v ≥ v*. The quantity (u − u*)(v − v*) is called the Nash product.
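As a numerical illustration of the Nash-product characterization, the sketch below does a simple grid search; the feasible set S = {(u, v) : u ≥ 0, v ≥ 0, u + 2v ≤ 4} and the threat point (0, 0) are assumptions made up for this example.

```python
import numpy as np

u_star, v_star = 0.0, 0.0                # assumed threat point
def in_S(u, v):                          # assumed feasible set
    return u >= 0 and v >= 0 and u + 2 * v <= 4

best, best_point = -1.0, None
for u in np.linspace(0, 4, 401):
    for v in np.linspace(0, 2, 201):
        if in_S(u, v) and u >= u_star and v >= v_star:
            nash_product = (u - u_star) * (v - v_star)
            if nash_product > best:
                best, best_point = nash_product, (u, v)

print(best_point, best)                  # about (2.0, 1.0), Nash product 2.0
```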
Kalai-Smorodinsky solution: For any bargaining problem (S, u*, v*), define the utopia points for the two players: U = max{u : (u, v) ∈ S, v ≥ v*} and V = max{v : (u, v) ∈ S, u ≥ u*}.
The Kalai-Smorodinsky solution (ū, v̄), a compromise between the utopia and the disagreement points, is the point on the Pareto optimal boundary of S lying on the line segment joining (u*, v*) and (U, V), i.e., satisfying (v̄ − v*)/(ū − u*) = (V − v*)/(U − u*).
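For comparison, here is a sketch of the Kalai-Smorodinsky construction on the same assumed feasible set: compute the utopia point, then move from the threat point toward it as far as feasibility allows. (For this particular set the KS and Nash solutions happen to coincide.)

```python
import numpy as np

u_star, v_star = 0.0, 0.0
def in_S(u, v):                          # same assumed feasible set as above
    return u >= 0 and v >= 0 and u + 2 * v <= 4

grid_u, grid_v = np.linspace(0, 4, 401), np.linspace(0, 2, 201)
# Utopia point: the most each player can get among feasible, individually rational points.
U = max(u for u in grid_u for v in grid_v if in_S(u, v) and v >= v_star)
V = max(v for u in grid_u for v in grid_v if in_S(u, v) and u >= u_star)

# Largest step t such that threat_point + t * (utopia - threat_point) stays feasible.
t = max(t for t in np.linspace(0, 1, 1001)
        if in_S(u_star + t * (U - u_star), v_star + t * (V - v_star)))
ks_solution = (u_star + t * (U - u_star), v_star + t * (V - v_star))
print((U, V), ks_solution)               # utopia (4.0, 2.0); KS solution about (2.0, 1.0)
```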
Zeuthen's Principle: Suppose that, at a given stage of the bargaining process, Player 1's last offer corresponded to the payoff vector v = (v₁, v₂) whereas Player 2's last offer corresponded to the payoff vector w = (w₁, w₂), such that v and w belong to the Pareto optimal boundary of the cooperative feasible set, with v₁ > w₁ but v₂ < w₂.
Then r₁ = (v₁ − w₁)/(v₁ − u*) and r₂ = (w₂ − v₂)/(w₂ − v*) are the highest probabilities of a disagreement that Player 1 and Player 2, respectively, would face rather than accept the last offer of the other player, where (u*, v*) is the disagreement point. We will call r₁ and r₂ the two players' risk limits, expressed as ratios of differences in payoffs.
Zeuthen suggests that the next concession must always come from the player with the smaller risk limit (except that if the two players' risk limits are equal then both of them must make concessions to avoid a conflict). We call this suggestion Zeuthen's Principle.
Theorem: If the two players follow Zeuthen's Principle, then the next concession will always be made by the player whose last offer is associated with the smaller Nash product (unless both offers are associated with equal Nash products, in which case both players have to make concessions).
Corollary: If the two players act according to Zeuthen's Principle then they will tend to reach the Nash solution as their agreement point.
Games in Coalitional Form
Many-Person TU Games: In many-person cooperative games, there are no restrictions on the agreements that may be reached among the players. In addition, we assume that all payoffs are measured in the same units and that there is a transferable utility which allows side payments to be made among the players.
Coalition: Let n ≥ 2 denote the number of players in the game, numbered from 1 to n, and let N denote the set of players, N = {1, 2, . . . , n}. A coalition, S, is defined to be a subset of N, S ⊆ N, and the set of all coalitions is denoted by 2^N. We also speak of the empty set, Ø, as a coalition, the empty coalition. The set N is also a coalition, called the grand coalition.
Definition: The coalitional form of an n-person game is given by the pair (N, v), where N = {1, 2, . . . , n} is the set of players and v is a real-valued function, called the characteristic function of the game, defined on the set, 2^N, of all coalitions (subsets of N), and satisfying
(i) v(Ø) = 0, and
(ii) (superadditivity) if S and T are disjoint coalitions (S ∩ T = Ø), then v(S) + v(T) ≤ v(S ∪ T).
Relation to Strategic Form: Transforming a game from strategic form to coalitional form entails specifying the value, v(S), for each coalition S ⊆ N. We define v(S) for each S as the value of the 2-person zero-sum game obtained when the coalition S acts as one player and the complementary coalition, S̄ = N \ S, acts as the other player, and where the payoff to S is the total of the payoffs to the players in S.
Theorem: Given any game in coalition form (N, v), one can find a strategic form game whose reduction to coalition form has v as its characteristic function.
Definition: A payoff vector, x = (x₁, ..., xₙ), is said to be group rational or efficient if Σ_{i=1}^{n} x_i = v(N).
Definition: A payoff vector, x, is said to be individually rational if x_i ≥ v({i}) for all i = 1, ..., n.
Definition: An imputation is a payoff vector that is group rational and individually rational. The set of imputations may be written X = {x = (x₁, ..., xₙ) : Σ_{i∈N} x_i = v(N) and x_i ≥ v({i}) for all i ∈ N}.
Theorem: The set of all imputations is a nonempty convex set.
Definition: An imputation x is said to be unstable through a coalition S if v(S) > Σ_{i∈S} x_i. We say x is unstable if there is a coalition S such that x is unstable through S, and we say x is stable otherwise.
Definition: The set, C, of stable imputations is called the core, C = {x ∈ X : Σ_{i∈S} x_i ≥ v(S) for all S ⊆ N}.
Definition: A game in coalitional form is said to be constant-sum if v(S) + v(S̄) = v(N) for all coalitions S ⊆ N. It is said to be zero-sum if, in addition, v(N) = 0.
Definition: A game in coalitional form is said to be inessential if Σ_{i=1}^{n} v({i}) = v(N), and essential if Σ_{i=1}^{n} v({i}) < v(N).
Theorem: The core of an essential n-person constant-sum game is empty.
Definition: A game in coalition form (N, v) is convex iff v(S) + v(T) ≤ v(S ∪ T) + v(S ∩ T) for any coalitions S, T ⊆ N.
Theorem: Let (N, v) be a convex game. Denote N = {1, ..., n}. Define x_i = v({1, ..., i}) − v({1, ..., i − 1}) for i = 1, ..., n (with v(Ø) = 0).
Then, x = (x₁, ..., xₙ) is an imputation and is in the core of (N, v). Consequently, the core of a convex game is nonempty.
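To see this construction in action, here is a sketch on a made-up 3-player convex game (the characteristic function v(S) = |S|² is an assumption chosen for illustration): it builds the marginal vector x and checks, by brute force over all coalitions, that x is in the core.

```python
from itertools import combinations

N = [1, 2, 3]
def v(S):
    """Assumed convex characteristic function: v(S) = |S|^2, with v(empty) = 0."""
    return len(S) ** 2

# Marginal worth vector: x_i = v({1,...,i}) - v({1,...,i-1}).
x = {i: v(N[:k + 1]) - v(N[:k]) for k, i in enumerate(N)}
print(x)                                            # {1: 1, 2: 3, 3: 5}

# Core check: efficiency plus sum_{i in S} x_i >= v(S) for every coalition S.
coalitions = [c for r in range(1, len(N) + 1) for c in combinations(N, r)]
in_core = (sum(x.values()) == v(N) and
           all(sum(x[i] for i in S) >= v(S) for S in coalitions))
print(in_core)                                      # True
```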
Theorem: A bankruptcy game is a convex game.
The Shapley Value
The Shapley value assigns to each game in coalitional form a UNIQUE vector of payoffs, called the Shapley value. The i-th entry of the value vector may be considered as a measure of the value or power of player i in the game.
A value function, φ, is a function that assigns to each possible set function of an n-person game, v, an n-tuple, φ(v) = (φ₁(v), φ₂(v), ..., φₙ(v)), of real numbers. φ_i(v) represents the worth or value of player i in the game with characteristic function v.
The Shapley Axioms for φ(v):
Efficiency. Σ_{i=1}^{n} φ_i(v) = v(N).
Symmetry. If i and j are such that v(S ∪ {i}) = v(S ∪ {j}) for every coalition S not containing i and j, then φ_i(v) = φ_j(v).
Dummy Axiom. If i is such that v(S ∪ {i}) = v(S) for every coalition S not containing i, then φ_i(v) = 0.
Additivity. If u and v are characteristic functions, then φ(u + v) = φ(u) + φ(v).
Theorem: There exists a unique function φ on the set of all set functions satisfying the Shapley axioms.
For a given nonempty set T ⊆ N, let w_T be defined for all S ⊆ N such that w_T(S) = 1 if T ⊆ S, and w_T(S) = 0 otherwise.
Claim: Any v may be written as v = Σ_{Ø ≠ T ⊆ N} c_T w_T for uniquely determined constants c_T; then the axioms force φ_i(v) = Σ_{T : i ∈ T} c_T / |T|.
Theorem (Shapley Value): There exists a unique value function satisfying the Shapley axioms. Indeed, the Shapley value is given by an explicit formula: φ_i(v) = Σ_{S ⊆ N : i ∈ S} ((|S| − 1)!(n − |S|)!/n!) (v(S) − v(S \ {i})), for i = 1, ..., n.
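A direct implementation of this formula is feasible for small games. The sketch below (the 3-player “glove” game is a made-up example: one player holds a left glove, the other two hold right gloves, and only a matched pair is worth anything) computes the Shapley value by summing over coalitions.

```python
from itertools import combinations
from math import factorial

def shapley(players, v):
    """Shapley value via the explicit formula:
    phi_i = sum over S containing i of (|S|-1)!(n-|S|)!/n! * (v(S) - v(S\\{i}))."""
    n = len(players)
    phi = {}
    for i in players:
        total = 0.0
        for r in range(1, n + 1):
            for S in combinations(players, r):
                if i in S:
                    weight = factorial(r - 1) * factorial(n - r) / factorial(n)
                    total += weight * (v(S) - v(tuple(j for j in S if j != i)))
        phi[i] = total
    return phi

# Made-up glove game: player 1 owns a left glove, players 2 and 3 own right
# gloves; any coalition containing player 1 and at least one other is worth 1.
def v(S):
    return 1.0 if 1 in S and len(S) >= 2 else 0.0

print(shapley((1, 2, 3), v))   # roughly {1: 0.667, 2: 0.167, 3: 0.167}
```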
Theorem: The Shapley value of a convex game lies in the core.
Definition: A game (N, v) is simple if for every coalition S ⊆ N, either v(S) = 0 or v(S) = 1. In a simple game, a coalition S is said to be a winning coalition if v(S) = 1 and a losing coalition if v(S) = 0.
The Nucleolus
Definition of Excess: The measure of the inequity of an imputation x for a coalition S is defined as the excess, e(x, S) = v(S) − Σ_{i∈S} x_i.
The concept of the nucleolus: On the principle that the one who yells loudest gets served first, we look first at those coalitions S whose excess, for a fixed allocation x, is the largest.
Then we adjust x, if possible, to make this largest excess smaller. Note that, as the total Σ_{i∈N} x_i = v(N) is fixed, making one coalition happier will be at the expense of other coalitions.
When the largest excess has been made as small as possible, we concentrate on the next largest excess, and adjust x to make it as small as possible, and so on. When we cannot improve further, the imputation we get is called the nucleolus.
Define O(x) as the vector of excesses e(x, S), over all coalitions S ⊆ N, arranged in decreasing (non-increasing) order. We say a vector y = (y₁, ..., y_k) is lexicographically less than a vector z = (z₁, ..., z_k), and write y ≺ z, if y₁ < z₁, or y₁ = z₁ and y₂ < z₂, or y₁ = z₁, y₂ = z₂ and y₃ < z₃, ..., or y₁ = z₁, ..., y_{k−1} = z_{k−1} and y_k < z_k. The nucleolus is an imputation that minimizes O(x) in the lexicographic ordering.
Definition: Let (N, v) be a game in coalitional form. Let X be the set of imputations. We say that a vector ν ∈ X is a nucleolus if for every x ∈ X we have O(ν) ≺ O(x) or O(ν) = O(x).
Theorem: The nucleolus of a game in coalitional form exists and is unique. The nucleolus is group rational, individually rational, and satisfies the symmetry axiom and the dummy axiom. If the core is not empty, the nucleolus is in the core.
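The building block of a nucleolus computation is the ordered excess vector and its lexicographic comparison. The sketch below (reusing the made-up glove game from the Shapley example and two arbitrary candidate imputations) computes O(x) for each candidate and compares them; a full nucleolus computation would minimize O(x) over all imputations, typically via a sequence of linear programs, which is beyond this sketch.

```python
from itertools import combinations

players = (1, 2, 3)
def v(S):                       # same made-up glove game as in the Shapley sketch
    return 1.0 if 1 in S and len(S) >= 2 else 0.0

def excess_vector(x):
    """O(x): excesses e(x, S) = v(S) - sum_{i in S} x_i over all nonempty
    coalitions, sorted in non-increasing order."""
    coalitions = [c for r in range(1, len(players) + 1)
                  for c in combinations(players, r)]
    return sorted((v(S) - sum(x[i] for i in S) for S in coalitions), reverse=True)

x1 = {1: 0.8, 2: 0.1, 3: 0.1}   # two candidate imputations (each sums to v(N) = 1)
x2 = {1: 0.6, 2: 0.2, 3: 0.2}

# Python compares lists lexicographically, matching the ordering defined above.
print(excess_vector(x1))
print(excess_vector(x2))
print(excess_vector(x1) < excess_vector(x2))   # True: x1 has the smaller O(x)
```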
Gately Point
When Player i threatens to quit, the credibility of his threat is measured by the propensity of Player i to disrupt the grand coalition, defined as d_i(x) = (Σ_{j≠i} x_j − v(N \ {i})) / (x_i − v({i})): the ratio of the loss to the rest of the coalition against the loss to Player i if Player i quits.
Definition: The Gately point is the imputation x that minimizes the largest propensity to disrupt, max_i d_i(x), among all imputations.
That is, the Gately point is the imputation at which the maximum of d₁(x), ..., dₙ(x) is smallest.
Theorem: At the Gately point we have d₁(x) = d₂(x) = · · · = dₙ(x).
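A small sketch of the propensity-to-disrupt computation (the 3-player characteristic function and the candidate imputation below are made-up numbers; it assumes x_i > v({i}) so the ratios are well defined):

```python
def propensity_to_disrupt(x, v, players):
    """d_i(x) = (sum_{j != i} x_j - v(N \\ {i})) / (x_i - v({i})): what the others
    would lose over what player i would lose if i quit the grand coalition."""
    d = {}
    for i in players:
        rest = tuple(j for j in players if j != i)
        d[i] = (sum(x[j] for j in rest) - v(rest)) / (x[i] - v((i,)))
    return d

players = (1, 2, 3)
values = {(1,): 0, (2,): 0, (3,): 0,
          (1, 2): 0.6, (1, 3): 0.6, (2, 3): 0.6, (1, 2, 3): 1.0}
v = lambda S: values[tuple(sorted(S))]

x = {1: 0.4, 2: 0.3, 3: 0.3}            # a candidate imputation (sums to v(N) = 1)
print(propensity_to_disrupt(x, v, players))
# {1: 0.0, 2: 0.33..., 3: 0.33...}; the Gately point would equalize these ratios.
```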
Ordinal Matching Games
Characteristics of an Ordinal Matching Game:
• Absence of numerical payoffs.
• Players have preferences.
• We cannot perform arithmetic operations on the payoffs.
• We will deal with rankings such as 1st choice, 2nd choice etc.
• The core concept can be applied. The core is the set of all feasible outcomes that no coalition can improve upon.
The Marriage Game:
• Players are of two types, Boys and Girls.
• Any boy/girl can marry if they wish.
• Polygamy is frowned upon.
• The possible outcomes are matchings, consisting of the different sets of marriages that the players might enter into.
• No “side payments” are allowed.
• The essential data for a marriage game is a double list of preference orderings.
• Core Principle: A matching is not stable if some coalition is not satisfied, i.e., if its members could do better by matching among themselves.
Definition: A matching μ is said to be stable if no boy and girl who are not matched to each other in μ prefer each other to their μ-mates.
The deferred acceptance algorithm:
A: | b | a | d | c |
---|---|---|---|---|
B: | c | d | a | b |
C: | c | d | b | a |
D: | d | a | b | c |
a: | C | B | A | D |
---|---|---|---|---|
b: | B | C | A | D |
c: | A | D | C | B |
d: | A | C | D | B |
Boy Propose:
Girl | 1st Day | 2nd Day | 3rd Day
---|---|---|---
a: | | | B
b: | A | A | A
c: | B, C | C | C
d: | D | B, D | D
rej: | B | B |
Ans: <Ba, Ab, Cc, Dd>
Girl Propose:
Boy | 1st Day | 2nd Day
---|---|---
A: | c, d | d
B: | b | b
C: | a | a
D: | | c
rej: | c |
Ans: <Ad, Bb, Ca, Dc>
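The following sketch implements the deferred acceptance algorithm and runs it on the preference lists above, once with the boys proposing and once with the girls proposing; the function and variable names are my own.

```python
def deferred_acceptance(proposer_prefs, reviewer_prefs):
    """Proposer-propose deferred acceptance.
    proposer_prefs / reviewer_prefs: dict mapping each agent to their
    preference list (most preferred first). Returns {proposer: reviewer}."""
    free = list(proposer_prefs)            # proposers not yet engaged
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}                           # reviewer -> proposer currently held
    rank = {r: {p: k for k, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    while free:
        p = free.pop(0)
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p
        elif rank[r][p] < rank[r][engaged[r]]:
            free.append(engaged[r])        # r trades up, old proposer is freed
            engaged[r] = p
        else:
            free.append(p)                 # r rejects p
    return {p: r for r, p in engaged.items()}

boys = {'A': ['b', 'a', 'd', 'c'], 'B': ['c', 'd', 'a', 'b'],
        'C': ['c', 'd', 'b', 'a'], 'D': ['d', 'a', 'b', 'c']}
girls = {'a': ['C', 'B', 'A', 'D'], 'b': ['B', 'C', 'A', 'D'],
         'c': ['A', 'D', 'C', 'B'], 'd': ['A', 'C', 'D', 'B']}

print(deferred_acceptance(boys, girls))    # {'A': 'b', 'B': 'a', 'C': 'c', 'D': 'd'}
print(deferred_acceptance(girls, boys))    # {'a': 'C', 'b': 'B', 'c': 'D', 'd': 'A'}
```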
Theorem: The matchings obtained by the deferred acceptance algorithm are stable.
Definition: Call a boy (girl) feasible for a girl (boy) if there exists a stable matching in which they are matched.
Theorem: In a girl-propose algorithm, no girl is ever rejected by a boy feasible for her. Similarly, in a boy-propose algorithm, no boy is ever rejected by a girl feasible for him.
Corollary: In the girl-propose algorithm, the girls are matched to optimal feasible boys. Therefore, girl-propose algorithm is girl-optimal. In the boy-propose algorithm, the boys are matched to optimal feasible girls. Therefore, boy-propose algorithm is boy-optimal.
Set of Stable Matchings: Let B be the set of boys and G be the set of girls. We suppose that each boy has a rank list of girls and each girl has a rank list of boys. Let S be the set of all stable matchings according to the rank lists.
For a matching μ, and a given boy b, we denote the girl matched to b to be μ(b). We will define a partial order in the set of matchings. (A partial order is a relation that is reflexive, antisymmetric and transitive.)
For any two matchings μ, λ, we write μ ≽ λ whenever every boy b likes μ(b) at least as well as λ(b).
Now, let μ, λ be two stable matchings. For a given boy b, let γ(b) denote the girl b prefers between μ(b) and λ(b).
Claim: If b ≠ m, then γ(b) ≠ γ(m).
Claim: γ is a stable matching.
Theorem: γ ∈ S and γ is the least upper bound of μ and λ with respect to ≽ (the analogous pointwise-least-preferred matching is their greatest lower bound). This means that (S, ≽) is a lattice.
Theorem: The maximal element in (S, ≽) is the boy-propose matching. The minimal element in (S, ≽) is the girl-propose matching.
One-sided Matching: There is one type of agent; each agent has a preference over some indivisible, heterogeneous goods.
Let N={A, B, C, ...} be the set of traders who are homeowners. Suppose A owns house a, B owns house b, C owns house c, etc. The traders can transfer ownership amongst themselves in any way they please, except that at the end no one is allowed to own more than one house. Each trader has strict preference over the houses:
A: c e f a b d
B: b a c e f d
C: e f c a d b
D: c a b e d f
E: d c b f e a
F: b d e f a c
Top Trading Cycle algorithm:
Step 1: Make a directed graph, with each trader represented by a vertex from which an edge points to the owner of his/her top-ranked house.
Step 2: Find top trading cycles.
Step 3: Let the traders in the top trading cycles trade amongst themselves. Delete them from the preference table. Repeat Step 1 if any traders are left.
Step 4: When every trader has been assigned to a top trading cycle, execute all the indicated trades. Then, we get the resulting allocation; a sketch computing it for the table above follows.
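Here is a sketch of the Top Trading Cycle algorithm applied to the preference table above (identifier names are my own; the code removes one cycle per pass, which yields the same final allocation as removing all cycles at once). For this table it yields A:a, B:b, C:e, D:c, E:d, F:f.

```python
def top_trading_cycle(prefs, owner_of):
    """prefs: trader -> list of houses, most preferred first.
    owner_of: house -> trader who initially owns it.
    Returns the final allocation trader -> house."""
    allocation = {}
    remaining = set(prefs)
    while remaining:
        # Each remaining trader points to the owner of their top remaining house.
        points_to = {}
        for t in remaining:
            top_house = next(h for h in prefs[t] if owner_of[h] in remaining)
            points_to[t] = owner_of[top_house]
        # Find one cycle by walking the pointers from an arbitrary trader.
        t = next(iter(remaining))
        seen = []
        while t not in seen:
            seen.append(t)
            t = points_to[t]
        cycle = seen[seen.index(t):]       # the cycle starts where we re-entered
        # Everyone in the cycle receives the house of the trader they point to.
        house_of = {owner: h for h, owner in owner_of.items() if owner in remaining}
        for t in cycle:
            allocation[t] = house_of[points_to[t]]
        remaining -= set(cycle)
    return allocation

prefs = {'A': list('cefabd'), 'B': list('bacefd'), 'C': list('efcadb'),
         'D': list('cabedf'), 'E': list('dcbfea'), 'F': list('bdefac')}
owner_of = {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D', 'e': 'E', 'f': 'F'}

print(top_trading_cycle(prefs, owner_of))
# {'C': 'e', 'E': 'd', 'D': 'c', 'B': 'b', 'F': 'f', 'A': 'a'} (order may vary)
```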
Theorem: The TTC algorithm produces a stable allocation in the sense that no coalition of traders, trading only with each other, could have achieved an allocation in which all members would be better off.