@@ -73,107 +73,181 @@ def strategy(self, opponent: Player) -> Action:
            return D
        return C

-# TODO Split this in to ttwo strategies, it's not clear to me from the internet
-# sources that the first implentation was buggy as opposed to just "poorly
-# thought out". The flaw is actually clearly described in the paper's
-# description: "Initially, they are both assumed to be .5, which amounts to the
-# pessimistic assumption that the other player is not responsive"
-# The revised version should be put in it's own module.
-# I also do not understand where the decision rules come from.
-# Need to read https://journals.sagepub.com/doi/10.1177/003755007500600402 to
-# gain understanding of decision rule.
-class RevisedDowning(Player):
+class FirstByDowning(Player):
    """
    Submitted to Axelrod's first tournament by Downing

    The description written in [Axelrod1980]_ is:

-    > "This rule selects its choice to maximize its own long- term expected payoff on
+    > "This rule selects its choice to maximize its own longterm expected payoff on
    > the assumption that the other rule cooperates with a fixed probability which
    > depends only on whether the other player cooperated or defected on the previous
-    > move. These two probabilities estimates are con- tinuously updated as the game
+    > move. These two probabilities estimates are continuously updated as the game
    > progresses. Initially, they are both assumed to be .5, which amounts to the
    > pessimistic assumption that the other player is not responsive. This rule is
    > based on an outcome maximization interpretation of human performances proposed
    > by Downing (1975)."

-    This strategy attempts to estimate the next move of the opponent by estimating
-    the probability of cooperating given that they defected (:math:`p(C|D)`) or
-    cooperated on the previous round (:math:`p(C|C)`). These probabilities are
-    continuously updated during play and the strategy attempts to maximise the long
-    term play. Note that the initial values are :math:`p(C|C)=p(C|D)=.5`.
+    The Downing (1975) paper is "The Prisoner's Dilemma Game as a
+    Problem-Solving Phenomenon" [Downing1975]_ and this is used to implement the
+    strategy.

-    # TODO: This paragraph is not correct (see note above)
-    Downing is implemented as `RevisedDowning`. Apparently in the first tournament
-    the strategy was implemented incorrectly and defected on the first two rounds.
-    This can be controlled by setting `revised=True` to prevent the initial defections.
+    There are a number of specific points in this paper, on page 371:

-    This strategy came 10th in Axelrod's original tournament but would have won
-    if it had been implemented correctly.
+    > "[...] In these strategies, O's [the opponent's] response on trial N is in
+    some way dependent or contingent on S's [the subject's] response on trial
+    N-1. All varieties of these lag-one matching strategies can be defined by two
+    parameters: the conditional probability that O will choose C following C by
+    S, P(C_o | C_s) and the conditional probability that O will choose C
+    following D by S, P(C_o | D_s)."
+
+    Throughout the paper the strategy (S) assumes that the opponent (O) is
+    playing a reactive strategy defined by these two conditional probabilities.
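+
+    As an illustrative sketch (a hypothetical helper, not from the paper or
+    from the implementation below), such a lag-one reactive opponent could be
+    simulated as follows, where C and D are the usual action constants and
+    alpha = P(C_o | C_s), beta = P(C_o | D_s):
+
+        import random
+
+        def reactive_opponent_move(alpha, beta, last_subject_move):
+            # Cooperate with probability alpha after the subject's C,
+            # and with probability beta after the subject's D.
+            p_cooperate = alpha if last_subject_move == C else beta
+            return C if random.random() < p_cooperate else D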
+
+    The strategy aims to maximise the long run utility against such a strategy
+    and the mechanism for this is described in Appendix A (more on this later).
+
+    One final point from the main text is, on page 372:
+
+    > "For the various lag-one matching strategies of O, the maximizing
+    strategies of S will be 100% C, or 100% D, or for some strategies all S
+    strategies will be functionally equivalent."
+
+    This implies that the strategy S will either always cooperate or always
+    defect (or be indifferent) depending on the opponent's defining
+    probabilities.
+
+    To understand the particular mechanism that describes the strategy S, we
+    refer to Appendix A of the paper on page 389.
+
+    The stated goal of the strategy is to maximize (using the notation of the
+    paper):
+
+        EV_TOT = #CC(EV_CC) + #CD(EV_CD) + #DC(EV_DC) + #DD(EV_DD)
+
+    That is, the player aims to maximise the expected value of being in each
+    state weighted by the number of times we expect to be in that state.
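+
+    (One possible reading, an interpretation rather than something stated
+    explicitly in the paper: if the player receives R, S, T and P in the states
+    CC, CD, DC and DD respectively, this amounts to maximizing
+
+        EV_TOT = #CC * R + #CD * S + #DC * T + #DD * P
+
+    where #XY denotes the expected number of rounds spent in state XY.)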
+
+    On the second page of the appendix, figure 4 (page 390) supposedly
+    identifies an expression for EV_TOT; however, it is not clear how some of
+    the steps are carried out. As a best guess, it seems that an asymptotic
+    argument is being used. Furthermore, a specific term is made to disappear in
+    the case of T - R = P - S (which is not the case for the standard
+    (R, P, S, T) = (3, 1, 0, 5)):
+
+    > "Where (t - r) = (p - s), EV_TOT will be a function of alpha, beta, t, r,
+    p, s and N are known and V which is unknown."
+
+    V is the total number of cooperations of the player S (this is noted earlier
+    in the abstract) and as such the final expression (with only V as unknown)
+    can be used to decide if V should indicate that S always cooperates or not.
+
+    Given the lack of usable details in this paper, the following interpretation
+    is used to implement this strategy:
+
+    1. On any given turn, the strategy will estimate alpha = P(C_o | C_s) and
+    beta = P(C_o | D_s).
+    2. The strategy will calculate the expected utility of always playing C OR
+    always playing D against the estimated probabilities. This corresponds to:
+
+        a. In the case of the player always cooperating:
+
+           P_CC = alpha and P_CD = 1 - alpha
+
+        b. In the case of the player always defecting:
+
+           P_DC = beta and P_DD = 1 - beta
+
+    Using this we have:
+
+        E_C = alpha R + (1 - alpha) S
+        E_D = beta T + (1 - beta) P
+
+    Thus at every turn, the strategy will calculate those two values and
+    cooperate if E_C > E_D and will defect if E_C < E_D.
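+
+    As a purely illustrative example (not taken from the paper), with the
+    standard values (R, P, S, T) = (3, 1, 0, 5) and estimates alpha = 0.75,
+    beta = 0.25:
+
+        E_C = 0.75 * 3 + 0.25 * 0 = 2.25
+        E_D = 0.25 * 5 + 0.75 * 1 = 2.0
+
+    so E_C > E_D and the strategy would cooperate.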
+
+    In the case of E_C = E_D, the player will alternate from their previous
+    move. This is based on a specific sentence from Axelrod's original paper:
+
+    > "Under certain circumstances, DOWNING will even determine that the best
+    > strategy is to alternate cooperation and defection."
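+
+    A compact sketch of this decision rule (using hypothetical names, not the
+    implementation in the class below, and assuming the usual C and D action
+    constants with their flip() method) is:
+
+        def downing_decision(alpha, beta, R, P, S, T, previous_move):
+            # Expected payoff of always cooperating / always defecting
+            # against a lag-one reactive opponent with parameters alpha, beta.
+            expected_C = alpha * R + (1 - alpha) * S
+            expected_D = beta * T + (1 - beta) * P
+            if expected_C > expected_D:
+                return C
+            if expected_C < expected_D:
+                return D
+            return previous_move.flip()  # Tie: alternate from the previous move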
+
+    One final important point is the early game behaviour of the strategy. It
+    has been noted that this strategy was implemented in a way that assumed that
+    alpha and beta were both 1/2:
+
+    > "Initially, they are both assumed to be .5, which amounts to the
+    > pessimistic assumption that the other player is not responsive."
+
+    Thus, the player opens with a defection in the first two rounds. Note that
+    from the Axelrod publications alone there is nothing to indicate defections
+    on the first two rounds, although a defection in the opening round is clear.
+    However, there is a presentation available at
+    http://www.sci.brooklyn.cuny.edu/~sklar/teaching/f05/alife/notes/azhar-ipd-Oct19th.pdf
+    that clearly states that Downing defected in the first two rounds, thus this
+    is assumed to be the behaviour.
+
+    Note that the response to the first round allows us to estimate
+    beta = P(C_o | D_s) and we will use the opening play of the opponent to
+    estimate alpha = P(C_o | C_s). This is an assumption with no clear
+    indication from the literature.
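+
+    For illustration (a consequence of the implementation below rather than a
+    statement from the literature): if the opponent cooperates on its opening
+    move and again in round 2, then at the start of round 3 the estimates are
+    alpha = 1/1 = 1 and beta = 1/2, giving (with (R, P, S, T) = (3, 1, 0, 5))
+
+        E_C = 1 * 3 + 0 * 0 = 3
+        E_D = 0.5 * 5 + 0.5 * 1 = 3
+
+    so E_C = E_D and the player alternates, cooperating in round 3.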
+
+    --
+    This strategy came 10th in Axelrod's original tournament.

    Names:

-    - Revised Downing: [Axelrod1980]_
    """

-    name = "Revised Downing"
+    name = "First tournament by Downing"

    classifier = {
        "memory_depth": float("inf"),
        "stochastic": False,
-        "makes_use_of": set(),
+        "makes_use_of": {"game"},
        "long_run_time": False,
        "inspects_source": False,
        "manipulates_source": False,
        "manipulates_state": False,
    }

-    def __init__(self, revised: bool = True) -> None:
+    def __init__(self) -> None:
        super().__init__()
-        self.revised = revised
-        self.good = 1.0
-        self.bad = 0.0
-        self.nice1 = 0
-        self.nice2 = 0
-        self.total_C = 0  # note the same as self.cooperations
-        self.total_D = 0  # note the same as self.defections
+        self.number_opponent_cooperations_in_response_to_C = 0
+        self.number_opponent_cooperations_in_response_to_D = 0

    def strategy(self, opponent: Player) -> Action:
        round_number = len(self.history) + 1
-        # According to internet sources, the original implementation defected
-        # on the first two moves. Otherwise it wins (if this code is removed
-        # and the comment restored.
-        # http://www.sci.brooklyn.cuny.edu/~sklar/teaching/f05/alife/notes/azhar-ipd-Oct19th.pdf
-
-        if self.revised:
-            if round_number == 1:
-                return C
-        elif not self.revised:
-            if round_number <= 2:
-                return D

-        # Update various counts
-        if round_number > 2:
-            if self.history[-1] == D:
-                if opponent.history[-1] == C:
-                    self.nice2 += 1
-                self.total_D += 1
-                self.bad = self.nice2 / self.total_D
-            else:
-                if opponent.history[-1] == C:
-                    self.nice1 += 1
-                self.total_C += 1
-                self.good = self.nice1 / self.total_C
-        # Make a decision based on the accrued counts
-        c = 6.0 * self.good - 8.0 * self.bad - 2
-        alt = 4.0 * self.good - 5.0 * self.bad - 1
-        if c >= 0 and c >= alt:
-            move = C
-        elif (c >= 0 and c < alt) or (alt >= 0):
-            move = self.history[-1].flip()
-        else:
-            move = D
-        return move
+        # Defect on the first two rounds while gathering information about
+        # the opponent (see the docstring above).
+        if round_number == 1:
+            return D
+        if round_number == 2:
+            # The opponent's opening move is treated as a response to a
+            # cooperation when estimating alpha.
+            if opponent.history[-1] == C:
+                self.number_opponent_cooperations_in_response_to_C += 1
+            return D
+
+        # Update the counts of the opponent's cooperations following our
+        # cooperations and defections.
+        if self.history[-2] == C and opponent.history[-1] == C:
+            self.number_opponent_cooperations_in_response_to_C += 1
+        if self.history[-2] == D and opponent.history[-1] == C:
+            self.number_opponent_cooperations_in_response_to_D += 1
+
+        # Estimate alpha = P(C_o | C_s) and beta = P(C_o | D_s).
+        alpha = (self.number_opponent_cooperations_in_response_to_C /
+                 (self.cooperations + 1))  # Adding 1 to count for opening move
+        beta = (self.number_opponent_cooperations_in_response_to_D /
+                self.defections)
+
+        R, P, S, T = self.match_attributes["game"].RPST()
+        expected_value_of_cooperating = alpha * R + (1 - alpha) * S
+        expected_value_of_defecting = beta * T + (1 - beta) * P
+
+        if expected_value_of_cooperating > expected_value_of_defecting:
+            return C
+        if expected_value_of_cooperating < expected_value_of_defecting:
+            return D
+        # Expected values are equal: alternate from the previous move.
+        return self.history[-1].flip()


class FirstByFeld(Player):
@@ -278,8 +352,6 @@ class FirstByGraaskamp(Player):
    so it plays Tit For Tat. If not it cooperates and randomly defects every 5
    to 15 moves.

-    # TODO Compare this to Fortran code.
-
    Note that there is no information about 'Analogy' available thus Step 5 is
    a "best possible" interpretation of the description in the paper.