Commit eccd7e2

Re-implement Downing.
This is a complete rewrite of the Downing strategy, based on the description in Downing's 1975 paper. That description is not entirely clear, so a number of further assumptions were needed; these are clearly documented in the strategy's docstring. Note: previous documentation claimed that the implementation used in the original tournament was buggy. I believe this was a misinterpretation of one online set of slides which comment that there was a mistake in the implementation. The behaviour in question, defecting on the first two rounds (which had the effect of making the strategy a king maker), is discussed at length in Axelrod's original paper. It was not a bug, just a particular interpretation of the overall decision rule described in Downing's 1975 paper.
1 parent 0dde6e2 commit eccd7e2

File tree

5 files changed, +154 -88 lines changed


axelrod/strategies/_strategies.py

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@
     FirstByGrofman,
     FirstByJoss,
     FirstByNydegger,
-    RevisedDowning,
+    FirstByDowning,
     FirstByShubik,
     FirstBySteinAndRapoport,
     FirstByTidemanAndChieruzzi,
@@ -397,7 +397,7 @@
     Retaliate,
     Retaliate2,
     Retaliate3,
-    RevisedDowning,
+    FirstByDowning,
     SecondByRichardHufford,
     Ripoff,
     RiskyQLearner,

axelrod/strategies/axelrod_first.py

Lines changed: 140 additions & 68 deletions
@@ -73,107 +73,181 @@ def strategy(self, opponent: Player) -> Action:
         return D
         return C

-# TODO Split this in to ttwo strategies, it's not clear to me from the internet
-# sources that the first implentation was buggy as opposed to just "poorly
-# thought out". The flaw is actually clearly described in the paper's
-# description: "Initially, they are both assumed to be .5, which amounts to the
-# pessimistic assumption that the other player is not responsive"
-# The revised version should be put in it's own module.
-# I also do not understand where the decision rules come from.
-# Need to read https://journals.sagepub.com/doi/10.1177/003755007500600402 to
-# gain understanding of decision rule.
-class RevisedDowning(Player):
+class FirstByDowning(Player):
     """
     Submitted to Axelrod's first tournament by Downing

     The description written in [Axelrod1980]_ is:

-    > "This rule selects its choice to maximize its own long- term expected payoff on
+    > "This rule selects its choice to maximize its own long-term expected payoff on
     > the assumption that the other rule cooperates with a fixed probability which
     > depends only on whether the other player cooperated or defected on the previous
-    > move. These two probabilities estimates are con- tinuously updated as the game
+    > move. These two probabilities estimates are continuously updated as the game
     > progresses. Initially, they are both assumed to be .5, which amounts to the
     > pessimistic assumption that the other player is not responsive. This rule is
     > based on an outcome maximization interpretation of human performances proposed
     > by Downing (1975)."

-    This strategy attempts to estimate the next move of the opponent by estimating
-    the probability of cooperating given that they defected (:math:`p(C|D)`) or
-    cooperated on the previous round (:math:`p(C|C)`). These probabilities are
-    continuously updated during play and the strategy attempts to maximise the long
-    term play. Note that the initial values are :math:`p(C|C)=p(C|D)=.5`.
+    The Downing (1975) paper is "The Prisoner's Dilemma Game as a
+    Problem-Solving Phenomenon" [Downing1975]_ and this is used to implement
+    the strategy.

-    # TODO: This paragraph is not correct (see note above)
-    Downing is implemented as `RevisedDowning`. Apparently in the first tournament
-    the strategy was implemented incorrectly and defected on the first two rounds.
-    This can be controlled by setting `revised=True` to prevent the initial defections.
+    There are a number of specific points in this paper, on page 371:

-    This strategy came 10th in Axelrod's original tournament but would have won
-    if it had been implemented correctly.
+    > "[...] In these strategies, O's [the opponent's] response on trial N is
+    in some way dependent or contingent on S's [the subject's] response on
+    trial N-1. All varieties of these lag-one matching strategies can be
+    defined by two parameters: the conditional probability that O will choose
+    C following C by S, P(C_o | C_s), and the conditional probability that O
+    will choose C following D by S, P(C_o | D_s)."
+
+    Throughout the paper the strategy (S) assumes that the opponent (O) is
+    playing a reactive strategy defined by these two conditional probabilities.
+
+    The strategy aims to maximise the long run utility against such a strategy
+    and the mechanism for this is described in Appendix A (more on this later).
+
+    One final point from the main text is, on page 372:
+
+    > "For the various lag-one matching strategies of O, the maximizing
+    strategies of S will be 100% C, or 100% D, or for some strategies all S
+    strategies will be functionally equivalent."
+
+    This implies that the strategy S will either always cooperate or always
+    defect (or be indifferent) depending on the opponent's defining
+    probabilities.
+
+    To understand the particular mechanism that describes the strategy S, we
+    refer to Appendix A of the paper on page 389.
+
+    The stated goal of the strategy is to maximize (using the notation of the
+    paper):
+
+        EV_TOT = #CC(EV_CC) + #CD(EV_CD) + #DC(EV_DC) + #DD(EV_DD)
+
+    I.e. the player aims to maximise the expected value of being in each state
+    weighted by the number of times we expect to be in that state.
+
+    On the second page of the appendix, figure 4 (page 390) supposedly
+    identifies an expression for EV_TOT, however it is not clear how some of
+    the steps are carried out. At a best guess, it seems that an asymptotic
+    argument is being used. Furthermore, a specific term is made to disappear
+    in the case of T - R = P - S (which is not the case for the standard
+    (R, P, S, T) = (3, 1, 0, 5)):
+
+    > "Where (t - r) = (p - s), EV_TOT will be a function of alpha, beta, t, r,
+    p, s and N, which are known, and V, which is unknown."
+
+    V is the total number of cooperations of the player S (this is noted
+    earlier in the abstract) and as such the final expression (with only V as
+    unknown) can be used to decide if V should indicate that S always
+    cooperates or not.
+
+    Given the lack of usable details in this paper, the following
+    interpretation is used to implement this strategy:
+
+    1. On any given turn, the strategy will estimate alpha = P(C_o | C_s) and
+       beta = P(C_o | D_s).
+    2. The strategy will calculate the expected utility of always playing C or
+       always playing D against the estimated probabilities. This corresponds
+       to:
+
+       a. In the case of the player always cooperating:
+
+          P_CC = alpha and P_CD = 1 - alpha
+
+       b. In the case of the player always defecting:
+
+          P_DC = beta and P_DD = 1 - beta
+
+    Using this we have:
+
+        E_C = alpha R + (1 - alpha) S
+        E_D = beta T + (1 - beta) P
+
+    Thus at every turn, the strategy will calculate those two values and
+    cooperate if E_C > E_D and defect if E_C < E_D.
+
+    In the case of E_C = E_D, the player will alternate from their previous
+    move. This is based on a specific sentence from Axelrod's original paper:
+
+    > "Under certain circumstances, DOWNING will even determine that the best
+    > strategy is to alternate cooperation and defection."
+
+    One final important point is the early game behaviour of the strategy. It
+    has been noted that this strategy was implemented in a way that assumed
+    that alpha and beta were both 1/2:
+
+    > "Initially, they are both assumed to be .5, which amounts to the
+    > pessimistic assumption that the other player is not responsive."
+
+    Thus, the player defects on the first two rounds. Note that from the
+    Axelrod publications alone there is nothing to indicate defections on the
+    first two rounds, although a defection in the opening round is clear.
+    However, there is a presentation available at
+    http://www.sci.brooklyn.cuny.edu/~sklar/teaching/f05/alife/notes/azhar-ipd-Oct19th.pdf
+    that clearly states that Downing defected on the first two rounds, so this
+    is assumed to be the behaviour.
+
+    Note that the opponent's response to the strategy's opening defection is
+    used to estimate beta = P(C_o | D_s), and the opponent's opening play is
+    used to estimate alpha = P(C_o | C_s). This is an assumption with no clear
+    indication from the literature.
+
+    --
+    This strategy came 10th in Axelrod's original tournament.

     Names:

     - Revised Downing: [Axelrod1980]_
     """

-    name = "Revised Downing"
+    name = "First tournament by Downing"

     classifier = {
         "memory_depth": float("inf"),
         "stochastic": False,
-        "makes_use_of": set(),
+        "makes_use_of": {"game"},
         "long_run_time": False,
         "inspects_source": False,
         "manipulates_source": False,
         "manipulates_state": False,
     }

-    def __init__(self, revised: bool = True) -> None:
+    def __init__(self) -> None:
         super().__init__()
-        self.revised = revised
-        self.good = 1.0
-        self.bad = 0.0
-        self.nice1 = 0
-        self.nice2 = 0
-        self.total_C = 0  # note the same as self.cooperations
-        self.total_D = 0  # note the same as self.defections
+        self.number_opponent_cooperations_in_response_to_C = 0
+        self.number_opponent_cooperations_in_response_to_D = 0

     def strategy(self, opponent: Player) -> Action:
         round_number = len(self.history) + 1
-        # According to internet sources, the original implementation defected
-        # on the first two moves. Otherwise it wins (if this code is removed
-        # and the comment restored.
-        # http://www.sci.brooklyn.cuny.edu/~sklar/teaching/f05/alife/notes/azhar-ipd-Oct19th.pdf
-
-        if self.revised:
-            if round_number == 1:
-                return C
-        elif not self.revised:
-            if round_number <= 2:
-                return D

-        # Update various counts
-        if round_number > 2:
-            if self.history[-1] == D:
-                if opponent.history[-1] == C:
-                    self.nice2 += 1
-                self.total_D += 1
-                self.bad = self.nice2 / self.total_D
-            else:
-                if opponent.history[-1] == C:
-                    self.nice1 += 1
-                self.total_C += 1
-                self.good = self.nice1 / self.total_C
-        # Make a decision based on the accrued counts
-        c = 6.0 * self.good - 8.0 * self.bad - 2
-        alt = 4.0 * self.good - 5.0 * self.bad - 1
-        if c >= 0 and c >= alt:
-            move = C
-        elif (c >= 0 and c < alt) or (alt >= 0):
-            move = self.history[-1].flip()
-        else:
-            move = D
-        return move
+        if round_number == 1:
+            return D
+        if round_number == 2:
+            if opponent.history[-1] == C:
+                self.number_opponent_cooperations_in_response_to_C += 1
+            return D
+
+        if self.history[-2] == C and opponent.history[-1] == C:
+            self.number_opponent_cooperations_in_response_to_C += 1
+        if self.history[-2] == D and opponent.history[-1] == C:
+            self.number_opponent_cooperations_in_response_to_D += 1
+
+        alpha = (self.number_opponent_cooperations_in_response_to_C /
+                 (self.cooperations + 1))  # Adding 1 to count for opening move
+        beta = (self.number_opponent_cooperations_in_response_to_D /
+                self.defections)
+
+        R, P, S, T = self.match_attributes["game"].RPST()
+        expected_value_of_cooperating = alpha * R + (1 - alpha) * S
+        expected_value_of_defecting = beta * T + (1 - beta) * P
+
+        if expected_value_of_cooperating > expected_value_of_defecting:
+            return C
+        if expected_value_of_cooperating < expected_value_of_defecting:
+            return D
+        return self.history[-1].flip()


 class FirstByFeld(Player):
@@ -278,8 +352,6 @@ class FirstByGraaskamp(Player):
     so it plays Tit For Tat. If not it cooperates and randomly defects every 5
     to 15 moves.

-    # TODO Compare this to Fortran code.
-
     Note that there is no information about 'Analogy' available thus Step 5 is
     a "best possible" interpretation of the description in the paper.

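To make the decision rule in the new docstring concrete, here is a minimal, self-contained sketch (not part of the commit) of the E_C versus E_D comparison using the standard payoffs (R, P, S, T) = (3, 1, 0, 5). The response counts used to form alpha and beta are purely illustrative.

    # Standalone sketch of the decision rule described in the docstring above.
    # The counts below are illustrative; in the strategy itself they are
    # accumulated from the opponent's responses during play.
    R, P, S, T = 3, 1, 0, 5  # standard payoff values

    # Suppose the opponent cooperated 3 times over our 4 cooperations (+1 for
    # the opening-move convention) and once over our 2 defections.
    alpha = 3 / (4 + 1)  # estimate of P(C_o | C_s) = 0.6
    beta = 1 / 2         # estimate of P(C_o | D_s) = 0.5

    expected_value_of_cooperating = alpha * R + (1 - alpha) * S  # 1.8
    expected_value_of_defecting = beta * T + (1 - beta) * P      # 3.0

    if expected_value_of_cooperating > expected_value_of_defecting:
        action = "C"
    elif expected_value_of_cooperating < expected_value_of_defecting:
        action = "D"
    else:
        action = "alternate"  # flip the previous move, as in the implementation
    print(action)  # prints "D" for these illustrative numbers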

axelrod/tests/strategies/test_axelrod_first.py

Lines changed: 10 additions & 17 deletions
@@ -43,50 +43,43 @@ def test_strategy(self):
         self.versus_test(opponent, expected_actions=actions)


-class TestRevisedDowning(TestPlayer):
+class TestFirstByDowning(TestPlayer):

-    name = "Revised Downing: True"
-    player = axelrod.RevisedDowning
+    name = "First tournament by Downing"
+    player = axelrod.FirstByDowning
     expected_classifier = {
         "memory_depth": float("inf"),
         "stochastic": False,
-        "makes_use_of": set(),
+        "makes_use_of": {"game"},
         "long_run_time": False,
         "inspects_source": False,
         "manipulates_source": False,
         "manipulates_state": False,
     }

     def test_strategy(self):
-        actions = [(C, C), (C, C), (C, C)]
+        actions = [(D, C), (D, C), (C, C)]
         self.versus_test(axelrod.Cooperator(), expected_actions=actions)

-        actions = [(C, D), (C, D), (D, D)]
+        actions = [(D, D), (D, D), (D, D)]
         self.versus_test(axelrod.Defector(), expected_actions=actions)

         opponent = axelrod.MockPlayer(actions=[D, C, C])
-        actions = [(C, D), (C, C), (C, C), (C, D)]
+        actions = [(D, D), (D, C), (D, C), (D, D)]
         self.versus_test(opponent, expected_actions=actions)

         opponent = axelrod.MockPlayer(actions=[D, D, C])
-        actions = [(C, D), (C, D), (D, C), (D, D)]
+        actions = [(D, D), (D, D), (D, C), (D, D)]
         self.versus_test(opponent, expected_actions=actions)

         opponent = axelrod.MockPlayer(actions=[C, C, D, D, C, C])
-        actions = [(C, C), (C, C), (C, D), (C, D), (D, C), (D, C), (D, C)]
+        actions = [(D, C), (D, C), (C, D), (D, D), (D, C), (D, C), (D, C)]
         self.versus_test(opponent, expected_actions=actions)

         opponent = axelrod.MockPlayer(actions=[C, C, C, C, D, D])
-        actions = [(C, C), (C, C), (C, C), (C, C), (C, D), (C, D), (C, C)]
+        actions = [(D, C), (D, C), (C, C), (D, C), (D, D), (C, D), (D, C)]
         self.versus_test(opponent, expected_actions=actions)

-    def test_not_revised(self):
-        # Test not revised
-        player = self.player(revised=False)
-        opponent = axelrod.Cooperator()
-        match = axelrod.Match((player, opponent), turns=2)
-        self.assertEqual(match.play(), [(D, C), (D, C)])
-

 class TestFristByFeld(TestPlayer):

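For reference, a short usage sketch (not part of the commit) showing how the renamed strategy can be exercised with the library's Match API; the expected opening moves against Cooperator follow from the first test case above.

    import axelrod as axl

    players = (axl.FirstByDowning(), axl.Cooperator())
    match = axl.Match(players, turns=5)
    interactions = match.play()
    # Per the test expectations above, play opens (D, C), (D, C), (C, C):
    # two defections while alpha and beta are being estimated, then
    # cooperation once E_C exceeds E_D against an always-cooperating opponent.
    print(interactions)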

docs/reference/bibliography.rst

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ documentation.
 .. [Bendor1993] Bendor, Jonathan. "Uncertainty and the Evolution of Cooperation." The Journal of Conflict Resolution, 37(4), 709–734.
 .. [Beaufils1997] Beaufils, B. and Delahaye, J. (1997). Our Meeting With Gradual: A Good Strategy For The Iterated Prisoner’s Dilemma. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.4041
 .. [Berg2015] Berg, P. Van Den, & Weissing, F. J. (2015). The importance of mechanisms for the evolution of cooperation. Proceedings of the Royal Society B-Biological Sciences, 282.
+.. [Downing1975] Downing, Leslie L. "The Prisoner's Dilemma game as a problem-solving phenomenon: An outcome maximization interpretation." Simulation & Games 6.4 (1975): 366-391.
 .. [Eckhart2015] Eckhart Arnold (2016) CoopSim v0.9.9 beta 6. https://github.com/jecki/CoopSim/
 .. [Frean1994] Frean, Marcus R. "The Prisoner's Dilemma without Synchrony." Proceedings: Biological Sciences, vol. 257, no. 1348, 1994, pp. 75–79. www.jstor.org/stable/50253.
 .. [Harper2017] Harper, M., Knight, V., Jones, M., Koutsovoulos, G., Glynatsi, N. E., & Campbell, O. (2017) Reinforcement learning produces dominant strategies for the Iterated Prisoner’s Dilemma. PloS one. https://doi.org/10.1371/journal.pone.0188046

docs/reference/overview_of_strategies.rst

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ An indication is given as to whether or not this strategy is implemented in the
    "Grudger", "James W Friedman", ":class:`Grudger <axelrod.strategies.grudger.Grudger>`"
    "Davis", "Morton Davis", ":class:`Davis <axelrod.strategies.axelrod_first.FirstByDavis>`"
    "Graaskamp", "Jim Graaskamp", ":class:`Graaskamp <axelrod.strategies.axelrod_first.FirstByGraaskamp>`"
-   "Downing", "Leslie Downing", ":class:`RevisedDowning <axelrod.strategies.axelrod_first.RevisedDowning>`"
+   "FirstByDowning", "Leslie Downing", ":class:`FirstByDowning <axelrod.strategies.axelrod_first.FirstByDowning>`"
    "Feld", "Scott Feld", ":class:`Feld <axelrod.strategies.axelrod_first.FirstByFeld>`"
    "Joss", "Johann Joss", ":class:`Joss <axelrod.strategies.axelrod_first.FirstByJoss>`"
    "Tullock", "Gordon Tullock", ":class:`Tullock <axelrod.strategies.axelrod_first.FirstByTullock>`"
