Prisoner's Dilemma Revisited

Tit-for-Tat is not the best strategy

Stuart Ferguson

YouTuber Veritasium has an excellent video about the prisoner’s dilemma.

From https://www.youtube.com/@veritasium

I’ve written about this myself. We both reported the same conclusion: that tit-for-tat is the best strategy for the iterated prisoner’s dilemma. More generally, we concluded that successful strategies are nice, retaliatory, but also forgiving.

Problem is it’s not true.

I have long been fascinated by this result, and yet all of my attempts to replicate it have failed. I’m not claiming that Axelrod’s results are fake or that his program had bugs. The problem is that his tournament approach very much depends on the combination of strategies that are playing against each other. Axelrod basically picked a set of strategies at random — things that his friends thought might be good — and used those to seed his tournament.

That’s neither complete nor objective.

Completing the Set

My approach was to test tit-for-tat against all other possible similar strategies. Tit-for-tat decides how to play based entirely on the previous game. Here’s the truth-table for tit-for-tat, where 0 means cooperation and 1 means defection.

Truth-table for player’s next choice based on previous game

The next choice the strategy makes echoes the last choice that the opponent made, but there are other possible truth-tables: 16 in total, since each of the four possible outcomes of the previous game can map to either choice. We also need to know the first play the strategy makes, since that’s not based on a previous game, which doubles the number to 32.

Here’s the complete description of tit-for-tat, which starts by cooperating, expressed as a 5-digit binary number.

Tit-for-tat strategy as five-digit binary number
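
A minimal Python sketch of this encoding may help. The bit order is an assumption inferred from the examples in this article: the first digit is the opening move, and the remaining four digits are the truth-table indexed by 2 × (opponent's last move) + (own last move). The helper names `decode` and `next_move` are hypothetical.

```python
def decode(code):
    """Split a 5-digit strategy string into (first_move, table).

    Assumed layout: code[0] is the opening move and code[1:] is the
    truth-table, where 0 means cooperate and 1 means defect.
    """
    bits = [int(ch) for ch in code]
    return bits[0], bits[1:]


def next_move(table, my_last, opp_last):
    """Look up the next move from the previous game's two choices."""
    return table[2 * opp_last + my_last]


first, table = decode("00011")  # tit-for-tat
print("opens with", first)
for opp_last in (0, 1):
    for my_last in (0, 1):
        # Columns: own last move, opponent's last move, next move.
        print(my_last, opp_last, "->", next_move(table, my_last, opp_last))
```

Under this layout the next move for 00011 always equals the opponent's last move, which is the tit-for-tat behaviour described above.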

While there are 32 possible strategies from 00000 (always cooperate) to 11111 (always defect), it’s not clear that they are all unique. To classify strategies by what they do in practice I compute a signature. I play the strategy against the first three moves of an opponent and record its four choices. Here is what tit-for-tat looks like against an opponent that defects for the first three rounds (the opponent’s fourth move doesn’t matter, because it can’t affect the strategy’s first four choices).

Four rounds of tit-for-tat as hexadecimal digit

These four bits are a single hexadecimal digit — 7 in this case. If we try all 8 possible openers we can record 8 such digits, and so the signature for tit-for-tat is 7654–3210.

All possible 4-round games for tit-for-tat
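
The signature calculation can be sketched in a few lines of Python. This is an illustration rather than the original source code: the function names are hypothetical, the truth-table index 2 × (opponent's last) + (own last) is inferred from the examples in this article, and the openers are assumed to run from three defections (111) down to three cooperations (000).

```python
from itertools import product


def play_against(code, opp_moves):
    """Return the strategy's choices against a scripted opponent.

    The strategy makes one more choice than the opponent has moves,
    because its last choice reacts to the opponent's last move.
    """
    table = [int(ch) for ch in code[1:]]
    mine = [int(code[0])]
    for opp in opp_moves:
        mine.append(table[2 * opp + mine[-1]])
    return mine


def signature(code):
    """Eight hex digits, one per 3-move opener, from 111 down to 000."""
    digits = []
    for opener in product((1, 0), repeat=3):
        moves = play_against(code, opener)
        value = int("".join(str(m) for m in moves), 2)
        digits.append(format(value, "x"))
    joined = "".join(digits)
    return joined[:4] + "-" + joined[4:]


print(signature("00011"))  # tit-for-tat: 7654-3210
```

With these assumptions the sketch reproduces the signatures quoted in this article for 00111, 11110 and 10001 as well.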

Here are all the signatures for all 32 possible strategies.

Signatures for 16 strategies that start with cooperation

Signatures for 16 strategies that start with defection

Four strategies are equivalent to always cooperate (0000–0000), and four are equivalent to always defect (ffff-ffff). Other than that they are all distinct, leaving us with 26 unique strategies.
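
The count of 26 can be checked by grouping all 32 codes by signature. The sketch below is illustrative only: `signature` is a hypothetical helper, and the bit layout (opening move first, then a truth-table indexed by 2 × opponent's last + own last) is an assumption inferred from the examples in this article.

```python
from collections import defaultdict
from itertools import product


def signature(code):
    """Hex signature over the eight 3-move openers, 111 down to 000."""
    table = [int(ch) for ch in code[1:]]
    digits = []
    for opener in product((1, 0), repeat=3):
        moves = [int(code[0])]
        for opp in opener:
            moves.append(table[2 * opp + moves[-1]])
        digits.append(format(int("".join(map(str, moves)), 2), "x"))
    joined = "".join(digits)
    return joined[:4] + "-" + joined[4:]


# Group every 5-digit code under the signature it produces.
groups = defaultdict(list)
for n in range(32):
    code = format(n, "05b")
    groups[signature(code)].append(code)

print(len(groups))            # number of distinct behaviours: 26
print(groups["0000-0000"])    # codes that never defect
print(groups["ffff-ffff"])    # codes that never cooperate
```

The duplicates arise because a strategy that never defects never consults the truth-table rows for "I defected last time", and vice versa, so those two digits can take any value without changing behaviour.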

Many of these are not particularly good, nor should we expect them to be. An interesting pair is 01010 (5555–5555) and 11010 (aaaa-aaaa), which always play the opposite of what they played last. They alternate back and forth no matter what the opponent does, and the only difference is whether they start by cooperating or defecting. As we’ll see, these end up right in the middle of the pack with identical scores.

Tournament Results

My tournament rules were the same as Axelrod’s: every strategy plays every strategy including itself for 200 rounds. Here are the scores.

Scores for all 26 unique strategies
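
The round-robin rules can be sketched as follows. The article does not state the payoff values, so the block assumes the ones commonly attributed to Axelrod's tournament (5 for defecting against a cooperator, 3 for mutual cooperation, 1 for mutual defection, 0 for cooperating against a defector). The function names and the small demonstration pool are hypothetical, and a self-match is assumed to count once.

```python
# Assumed payoffs: (my move, their move) -> my points; 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}


def play_match(a, b, rounds=200):
    """Return (a_score, b_score) for two 5-digit strategies."""
    table_a = [int(ch) for ch in a[1:]]
    table_b = [int(ch) for ch in b[1:]]
    move_a, move_b = int(a[0]), int(b[0])
    score_a = score_b = 0
    for _ in range(rounds):
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        # Both players choose simultaneously from the game just played.
        move_a, move_b = (table_a[2 * move_b + move_a],
                          table_b[2 * move_a + move_b])
    return score_a, score_b


def tournament(pool, rounds=200):
    """Every strategy plays every strategy in the pool, itself included."""
    totals = {code: 0 for code in pool}
    for a in pool:
        for b in pool:
            totals[a] += play_match(a, b, rounds)[0]
    return totals


demo_pool = ["00000", "00011", "00111", "11111"]
for code, total in sorted(tournament(demo_pool).items(),
                          key=lambda item: item[1], reverse=True):
    print(code, total)
```

The four-strategy pool is only there to exercise the code; rankings in a pool this small say nothing about the full 26-strategy result.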

The winning strategy is 00111 (7777–3310). This might be described as permanent retaliation. It starts by cooperating and continues to cooperate against a cooperating opponent, but as soon as it sees the opponent defect once, it defects after that and continues to defect no matter what the opponent does later.

It’s tit-for-tat without the forgiveness.
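
The difference shows up clearly against an opponent that defects exactly once. The sketch below is illustrative: `responses` is a hypothetical helper, the opponent's moves are scripted, and the truth-table layout (index 2 × opponent's last + own last) is an assumption inferred from the examples in this article.

```python
def responses(code, opp_moves):
    """Return the strategy's move in each round against scripted moves."""
    table = [int(ch) for ch in code[1:]]
    mine = [int(code[0])]
    # Each later move reacts to the opponent's move in the round before.
    for opp in opp_moves[:-1]:
        mine.append(table[2 * opp + mine[-1]])
    return mine


opponent = [0, 0, 1, 0, 0, 0, 0, 0]  # a single defection in round 3

print(responses("00111", opponent))  # [0, 0, 0, 1, 1, 1, 1, 1]
print(responses("00011", opponent))  # [0, 0, 0, 1, 0, 0, 0, 0]
```

Tit-for-tat punishes once and returns to cooperation, while 00111 keeps defecting because its own defection in the previous game is enough to trigger the next one.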

The second place strategy is 11110 (abab-ddef), which always defects except in the case where it and its opponent both defected in the previous game. This is strangely better than always defecting when playing against this mix of other strategies, and it nets a lot of points from cooperation.

Third place is 11111 (ffff-ffff), always defect. This scores highly because there are many strategies that cooperate even without getting cooperation in return, such as the second-place strategy above.

There are five better strategies before we get to tit-for-tat — 00011 (7654–3210) — in ninth place. It’s solidly in the middle of the top strategies, but it’s by no means best or even close to best.

The very worst strategy is 10001 (fecc-8888) which is essentially the opposite of the winning strategy. It starts by defecting but if the opponent ever cooperates then this strategy cooperates forever after that. If 00111 is a cynical cooperator, then 10001 is an ever-hopeful defector.

Testing Robustness

To see if I could find robust winners I removed the lowest-scoring strategy and computed new scores with the reduced pool, and repeated that until there were only 4 strategies remaining. The process is quite chaotic.

Rankings of the dwindling pool during elimination

As rival strategies are eliminated, the rankings of the other strategies rearrange themselves significantly. The only one that remains consistently at the top, the light teal line, is 00111.

The four winners were 00111 (7777–3310), 00011 (7654–3210), 00110 (5467–2310), and 00010 (5454–2210). All of these strategies retaliate to varying degrees, but always cooperate when faced with an opponent that always cooperates. This is indicated by the zero at the end of the signature, which records four straight cooperations against an opponent that opens by cooperating three times. Apart from the strategies that never defect at all, these are the only ones whose signature ends in zero, and they are the four that come out on top.
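
The elimination procedure in this section can be sketched as a loop that rescores the pool after each removal. This is not the original source code. It assumes payoffs of 5, 3, 1 and 0, keeps the first code seen as the representative of each signature, and breaks ties by pool order, so it may not reproduce the exact rankings reported here. All function names are hypothetical.

```python
from itertools import product

# Assumed payoffs: (my move, their move) -> my points; 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}


def table_of(code):
    return [int(ch) for ch in code[1:]]


def signature(code):
    """Choices against the eight 3-move openers, as a hashable tuple."""
    table, rows = table_of(code), []
    for opener in product((1, 0), repeat=3):
        moves = [int(code[0])]
        for opp in opener:
            moves.append(table[2 * opp + moves[-1]])
        rows.append(tuple(moves))
    return tuple(rows)


def unique_pool():
    """One representative code per distinct signature (first one seen)."""
    seen = {}
    for n in range(32):
        code = format(n, "05b")
        seen.setdefault(signature(code), code)
    return list(seen.values())


def score(a, b, rounds=200):
    """Points earned by strategy a in one match against strategy b."""
    table_a, table_b = table_of(a), table_of(b)
    move_a, move_b, total = int(a[0]), int(b[0]), 0
    for _ in range(rounds):
        total += PAYOFF[(move_a, move_b)]
        move_a, move_b = (table_a[2 * move_b + move_a],
                          table_b[2 * move_a + move_b])
    return total


def eliminate(pool, keep=4):
    """Drop the lowest scorer and rescore until `keep` strategies remain."""
    pool, removed = list(pool), []
    while len(pool) > keep:
        totals = {a: sum(score(a, b) for b in pool) for a in pool}
        loser = min(pool, key=lambda code: totals[code])  # ties: first in pool
        removed.append(loser)
        pool.remove(loser)
    return pool, removed


survivors, removed = eliminate(unique_pool())
print("survivors:", survivors)
print("elimination order:", removed)
```

Rescoring after every removal matters because each strategy's total depends on who is left to play against, which is why the rankings can shift so much between steps.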

Conclusions

Pitting all 26 unique strategies that look back only one game against each other contradicts the common wisdom about the iterated prisoner’s dilemma, and the same holds for the full non-unique set of 32. 00111, the strategy I’ve called permanent retaliation, is either best or very near the top, and above the much-lauded tit-for-tat. At least in this deterministic tournament there’s not much benefit to forgiveness.

Update Feb-2025: Source code is here
