In my last couple of posts on Functional Decision Theory, I found what I believed to be two critical flaws in FDT:
It evaluates what would have been ideal to precommit to from a hindsight, rather than a forethought, perspective.
It constructs hypotheticals by modifying what FDT recommends that a particular agent do, not what the entire policy for action should be.
These are critical flaws, in that any decision theory that has them will not endorse itself. However, a more careful review of the original FDT paper suggests Yudkowsky & Soares prepared for this better than I thought. In the first two sections, I’ll detail how each of these flaws can be avoided. In the third, I’ll offer a little reflection on how I made those mistakes, my learning process as a new Substack writer, and how I’m going to decrease the risk of making more mistakes in future.
1. Updateless FDT
In “A Theory of Forethought, not Hindsight”, I explained how, if you pass up what turns out to be a winning lottery ticket, you might regret not buying the ticket in hindsight. But that doesn’t mean buying it would have been rational in ideal forethought: at the time, given all the information that you had, the expected value of buying the ticket was negative. In contrast, if you’re thrown into Newcomb’s problem, it still would have been advantageous for past-you to precommit to one-boxing in Newcomb problems.
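With made-up numbers (a \$2 ticket and a one-in-300-million shot at a \$100 million jackpot), the forethought calculation is unambiguous:

$$\mathbb{E}[\text{buy}] \;=\; \frac{1}{300{,}000{,}000} \times \$100{,}000{,}000 \;-\; \$2 \;\approx\; -\$1.67 \;<\; 0.$$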
I thought I broke FDT by showing that in a twin guard-inmate dilemma, hindsight reasoning will kick in: the guard reasons that FDT should favor guards, the inmate that FDT should favor inmates, so both defect. That would be suboptimal compared to forethought reasoning, which would see that if you have a 50% chance of becoming the guard and a 50% chance of becoming the inmate, the best thing to do is commit to cooperating. Mutual cooperation > Mutual defection ⇒ Forethought reasoning > Hindsight reasoning ⇒ Optimal decision theory > FDT. Quod erat demonstrandum.
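To make that comparison concrete, suppose (purely illustratively) that mutual cooperation pays each role 3 and mutual defection pays each role 1. Then, before roles are assigned:

$$\mathbb{E}[\text{commit to cooperate}] \;=\; 0.5(3) + 0.5(3) \;=\; 3 \;>\; 1 \;=\; 0.5(1) + 0.5(1) \;=\; \mathbb{E}[\text{defect}].$$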
But the LessWrong folks have thought about this. Their technical term for decision theories that evaluate from a forethought position is updateless. I was at first reluctant to use the term because it sounds unintelligent: shouldn’t you always update your thinking based on new information? In my defense, the legendary effective altruist philosopher Will MacAskill felt the same way in his critical review of FDT:
“The notion of expected utility for which FDT is supposed to do well (at least, according to me) is expected utility with respect to the prior for the decision problem under consideration.” If that’s correct, it’s striking that this criterion isn’t mentioned in the paper. But it also doesn’t seem compelling as a principle by which to evaluate between decision theories, nor does it seem FDT even does well by it. To see both points: suppose I’m choosing between an avocado sandwich and a hummus sandwich, and my prior was that I prefer avocado, but I’ve since tasted them both and gotten evidence that I prefer hummus. The choice that does best in terms of expected utility with respect to my prior for the decision problem under consideration is the avocado sandwich (and FDT, as I understood it in the paper, would agree). But, uncontroversially, I should choose the hummus sandwich, because I prefer hummus to avocado.1
But updateless strategies don’t ignore new information. What they really do is plan in advance what they ought to do if they encounter new information, rather than taking all the information they’ve gained as a given and reasoning from there. So, for MacAskill’s scenario, an updateless theory thinks: “if I were to learn in the future that hummus tastes better than avocado, what should I then do?” If that doesn’t seem like a substantial distinction, I’ll be writing more about it in future, but the hindsight/forethought article gets pretty close.
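Here’s a minimal sketch of the distinction in code. The prior favoring avocado and the perfectly informative taste test are my own made-up assumptions, not MacAskill’s:

```python
from itertools import product

# Made-up prior for MacAskill's sandwich case: before tasting, you
# think avocado is probably the one you prefer.
PRIOR = {"avocado": 0.7, "hummus": 0.3}  # P(world where you prefer X)

def observe(world):
    # Assume the taste test is perfectly informative: in the world
    # where you prefer X, you observe that X tastes better.
    return f"{world}_tastes_better"

def utility(choice, world):
    # Utility 1 for eating the sandwich you actually prefer, else 0.
    return 1.0 if choice == world else 0.0

# An updateless agent doesn't ignore the taste test. It picks a whole
# *policy* (a map from possible observations to actions), chosen to
# maximize expected utility with respect to the PRIOR.
observations = [observe(w) for w in PRIOR]
policies = [
    dict(zip(observations, acts))
    for acts in product(PRIOR.keys(), repeat=len(observations))
]

def prior_ev(policy):
    return sum(p * utility(policy[observe(w)], w) for w, p in PRIOR.items())

best = max(policies, key=prior_ev)
print(best)
# {'avocado_tastes_better': 'avocado', 'hummus_tastes_better': 'hummus'}
```

The prior-optimal policy is “eat whichever one you learn you prefer,” which agrees with choosing hummus after tasting it. The prior only drives the choice when the observations leave something unresolved.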
In any case, the common-parlance descriptions of what FDT does don’t make it clear whether FDT is updateless or not:
Functional decision theorists hold that the normative principle for action is to treat one’s decision as the output of a fixed mathematical function that answers the question, “Which output of this very function would yield the best outcome?”2
Functional Decision Theory says: In situation X, do what would cause the best result if Functional Decision Theory said to do that in situation X. In less weird words: FDT says to act however it would have been best to pre-commit to act. It's the game theory equivalent of a "code of honor".3
Functional Decision Theory is a decision theory described by Eliezer Yudkowsky and Nate Soares which says that agents should treat one’s decision as the output of a fixed mathematical function that answers the question, “Which output of this very function would yield the best outcome?”4
I don’t think these formulations are clear about what “the best outcome” means. If I am a guard in a twin guard-inmate dilemma, and I reason, “which action, if FDT recommended it, would yield the best outcome?”, it seems like strategies that favor guards would yield the best outcome! Similarly, consider the expected value equation that, according to Yudkowsky & Soares, CDT, EDT, and FDT all obey.
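Schematically, it looks something like this (my reconstruction, not the paper’s exact typography; read $a \hookrightarrow o_j$ as each theory’s own hypothetical “if action $a$ were taken, outcome $o_j$ would follow”):

$$\mathrm{EU}(a) \;=\; \sum_{j=1}^{N} P\big(a \hookrightarrow o_j \,;\; x\big)\, U(o_j)$$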
The information that you learn during the scenario is x, the observation history. The equation evaluates the probability that taking action a would, hypothetically, lead to outcome o, conditional on x. Note that x appears outside of the hypothetical, in the probability term itself (that’s what the semicolon is for!). That suggests to me that there is updating going on: we are not considering what the probability would be if we were to observe x in the future, but what the probability is given that x is in fact the case. Again, a subtle distinction, but this notation plus the English-language descriptions of FDT is what led me to believe that FDT wasn’t truly updateless.
But when I was writing up a version of my post for LessWrong, I took a more careful look at another pair of equations, one for FDT and one for CDT, that appear later in the paper and are meant to be derived from the expected-value calculation.
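Roughly (again my paraphrase, not the paper’s exact notation, with $\operatorname{do}(\cdot)$ marking where each theory intervenes):

$$\mathrm{FDT}(P, x) \;=\; \arg\max_{a \in A}\; \mathbb{E}\Big[\, U \;\Big|\; \operatorname{do}\big(\mathrm{FDT}(P, x) = a\big) \Big]$$

$$\mathrm{CDT}(P, x) \;=\; \arg\max_{a \in A}\; \mathbb{E}\Big[\, U \;\Big|\; \operatorname{do}(\mathrm{act} = a),\; x \Big]$$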
If you don’t follow the math, the important difference is this: FDT evaluates what would happen if FDT recommended an if-then statement, i.e., what would happen if FDT recommended a on input x. CDT, by contrast, manipulates the action and the observation directly. (That’s the most charitable and commonsense reading, at least.)
So, in short, the FDT equation suggests FDT is updateless, while the expected-value equation suggests it is updateful. If I had spotted the discrepancy, I wouldn’t have made the strong assertion that FDT definitely uses hindsight (updateful) rather than forethought (updateless) reasoning. Still, the discrepancy makes me wonder whether it’s simply a problem of confusing notation, or whether updateless and updateful decision theories actually need different expected-value calculations, not just different recommendation calculations! More on that in future posts.
2. Policy FDT
Even if FDT uses forethought reasoning, FDT as written in the paper fails on the guard-inmate dilemma. Suppose you’re an inmate in the twin guard-inmate dilemma. The updateless formulation of FDT reasons, “if I could have made any precommitment about what I should do upon learning that I am an inmate, which would have been the best precommitment for me at the time?” And it answers: if I were going to be an inmate, it would always be to my advantage as an inmate to defect, since what guards do is independent of what inmates do.
The problem is that reasoning separately about what guards and inmates should do leads to mutual defection, which is lower in forethought-expected value than mutual cooperation. A forethought theory that considers both guards and inmates as under its precommitment control will see that committing to the rule “no matter what role I get, cooperate” leads to a better outcome than separate chains of reasoning do. And as we saw in the FDT equation above, FDT operates on one single recommendation at a time.
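Here’s the gap in a minimal sketch, with made-up payoffs (only the prisoner’s-dilemma structure matters, not the particular numbers):

```python
from itertools import product

# Illustrative payoffs (guard's, inmate's) indexed by
# (guard_action, inmate_action). C = cooperate, D = defect.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def forethought_ev(policy):
    """Expected value of a policy (role -> action), evaluated before
    you know your role: 50% guard, 50% inmate. Both twins run the same
    policy, so the game plays out as (policy['guard'], policy['inmate'])."""
    g, i = PAYOFF[(policy["guard"], policy["inmate"])]
    return 0.5 * g + 0.5 * i

# Policy-level reasoning: iterate over whole policies at once.
policies = [{"guard": a, "inmate": b} for a, b in product("CD", repeat=2)]
best = max(policies, key=forethought_ev)
print(best, forethought_ev(best))  # {'guard': 'C', 'inmate': 'C'} 3.0

# Role-by-role reasoning: the inmate holds the guard's action fixed and
# notices that D beats C either way (and symmetrically for the guard),
# so the separate chains of reasoning land on (D, D), with EV 1.0.
for guard_action in "CD":
    assert PAYOFF[(guard_action, "D")][1] > PAYOFF[(guard_action, "C")][1]
```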
Nowhere in the paper is this critical flaw addressed…
…except in a gosh-darned footnote.
User Menotim on LessWrong was more perceptive than I, and found a quote from Yudkowsky & Soares that I missed:
In the authors’ preferred formalization of FDT, agents actually iterate over policies (mappings from observations to actions) rather than actions. This makes a difference in certain multi-agent dilemmas, but will not make a difference in this paper.5
The frustrating thing is that I read the footnotes for this paper! But I read them early on in the process of understanding FDT and so I don’t think that remark really made an impression on me. If you haven’t thought about how FDT would perform in asymmetric games, it doesn’t really occur to you why it should be important that FDT operate on policies rather than individual recommendations. So that piece of information was lost before I started writing.
I think it’s harder to formalize intervening on a policy than on a specific recommendation, but not that much harder: modeling subjunctive dependence relations is the hard part anyway, so if you’ve got that down, it wouldn’t be difficult to move up one level of abstraction (see the sketch below). I’ll still be working on decision theory problems, so if I discover a real difference I’ll write about it, but it seems I was addressing a weaker form of FDT than I thought.
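Concretely, the lift is from an argmax over actions to an argmax over policies; something like this (my sketch, not the paper’s notation):

$$\mathrm{FDT}(P) \;=\; \arg\max_{\pi \,:\, X \to A}\; \mathbb{E}\Big[\, U \;\Big|\; \operatorname{do}(\mathrm{FDT} = \pi) \Big], \qquad \text{then act according to } \pi(x).$$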
3. Learning from my mistakes
The road to wisdom? Well, it's plain
and simple to express:
Err
and err
and err again
but less
and less
and less.
– Piet Hein (also, the LessWrong motto)6
Everything I wrote was an excellent argument against what I’d call naïve FDT: the hindsight-reasoning, action-focused theory suggested by a lot of the informal language around the theory and by the expected-value formula given by Yudkowsky & Soares. But it turns out that the authors were thinking more about what I’d call advanced FDT, which is updateless and intervenes on policies, not actions. They did not emphasize these distinctions in their paper, and I think they would escape the notice of the average reader. Still, a sufficiently careful reader looking for these very issues would have caught them, and that’s really the standard you should be held to if you’re going to write an article directly arguing against a position.
On the one hand, it’s kind of Yudkowsky & Soares’ whole ethos to construct arguments based on imperfect understanding, let them come into contact with a test, and rebuild when they crumble. I was trying to follow the “fail fast” approach to learning: rather than trying to make your work perfect, focus on getting it out there so that you can learn from your mistakes faster. Bentham’s Bulldog, my great inspiration for writing, recommends similarly: make interesting arguments and just write the things you would say in conversation to someone. You’ll suck and be wrong at first, but then you’ll get better.
On the other hand, I don’t feel great about missing important distinctions when I publish an article that is going to be read by people unfamiliar with decision theory. I worry about giving people the wrong idea more on Substack than I would on a forum where back-and-forth argument is more of the norm. I don’t think “fail fast” is as great a motto when your product is also being viewed by non-experts who trust you to get things right. While I have too small an audience right now to really worry about losing credibility, I love what I work on too much not to care about getting the details right.
So! From now on I’ll be taking a few more steps to catch mistakes before misleading anyone:

- Focusing on publishing arguments as I’m developing them, rather than banging out complete essays in a day
- Publishing on LessWrong and technical forums first to get feedback from experts
- Improving my manual and automated checks for counterarguments in papers
- And, of course, doing more reading :)
There is an equilibrium in research somewhere between “perfect is the enemy of the good” and “measure twice, cut once.” I hope you’ll bear with me as I figure out where it lies.
MacAskill, William. “A Critique of Functional Decision Theory.” LessWrong, September 13, 2019. https://www.lesswrong.com/posts/ySLYSsNeFL5CoAQzN/a-critique-of-functional-decision-theory.
Yudkowsky, Eliezer, and Nate Soares. “Functional Decision Theory: A New Theory of Instrumental Rationality.” arXiv, May 22, 2018. https://doi.org/10.48550/arXiv.1710.05060.
Case, Nicky. “What’s Nicky Learning? Decision Theory, Ottawa, Existential Risk.” Patreon, December 6, 2022. https://www.patreon.com/posts/whats-nicky-risk-63289449.
Ravid, Yoav et al. “Functional Decision Theory.” LessWrong, March 21, 2025. https://www.lesswrong.com/w/functional-decision-theory.
Yudkowsky & Soares 11.
Ruby, Raemon, RobertM, and habryka. “Welcome to LessWrong!” LessWrong, June 14, 2019. https://www.lesswrong.com/posts/bJ2haLkcGeLtTWaD5/welcome-to-lesswrong.