Free will and marble machines
High-level stuff can cause stuff!
Robert Sapolsky wrote a great book about why determinism is true, because the physics in your brain almost certainly causes everything you do. Unfortunately, the book says it’s about why free will doesn’t exist. Those aren’t the same thing, according to the compatibilists! My brain obeys the laws of physics.1 When I make a choice, the choice is entirely determined by those laws. But that is me doing the choosing; the system that constitutes me is doing a computation on possible alternatives.
To bastardize an example from Dan Dennett: imagine an enormous machine of marbles rolling down tracks, where you can put in any number of marbles at the front. For some marble counts, the machine flashes a green light; for some, a red light; for most, a yellow light. All that happens is marbles bumping into one another, sliding from track to track, flipping little levers with their rolling motion. All of it is physical, and all of it is deterministic. At first, you have no idea what makes which lights turn on. I mean, you can see every step of the process—in a sense, you know everything that makes the light turn on. But you can’t identify a mapping between the number of input marbles and the output light that is any more compressible than the literal description of the entire machine.
Then you find a little slip of paper with bits of propositional logic on it: formulas built from symbols like p, q, ¬, ∧, ∨, and →. And on the back, you see that each symbol in the logic is assigned a number. Slowly, you realize: the number of marbles you drop into the machine represents a formula of propositional logic. In all your testing, you find that when the marbles encode a valid statement, the machine lights up green. When the marbles encode an invalid statement, the machine lights up red. And when the marbles encode gibberish (something like pp¬∨→q∧, which means nothing), it flashes yellow.
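Everything in the machine is mechanical, but what it computes is validity. Here is a minimal sketch of that computation in code, with a big caveat: the slip’s actual symbol-to-number table is never specified above, so the base-9 encoding below, and every name in it (`decode`, `parse`, `light`), is an invented stand-in for how such a machine *could* work, not a description of this one:

```python
from itertools import product

# A toy stand-in for the slip of paper: each symbol is a nonzero digit,
# and the marble count, written in base 9, spells out a formula.
SYMBOLS = "pq~&|>()"                     # ~ = not, & = and, | = or, > = implies
BASE = len(SYMBOLS) + 1                  # digit 0 is deliberately unused
DIGIT_TO_SYM = {i + 1: s for i, s in enumerate(SYMBOLS)}

def decode(marbles: int) -> str | None:
    """Read the marble count as base-9 digits and map them to symbols."""
    out = []
    while marbles:
        marbles, d = divmod(marbles, BASE)
        if d == 0:
            return None                  # a zero digit maps to no symbol
        out.append(DIGIT_TO_SYM[d])
    return "".join(reversed(out))

# Combinators: each formula becomes a function from an environment to a bool.
def VAR(n):    return lambda env: env[n]
def NOT(x):    return lambda env: not x(env)
def AND(l, r): return lambda env: l(env) and r(env)
def OR(l, r):  return lambda env: l(env) or r(env)
def IMP(l, r): return lambda env: (not l(env)) or r(env)

def parse(s: str):
    """Tiny recursive-descent parser; raises ValueError on gibberish."""
    pos = 0
    def formula():                       # implication, right-associative
        nonlocal pos
        left = disjunct()
        if pos < len(s) and s[pos] == ">":
            pos += 1
            return IMP(left, formula())
        return left
    def disjunct():
        nonlocal pos
        node = conjunct()
        while pos < len(s) and s[pos] == "|":
            pos += 1
            node = OR(node, conjunct())
        return node
    def conjunct():
        nonlocal pos
        node = unary()
        while pos < len(s) and s[pos] == "&":
            pos += 1
            node = AND(node, unary())
        return node
    def unary():
        nonlocal pos
        if pos < len(s) and s[pos] == "~":
            pos += 1
            return NOT(unary())
        if pos < len(s) and s[pos] in "pq":
            name = s[pos]
            pos += 1
            return VAR(name)
        if pos < len(s) and s[pos] == "(":
            pos += 1
            node = formula()
            if pos == len(s) or s[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
            return node
        raise ValueError("gibberish")
    node = formula()
    if pos != len(s):
        raise ValueError("trailing symbols")
    return node

def light(marbles: int) -> str:
    """green = valid, red = falsifiable, yellow = gibberish."""
    code = decode(marbles)
    if code is None:
        return "yellow"
    try:
        f = parse(code)
    except ValueError:
        return "yellow"
    rows = product([True, False], repeat=2)
    return "green" if all(f({"p": p, "q": q}) for p, q in rows) else "red"
```

Under this stand-in encoding, `light(136)` flashes green, because 136 decodes to p → p, while 24 happens to decode to gibberish; in the story’s (unspecified) scheme, 24 is the number that encodes p → p.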
I drop in, say, 24 marbles, which under this scheme encodes p → p, or “p implies p”, a tautology. What caused the machine to light up green? Clearly, there are many valid descriptions:
Is it that the marbles hit this slide, which bumps this switch, which … until the green light is triggered? Obviously, yes, that’s what’s happening on a mechanistic level.
Did I cause the machine to light up? Yes, because I was the one who dropped in the marbles, which triggered that response.
Did the designer of the machine cause it to light up? Yes, because if they had made different choices, the machine wouldn’t light up in response to 24 marbles.
All of these are descriptions of the physical history of events; alter any of them, and the machine might not have lit up. But now I ask a slightly different question: did the fact that p → p is a valid statement of propositional logic cause the machine to light up?
On the one hand, the validity of p → p is not a physical thing. How can it have causal power? If all the philosophers decided “hey, we’re changing what the arrow `→` represents in logic notation now, so p → p is no longer valid in propositional logic”, the machine wouldn’t magically stop working.
On the other hand, that just means the machine rigidly, rather than flexibly, designates propositional logic. If I programmed a machine to detect electrons, and all the scientists decided to change the definition of “electron”, the machine would still detect the thing it was originally detecting. Similarly, this machine is designed such that it turns green only for marbles encoding valid statements. The machine is computing the validity of p → p. When I program a mergesort, what causes it to take O(n log n) time? Is it the electrons flowing around? Yes, in a sense. Is it the fact that the electrons encode items in a list, and that no comparison-based sort can beat n log n comparisons in the worst case? Yes! “But how can truths about algorithms have causal power, when all that’s happening is electrons flowing around?” Because the physical relationships encode these higher-level mathematical ones.
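To make that concrete, here is a plain top-down mergesort (a generic sketch, not any particular implementation). The O(n log n) is a fact about the algorithm’s shape: the list is halved about log n times, and each level of recursion does a linear amount of comparing and merging. That fact holds whether the steps are carried out by electrons, marbles, or pencil and paper:

```python
def mergesort(items: list) -> list:
    """Top-down mergesort. The recursion tree is ~log2(n) levels deep,
    and each level does O(n) total work in the merge loop below, which
    is where the O(n log n) comes from."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = mergesort(items[:mid])
    right = mergesort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):  # one comparison per element placed
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]     # append whichever half remains

print(mergesort([5, 2, 4, 7, 1, 3, 2, 6]))   # [1, 2, 2, 3, 4, 5, 6, 7]
```

And the lower bound runs the same way: no rearrangement of the physical substrate lets a comparison sort beat n log n comparisons in the worst case, because that is a theorem about the algorithmic level of description, not about electrons.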
Similarly, when I choose to do something, all that’s happening on the physical level is the firing of neurons. But those neurons are the physical implementation of this higher-level choosing thing that I call me, just like the marbles are the physical representation of p → p, and the machine is how the validity of p → p in propositional logic gets its causal power.
A chat with James Faville at EA Global really solidified this for me, when we were having an argument about functional decision theory. James’ justification for choosing for both you and your functional twin is that you are an algorithm, not an individual instance. When both bodies choose, it’s really only one entity that is doing the choosing.
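Here is a toy way to see “you are an algorithm, not an instance” (my sketch, not James’s formulation): play a one-shot prisoner’s dilemma where both players run the same deterministic decision procedure. There are two bodies, i.e. two calls, but only one algorithm:

```python
def twin_prisoners_dilemma(decide):
    """One-shot prisoner's dilemma where both players run the *same*
    decision procedure: two instances, one algorithm."""
    a = decide()                 # your choice
    b = decide()                 # your functional twin's choice: same code
    assert a == b                # a deterministic procedure cannot diverge
    payoffs = {("C", "C"): (3, 3), ("D", "D"): (1, 1),   # standard textbook
               ("C", "D"): (0, 5), ("D", "C"): (5, 0)}   # PD payoffs
    return payoffs[(a, b)]

print(twin_prisoners_dilemma(lambda: "C"))   # (3, 3)
print(twin_prisoners_dilemma(lambda: "D"))   # (1, 1)
```

Since the two calls cannot return different things, the (0, 5) and (5, 0) outcomes are unreachable; the real decision is between (3, 3) and (1, 1), which is why choosing “for both bodies” favors cooperating.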
…That got very mumbo-jumbo-y! I will probably have to rework a lot of these thoughts, especially the identity stuff.2 Germs of ideas, but I think they’ll be worth something once I put in the time. Thoughts, readers?
1. And metaphysics, if you’re not a physicalist. It’s clear that mental states have causal power, or else we wouldn’t expect to be able to talk about them!
2. A problem: I feel things that my clone does not! If they are hurt, I may not care. So which one is properly me? Or are they both me with different inputs, and it is the nature of that system that the different inputs cannot be co-conscious? Duuuude… no, I’m sorry, I will be more rigorous going forward. This is very messy & speculative work right now!



Love this post. Here's a musing for you: When people think about determinism, they think about (let's say) my decision to have eggs for breakfast at 8:03am this morning. They might say "Tyler thought he was making the decision at 8:03am, but actually the decision was pre-determined due to mathematics and the trajectories of atoms and things." So when was the decision actually made? "Well, outside of time," they might say. But why is 8:03am *after* the decision was made, if the decision was made completely outside of time? Indeed, maybe I am the mathematics, and in some sense every moment of my life exists outside of time, and even 8:03am is outside of time in a broader sense. Thus the proposition that the decision was made by me at 8:03am doesn't contradict what determinists say.
I think it doesn’t make sense to talk about whether something is you as if it were a metaphysical question, because at the end of the day it’s a normative question about how to treat somebody. After all, you could insist that the you of five seconds from now is not the same person, and there is nothing incorrect about that, even if it’s very stupid according to normal human priorities. Even if your clone is exactly identical to you, to the point of having exactly the same experiences, such that you can’t subjectively distinguish each other, it still makes sense for you to coherently care about your own well-being but not about the well-being of your clone. Although in that case, you know that if you decide to behave this way, so will the clone, in which case this is self-defeating, because it would be better for the both of you to agree to care about each other. To be clear, this is a counterfactual claim about what would happen if you did something. There is no need to talk about controversial things like personal identity, or whether you can cause your clone to behave in a certain way, if that talk is more confusing than simply pointing out a fact about a counterfactual that might actually take place, as opposed to one that exists merely in somebody’s head. You know perfectly well that if you behave a certain way, so will the clone, regardless of whether you’re causing it or not according to your definition of a cause.
Honestly, I think this is part of why it’s so adaptive for people to care about their future selves: if they did not, neither would their past selves have cared for them. And while people can’t literally run functional decision theory, it’s simply the case that if you behave in this stupid way, reality will hit you hard and you’ll go extinct real fast, if you ever happen to actually come into existence. Of course, this logic becomes less applicable the more you differ from your past self. It’s a bit similar to group selection between genes: there is an advantage to a gene in helping its copies spread, but not if the copies are sufficiently different. Yes, I am aware that an agent trying to maximise its utility is not necessarily similar to a gene trying to reproduce, but I think the gene in question is a crude example of such an agent, and a good example of how they behave under such pressures, even if there are complicating factors unrelated to the functional-decision-theory reasons.