Did Erik Hoel just disprove LLM consciousness?
Nope.
Neuroscientist Erik Hoel, who did his PhD under the creator of the integrated information theory (IIT) of consciousness, recently published a preprint called “A Disproof of Large Language Model Consciousness: The Necessity of Continual Learning for Consciousness” (here is his substack post discussing it). It builds on an earlier paper he wrote with Johannes Kleiner, “Falsification and Consciousness.”
As Hoel claims in his substack post, this is meant to be a literal a priori proof, not an argument about how probable or improbable LLM consciousness is.
In this article, I’ll show how Hoel only accomplishes his “disproof” by assuming that two major theoretical approaches to consciousness—illusionism and interactionist dualism—must be false. In the next article, I’ll critique Hoel’s method for attacking LLM consciousness without (he claims) ruling out human consciousness.
I. Falsification and consciousness
In “Falsification and Consciousness,” we begin with Kleiner & Hoel’s (K&H’s) paradigm example of a standard scientific theory. Suppose Alice theorizes that the temperature of a gas is just the average kinetic energy of its particles. To test her theory, Alice uses two measurement devices. The first is a spectroscope, which outputs data on average kinetic energy, say by a little screen that reads “3.7 kJ/mol” or something. The second is a thermometer, which outputs data on temperature, say by the level of mercury in the glass tube. From the spectroscope data, Alice makes a prediction: 3.7 kJ/mol of kinetic energy indicates a temperature between 95° and 105° C. Now, Alice reads the thermometer: since the mercury has risen to the little line that reads “100° C”, that means the temperature is 100° C. If the prediction and the measured temperature mismatch, that falsifies Alice’s theory.
K&H describe this setup with the diagram P → O ← T, where P is the set of possible microphysical states (each state including particle position, kinetic energy, etc.), T is the set of possible temperatures, and O is the set of possible data. The first arrow represents measuring P (the spectroscope reading) and the second arrow represents measuring T (the temperature reading).
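Here is a minimal sketch of that falsification check, in my own words rather than K&H's notation. The class and function names are hypothetical, and the predicted range is entered directly instead of being derived from a kinetic-energy formula; the numbers are the illustrative ones from Alice's example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GasData:
    """One element of O: what Alice takes away from her two instruments."""
    predicted_range_c: tuple[float, float]  # range derived from the spectroscope reading (assumed given)
    thermometer_c: float                    # temperature read off the mercury level


def is_falsified(data: GasData) -> bool:
    """Return True when the thermometer reading falls outside the predicted range."""
    low, high = data.predicted_range_c
    return not (low <= data.thermometer_c <= high)


# Alice's example: prediction of 95-105 C, thermometer reads 100 C.
consistent = GasData(predicted_range_c=(95.0, 105.0), thermometer_c=100.0)
# A hypothetical mismatch: same prediction, thermometer reads 75 C.
mismatch = GasData(predicted_range_c=(95.0, 105.0), thermometer_c=75.0)

print(is_falsified(consistent))  # False
print(is_falsified(mismatch))    # True
```

The point of the sketch is only that both inputs to the comparison are instrument readings: nothing in the check reaches "temperature itself" apart from what the thermometer displays.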
Now, in the case of consciousness: Bob’s theory predicts an experience of red if a certain functional representation is active. So the first measurement might consist of a brain scan or state-machine diagram. But, according to K&H, the second measurement is where we encounter difficulty. You can measure temperature with a thermometer, but you can only “infer” consciousness, via physical data like self-reports. So Bob must also make an inference: if the subject says “I am seeing red”, he infers (but does not know) that they are having an experience of red.
The diagram here is P → O ⇉ E, where E is the set of possible experiences. The first arrow is the observation of physical quantities (functional representation & self-report), and the two second arrows are the prediction and inference. According to K&H, this is a fundamental structural difference between studies of consciousness and temperature, and this is where their analysis takes off.
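As I read the P → O ⇉ E setup, a theory is falsified when the experience inferred from the data is not among the experiences the theory predicts from that same data. The sketch below is my own illustration under that reading, not code from K&H; the state labels, report strings, and the two-valued experience set ("red" / "no red") are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One element of O: the physical data collected from a subject."""
    functional_state: str  # e.g. read off a brain scan (hypothetical label)
    self_report: str       # what the subject says


def predict(obs: Observation) -> set[str]:
    """Bob's theory: an active red representation goes with an experience of red."""
    if obs.functional_state == "red_representation_active":
        return {"red"}
    return {"no red"}


def infer(obs: Observation) -> str:
    """Bob's inference from the self-report: evidence for an experience, not a direct measurement."""
    return "red" if obs.self_report == "I am seeing red" else "no red"


def is_falsified(obs: Observation) -> bool:
    """Falsified when the inferred experience is not among the predicted ones."""
    return infer(obs) not in predict(obs)


agree = Observation("red_representation_active", "I am seeing red")
clash = Observation("red_representation_active", "I see nothing")

print(is_falsified(agree))  # False
print(is_falsified(clash))  # True
```

Note that `predict` and `infer` both take the same `Observation` as input. That shared dependence on O is the structural feature K&H's analysis turns on.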
But let’s push on this structure a little harder. In the same way one can say to an illusionist:
Objection I: No, no, your theories only describe whether some functional state causes people to say words like ‘I am having an experience of red’. I want to know if they’re actually having an experience of red!
What, precisely, is stopping one from saying:
Objection II: No, no, your theories only describe whether some average kinetic energy pushes mercury up to the level that says ‘75° C.’ I want to know if the temperature is actually 75° C!
Objection II, I hope we can all agree, is quite silly. First, “temperature” is just a name for whatever occupies the causal role of expanding mercury, triggering phase transitions, burning paper, etc.[1] If your experiments reveal that average kinetic energy causes all the things the temperature role does, you’re done; no further explanation needed. Objection II only holds if you assume:
1. “Temperature” refers to something above and beyond the observable causal role of what expands mercury, burns paper, etc.
2. Observable physics is causally closed with respect to temperature; that is, temperature causes nothing, so every observation of mercury expansion and burning paper could only be caused by other microphysical events, not by temperature itself.
By the same reasoning, objection I only holds if you assume:
1. “Experience” refers to something above and beyond the observable causal role of what triggers self-reports, incorporates sensory data into the brain, etc.
2. Observable physics is causally closed with respect to experience; that is, experience causes nothing, so every observation of self-reports and stored sensory data could only be caused by other microphysical events, not by experience itself.
If you reject premise 1, you are a type-A materialist;[2] this is what illusionists are. If you reject premise 2, you are a type-D dualist; this is what interactionist dualists are. On the type-A view, the study of consciousness is no different from the study of any other physical phenomenon. On the type-D view, consciousness has directly observable physical effects: the mental does stuff which can be measured. Both views deny K&H’s assumptions, and are therefore not subject to their conclusions.
So now we can turn the pressure back on K&H. I can understand doubting that consciousness just is a causal role. I can understand doubting that mental causation exists. And I can understand the desire for a materially falsifiable theory of consciousness. What K&H demonstrate is that if you deny consciousness as a causal role and deny mental causation and insist a materially falsifiable theory of consciousness should exist, you’re going to get stuck in a serious theoretical bind. And they mention in their discussion that type-D dualism is a possible escape hatch. I think it’s great that someone formalized this.
But then the question is why you are justified in taking those three assumptions as your starting point. If mental stuff with no causal effects exists, how could you possibly know that? Why would you take it as obvious that such stuff exists? Why would you take it as obvious that it has no causal effects? And why would you expect it to be subject to empirical testing in the first place? There are philosophical responses to these questions—I know Chalmers has worked on them—but K&H don’t cite any of them, nor offer any explanation of their own. A meta-theoretic approach, which is meant to rule out entire classes of theorizing, should make a stronger argument for its assumptions.
Their best defense is that apparently most scientific research into consciousness does make these three assumptions, and is therefore theoretically suspect. Most of my reading has been in computational functionalism, so I can’t confirm or deny this assessment, but if true then K&H have probably done a great public service by publishing this argument in a neuroscience journal. I just wish they were clearer about the alternatives up front.
II. The “disproof” of LLM consciousness
Like the last paper, Hoel’s “disproof” only works if you assume type-A materialism and type-D dualism are false. And maybe not even then.
First, Hoel declares that illusionist theories are necessarily “trivial.”
…when discussing Global Neuronal Workspace theory (GNWT), Daniel Dennett once wrote that: “… theorists must resist the temptation to see global accessibility as the cause of consciousness (as if consciousness were some other, further condition); rather, it is consciousness.” If Dennett’s ideas were true, this would make GNWT a trivial theory of consciousness, because the predictions and inferences strictly depend on the same source. (emphasis mine)
Here, I assume that trivial theories of consciousness must be false … since otherwise, there is no scientifically informative theory of consciousness. (emphasis mine)
But this is a philosophical assumption masquerading as scientific. Imagine it’s the 1800s, spirit mediums are all the rage, and there is earnest scientific inquiry into the spirit world. Everyone has different theories about how ghosts come into being, carefully checking their predictions based on cause of death, location of burial, and phase of the moon against the reporting of mediums. Except one group of scientists—we’ll call them the Ghostbusters—have a theory: ghosts don’t actually exist. All that occurs is mediums reporting that they feel or don’t feel a ghostly presence based on causal psychological mechanisms. This seems to explain much of the supposed evidence for the spirit world so far. But the other spirit-scientists reject ‘Ghostbusterism’ by assumption. Why? Because otherwise, there is no scientifically informative theory of ghosts!
Ghostbusters would reply that Ghostbusterism is plenty scientific. First, it makes falsifiable predictions about in what circumstances people will report a spirit’s presence, what they will describe that spirit’s presence as being like, etc. Second, it makes the negative prediction that there is no causal path from the spirit world to the material world, which could be falsified if ghosts are real: show a case of object levitation, of ESP-like prediction, etc. which cannot be explained by material causation. What’s more scientific than that?
“No, no,” the spirit-scientists reply. “Ghosts don’t have any causal efficacy; every indicator we have of their existence is a product of material causation. But we also for sure know that they exist, and for sure know that they aren’t material beings. Your self-report theories explain nothing, and the causal interaction theories are ludicrous sorcery.” Well, if you make all of those assumptions, of course you’re going to run into scientific difficulties!
And there’s a parallel case with consciousness. On the one hand, type-A materialists argue that every report and psychological representation of consciousness can be explained by material factors, and make falsifiable predictions about when those reports & representations will occur. On the other hand, type-D dualists argue that the mind does measurable stuff in the material world. When future neuroscience grows powerful enough, they would be able to agree to a series of empirical tests to decide between those two theories, based on whether there is evidence for mental causation. It’s only when you insist that consciousness is more than just its material causal role and that it has no causal efficacy that you run into problems with making a falsifiable theory.
Second, Hoel assumes that there is no mental-physical interaction going on for LLMs: “empirical inferences about consciousness for entities like LLMs are derived from I/O function.” But if you are a type-D dualist, this is ignoring a major source of data! If type-D dualism is true, then the strongest evidence that a mind and brain are interacting would be changes in the physical substrate—neurons firing, ion concentrations shifting—which cannot be explained by a non-mental physics; that’s the mind at work. If you are a type-D dualist, you cannot assume that an LLM will continue to operate under material physics just because the individual parts—transistors, wires, circuit boards—look materially deterministic; the same could be said of neurons and the physical brain!
Therefore, the strongest evidence that a mind and an LLM are interacting would be changes in the physical substrate—bits flipping, electrons rerouting—which cannot be explained by a non-mental physics. More about this view here:
Computers will have souls
I spend a lot of time arguing that LLMs could be thinking, either in a psychological sense (performing all the cognitive functions of thinking) or in a phenomenal sense (having subjective experiences of thinking).
So far, Hoel claims to “disprove” LLM consciousness… by ruling out two major theoretical approaches to consciousness by assumption. Not great. Type-A materialists and type-D dualists, you can probably stop reading here. Type-B materialists and epiphenomenalists, we now move on to the second half of the problem: how does Hoel manage a “disproof” which applies to LLMs and not humans? This’ll be covered in next week’s post.
III. The stakes are high
Why does this matter in the first place? Well, why did Hoel write this paper in the first place? Among other reasons:
[T]he question of whether or not contemporary LLMs (like ChatGPT or Claude or Gemini) are conscious has become suddenly critical. There are major risks associated with getting this question wrong. Assigning consciousness where there is none has a myriad of risks, which include increasing the risk of AI psychosis, overestimation of LLM capabilities, inappropriate practices or regulation, and misleading scientific beliefs about human consciousness. On the other hand, if contemporary LLMs were conscious they could be considered moral patients …
Here I agree with Hoel: there are major risks associated with getting this question wrong! That’s why I think we shouldn’t dismiss entire categories of consciousness theories out of hand, and we definitely shouldn’t go around calling our papers “A Disproof of Large Language Model Consciousness” without being very explicit about what theories we are dismissing.
Erik Hoel is not some random on the internet. He’s a real neuroscientist who is currently #15 in the substack science category. Scott Alexander subscribes to him. And he is founding his own nonprofit lab to study consciousness which is explicitly based on this formalism. Now, I think this is a really interesting research agenda, and I’m intrigued to see what comes out of it; I think we need a lot of angles on this problem. After all, there is a non-zero probability that type-B materialism is true. I’m far from confident that current LLMs are conscious in a morally-relevant way; I’d put the probability well under 50%. But Hoel writes that he means 0%: “It is not arguing about probabilities. LLMs are not conscious.” But, as we’ve seen, that only holds if you give three shaky premises 100% probability. And that is scientifically and morally reckless.
[1] See Lewis, “Psychophysical and Theoretical Identifications,” 1972 for a more precise account of the theory I am using here, including how to handle cases where no entity or multiple entities realize the causal role.
[2] See Chalmers, “Consciousness and its Place in Nature,” 2003 for the full taxonomy.




You have the patience of a Greek god. All Hoel proved was that third-person observation is insufficient for claiming consciousness. That's the Hard Problem of Consciousness restated, and it says nothing about ontology. Thanks for addressing the minutiae.
This objection isn't really an objection at all. It's basically admitting the argument and then arguing for two positions on consciousness that are seriously extreme, on either end of the spectrum (specifically, interactive dualism and illusionism).
If the proof is strong enough to constrain theories of consciousness to those two, then it's pretty darn powerful!
Making this seem like a general problem with the paper requires a shell game about definitions on your end. For instance, you write that:
>> " What K&H demonstrate is that if you deny consciousness as a causal role and deny mental causation and insist a materially falsifiable theory of consciousness should exist, you’re going to get stuck in a serious theoretical bind."
This is the heart of the shell game, because you're substituting in reasonable sounding terms like "deny consciousness a causal role" for "don't believe in interactive dualism." E.g., we don't deny consciousness the kind of causal role it takes in, e.g., IIT, or GNWT. So I'd like you to answer this: Would most people say that those theories deny consciousness a causal role? I think the obvious answer to that question is no. You are forced to answer "yes," and I'd like to hear your reasoning about it.
And that reasoning can't just be a repetition of "interactive dualism is true" (or otherwise, this is indeed a definitional shell game to make an extreme position of yours seem more general). I predict answering "yes" is hard. E.g., in IIT, consciousness is deeply associated with causation, to the point of basically being an identity. And yet, the reasoning of the disproof still applies to IIT.
The same shell game of presenting extreme positions with normal ones is occurring with regards to illusionism: yes, if consciousness is an illusion, then the proof doesn't apply... and then LLMs aren't conscious! That's not a very good objection.
I think a more interesting post might have simply said "Erik's proof doesn't apply to interactive dualism" and sketched your own version of what that might look like in detail as a candidate theory that avoids the dilemma, much as I did (but for a different theory class). But it's not like the paper is unaware of this option. I do mention quantum-based theories in the paper as a possible example of lenient dependency (which I think are the closest modern equivalent to interactive dualism). However, the reason I don't spend much time on it is that I find it obvious that interactive dualism would still rule out LLM consciousness! It's basically impossible to have interactive dualism in a computer - it would have to happen in precisely a way that doesn't break, e.g., the machine code. You can read a really thorough case for why that is here:
https://arxiv.org/abs/2304.05077
I'll probably be more interested in your second objection. But I'll note that you just gave a good case here wherein there are indeed theories that pass through the dilemma in humans (here, interactive dualism) but not LLMs! Again I know you seem to believe interactive dualism makes sense in LLMs, but I'll stress that basically no one else I'm aware of does and it seems really difficult to make work. So for your second objection to be true, you'll have to prove (or at least give a very good argument) that interactive dualism of the same kind we'd expect to be associated with consciousness in humans can sensibly apply to LLMs. Again, I'd suggest checking out the linked paper, which basically shows that can't be the case due to how CPUs and GPUs work. So your further post on your next objection is going to be at odds with this one.
To sum up: you've said "this paper is wrong because it doesn't consider two extreme theories, which, if true, would each also mean that LLMs are not conscious." Like I said, not much of an objection!
Since you offer some mildly accusatory language about the level of certainty here, I'll take the opportunity to clarify my position: The proof works as a proof. So if the proof holds, it's 100%. The point of that sentence you highlight as an overclaim (on my end) is to distinguish between other common methods, like weighting probabilities of theories of consciousness, which this approach isn't doing. Of course, the proof could be wrong, because some assumption is untrue: just like any other proof. But talking about this is like cataloging "the unknown unknowns" and I don't think it's egregious to not list "unknown unknowns" in a blog post introducing something. And honestly, I think the claim underlying the shell game you're doing here - which is that computers might have some sort of viable interactive dualism just like humans do, and this is some sort of obvious theory we should take into consideration - is way more radical and wild than any sort of claim I'm making (even assuming you're saying "I think it's X probability") and could be criticized in exactly the same way.