The Meta-Problem Test for AI Consciousness

by Nick Alonso

Artificial intelligence (AI) is rapidly improving and its progress shows no signs of slowing. Our understanding of consciousness, on the other hand, is limited and is progressing at a snail's pace. This situation raises an obvious question: how should we judge whether advanced AI are conscious, given that our understanding of consciousness is, and will likely continue to be, so limited?

Recent work has proposed two ways to approach this question: 1) use our best scientific theories of consciousness as a basis to make such judgments, or 2) devise and use a behavioral test to make such judgments. Both approaches, as I note below, face challenges.

In this post, I develop the beginnings of an alternative test for AI consciousness based on the ‘Meta-Problem’ of consciousness that does not fit cleanly in either of these categories. The meta-problem is the problem of explaining why people think they are conscious. For reasons I outline below, I believe this novel approach has interesting advantages over the other main approaches. I sketch the foundational ideas here in the hopes it could serve as a basis for further development.

Preliminaries

Before diving in, I need to define terms. The word ‘consciousness’ is ambiguous in the sense that people use it to refer to several different mental phenomena. ‘Self-consciousness’ refers, roughly, to the ability to mentally represent one’s own body and internal workings and to distinguish one’s self from the environment. A mental state or information in the brain is ‘access conscious’ if it is made widely available to a variety of cognitive systems. These two common definitions of consciousness are not what I will be discussing here. Neither is especially mysterious or especially difficult to identify in AI.

The sort of consciousness I will be discussing is what philosophers call ‘phenomenal consciousness’ or ‘subjective experience’. Examples of subjective experiences include first-person experiences of color, pain/pleasure, sounds, tastes, smells, and emotions. For reasons I will not be getting into here, this is the mysterious sort of consciousness that has puzzled, and continues to puzzle, philosophers and scientists. The question of whether some AI can be conscious under this definition can be framed as: does the AI have first-person experiences of the world? Does it have an inner mental life? Or is it all ‘dark inside’ for the AI, devoid of inner experience?

Most scientists and philosophers who study the subject agree we are far from an agreed-upon theory of the nature of subjective experience and its relation to the brain, which creates an obvious challenge when trying to identify consciousness in AI.

This challenge is concerning from a moral point of view since many moral philosophers agree an agent’s moral status (i.e., what moral rights it does or does not have) depends in important ways on what conscious states it is capable of having, especially on its ability to have experiences with a pain/pleasure component. For a good discussion of this concern, I recommend the writings of philosopher Eric Schwitzgebel (e.g., here).

Background

I find it useful to split the problem of identifying consciousness in AI into two problems:

1) Identifying whether AI can be conscious. Following philosopher Susan Schneider, this is the problem of determining whether silicon-based (or otherwise non-carbon-based) AI can support conscious experience, in principle. Some theories of consciousness entail that artificial, non-carbon-based systems can be conscious. Other theories disagree.

2) Identifying which AI systems are conscious. Even if we assume/discover that non-carbon-based artificial systems could support consciousness in principle, there still remains the question of which subset of artificial systems are conscious and to what degree. This problem is analogous to the problem of identifying which subset of animals are conscious and to what degree, given we know biological, carbon-based brains can support consciousness.

Philosophers and scientists who study consciousness commonly believe silicon can support consciousness, though agreement certainly is not universal. I suspect support for this belief will only grow in the coming years, and I make this assumption in what follows. Under this assumption, problem 2, the problem of which AI are conscious, becomes the focus.

How might we approach identifying which AI are conscious? Possible approaches are typically categorized into two types.

The theory-driven approach. This approach aims to use our best scientific theories of consciousness as a basis for making judgments about AI consciousness. For example, Butlin, Long, et al. recently showed that a handful of the leading theories of consciousness all seem to agree that certain computational properties are important for consciousness. They argue that these commonalities can be treated as ‘indicators’ of consciousness and used to make judgments about which AI are and are not conscious.

Challenge: The main challenge for theory-driven approaches is a deep lack of consensus around the scientific methods for studying consciousness, around the requirements for a proper theory of consciousness, and around a theory of consciousness itself. This lack of consensus, and the shaky foundations it reflects, suggests we should have limited confidence in even our leading theories of consciousness. For example, although Butlin, Long et al. provide a well-developed theory-driven test for AI consciousness based on several popular theories, it is unclear how much trust we should put in the indicator properties pulled from these theories.

The theory-neutral approach. Theory-neutral approaches avoid the difficulties of the theory-driven approach by staying largely neutral with respect to scientific and philosophical theories of consciousness. Instead, theory-neutral approaches typically devise some sort of behavioral test that could help us determine whether some AI is conscious. One example, proposed by Susan Schneider and Edwin Turner, holds that if we train an AI model such that it is never taught anything about consciousness, yet it still ends up pondering the nature of consciousness, there is sufficient reason to believe it is conscious. Schneider and Turner imagine running this test on something like an advanced chatbot by asking it questions that avoid using the word ‘consciousness’, such as ‘would you survive the deletion of your program?’ The idea is that in order to provide a reasonable response, the AI would require a concept of something like consciousness, and that concept would have to originate from the AI’s inner conscious mental life, since the AI was not explicitly taught the concept during training.

Challenge: Philosophers have challenged theory-neutral approaches like this on the grounds that it seems possible, under a significant variety of views about consciousness, for a non-conscious AI to learn to act as if it is conscious, even when the AI is not explicitly taught to do so. Behavioral tests like those mentioned above would be unable to distinguish non-conscious AI that learn to talk as if they are conscious from truly conscious AI. The reason theory-neutral approaches have this difficulty seems to be that they are too insensitive to the computational mechanisms causing the verbal reports, leaving open the possibility for non-conscious systems/cognitive mechanisms to generate behaviors that mimic those of conscious systems. To add to this problem, most of these tests rely on verbal reports and thus only apply to AI that can respond verbally.

The Meta-Problem of Consciousness

Below I present an alternative test for AI consciousness which can be interpreted as a middle point between theory-neutral and theory-driven approaches. The rough, first approximation of the test can be summarized as follows: if an AI says it is conscious for the same cognitive-computational reason that humans do, there is sufficient reason to believe the AI is conscious.

In order to develop this initial idea into a more rigorous and philosophically grounded test, we need to unpack a bit more what this statement means. First, what do I mean by “an AI says it is conscious for the same cognitive-computational reason that humans do”?

Humans who reflect on their own minds tend to conclude they are conscious. We conclude we have a stream of first-person experiences, an inner mental life. We often conclude this stream of first-person experience seems distinct from the neural mechanisms that underlie it, and we draw other related conclusions.

Now, why do people who reflect on their own minds tend to think these things? The standard line of thinking goes something like this: we think such things, of course, because we are conscious! We have conscious experiences, and these conscious experiences cause us to think we have conscious experiences.

This is the intuitive explanation, but it is not the only one. Philosophers and cognitive scientists have, in particular, shown that there exist consciousness-neutral explanations of the same behavior. That is, there exist explanations of why we think we are conscious that do not involve the term or concept of phenomenal consciousness.

Here is the basic idea: every behavior, including the behavior of saying you are conscious, is caused by some internal neural process, which implements some more abstract, cognitive-level computations. A description of a behavior’s neural and cognitive causes is what cognitive scientists and neuroscientists count as an explanation of the behavior: if we fully describe the neural and cognitive-computational processes that generate some common behavior X, then we have a (causal/mechanistic) explanation of why we observe X.

This line of thinking applies to any behavior, including common human behaviors associated with consciousness, such as our tendency to say we have a stream of first-person experiences. Put simply, there is some neural process, which implements more abstract cognitive level computations, that causes behaviors associated with consciousness. Thus, we can explain these behaviors in a consciousness-neutral way, in terms of these neural and cognitive mechanisms.

The problem of explaining why we think we have mysterious conscious states has been given a name and developed into a research program by philosopher David Chalmers, who calls it the meta-problem of consciousness. The meta-problem gets its name from the fact that it is a problem about a problem: it is the problem of explaining why we think we have conscious states that are problematic to explain.

What is nice about these recent developments on the meta-problem by Chalmers and others is that they provide ideas which, as I will explain, are useful for setting the foundation for a test for AI consciousness based on our preliminary idea above (i.e., the idea that if an AI claims it is conscious for the same cognitive reason people do, it should be identified as conscious). In particular, I focus on three ideas developed around the meta-problem and use them to develop a test for AI consciousness.

First is the idea that our tendency to say or judge we have consciousness involves two parts: a lower-order mental model, usually some sort of perceptual-like model, and a higher-order model, which represents the lower-order model (see Chalmers, 2019, pp. 40-45). The higher-order model can be described as the high-level ‘thought’ about the lower-order model, and the lower-order model can, in this way, be thought of as the mechanistic source of our thoughts about consciousness. To simplify terminology, I will just call this mechanistic source of our judgments about consciousness the c-source.

To make this more concrete, consider an example of a c-source which comes from attention schema theory (AST). AST is one of the most well-developed approaches to the meta-problem. AST claims the brain has a simplified model of attention, an attention schema, which it uses to control attention. This simplified model represents attention as a simple mental relation. So, when our cognition reads out the content from this model, it concludes we have some simple, primitive mental relations between ourselves and features of the world. We call these simple, mental relations ‘awareness’ or ‘experience’ or ‘consciousness’, and conclude they seem different, and non-reducible to the physical stuff in our brain. Now, AST may be incorrect, but it provides a clear illustration of what a c-source could be, i.e., a simplified mental model of attention.
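
To make the two-level structure a bit more tangible in computational terms, here is a minimal, purely illustrative Python sketch. It is my own toy construction, not AST's actual model: the names ToyAgent, attention_weights, attention_schema, and introspective_report are hypothetical placeholders. The only point of the sketch is that the introspective report is read out from a simplified self-model of attention (a stand-in for the c-source), not from the detailed mechanism itself, so the report is opaque to the underlying computation.

```python
# Toy illustration (not AST itself): an agent whose introspective report is
# generated from a simplified self-model of its attention (a stand-in for the
# c-source), rather than from the detailed attention mechanism.

import numpy as np

class ToyAgent:
    def __init__(self, num_features: int):
        # Detailed attention mechanism: a full vector of weights over features.
        self.attention_weights = np.random.rand(num_features)

    def attention_schema(self) -> dict:
        # Simplified, lossy self-model: it represents only *what* is attended,
        # as a primitive "awareness" relation, hiding the weights and the
        # machinery that implements attention.
        attended = int(np.argmax(self.attention_weights))
        return {"self": "me", "relation": "aware_of", "object": attended}

    def introspective_report(self) -> str:
        # The higher-order 'thought' reads the schema, not the mechanism,
        # so the report describes a simple, irreducible-seeming relation.
        schema = self.attention_schema()
        return f"I am directly aware of feature {schema['object']}."

agent = ToyAgent(num_features=8)
print(agent.introspective_report())
```

The design choice worth noticing is that the agent's report would look the same regardless of how attention is actually implemented underneath, which is exactly the sense in which the c-source, rather than the low-level mechanism, is the proximate source of consciousness talk.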

The second idea is the hypothesis that the c-source must be very closely related to consciousness. Now, this point is not essential to the meta-problem. However, Chalmers argues for this point (see 2019, pp. 49-56), and it makes some sense. Most people assume conscious experience is itself the source of our claims that we are conscious. As such, it would be highly surprising if the c-source, which plays roughly the same causal role in cognition as consciousness, had no essential relation to consciousness. For example, if AST is true, it would be very surprising if we concluded attention schemas had nothing to do with consciousness, since AST seems to entail that a thought like ‘I am having conscious experience X’ is actually referring to an attention schema (and/or its contents)! The c-source is, in some sense, what we are thinking about when we think about consciousness, and it thus seems that, in order to avoid certain philosophical problems (explained in the meta-problem paper but not here), a theory of consciousness must assume there is some essential relation between consciousness and the c-source.

The third idea is a general/high-level description of what the c-source likely is. Now, there is no consensus around a solution to the meta-problem. However, dozens of ideas have been developed, in addition to AST, which can be used as a starting point. Similar to the theory-driven approach of Butlin, Long, et al., we can look at promising approaches to the meta-problem and ask whether there are any commonalities between them that can be used as a basis for an AI consciousness test. Fortunately, Chalmers provides an extensive review of proposed solutions to the meta-problem, summarizes some common threads between them, and synthesizes ideas he found promising, which I find promising as well. Here is his summary:

We have introspective models deploying introspective concepts of our internal states that are largely independent of our physical concepts. These concepts are introspectively opaque, not revealing any of the underlying physical or computational mechanisms. Our perceptual models perceptually attribute primitive perceptual qualities to the world, and our introspective models attribute primitive mental relations to those qualities. We seem to have immediate knowledge that we stand in these primitive mental relations to primitive qualities, and we have the sense of being acquainted with them.

Chalmers, The Meta-Problem of Consciousness (2018, p.34).

There is a lot to unpack in this passage, most of it out of the scope of this post. The part important for understanding the c-source, Chalmers suggests, is that it likely involves simplified mental models of features/qualities in the world and simplified models of mental relations. The idea is that our brain models certain features in the world (e.g., color) as primitive (irreducible) qualities, and our brain models certain mental relations as primitive (irreducible) relations between us and these qualities. This gives us the sense we have direct first-person experiences of qualities like color, sound, and tactile sensations. These simplified models are thus our c-sources. This idea may not be completely right, but it is a nice starting point that encompasses the claims of a range of proposed solutions to the meta-problem, including prominent ones like AST.

Summary of the foundational ideas for the MPT:

  • There is a cognitive source of our tendency to judge we are conscious (what I call the c-source), which can be described in consciousness-neutral terms.
  • There are a priori (philosophical) reasons to believe the c-source has an essential link to consciousness.
  • An initial theory that encompasses ideas from a range of hypotheses of the c-source says the c-source is a set of simplified/compressed mental representations of features in the world and our mental relations to the features.

The Meta-Problem Test for AI Consciousness

I will now use these ideas to construct what I will call the meta-problem test (MPT) for AI consciousness. After presenting the MPT here, I discuss its advantages over the other approaches. My proposal for the MPT rests on the following assumption:

The Foundational Assumption for the MPT: the presence of the c-source in an AI provides sufficient reason to believe the AI is conscious (under the assumption silicon-based systems can be conscious, in principle). The absence of the c-source is sufficient reason to believe an AI is not conscious.

This assumption does not claim the presence of a c-source is sufficient for consciousness, just sufficient for us to believe consciousness is present. Thus, the claim it is making about consciousness is relatively weak, as it makes no specific claims about the necessary and sufficient conditions for consciousness, what consciousness is, or its metaphysical relation to the brain. It just says the presence of the c-source is evidence enough for consciousness, and its absence is evidence enough for consciousness’ absence.

This assumption is supported by the second idea discussed above, which is that the c-source likely has some essential relation to consciousness. Further philosophical work is needed to determine the plausibility of this point. As noted above, Chalmers provides some arguments as to why we should think there is some direct, essential relation between consciousness and what I am calling the c-source (see pp. 40-45). I will not go into the details of these arguments here but will try to in future posts. For now, I only note that this foundational assumption has prima facie plausibility.

Under this assumption we can lay out the basic steps for implementing the MPT:

  1. Put cognitive scientists to work on the meta-problem until the field converges on an explanation of what the c-source is in people, using Chalmers’ suggestion and related theories as a starting point.
  2. Develop systematic methods for identifying the c-source in AI.
  3. Judge those AI with the c-source to be conscious. Judge those AI without the c-source to be unconscious.

The Advantages of the MPT

How does the MPT compare to more standard theory-driven and theory-neutral approaches? Interestingly, the MPT does not cleanly fit into either category.

Like the theory-driven approach, the MPT proposes using a scientific theory of a mental process as a basis for making judgments about AI consciousness. However, unlike the theory-driven approach, the mental process this theory is about is not consciousness itself, but instead a cognitive process that is very closely related to consciousness: the c-source.

Like theory-neutral approaches, the MPT remains relatively neutral with respect to scientific and philosophical theories of consciousness, with the exception of its foundational assumption. However, unlike typical theory-neutral approaches, the MPT still relies on a test for a certain mental process (the c-source) based on scientific theories of what that mental process is. Also, the MPT is a cognitive-based test: it tests whether a certain cognitive process/representation is present, and if it is, we judge consciousness to be present too, whereas theory-neutral approaches tend to be more behavior-focused and less sensitive to the cognitive mechanisms generating the behavior in question.

The MPT thus does not fit neatly into either category, yet it shares some properties with both approaches. Interestingly, this middle ground may allow the MPT to avoid the main problems associated with theory-neutral and theory-driven approaches, while keeping some of their advantages.

More specifically, the advantages of the MPT over theory-driven approaches are:

  1. The MPT does not rely on any scientific theories of consciousness, and therefore avoids the uncertainty that necessarily comes with these theories and their assumptions. Although the MPT does rely on the assumption that the presence of the c-source is sufficient reason to believe consciousness is present, this assumption is much weaker, and therefore should be more easily defended, than the stronger, more specific claims of scientific theories of consciousness.
  2. The theories the MPT does rely on attempt to explain a cognitive process, the c-source, which is not philosophically problematic in the way consciousness is. As such, it is much more likely that we can make progress on finding widely accepted theories of the c-source in the short term than we can with theories of consciousness.

The advantages of the MPT over theory-neutral approaches are as follows:

  1. Arguably, the central issue for behavioral, theory-neutral tests is that such tests cannot reliably distinguish non-conscious AI that behave as if they are conscious from genuinely conscious AI. This issue is largely a result of the fact that these tests are too insensitive to the cognitive processes generating the relevant behaviors. The MPT, alternatively, is a cognitive-based test that only identifies those AI with a certain kind of cognitive process to be conscious. If the foundational assumption of the MPT is correct, the MPT avoids the central issue facing behavioral tests, as it will be able to reliably distinguish non-conscious AI that merely behave as if they are conscious (those that behave conscious-like but through a process unrelated to the c-source) from those that actually are conscious (those that behave conscious-like through a process rooted in the c-source).
  2. The MPT, unlike many proposed theory-neutral approaches, does not rely solely on language generation. The MPT can also use non-language-based tests. For example, if AST turns out to be true, we could test for the presence of attention schemas in an AI by having it perform tasks that require a certain kind of control over its attention, which only an attention schema can provide. Further, we can also perform the MPT using non-behavioral tests which directly probe the AI’s computational mechanisms for the c-source, e.g., by directly studying an AI’s artificial neural network for the presence of self-representations of attention (a rough sketch of what such a probe might look like is given below). This allows us to test for consciousness in a wider variety of AI than linguistic tests allow, and it allows us to support our judgments about AI consciousness with a wider variety of evidence than linguistic and behavioral tests alone can provide.
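
To illustrate what a non-behavioral version of the test might look like, here is a rough, hedged sketch of a probing analysis, assuming (hypothetically) that we have access to a model's internal activations and to labels describing what the model was attending to on each input. The data, dimensions, and names here are placeholders, and a real MPT would depend on whatever mature theory of the c-source the field converges on; the sketch only shows the general shape of the method: decoding a self-representation of attention from internal states with a simple probe.

```python
# Hedged sketch of a probing analysis for a self-representation of attention.
# All data here are random placeholders; a real analysis would use activations
# recorded from an actual model together with ground-truth attention labels.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 256))        # placeholder hidden activations
attended_feature = rng.integers(0, 8, size=1000)  # placeholder attention labels

X_train, X_test, y_train, y_test = train_test_split(
    activations, attended_feature, test_size=0.2, random_state=0
)

# If a simple linear probe can decode the system's own attention state from its
# internal representations, that would be one (weak, preliminary) piece of
# evidence that the system carries something like a self-model of attention.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```

Of course, probe accuracy alone would not settle whether the representation plays the right functional role (e.g., controlling attention, feeding introspective reports), so a full version of the test would need to combine such probes with interventions and task-based evidence.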

Current Limitations of the MPT

The MPT is not without limitations:

  1. The MPT approach is based on the assumption that the c-source is a very reliable indicator of consciousness. Although there are, in my opinion, good philosophical arguments for this assumption, more discussion is needed to determine whether it can be defended well enough to justify the use of the MPT.
  2. The MPT approach requires that cognitive scientists actually make progress on the meta-problem. Very few cognitive scientists and consciousness researchers are currently working on the meta-problem directly. It remains to be seen if consciousness scientists will shift focus in the near term.

Conclusions

  • The MPT offers a kind of middle point between typical theory-driven and theory-neutral approaches for AI consciousness tests.
  • As such, the MPT seems to retain the best aspects of both, while avoiding the problems associated with each. This suggests the MPT is a promising avenue for further development.
  • The success of the MPT depends heavily on the foundational assumption that the c-source has some essential tie to consciousness. This assumption has prima facie plausibility, but further analysis of it is needed.
  • The success of the MPT also depends on cognitive scientists converging toward a consensus on the c-source. This will only happen if significantly more scientists work directly on the meta-problem than there currently are.