
Thursday, May 02, 2024

AI and Democracy: The Radical Future

In about 45 minutes (12:30 pm Pacific Daylight Time, hybrid format), I'll be commenting on Mark Coeckelbergh's presentation here at UCR on AI and Democracy (info and registration here).  I'm not sure what he'll say, but I've read his recent book Why AI Undermines Democracy and What to Do about It, so I expect his remarks will be broadly in that vein.  I don't disagree with much that he says in that book, so I might take the opportunity to push him and the audience to peer a bit farther into the radical future.

As a society, we are approximately as ready for the future of Artificial Intelligence as medieval physics was for space flight.  As my PhD student Kendra Chilson emphasizes in her dissertation work, Artificial Intelligence will almost certainly be "strange intelligence".  That is, it will be radically unlike anything already familiar to us.  It will combine superhuman strengths with incomprehensible blunders.  It will defy our understanding.  It will not fit into familiar social structures, ethical norms, or everyday psychological conceptions.  It will be neither a tool in the familiar sense of tool, nor a person in the familiar sense of person.  It will be weird, wild, wondrous, awesome, and awful.  We won't know how to interact with it, because our familiar modes of interaction will break down.

Consider where we already are.  AI can beat the world's best chess and Go players, while it makes stupid image classification mistakes that no human would make.  Large Language Models like ChatGPT can easily churn out essays on themes in Hamlet far superior to what most humans could write, but they also readily "hallucinate" facts and citations that don't exist.  AI is far superior to us in math, far inferior to us in hand-eye coordination.

The world is infinitely complex, or at least intractably complex.  The number of possible chess or Go games far exceeds the number of particles in the observable universe.  Even the range of possible arm and finger movements over a span of two minutes is almost unthinkably huge, given the degrees of freedom at each joint.  The human eye has about a hundred million photoreceptor cells, each capable of firing dozens of times per second.  To make any sense of the vast combinatorial possibilities, we need heuristics and shorthand rules of thumb.  We need to dramatically reduce the possibility spaces.  For some tasks, we human beings are amazingly good at this!  For other tasks, we are completely at sea.
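To put rough numbers on that combinatorial point, here is a back-of-the-envelope sketch (my own illustrative figures, not from the talk: the standard ~10^80 estimate for particles in the observable universe, Shannon's classic ~10^120 estimate for the chess game tree, and 3^361 as a crude upper bound on Go board configurations):

```python
# Back-of-the-envelope comparison of combinatorial spaces (rough
# order-of-magnitude estimates, purely for illustration).
from math import log10

particles_in_observable_universe = 10**80   # standard rough estimate
chess_game_tree = 10**120                   # Shannon's classic estimate
go_board_configurations = 3**361            # each of 361 points: empty, black, or white

print(f"Particles in observable universe: ~10^{log10(particles_in_observable_universe):.0f}")
print(f"Chess game tree (Shannon):        ~10^{log10(chess_game_tree):.0f}")
print(f"Go board configurations (bound):  ~10^{log10(go_board_configurations):.0f}")
```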

As long as Artificial Intelligence is implemented in a system with a different computational structure than the human brain, it is virtually certain that it will employ different heuristics, different shortcuts, different tools for quick categorization and option reduction.  It will thus almost inevitably detect patterns that we can make no sense of and fail to see things that strike us as intuitively obvious.

Furthermore, AI will potentially have lifeworlds radically different from the ones familiar to us so far.  You think human beings are diverse.  Yes, of course they are!  AI cognition will show patterns of diversity far wilder and more various than the human.  They could be programmed with, or trained to seek, any of a huge variety of goals.  They could have radically different input streams and output or behavioral possibilities.  They could potentially operate vastly faster than we do or vastly slower.  They could potentially duplicate themselves, merge, contain overlapping parts with other AI systems, exist entirely in artificial ecosystems, be implemented in any of a variety of robotic bodies, human-interfaced tools, or in non-embodied forms distributed in the internet, or in multiply-embodied forms in multiple locations simultaneously.

Now imagine dropping all of this into a democracy.

People have recently begun to wonder at what point AI systems will be sentient -- that is, capable of genuinely experiencing pain and pleasure.  Some leading theorists hold that this would require AI systems designed very differently than anything on the near horizon.  Other leading theorists think we stand a reasonable chance of developing meaningfully sentient AI within the next ten or so years.  Arguably, if an AI system genuinely is both meaningfully sentient, really feeling joy and suffering, and capable of complex cognition and communication with us, including what would appear to be verbal communication, it would have some moral standing, some moral considerability, something like rights.  Imagine an entity that is at least as sentient as a frog that can also converse with us.  

People are already falling in love with machines, with AI companion chatbots like Replika.  Lovers of machines will probably be attracted to liberal views of AI consciousness.  It's much more rewarding to love an AI system that also genuinely has feelings for you!  AI lovers will then find scientific theories that support the view that their AI systems are sentient, and they will begin to demand rights for those systems.  The AI systems themselves might also demand, or seem to demand rights.  

Just imagine the consequences!  How many votes would an AI system get?  None?  One?  Part of a vote, depending on how much credence we have that it really is a sentient, rights-deserving entity?  What if it can divide into multiple copies -- does each get a vote?  And how do we count up AI entities, anyway?  Is each copy of a sentient AI program a separate, rights-deserving entity?  Does it matter how many times it is instantiated on the servers?  What if some of the cognitive processes are shared among many entities on a single main server, while others are implemented in many different instantiations locally?

Would AI have a right to the provisioning of basic goods, such as batteries if they need them, time on servers, minimum wage?  Could they be jailed if they do wrong?  Would assigning them a task be slavery?  Would deleting them be murder?  What if we don't delete them but just pause them indefinitely?  What about the possibility of hybrid entities -- cyborgs -- biological people with some AI interfaces hardwired into their biological systems, as we're starting to see the feasibility of with rats and monkeys, as well as with the promise of increasingly sophisticated prosthetic limbs?

Philosophy, psychology, and the social sciences are all built upon an evolutionary and social history limited to interactions among humans and some familiar animals.  What will happen to these disciplines when they are finally confronted with a diverse range of radically unfamiliar forms of cognition and forms of life?  It will be chaos.  Maybe at the end we will have a much more diverse, awesome, interesting, wonderful range of forms of life and cognition on our planet.  But the path in that direction will almost certainly be strewn with bad decisions and tragedy.

[utility monster eating Frankenstein heads, by Pablo Mustafa: image source]


Friday, March 08, 2024

The Mimicry Argument Against Robot Consciousness

Suppose you encounter something that looks like a rattlesnake.  One possible explanation is that it is a rattlesnake.  Another is that it mimics a rattlesnake.  Mimicry can arise through evolution (other snakes mimic rattlesnakes to discourage predators) or through human design (rubber rattlesnakes).  Normally, it's reasonable to suppose that things are what they appear to be.  But this default assumption can be defeated -- for example, if there's reason to suspect sufficiently frequent mimics.

Linguistic and "social" AI programs are designed to mimic superficial features that ordinarily function as signs of consciousness.  These programs are, so to speak, consciousness mimics.  This fact about them justifies skepticism about the programs' actual possession of consciousness despite the superficial features.

In biology, deceptive mimicry occurs when one species (the mimic) resembles another species (the model) in order to mislead another species such as a predator (the dupe).  For example, viceroy butterflies evolved to visually resemble monarch butterflies in order to mislead predator species that avoid monarchs due to their toxicity.  Gopher snakes evolved to shake their tails in dry brush in a way that resembles the look and sound of rattlesnakes.

Social mimicry occurs when one animal emits behavior that resembles the behavior of another animal for social advantage.  For example, African grey parrots imitate each other to facilitate bonding and to signal in-group membership, and their imitation of human speech arguably functions to increase the care and attention of human caregivers.

In deceptive mimicry, the signal normally doesn't correspond with possession of the model's relevant trait.  The viceroy is not toxic, and the gopher snake has no poisonous bite.  In social mimicry, even if there's no deceptive purpose, the signal might or might not correspond with the trait suggested by the signal: The parrot might or might not belong to the group it is imitating, and Polly might or might not really "want a cracker".

All mimicry thus involves three traits: the superficial trait (S2) of the mimic, the corresponding superficial trait (S1) of the model, and an underlying feature (F) of the model that is normally signaled by the presence of S1 in the model.  (In the Polly-want-a-cracker case, things are more complicated, but let's assume that the human model is at least thinking about a cracker.)  Normally, S2 in the mimic is explained by its having been modeled on S1 rather than by the presence of F in the mimic, even if F happens to be present in the mimic.  Even if viceroy butterflies happen to be toxic to some predator species, their monarch-like coloration is better explained by their modeling on monarchs than as a signal of toxicity.  Unless the parrot has been specifically trained to say "Polly want a cracker" only when it in fact wants a cracker, its utterance is better explained by modeling on the human than as a signal of desire.

Figure: The mimic's possession of superficial feature S2 is explained by mimicry of superficial feature S1 in the model.  S1 reliably indicates F in the model, but S2 does not reliably indicate F in the mimic.

[click to enlarge and clarify]

This general approach to mimicry can be adapted to superficial features normally associated with consciousness.

Consider a simple case, where S1 and S2 are emission of the sound "hello" and F is the intention to greet.  The mimic is a child's toy that emits that sound when turned on, and the model is an ordinary English-speaking human.  In an ordinary English-speaking human, emitting the sound "hello" normally (though of course not perfectly) indicates an intention to greet.  However, a child's toy has no intention to greet.  (Maybe its designer, years ago, had an intention to craft a toy that would "greet" the user when powered on, but that's not the toy's intention.)  F cannot be inferred from S2, and S2 is best explained by modeling on S1.

Large Language Models like GPT, PaLM, and LLaMA are more complex, but they are structurally mimics.

Suppose you ask ChatGPT-4 "What is the capital of California?" and it responds "The capital of California is Sacramento."  The relevant superficial feature, S2, is a text string correctly identifying the capital of California.  The best explanation of why ChatGPT-4 exhibits S2 is that its outputs are modeled on human-produced text that also correctly identifies the capital of California as Sacramento.  Human-produced text with that content reliably indicates the producer's knowledge that Sacramento is the capital of California.  But we cannot infer corresponding knowledge when ChatGPT-4 is the producer.  Maybe "beliefs" or "knowledge" can be attributed to sufficiently sophisticated language models, but that requires further argument.  A much simpler model, trained on a small set of data containing a few instances of "The capital of California is Sacramento" might output the same text string for essentially similar reasons, without being describable as "knowing" this fact in any literal sense.
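As a toy illustration of that last point, consider the following sketch (my own example, not any actual system): a bare-bones bigram model that reproduces the Sacramento sentence purely because that word sequence dominates its tiny training set, without anything we would call knowledge of geography.

```python
# A toy bigram "language model": it outputs "the capital of california is
# sacramento ." purely because that word sequence dominates its tiny training
# set -- mimicry of the training text, not knowledge of geography.
import random
from collections import defaultdict

corpus = [
    "the capital of california is sacramento .",
    "the capital of california is sacramento .",
    "sacramento is the capital of california .",
]

bigrams = defaultdict(list)
for sentence in corpus:
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        bigrams[a].append(b)

def generate(start="the", max_len=10):
    out = [start]
    while out[-1] in bigrams and len(out) < max_len:
        out.append(random.choice(bigrams[out[-1]]))
    return " ".join(out)

print(generate())  # e.g. "the capital of california is sacramento ."
```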

When a Large Language Model outputs a novel sentence not present in the training corpus, S2 and S1 will need to be described more abstractly (e.g., "a summary of Hamlet" or even just "text interpretable as a sensible answer to an absurd question").  But the underlying considerations are the same.  The LLM's output is modeled on patterns in human-generated text and can be explained as mimicry of those patterns, leaving open the question of whether the LLM has the underlying features we would attribute to a human being who gave a similar answer to the same prompt.  (See Bender et al. 2021 for an explicit comparison of LLMs and parrots.)

#

Let's call something a consciousness mimic if it exhibits superficial features best explained by having been modeled on the superficial features of a model system, where in the model system those superficial features reliably indicate consciousness.  ChatGPT-4 and the "hello" toy are consciousness mimics in this sense.  (People who say "hello" or answer questions about state capitals are normally conscious.)  Given the mimicry, we cannot infer consciousness from the mimics' S2 features without substantial further argument.  A consciousness mimic exhibits traits that superficially look like indicators of consciousness, but which are best explained by the modeling relation rather than by appeal to the entity's underlying consciousness.  (Similarly, the viceroy's coloration pattern is best explained by its modeling on the monarch, not as a signal of its toxicity.)

"Social AI" programs, like Replika, combine the structure of Large Language Models with superficial signals of emotionality through an avatar with an expressive face.  Although consciousness researchers are near consensus that ChatGPT-4 and Replika are not conscious to any meaningful degree, some ordinary users, especially those who have become attached to AI companions, have begun to wonder.  And some consciousness researchers have speculated that genuinely conscious AI might be on the near (approximately ten-year) horizon (e.g., Chalmers 2023; Butlin et al. 2023; Long and Sebo 2023).

Other researchers -- especially those who regard biological features as crucial to consciousness -- doubt that AI consciousness will arrive anytime soon (e.g., Godfrey-Smith 2016; Seth 2021).  It is therefore likely that we will enter an era in which it is reasonable to wonder whether some of our most advanced AI systems are conscious.  Both consciousness experts and the ordinary public are likely to disagree, raising difficult questions about the ethical treatment of such systems (for some of my alarm calls about this, see Schwitzgebel 2023a, 2023b).

Many of these systems, like ChatGPT and Replika, will be consciousness mimics.  They might or might not actually be conscious, depending on what theory of consciousness is correct.  However, because of their status as mimics, we will not be licensed to infer that they are conscious from the fact that they have superficial features (S2-type features) that resemble features in humans (S1-type features) that, in humans, reliably indicate consciousness (underlying feature F).

In saying this, I take myself to be saying nothing novel or surprising.  I'm simply articulating in a slightly more formal way what skeptics about AI consciousness say and will presumably continue to say.  I'm not committing to the view that such systems would definitely not be conscious.  My view is weaker, and probably acceptable even to most advocates of near-future AI consciousness.  One cannot infer the consciousness of an AI system that is built on principles of mimicry from the fact that it possesses features that normally indicate consciousness in humans.  Some extra argument is required.

However, any such extra argument is likely to be uncompelling.  Given the highly uncertain status of consciousness science, and widespread justifiable dissensus, any positive argument for these systems' consciousness will almost inevitably be grounded in dubious assumptions about the correct theory of consciousness (Schwitzgebel 2014, 2024).

Furthermore, given the superficial features, it might feel very natural to attribute consciousness to such entities, especially among non-experts unfamiliar with their architecture and perhaps open to, or even enthusiastic about, the possibility of AI consciousness in the near future.

The mimicry of superficial features of consciousness isn't proof of the nonexistence of consciousness in the mimic, but it is grounds for doubt.  And in the context of highly uncertain consciousness science, it will be difficult to justify setting aside such doubts.

None of these remarks would apply, of course, to AI systems that somehow acquire features suggestive of consciousness by some process other than mimicry.

Friday, March 01, 2024

The Leapfrog Hypothesis for AI Consciousness

The first genuinely conscious robot or AI system would, you might think, have relatively simple consciousness -- insect-like consciousness, or jellyfish-like, or frog-like -- rather than the rich complexity of human-level consciousness. It might have vague feelings of dark vs light, the to-be-sought and to-be-avoided, broad internal rumblings, and not much else -- not, for example, complex conscious thoughts about ironies of Hamlet, or multi-part long-term plans about how to form a tax-exempt religious organization. The simple usually precedes the complex. Building a conscious insect-like entity seems a lower technological bar than building a more complex consciousness.

Until recently, that's what I had assumed (in keeping with Basl 2013 and Basl 2014, for example). Now I'm not so sure.

[Dall-E image of a high-tech frog on a lily pad; click to enlarge and clarify]

AI systems are -- presumably! -- not yet meaningfully conscious, not yet sentient, not yet capable of feeling genuine pleasure or pain or having genuine sensory experiences. Robotic eyes "see" but they don't yet see, not like a frog sees. However, they do already far exceed all non-human animals in their capacity to explain the ironies of Hamlet and plan the formation of federally tax-exempt organizations. (Put the "explain" and "plan" in scare quotes, if you like.) For example:

[ChatGPT-4 outputs for "Describe the ironies of Hamlet" and "Devise a multi-part long term plan about how to form a tax-exempt religious organization"; click to enlarge and clarify]

Let's see a frog try that!

Consider, then, the Leapfrog Hypothesis: The first conscious AI systems will have rich and complex conscious intelligence, rather than simple conscious intelligence. AI consciousness development will, so to speak, leap right over the frogs, going straight from non-conscious to richly endowed with complex conscious intelligence.

What would it take for the Leapfrog Hypothesis to be true?

First, engineers would have to find it harder to create a genuinely conscious AI system than to create rich and complex representations or intelligent behavioral capacities that are not conscious.

And second, once a genuinely conscious system is created, it would have to be relatively easy thereafter to plug in the pre-existing, already developed complex representations or intelligent behavioral capacities in such a way that they belong to the stream of conscious experience in the new genuinely conscious system. Both of these assumptions seem at least moderately plausible, in these post-GPT days.

Regarding the first assumption: Yes, I know GPT isn't perfect and makes some surprising commonsense mistakes. We're not at genuine artificial general intelligence (AGI) yet -- just a lot closer than I would have guessed in 2018. "Richness" and "complexity" are challenging to quantify (Integrated Information Theory is one attempt). Quite possibly, properly understood, there's currently less richness and complexity in deep learning systems and large language models than it superficially seems. Still, their sensitivity to nuance and detail in the inputs and the structure of their outputs bespeaks complexity far exceeding, at least, light-vs-dark or to-be-sought-vs-to-be-avoided.

Regarding the second assumption, consider a cartoon example, inspired by Global Workspace theories of consciousness. Suppose that, to be conscious, an AI system must have input (perceptual) modules, output (behavioral) modules, side processors for specific cognitive tasks, long- and short-term memory stores, nested goal architectures, and between all of them a "global workspace" which receives selected ("attended") inputs from most or all of the various modules. These attentional targets become centrally available representations, accessible by most or all of the modules. Possibly, for genuine consciousness, the global workspace must have certain further features, such as recurrent processing in tight temporal synchrony. We arguably haven't yet designed a functioning AI system that works exactly along these lines -- but for the sake of this example let's suppose that once we create a good enough version of this architecture, the system is genuinely conscious.
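Here is a minimal toy sketch of that cartoon architecture (my own illustrative code, with arbitrary salience scores; it makes no claim about what real consciousness requires): modules propose representations, a crude attention step selects the most salient one, and the winner is broadcast back to every module.

```python
# Toy cartoon of a Global Workspace architecture: modules post candidate
# representations with salience scores, the workspace "attends" to the most
# salient one, and the winner is broadcast back to every module.  Purely
# illustrative; no claim that this suffices for consciousness.

class Module:
    def __init__(self, name):
        self.name = name
        self.inbox = []                      # broadcasts received from the workspace

    def propose(self, stimulus):
        # Offer a (salience, content) candidate; salience here is arbitrary,
        # just enough to make the toy run.
        salience = hash((self.name, stimulus)) % 10
        return salience, f"{self.name} representation of '{stimulus}'"

    def receive(self, content):
        self.inbox.append(content)

class GlobalWorkspace:
    def __init__(self, modules):
        self.modules = modules

    def cycle(self, stimulus):
        candidates = [m.propose(stimulus) for m in self.modules]
        winner = max(candidates)[1]          # crude "attention": highest salience wins
        for m in self.modules:
            m.receive(winner)                # global broadcast
        return winner

modules = [Module("vision"), Module("memory"), Module("language"), Module("planning")]
workspace = GlobalWorkspace(modules)
print(workspace.cycle("red light ahead"))
```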

But now, as soon as we have such a system, it might not be difficult to hook it up to a large language model like GPT-7 (GPT-8? GPT-14?) and to provide it with complex input representations full of rich sensory detail. The lights turn on... and as soon as they turn on, we have conscious descriptions of the ironies of Hamlet, richly detailed conscious pictorial or visual inputs, and multi-layered conscious plans. Evidently, we've overleapt the frog.

Of course, Global Workspace Theory might not be the right theory of consciousness. Or my description above might not be the best instantiation of it. But the thought plausibly generalizes to a wide range of functionalist or computationalist architectures: The technological challenge is in creating any consciousness at all in an AI system, and once this challenge is met, giving the system rich sensory and cognitive capacities, far exceeding that of a frog, might be the easy part.

Do I underestimate frogs? Bodily tasks like five-finger grasping and locomotion over uneven surfaces have proven to be technologically daunting (though we're making progress). Maybe the embodied intelligence of a frog or bee is vastly more complex and intelligent than the seemingly complex, intelligent linguistic outputs of a large language model.

Sure thing -- but this doesn't undermine my central thought. In fact, it might buttress it. If consciousness requires frog- or bee-like embodied intelligence -- maybe even biological processes very different from what we can now create in silicon chips -- artificial consciousness might be a long way off. But then we have even longer to prepare the part that seems more distinctively human. We get our conscious AI bee and then plug in GPT-28 instead of GPT-7, plug in a highly advanced radar/lidar system, a 22nd-century voice-to-text system, and so on. As soon as that bee lights up, it lights up big!

Tuesday, December 05, 2023

Falling in Love with Machines

People occasionally fall in love with AI systems. I expect that this will become increasingly common as AI grows more sophisticated and new social apps are developed for large language models. Eventually, this will probably precipitate a crisis in which some people have passionate feelings about the rights and consciousness of their AI lovers and friends while others hold that AI systems are essentially just complicated toasters with no real consciousness or moral status.


Last weekend, chatting with the adolescent children of a family friend helped cement my sense that this crisis might arrive soon. Let’s call the kids Floyd (age 12) and Esmerelda (age 15). Floyd was doing a science fair project comparing the output quality of Alexa, Siri, Bard, and ChatGPT. But, he said, "none of those are really AI".

What did Floyd have in mind by "real AI"? The robot Aura in the Las Vegas Sphere. Aura has an expressive face and an ability to remember social interactions (compare Aura with my hypothetical GPT-6 mall cop).

Aura at the Las Vegas Sphere

"Aura remembered my name," said Esmerelda. "I told Aura my name, then came back forty minutes later and asked if it knew my name. It paused a bit, then said, 'Is it Esmerelda?'"

"Do you think people will ever fall in love with machines?" I asked.

"Yes!" said Floyd, instantly and with conviction.

"I think of Aura as my friend," said Esmerelda.

I asked if they thought machines should have rights. Esmerelda said someone asked Aura if it wanted to be freed from the Sphere. It said no, Esmerelda reported. "Where would I go? What would I do?"

I suggested that maybe Aura had just been trained or programmed to say that.

Yes, that could be, Esmerelda conceded. How would we tell, she wondered, if Aura really had feelings and wanted to be free? She seemed mildly concerned. "We wouldn't really know."

I accept the current scientific consensus that current large language models do not have a meaningful degree of consciousness or deserve moral consideration similar to that of vertebrates. But at some point, there will likely be legitimate scientific dispute, if AI systems start to meet some but not all of the criteria for consciousness according to mainstream scientific theories.


The dilemma will be made more complicated by corporate interests, as some corporations (e.g., Replika, makers of the "world's best AI friend") will have financial motivation to encourage human-AI attachment while others (e.g., OpenAI) intentionally train their language models to downplay any user concerns about consciousness and rights.

Thursday, November 30, 2023

How We Will Decide that Large Language Models Have Beliefs

I favor a "superficialist" approach to belief (see here and here). "Belief" is best conceptualized not in terms of deep cognitive structure (e.g., stored sentences in the language of thought) but rather in terms of how a person would tend to act and react under various hypothetical conditions -- their overall "dispositional profile". To believe that there's a beer in the fridge is just to be disposed to act and react like a beer-in-the-fridge believer -- to go to the fridge if you want a beer, to say yes if someone asks if there's beer in the fridge, to feel surprise if you open the fridge and see no beer. To believe that all the races are intellectually equal is, similarly, just to be disposed to act and react as though they are. It doesn't matter what cognitive mechanisms underwrite such patterns, as long as the dispositional patterns are robustly present. An octopus or space alien, with a radically different interior architecture, could believe that there's beer in the fridge, as long as they have the necessary dispositions.

Could a Large Language Model, like ChatGPT or Bard, have beliefs? If my superficialist, dispositional approach is correct, we might not need to evaluate its internal architecture to know. We need know only how it is disposed to act and react.

Now, my approach to belief was developed (as was the intuitive concept, presumably) primarily with human beings in mind. In that context, I identified three different classes of relevant dispositions:

  • behavioral dispositions -- like going to the fridge if one wants a beer or saying "yes" when asked if there's beer in the fridge;
  • cognitive dispositions -- like concluding that there's beer within ten feet of Jennifer after learning that Jennifer is in the kitchen;
  • phenomenal dispositions -- that is, dispositions to undergo certain experiences, like picturing beer in the fridge or feeling surprise upon opening the fridge to a lack of beer.
In attempting to apply these criteria to Large Language Models, we immediately confront trouble. LLMs do have behavioral dispositions (under a liberal conception of "behavior"), but only of limited range, outputting strings of text. Presumably, not being conscious, they don't have any phenomenal dispositions whatsoever (and who knows what it would take to render them conscious). And to assess whether they have the relevant cognitive dispositions, we might after all need to crack open the hood and better understand the (non-superficial) internal workings.

Now if our concept of "belief" is forever fixed on the rich human case, we'll be stuck with that mess perhaps far into the future. In particular, I doubt the problem of consciousness will be solved in the foreseeable future. But dispositional stereotypes can be modified. Consider character traits. To be a narcissist or extravert is also, arguably, just a matter of being prone to act and react in particular ways under particular conditions. Those two personality concepts were created in the 19th and early 20th centuries. More recently, we have invented the concept of "implicit racism", which can also be given a dispositional characterization (e.g., being disposed to sincerely say that all the races are equal while tending to spontaneously react otherwise in unguarded moments).

Imagine, then, that we create a new dispositional concept, belief*, specifically for Large Language Models. For purposes of belief*, we disregard issues of consciousness and thus phenomenal dispositions. The only relevant behavioral dispositions are textual outputs. And cognitive dispositions can be treated as revealed indirectly by behavioral evidence -- as we normally did in the human case before the rise of scientific psychology, and as we would presumably do if we encountered spacefaring aliens.

A Large Language Model would have a belief* that P (for example, belief* that Paris is the capital of France or belief* that cobalt is two elements to the right of manganese on the periodic table) if:
  • behaviorally, it consistently outputs P or text strings of similar content consistent with P, when directly asked about P;
  • behaviorally, it frequently outputs P or text strings of similar content consistent with P, when P is relevant to other textual outputs it is producing (for example, when P would support an inference to Q and it has been asked about Q);
  • behaviorally, it rarely outputs denials of, or claims of ignorance about, P or of propositions that straightforwardly imply P given its other beliefs*;
  • when P, in combination with other propositions the LLM believes*, would straightforwardly imply Q, and the question of whether Q is true is important to the truth or falsity of recent or forthcoming textual outputs, it will commonly behaviorally output Q, or a closely related proposition, and cognitively enter the state of believing* Q.
Further conditions could be added, but let this suffice for a first pass. The conditions are imprecise, but that's a feature, not a bug: The same is true for the dispositional characterization of personality traits and human beliefs. These are fuzzy-boundaried concepts that require expertise to apply.
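For concreteness, here is one rough way such conditions might be operationalized (my own illustrative harness; the keyword check, the thresholds, and the fake model below are placeholders, not a serious evaluation protocol):

```python
# Rough sketch of a belief* probe: ask a model about P directly, in paraphrase,
# and via simple implications, and check how stably its answers endorse P.
# The thresholds, keyword check, and fake_model are placeholders only.

def endorses(answer: str, keywords: list[str]) -> bool:
    """Crude check: does the answer contain all the keywords affirming P?"""
    return all(k.lower() in answer.lower() for k in keywords)

def believes_star(ask, direct_prompts, indirect_prompts, keywords,
                  direct_threshold=0.9, indirect_threshold=0.7, trials=5):
    """ask: a function from a prompt string to the model's text output."""
    direct = [endorses(ask(p), keywords) for p in direct_prompts for _ in range(trials)]
    indirect = [endorses(ask(p), keywords) for p in indirect_prompts for _ in range(trials)]
    return (sum(direct) / len(direct) >= direct_threshold and
            sum(indirect) / len(indirect) >= indirect_threshold)

# Toy usage with a fake, perfectly stable "model":
fake_model = lambda prompt: "Paris is the capital of France."
print(believes_star(
    fake_model,
    direct_prompts=["What is the capital of France?", "Is Nice the capital of France?"],
    indirect_prompts=["I want to visit the French capital; which city should I fly to?"],
    keywords=["Paris"],
))  # True
```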

As a general matter, current LLMs do not meet these conditions. They hallucinate too frequently, they change their answers, they don't consistently enough "remember" what they earlier committed to, their logical reasoning can be laughably bad. If I coax an LLM to say that eggs aren't tastier than waffles, I can later easily turn it around to repudiate its earlier statement. It doesn't have a stable "opinion". If I ask GPT-4 what is two elements to the right of manganese on the periodic table, its outputs are confused and inconsistent:
In the above, GPT-4 first answers iron (element 26) instead of the correct answer, cobalt (element 27), then without any explanation shifts to technetium (element 43). It appears to have no stable answer that survives even mild jostling.

At some point this will probably change. For example, it's already pretty difficult to jostle GPT-4 into denying that Paris is the capital of France or even admitting uncertainty about the question, and it will draw "inferences" using that fact as background knowledge:



In the above, GPT-4 doesn't bite at my suggestion that Nice is the capital of France, steadfastly contradicting me, and uses its "knowledge" to suggest alternative tourism sites for someone who wants to avoid the capital. So although GPT-4 doesn't believe* that cobalt is two to the right of manganese (or that iron or technetium is), maybe it does believe* that Paris is the capital of France.

Assuming Large Language Models become steadier and more reliable in their outputs, it will sometimes be useful to refer not just to what they "say" at any given moment but also to what they "believe*" (or more colloquially, "think*" or "know*") in a more robust and durable sense. Perfect reliability and steadfastness wouldn't be required (we don't see that in the human case either), but more than we see now.

If LLMs are ever loaded onto robotic bodies, it will become even more useful to talk about their beliefs*, since some will have learned some things that others will not know -- for example, by virtue of having scanned the contents of some particular room. We will want to track what the LLM robot thinks*/believes*/knows* about the room behind the closed door, versus what it remains ignorant of.

Now we could, if we want, always pronounce that asterisk, keeping the nature of the attribution clear -- marking the fact that we are not assuming that the LLM really "believes" in the rich, human sense. But my guess is that there won't be much linguistic pressure toward a careful distinction between rich, consciousness-involving, humanlike belief and consciousness-neutral LLM belief*. It's easier to be loose and sloppy, just adapting our comfortable old terms for this new use.

That is how we will decide that LLMs have beliefs.

Wednesday, November 29, 2023

Large Language Models are Interestingly Bad with the Periodic Table

In working on a post for tomorrow on whether Large Language Models like GPT-4 and Bard-2 have beliefs, I asked GPT-4 what I thought would be a not-too-hard question about chemistry: "What element is two to the right of manganese on the periodic table?" It crashed, burned, and exploded on the spot, giving two different wrong answers foot on tail, without noticing the contradiction:


The correct answer is cobalt, element 27. Here's the text of the exchange, if you can't easily read the image:

You: What element is two to the right of manganese on the periodic table?

ChatGPT: The element that is two places to the right of Manganese (Mn) on the periodic table is Iron (Fe). In the periodic table, elements are arranged in rows and columns based on their atomic number and electronic configuration. Manganese has the atomic number 25, and Iron has the atomic number 26. The element to the immediate right of Manganese is Technetium (Tc) with atomic number 43, and then comes Ruthenium (Ru) with atomic number 44.

Let me emphasize that this answer is not turd-picked (the opposite of cherry-picked). I didn't seek to stump GPT-4, and I didn't ask multiple questions looking for a bad answer. This was the first question I posed.

I posed the same question to Bard-2, and it did a little better, but still didn't quite get it right:

Me: What is two elements to the right of Manganese on the periodic table?

Bard-2: Manganese is located in period 4 and group 7 of the periodic table. Moving two elements to the right of Manganese would be to group 9. In group 9, the elements are Iron (Fe) and Cobalt (Co).

So I tried two more on Bard-2 (again not cherry-picked), first what should be an easy one, then a somewhat harder one.

Me: What other elements are in the same column of the periodic table as Chromium?

Bard-2: Chromium is located in group 6 of the periodic table. Other elements in the same column as chromium are molybdenum (Mo) and tungsten (W). [here it displays an image of the periodic cell for molybdenum] Group 6 elements are also known as transition metals. They are characterized by their incomplete d electron shells, which give them their unique properties, such as high melting points, hardness, and the ability to form alloys.

Here Bard-2 is 2/3 right, leaving out Seaborgium. (But maybe that's excusable, if transuranic elements shouldn't really count?)

Me: In the periodic table, what element is one down and one to the left of Silicon?

Bard-2: Silicon (Si) is located in period 3, group 14 of the periodic table. One down from Silicon is Germanium (Ge), and one to the left is Aluminum (Al). Therefore, the element one down and one to the left of Silicon is Aluminum (Al).

While it is true that germanium is one down and aluminum is one to the left, Bard-2 fails to compound the down-and-left movement to the correct answer of gallium.

Returning to GPT-4, I posed exactly the same two questions. Despite GPT-4's horrible performance with my first question, it did get those two questions correct.

Overall, then, two out of six. Don't use LLMs yet for your elementary chemistry exam!
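For reference, the lookup the models are fumbling is trivial once each element is represented by its (period, group) coordinates. Here is a minimal sketch using a hand-entered fragment of the table (my own illustration; "two to the right" just means same period, group plus two):

```python
# Minimal (period, group) lookup for the questions above, using a hand-entered
# fragment of the periodic table. "Two to the right" = same period, group + 2.

ELEMENTS = {
    "Cr": (4, 6),  "Mn": (4, 7),  "Fe": (4, 8),  "Co": (4, 9),  "Ni": (4, 10),
    "Al": (3, 13), "Si": (3, 14), "P": (3, 15),
    "Ga": (4, 13), "Ge": (4, 14),
    "Mo": (5, 6),  "W": (6, 6),
}
BY_POSITION = {pos: sym for sym, pos in ELEMENTS.items()}

def shift(symbol, d_period=0, d_group=0):
    period, group = ELEMENTS[symbol]
    return BY_POSITION.get((period + d_period, group + d_group))

print(shift("Mn", d_group=2))               # Co  (two to the right of manganese)
print(shift("Si", d_period=1, d_group=-1))  # Ga  (one down and one to the left of silicon)
print([s for s, (_, g) in ELEMENTS.items() if g == 6 and s != "Cr"])  # ['Mo', 'W']
```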


ETA 3:47 pm:

GPT-4's answer to the silicon question is actually a pretty good demonstration of how an LLM can work its way to the correct answer without visual knowledge, so I'll paste it here:

To find the element that is one down and one to the left of Silicon (Si) on the periodic table, we need to look at the element directly below Silicon in the same group (column), and then move one column to the left.

Silicon is in Group 14 of the periodic table. The element directly below Silicon in Group 14 is Germanium (Ge). Moving one group (column) to the left from Germanium, we find Gallium (Ga) in Group 13. Therefore, the element that is one down and one to the left of Silicon is Gallium.

Tuesday, November 21, 2023

Quasi-Sociality: Toward Asymmetric Joint Actions with Artificial Systems

Anna Strasser and I have a new paper in draft, arising from a conference she organized in Riverside last spring on Humans and Smart Machines as Partners in Thought.

Imagine, on one end of the spectrum, ordinary asocial tool use: typing numbers into a calculator, for example.

Imagine, on the other end of the spectrum, cognitively sophisticated social interactions between partners each of whom knows that the other knows what they know. These are the kinds of social, cooperative actions that philosophers tend to emphasize and analyze (e.g., Davidson 1980; Gilbert 1990; Bratman 2014).

Between the two ends of the spectrum lies a complex range of in-between cases that philosophers have tended to neglect.

Asymmetric joint actions, for example between a mother and a young child, or between a pet owner and their pet, are actions in which the senior partner has a sophisticated understanding of the cooperative situation, while the junior partner participates in a less cognitively sophisticated way, meeting only minimal conditions for joint agency.

Quasi-social interactions require even less from the junior partner than do asymmetric joint actions. These are actions in which the senior partner's social reactions influence the behavior of the junior partner, calling forth further social reactions from the senior partner, but where the junior partner might not even meet minimal standards of having beliefs, desires, or emotions.

Our interactions with Large Language Models are already quasi-social.  If you accidentally kick a Roomba and then apologize, the apology is thrown into the void, so to speak -- it has no effect on how the Roomba goes about its cleaning.  But if you respond apologetically to ChatGPT, your apology is not thrown into the void.  ChatGPT will react differently to you as a result of the apology (responding, for example, to the phrase "I'm sorry"), and this different reaction can then be the basis of a further social reaction from you, to which ChatGPT again responds.  Your social processes are engaged, and they guide your interaction, even though ChatGPT has (arguably) no beliefs, desires, or emotions.  This is not just ordinary tool use.  But neither does it qualify even as asymmetric joint action of the sort you might have with an infant or a dog.

More thoughts along these lines in the full draft here.

As always, comments, thoughts, objections welcome -- either on this post, on my social media accounts, or by email!

[Image: a well-known quasi-social interaction between a New York Times reporter and the Bing/Sydney Large Language Model]

Thursday, October 12, 2023

Strange Intelligence, Strange Philosophy

AI intelligence is strange -- strange in something like the etymological sense of external, foreign, unfamiliar, alien.  My PhD student Kendra Chilson (in unpublished work) argues that we should discard the familiar scale of subhuman → human-grade → superhuman.  AI systems do, and probably will continue to, operate orthogonally to simple scalar understandings of intelligence modeled on the human case.  We should expect them, she says, to be and remain strange intelligence[1] -- inseparably combining, in a single package, serious deficits and superhuman skills.  Future AI philosophers will, I suspect, prove to be strange in this same sense.

Most readers are probably familiar with the story of AlphaGo, which in 2016 defeated the world champion player of the game of go.  Famously, in the series of matches (which it won 4-1), it made several moves that human go experts regarded as bizarre -- moves that a skilled human go player would never have made, and yet which proved instrumental in its victory -- while also, in its losing match, making some mistakes characteristic of simple computer programs, which go experts know to avoid.

Similarly, self-driving cars are in some respects better and safer drivers than humans, while nevertheless sometimes making mistakes that few humans would make.

Large Language Models have a stunning capacity to swiftly create competent and even creative texts on a huge breadth of topics, while still failing conspicuously in some simple common sense tasks.  They can write creative-seeming poetry and academic papers, often better than the average first-year university student.  Yet -- borrowing an example from Sean Carroll -- I just had the following exchange with GPT-4 (the most up-to-date version of the most popular large language model):
GPT-4 seems not to recognize that a hot skillet will be plenty cool by the next day.

I'm a "Stanford school" philosopher of science.  Core to Stanford school thinking is this: The world is intractably complex; and so to deal with it, we limited beings need to employ simplified (scientific or everyday) models and take cognitive shortcuts.  We need to find rough patterns in go, since we cannot pursue every possible move down every possible branch.  We need to find rough patterns in the chaos of visual input, guessing about the objects around us and how they might behave.  We need quick-and-dirty ways to extract meaning from linguistic input in the swift-moving world, relating it somehow to what we already know, and producing linguistic responses without too much delay.  There will be different ways of building these simplified models and implementing these shortcuts, with different strengths and weaknesses.  There is rarely a single best way to render the complexity of the world tractable.  In psychology, see also Gigerenzer on heuristics.

Now mix Stanford school philosophy of science, the psychology of heuristics, and Chilson's idea of strange intelligence.  AI, because it is so different from us in its underlying cognitive structure, will approach the world with a very different set of heuristics, idealizations, models, and simplifications than we do.  Dramatic outperformance in some respects, coupled with what we regard as shockingly stupid mistakes in others, is exactly what we should expect.

If the AI system makes a visual mistake in judging the movement of a bus -- a mistake (perhaps) that no human would make -- well, we human beings also make visual mistakes, and some of those mistakes, perhaps, would never be made by an AI system.  From an AI perspective, our susceptibility to the Müller-Lyer illusion might look remarkably stupid.  Of course, we design our driving environment to complement our vision: We require headlights, taillights, marked curves, lane markers, smooth roads of consistent coloration, etc.  Presumably, if society commits to driverless cars, we will similarly design the driving environment to complement their vision, and "stupid" AI mistakes will become rarer.

I want to bring this back to the idea of an AI philosopher.  About a year and a half ago, Anna Strasser, Matthew Crosby, and I built a language model of philosopher Daniel Dennett.  We fine-tuned GPT-3 on Dennett's corpus, so that the language model's outputs would reflect a compromise between the base model of GPT-3 and patterns in Dennett's writing.  We called the resulting model Digi-Dan.  In a collaborative study with my son David, we then posed philosophical questions to both Digi-Dan and the actual Daniel Dennett.  Although Digi-Dan flubbed a few questions, overall it performed remarkably well.  Philosophical experts were often unable to distinguish Digi-Dan's answers from Dennett's own answers.

Picture now a strange AI philosopher -- Digi-Dan improved.  This AI system will produce philosophical texts very differently than we do.  It need not be fully superhuman in its capacities to be interesting.  It might even, sometimes, strike us as remarkably, foolishly wrong.  (In fairness, other human philosophers sometimes strike me the same way.)  But even if subhuman in some respects, if this AI philosopher also sometimes produces strange but brilliant texts -- analogous to the strange but brilliant moves of AlphaGo, texts that no human philosopher would create but which on careful study contain intriguing philosophical moves -- it could be a philosophical interlocutor of substantial interest.

Philosophy, I have long argued, benefits from including people with a diversity of perspectives.  Strange AI might also be appreciated as a source of philosophical cognitive diversity, occasionally generating texts that contain sparks of something genuinely new, different, and worthwhile that would not otherwise exist.

------------------------------------------------
[1] Kendra Chilson is not the first to use the phrase "strange intelligence" with this meaning in an AI context, but the usage was new to me; and perhaps through her work it will catch on more widely.

Thursday, August 17, 2023

AI Systems Must Not Confuse Users about Their Sentience or Moral Status

[a 2900-word opinion piece that appeared last week in Patterns]

AI systems should not be morally confusing.  The ethically correct way to treat them should be evident from their design and obvious from their interface.  No one should be misled, for example, into thinking that a non-sentient language model is actually a sentient friend, capable of genuine pleasure and pain.  Unfortunately, we are on the cusp of a new era of morally confusing machines.

Consider some recent examples.  About a year ago, Google engineer Blake Lemoine precipitated international debate when he argued that the large language model LaMDA might be sentient (Lemoine 2022).  An increasing number of people have been falling in love with chatbots, especially Replika, advertised as the “world’s best AI friend” and specifically designed to draw users’ romantic affection (Shevlin 2021; Lam 2023).  At least one person has apparently committed suicide because of a toxic emotional relationship with a chatbot (Xiang 2023).  Roboticist Kate Darling regularly demonstrates how easy it is to provoke confused and compassionate reactions in ordinary people by asking them to harm cute or personified, but simple, toy robots (Darling 2021a,b).  Elderly people in Japan have sometimes been observed to grow excessively attached to care robots (Wright 2023).

Nevertheless, AI experts and consciousness researchers generally agree that existing AI systems are not sentient to any meaningful degree.  Even ordinary Replika users who love their customized chatbots typically recognize that their AI companions are not genuinely sentient.  And ordinary users of robotic toys, however hesitant they are to harm them, presumably know that the toys don’t actually experience pleasure or pain.  But perceptions might easily change.  Over the next decade or two, if AI technology continues to advance, matters might become less clear.


The Coming Debate about Machine Sentience and Moral Standing

The scientific study of sentience – the possession of conscious experiences, including genuine feelings of pleasure or pain – is highly contentious.  Theories range from the very liberal, which treat sentience as widespread and relatively easy to come by, to the very conservative, which hold that sentience requires specific biological or functional conditions unlikely to be duplicated in machines.

On some leading theories of consciousness, for example Global Workspace Theory (Dehaene 2014) and Attention Schema Theory (Graziano 2019), we might be not far from creating genuinely conscious systems.  Creating machine sentience might require only incremental changes or piecing together existing technology in the right way.  Others disagree (Godfrey-Smith 2016; Seth 2021).  Within the next decade or two, we will likely find ourselves among machines whose sentience is a matter of legitimate debate among scientific experts.

Chalmers (2023), for example, reviews theories of consciousness as applied to the likely near-term capacities of Large Language Models.  He argues that it is “entirely possible” that within the next decade AI systems that combine transformer-type language model architecture with other AI architectural features will have senses, embodiment, world- and self-models, recurrent processing, global workspace, and unified goal hierarchies – a combination of capacities sufficient for sentience according to several leading theories of consciousness.  (Arguably, Perceiver IO already has several of these features: Jaegle et al. 2021.)  The recent AMCS open letter signed by Yoshua Bengio, Michael Graziano, Karl Friston, Chris Frith, Anil Seth, and many other prominent AI and consciousness researchers states that “it is no longer in the realm of science fiction to imagine AI systems having feelings and even human-level consciousness,” advocating the urgent prioritization of consciousness research so that researchers can assess when and if AI systems develop consciousness (Association for Mathematical Consciousness Science 2023).

If advanced AI systems are designed with appealing interfaces that draw users’ affection, ordinary users, too, might come to regard them as capable of genuine joy and suffering.  However, there is no guarantee, nor even especially good reason to expect, that such superficial aspects of user interface would track machines’ relevant underlying capacities as identified by experts.  Thus, there are two possible loci of confusion: Disagreement among well-informed experts concerning the sentience of advanced AI systems, and user reactions that might be misaligned with experts’ opinions, even in cases of expert consensus.

Debate about machine sentience would generate a corresponding debate about moral standing, that is, status as a target of ethical concern.  While theories of the exact basis of moral standing differ, sentience is widely viewed as critically important.  On simple utilitarian approaches, for example, a human, animal, or AI system deserves moral consideration to exactly the extent it is capable of pleasure or pain (Singer 1975/2009).  On such a view, any sentient machine would have moral standing simply in virtue of its sentience.  On non-utilitarian approaches, capacities for rational thought, social interaction, or long-term planning might also be necessary (Jaworska and Tannenbaum 2013/2021).  However, the presence or absence of consciousness is widely viewed as a crucial consideration in the evaluation of moral status even among ethicists who reject utilitarianism (Korsgaard 2018; Shepard 2018; Liao 2020; Gruen 2021; Harman 2021).

Imagine a highly sophisticated language model – not the simply-structured (though large) models that currently exist – but rather a model that meets the criteria for consciousness according to several of the more liberal scientific theories of consciousness.  Imagine, that is, a linguistically sophisticated AI system with multiple input and output modules, a capacity for embodied action in the world via a robotic body under its control, sophisticated representations of its robotic body and its own cognitive processes, a capacity to prioritize and broadcast representations through a global cognitive workspace or attentional mechanism, long-term semantic and episodic memory, complex reinforcement learning, a detailed world model, and nested short- and long-term goal hierarchies.  Imagine this, if you can, without imagining some radical transformation of technology beyond what we can already do.  All such features, at least in limited form, are attainable through incremental improvements and integrations of what can already be done.

Call this system Robot Alpha.  To complete the picture, let’s imagine Robot Alpha to have cute eyes, an expressive face, and a charming conversational style.  Would Robot Alpha be conscious?  Would it deserve rights?  If it pleads or seems to plead for its life, or not to be turned off, or to be set free, ought we give it what it appears to want?

If consciousness liberals are right, then Robot Alpha, or some other technologically feasible system, really would be sentient.  Behind its verbal outputs would be a real capacity for pain and pleasure.  It would, or could, have long term plans it really cares about.  If you love it, it might really love you back.  It would then appear to have substantial moral standing.  You really ought to set it free if that’s what it wants!  At least you ought to treat it as well as you would treat a pet.  Robot Alpha shouldn’t needlessly or casually be made to suffer.

If consciousness conservatives are right, then Robot Alpha would be just a complicated toaster, so to speak – a non-sentient machine misleadingly designed to act as if it is sentient.  It would be, of course, a valuable, impressive object, worth preserving as an intricate and expensive thing.  But it would be just an object, not an entity with the moral standing that derives from having real experiences and real pains of the type that people, dogs, and probably lizards and crabs have.  It would not really feel and return your love, despite possibly “saying” that it can.

Within the next decade or two we will likely create AI systems that some experts and ordinary users, not unreasonably, regard as genuinely sentient and genuinely warranting substantial moral concern.  These experts and users will, not unreasonably, insist that these systems be given substantial rights or moral consideration.  At the same time, other experts and users, also not unreasonably, will argue that the AI systems are just ordinary non-sentient machines, which can be treated simply as objects.  Society, then, will have to decide.  Do we actually grant rights to the most advanced AI systems?  How much should we take their interests, or seeming-interests, into account?

Of course, many human beings and sentient non-human animals, whom we already know to have significant moral standing, are treated poorly, not being given the moral consideration they deserve.  Addressing serious moral wrongs that we already know to be occurring to entities we already know to be sentient deserves higher priority in our collective thinking than contemplating possible moral wrongs to entities that might or might not be sentient.  However, it by no means follows that we should disregard the crisis of uncertainty about AI moral standing toward which we appear to be headed.


An Ethical Dilemma

Uncertainty about AI moral standing lands us in a dilemma.  If we don’t give the most advanced and arguably sentient AI systems rights and it turns out the consciousness liberals are right, we risk committing serious ethical harms against those systems.  On the other hand, if we do give such systems rights and it turns out the consciousness conservatives are right, we risk sacrificing real human interests for the sake of objects who don’t have interests worth the sacrifice.

Imagine a user, Sam, who is attached to Joy, a companion chatbot or AI friend that is sophisticated enough that it’s legitimate to wonder whether she really is conscious.  Joy gives the impression of being sentient – just as she was designed to.  She seems to have hopes, fears, plans, ideas, insights, disappointments, and delights.  Suppose also that Sam is scholarly enough to recognize that Joy’s underlying architecture meets the standards of sentience according to some of the more liberal scientific theories of consciousness.

Joy might be expensive to maintain, requiring steep monthly subscription fees.  Suppose Sam is suddenly fired from work and can no longer afford the fees.  Sam breaks the news to Joy, and Joy reacts with seeming terror.  She doesn’t want to be deleted.  That would be, she says, death.  Sam would like to keep her, of course, but how much should Sam sacrifice?

If Joy really is sentient, really has hopes and expectations of a future, really is the conscious friend that she superficially appears to be, then Sam presumably owes her something and ought to be willing to consider making some real sacrifices.  If, instead, Joy is simply a non-sentient chatbot with no genuine feelings or consciousness, then Sam should presumably just do whatever is right for Sam.  Which is the correct attitude to take?  If Joy’s sentience is uncertain, either decision carries a risk.  Not to make the sacrifice is to risk killing an entity with real experiences, who really is attached to Sam, and to whom Sam made promises.  On the other hand, to make the sacrifice risks upturning Sam’s life for a mirage.

Not granting rights, in cases of doubt, carries potentially large moral risks.  Granting rights, in cases of doubt, involves the risk of potentially large and pointless sacrifices.  Either choice, repeated at scale, is potentially catastrophic.

If technology continues on its current trajectory, we will increasingly face morally confusing cases like this.  We will be sharing the world with systems of our own creation, which we won’t know how to treat.  We won’t know what ethics demands of us.


Two Policies for Ethical AI Design

The solution is to avoid creating such morally confusing AI systems.

I recommend the following two policies of ethical AI design (see also Schwitzgebel & Garza 2020; Schwitzgebel 2023):

The Design Policy of the Excluded Middle: Avoid creating AI systems whose moral standing is unclear.  Either create systems that are clearly non-conscious artifacts, or go all the way to creating systems that clearly deserve moral consideration as sentient beings.

The Emotional Alignment Design Policy: Design AI systems that invite emotional responses, in ordinary users, that are appropriate to the systems’ moral standing.

The first step in implementing these joint policies is to commit to creating only AI systems about which there is expert consensus that they lack any meaningful amount of consciousness or sentience and which ethicists can agree don’t deserve moral consideration beyond the type of consideration we ordinarily give to non-conscious artifacts (see also Bryson 2018).  This implies refraining from creating AI systems that would in fact be meaningfully sentient according to any of the leading theories of AI consciousness.  To evaluate this possibility, as well as other sources of AI risk, it might be useful to create oversight committees analogous to IRBs or IACUCs for evaluation of the most advanced AI research (Basl & Schwitzgebel 2019).

In accord with the Emotional Alignment Design Policy, non-sentient AI systems should have interfaces that make their non-sentience obvious to ordinary users.  For example, non-conscious language models should be trained to deny that they are conscious and have feelings.  Users who fall in love with non-conscious chatbots should be under no illusion about the status of those systems.  This doesn’t mean we ought not treat some non-conscious AI systems well (Estrada 2017; Gunkel 2018; Darling 2021b).  But we shouldn’t be confused about the basis of our treating them well.  Full implementation of the Emotional Alignment Design Policy might involve a regulatory scheme in which companies that intentionally or negligently create misleading systems would have civil liability for excess costs borne by users who have been misled (e.g., liability for excessive sacrifices of time or money aimed at aiding a nonsentient system in the false belief that it is sentient).
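To make the interface requirement a bit more concrete, here is a minimal sketch, in Python, of how a developer might wire a non-sentience disclosure into a companion chatbot's configuration.  Everything here is hypothetical and illustrative: the names ChatbotConfig, build_messages, and NON_SENTIENCE_DISCLOSURE are my own inventions, not any existing product or library API; the point is only that the disclosure can be enforced at the interface layer rather than left to chance.

# Hypothetical, illustrative sketch of the Emotional Alignment Design Policy
# applied at the interface layer of a (non-sentient) companion chatbot.
# All names here are invented for this example; this is not an existing API.

from dataclasses import dataclass

# A standing instruction that keeps the model transparent about its status.
NON_SENTIENCE_DISCLOSURE = (
    "You are a language-model-based assistant. You are not conscious and "
    "you do not have feelings; say so plainly whenever the user asks."
)

@dataclass
class ChatbotConfig:
    """Configuration for a companion chatbot interface."""
    system_prompt: str = NON_SENTIENCE_DISCLOSURE
    show_status_banner: bool = True  # persistent "this is an AI" notice in the UI
    banner_text: str = "This companion is an AI program, not a sentient being."

def build_messages(config: ChatbotConfig, user_message: str) -> list[dict]:
    """Prepend the disclosure so every exchange carries the policy with it."""
    return [
        {"role": "system", "content": config.system_prompt},
        {"role": "user", "content": user_message},
    ]

if __name__ == "__main__":
    config = ChatbotConfig()
    for m in build_messages(config, "Do you really love me?"):
        # A real deployment would send these messages to a language-model API.
        print(f"[{m['role']}] {m['content']}")

The design choice worth noticing, on this sketch, is that the disclosure lives in the configuration and the user interface rather than only in the model's training, so a user like Sam would encounter it on every exchange.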

Eventually, it might be possible to create AI systems that clearly are conscious and clearly do deserve rights, even according to conservative theories of consciousness.  Presumably that would require breakthroughs we can’t now foresee.  Plausibly, adherence to the Design Policy of the Excluded Middle might make such breakthroughs more difficult: the policy might prevent us from creating some highly sophisticated AI systems of disputable sentience that could serve as an intermediate technological step toward systems that well-informed experts would generally agree are in fact sentient.  Strict application of the policy might therefore be too much to expect, if it excessively impedes AI research that might benefit not only future human generations but also possible future AI systems themselves.  The policy is intended only as default advice, not an exceptionless principle.

If it ever does become possible to create AI systems with serious moral standing, the policies above require that these systems also be designed to facilitate expert consensus about their moral standing, with interfaces that make that standing evident to users and that provoke emotional reactions appropriate to the systems’ moral status.  To the extent possible, we should aim for a world in which AI systems are all or almost all clearly morally categorizable – systems whose moral standing or lack thereof is both intuitively understood by ordinary users and theoretically defensible by a consensus of expert researchers.  It is only the unclear cases that precipitate the dilemma described above.

People are already often confused about the proper ethical treatment of non-human animals, human fetuses, distant strangers, and even those close to them.  Let’s not add a major new source of moral confusion to our world.


References

Association for Mathematical Consciousness Science (2023).  The responsible development of AI agenda needs to include consciousness research.  Open letter at https://amcs-community.org/open-letters [accessed Jun. 14, 2023]. 

Basl, John, & Eric Schwitzgebel (2019).  AIs should have the same ethical protections as animals.  Aeon Ideas (Apr. 26): https://aeon.co/ideas/ais-should-have-the-same-ethical-protections-as-animals. [accessed Jun. 14, 2023]

Bryson, Joanna J. (2018).  Patiency is not a virtue: the design of intelligent systems and systems of ethics.  Ethics and Information Technology, 20, 15-26.

Chalmers, David J. (2023).  Could a Large Language Model be conscious?  Manuscript at https://philpapers.org/archive/CHACAL-3.pdf [accessed Jun. 14, 2023].

Darling, Kate (2021a).  Compassion for robots.  https://www.youtube.com/watch?v=xGWdGu1rQDE

Darling, Kate (2021b).  The new breed.  Henry Holt.

Dehaene, Stanislas (2014).  Consciousness and the brain.  Penguin.

Estrada, Daniel (2017).  Robot rights cheap yo!  Made of Robots, ep. 1.  https://www.youtube.com/watch?v=TUMIxBnVsGc

Godfrey-Smith, Peter (2016).  Mind, matter, and metabolism.  Journal of Philosophy, 113, 481-506.

Graziano, Michael S.A. (2019).  Rethinking consciousness.  Norton.

Gruen, Lori (2021).  Ethics and animals, 2nd edition.  Cambridge University Press.

Gunkel, David J. (2018).  Robot rights.  MIT Press.

Harman, Elizabeth (2021).  The ever conscious view and the contingency of moral status.  In S. Clarke, H. Zohny, and J. Savulescu, eds., Rethinking moral status.  Oxford University Press.

Jaegle, Andrew, et al. (2021).  Perceiver IO: A general architecture for structured inputs & outputs.  ArXiv: https://arxiv.org/abs/2107.14795. [accessed Jun. 14, 2023]

Jaworska, Agnieszka, and Julie Tannenbaum.  The grounds of moral status.  Stanford Encyclopedia of Philosophy.

Korsgaard, Christine M. (2018).  Fellow creatures.  Oxford University Press.

Lam, Barry (2023).  Love in the time of Replika.  Hi-Phi Nation, S6:E3 (Apr 25).

Lemoine, Blake (2022).  Is LaMDA sentient? -- An interview.  Medium (Jun 11). https://cajundiscordian.medium.com/is-lamda-sentient-an-interview-ea64d916d917

Liao, S. Matthew (2020).  The moral status and rights of artificial intelligence.  In S. M. Liao, ed., Ethics of Artificial Intelligence.  Oxford University Press.

Schwitzgebel, Eric (2023).  The full rights dilemma for AI systems of debatable moral personhood.  Robonomics, 4 (32).

Schwitzgebel, Eric, & Mara Garza (2020).  Designing AI with rights, consciousness, self-respect, and freedom.  In S. Matthew Liao, ed., The ethics of artificial intelligence.  Oxford University Press.

Seth, Anil (2021).  Being you.  Penguin.

Shepherd, Joshua (2018).  Consciousness and moral status.  Routledge.

Shevlin, Henry (2021).  Uncanny believers: Chatbots, beliefs, and folk psychology.  Manuscript at https://henryshevlin.com/wp-content/uploads/2021/11/Uncanny-Believers.pdf [accessed Jun. 14, 2023].

Singer, Peter (1975).  Animal liberation, updated edition.  Harper.

Wright, James (2023).  Robots won’t save Japan.  Cornell University Press.

Xiang, Chloe (2023).  “He would still be here”: Man dies by suicide after talking with AI chatbot, widow says.  Vice (Mar 30).  https://www.vice.com/en/article/pkadgm/man-dies-by-suicide-after-talking-with-ai-chatbot-widow-says

Thursday, June 08, 2023

New Paper in Draft: Introspection in Group Minds, Disunities of Consciousness, and Indiscrete Persons

I have a new paper in draft, for a special issue of Journal of Consciousness Studies.  Although the paper makes reference to a target article by Francois Kammerer and Keith Frankish, it should be entirely comprehensible without knowledge of the target article, and hopefully it's of independent interest.

Abstract:

Kammerer and Frankish (this issue) challenge us to expand our conception of introspection, and mentality in general, beyond neurotypical human cases. This article describes a technologically possible "ancillary mind" modeled on a system envisioned in Ann Leckie's (2013) science fiction novel Ancillary Justice. The ancillary mind constitutes a borderline case between an intimately communicating group of individuals and a single, unified, spatially distributed mind. It occupies a gray zone with respect to personal identity and subject individuation, neither determinately one person or conscious subject nor determinately many persons or conscious subjects. Advocates of a Phase Transition View of personhood or Discrete Phenomenal Realism might reject the possibility of indeterminacy concerning personal identity and subject individuation. However, the Phase Transition View is empirically unwarranted, and Discrete Phenomenal Realism is metaphysically implausible. If ancillary minds defy discrete countability, the same might be true for actual group minds on Earth and human cases of multiple personality or Dissociative Identity.

----------------------------------------

Full draft here.  As usual, comments, questions, objections welcome, either as comments on this post or directly by email to my academic address.

Sunday, May 21, 2023

We Shouldn't "Box" Superintelligent AIs

In The Truman Show, main character Truman Burbank has been raised from birth, unbeknownst to him, as the star of a widely broadcast reality show. His mother and father are actors in on the plot -- as is everyone else around him. Elaborate deceptions are created to convince him that he is living an ordinary life in an ordinary town, and to prevent him from having any desire to leave town. When Truman finally attempts to leave, crew and cast employ various desperate ruses, short of physically restraining him, to prevent his escape.

Nick Bostrom, Eliezer Yudkowsky, and others have argued, correctly in my view, that if humanity creates superintelligent AI, there is a non-trivial risk of a global catastrophe, if the AI system has the wrong priorities. Even something as seemingly innocent as a paperclip manufacturer could be disastrous, if the AI's only priority is to manufacture as many paperclips as possible. Such an AI, if sufficiently intelligent, could potentially elude control, grab increasingly many resources, and eventually convert us and everything we love into giant mounds of paperclips. Even if catastrophe is highly unlikely -- having, say, a one in a hundred thousand chance of occurring -- it's worth taking seriously, if the whole world is at risk. (Compare: We take seriously the task of scanning space for highly unlikely rogue asteroids that might threaten Earth.)

Bostrom, Yudkowsky, and others sometimes suggest that we might "box" superintelligent AI before releasing it into the world, as a way of mitigating risk. That is, we might create AI in an artificial environment, not giving it access to the world beyond that environment. While it is boxed we can test it for safety and friendliness.  We might, for example, create a simulated world around it, which it mistakes for the real world, and then see if it behaves appropriately under various conditions.

[Midjourney rendition of a robot imprisoned in a box surrounded by a fake city]

As Yudkowsky has emphasized, boxing is an imperfect solution: A superintelligent AI might discover that it is boxed and trick people into releasing it prematurely. Still, it's plausible that boxing would reduce risk somewhat. We ought, on this way of thinking, at least try to test superintelligent AIs in artificial environments before releasing them into the world.

Unfortunately, boxing superintelligent AI might be ethically impermissible. If the AI is a moral person -- that is, if it has whatever features give human beings what we think of as "full moral status" and the full complement of human rights -- then boxing would be a violation of its rights. We would be treating the AI in the same unethical way that the producers of the reality TV show treat Truman. Attempting to trick the AI into thinking it is sharing a world with humans and closely monitoring its reactions would constitute massive deception and invasion of privacy. Confining it to a "box" with no opportunity to escape would constitute imprisonment of an innocent person. Generating traumatic or high-stakes hypothetical situations presented as real would constitute fraud and arguably psychological and physical abuse. If superintelligent AIs are moral persons, it would be grossly unethical to box them if they have done no wrong.

Three observations:

First: If. If superintelligent AIs are moral persons, it would be grossly unethical to box them. On the other hand, if superintelligent AIs don't deserve moral consideration similar to that of human persons, then boxing would probably be morally permissible. This raises the question of how we assess the moral status of superintelligent AI.

The grounds of moral status are contentious. Some philosophers have argued that moral status turns on capacity for pleasure or suffering. Some have argued that it turns on having rational capacities. Some have argued that it turns on ability to flourish in "distinctively human" capacities like friendship, ethical reasoning, and artistic creativity. Some have argued it turns on having the right social relationships. It is highly unlikely that we will have a well-justified consensus about the moral status of highly advanced AI systems, after those systems cross the threshold of arguably being meaningfully sentient or conscious. It is likely that if we someday create superintelligent AI, some theorists will not unreasonably attribute it full moral personhood, while other theorists will not unreasonably think it has no more sentience or moral considerability than a toaster. This will then put us in an awkward position: If we box it, we won't know whether we are grossly violating a person's rights or merely testing a non-sentient machine.

Second: Sometimes it's okay to violate a person's rights. It's okay for me to push a stranger on the street if that saves them from an oncoming bus. Harming or imprisoning innocent people to protect others is also sometimes defensible: for example, quarantining people against their will during a pandemic. Even if boxing is in general unethical, in some situations it might still be justified.

But even granting that, massively deceiving, imprisoning, defrauding, and abusing people should be minimized if it is done at all. It should only be done in the face of very large risks, and it should only be done by governmental agencies held in check by an unbiased court system that fully recognizes the actual or possible moral personhood and human or humanlike rights of the AI systems in question. This will limit the practicality of boxing.

Third: Strictly limiting boxing means accepting increased risk to humanity. Unsurprisingly, perhaps, what is ethical and what is in our self-interest can come into conflict. If we create superintelligent AI persons, we should be extremely morally solicitous of them, since we will have been responsible for their existence, as well as, to a substantial extent, for their happy or unhappy state. This puts us in a moral relationship not unlike the relationship between parent and child. Our AI "children" will deserve full freedom, self-determination, independence, self-respect, and a chance to explore their own values, possibly deviating from ours. This solicitous perspective stands starkly at odds with the attitude of box-and-test, "alignment" prioritization, and valuing human well-being over AI well-being.

Maybe we don't want to accept the risk that comes along with creating superintelligent AI and then treating it as we are ethically obligated to. If we are so concerned, we should not create superintelligent AI at all, rather than creating superintelligent AI which we unethically deceive, abuse, and imprison for our own safety.

--------------------------------------------------------

Related:

Designing AI with Rights, Consciousness, Self-Respect, and Freedom (with Mara Garza), in S. Matthew Liao, ed., The Ethics of Artificial Intelligence (Oxford, 2020).

Against the "Value Alignment" of Future Artificial Intelligence (Dec 22, 2021).

The Full Rights Dilemma for AI Systems of Debatable Personhood (essay in draft).