ChatGPT Is Not a Blurry JPEG of the Web. It's a Simulacrum.
February 19, 2023 | posted in ai
The gifted sci-fi writer Ted Chiang recently wrote a New Yorker article, “ChatGPT Is a
Blurry JPEG of the Web”, with the thesis that large language models like ChatGPT
can be analogized to lossy compression algorithms for their input data.
I think this analogy is wrong and misleading. Others have done a good job gently
refuting it, with plentiful citations to appropriate scientific papers. But a discussion
with a friend in the field reminded me that the point of analogies is not to help out us
scientific paper-readers. Analogies are themselves a type of lossy compression,
designed to give a high-level understanding of the topic by drawing on concepts you
already know. So to meet the analogy on its own level, we need a better one to
replace it.
Fortunately, a great analogy has already been discovered: large language models
are simulators, and the specific personalities we interact with, like ChatGPT, are
simulacra (simulated entities). These simulacra exist for brief spurts of time between
our prompt and their output, within the model’s simulation.
This analogy is so helpful because it resolves the layperson’s fundamental confusion
about large language model-based artificial intelligences. Science fiction conditions
us to expect our Turing Test-passing AIs to be “agentic”: to have goals, desires,
preferences, and to take actions to bring them about. But this is not what we see with
ChatGPT and its predecessors. We see a textbox, patiently waiting for us to type into
it, speaking only when spoken to.
And yet, such intelligences are capable of remarkable feats: getting an 83 on an IQ
test, getting a nine-year-old-equivalent score on theory-of-mind tests, getting C+ to
B-level grades on graduate-level exams, and passing an L3 engineer coding test. (If
you’re scoffing: “only 83! only a nine-year-old! only a C+! only L3!” then remember, it’s
been four years between the release of GPT-2 and today. Prepare for the next four.)
What is going on here?
What’s going on is that large language models are engines in which simulations and
simulacra can be instantiated. These simulacra live in a world of words, starting with
some initial conditions (the prompt) and evolving the world forward in time to produce
the end result (the output text). The simulacra, then, are the intelligences we interact
with, briefly instantiated to evolve the simulation and then lying dormant until we continue the time-evolution.
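To make the "time-evolution" framing concrete, here is a minimal sketch of how an autoregressive language model turns a prompt into output, one token at a time. It uses GPT-2 via the Hugging Face transformers library as a stand-in for a much larger model; the prompt, temperature, and step count are illustrative choices of mine, not anything from the simulators literature.

```python
# A toy "simulation loop": the prompt is the initial condition, and each
# sampled token advances the world of words by one tick.
# Assumes `pip install torch transformers`; GPT-2 stands in for a larger model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The assistant paused, then replied:"      # initial conditions
state = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(40):                             # 40 steps of time-evolution
        logits = model(state).logits[:, -1, :]      # distribution over next tokens
        probs = torch.softmax(logits / 0.8, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        state = torch.cat([state, next_token], dim=-1)  # append to the world

print(tokenizer.decode(state[0]))
```

Nothing in this loop is agentic in itself: whatever "character" appears exists only while the steps are being taken, which is the point of the simulacrum framing.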
This analogy explains the difference between the free-form simulator that is GPT-3 and the more restricted interface we get with ChatGPT. ChatGPT is what happens when
the simulator has been tuned to simulate a very specific character: “the assistant”, a
somewhat anodyne, politically-neutral, and long-winded helper. And this analogy
explains what happens when you “jailbreak” ChatGPT with techniques such as DAN:
it’s no longer simulating the assistant, but instead simulating a new intelligence. It
shows us why the default assistant persona can fail basic word problems, but if you
ask it to simulate a college algebra teacher, it gets the right answer. It explains why
GPT-3 can play chess, but only for a limited number of moves. Finally, the simulator
analogy gives us a way of understanding some of Bing’s behavior, as Ben Thompson
discovered.
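As a concrete illustration of steering which character gets simulated, here is a hedged sketch using the OpenAI chat API (openai v1+): the same question is asked under two different system prompts, one for the default assistant persona and one for a "college algebra teacher" persona. The model name, prompts, and question are my own illustrative assumptions, not taken from any of the experiments linked above.

```python
# Sketch: the system prompt selects which simulacrum the model instantiates.
# Assumes `pip install openai` (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def ask(persona: str, question: str) -> str:
    """Run the same question through the simulator with a different character."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

question = "A train leaves at 2:40 and arrives at 4:15. How long is the trip?"
print(ask("You are a helpful assistant.", question))
print(ask("You are a meticulous college algebra teacher who reasons step by step "
          "before giving a final answer.", question))
```

The observation from the post is that the second persona tends to do better on exactly the kinds of word problems the default assistant fumbles: you have changed which character the simulator is running, not the underlying model.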
How detailed are these simulations? I don’t know how we’d answer this question
precisely, but my gut feeling is that they’re currently at about the same level as human
imagination. That is: if you try to imagine how your friend would respond when you tell
them some bad news, you are using your biological neural net to instantiate a fuzzy
simulacrum of your friend and see how they would time-evolve your prompt into their response. Or, if you are an author writing a story, trying to figure out how your characters would approach something, your mind simulates the story's world, the characters within it, and the situation they're confronted with, then lets the words flow out onto the page. Today's large language models seem to be doing about the same level of processing to produce their simulacra as we do in our human imaginations.
This might be comforting: large language models “just” correspond to the imagination
part of a human mind, and not all the other fun stuff like having goals, taking actions,
feeling pain, or being conscious. But … how detailed could these simulations
become? This is where it gets interesting. If you had a really, really good imagination
and excess brain-hardware to run it on, how complex would the inner lives of your
imagination’s players be?
Stated another way, this question is whether an imagination-world composed of
words and tokens can support simulations that are as detailed as the world of physics
that we humans are all time-evolving within. As we continue to scale up and develop
our large language models, one way for them to become as-good-as-possible at
predicting text is for their simulation to have as much detail about the real world in it
as the real world has in itself. In other words, the most efficient way of predicting the
behavior of conscious entities may be to instantiate conscious simulacra into a world
of text instead of a world of atoms.
My intuition says that we’ll need more than a world of text to scale to human-
intelligence-level simulacra. Adding video, and perhaps even touch, seems like it
would be helpful for world-modeling. (Perhaps a 3D world you can run at 10,000
steps per second would be helpful here.) But this isn't a rule of the universe. Deafblind people are able to create accurate world models with just a sense of touch,
along with whatever they get for free from the evolutionary process which produced
their biological neural net. What will large language models be able to achieve, given
access to all the world’s text and the memetic-evolutionary process that created it?
To close, I’ll let the simulators analogy provide you with one more important tool: a
mental inoculation against the increasingly-absurd claims that large language
models are “just” predicting text. (This is often dressed up with the thought-
terminating cliché “stochastic parrots”.) Such claims are vacuously true, in the same
sense that physics is “just” predicting the future state of the universe, and who cares
about all those pesky intelligences instantiated out of its atoms along the way. But
evolving from an initial state to a final state is an immensely powerful framework, and
glossing over all the intermediate entities by only focusing on the resulting “blurry
JPEG” will serve you poorly when trying to understand this technology. Remember
that a training process’s objectives are not a good summary of what the resulting
system is doing: if you evaluated humans the same way, you would say that when
they seemingly converse about the world, they are really “just” moving their muscles
in ways that maximize their expected number of offspring.
Further reading:
The thesis of this essay comes entirely from Janus’s “Simulators” article, which
blew my mind when I first read it. I wrote this post because, after I tried sending
“Simulators” to friends, I realized that it was a pretty dense and roundabout
exploration of the concept, and would not be competitive in the memescape with
Ted Chiang’s article. Still, you might enjoy perusing my favorite quotes from the
article.
Scott Alexander has his own recent summary and discussion of the simulator
thesis, focused on contrasting ChatGPT with GPT-3, discussing the implications
for AI alignment, and ending with idle musings on what this means for the human
neural net/artificial neural net connection.
According to this post, Conjecture is using simulacra theory beyond the level of
just an analogy, attempting to instantiate a chess-optimizer simulacrum within a
large language model. (Whether they expect this chess-optimizer to be more like
a human grandmaster, or more like Deep Blue or AlphaZero, is an interesting
question.) I have not been able to find more details, but let me know if they exist
somewhere.
On the subject of “just” predicting text vs. deep understanding, way back in the
distant past of 2019, Scott Alexander (again) wrote up “GPT-2 as a Step Toward
General Intelligence”. Looking back, I’m impressed by how well Scott predicted
the future we’ve seen after only playing with the much-less-capable GPT-2. He
certainly did much better than those claiming that large language models are
missing something essential for intelligence.
“The Limit of Large Language Models” speculates in more detail than I’ve done
here about how powerful a language-based simulator can become, and has a
worthwhile comments section.