ChatGPT Is Not a Blurry JPEG of the Web. It's a Simulacrum.
February 19, 2023 | posted in ai
The gifted sci-fi writer Ted Chiang recently wrote a New Yorker article, “ChatGPT Is a
Blurry JPEG of the Web”, with the thesis that large language models like ChatGPT
can be analogized to lossy compression algorithms for their input data.
I think this analogy is wrong and misleading. Others have done a good job gently
refuting it, with plentiful citations to appropriate scientific papers. But a discussion
with a friend in the field reminded me that the point of analogies is not to help out us
scientific paper-readers. Analogies are themselves a type of lossy compression,
designed to give a high-level understanding of the topic by drawing on concepts you
already know. So to meet the analogy on its own level, we need a better one to
replace it.
Fortunately, a great analogy has already been discovered: large language models
are simulators, and the specific personalities we interact with, like ChatGPT, are
simulacra (simulated entities). These simulacra exist for brief spurts of time between
our prompt and their output, within the model’s simulation.
This analogy is so helpful because it resolves the layperson’s fundamental confusion
about large language model-based artificial intelligences. Science fiction conditions
us to expect our Turing Test-passing AIs to be “agentic”: to have goals, desires,
preferences, and to take actions to bring them about. But this is not what we see with
ChatGPT and its predecessors. We see a textbox, patiently waiting for us to type into
it, speaking only when spoken to.
And yet, such intelligences are capable of remarkable feats: getting an 83 on an IQ
test, getting a nine-year-old-equivalent score on theory-of-mind tests, getting C+ to
B-level grades on graduate-level exams, and passing an L3 engineer coding test. (If
you’re scoffing: “only 83! only a nine-year-old! only a C+! only L3!” then remember, it’s
been four years between the release of GPT-2 and today. Prepare for the next four.)
What is going on here?
What’s going on is that large language models are engines in which simulations and
simulacra can be instantiated. These simulacra live in a world of words, starting with
some initial conditions (the prompt) and evolving the world forward in time to produce
the end result (the output text). The simulacra, then, are the intelligences we interact
with, briefly instantiated to evolve the simulation and then lying dormant until we continue the time-evolution.
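To make the "time-evolution" framing concrete, here is a minimal sketch of how an autoregressive language model turns a prompt into output, one token at a time. It uses GPT-2 via the Hugging Face transformers library as a stand-in for a much larger model; the prompt, temperature, and step count are illustrative choices of mine, not anything from the simulators literature.

```python
# A toy "simulation loop": the prompt is the initial condition, and each
# sampled token advances the world of words by one tick.
# Assumes `pip install torch transformers`; GPT-2 stands in for a larger model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The assistant paused, then replied:"      # initial conditions
state = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(40):                             # 40 steps of time-evolution
        logits = model(state).logits[:, -1, :]      # distribution over next tokens
        probs = torch.softmax(logits / 0.8, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        state = torch.cat([state, next_token], dim=-1)  # append to the world

print(tokenizer.decode(state[0]))
```

Nothing in this loop is agentic in itself: whatever "character" appears exists only while the steps are being taken, which is the point of the simulacrum framing.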
This analogy explains the difference between the free-form simulator that is GPT-3 and the more restricted interface we get with ChatGPT. ChatGPT is what happens when
the simulator has been tuned to simulate a very specific character: “the assistant”, a
somewhat anodyne, politically-neutral, and long-winded helper. And this analogy
explains what happens when you “jailbreak” ChatGPT with techniques such as DAN:
it’s no longer simulating the assistant, but instead simulating a new intelligence. It
shows us why the default assistant persona can fail basic word problems, but if you
ask it to simulate a college algebra teacher, it gets the right answer. It explains why
GPT-3 can play chess, but only for a limited number of moves. Finally, the simulator
analogy gives us a way of understanding some of Bing’s behavior, as Ben Thompson
discovered.
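As a concrete illustration of steering which character gets simulated, here is a hedged sketch using the OpenAI chat API (openai v1+): the same question is asked under two different system prompts, one for the default assistant persona and one for a "college algebra teacher" persona. The model name, prompts, and question are my own illustrative assumptions, not taken from any of the experiments linked above.

```python
# Sketch: the system prompt selects which simulacrum the model instantiates.
# Assumes `pip install openai` (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def ask(persona: str, question: str) -> str:
    """Run the same question through the simulator with a different character."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

question = "A train leaves at 2:40 and arrives at 4:15. How long is the trip?"
print(ask("You are a helpful assistant.", question))
print(ask("You are a meticulous college algebra teacher who reasons step by step "
          "before giving a final answer.", question))
```

The observation from the post is that the second persona tends to do better on exactly the kinds of word problems the default assistant fumbles: you have changed which character the simulator is running, not the underlying model.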
How detailed are these simulations? I don’t know how we’d answer this question
precisely, but my gut feeling is that they’re currently at about the same level as human
imagination. That is: if you try to imagine how your friend would respond when you tell
them some bad news, you are using your biological neural net to instantiate a fuzzy
simulacrum of your friend and see how they would time-evolve your prompt into their response. Or, if you are an author writing a story, trying to figure out how your characters would approach something, your mind simulates the story's world, the characters within it, and the situation they're confronted with, then lets the words flow out onto the page. Today's large language models seem to be doing about the same level of processing to produce their simulacra as we do in our human imaginations.
This might be comforting: large language models “just” correspond to the imagination
part of a human mind, and not all the other fun stuff like having goals, taking actions,
feeling pain, or being conscious. But … how detailed could these simulations
become? This is where it gets interesting. If you had a really, really good imagination
and excess brain-hardware to run it on, how complex would the inner lives of your
imagination’s players be?
Stated another way, this question is whether an imagination-world composed of
words and tokens can support simulations that are as detailed as the world of physics
that we humans are all time-evolving within. As we continue to scale up and develop
our large language models, one way for them to become as-good-as-possible at
predicting text is for their simulation to have as much detail about the real world in it
as the real world has in itself. In other words, the most efficient way of predicting the
behavior of conscious entities may be to instantiate conscious simulacra into a world
of text instead of a world of atoms.
My intuition says that we’ll need more than a world of text to scale to human-
intelligence-level simulacra. Adding video, and perhaps even touch, seems like it
would be helpful for world-modeling. (Perhaps a 3D world you can run at 10,000
steps per second would be helpful here.) But this isn't a rule of the universe. Deafblind people are able to create accurate world models with just a sense of touch,
along with whatever they get for free from the evolutionary process which produced
their biological neural net. What will large language models be able to achieve, given
access to all the world’s text and the memetic-evolutionary process that created it?
To close, I’ll let the simulators analogy provide you with one more important tool: a
mental inoculation against the increasingly-absurd claims that large language
models are “just” predicting text. (This is often dressed up with the thought-
terminating cliché “stochastic parrots”.) Such claims are vacuously true, in the same
sense that physics is “just” predicting the future state of the universe, and who cares
about all those pesky intelligences instantiated out of its atoms along the way. But
evolving from an initial state to a final state is an immensely powerful framework, and
glossing over all the intermediate entities by only focusing on the resulting “blurry
JPEG” will serve you poorly when trying to understand this technology. Remember
that a training process’s objectives are not a good summary of what the resulting
system is doing: if you evaluated humans the same way, you would say that when
they seemingly converse about the world, they are really “just” moving their muscles
in ways that maximize their expected number of offspring.
Further reading:
The thesis of this essay comes entirely from Janus’s “Simulators” article, which
blew my mind when I first read it. I wrote this post because, after I tried sending
“Simulators” to friends, I realized that it was a pretty dense and roundabout
exploration of the concept, and would not be competitive in the memescape with
Ted Chiang’s article. Still, you might enjoy perusing my favorite quotes from the
article.
Scott Alexander has his own recent summary and discussion of the simulator
thesis, focused on contrasting ChatGPT with GPT-3, discussing the implications
for AI alignment, and ending with idle musings on what this means for the human
neural net/artificial neural net connection.
According to this post, Conjecture is using simulacra theory beyond the level of
just an analogy, attempting to instantiate a chess-optimizer simulacrum within a
large language model. (Whether they expect this chess-optimizer to be more like
a human grandmaster, or more like Deep Blue or AlphaZero, is an interesting
question.) I have not been able to find more details, but let me know if they exist
somewhere.
On the subject of “just” predicting text vs. deep understanding, way back in the
distant past of 2019, Scott Alexander (again) wrote up “GPT-2 as a Step Toward
General Intelligence”. Looking back, I’m impressed by how well Scott predicted
the future we’ve seen after only playing with the much-less-capable GPT-2. He
certainly did much better than those claiming that large language models are
missing something essential for intelligence.
“The Limit of Large Language Models” speculates in more detail than I’ve done
here about how powerful a language-based simulator can become, and has a
worthwhile comments section.