
ACM-Class: I.2

The Paradigm Shifts in Artificial Intelligence


Vasant Dhar 1
July 2023

Abstract. Kuhn’s framework of scientific progress (Kuhn, 1962) provides a useful framing of the paradigm shifts that have occurred in Artificial Intelligence over the last 60 years. The framework is also useful in understanding what is arguably a new paradigm shift in AI, signaled by the emergence of large pre-trained systems such as GPT-3, on which conversational agents such as ChatGPT are based. Such systems make intelligence a commoditized general purpose technology that is configurable to applications. In this paper, I summarize the forces that led to the rise and fall of each paradigm, and discuss the pressing issues and risks associated with the current paradigm shift in AI.

1. Introduction

Artificial Intelligence (AI) captured the world’s attention in 2023 with the emergence of pre-trained models such as GPT-3, on which the conversational AI system ChatGPT is based. For the first time, we can converse with an entity, however imperfectly, about anything, as we do with humans. This new capability provided by pre-trained models has created a paradigm shift in AI, transforming it from an application to a general purpose technology that is configurable to specific uses. Whereas historically an AI model was trained to do one thing well, it is now usable for a variety of tasks such as general conversations, assistance, decision-making, or code generation – for which it wasn’t explicitly trained. The scientific history of AI provides a backdrop for evaluating and discussing the capabilities and limitations of this new technology, and the challenges that lie ahead.

Economics Nobel Laureate Herbert Simon, one of the fathers of Artificial Intelligence, described Artificial Intelligence as a “science of the artificial” (Simon, 1970). In contrast to the natural sciences, which describe the world as it exists, a science of the artificial is driven by a goal: creating machine intelligence. According to Simon, this made AI a science of design and engineering. Pre-trained models have greatly expanded the design aspirations of AI, from crafting high-performing systems in narrowly specified applications to becoming general-purpose and without boundaries, applicable to anything involving intelligence.

The evolution of AI can be understood through Kuhn’s (1962) theory of scientific progress in terms of “paradigm shifts.” A paradigm is essentially a set of theories and methods accepted by the community to guide inquiry. It’s a way of thinking. Kuhn describes science as a process involving occasional “revolutions” stemming from crises faced by the dominant theories, followed by periods of “normal science” where the details of the new paradigm are fleshed out. Over time, as the dominant paradigm fails to address an increasing number of important anomalies or challenges, we see a

1 New York University, Stern School of Business and the Center for Data Science
paradigm shift to a new set of theories and methods – a new way of thinking that better addresses them.

A key feature of paradigms is that they have “exemplars” that guide problem formulation and solution. In physics, for example, the models describing the laws of motion, like Kepler’s or Newton’s laws of motion, could serve as exemplars that drive hypothesis formulation, observation, and hypothesis testing. In AI, exemplars define the core principles and methods for knowledge extraction, representation, and use. Early approaches favored methods for declaring human-specified knowledge as rules using symbols to describe the world, and an “inference engine” to manipulate the symbols – which was viewed as “reasoning.” In contrast, current methods have shifted towards learning more complex statistical representations of the world that are derived almost entirely from data. The latter tend to be better at dealing with the contextual subtleties and complexity that we witness in problems involving language, perception and cognition.

The paradigm shifts in AI have been driven by methods that broke through major walls that were considered to be significant at the time. The first generation of AI research in the late 50s and 60s was dominated by game playing search algorithms (Samuel, 1959, 2000) that led to novel ways for searching various kinds of graph structures. But this type of mechanical search provided limited insight into intelligence, where real-world knowledge seemed to play a major role in solving problems, such as in medical diagnosis and planning. Expert Systems provided a way forward, by representing domain expertise and intuition in the form of explicit rules and relationships that could be invoked by an inference mechanism. But these systems were hard to create and maintain. A knowledge engineer needed to define each relationship manually and consider how it would be invoked in making inferences.

The practical challenges of the knowledge acquisition bottleneck led to the next paradigm shift. As more data became available, researchers developed learning algorithms that could automatically create rules or models directly from the data using mathematical, statistical, or logical methods, guided by a user-specified objective function.

That’s where we are today. Systems such as ChatGPT employ variants of neural networks called Transformers that provide the architecture of large language models (LLMs), which are trained directly from the collection of human expression available on the Internet. They use complex mathematical models with billions of parameters that are estimated from large amounts of publicly available data. While language has been a key area of advancement in recent years, these approaches have been used to enable machines to learn from other modalities of data including vision, sound, smell, and touch. What is particularly important today is the shift from building specialized applications of AI to one where knowledge and intelligence don’t have specific boundaries, but transfer seamlessly across applications and to novel situations.

2. The Paradigm Shifts in AI

To understand the state of the art of AI and where it is heading, it is important to understand its scientific history, including the bottlenecks that stalled progress in each paradigm and the degree to which they were addressed by each paradigm shift.

Figure 1 sketches out the history of Artificial Intelligence from the Expert Systems era – which spanned the late sixties to the late 80s – to the present.

Expert Systems and Symbolic AI
Expert systems are attractive in narrow, well-circumscribed domains in which human expertise is identifiable and definable. They perform well at specific tasks where this expertise can be extracted through interactions with humans, and it is typically represented in terms of relationships among situations and outcomes. The driving force in that paradigm was to apply AI to diagnosis, planning, and design across a number of domains including healthcare, science, engineering, and business. The thinking was that if such systems performed at the level of human experts, they were intelligent.

An early success in medicine was the Internist system (Pople, 1982), which performed diagnosis in the field of internal medicine. Internist represented expert knowledge using causal graphs and hierarchies relating diseases to symptoms. The rule-based expert system Mycin (Buchanan and Shortliffe, 1975) was another early demonstration of diagnostic reasoning involving blood diseases. Other medical applications included the diagnosis of renal failure (Gorry, et.al 1973) and glaucoma (Kulikowski and Weiss 1982).

In addition to applications in medicine, expert systems were also successful in a number of other domains such as engineering (Tzafestas, 1993), accounting (Brown, 1991), tax planning (Shpilberg et.al 1986), configuration of computer systems (McDermott, 1982), monitoring industrial plants (Nelson, 1982), mineral prospecting (Hart et.al 1978), and identifying new kinds of chemical molecules (Feigenbaum et.al, 1970).

The prototypical exemplar for representing knowledge in this paradigm was symbolic relationships expressed in the form of “IF/THEN” rules (Buchanan et.al 1969), “semantic networks” (Quillian, 1968), or structured object representations (Minsky, 1974). But it was difficult to express uncertainty in terms of these representations, let alone combine such uncertainties during inference, which prompted the development of more principled graphical models for representing uncertainty in knowledge using probability theory (Pearl, 1988).

The exemplar was shaped by the existing models of cognition from Psychology, which viewed humans as having a long-term and a short-term memory, and a mechanism for evoking them in a specific context. The knowledge declared by humans in expert systems, such as the rule “excess bilirubin → high pallor,” constituted their long-term memory. An interpreter, also known as the inference engine or “control regime,” evoked the rules depending on the context, and updated its short-term memory accordingly. If a patient exhibited unusually high pallor, for example, this symptom was noted in short-term memory, and the appropriate rule was evoked from long-term memory to hypothesize its cause, such as “excess bilirubin.” In effect, symbolic AI separated the declaration of knowledge from its application.

Research in natural language processing was along similar lines, with researchers seeking to discover the rules of language. The expectation was that once these were fully specified, a machine would follow these rules in order to understand and generate language (Schank, 1990; 1991). This turned out to be exceedingly difficult to achieve.

The major hurdle of this paradigm of top-down knowledge specification was the “knowledge engineering bottleneck.” It was challenging to extract reliable knowledge from experts, and equally difficult to represent and combine uncertainty in terms of rules. Collaborations
between experts and knowledge engineers could take years or even decades, and the systems became brittle at scale. Furthermore, researchers found that expert systems would often make errors in common-sense reasoning, which seemed intertwined with specialized knowledge. Evaluating such systems was also difficult, if one ever got to that stage. Human reasoning and language seemed much too complex and heterogenous to be captured by top-down specification of relationships. Progress stalled, as the reality, both in research and practice, fell short of expectations.

Machine Learning

The supervised machine learning paradigm emerged in the late 80s and 90s, with the maturing of database technology, the emergence of the Internet, and the increasing abundance of observational and transactional data (Breiman, et.al, 1984, Quinlan, 1986). AI thinking shifted away from spoon-feeding highly specified human abstractions to the machine, and towards automatically learning rules from data, guided by human intuition. While symbolic expert systems required humans to specify a model, machine learning enabled the machine to learn the model automatically from curated examples. Model discovery was guided by a “loss function,” designed to directly or indirectly minimize the system’s overall prediction error, which, by virtue of the data, could be measured in terms of the differences between predictions and empirical reality.

Empirics provided the ground truth for supervision. For example, to learn how to predict pneumonia, also called the target, one could collect historical medical records of people with and without pneumonia, intuit and engineer the features that might be associated with the target, and let the machine figure out the relationships from the data to minimize the overall prediction error. Instead of trying to specify the rules, the new generation of algorithms could learn them from data using optimization. Many such algorithms emerged, but the common thread among them was that they belonged to the broad class of “function approximation” methods that used data and a user-defined objective function to guide knowledge discovery.

This shift in perspective transformed the machine into a generator and tester of hypotheses that used optimization – the loss function – to focus knowledge discovery. This ability made machines capable of automated inquiry without a human in the loop. Instead of being a passive repository of knowledge, the machine became an active “what if” explorer, capable of asking and evaluating its own questions. This enabled data-driven scientific discovery (Hey et.al, 2009).

The epistemic criterion in machine learning for something to count as knowledge was accurate prediction (Popper, 1963; Dhar, 2013). This conforms to Popper’s view of using the predictive power of theories as a measure of their goodness. Popper argued that theories that sought only to explain a phenomenon were weaker than those that made “bold” ex-ante predictions that were objectively falsifiable. Good theories stood the test of time. In his 1963 treatise on this subject, Conjectures and Refutations, Popper characterized Einstein’s theory of relativity as a “good” one, since it made bold predictions that could be falsified easily, yet all attempts at falsification of the theory have failed.

The exemplars for supervised machine learning are relationships derived from data that is specified in (X,y) pairs, where “y” are data about the target to be predicted based on a situation described by the vector of observable features “X.” This exemplar has a very general form: the discovered relationships can be “IF/THEN” rules, graph structures such as Bayesian networks (Pearl, 1988; Blei et.al, 2003), or implicit mathematical functions expressed via weights in a neural network (Rosenblatt, 1958; Hopfield, 1982). Once learned, this knowledge could be viewed as analogous to memory, invoked depending on context, and updatable over time.

But there’s no free lunch with machine learning. There is a loss of transparency in what the machine has learned. Neural networks, including large language models, are particularly opaque in that it is difficult to assign meanings to the connections among neurons, let alone combinations of them. Even more significantly, the machine learning paradigm introduced a new bottleneck, namely, requiring the curation of available data using some sort of vocabulary that the machine can understand. This required that the right features be created from the raw data. For example, to include an MRI image as input into the diagnostic reasoning process, the contents of the image had to be expressed in terms of features of the vocabulary such as “inflammation” and “large spots on the liver.” Similarly, a physician’s notes about a case had to be condensed into features that the machine could process. This type of feature engineering was cumbersome. Specifying the labels accurately could also be costly and time-consuming. These were major bottlenecks for the paradigm.

What was direly needed was the ability of the machine to deal directly with the raw data emanating from the real world, instead of relying on humans to perform the often difficult translation of feature engineering. Machines needed to ingest raw data such as numbers, images, notes or sounds directly, ideally without curation by humans.

Deep Learning

The next AI paradigm, “Deep Learning,” made a big dent in the feature engineering bottleneck by providing a solution to perception, such as seeing, reading, and hearing. Instead of requiring humans to describe the world for the machine, this generation of algorithms could consume the raw input similar to what humans use, in the form of images, language, and sound. “Deep neural nets” (DNNs), which involve multiple stacked layers of neurons, form the foundation of vision and language models (Hinton, 1992; LeCun and Bengio, 1998). While learning still involves adjusting the weights among the neurons, the “deep” part of the neural architecture is important in translating the raw sensory input automatically into machine-computable data.

The exemplar in deep learning is a multi-level neural network architecture. Adjusting the weights among the neurons makes it a universal function approximator (Cybenko, 1989), where the machine can approximate any function, regardless of its complexity, to an acceptable degree of precision. What is unique about DNNs is the organization of hidden layers between the input and output that learn the features implicit in the raw data instead of requiring that they be specified by humans. A vision system, for example, might learn to recognize features common to all images, such as lines, curves and colors from the raw images that make up its training data. These can be combined variously to make up more complex image parts such as windows, doors and street signs that are represented by “downstream” layers of the deep neural network. In other words, the DNN tends to have an organization where more abstract and latent concepts that are closer to its output are composed from more basic features represented in the layers that are closer to the input.

The same ideas have been applied to large language models (LLMs) from which systems like ChatGPT are built. They learn the implicit relationships among things in the world from large amounts of text from books, magazines, web-posts, etc. As in vision, we would expect layers of the neural network that are closer to the output to represent more abstract concepts, relative to layers that are closer to the input. However, we don’t currently understand how DNNs organize and use such knowledge, or how
they represent relationships in general. This depends on what they are trained for.

In language modeling, for example, the core learning task is typically to predict the next occurrence of an input sequence. This requires a considerable amount of knowledge and understanding of the relationships among the different parts of the input. Large language models use a special configuration of the “Transformer” neural architecture, which represents language as a contextualized sequence, where context is represented by estimating dependencies between each pair of the input sequence (Vaswani et.al 2017). Because this pairwise computation grows sharply with the length of the input, engineering considerations constrain the length of the input sequences – the span of attention for which LLMs are able to maintain context.

This Transformer architecture holds both long-term memory, represented by the connections between neurons, as well as the context of a conversation – the equivalent of short-term memory – using its “attention mechanism,” which captures the relationships between all parts of the input. For example, it is able to tell what the pronoun “it” refers to in the sentences “The chicken didn’t cross the road because it was wet” and “The chicken didn’t cross the road because it was tired.” While humans find such reasoning easy by invoking common sense, previous paradigms failed at such kinds of tasks that require understanding context. The architecture also works remarkably well in vision, where it is able to capture the correlation structure between the various parts of an image.

The downside is that DNNs are large and complex. What pre-trained language models learn as a by-product of learning sequence prediction is unclear because their knowledge – the meanings and relationships among things – is represented in a “distributed” way, in the form of weighted connections among the layers of neurons. In contrast to Expert Systems, where relationships are specified in “localized” self-contained chunks, the relationships in a DNN are smeared across the weights in the network and much harder to interpret.

Nevertheless, the complexity of the neural network architecture – typically measured by the number of layers and connections in the neural network (its parameters) – is what allows the machine to recognize context and nuance. It is remarkable that the pre-trained LLM can be used to explain why a joke is funny, summarize or interpret a legal document, answer questions, and all kinds of other things that it wasn’t explicitly trained to do.

Bowman (2023) conjectures that the autocomplete task was serendipitous: it was just at the right level of difficulty, where doing well conversationally forced the machine to learn a large number of other things about the world. In other words, a sufficiently deep understanding about the world, including common sense, is necessary for language fluency. However, current-day machines can’t match humans in terms of common sense. As of this writing, ChatGPT fails at the Winograd Schema task (Winograd, 1972), which involves resolving an ambiguous pronoun in a sentence. For example, when asked what the “it” refers to in the sentence “the trophy wouldn’t fit into the suitcase because it was too small,” ChatGPT thinks that the “it” refers to the trophy. The right answer requires the use of common sense, and cannot be determined by structure alone.

These are the challenges for the new and still emerging paradigm of AI, namely one of General Intelligence, where expertise and common sense can blend together more seamlessly, as can different modalities of information.

Table 1 summarizes the properties of each paradigm in terms of how knowledge is acquired (its source), the exemplar that guides problem formulation, its capability, and the degree to which the input is curated. The “+” prefix means “in addition to the previous case.”

Paradigm               Knowledge Source   Exemplar                           Capability                  Data Curation
Expert Systems         Human              Rules                              Follows Rules               High
Machine Learning       + Databases        Rules/Networks                     + Discovers Relationships   Medium
Deep Learning          + Sensory          Deep Neural Networks               + Senses Relationships      Low
General Intelligence   + Everything       Pre-Trained Deep Neural Networks   + Understands the World     Minimal

Table 1: The Paradigm Shifts in AI
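The supervised (X,y) exemplar in the Machine Learning row of Table 1 can be sketched in a few lines. The data and the linear model below are illustrative toys, not anything from the paper; the point is the division of labor the paradigm introduced: a human engineers the feature and chooses the loss function, and the machine finds the relationship that minimizes it.

```python
def mse(y_true, y_pred):
    """Loss function: mean squared difference between predictions and reality."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_line(X, y):
    """Closed-form least-squares fit of y = w*x + b, minimizing the MSE loss."""
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(y) / n
    w = (sum((x - mean_x) * (t - mean_y) for x, t in zip(X, y))
         / sum((x - mean_x) ** 2 for x in X))
    b = mean_y - w * mean_x
    return w, b

# Hypothetical curated examples: a single engineered feature x and a target y.
X = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.0, 8.1]

w, b = fit_line(X, y)
predictions = [w * x + b for x in X]
print(round(mse(y, predictions), 5))  # residual error is small after fitting
```

The same (X,y) scheme carries over unchanged when the line is replaced by a decision tree, a Bayesian network, or a neural network; only the family of functions being searched changes.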

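The pairwise attention computation described in Section 2 can be sketched as follows. This is an illustrative scaled dot-product self-attention over toy vectors, not the full multi-head Transformer of Vaswani et al. (2017):

```python
import math

def softmax(scores):
    """Normalize raw scores into attention weights that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: every position attends to every other."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # One score per (query, key) pair -- the pairwise computation whose
        # cost grows quadratically with sequence length.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Each output is a weighted mix of all value vectors (the "context").
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# A toy three-token sequence with 2-dimensional embeddings (self-attention).
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextualized = attention(sequence, sequence, sequence)
print(len(contextualized), len(contextualized[0]))  # 3 2
```

Because every position is scored against every other, the work grows quadratically with the sequence length, which is the engineering constraint on the span of attention noted in Section 2.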
3. General Intelligence

Pre-trained models are the foundation for the General Intelligence paradigm. Previous AI applications were tuned to a task. In order to predict pneumonia in a hospital, for example, the AI model was trained using cases from that hospital alone, and wouldn’t necessarily transfer to a nearby hospital, let alone a different country. In contrast, General Intelligence is about the ability to integrate knowledge about pneumonia with other diseases, conditions, geographies, etc., like humans are able to do, and to apply the knowledge to unforeseen situations. In other words, General Intelligence refers to an integrated set of essential mental skills that include spatial, numerical, mechanical, verbal, reasoning, and common sense abilities, which underpin performance on all mental tasks (Cattell, 1963). Such knowledge is easily transferrable across tasks, and can be applied to novel situations.

Each paradigm shift greatly expanded the scope of applications. Machine Learning brought structured databases to life. Deep Learning went further, enabling the machine to deal with structured and unstructured data about an application directly from the real world, as humans are able to do.

Pre-trained models provide the building blocks for General Intelligence by virtue of being domain-independent, requiring minimal curation,² and being transferrable across applications.

The shift to pre-trained models represents a fundamental departure from the previous paradigms, where knowledge was carefully extracted and represented. AI was an application, and tacit knowledge and common-sense reasoning were add-ons that were separate from expertise. The CYC project (Lenat et.al 1990) was the first major effort to explicitly teach the machine common sense. It didn’t work as the designers had hoped. There’s too much tacit knowledge and common sense in human interaction that is evoked depending on context, and intelligence is much too complex and heterogenous to be compartmentalized and specified in the form of rules.

In contrast, pre-trained models eschew boundaries, as in the pneumonia example. Rather, they integrate specialized and general knowledge, including data about people’s experiences across a range of subjects. Much of this type of knowledge became available because of the Internet, where in the short span of a few decades, humanity expressed thousands

² The data curation in pre-trained LLMs like GPT-3 is primarily in the choice of sources and tokenization. Systems like ChatGPT use additional conversational training and RLHF (Reinforcement Learning from Human Feedback) to curate their responses to be socially acceptable.
of years of its history in terms of language, along with social media and conversational data on a wide array of subjects. All humans became potential publishers and curators, providing the training data for AI to learn how to communicate fluently. Hinton describes large language models like ChatGPT as akin to an alien species that has enthralled us because they speak such good English.

It is important to appreciate that in learning to communicate in natural language, AI has broken through two fundamental bottlenecks simultaneously. First, we are now able to communicate with machines on our terms. This required solving a related problem, of integrating and transferring knowledge about the world, including common sense, seamlessly into a conversation about any subject. Achieving this capability has required the machine to acquire the various types of knowledge simultaneously – expertise, common sense, and tacit knowledge – all of which are embedded in language. Things are connected in subtle ways, which provides the basis for “meaning” and “understanding,” which AI pioneer Marvin Minsky describes in terms of “associations” and “perspectives:”

What is the difference between merely knowing (or remembering, or memorizing) and understanding? We all agree that to understand something, we must know what it means, and that is about as far as we ever get. A thing or idea seems meaningful only when we have several different ways to represent it – different perspectives and different associations. Then we can turn it around in our minds, so to speak: however it seems at the moment, we can see it another way and we never come to a full stop. In other words, we can ‘think’ about it. If there were only one way to represent this thing or idea, we would not call this representation thinking. (Minsky, 1981)

Conversational agents such as ChatGPT display a remarkable ability to adapt and combine contexts in maintaining conversational coherence. This capability, where the machine can understand what we are saying well enough to maintain a conversation, enables a new kind of interaction, where the machine is able to acquire high-quality training data seamlessly “from the wild” and learn in parallel with its operation.

As in deep learning, the exemplar in the General Intelligence paradigm is the deep neural network, whose properties we are now trying to understand, along with the general principles that underpin their performance. One such principle in the area of LLMs is that performance improves by increasing model complexity, data size, and compute power across a wide range of tasks (Kaplan, et.al, 2020). These “scaling laws of AI” indicate that predictive accuracy on the autocompletion task improves with increased compute power, model complexity, and data. If this measure of performance on autocompletion is a good proxy for General Intelligence, the scaling laws predict that LLMs should continue to improve with increases in compute power and data. A related phenomenon to performance improvement with scaling may be the “emergence” of new abilities at certain tipping points of model size (Wei, et.al, 2022) that don’t exist at smaller model sizes.

At the moment, there are no obvious limits to these dimensions in the development of General Intelligence. On the data front, for example, in addition to additional language data that will be generated by humans on the Internet, other modalities of data such as video are now becoming more widely available. Indeed, a fertile area of research is how machines will integrate data from across multiple sensory modalities including vision, touch and smell, like humans are able to do. In short, we are in the early innings of the new paradigm, where we should see continued improvement of pre-trained models and General Intelligence with increases in the volume and variety of data and computing power. However, this should not distract us from the fact that several fundamental aspects of intelligence are still mysterious, and
unlikely to be answered solely by making existing models larger and more complex.

Nevertheless, it is worth noting that the DNN exemplar of General Intelligence has been adopted by a number of disciplines including psychology, neuroscience, linguistics, and philosophy, that seek to explain intelligence, meaning, and understanding. This has arguably made AI more interdisciplinary by unifying these perspectives with its engineering and design perspectives. Explaining and understanding the behavior of DNNs in terms of a set of core principles of its underlying disciplines is an active area of research in the current paradigm.

The progression towards General Intelligence has followed a path of increasing scope of machine intelligence. The first paradigm was “Learn from humans.” The next one was “Learn from curated data.” This was followed by “Learn from any kind of data.” The current paradigm is “Learn from all kinds of data in a way that transfers to novel situations.” The latest paradigm shift makes AI a general purpose technology and a commodity, one that should keep improving in terms of quality with increasing amounts of data and computing power.

4. AI as a General-Purpose Technology

Paradigm shifts as defined by Kuhn are followed by periods of “normal science,” where the details of the new paradigm are fleshed out. We are in the early stages of one such period.

Despite their current limitations, pre-trained LLMs and conversational AI have unleashed applications in language and vision, ranging from support services that require conversational expertise to creative tasks such as creating documents or videos. As the capability provided by these pre-trained models grows and becomes embedded in a broad range of industries and functions, AI is transitioning from a bespoke set of tools to a “General Purpose Technology,” from which applications are assembled. Like electricity, intelligence becomes a commodity.

Economists use the term general-purpose technology – of which electricity and the Internet are examples – for a new method for producing and inventing that is important enough to have a protracted aggregate economic impact across the economy (Jovanovic and Rousseau, 2005).

Bresnahan and Trajtenberg (1995) describe general purpose technologies in terms of three defining properties:

“pervasiveness – they are used as inputs by many downstream sectors,

inherent potential for technical improvements, and

innovational complementarities – the productivity of R&D in downstream sectors multiplies as a consequence of innovation in the general purpose technology, creating productivity gains throughout the economy.”

How well does the General Intelligence paradigm of AI meet these criteria?

Arguably, AI is already pervasive, embedded increasingly in applications without our realization. And with the new high-bandwidth human-machine interfaces enabled by conversational AI, the quality and volume of training data that machines like ChatGPT can now acquire as they operate is unprecedented. Other sensory data from video and other sources will continue to lead to improvements in pre-trained models and their downstream applications.

The last of the three properties, innovational complementarities, may take time to play out at the level of the economy. With previous technologies such as electricity and IT, growth rates were below those attained in the decades immediately preceding their arrival (Jovanovic and Rousseau, 2005). This phenomenon was also observed by the economist Robert Solow
who famously commented that "IT was everywhere except in the productivity statistics" (Solow, 1987). Erik Brynjolfsson and his colleagues subsequently explained Solow's observation in terms of the substantial complementary investments required to realize the benefits of IT (Brynjolfsson et al., 2018), where productivity emerged after a significant lag. With electricity, for example, it took decades for society to realize its benefits, since motors needed to be replaced, factories needed redesign, and workforces needed to be reskilled. IT was similar, as was the Internet.

AI is similarly in its early stages, where businesses are scrambling to reorganize business processes and rethink the future of work. Just as electricity required the creation of an electric grid and the redesign of factories, AI will similarly require a redesign of business processes in order to realize productivity gains from this new general purpose technology (Brynjolfsson et al., 2023). Such improvements take time to play out, and depend on effective complementary investments in processes and technologies.

5. Challenges of Current Paradigm: Trust and Law

We should not assume that we have converged on the "right paradigm" for AI. The current paradigm will undoubtedly give way to one that addresses its shortcomings.

Indeed, paradigm shifts do not always improve on previous paradigms in every way, especially in their early stages, and the current paradigm is no exception. New theories often face resistance and challenges initially, while their details are being filled in (Laudan, 1978). For example, the Copernican revolution faced numerous challenges in explaining certain recorded planetary movements that were explained by the existing theory, until new methods and measurements emerged that provided strong support for the new theory (Kuhn, 1956).

Despite the current optimism about AI, the current paradigm faces a serious challenge of trust that stems in large part from its representation of knowledge, which is opaque to humans. Systems such as ChatGPT can be trained on orders of magnitude more cases than a human expert will encounter in their lifetime, but their ability to explain themselves and introspect is very limited relative to humans. And we can never be sure that they are correct, and not "hallucinating," that is, filling in their knowledge gaps with answers that look credible but are incorrect. It's like talking to someone intelligent whom you can't always trust.

These problems will need to be addressed if we are to trust AI. Since the data for pre-trained models are not curated, they pick up on the falsehoods, biases, and noise in their training data. Systems using LLMs can also be unpredictable, and systems based on them can exhibit racist or other kinds of undesirable social behavior that their designers didn't intend. While designers might take great care to prohibit undesirable behavior via training using "reinforcement learning from human feedback" (RLHF), such guardrails don't always work as intended. The machine is relatively inscrutable.

Making AI explainable and truthful is a big challenge. At the moment, it isn't obvious whether this problem is addressable solely by the existing paradigm, whether it will require a new paradigm, or whether it will be addressed via an integration of the symbolic and neural approaches to computation.

The unpredictability of AI systems built on pre-trained models also poses new problems for trust. The output of LLM-based AI systems on the same input can vary, a behavior we associate with humans but not machines (Dhar, 2022). On the contrary, we expect machines to be deterministic, not "noisy" or inconsistent like humans. Until now, we have expected consistency from machines.

While we might consider the machine's variance in decision-making as an indication of creativity
– a human-like behavior – it poses severe risks, especially when combined with its inscrutability and an uncanny ability to mimic humans. Machines are already able to create "deep fakes" which can be indistinguishable from human creations. We are seeing the emergence of things like fake pornography, art, and documents. It is exceedingly difficult to detect plagiarism, or even to define plagiarism or intellectual property theft, given the large corpus of public information on which LLMs have been trained. Will such risks lie with the creators of pre-trained models, the applications that use them, or their users? Existing laws are not designed to address such problems, and will need to be expanded to recognize them, limit their risks, and specify culpability.

Finally, inscrutability also creates a larger, existential risk to humanity, which could become a crisis for the current paradigm. For example, in trying to achieve goals that we give the AI, such as "save the planet," we have no idea about the sub-goals the machine will create in order to achieve its larger goals. This is known as "the alignment problem," in that it is impossible to determine whether the machine's hidden goals are aligned with ours. In saving the planet, for example, the AI might determine that humans pose the greatest risk to its survival, and hence that they should be contained or eliminated (Bostrom, 2014; Russell, 2019; Christian, 2020).

So, even as we celebrate AI as a technology that will have far-reaching impacts on society, economics, and humanity – potentially exceeding those of other general purpose technologies such as electric power and the Internet – trust and alignment remain disconcertingly unaddressed. They are the most pressing issues that humanity faces today.

References

1. Bostrom, N., Superintelligence, Oxford University Press, 2014.
2. Bowman, S.R., et al., Measuring Progress on Scalable Oversight for Large Language Models, November 2022. https://arxiv.org/abs/2211.03540
3. Brynjolfsson, E., Rock, D., and Syverson, C., Unpacking the AI-Productivity Paradox, Sloan Management Review, January 2018. https://sloanreview.mit.edu/article/unpacking-the-ai-productivity-paradox/
4. Brynjolfsson, E., Li, D., and Raymond, L., Generative AI at Work, 2023. https://arxiv.org/abs/2304.11771
5. Brown, C.E., Expert systems in public accounting: Current practice and future directions, Expert Systems with Applications, Volume 3, Issue 1, 1991, Pages 3–18. https://doi.org/10.1016/0957-4174(91)90084-R
6. Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., and Elhadad, N., Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission, KDD 2015, Sydney, Australia.
7. Chalmers, D., Reality+: Virtual Worlds and the Problems of Philosophy, W.W. Norton & Co, 2022.
8. Charniak, E., and McDermott, D., Introduction to Artificial Intelligence, Addison-Wesley, 1985.
9. Christian, B., The Alignment Problem: Machine Learning and Human Values, Brilliance Publishing, 2020.
10. Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function". Mathematics of Control, Signals, and Systems, 2 (4): 303–314.
11. Dhar, V., Data science and prediction, Communications of the ACM, Volume 56, Issue 12, December 2013.
12. Dhar, V., Bias and Noise in Humans and AI, Journal of Investment Management, Volume 20, Number 4, December 2022.
13. Dhar, V., When to Trust Machines With Decisions and When Not To, Harvard Business Review, May 2016.
14. Doersch, C., and Zisserman, A., "Multi-task Self-Supervised Visual Learning", 2017 IEEE International Conference on Computer Vision (ICCV), IEEE: 2070–2079. arXiv:1708.07860. doi:10.1109/iccv.2017.226
15. Feigenbaum, E., Buchanan, B., and Lederberg, J. (1970). On generality and problem solving: A case study using the DENDRAL program. Machine Intelligence, 6.
16. Hart, P.E., Duda, R.O., and Einaudi, M.T., PROSPECTOR—A computer-based consultation system for mineral exploration. Mathematical Geology, 10, 589–610 (1978). https://doi.org/10.1007/BF02461988
17. Hey, T., Tansley, S., Tolle, K., and Gray, J., The Fourth Paradigm: Data-Intensive Scientific Discovery, 2009. https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Fourth_Paradigm.pdf
18. Gorry, G.A., Kassirer, J.P., Essig, A., and Schwartz, W.B. (1973). "Decision analysis as the basis for computer-aided management of acute renal failure". The American Journal of Medicine, 55 (4): 473–484. doi:10.1016/0002-9343(73)90204-0
19. Hart, P., Duda, R., and Einaudi, M., PROSPECTOR—A computer-based consultation system for mineral exploration, Journal of the International Association for Mathematical Geology, Volume 10, Number 5, 1978. https://doi.org/10.1007/BF02461988
20. Hinton, G., How neural networks learn from experience. Scientific American, September 1992.
21. Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79 (8): 2554–2558.
22. Jovanovic, B., and Rousseau, P., General Purpose Technologies, in Handbook of Economic Growth, Volume 1B, edited by Philippe Aghion and Steven N. Durlauf, 2005.
23. Kaplan, J., et al. (2020). Scaling Laws for Neural Language Models. https://arxiv.org/pdf/2001.08361.pdf
24. Kumar, A., Irsoy, O., Ondruska, P., Iyyer, M., Bradbury, J., Gulrajani, I., Zhong, V., Paulus, R., and Socher, R., Ask me anything: Dynamic memory networks for natural language processing. In International Conference on Machine Learning (pp. 1378–1387).
25. Kuhn, Thomas S., The Structure of Scientific Revolutions, University of Chicago Press, 1962.
26. Kuhn, Thomas S., The Copernican Revolution, Harvard University Press, 1956.
27. Kulikowski, C.A., and Weiss, S.M., "Representation of Expert Knowledge for Consultation: The CASNET and EXPERT Projects", Chapter 2 in Szolovits, P. (Ed.), Artificial Intelligence in Medicine, Westview Press, Boulder, Colorado, 1982.
28. Laudan, L., Progress and Its Problems, University of California Press, 1978.
29. LeCun, Y., and Bengio, Y., Convolutional networks for images, speech, and time series, in The Handbook of Brain Theory and Neural Networks, October 1998, Pages 255–258. https://dl.acm.org/doi/10.5555/303568.303704
30. Lenat, D.B., Guha, R.V., Pittman, K., Pratt, D., and Shepherd, M. (August 1990). "Cyc: Toward Programs with Common Sense". Communications of the ACM, 33 (8): 30–49.
31. McDermott, J., R1: a rule-based configurer of computer systems, Artificial Intelligence, Volume 19, Issue
1, September 1982, pp. 39–88. https://doi.org/10.1016/0004-3702(82)90021-2
32. Meng, K., Sharma, A., Andonian, A., Belinkov, Y., and Bau, D., 2022. https://arxiv.org/pdf/2210.07229.pdf
33. Minsky, M., A Framework for Representing Knowledge, in The Psychology of Computer Vision, Winston (Ed.), McGraw-Hill, 1975.
34. Minsky, M., Music, Mind, and Meaning, Computer Music Journal, Fall 1981.
35. Pople, H.E. (1982), Heuristic Methods for Imposing Structure on Ill-Structured Problems, in Szolovits, P. (Ed.), Artificial Intelligence in Medicine, Westview Press, Boulder, Colorado, 1982.
36. Nelson, W.R. (1982). "REACTOR: An Expert System for Diagnosis and Treatment of Nuclear Reactors".
37. Ouyang, L., et al., 2022. Training Language Models to Follow Instructions With Human Feedback. https://arxiv.org/pdf/2203.02155.pdf
38. Pearl, J., Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann Publishers, 1988.
39. Quillian, M.R., Semantic Networks, in Marvin L. Minsky (Ed.), Semantic Information Processing, MIT Press, 1968.
40. Quinlan, J.R. (1986). Induction of Decision Trees. Machine Learning, Volume 1, No. 1 (March 1986), 81–106.
41. Rosenblatt, F. (1958). "The Perceptron: A Probabilistic Model For Information Storage And Organization In The Brain". Psychological Review, 65 (6): 386–408.
42. Russell, S., Human Compatible: AI and the Problem of Control, Penguin, 2019.
43. Samuel, A.L. (1959). "Some Studies in Machine Learning Using the Game of Checkers". IBM Journal of Research and Development, 44: 206–226.
44. Samuel, A.L. (2000). "Some studies in machine learning using the game of checkers". IBM Journal of Research and Development, IBM, 44: 206–226. doi:10.1147/rd.441.0206
45. Schaeffer, R., Miranda, B., and Koyejo, S. (2023). Are Emergent Abilities of Large Language Models a Mirage? https://arxiv.org/abs/2304.15004
46. Schank, R., The Connoisseur's Guide to the Mind: How We Think, How We Learn, and What It Means to Be Intelligent, Summit Books, 1991.
47. Schank, R., Tell Me A Story: A New Look at Real and Artificial Memory, Scribner's, 1990.
48. Shanahan, M., Talking About Large Language Models. https://arxiv.org/pdf/2212.03551.pdf
49. Shortliffe, E.H., and Buchanan, B.G. (1975). "A model of inexact reasoning in medicine". Mathematical Biosciences, 23 (3–4): 351–379.
50. Shpilberg, D., Graham, L., and Schatz, H., ExperTAX: an expert system for corporate tax planning, Wiley, July 1986. https://doi.org/10.1111/j.1468-0394.1986.tb00487.x
51. Szolovits, P., Patil, R.S., and Schwartz, W.B. (1988). "Artificial intelligence in medical diagnosis". Annals of Internal Medicine, 108 (1): 80–87.
52. Tzafestas, S., Expert Systems in Engineering Applications, Springer-Verlag, 1993.
53. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, L., and Polosukhin, I. (2017). Attention is All You Need. https://arxiv.org/abs/1706.03762
54. Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., et al., Emergent Abilities of Large Language Models, 2022. https://arxiv.org/pdf/2206.07682.pdf
55. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., and Zhou, D., 2023. https://arxiv.org/pdf/2201.11903.pdf
56. Winograd, T. (January 1972). Understanding Natural Language. Cognitive Psychology, 3 (1): 1–191. doi:10.1016/0010-0285(72)90002-3