
White Paper

GETTING REAL ABOUT AI PROCESSORS

By Dennis Laudick
VP of Product Management at Imagination Technologies
April 2025


There’s a lot of confusion and hype around AI.


Nearly every service, product or subject area
in the technology industry now has an AI label.
A lot of this is valid and there’s no doubt that
AI is opening up new capabilities and higher
productivity across all industries.

However, in far too many cases, the AI connection can be tenuous or, in the worst cases, simply misleading. This paper categorises AI and related hardware options, with a particular focus on on-device (i.e. edge) AI, giving readers a practical background with which they can better understand this new wave of excitement (and sometimes hype) around the world of AI.


What do you mean by AI?


While AI as a term in the semiconductor industry is in its relative infancy,
the technology is sufficiently advanced that subdivisions are helpful.
Initial AI concepts that need to be understood are:

Cloud AI: When the computing happens off-device in a data-centre or remote desktop.
Cutting-edge AI algorithms (sometimes called foundation models) typically start life in the
cloud, and the cloud still hosts the most sophisticated AI applications, i.e. the most accurate and highest-performance Generative AI tools. These applications typically involve high levels of complexity and demand more computing resources than a single device or PC can deliver on its own.

Edge AI: Once AI algorithms are proven to be functional in a cloud setting, they typically enter an
optimisation phase that reduces the computational demands of the algorithm while retaining an
acceptable level of accuracy. The result of this phase is edge AI: an algorithm that is practical to run on a more resource-constrained device (whether in terms of power, memory or cost), like a phone, car, drone or camera. There’s a huge range of applications in the edge AI space, ranging from highly optimised Generative AI and large language models, to convolutional neural networks (CNNs) for computer vision, to tiny networks used to learn about something as simple as watch battery drainage patterns over time.

[Figure: Equivalent power consumption per 100 queries, small language model vs large language model]
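To make the optimisation phase a little more concrete, the sketch below shows one common technique, post-training quantisation, in which trained 32-bit floating point weights are stored as 8-bit integers to cut memory and bandwidth roughly fourfold. It is a minimal, illustrative Python/NumPy example and not taken from this paper; the array sizes and the symmetric scaling scheme are assumptions made for brevity.

import numpy as np

# Hypothetical trained layer weights in 32-bit floating point
weights_fp32 = np.random.randn(256, 256).astype(np.float32)

# Symmetric 8-bit quantisation: map the float range onto int8 with one scale factor
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)  # ~4x less to store and move

# At inference time the integers are rescaled back (or kept integer end-to-end)
reconstructed = weights_int8.astype(np.float32) * scale
print("worst-case weight error:", np.abs(weights_fp32 - reconstructed).max())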

AI training: The process of initially creating an AI model to meet a particular use case. Almost all
approaches involve building out a large number of ‘nodes’ into a complex matrix relationship (or
network) and then passing massive volumes of sample data through it to ‘train’ the model, for example, running 1 million pictures of cats through a model so it can ‘learn’ what a cat looks like. Training is typically done by data scientists in cloud environments and can often involve truly staggering amounts of data and data processing (for example, it’s been reported that GPT-4 cost over $100M to train¹). Increasingly, lightweight training can also take
place at the edge to support AI on private devices.
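As an illustration of what “passing data through a network to train it” looks like in code, here is a minimal training-loop sketch written against the PyTorch API with random stand-in data; the tiny network, the data shapes and the five-pass loop are all assumptions made for brevity, not a depiction of any production training setup.

import torch
from torch import nn

# Toy stand-in for "1 million pictures of cats": random tensors with binary labels
images = torch.randn(512, 3 * 32 * 32)          # 512 flattened 32x32 RGB images
labels = torch.randint(0, 2, (512,)).float()    # 1 = cat, 0 = not cat

# A deliberately tiny fully connected network; real image models are far larger
model = nn.Sequential(nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 1))
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()

# Training: repeatedly pass sample data through the network and adjust its weights
for epoch in range(5):
    optimiser.zero_grad()
    logits = model(images).squeeze(1)   # forward pass through every 'node'
    loss = loss_fn(logits, labels)      # how wrong was the model?
    loss.backward()                     # gradients for every weight in the network
    optimiser.step()                    # nudge the weights to reduce the error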


[Figure: A neural network with an input layer, three hidden layers and an output layer]

AI inference: Much simpler and far less computationally expensive than training. It is the
process of running new stimulus through a pre-trained model and asking it to do its job. In the
cat recognition example, you pass a new picture through the model and it simply gives you the
probability that the picture has a cat in it. The majority of edge AI (and AI in general) is inference.
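Continuing the cat example, inference is just a forward pass with no gradient computation, which is a large part of why it is so much cheaper than training. The sketch below assumes the same hypothetical PyTorch model object as the earlier training sketch; it is illustrative only.

import torch

# One new, unseen picture (random stand-in data), flattened like the training images
new_image = torch.randn(1, 3 * 32 * 32)

# Inference: a single forward pass through the pre-trained model, no gradients needed
with torch.no_grad():
    prob_cat = torch.sigmoid(model(new_image)).item()

print(f"probability the picture contains a cat: {prob_cat:.2f}")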

Evolving AI Models and the Technology Lifecycle


Only a decade ago, Convolutional Neural Networks (CNNs) were all the rage in AI, doing things we had previously found incredibly difficult in computer science, like – wait for it – identifying cats in pictures! Once CNNs and Recurrent Neural Networks (RNNs) had pushed speech recognition and image recognition past 99% accuracy, technologists and academics turned their focus to resource optimisation. Not a week went by without numerous articles on how people had managed to achieve the same capability and accuracy in 1/10th, 1/100th, even 1/1000th of the power and processing.

This same process has taken place throughout the history of technology. Significant breakthroughs are typically expensive at first but, once shown to be possible, they can be optimised over time to achieve the same result with fewer resources. This cycle of capability leaps and resource optimisation will repeat itself forever.

Shortly after CNNs, transformer models emerged and revolutionised generative AI; since then, diffusion models have also emerged as a solution that shows great promise for generative tasks. Each of these newer models is now being optimised to run more power efficiently, both to save operating costs in cloud AI use cases and to enable them to run on more devices at the edge. The AI model landscape is still in an early phase and continues to deliver incredible breakthroughs on a regular basis.


General Purpose AI Hardware


Now that we have a general understanding of some of the high-level concepts of AI, let’s start
talking about the hardware that’s needed to run it. First, we need to look at how computing was
done ‘pre-AI’ and the evolution of different types of general purpose processors.

Sequential Processing
Traditionally, computing was done on what are called scalar or sequential processors – most
commonly represented by a Central Processing Unit (CPU). In simple terms, these processors
do one action, finish it and then move on to the next action. They are very easy to understand
and can be used for almost any type of computing. Historically, as software got more complex,
CPUs simply got faster.

However, the CPU’s sequential approach to computing has a limitation: it can only do one thing at a time, and there are some tasks which simply don’t fit this model.

The Birth of Parallelism


The idea of parallel processing was invented to speed up the tasks that CPUs can’t do well.
Parallel processing breaks up a very large workload into many small, independent workloads
which can all be run at the same time.

One of the original tasks that needed the acceleration of parallelism was pixel processing.
A graphical user interface or computer game needs to calculate each of the millions of pixels
on a screen at least 30 times per second. This was something CPUs simply couldn’t do fast
enough. However, it was clear that if the data for each pixel could be calculated independently
and in parallel, then the necessary performance would be achieved. That’s what led to the first Graphics Processing Unit (GPU). GPUs are now found on nearly every device with a screen, from
desktops to smartwatches.
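The per-pixel independence described above is what makes the workload data-parallel. As a rough illustration (an assumed NumPy sketch, not GPU code), the same brightness adjustment can be written as a pixel-by-pixel loop, which mirrors sequential CPU execution, or as one bulk operation over the whole frame, which is the shape of work a parallel processor can spread across thousands of execution lanes.

import numpy as np

frame = np.random.rand(1080, 1920, 3).astype(np.float32)  # one 1080p RGB frame

def brighten_sequential(img, gain):
    # Sequential style: visit every pixel one after another (millions of iterations)
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.minimum(img[y, x] * gain, 1.0)
    return out

def brighten_parallel(img, gain):
    # Data-parallel style: each pixel's result depends only on that pixel,
    # so the whole frame can be processed as one bulk operation
    return np.minimum(img * gain, 1.0)

bright = brighten_parallel(frame, 1.2)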

Originally, GPUs were very limited in what they could do. However, over time, as user interfaces
became richer and game scenes became more detailed, each pixel required more complex
processing to achieve the right result. To handle this, GPUs became highly flexible, highly
programmable, general purpose, parallel computing accelerators. Their flexibility means they
can run software beyond just graphics, all in a highly parallelised and highly efficient manner.


Sequential Processing Hits Its Limits


One of the modern challenges of computer science is that, in recent years, CPU designers and semiconductor technologists have been struggling to boost CPU performance… but software is continuing to get more complex!

Around the time CPU performance leaps started to slow down, AI emerged. As covered above, AI fundamentally involves complex matrices or networks with a vast number of ‘nodes’ (potentially tens of billions at the time of writing… and growing fast). Calculating all these ‘nodes’ sequentially
is only practical for the simplest of AI networks.

To address this, many CPUs have adopted special adaptations like vector engines which can help somewhat with AI workloads, but they are still constrained by a processor which is fundamentally sequential in nature and simply not designed to deal with the very high data bandwidth and sheer volume of computations needed in AI. Running most AI software on CPUs means waiting a long time for responses and very high power consumption. There is a practical limit to how much AI a CPU can support.

Parallelism To The Rescue!


But it turns out that, in most cases, a ‘node’ in an AI network bears a lot of similarities to a pixel on a screen: they can be isolated and run independently in parallel, they consist of a well-defined set of calculations, and they rely on clever data management. This similarity between pixels and nodes, combined with the high degree of programmability and flexibility in modern GPUs, means that with a few targeted modifications a GPU is an ideal tool for running AI networks.
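As a rough illustration of why nodes parallelise so naturally, the sketch below (an assumed NumPy example, not from this paper) computes one layer of 4,096 nodes as a single matrix-vector product; each node’s output is an independent weighted sum of the same inputs, just as each pixel’s colour is an independent function of the same scene.

import numpy as np

inputs = np.random.rand(4096).astype(np.float32)         # activations feeding the layer
weights = np.random.rand(4096, 4096).astype(np.float32)  # one weight row per node

# Every node is an independent weighted sum of the same inputs, so the whole
# layer can be evaluated as one matrix-vector product and parallelised per node
node_outputs = np.maximum(weights @ inputs, 0.0)          # ReLU activation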

Not All GPUs are the Same


Recognising this fact, cloud AI is already predominantly GPU based. The GPUs for cloud-based
AI training deal with incredibly complex networks in whatever shape, size or description that
their programmers choose. They also process an almost unfathomable amount of training
data. Therefore, these GPUs are designed primarily for maximum flexibility and maximum raw
performance. To achieve this, they can be very large and tend to sit in a data centre with huge
amounts of power, cooling and data bandwidth available. Even at their most powerful, it can still take a huge number of data centre GPUs days, weeks or months to finish a training run, consuming megawatts of power.

Edge AI inference, on the other hand, presents a very different challenge. The GPUs needed for
on-device AI inference can’t rely on brute force performance as these processors are smaller
and far more limited in terms of available power and memory. They need to be smarter and more
efficient than cloud GPUs in order to deliver the benefits of modern AI but without draining a
phone battery or significantly impacting an electric vehicle’s range.

Fortunately, modern GPUs embedded in devices like phones and cars have had decades of
optimisations related to executing data-heavy workloads with high performance and low power
consumption – a massive feat when you consider the levels of processing involved. They are
more than capable of delivering against the requirements of edge AI inference.


What about NPUs or AI Processors?


Just focusing on CPUs and GPUs doesn’t quite give a complete picture when it comes to AI
hardware. There are also Neural Processing Units (NPUs), sometimes referred to as AIPUs or by other similar ‘brain’ or ‘intelligence’ terms.

NPUs are completely unstandardised and come in a wild variety of forms. Some are rather
exotic (like neuromorphic or in-memory type computing), some highly optimised for a
specific AI network, and some have a level of flexibility built into them.

However, nearly all fall foul of one or more of the following programmability challenges:

• They are proprietary by nature. There is no standard approach for NPU design, which makes them difficult for third-party software developers to use.

• They have no standard software interfaces and typically lack tooling, which again makes them difficult to program.

• They are often optimised for a specific type of AI network or class of networks. This means that while they are highly efficient at those specific networks, they struggle to adapt when new AI networks arrive. They can have a short shelf life in the rapidly evolving AI world.

• They all have different strengths and weaknesses, meaning software that runs well on one NPU may perform badly on another.

Even with these programmability problems, there is a place for NPUs. They are best viewed as dedicated hardware accelerators. If you have a known workload shape and
you want to run it as fast or as efficiently as possible, dedicated hardware acceleration
will always give you the best results.

[Figure 1 - GPU provides both flexibility and efficiency for AI (axes: flexibility vs efficiency)]

However, in our current situation, where AI workloads are changing year-on-year, a level of flexibility and general purpose acceleration is still needed in AI hardware systems to future-proof devices.


Let’s Not Forget Software


AI hardware and software are intricately tied together. When creating AI solutions it can be tempting to divide these worlds and allow each to manage its own problem space. But you don’t have anything without both of these components coming together.

When creating an AI hardware platform, it’s critical to consider what software it needs to run and who will be developing these applications. This is the big area where dedicated hardware accelerators fall down. They are so focused on a specific task that they typically lack the supporting software and tools that make them useful to the wider software community.

Software is the superpower of general purpose processors. They all ship with the foundational
layers, standard interfaces and higher level tooling that enable the hardware and software worlds
to come together and make the magic happen.

This is where CPUs are strong – they’ve had nearly half a century of software ecosystem
development. Though younger, the GPU software ecosystem is also sufficiently mature to
make it easy for AI developers to take advantage of their parallelism. There are plenty of
well-defined and accepted software standards, like oneAPI, OpenCL, SYCL and TVM, that
enable developers to use the GPU for AI, and analysis, trace and debug tools are quickly being adapted to cover AI workloads in addition to their original graphics focus.
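As one hedged illustration of how such standards expose a GPU’s parallelism to developers, the sketch below dispatches a simple data-parallel kernel through OpenCL using the PyOpenCL Python bindings; the kernel, the buffer names and the scaling operation are illustrative assumptions rather than anything prescribed by those standards or by this paper.

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()    # pick an available OpenCL device (e.g. a GPU)
queue = cl.CommandQueue(ctx)

# A trivial data-parallel kernel: each work-item handles one element independently
kernel_src = """
__kernel void scale(__global const float *x, __global float *y, const float gain) {
    int i = get_global_id(0);
    y[i] = gain * x[i];
}
"""
program = cl.Program(ctx, kernel_src).build()

x = np.arange(1 << 20, dtype=np.float32)
mf = cl.mem_flags
x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
y_buf = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes)

# Launch one work-item per element; the runtime spreads them across the device
program.scale(queue, x.shape, None, x_buf, y_buf, np.float32(2.0))

y = np.empty_like(x)
cl.enqueue_copy(queue, y, y_buf)  # read the result back to the host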

The Age of Parallel Processing


As AI infuses itself across all devices, more and more parallel processors are being integrated
into hardware systems. The trajectory seems clear: edge AI hardware of the future will include a general purpose sequential processor (CPU) for control tasks, a general purpose parallel processor (GPU) for flexible graphics and AI programming, and, if needed, a dedicated hardware accelerator (NPU) for predetermined AI functions.

This approach of combining multiple different processor types into one system is known as ‘heterogeneous computing’ and has been in use at the edge for many years. However, stagnant CPU performance combined with ramping AI workloads is driving the adoption of GPUs into heterogeneous edge systems. Their parallelism provides the performance, programmability and power efficiency that the software community needs from edge AI hardware.


Bio

This paper has been written by Dennis Laudick, VP of Product Management at Imagination Technologies, who is on a mission to provide clarity around AI. It leans on learnings from his decades of experience across the mobile, automotive, and consumer electronics industries. Before joining Imagination
Technologies, Dennis held leadership roles in the automotive,
AI, and GPU divisions within Arm. Prior to that, he worked at
other leading semiconductor and OEM companies. He has
designed and brought to market numerous generations of
GPUs and four generations of AI processors, including the
latest GPU from Imagination.

Dennis Laudick
VP of Product Management
at Imagination Technologies

imaginationtech.com
Contact us now

© Imagination Technologies
