As technology manufacturers race to incorporate artificial intelligence into their smartphones, laptops and smart devices, a new class of AI chip has become the center of attention: neural processing units, or NPUs.
Neural Processing Unit (NPU) Definition
An NPU (neural processing unit) is a specialized AI accelerator chip optimized for deep learning tasks such as image recognition, object detection and natural language processing. Its ability to process data on-chip with minimal energy usage makes it well suited to real-time applications in smartphones, autonomous vehicles and IoT devices.
NPUs are chips optimized for machine learning, and more specifically for deep learning. Their architecture is purpose-built to handle the types of mathematical computations found in neural networks. And their high throughput, energy efficiency and high-bandwidth memory make them ideal for real-time data processing on smaller, low-power devices, such as smartphones, laptops, autonomous vehicles and IoT hardware.
Below, we’ll explain how NPUs work, how they’re used and the benefits and challenges of incorporating this emerging technology.
What Is a Neural Processing Unit (NPU)?
A neural processing unit is a type of AI chip designed to handle the complex computations involved in deep learning, a method that teaches computers to learn and make decisions by mimicking how the human brain works. To that end, they are specifically optimized for neural networks, which are inspired by the structure and function of the human brain itself.
While NPUs can certainly be used to train neural networks, they’re particularly well-suited for inference, where a trained AI model analyzes new data to make predictions or decisions. Like neurons in the brain, the nodes of a neural network communicate by passing information to one another. By adjusting the strength of the connections between nodes (the network’s equivalent of synapses), the network learns to identify patterns and relationships, improving its ability to make accurate inferences over time.
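To make the idea of inference concrete, here is a minimal sketch in Python with NumPy, not tied to any particular NPU, of a single trained layer producing a prediction from new input data; the weight, bias and input values are made up purely for illustration.

```python
import numpy as np

# Weights and bias learned during training (values are illustrative only).
weights = np.array([[0.2, -0.5],
                    [0.8,  0.1],
                    [-0.3, 0.7]])   # 3 input features -> 2 outputs
bias = np.array([0.05, -0.1])

# New data the model has never seen before.
x = np.array([0.9, 0.4, 0.6])

# Inference: a burst of multiply-accumulate operations plus an activation.
z = x @ weights + bias              # matrix-vector product
prediction = np.maximum(z, 0)       # ReLU activation
print(prediction)
```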
Focusing on neural network workloads, NPUs deliver high performance and energy efficiency in AI tasks. And because they combine memory and computation on the same chip, they can process data locally without sending it to the cloud. This makes them ideal for real-time, on-device applications, like telling an autonomous vehicle to swerve around an object in the road.
NPUs also excel at tasks like image recognition, object detection, and natural language processing. They are what allow you to blur your background on a video call, unlock your phone with facial recognition or seek advice from voice assistants like Siri or Alexa.
Neural processing units are typically integrated into a larger system-on-chip (SoC) configuration, where they work alongside more general-purpose chips like CPUs and GPUs. In this setup, the NPU handles the AI-specific tasks, freeing the other processors to handle everything else.
How Do NPUs Work?
NPUs handle AI tasks significantly faster than CPUs and far more efficiently than GPUs, using much less energy in the process. This performance advantage largely comes down to how they are designed.
While the CPU in a laptop may have around four cores, a single NPU contains thousands of tiny processing units called multiply-accumulate (MAC) units, each of which performs a basic calculation: multiplying two numbers and adding the result to a running total. MAC units are optimized for low-precision arithmetic, which boosts memory and energy efficiency with only a marginal impact on model accuracy.
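As a rough illustration of what a single MAC unit does, the sketch below (in Python, with made-up 8-bit values) multiplies a low-precision activation by a low-precision weight and adds the product to an accumulator, which hardware typically keeps at a wider precision so repeated additions don’t overflow.

```python
import numpy as np

# Two low-precision (8-bit) inputs: an activation and a weight (illustrative values).
activation = np.int8(23)
weight = np.int8(-7)

# One MAC operation: multiply the pair, then add the product to a running accumulator.
# Accumulators are usually wider (e.g., 32-bit) so thousands of sums don't overflow.
accumulator = np.int32(0)
accumulator += np.int32(activation) * np.int32(weight)
print(accumulator)  # -161
```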
The MAC units in an NPU are typically arranged in a grid-like pattern known as a systolic array. This architecture enables parallel processing, allowing the chip to carry out thousands of computations simultaneously, adding up to trillions of operations per second. This capability is especially useful for matrix multiplication, a fundamental operation in neural networks that lets them process large amounts of data quickly and at scale.
In addition to MAC units, NPUs have other purpose-built modules for tasks like applying activation functions and decompressing data.
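To show how a grid of MAC units maps onto this workload, here is a simplified sketch in Python: the nested loops compute a matrix multiplication one multiply-accumulate at a time (a systolic array performs many of these inner-loop steps in parallel in hardware), and the last step applies a ReLU activation of the kind handled by a dedicated activation module. The matrices are tiny and made up for illustration.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])     # input activations (illustrative)
W = np.array([[0.5, -1.0], [0.25, 0.75]])  # learned weights (illustrative)

# Matrix multiplication expressed as a grid of multiply-accumulate operations.
# An NPU's systolic array carries out the inner-loop MACs in parallel.
out = np.zeros((X.shape[0], W.shape[1]))
for i in range(X.shape[0]):
    for j in range(W.shape[1]):
        acc = 0.0
        for k in range(X.shape[1]):
            acc += X[i, k] * W[k, j]       # one MAC operation
        out[i, j] = acc

# A separate on-chip module would then apply the activation function.
out = np.maximum(out, 0)                   # ReLU
print(out)
```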
Another key advantage is that NPUs integrate high-bandwidth memory directly on the chip, which allows smaller AI models to operate right on the edge. Because the chip can access data locally, it requires less bandwidth, responds to requests faster and offers better privacy than sending data to the cloud. This is particularly useful in devices like smartphones and IoT applications.
NPUs vs. CPUs vs. GPUs vs. TPUs
Central Processing Unit (CPU)
Central processing units (CPUs) provide the primary computing power for most computers. They’re often referred to as the brain of a computer, as they execute commands, run programs and distribute computing resources. A CPU may have only four to 12 powerful cores, whereas an NPU may have thousands of smaller cores. And while a multi-core CPU can run several tasks at once, it processes data far more sequentially than an NPU, which is built for massive parallelism. While a CPU may be able to handle smaller automation workloads, AI tasks are typically reserved for GPUs, NPUs and TPUs.
Graphics Processing Unit (GPU)
Graphics processing units (GPUs) were developed to handle graphics-heavy tasks, like video gameplay and video playback. In recent years, they have also been used to power the generative AI revolution, with some companies buying thousands of GPUs to train their models. Like NPUs, their parallel processing abilities allow them to handle advanced AI workloads — including in data centers. But GPUs require more energy than NPUs, so NPUs are the preferred choice for powering neural networks locally on smartphones and other edge devices.
Tensor Processing Unit (TPU)
Designed by Google, tensor processing units (TPUs) are application-specific integrated circuits (ASICs) specialized for neural networks. They’re similar to NPUs in that way, but they have different uses. TPUs are typically deployed at data-center scale for cloud computing, whereas NPUs are built for energy efficiency, making them more suitable for edge devices.
NPU Use Cases
Smartphones
Neural processing units are increasingly used in smartphones to power AI features such as facial recognition, real-time voice assistants and language translation. Because NPUs have high-bandwidth, on-chip memory, these smartphones can run AI models locally, without draining the battery or waiting on data to travel to and from the cloud. Nearly all of the major smartphone makers now use NPUs, in devices such as the iPhone (starting with the iPhone X), the Samsung Galaxy S24 and the Google Pixel 8.
Autonomous Vehicles
Autonomous vehicles like drones and self-driving cars rely on NPUs for their energy efficiency and real-time processing capabilities. These chips allow them to quickly and efficiently detect objects, such as traffic signs and oncoming vehicles, and make split-second decisions without waiting on the cloud.
IoT Devices
IoT devices — like smart home systems, wearable technology and automated industrial equipment — also use NPUs, as they can provide real-time feedback on small, low-power devices. A neural processing unit could be used in a home security camera to detect a visitor at your front door, for example, or it could be used to monitor an industrial machine’s temperature or energy usage.
Companies and Products Using NPUs
Apple was one of the first companies to incorporate an NPU, debuting its so-called “Neural Engine” in 2017 as part of the A11 Bionic chip for the iPhone X. Then in 2020, Apple brought the Neural Engine to MacBooks with the launch of its M-series silicon chips.
In 2017, Qualcomm introduced on-device AI processing in its Hexagon digital signal processor (DSP), which was part of its Snapdragon 845 mobile SoC. By 2021, the company had rebranded the Hexagon DSP as the Hexagon NPU with the release of the Snapdragon 8 SoC. Snapdragon SoCs run in a variety of laptops and smartphones, including Samsung Galaxy smartphones.
Chinese tech giant Huawei first launched an NPU in 2017 with the Kirin 970 SoC in the Huawei Mate 10 smartphone. In 2019, the company launched the Ascend 910, its first NPU for cloud training.
Intel introduced its first NPU in 2023 within the Intel Core Ultra Series 1, also known as Meteor Lake. In 2024, it followed up with the Intel Core Ultra Series 2, called Lunar Lake.
In 2024, Microsoft launched a new category of AI-powered Windows laptops with the Copilot+ PC. These so-called “AI PCs” initially launched with Qualcomm’s Snapdragon X Elite and X Plus chips, which deliver more than 45 trillion operations per second (TOPS). Copilot+ PCs include the Microsoft Surface, as well as other Windows-based PCs made by Acer, ASUS, Dell, HP, Lenovo and Samsung.
Benefits of NPUs
Enhanced Performance
NPUs are built for parallel processing and matrix multiplication, allowing them to handle deep learning inference tasks much faster than general-purpose chips like CPUs.
Energy Efficiency
Because they are designed for low-precision arithmetic, parallel processing and minimal data movement, NPUs are far more energy efficient than CPUs and GPUs. This is particularly helpful in smartphones and other battery-powered devices, as an NPU can execute AI tasks without slowing down the device or draining its battery.
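As a rough, framework-agnostic illustration of the low-precision point, the sketch below quantizes a float32 weight array to int8, cutting its memory footprint (and the energy spent moving that data) to a quarter. The weights are random, and the simple symmetric quantization scheme shown is only one of several a real NPU toolchain might use.

```python
import numpy as np

# A float32 weight tensor (random values, for illustration only).
weights_fp32 = np.random.randn(1024, 1024).astype(np.float32)

# Simple symmetric quantization: map the float range onto int8's [-127, 127].
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Low precision shrinks memory (and data movement) by 4x here.
print(weights_fp32.nbytes, "bytes ->", weights_int8.nbytes, "bytes")

# Dequantizing shows the small accuracy cost of the lower precision.
error = np.abs(weights_fp32 - weights_int8.astype(np.float32) * scale).mean()
print("mean absolute error:", error)
```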
Real-Time Processing
NPUs’ high-bandwidth, on-chip memory allows data to be processed quickly on the chip itself, with no round trip to the cloud. This makes them an excellent choice for autonomous driving, augmented reality and other applications that require real-time data processing.
Data Privacy
NPUs offer important data privacy benefits. Because data is processed right on the chip rather than in the cloud, NPUs reduce the risk of data breaches, interceptions or leaks during transmission.
Limitations of NPUs
Lack of Versatility
While NPUs excel at neural network operations, they are not equipped to handle general computation, graphics rendering and other non-AI tasks. This is why they’re typically used alongside CPUs and GPUs within a larger SoC configuration to carry out broader processing needs.
Limited Scalability
NPUs are not suited to large-scale AI workloads, as they offer less raw compute than data center hardware and don’t scale across many chips in the same way. They are designed primarily for on-device inference and edge applications, where efficiency and low power consumption are the priority. GPUs and TPUs therefore remain the preferred option for data center environments and large-scale AI models.
Integration Troubles
NPUs often rely on proprietary APIs and specialized software frameworks, which may not work well with existing systems. And since the NPU ecosystem is still relatively new, developers may have difficulty accessing these specialized libraries and resources, thus slowing down the AI development process.
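For a sense of what this integration work looks like in practice, here is a hedged sketch using ONNX Runtime, one common way to target NPUs from application code. The model path is hypothetical, and the NPU execution provider named below (Qualcomm’s, in this example) varies by vendor and must be included in the runtime build; the code falls back to the CPU if it isn’t available.

```python
import numpy as np
import onnxruntime as ort  # assumes the onnxruntime package is installed

MODEL_PATH = "model.onnx"  # hypothetical path to an exported model

# Prefer the vendor's NPU backend (here, Qualcomm's QNN execution provider),
# but keep only the providers this runtime build actually supports so the
# session falls back to the CPU when no NPU backend is present.
preferred = ["QNNExecutionProvider", "CPUExecutionProvider"]
available = [p for p in preferred if p in ort.get_available_providers()]
session = ort.InferenceSession(MODEL_PATH, providers=available)

# Run one inference with dummy input shaped to the model's first input
# (dynamic dimensions are replaced with 1 for simplicity).
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
outputs = session.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})
print("ran on:", session.get_providers()[0])
```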
Frequently Asked Questions
What does NPU stand for?
NPU stands for neural processing unit. NPUs are optimized to execute neural network models with higher performance and energy efficiency than other AI chips.
Are NPUs better than GPUs for AI tasks?
In many cases, NPUs are better than GPUs for AI tasks due to their energy efficiency, on-device data processing capabilities and architecture, which is optimized for neural network computations. But GPUs are still the AI chip of choice for training AI models and data center environments.
What is an NPU vs. a GPU?
An NPU is built specifically for neural network operations, whereas a GPU is capable of general-purpose computing. NPUs are more energy efficient than GPUs, making them ideal for edge computing. But they don’t have the scalability and versatility of GPUs.
Can an NPU replace a CPU?
No, an NPU cannot replace a CPU. A CPU is the brain of a computer, managing its operating system and running programs. An NPU is much more specialized. It plays a complementary role to the CPU, handling AI tasks so the CPU can focus on other things.
Do NPUs need special hardware?
Yes. Unlike CPUs, NPUs require specialized drivers, compilers and runtime engines that are typically developed by the NPU vendor.
What’s the difference between NPUs and AI accelerators?
NPUs are a type of AI accelerator, along with GPUs, TPUs and field-programmable gate arrays (FPGAs). NPUs set themselves apart from other AI accelerators with their energy efficiency and high-bandwidth, on-chip memory, which make them especially well suited to on-device AI.