What is an arithmetic logic unit (ALU) and how does it work?
An arithmetic logic unit (ALU) is a part of a central processing unit (CPU) that carries out arithmetic and logic operations. The ALU takes the input operands and an instruction and outputs the result.
In some processors, the ALU is divided into two units: an arithmetic unit (AU) and a logic unit (LU). Some processors contain more than one AU -- for example, one for fixed-point operations and another for floating-point operations.
The ALU is a vital part of modern CPUs. Most CPUs contain many subcomponents for various functions, including accumulators, registers, a memory manager, an ALU, a floating-point unit (FPU) and cache memory. New processors may also have a graphics processing unit (GPU) and a neural processing unit (NPU).
Intel uses the more generic term execution unit for the circuitry in its CPUs that serves the purpose of the ALU.
What is the purpose of an arithmetic logic unit?
The purpose of an ALU is to speed up a CPU's overall processing by performing math and logic functions. By splitting out these functions, the different portions of the CPU can be more specialized and perform different operations simultaneously.
In early microprocessors, the main CPU could only perform basic operations, and more complex math, such as floating-point arithmetic, required many steps and took a long time to perform. A separate chip, often called a math coprocessor, could be added to offload these slow functions so the CPU could perform other work.
Modern CPU cores use a pipelining approach to work on multiple things at the same time. With pipelining, for example, the memory manager can load items into registers while the ALU is performing an add operation.
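The overlap described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: a two-stage pipeline in which one stage loads operands and the other is the ALU, each taking exactly one cycle. The function name `run_pipeline` and the instruction labels are hypothetical and do not model any real processor.

```python
def run_pipeline(instructions):
    """Simulate a two-stage pipeline: stage 1 loads operands, stage 2 is the ALU.

    Returns a list of (cycle, instruction_being_loaded, instruction_in_alu).
    """
    timeline = []
    pending = list(instructions)
    loaded = None  # instruction whose operands were loaded in the previous cycle
    cycle = 0
    while pending or loaded is not None:
        cycle += 1
        in_alu = loaded                                # ALU uses last cycle's load
        loading = pending.pop(0) if pending else None  # loader fetches the next one
        timeline.append((cycle, loading, in_alu))
        loaded = loading
    return timeline


if __name__ == "__main__":
    for cycle, loading, in_alu in run_pipeline(["add r1", "add r2", "add r3"]):
        print(f"cycle {cycle}: loading={loading}, alu={in_alu}")
```

In this model, three instructions finish in four cycles instead of the six that a strictly one-at-a-time approach would need, because the load for one instruction overlaps with the add for the previous one.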
Which processors have ALUs?
All modern processors have an ALU or equivalent circuitry in the CPU. This includes x86, Arm and RISC-V CPU architectures. GPUs also contain circuitry to perform arithmetic, but it is more specialized than the circuitry in a CPU and often goes by a different name.
The ALU is a mainstay in current processors. Most processors have more than one ALU, which lets them perform multiple operations simultaneously. For example, in AMD Zen 3 processors, each core has four ALUs. An eight-core Zen 3 processor could therefore be running up to 32 different math operations at a time.
GPUs may have thousands of specialized ALUs. Nvidia refers to the parts of the GPU that perform math operations as Compute Unified Device Architecture (CUDA) cores. So, an RTX 3080 GPU has 8,704 CUDA cores and could conceptually have that many ALUs.

How does an arithmetic logic unit work?
Typically, the ALU has direct access to the processor controller, main memory -- RAM in a PC -- and the input/output (I/O) of the CPU. I/O flows along an electronic path called a bus.
The input consists of an instruction word, sometimes called a machine instruction word, that contains an operation code (opcode), one or more operands, and sometimes a format code. The opcode tells the ALU what operation to perform, and the operands are used in the operation.

For example, two operands might be added together or compared logically. The format may be combined with the opcode and tells, for example, whether this is a fixed-point or a floating-point instruction.
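To make the idea of an instruction word concrete, here is a minimal sketch that packs and unpacks one. The 16-bit layout (4-bit opcode, 2-bit format code, two 5-bit operand fields) is an assumption made for illustration only; real instruction sets use their own layouts, and the names `Instruction`, `encode` and `decode` are hypothetical.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    opcode: int  # which operation to perform
    fmt: int     # format code, e.g. 0 = fixed point, 1 = floating point (assumed)
    a: int       # first operand field
    b: int       # second operand field


def encode(opcode: int, fmt: int, a: int, b: int) -> int:
    """Pack the fields into an assumed 16-bit word: oooo ff aaaaa bbbbb."""
    return ((opcode & 0xF) << 12) | ((fmt & 0x3) << 10) | ((a & 0x1F) << 5) | (b & 0x1F)


def decode(word: int) -> Instruction:
    """Split a 16-bit word back into its fields using shifts and masks."""
    return Instruction(
        opcode=(word >> 12) & 0xF,
        fmt=(word >> 10) & 0x3,
        a=(word >> 5) & 0x1F,
        b=word & 0x1F,
    )


if __name__ == "__main__":
    word = encode(opcode=0b0001, fmt=0, a=7, b=12)
    print(f"{word:016b}", decode(word))
```

The point of the sketch is only that the opcode, format code and operands are fixed bit fields within one word, so decoding amounts to shifting and masking.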
The output consists of a result that is placed in a storage register and settings that indicate whether the operation was performed successfully. If it wasn't, some sort of status is stored in a permanent place that is sometimes called the machine status word.
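The pairing of a result with status settings can be sketched as follows. The example assumes an 8-bit adder and four commonly used flag names (zero, carry, negative, overflow); these names, and the function `alu_add`, are illustrative assumptions rather than the status word of any particular processor.

```python
def alu_add(a: int, b: int, width: int = 8):
    """Add two unsigned `width`-bit values and report status flags."""
    mask = (1 << width) - 1       # e.g. 0xFF for 8 bits
    sign = 1 << (width - 1)       # the most significant bit
    a &= mask
    b &= mask
    full = a + b                  # unmasked sum, may need width + 1 bits
    result = full & mask          # what actually fits in the result register
    flags = {
        "zero": result == 0,
        "carry": full > mask,     # a bit was carried out of the top
        "negative": bool(result & sign),
        # Signed overflow: inputs share a sign but the result's sign differs.
        "overflow": bool(~(a ^ b) & (a ^ result) & sign),
    }
    return result, flags


if __name__ == "__main__":
    print(alu_add(200, 100))  # sum does not fit in 8 bits, so carry is set
    print(alu_add(127, 1))    # fits as unsigned, but overflows as signed
```

Software that runs after the operation can then test these flags, for example, to decide whether to branch or to report an error.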
In general, the ALU includes storage places for input operands, operands that are being added, the accumulated result stored in an accumulator and shifted results. The flow of bits and the operations performed on them in the subunits of the ALU are controlled by gated circuits.
The gates in these circuits are controlled by a sequence logic unit that uses a particular algorithm or sequence for each opcode. In the arithmetic unit, multiplication and division are done by a series of adding or subtracting and shifting operations.
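The following sketch shows how multiplication and division can be built from nothing but adding, subtracting and shifting. It assumes unsigned integers and uses the textbook shift-and-add and restoring-division methods; real hardware may use faster variants, and the function names are hypothetical.

```python
def shift_add_multiply(a: int, b: int) -> int:
    """Multiply two unsigned integers using only shifts and adds."""
    product = 0
    while b:
        if b & 1:          # lowest bit of the multiplier is set
            product += a   # add the current shifted multiplicand
        a <<= 1            # multiplicand moves one bit position left
        b >>= 1            # move on to the next multiplier bit
    return product


def shift_subtract_divide(dividend: int, divisor: int, width: int = 8):
    """Divide unsigned `width`-bit values using only shifts and subtracts.

    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for i in range(width - 1, -1, -1):
        # Bring down the next bit of the dividend, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:   # the divisor fits, so subtract it
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


if __name__ == "__main__":
    print(shift_add_multiply(13, 11))
    print(shift_subtract_divide(100, 7))
```

Each loop iteration corresponds to one step of the sequence that the control logic would drive for a multiply or divide opcode.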
The arithmetic unit must also handle negative numbers, which can be represented in several ways, such as two's complement. In the logic unit, one of 16 possible logic operations can be performed, such as comparing two operands and identifying where bits don't match.
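One way to see where the figure of 16 comes from is that there are exactly 16 Boolean functions of two input bits, and each can be named by a 4-bit truth table. The sketch below treats that 4-bit value as a selector, which is an assumption for illustration; the function `logic_op` and the constants are hypothetical names.

```python
def logic_op(selector: int, a: int, b: int, width: int = 8) -> int:
    """Apply one of 16 two-input logic functions to every bit of a and b.

    `selector` is a 4-bit truth table: bit number (2 * a_bit + b_bit)
    holds the output for that combination of input bits.
    """
    result = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        out = (selector >> (2 * a_bit + b_bit)) & 1
        result |= out << i
    return result


# A few familiar selectors, written as truth tables for inputs 11, 10, 01, 00.
AND = 0b1000
OR = 0b1110
XOR = 0b0110  # 1 wherever the two bits don't match


if __name__ == "__main__":
    a, b = 0b1100, 0b1010
    for name, sel in (("AND", AND), ("OR", OR), ("XOR", XOR)):
        print(name, f"{logic_op(sel, a, b):04b}")
```

The XOR selector corresponds to the comparison mentioned above: its output has a 1 in every position where the two operands differ.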
The design of the ALU is a critical part of the processor, and new approaches to speeding up instruction handling are continually being developed.
What type of functions do ALUs support?
In computer science, an ALU is a combinational digital circuit that performs arithmetic and bitwise operations on binary numbers. It is a foundational building block for numerous types of computing circuits, including CPUs, FPUs and GPUs.
The following are a few examples of bitwise logical operations and basic arithmetic operations supported by ALUs:
- Addition. A and B are added, optionally with a carry-in, and the sum appears at Y along with a carry-out.
- Subtraction. B is subtracted from A, or vice versa, and the difference appears at Y along with a carry-out that signals a borrow.
- Increment. A or B is increased by one, and Y represents the new value.
- Decrement. A or B is decreased by one, and Y represents the new value.
- AND. The bitwise logic AND of A and B is represented by Y.
- OR. The bitwise logic OR of A and B is represented by Y.
- Exclusive-OR. The bitwise logic XOR of A and B is represented by Y.
ALU shift functions cause A or B operands to shift, either right or left, with the new operand represented by Y. Complex ALUs use barrel shifters to shift A or B operands by any number of bits in a single operation.
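The operations listed above can be gathered into a single toy model. This is a minimal sketch assuming an 8-bit ALU with string opcodes; the operation names, the `alu` function and the single combined carry indicator are assumptions for illustration, not the interface of any real chip.

```python
# Each entry maps an assumed opcode name to the raw (unmasked) computation.
OPERATIONS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "INC": lambda a, b: a + 1,
    "DEC": lambda a, b: a - 1,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "SHL": lambda a, b: a << b,  # shift A left by B bits in one step
    "SHR": lambda a, b: a >> b,  # shift A right by B bits in one step
}


def alu(op: str, a: int, b: int = 0, width: int = 8):
    """Return (Y, carry) for operands A and B.

    `carry` is True when the raw result did not fit in `width` bits:
    a carry-out, a borrow, or bits shifted off the left end.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    raw = OPERATIONS[op](a, b)
    y = raw & mask            # wrap the result to the register width
    return y, raw != y


if __name__ == "__main__":
    print(alu("ADD", 250, 10))
    print(alu("SUB", 5, 7))
    print(alu("XOR", 0b1100, 0b1010))
    print(alu("SHL", 0b0001, 3))
```

Because SHL and SHR take the shift distance as an operand, the model behaves like the barrel shifter described above, moving A by any number of bits in a single operation.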

What are the differences among the ALU, CPU, GPU and NPU?
An ALU is a small piece of circuitry that performs math. It needs to be part of a larger unit to perform useful work.
The CPU is the main component in a computer. It contains all the general-purpose functions needed for the computer to operate. It always has one or more ALUs to help it do math functions. It may also contain a GPU and an NPU.
The GPU is responsible for generating video output for a computer. To help it do this, it may have thousands of ALUs to do all the necessary math.
An NPU is a piece of circuitry optimized for the math operations used in AI workloads. It often has many thousands of ALUs optimized for a few specific math operations.
Traditional CPUs face significant challenges when it comes to executing complex machine learning and AI tasks. This limitation has paved the way for specialized processors -- GPUs, tensor processing units (TPUs) and NPUs -- each expertly engineered to handle specific functions with remarkable efficiency.