
non-uniform memory access (NUMA)

What is non-uniform memory access (NUMA)?

Non-uniform memory access, or NUMA, is a method of configuring a cluster of microprocessors in a multiprocessing system so they can share memory locally. The idea is to improve the system's performance and allow it to expand as processing needs evolve.

In a NUMA setup, the individual processors in a computing system share local memory and can work together. Data can flow smoothly and quickly because much of it passes through this intermediate, locally shared memory instead of over a main bus.

NUMA can be thought of as a microprocessor cluster in a box. The cluster typically consists of four microprocessors connected by a local bus to a shared memory on a single motherboard. The bus may be a Peripheral Component Interconnect (PCI) bus, the shared memory is called an L3 cache and the motherboard is often referred to as a card.

This unit can be added to similar units to form a symmetric multiprocessing (SMP) system that can contain 16 to 256 microprocessors, with a common SMP bus interconnecting the clusters. In an SMP system, all the individual processor memories look like a single memory to an application program.

How non-uniform memory access works

When a processor looks for data at a certain memory address, it first looks in the L1 cache on the microprocessor. Then it moves to the larger L2 cache chip and finally to a third level of cache (L3). The NUMA configuration provides this third level. If the processor still cannot find the data, it will look in the remote memory located near the other microprocessors.
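The lookup order described above can be sketched in Python. This is a simplified model, not how hardware is actually programmed: the latency figures, the `lookup` function and the dictionary-based caches are all hypothetical, chosen only to show that each level checked adds cost and that a miss in all three caches falls through to remote memory.

```python
# Illustrative access costs in nanoseconds (hypothetical; real values vary by hardware).
LATENCY_NS = {"L1": 1, "L2": 4, "L3": 15, "remote": 120}


def lookup(address, caches, remote_memory):
    """Check each cache level nearest-first, then fall back to remote memory.

    caches is an ordered list of (level_name, contents) pairs, where contents
    maps addresses to values. Returns (value, level_that_served_it, total_ns).
    This sketch only reads; it does not fill caches on a miss.
    """
    elapsed = 0
    for level, contents in caches:
        elapsed += LATENCY_NS[level]  # every level checked adds its cost
        if address in contents:
            return contents[address], level, elapsed
    # Missed in L1, L2 and L3: fetch from memory near another processor.
    elapsed += LATENCY_NS["remote"]
    return remote_memory[address], "remote", elapsed


caches = [("L1", {0x10: "a"}), ("L2", {0x20: "b"}), ("L3", {0x30: "c"})]
remote = {0x40: "d"}
print(lookup(0x30, caches, remote))  # ('c', 'L3', 20)
print(lookup(0x40, caches, remote))  # ('d', 'remote', 140)
```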

Each of these clusters is viewed by NUMA as a node in the interconnection network. NUMA maintains a hierarchical view of the data on all nodes. Data is moved on the bus between the clusters using the Scalable Coherent Interface (SCI), which coordinates cache coherence (consistency) across the nodes of the multiple clusters in the NUMA architecture.
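A minimal sketch of the kind of bookkeeping a coherence protocol performs, assuming a simplified directory-based invalidation scheme. It is not SCI itself: SCI keeps each line's sharers in a distributed linked list spread across the caching nodes, whereas this sketch keeps one central set per line for readability. The `Directory` and `LineState` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class LineState:
    sharers: Set[int] = field(default_factory=set)  # nodes with read-only copies
    owner: Optional[int] = None                      # node with a modified copy


class Directory:
    """Simplified directory tracking which nodes cache each memory line."""

    def __init__(self):
        self.lines = {}

    def _state(self, line):
        return self.lines.setdefault(line, LineState())

    def read(self, node, line):
        """Node reads a line; returns the coherence messages this triggers."""
        s = self._state(line)
        if s.owner == node:
            return []  # the node already holds the latest (modified) copy
        messages = []
        if s.owner is not None:
            # Another node has modified the line: it must write it back first.
            messages.append(("writeback", s.owner))
            s.sharers.add(s.owner)
            s.owner = None
        s.sharers.add(node)
        return messages

    def write(self, node, line):
        """Node writes a line; every other copy is invalidated."""
        s = self._state(line)
        targets = set(s.sharers)
        if s.owner is not None:
            targets.add(s.owner)
        targets.discard(node)
        s.sharers = set()
        s.owner = node
        return [("invalidate", n) for n in sorted(targets)]


d = Directory()
d.read(0, 0x100)
d.read(1, 0x100)
print(d.write(2, 0x100))  # [('invalidate', 0), ('invalidate', 1)]
print(d.read(0, 0x100))   # [('writeback', 2)]
```

A write invalidates every other copy, and a later read from another node forces the writer to supply its modified data first, which is the consistency guarantee the paragraph refers to.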

NUMA and symmetric multiprocessing

NUMA is commonly used in a symmetric multiprocessing system. An SMP system is a tightly coupled, share-everything system in which multiple processors work under a single operating system and access each other's memory over a common bus or interconnect path. These microprocessors work on a single motherboard connected by a bus.

One limitation of SMP is that adding microprocessors overloads the shared bus or data path, creating a performance bottleneck. NUMA adds an intermediate level of memory that is shared among a few microprocessors so that not all accessed data has to travel on the main bus. This helps relieve the bottleneck.
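The effect on the shared bus can be made concrete with a little arithmetic. The sketch below assumes a hypothetical workload of 1,000 memory accesses per processor and a hypothetical percentage of accesses served from cluster-local shared memory; neither figure comes from a real system.

```python
def bus_transactions(num_cpus, accesses_per_cpu, local_percent=0):
    """Count memory accesses that must cross the main system bus.

    local_percent is the share (0-100) of accesses served by memory shared
    within the processor's own cluster. 0 models a plain SMP design in which
    every access goes over the main bus.
    """
    total = num_cpus * accesses_per_cpu
    return total * (100 - local_percent) // 100


print(bus_transactions(16, 1000))      # 16000: plain SMP bus
print(bus_transactions(16, 1000, 90))  # 1600: 90% served within clusters
print(bus_transactions(64, 1000, 90))  # 6400: 4x the CPUs, still 4x the traffic
```

Bus traffic still grows with the processor count, but at a tenth of the rate when 90% of accesses stay inside a cluster, so the bottleneck is pushed back rather than eliminated.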

SMP and NUMA systems are typically used in data mining applications and decision support systems. In these types of applications, processing can be parceled out to multiple processors that collectively work on a common database.

NUMA node architecture

The NUMA architecture is common in multiprocessing systems. In addition to processors, these systems include multiple hardware resources such as memory, input/output devices, chipset, networking devices and storage devices. Each collection of resources is a node. Multiple nodes are linked via a high-speed interconnect or bus.
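On Linux, the nodes of a NUMA machine are exposed under /sys/devices/system/node, with one nodeN directory per node and a cpulist file naming that node's processors. The following Python sketch reads that layout; the `parse_cpulist` and `list_nodes` helpers are hypothetical names, and on systems without this directory `list_nodes` simply returns an empty mapping.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8,10-11' into CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # nodes without CPUs have an empty cpulist
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def list_nodes(root="/sys/devices/system/node"):
    """Map each NUMA node id to its CPUs; returns {} if sysfs is not available."""
    nodes = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        match = re.search(r"node(\d+)$", path)
        if match is None:
            continue
        with open(os.path.join(path, "cpulist")) as f:
            nodes[int(match.group(1))] = parse_cpulist(f.read())
    return dict(sorted(nodes.items()))


print(parse_cpulist("0-3,8,10-11"))  # [0, 1, 2, 3, 8, 10, 11]
print(list_nodes())  # contents depend on the machine
```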

Every NUMA system contains a coherent global memory and I/O address space that can be accessed by all processors in the system. The other components can vary, although at least one node must have memory, one must have I/O resources, and one must have processors.
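To illustrate how one global address space can still be backed by memory spread across nodes, here is a minimal sketch that maps a physical address to the node holding it. It assumes the simplest possible layout, one contiguous block per node in node order; real firmware and operating systems may lay memory out differently or interleave it. The `home_node` function and the memory sizes are hypothetical.

```python
def home_node(address, node_memory_bytes):
    """Return the node whose local memory backs a global physical address.

    Assumes the global address space is laid out as contiguous blocks, one per
    node in node order; node_memory_bytes[i] is node i's memory size (0 for a
    node with no memory, which then owns no addresses).
    """
    base = 0
    for node, size in enumerate(node_memory_bytes):
        if base <= address < base + size:
            return node
        base += size
    raise ValueError(f"address {address:#x} is outside the global address space")


GiB = 1 << 30
sizes = [4 * GiB, 4 * GiB, 0, 4 * GiB]  # node 2 has processors or I/O but no memory
print(home_node(0, sizes))        # 0
print(home_node(5 * GiB, sizes))  # 1
print(home_node(9 * GiB, sizes))  # 3
```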

In this type of memory architecture, a processor is assigned a specific local memory for its own use, and this memory is placed close to the processor. The signal paths are shorter, which is why these processors can access local memory faster than non-local memory. Also, since there is no sharing of non-local memory, there is an appreciable drop in delays (latency) when multiple access requests come in for the same memory location.
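The benefit of local placement can be expressed as an average access time. If a fraction p of accesses go to local memory, the expected latency is t_avg = p * t_local + (1 - p) * t_remote. The sketch below evaluates that formula with integer percentages; the 80 ns and 140 ns figures are hypothetical placeholders, not measurements.

```python
def average_latency_ns(local_percent, local_ns, remote_ns):
    """Expected access time when local_percent (0-100) of accesses hit local memory."""
    return (local_percent * local_ns + (100 - local_percent) * remote_ns) / 100


# Hypothetical latencies: 80 ns local, 140 ns remote.
print(average_latency_ns(80, 80, 140))  # 92.0
print(average_latency_ns(50, 80, 140))  # 110.0
```

Moving from 50% to 80% local accesses lowers the average from 110 ns to 92 ns in this example, which is why operating systems try to allocate a process's memory on the node where it runs.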

Advantages and disadvantages of NUMA

One of the biggest advantages of NUMA is the fast movement of data and lower latency in the multiprocessing system. Additionally, NUMA reduces data replication and simplifies programming. And the parallel computers in a NUMA architecture are highly scalable and responsive to data allocation in local memories.

One disadvantage of NUMA is that it can be expensive. And the lack of programming standards for larger configurations can make implementation challenging.

Difference between NUMA and UMA

There are three shared-memory models in multiprocessor systems.

  1. Uniform memory access (UMA)
  2. Non-uniform memory access (NUMA)
  3. Cache-only memory access/architecture (COMA)

Figure 1 shows the differences between UMA and NUMA.

Figure 1. Uniform memory access (UMA) vs. non-uniform memory access (NUMA)
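One way to see the UMA/NUMA difference is as a table of relative memory-access costs between processors and memory. Linux reports such a table for NUMA machines (in /sys/devices/system/node/nodeN/distance and in the output of numactl --hardware), using the ACPI convention that a node's distance to itself is 10. The sketch below builds simplified matrices for both models; the remote value of 20 and the function names are hypothetical.

```python
LOCAL = 10  # ACPI convention: a node's distance to itself is 10


def uma_distances(n):
    """UMA: every processor sees the same cost to every memory bank."""
    return [[LOCAL] * n for _ in range(n)]


def numa_distances(n, remote=20):
    """Simplified NUMA: a processor's own node is closer than any other node."""
    return [[LOCAL if i == j else remote for j in range(n)] for i in range(n)]


def is_uniform(matrix):
    """True if every processor/memory pairing has the same relative cost."""
    return len({d for row in matrix for d in row}) == 1


print(is_uniform(uma_distances(4)))   # True
print(is_uniform(numa_distances(4)))  # False
print(numa_distances(2))              # [[10, 20], [20, 10]]
```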

Like UMA and NUMA, COMA is also a shared-memory model used in multiprocessors. In a COMA machine, multiple processing nodes are connected by an interconnection network. Each node includes a processor and cache and is allocated as part of the global shared memory. The local memory (typically DRAM) at each node is used as the cache. This feature differentiates COMA from NUMA in which the local memory is used as the main memory.

COMA generally enables more efficient use of memory resources. The drawback is that it also makes it harder to find a particular piece of data since there is no home node in COMA (unlike NUMA). Moreover, it can be challenging to address memory shortage problems once the local memory fills up.
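The "no home node" difference can be sketched as two lookup functions. Under NUMA, a hypothetical layout of equal-size blocks per node lets the home node be computed directly from the address; under COMA, data migrates between nodes, so this sketch has to probe each node's local memory in turn. Real COMA designs use directories to locate data rather than probing every node, so the probe count here only illustrates why the lookup is harder, not what it actually costs. The function names are hypothetical.

```python
def numa_home(address, block_size):
    """NUMA: the home node follows directly from the address (no search)."""
    return address // block_size


def coma_find(address, node_caches):
    """COMA: no home node, so probe each node's local memory in turn.

    node_caches maps node id -> set of addresses currently held there.
    Returns (node, probes); node is None if no node holds the address.
    """
    probes = 0
    for node, held in node_caches.items():
        probes += 1
        if address in held:
            return node, probes
    return None, probes


node_caches = {0: {0x100}, 1: set(), 2: {0x200}}
print(numa_home(0x200, 0x100))        # 2
print(coma_find(0x200, node_caches))  # (2, 3)
print(coma_find(0x300, node_caches))  # (None, 3)
```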

See also: vNUMA (virtual NUMA)

This was last updated in September 2022
