non-uniform memory access (NUMA)

What is non-uniform memory access (NUMA)?

Non-uniform memory access, or NUMA, is a method of configuring a cluster of microprocessors in a multiprocessing system so they can share memory locally. The idea is to improve the system's performance and allow it to expand as processing needs evolve.

In a NUMA setup, the individual processors in a computing system share local memory and can work together. Because data moves through this intermediate memory instead of over a main bus, it can flow more quickly and with less contention.

NUMA can be thought of as a microprocessor cluster in a box. The cluster typically consists of four microprocessors interconnected on a local bus to a shared memory on a single motherboard. The bus may be a Peripheral Component Interconnect (PCI) bus, the shared memory is called an L3 cache, and the motherboard is often referred to as a card.

This unit can be added to similar units to form a symmetric multiprocessing (SMP) system that can contain 16 to 256 microprocessors, with a common SMP bus interconnecting the clusters. In an SMP system, all the individual processor memories look like a single memory to an application program.

How non-uniform memory access works

When a processor looks for data at a certain memory address, it first looks in the L1 cache on the microprocessor. Then it moves to the larger L2 cache chip and finally to a third level of cache (L3). The NUMA configuration provides this third level. If the processor still cannot find the data, it will look in the remote memory located near the other microprocessors.
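The lookup order described above can be sketched as a toy model. The Python below is illustrative only: the lookup function, the level names and the sample addresses are hypothetical, and real hardware performs this search in circuitry rather than in software.

```python
from typing import Dict, List, Optional, Tuple

# Each level is a (name, contents) pair; contents maps an address to its data.
Level = Tuple[str, Dict[int, str]]


def lookup(address: int, levels: List[Level]) -> Optional[Tuple[str, str]]:
    """Search the levels in order and return (level name, data) for the first hit."""
    for name, contents in levels:
        if address in contents:
            return name, contents[address]
    return None  # The address is not present at any level.


# Hypothetical contents, ordered from closest to farthest from the processor.
LEVELS: List[Level] = [
    ("L1", {0x10: "a"}),
    ("L2", {0x20: "b"}),
    ("L3", {0x30: "c"}),       # The shared level provided by the NUMA cluster.
    ("remote", {0x40: "d"}),   # Memory located near the other microprocessors.
]

if __name__ == "__main__":
    for addr in (0x10, 0x30, 0x40, 0x99):
        print(hex(addr), lookup(addr, LEVELS))
```

The farther down the list a hit occurs, the longer the access takes, which is the source of the "non-uniform" in the name.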

Each of these clusters is viewed by NUMA as a node in the interconnection network. NUMA maintains a hierarchical view of the data on all nodes. Data is moved on the bus between the clusters using the Scalable Coherent Interface (SCI). SCI coordinates cache coherence (consistency) across the nodes of the multiple clusters in the NUMA architecture.

NUMA and symmetric multiprocessing

NUMA is commonly used in a symmetric multiprocessing system. An SMP system is a tightly coupled, share-everything system in which multiple processors work under a single operating system and access each other's memory over a common bus or interconnect path. These microprocessors work on a single motherboard connected by a bus.

One limitation of SMP is that as microprocessors are added, the shared bus or data path becomes overloaded, creating a performance bottleneck. NUMA adds an intermediate level of memory that is shared among a few microprocessors so that not all accessed data travels on the main bus. This helps address performance bottleneck issues.
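A rough way to see the effect on the main bus is to count how many requests still reach it once some fraction is satisfied by the intermediate memory. The sketch below is a simplified model, not a measurement: the function name, the request counts and the hit rate are all assumed values chosen for illustration.

```python
def main_bus_requests(processors: int,
                      requests_per_processor: int,
                      local_hit_rate: float) -> float:
    """Estimate how many requests reach the main bus.

    local_hit_rate is the assumed fraction of requests served by the
    intermediate memory shared within a cluster (0.0 to 1.0).
    """
    if not 0.0 <= local_hit_rate <= 1.0:
        raise ValueError("local_hit_rate must be between 0.0 and 1.0")
    total = processors * requests_per_processor
    return total * (1.0 - local_hit_rate)


if __name__ == "__main__":
    # Plain SMP: every request crosses the shared bus.
    print(main_bus_requests(16, 1000, 0.0))
    # With intermediate memory: only the misses cross the main bus.
    print(main_bus_requests(16, 1000, 0.75))
```

In this model, bus traffic grows linearly with the number of processors, and a higher local hit rate scales it down by the same factor for every processor added.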

SMP and NUMA systems are typically used in data mining applications and decision support systems. In these types of applications, processing can be parceled out to multiple processors that collectively work on a common database.

NUMA node architecture

The NUMA architecture is common in multiprocessing systems. These systems include multiple hardware resources including memory, input/output devices, chipset, networking devices and storage devices (in addition to processors). Each collection of resources is a node. Multiple nodes are linked via a high-speed interconnect or bus.

Every NUMA system contains a coherent global memory and I/O address space that can be accessed by all processors in the system. The other components can vary, although at least one node must have memory, one must have I/O resources, and one must have processors.
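The minimum-resource rule above can be expressed as a small check. This is a sketch under assumptions: the Node class, its field names and the sample values are hypothetical and are not drawn from any operating system or firmware interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """A hypothetical description of one NUMA node's resources."""
    name: str
    processors: int = 0
    memory_gib: int = 0
    io_devices: int = 0


def meets_minimum(nodes: List[Node]) -> bool:
    """Return True if at least one node has processors, one has memory and one has I/O."""
    return (
        any(n.processors > 0 for n in nodes)
        and any(n.memory_gib > 0 for n in nodes)
        and any(n.io_devices > 0 for n in nodes)
    )


if __name__ == "__main__":
    system = [
        Node("node0", processors=4, memory_gib=64),
        Node("node1", processors=4, memory_gib=64, io_devices=2),
    ]
    print(meets_minimum(system))
```

The three resources do not have to sit on the same node; the check passes as long as each one appears somewhere in the system.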

In this type of memory architecture, a processor is assigned a specific local memory for its own use, and this memory is placed close to the processor. The signal paths are shorter, which is why these processors can access local memory faster than non-local memory. Because processors do not share non-local memory, delays (latency) also drop appreciably when multiple access requests arrive for the same memory location.
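The gap between local and non-local access can be summarized as a weighted average. The sketch below assumes a simple two-tier model; the function name and the latency figures are placeholders for illustration, not values for any real system.

```python
def average_access_ns(local_fraction: float,
                      local_ns: float,
                      remote_ns: float) -> float:
    """Weighted average latency for a mix of local and non-local accesses.

    local_fraction is the assumed share of accesses served from local memory.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0.0 and 1.0")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Placeholder latencies: 100 ns local, 200 ns non-local.
    for fraction in (0.0, 0.5, 1.0):
        print(fraction, average_access_ns(fraction, 100.0, 200.0))
```

The more accesses a workload keeps in local memory, the closer its average latency gets to the local figure, which is why data placement matters on NUMA systems.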

Advantages and disadvantages of NUMA

One of the biggest advantages of NUMA is the fast movement of data and lower latency in the multiprocessing system. Additionally, NUMA reduces data replication and simplifies programming. The parallel computers in a NUMA architecture are also highly scalable and responsive to data allocation in local memories.

One disadvantage of NUMA is that it can be expensive. In addition, the lack of programming standards for larger configurations can make implementation challenging.

Difference between NUMA and UMA

There are three shared-memory models in multiprocessor systems:

  1. Uniform memory access (UMA)
  2. Non-uniform memory access (NUMA)
  3. Cache-only memory access/architecture (COMA)

Figure 1 shows the differences between UMA and NUMA.

Figure 1. Uniform memory access (UMA) vs. non-uniform memory access (NUMA)

Like UMA and NUMA, COMA is a shared-memory model used in multiprocessors. In a COMA machine, multiple processing nodes are connected by an interconnection network. Each node includes a processor and cache and is allocated a part of the global shared memory. The local memory (typically DRAM) at each node is used as the cache. This feature differentiates COMA from NUMA, in which the local memory is used as the main memory.

COMA generally enables more efficient use of memory resources. The drawback is that it also makes it harder to find a particular piece of data since there is no home node in COMA (unlike NUMA). Moreover, it can be challenging to address memory shortage problems once the local memory fills up.

See also: vNUMA (virtual NUMA), IT Acronyms at your fingertips

This was last updated in September 2022
