Load balancing is a technique used to distribute workloads evenly across servers or other compute resources to optimize network efficiency, reliability and capacity. Load balancing is performed by an appliance -- either physical or virtual -- that identifies in real time which server in a pool can best meet a given client request, while ensuring heavy network traffic doesn't overwhelm any single server.
In addition to maximizing network capacity and performance, load balancing provides failover. If one server fails, a load balancer immediately redirects its workloads to a backup server, thus mitigating the impact on end users.
Load balancers are usually categorized as operating at either Layer 4 or Layer 7 of the Open Systems Interconnection (OSI) model. Layer 4 load balancers distribute traffic based on transport-layer data, such as IP addresses and Transmission Control Protocol (TCP) port numbers. Layer 7 load balancers make routing decisions based on application-level characteristics, including HTTP header information and the actual contents of the message, such as URLs and cookies. Layer 7 load balancers are more common, but Layer 4 load balancers remain popular, particularly in edge deployments.
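The difference between the two layers can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the `Request` type, the pool names and addresses, and the use of a CRC32 hash to pick a pool member. The Layer 4 function sees only the source IP address and destination port, while the Layer 7 function also inspects the URL path and a cookie header.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical request record; real load balancers parse this from packets."""
    src_ip: str
    dst_port: int
    path: str = "/"
    headers: dict = field(default_factory=dict)


# Assumed pools, invented for this example.
L4_POOLS = {443: ["10.0.0.1", "10.0.0.2"], 25: ["10.0.1.1"]}
L7_RULES = [("/api/", ["api-1", "api-2"]), ("/static/", ["cdn-1"])]
DEFAULT_POOL = ["web-1", "web-2"]


def route_layer4(req: Request) -> str:
    """Use transport-level data only: the port selects a pool, the source IP a member."""
    pool = L4_POOLS[req.dst_port]  # raises KeyError for a port with no pool
    return pool[zlib.crc32(req.src_ip.encode()) % len(pool)]


def route_layer7(req: Request) -> str:
    """Use application-level data: the URL path selects a pool, the cookie a member."""
    pool = DEFAULT_POOL
    for prefix, candidates in L7_RULES:
        if req.path.startswith(prefix):
            pool = candidates
            break
    cookie = req.headers.get("Cookie", "")
    return pool[zlib.crc32(cookie.encode()) % len(pool)]
```

In this sketch, two requests to the same port from the same client always reach the same Layer 4 backend regardless of what they ask for, whereas the Layer 7 router can send `/api/` and `/static/` requests from that client to different pools.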
How load balancing works
Load balancers handle incoming requests from users for information and other services. They sit between the internet and the servers that handle those requests. Once a request is received, the load balancer first determines which servers in a pool are online and available and then routes the request to one of them. During periods of heavy load, a load balancer can dynamically add servers in response to spikes in traffic. Conversely, it can drop servers if demand is low.
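The flow described above, checking availability, routing, and growing or shrinking the pool, can be sketched in a few lines of Python. This is a simplified illustration, not how any particular product works: the `Pool` class, its method names and the server names are all hypothetical, and health status is set by hand rather than by real health checks.

```python
class Pool:
    """Hypothetical server pool that routes only to servers marked healthy."""

    def __init__(self, servers):
        # Maps server name -> True if the server is online and available.
        self.health = {name: True for name in servers}
        self._counter = 0

    def mark(self, name, healthy):
        """Record the result of a health check (assumed to run elsewhere)."""
        self.health[name] = healthy

    def add_server(self, name):
        """Grow the pool, for example during a traffic spike."""
        self.health[name] = True

    def remove_server(self, name):
        """Shrink the pool when demand is low."""
        self.health.pop(name, None)

    def route(self):
        """Return the next healthy server, cycling through them in order."""
        healthy = [name for name, ok in self.health.items() if ok]
        if not healthy:
            raise RuntimeError("no healthy servers in the pool")
        choice = healthy[self._counter % len(healthy)]
        self._counter += 1
        return choice
```

If `web-1` is marked unhealthy, every call to `route()` skips it until it is marked healthy again, which is the failover behavior in miniature.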
A load balancer can be a physical appliance, a software instance or a combination of both. Traditionally, vendors have loaded proprietary software onto dedicated hardware and sold the combination to users as stand-alone appliances -- usually in pairs, to provide failover if one goes down. As networks grow, users must purchase additional or larger appliances.
In contrast, software load balancing runs on virtual machines (VMs) or white box servers, most likely as a function of an application delivery controller (ADC). ADCs typically offer additional features, such as caching, compression and traffic shaping. Popular in cloud environments, virtual load balancing can offer a high degree of flexibility -- for example, enabling users to automatically scale up or down to mirror traffic spikes or decreased network activity.
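As a rough illustration of automatic scaling, the sketch below computes how many servers a pool should run for a given request rate. The function name, its parameters and the assumption that every server handles a fixed number of requests per second are all hypothetical simplifications; real autoscaling policies typically weigh more signals than this.

```python
import math


def desired_server_count(requests_per_sec, capacity_per_server,
                         min_servers=1, max_servers=10):
    """Return a server count sized to the load, clamped to assumed pool limits."""
    needed = math.ceil(requests_per_sec / capacity_per_server)
    return max(min_servers, min(max_servers, needed))
```

With an assumed capacity of 100 requests per second per server, a load of 250 requests per second yields three servers, while an idle pool falls back to the configured minimum.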
Load-balancing algorithms determine which servers receive specific incoming client requests. Standard methods are as follows:
- The hash-based approach calculates a given client's preferred server based on designated keys, such as HTTP headers or IP address information. This method supports session persistence, or stickiness, which benefits applications that rely on user-specific stored state information, such as checkout carts on e-commerce sites.
- The least-connections method favors servers with the fewest ongoing transactions, i.e., the "least busy."
- The least-time algorithm considers both server response times and active connections -- sending new requests to the fastest servers with the fewest open requests.
- The round robin method -- historically, the load-balancing default -- simply cycles through a list of available servers in sequential order.
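The four methods listed above can each be sketched in a few lines of Python. The server names, function names and input dictionaries are hypothetical, and the least-time scoring formula (response time multiplied by open connections plus one) is just one plausible way to combine the two signals; actual products define their own.

```python
import itertools
import zlib

SERVERS = ["app-1", "app-2", "app-3"]  # assumed pool, for illustration only


def hash_based(key, servers=SERVERS):
    """Map a key (e.g., a client IP address) to the same server every time."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def least_connections(active):
    """Pick the server with the fewest open connections.

    active: dict mapping server name -> current connection count.
    """
    return min(active, key=active.get)


def least_time(active, avg_response_ms):
    """Pick the server with the lowest combined score of latency and load.

    The score used here is an assumption made for this sketch.
    """
    return min(active, key=lambda s: avg_response_ms[s] * (active[s] + 1))


def round_robin(servers=SERVERS):
    """Return an endless iterator that cycles through servers in order."""
    return itertools.cycle(servers)
```

Note that the simple modulo hashing shown here keeps a client on the same server only while the server list is unchanged; adding or removing a server remaps many keys, which is why some implementations use consistent hashing instead.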
These algorithms can vary significantly in sophistication and complexity. Weighted load-balancing algorithms, for example, also take into account server hierarchies -- with preferred, high-capacity servers receiving more traffic than those assigned lower weights.
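A weighted variant of round robin can be sketched as follows. The function name and the server names are hypothetical, and weights are assumed to be positive integers. Each server appears in the rotation as many times as its weight, so a server with weight 3 receives three times the traffic of a server with weight 1.

```python
import itertools


def weighted_round_robin(weights):
    """Cycle through servers in proportion to their integer weights.

    weights: dict mapping server name -> positive integer weight.
    """
    schedule = [server for server, weight in weights.items()
                for _ in range(weight)]
    return itertools.cycle(schedule)
```

This simple version sends consecutive requests to the same heavily weighted server before moving on; implementations that need smoother distribution usually interleave the servers instead.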