Load balancers are infrastructure components that distribute incoming network traffic across multiple backend servers. They increase capacity and add redundancy, keeping services accessible if one of your servers fails.
Load balancers act as the public gateway to your application. They’re specialized in their role so they can be highly optimized to maximize traffic throughput. Load balancers are usually configurable with several kinds of routing algorithm to match your application’s requirements.
In this article we’ll explore what load balancers are, how they work, and some of the complications they can cause. We’ll also explain the differences between the most common load balancing algorithms.
What Load Balancers Do
Load balancers are responsible for providing a reverse proxy in front of your application’s servers. All clients connect to this single proxy instead of the individual backend instances. The load balancer is responsible for selecting a server to handle each request. This occurs invisibly to the external client.
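To make the proxy role concrete, here's a minimal sketch of that behavior: clients connect to a single port, and each request is quietly forwarded to one of the backends. The backend addresses (ports 9001 and 9002), the port 8080 listener, and the GET-only handling are assumptions for illustration, not a production implementation.

```python
# Minimal reverse-proxy sketch: clients talk to port 8080, and each GET is
# forwarded to the next backend in a small pool. Backend addresses are assumed.
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKENDS = ["http://127.0.0.1:9001", "http://127.0.0.1:9002"]  # hypothetical pool
pool = itertools.cycle(BACKENDS)

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(pool)                    # select a server for this request
        with urllib.request.urlopen(backend + self.path) as upstream:
            body = upstream.read()
            status = upstream.status
        self.send_response(status)              # relay the backend's response
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # The client only ever sees this address; the backend choice is invisible.
    HTTPServer(("0.0.0.0", 8080), ProxyHandler).serve_forever()
```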
Both hardware- and software-based load balancer implementations are available. On the software side, most web servers, such as Apache and NGINX, are capable of fulfilling the role. Hardware load balancers are deployed as standalone infrastructure components by your hosting provider.
Load balancers usually monitor the health of instances in their backend server pool. Backends that become unhealthy stop being sent new traffic, reducing service flakiness and downtime. Similarly, load balancers generally let you add new backend instances at any time, so you can scale your service with extra capacity during peak hours.
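A simple way to picture this health monitoring is a background loop that periodically probes each backend and removes failing instances from rotation. The sketch below assumes each backend exposes a hypothetical /healthz endpoint that returns 200 when the instance is healthy.

```python
# Health-check sketch: probe each backend every few seconds and keep a set of
# healthy instances for the routing logic to draw from. The /healthz endpoint
# is an assumption about what the backends expose.
import time
import urllib.error
import urllib.request

BACKENDS = ["http://127.0.0.1:9001", "http://127.0.0.1:9002"]  # hypothetical pool
healthy = set(BACKENDS)

def is_healthy(backend):
    try:
        with urllib.request.urlopen(backend + "/healthz", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def monitor():
    while True:
        for backend in BACKENDS:
            if is_healthy(backend):
                healthy.add(backend)       # recovered backends rejoin the pool
            else:
                healthy.discard(backend)   # stop sending new traffic here
        time.sleep(5)
```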
The primary objective of a load balancer is to maximize throughput and make the most efficient use of available resources. Being able to scale horizontally across physical servers is usually more effective than vertically growing a single node with extra CPU or memory. Horizontal scaling gives you more redundancy as well as capacity while the overhead incurred by the load balancer layer is generally nominal.
Load Balancing Algorithms
Although the aim of load balancing is always to distribute traffic between multiple servers, there are several ways in which this can be achieved. Before looking at specific strategies, it’s important to identify the two fundamental types of algorithm you can select:
Static balancing – These methods work from hardcoded config values, making them completely predictable in their operation. This kind of algorithm doesn't take into account the state of the backend servers it can forward to, so it could keep sending new requests to an already congested instance.

Dynamic balancing – Dynamic algorithms tune themselves in real time based on traffic flow and the availability of servers in your pool. These strategies are able to automatically avoid instances that are already handling several requests. Dynamic load balancing can marginally increase overheads, as the load balancer has to track each request's completion status.
Static balancing systems are usually easier to configure, test, and inspect. Dynamic balancing is much more powerful and is usually the preferred choice for production applications. Within each of these classes, there are several specific routing strategies you can choose:
Round robin – Round robin is a static balancing method that directs requests to each server in turn. If you've got three servers A, B, and C, the first incoming request will go to A, the second to B, and the third to C. The load balancer will start again at A for the fourth request.

Weighted round robin – A variation of the round robin algorithm where admins define the relative priorities of each server in the pool. A heavily weighted server will be used more frequently, receiving a higher share of the traffic. This method lets you use the round robin strategy when your server pool comprises servers with unequal specifications.

Random – Many load balancers include a true random option as an alternative static choice.

Hashed – This static balancing strategy hashes the client's IP address to determine which of the backend servers will handle the request. This ensures the same instance serves every connection that originates from that client.

Fewest connections – This is a popular dynamic algorithm that always directs incoming requests to the server with the fewest open connections. In many applications this is the most effective way to maximize overall performance.

Highest bandwidth availability – This method sends new traffic to the server with the most available bandwidth. It's ideal in situations where individual requests are likely to use large amounts of bandwidth, even if the total request count remains low.

Custom health/load endpoint – Many load balancers include a way to make traffic distribution decisions based on custom metrics exposed by your backend servers. Queries could be made against CPU usage, memory consumption, and other critical measures using a mechanism such as SNMP.
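The sketch below illustrates how a few of the strategies above might choose a backend. The pool names, weights, and connection counts are hypothetical, and a real load balancer tracks this state internally rather than in plain dictionaries.

```python
# Backend selection sketches for several of the algorithms described above.
import hashlib
import itertools
import random

BACKENDS = ["app-a", "app-b", "app-c"]   # hypothetical server pool

# Round robin: step through the pool in order, wrapping back to the start.
_rr = itertools.cycle(BACKENDS)
def round_robin():
    return next(_rr)

# Weighted round robin: a heavier weight means a larger share of requests.
WEIGHTS = {"app-a": 3, "app-b": 1, "app-c": 1}   # assumed relative capacities
_wrr = itertools.cycle([b for b in BACKENDS for _ in range(WEIGHTS[b])])
def weighted_round_robin():
    return next(_wrr)

# Random: pick any backend with equal probability.
def random_choice():
    return random.choice(BACKENDS)

# Hashed: the same client IP always maps to the same backend.
def hashed(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return BACKENDS[int(digest, 16) % len(BACKENDS)]

# Fewest connections: route to whichever backend has the least work in flight.
def fewest_connections(open_connections):
    return min(open_connections, key=open_connections.get)

print(hashed("203.0.113.7"))                                      # stable choice
print(fewest_connections({"app-a": 12, "app-b": 3, "app-c": 7}))  # "app-b"
```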
Other Load Balancer Characteristics
Load balancers can create a few complications for your application. One of the most prevalent is the challenge of achieving sticky backend sessions. It’s common for systems to maintain state on the server and need to persist it between client connections.
You can mitigate this using the hashed balancing algorithm or a similar client-based option. This guarantees that connections from the same IP address terminate at a particular server. Most load balancers also offer an explicit sticky sessions option that looks for a nominated header or cookie in an HTTP request. This value can be used to consistently direct requests to the same server after the client’s initial connection.
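As a rough illustration, a cookie-based sticky session scheme might look like the following. The cookie name lb_backend and the backend names are assumptions; real load balancers implement this internally and typically sign or obfuscate the value.

```python
# Sticky-session sketch: the first response sets a cookie naming the chosen
# backend, and later requests carrying that cookie are routed back to it.
# The "lb_backend" cookie name is an assumption for illustration.
import random
from http.cookies import SimpleCookie

BACKENDS = ["app-a", "app-b", "app-c"]   # hypothetical pool

def pick_backend(request_headers):
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    if "lb_backend" in cookie and cookie["lb_backend"].value in BACKENDS:
        return cookie["lb_backend"].value, None      # stick with the earlier choice
    backend = random.choice(BACKENDS)                # first visit: pick any backend
    set_cookie = f"lb_backend={backend}; Path=/; HttpOnly"
    return backend, set_cookie                       # caller adds the Set-Cookie header

backend, _ = pick_backend({"Cookie": "lb_backend=app-b"})
print(backend)   # "app-b" on every subsequent request
```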
Load balancers can create complexity around SSL too. Many organizations configure SSL to terminate at the load balancer; connections between the load balancer and your backend servers are then made over regular HTTP. This usually results in a simpler setup with reduced maintenance demands.
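Conceptually, termination looks like the sketch below: the proxy holds the certificate and speaks HTTPS to clients, while onward connections to the backends stay as plain HTTP. The cert.pem and key.pem file names are assumptions for illustration.

```python
# SSL termination sketch: TLS ends at the proxy. Certificate files are assumed.
import ssl
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # In a real balancer, this is where a backend would be chosen and the
        # decrypted request forwarded onward over plain HTTP.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"terminated at the load balancer\n")

server = HTTPServer(("0.0.0.0", 8443), Handler)
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="cert.pem", keyfile="key.pem")  # assumed files
server.socket = context.wrap_socket(server.socket, server_side=True)
server.serve_forever()
```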
Using HTTP-only connections between the load balancer and your backends isn't always acceptable for security-critical workloads. Load balancers capable of performing SSL passthrough deliver traffic straight to your backend servers without decrypting the data first. However, this restricts the routing functionality you can use: because the load balancer can't decrypt incoming requests, you won't be able to perform matching based on attributes like headers and cookies.
Layer 4 and Layer 7 Load Balancers
Load balancing is often discussed in the context of Layer 4 (L4) and Layer 7 (L7) networking. These terms describe the point at which the load balancer routes traffic within the lifecycle of a network request.
A Layer 4 resource operates at the network transport level. These load balancers make routing decisions based on characteristics of the request’s transport, such as the TCP or UDP port that was used. Request-specific data isn’t considered.
A Layer 7 load balancer operates at the application layer. These load balancers can access complex data within the request and use it to inform workload-specific routing rules. This is where load balancing that accounts for a session ID in an HTTP header or cookie can take place.
Layer 7 load balancing is powerful but relatively resource-intensive: it needs to parse and inspect the content of each request before it can be passed to a backend. The packet-based nature of Layer 4 load balancers provides less control but has a correspondingly reduced impact on throughput. Layer 4 doesn't decrypt traffic either, so a load balancer compromise at this stage won't expose request data.
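One way to picture the difference: a Layer 4 decision can only use connection metadata such as the client address and port, while a Layer 7 decision can key off parsed request data. In the sketch below, the X-Session-ID header is a hypothetical example of such data.

```python
# Contrast sketch: transport-level (L4) vs application-level (L7) routing.
BACKENDS = ["app-a", "app-b", "app-c"]   # hypothetical pool

def l4_route(client_ip, client_port):
    # L4: only the connection tuple is visible; the payload is never read.
    # (Python's built-in hash is process-local; a real balancer uses a stable hash.)
    return BACKENDS[hash((client_ip, client_port)) % len(BACKENDS)]

def l7_route(headers):
    # L7: the request has been parsed, so routing can key off a session ID,
    # cookie, path, or any other request attribute.
    session = headers.get("X-Session-ID", "")
    return BACKENDS[hash(session) % len(BACKENDS)]

print(l4_route("203.0.113.7", 54021))
print(l7_route({"X-Session-ID": "abc123"}))
```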
Conclusion
Load balancers let you route incoming traffic between your servers. They’re a critical component of highly available networking architectures that let you transparently run multiple backend instances. This increases service capacity and avoids a total outage if one server goes offline.
Most load balancer implementations give you the choice of several different algorithms, including both static and dynamic options. Many applications are well served by simple choices such as “fewest connections” or “round robin” but more complex options are useful in specific situations.
It’s good practice to run every production application behind a load balancer. Doing so gives you the flexibility to scale on demand and react to unhealthy servers. Load balancing is usually easy to set up within your hosting stack or cloud provider’s networking infrastructure.