Imagine a small app where each client talks directly to a specific server, say https://10.0.0.1:8080. As traffic grows, that one server becomes a bottleneck, and if it crashes, the whole app is effectively down for any client pointing at it.
To scale out, you might spin up more servers like https://10.0.0.2:8080, https://10.0.0.3:8080, and so on. But now each client must somehow “know” which server to talk to and when to switch, which is brittle and hard to manage. DNS-based tricks (round-robin DNS) help a bit but react slowly to failures and don’t give you fine-grained control over distribution or health.
That operational pain is why Load Balancers exist: they sit in front of a pool of backend servers and act as a traffic director, distributing incoming requests across them based on some algorithm.
```java
public class LoadBalancer {
    private final List<BackendServer> servers;

    public LoadBalancer(List<BackendServer> servers) {
        this.servers = servers;
    }

    public void handleRequest(String request) {
        BackendServer server = chooseServer(request);
        server.handleRequest(request);
    }

    private BackendServer chooseServer(String request) {
        // algorithm for choosing a server goes here
        throw new UnsupportedOperationException("not implemented yet");
    }
}
```

Layer 4 vs. Layer 7 Load Balancing #
Load balancers typically operate at different layers of the OSI model:
- Layer 4 (Transport Layer): Operates at the network and transport layer (TCP/UDP). It makes routing decisions based on IP addresses and port numbers. It is extremely fast because it doesn’t inspect the application-level data (like HTTP headers or cookies).
- Layer 7 (Application Layer): Operates at the application layer (HTTP/HTTPS). It can inspect the content of the request (URL, headers, cookies, query parameters). This allows for much more sophisticated routing, such as sending traffic for /api/v1/users to one set of servers and /api/v1/orders to another.
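As a rough sketch of how Layer 7 path-based routing might look, consider a router that matches the request path against a list of prefix rules. The class name, pool names, and "default-pool" fallback here are illustrative assumptions, not part of any real proxy:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Layer7Router {
    // Maps a path prefix to a named backend pool; insertion order decides priority
    private final Map<String, String> rules = new LinkedHashMap<>();

    public void addRule(String pathPrefix, String backendPool) {
        rules.put(pathPrefix, backendPool);
    }

    /** Returns the pool for the first matching prefix, or a default pool. */
    public String route(String path) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (path.startsWith(rule.getKey())) {
                return rule.getValue();
            }
        }
        return "default-pool";
    }

    public static void main(String[] args) {
        Layer7Router router = new Layer7Router();
        router.addRule("/api/v1/users", "users-pool");
        router.addRule("/api/v1/orders", "orders-pool");
        System.out.println(router.route("/api/v1/users/42"));  // users-pool
        System.out.println(router.route("/api/v1/orders"));    // orders-pool
        System.out.println(router.route("/static/logo.png"));  // default-pool
    }
}
```

A real Layer 7 load balancer would parse the full HTTP request, but the routing decision reduces to a lookup like this one.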
Server Selection Algorithms #
To decide which server should receive a request, load balancers use various algorithms:
1. Round Robin #
A classic algorithm where requests are distributed sequentially across the list of servers. Simple but doesn’t account for server load or capacity.
```java
private final AtomicInteger currentIndex = new AtomicInteger(0);

private BackendServer chooseServer(String request) {
    int index = currentIndex.getAndUpdate(i -> (i + 1) % servers.size());
    return servers.get(index);
}
```

Real load balancers also monitor backend health and avoid sending traffic to servers that are down or misbehaving. Typical implementations use periodic health checks (e.g., an HTTP GET to /health) and mark servers healthy or unhealthy based on the response.
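A minimal sketch of such a health checker, using the JDK's built-in HttpClient. The /health path and the 2-second timeouts are assumptions for illustration, not part of the load balancer above:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HealthChecker {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();
    // server base URL -> last known health status
    private final Map<String, Boolean> health = new ConcurrentHashMap<>();

    /** Probes GET <baseUrl>/health and records whether it returned 200. */
    public boolean probe(String baseUrl) {
        try {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(baseUrl + "/health"))
                    .timeout(Duration.ofSeconds(2))
                    .GET()
                    .build();
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            boolean healthy = response.statusCode() == 200;
            health.put(baseUrl, healthy);
            return healthy;
        } catch (Exception e) {
            // Timeouts and connection failures count as unhealthy
            health.put(baseUrl, false);
            return false;
        }
    }

    public boolean isHealthy(String baseUrl) {
        return health.getOrDefault(baseUrl, false);
    }
}
```

In production this would run on a scheduled executor, and `chooseServer` would skip any server whose last probe failed.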
2. Sticky Sessions #
In a normal load balancer, each request is routed independently. This is fine for stateless applications, but it’s a problem for stateful applications where the server needs to maintain session information for each client.
One way to handle this is by maintaining a ConcurrentHashMap of session ID to server mapping on the load balancer. However, this only lives in one JVM instance. If you have multiple load balancer instances, you need to sync session information across them, which is complex and inefficient.
This is why many platforms use cookie-based or hashing-based affinity rather than in-memory session maps.
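A sketch of cookie-based affinity: the load balancer's sticky decision is encoded in a cookie that the client replays on every request, so no load balancer instance needs to store the mapping. The cookie name and the round-robin fallback here are illustrative assumptions:

```java
import java.util.List;
import java.util.Optional;

public class CookieAffinityBalancer {
    private static final String COOKIE_NAME = "LB_STICKY"; // illustrative name
    private final List<String> serverIds;
    private int next = 0;

    public CookieAffinityBalancer(List<String> serverIds) {
        this.serverIds = serverIds;
    }

    /**
     * If the request carries a sticky cookie naming a live server, honor it;
     * otherwise pick a server round-robin. The caller is expected to set
     * COOKIE_NAME=<serverId> on the response so the client replays it.
     */
    public String choose(Optional<String> stickyCookie) {
        if (stickyCookie.isPresent() && serverIds.contains(stickyCookie.get())) {
            return stickyCookie.get();
        }
        String chosen = serverIds.get(next);
        next = (next + 1) % serverIds.size();
        return chosen;
    }
}
```

Because the state lives in the client's cookie, any load balancer instance can make the same decision without shared storage.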
3. Hashing Based Sticky Sessions #
A straightforward way to get stickiness is:
- Choose a key like client IP or session ID.
- Compute a hash h(key).
- Route to the server at index h(key) % N, where N is the number of servers.
This works as long as the set of servers never changes. But as soon as you add or remove a server, N changes and almost all keys get remapped to different indices, which destroys cache locality and sticky mappings.
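The scale of the problem is easy to measure: count how many keys land on a different index when N grows from 4 to 5. The key format below is arbitrary; only the modulo behavior matters:

```java
public class ModuloRemapDemo {
    /** Fraction of keys whose (hash % n) index changes when n grows to m. */
    public static double movedFraction(int keys, int n, int m) {
        int moved = 0;
        for (int k = 0; k < keys; k++) {
            int hash = ("key-" + k).hashCode() & 0x7fffffff; // force non-negative
            if (hash % n != hash % m) {
                moved++;
            }
        }
        return (double) moved / keys;
    }

    public static void main(String[] args) {
        // With plain modulo hashing, roughly 4 out of 5 keys move
        // when going from 4 to 5 servers.
        System.out.printf("Moved: %.0f%%%n", 100 * movedFraction(10_000, 4, 5));
    }
}
```

For a uniform hash, a key keeps its index only when hash % 4 == hash % 5, which happens for just 4 of every 20 residues, so about 80% of keys remap.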
Consistent Hashing: Step by Step #
Consistent Hashing is a scheme that allows us to minimize the number of keys that get remapped when the number of servers changes.
- Hash Space as a Ring: Imagine all possible hash values (e.g., 0 to 2^32 - 1) arranged on a circle; this is the “hash ring”.
- Place Servers on the Ring: For each backend server, hash its identifier (e.g., IP address or hostname) to a point on the ring.
- Place Keys on the Ring: For each key (e.g., client IP or session ID), hash it into the space and find the first server clockwise from the key’s position. That server is responsible for the key.
- Adding or Removing a Server: When a server is added, it takes over only the keys that fall between its position and the previous server’s position (counter-clockwise). When a server is removed, its keys are reassigned to the next server clockwise.
Virtual Nodes for Better Balance #
If you place each physical server once on the ring, the distribution can be uneven. To reduce this imbalance, each physical server is mapped to multiple points on the ring. These are called Virtual Nodes.
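A quick experiment makes the effect visible: hash 10,000 keys onto a ring built with 1 virtual node per server versus 200, and compare how evenly three servers share the keys. This mini ring inlines the same TreeMap-plus-SHA-256 approach as the full implementation that follows; the server names and counts are arbitrary:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class VirtualNodeDemo {
    static long hash(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong(); // first 8 bytes as a long
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Counts keys per server for a ring with the given virtual-node count. */
    static Map<String, Integer> distribute(String[] servers, int vnodes, int keys) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String s : servers) {
            for (int i = 0; i < vnodes; i++) {
                ring.put(hash(s + "#" + i), s);
            }
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int k = 0; k < keys; k++) {
            Map.Entry<Long, String> e = ring.ceilingEntry(hash("key-" + k));
            if (e == null) e = ring.firstEntry(); // wrap around the ring
            counts.merge(e.getValue(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] servers = {"server-A", "server-B", "server-C"};
        System.out.println("1 vnode:    " + distribute(servers, 1, 10_000));
        System.out.println("200 vnodes: " + distribute(servers, 200, 10_000));
    }
}
```

With a single point per server, one server routinely ends up owning a large arc of the ring; with 200 virtual nodes the counts cluster much closer to an even third each.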
Now, let's incrementally design a small, self-contained Java implementation that contains:
a. a dynamic pool of backend servers
b. a hash function to distribute our keys onto the consistent hash ring
c. a consistent hash ring with virtual nodes
d. a router for requests based on some sticky key (e.g., session ID or client IP)
a. Backend Server #
```java
public class BackendServer {
    private final String id;
    private final String host;
    private final int port;

    public BackendServer(String id, String host, int port) {
        this.id = id;
        this.host = host;
        this.port = port;
    }

    public String getId() {
        return id;
    }

    public String getHost() {
        return host;
    }

    public int getPort() {
        return port;
    }

    @Override
    public String toString() {
        return "BackendServer{" +
                "id='" + id + '\'' +
                ", host='" + host + '\'' +
                ", port=" + port +
                '}';
    }
}
```

b. Hash Function #
```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class HashUtils {
    private static final String HASH_ALGORITHM = "SHA-256";

    public static long hashToLong(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance(HASH_ALGORITHM);
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            // Take the first 8 bytes of the digest and interpret them as a long
            ByteBuffer buffer = ByteBuffer.wrap(digest);
            return buffer.getLong();
        } catch (NoSuchAlgorithmException e) {
            // Should not happen for SHA-256 in a standard JDK
            throw new RuntimeException("Hash algorithm not available", e);
        }
    }
}
```

c. Consistent Hash Ring #
```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Objects;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final NavigableMap<Long, BackendServer> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes) {
        if (virtualNodes <= 0) {
            throw new IllegalArgumentException("virtualNodes must be > 0");
        }
        this.virtualNodes = virtualNodes;
    }

    public synchronized void addServer(BackendServer server) {
        Objects.requireNonNull(server, "server must not be null");
        for (int i = 0; i < virtualNodes; i++) {
            String vnodeKey = server.getId() + "#" + i;
            long hash = HashUtils.hashToLong(vnodeKey);
            ring.put(hash, server);
        }
    }

    public synchronized void removeServer(BackendServer server) {
        Objects.requireNonNull(server, "server must not be null");
        for (int i = 0; i < virtualNodes; i++) {
            String vnodeKey = server.getId() + "#" + i;
            long hash = HashUtils.hashToLong(vnodeKey);
            ring.remove(hash);
        }
    }

    public synchronized BackendServer getServerForKey(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        long hash = HashUtils.hashToLong(key);
        // Find the first entry >= hash
        Map.Entry<Long, BackendServer> entry = ring.ceilingEntry(hash);
        if (entry == null) {
            // Wrap around to the first entry
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    public synchronized int size() {
        return ring.size();
    }

    public synchronized Collection<BackendServer> getAllServersSnapshot() {
        // Defensive copy, so callers don't hold a live view of the ring;
        // note that each server appears once per virtual node
        return new ArrayList<>(ring.values());
    }
}
```

d. ConsistentHashLoadBalancer #
```java
public class ConsistentHashLoadBalancer {
    private final ConsistentHashRing ring;

    public ConsistentHashLoadBalancer(int virtualNodesPerServer) {
        this.ring = new ConsistentHashRing(virtualNodesPerServer);
    }

    public void registerServer(BackendServer server) {
        ring.addServer(server);
    }

    public void unregisterServer(BackendServer server) {
        ring.removeServer(server);
    }

    /**
     * Route based on a sticky key, e.g. session ID, client IP, or custom header.
     */
    public BackendServer routeRequest(String stickyKey) {
        if (stickyKey == null) {
            throw new IllegalArgumentException("stickyKey must not be null");
        }
        return ring.getServerForKey(stickyKey);
    }
}
```

Now let's verify the above usage with a test:
```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.junit.Test;

public class ConsistentHashLoadBalancerTest {

    @Test
    public void testConsistentLoadBalancer() {
        ConsistentHashLoadBalancer lb = new ConsistentHashLoadBalancer(100);
        BackendServer s1 = new BackendServer("server-A", "10.0.0.1", 8080);
        BackendServer s2 = new BackendServer("server-B", "10.0.0.2", 8080);
        BackendServer s3 = new BackendServer("server-C", "10.0.0.3", 8080);
        lb.registerServer(s1);
        lb.registerServer(s2);
        lb.registerServer(s3);

        String[] clients = {
                "user-1-session",
                "user-2-session",
                "user-3-session",
                "user-4-session",
                "user-5-session"
        };

        System.out.println("=== Initial routing ===");
        Map<String, BackendServer> initialMapping = new LinkedHashMap<>();
        for (String client : clients) {
            BackendServer server = lb.routeRequest(client);
            initialMapping.put(client, server);
            System.out.printf("Client %s -> %s%n", client, server);
        }

        // Now add a new server and see who moved
        BackendServer s4 = new BackendServer("server-D", "10.0.0.4", 8080);
        lb.registerServer(s4);

        System.out.println("\n=== After adding server-D ===");
        int moved = 0;
        for (String client : clients) {
            BackendServer server = lb.routeRequest(client);
            BackendServer before = initialMapping.get(client);
            boolean same = before.getId().equals(server.getId());
            if (!same) {
                moved++;
            }
            System.out.printf(
                    "Client %s -> %s (was %s)%s%n",
                    client,
                    server,
                    before,
                    same ? "" : " <-- moved"
            );
        }
        System.out.printf("%nTotal clients moved: %d out of %d%n", moved, clients.length);
    }
}
```

Output:
=== Initial routing ===
Client user-1-session -> BackendServer{id='server-A', host='10.0.0.1', port=8080}
Client user-2-session -> BackendServer{id='server-C', host='10.0.0.3', port=8080}
Client user-3-session -> BackendServer{id='server-B', host='10.0.0.2', port=8080}
Client user-4-session -> BackendServer{id='server-C', host='10.0.0.3', port=8080}
Client user-5-session -> BackendServer{id='server-A', host='10.0.0.1', port=8080}
=== After adding server-D ===
Client user-1-session -> BackendServer{id='server-D', host='10.0.0.4', port=8080} (was BackendServer{id='server-A', host='10.0.0.1', port=8080}) <-- moved
Client user-2-session -> BackendServer{id='server-C', host='10.0.0.3', port=8080} (was BackendServer{id='server-C', host='10.0.0.3', port=8080})
Client user-3-session -> BackendServer{id='server-B', host='10.0.0.2', port=8080} (was BackendServer{id='server-B', host='10.0.0.2', port=8080})
Client user-4-session -> BackendServer{id='server-C', host='10.0.0.3', port=8080} (was BackendServer{id='server-C', host='10.0.0.3', port=8080})
Client user-5-session -> BackendServer{id='server-A', host='10.0.0.1', port=8080} (was BackendServer{id='server-A', host='10.0.0.1', port=8080})
Total clients moved: 1 out of 5

Conclusion #
Consistent Hashing is a cornerstone technique for distributing traffic in stateful systems. Simple algorithms like Round Robin or plain modulo hashing are a good starting point, but they cannot provide the stability needed when the set of backend nodes changes dynamically.
This pattern is fundamental to modern distributed architecture: it powers client-side sharding for caches like Redis and Memcached, partitioned databases like Cassandra and DynamoDB, and content delivery networks (CDNs), where maximizing cache hit rates is critical. By mapping both servers and keys onto a shared hash ring and using techniques like virtual nodes, we get even load distribution and minimal data movement during scaling events. Mastering these concepts is essential for building the resilient, auto-scaling infrastructure that powers the web today.