Load Balancing

A load balancer sits between clients and your servers, distributing incoming requests so that no single server is overwhelmed. Without one, horizontal scaling gives you capacity that clients cannot reach. With one, you can add or remove servers transparently, because clients always hit the same address.

Load balancers use different algorithms to choose a server for each request. Round-robin cycles through the servers in a fixed order. Least-connections routes to the server with the fewest active connections, which works better when request durations vary widely. Consistent hashing maps each client (or key) to a server deterministically, so when a server is added or removed only a small fraction of clients are remapped and per-server caches stay warm. Layer 7 (application-layer) load balancers can also route on URL path or headers, for example sending `/api/*` to one cluster and static assets to another.
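The four strategies above can each be sketched in a few lines of Python. This is a minimal illustration, assuming servers are identified by plain strings; the class names, the `pick`/`release` methods, the `route` helper, and the default of 100 virtual nodes per server are hypothetical choices, not the API of any real load balancer. Production balancers layer weighting, health checks, and thread safety on top of this.

```python
import bisect
import hashlib
import itertools


class RoundRobin:
    """Cycle through servers in a fixed order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest in-flight requests (not thread-safe)."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        # min() returns the first minimal entry, so ties go to the earliest server listed.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # caller must call release() when the request finishes
        return server

    def release(self, server):
        self.active[server] -= 1


def _hash(key):
    # Stable 64-bit hash; Python's built-in hash() of a str is salted per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Map keys to servers on a hash ring, using virtual nodes to even out load."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, server) pairs
        for s in servers:
            self.add(s)

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def pick(self, key):
        # First virtual node clockwise from the key's hash, wrapping past the end.
        i = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[i % len(self._ring)][1]


def route(path, routes, default):
    """Layer 7 style routing: the longest matching path prefix picks the pool."""
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            return routes[prefix]
    return default
```

With three servers on the ring, adding a fourth moves roughly a quarter of the keys, and every key that moves lands on the new server. A naive `hash(key) % n` scheme would instead remap most keys whenever `n` changes, which is exactly the cache churn consistent hashing avoids.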

Before

Single server — one failure kills everything

Client → Server A (single point of failure)

If Server A goes down:
→ All requests fail immediately
→ Adding capacity requires downtime
→ Deployment requires taking the app offline

After

Load balancer + server pool

Client → Load Balancer → Server A (healthy)
                            ↘ Server B (healthy)
                            ↘ Server C (draining)

Benefits:
→ Server failure: health checks pull it from rotation and traffic shifts to the others within seconds
→ Add Server D with zero downtime
→ Deploy by draining one server at a time
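These benefits rely on the balancer tracking each server's state. Below is a minimal Python sketch, assuming a hypothetical `ServerPool` whose `mark()` method is called by an external health checker (not shown). Only healthy servers receive new requests, a draining server finishes its in-flight work before it is safe to stop, and a new server joins without touching the others. The state names mirror the diagram; round-robin among healthy servers is an illustrative assumption.

```python
from enum import Enum


class State(Enum):
    HEALTHY = "healthy"
    DRAINING = "draining"  # finishes in-flight requests but receives no new ones
    DOWN = "down"          # failed health checks


class ServerPool:
    """Hypothetical pool: routes only to HEALTHY servers, round-robin."""

    def __init__(self):
        self.state = {}
        self.active = {}  # in-flight request count per server
        self._next = 0

    def add(self, server):
        # A new server joins the rotation immediately; existing servers are untouched.
        self.state[server] = State.HEALTHY
        self.active[server] = 0

    def mark(self, server, state):
        # Assumed to be called by a health checker or a deploy script.
        self.state[server] = state

    def eligible(self):
        return [s for s, st in self.state.items() if st is State.HEALTHY]

    def pick(self):
        servers = self.eligible()
        if not servers:
            raise RuntimeError("no healthy servers")
        server = servers[self._next % len(servers)]
        self._next += 1
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1

    def drained(self, server):
        # Safe to stop or redeploy once a draining server has no in-flight requests.
        return self.state[server] is State.DRAINING and self.active[server] == 0
```

A rolling deploy then repeats one loop per server: `mark(s, State.DRAINING)`, wait until `drained(s)`, redeploy, and `mark(s, State.HEALTHY)`.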

Key Takeaway

A load balancer is the entry point for horizontal scaling — it makes your server fleet look like one machine to the outside world.