Load Balancing: Layer 4 vs Layer 7

July 4, 2026 · 9 min read

Load balancers distribute incoming traffic across multiple servers so no single machine becomes a bottleneck. But not all load balancers work the same way. Layer 4 (transport) balancers route by IP address and port — fast and protocol-agnostic. Layer 7 (application) balancers inspect HTTP headers, URLs, and cookies to make smarter routing decisions.

Choosing the wrong layer creates either unnecessary complexity or missed optimization opportunities. An L7 balancer can route /api to your API servers and /static to a CDN, but adds parsing overhead. An L4 balancer handles millions of connections per second but cannot distinguish between API and static traffic on the same port. This article explains both layers, common algorithms, and how AWS, NGINX, and cloud-native setups combine them.

  • Layer 4 (Transport): routes by IP + port; fast, any TCP/UDP traffic; no HTTP awareness.
  • Layer 7 (Application): routes by URL, headers, cookies; SSL termination, path routing; slightly more latency.
Figure: Layer 4 vs Layer 7 load balancing

Layer 4 (Transport) Load Balancing

An L4 load balancer operates at the TCP/UDP level. It sees source IP, destination IP, and port — nothing about HTTP paths or headers. When a client connects to port 443, the balancer picks a backend server (using round-robin, least connections, or IP hash) and forwards the raw TCP stream. It does not terminate SSL, parse JSON, or read cookies.

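This makes L4 balancers extremely fast: they can handle millions of concurrent connections with minimal CPU. AWS Network Load Balancer (NLB), HAProxy in TCP mode, and Linux IPVS all operate at this layer. Use L4 when you need to balance non-HTTP traffic (databases and other raw TCP/UDP protocols), when long-lived connections such as WebSockets or gRPC streams should pass through untouched, or when raw throughput matters more than content-aware routing. Keep in mind that gRPC and WebSockets are built on HTTP, so an L4 balancer spreads their connections, not the individual requests inside them.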
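The three selection algorithms mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not how NLB, HAProxy, or IPVS implement them; the class names and the `pick` / `release` methods are assumptions made for the example.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest open connections."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self, client_ip=None):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when a connection closes so the count stays accurate.
        self.active[backend] -= 1


class IPHash:
    """Map a client IP to a stable backend (sticky by client IP)."""

    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.backends)
        return self.backends[index]
```

Note that this simple modulo-based IP hash remaps most clients whenever the backend list changes; consistent hashing is the usual remedy when that matters.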

Client → L4 Load Balancer → Server 1, Server 2, Server 3
L4: distribute connections by IP/port

Quick reference

  • Best for: TCP/UDP traffic, WebSockets, database connections, maximum throughput, multi-protocol balancing.
  • Strengths: very fast, low latency, handles any protocol, preserves client IP with PROXY protocol.
  • Weaknesses: no URL routing, no SSL termination in plain TCP mode (encrypted traffic passes through to the backends), no header-based decisions.
  • Algorithms: round-robin, least connections, IP hash (sticky sessions by client IP).
  • AWS NLB supports millions of requests per second with static IP and low latency.
  • Use PROXY protocol to pass the real client IP to backend servers behind the balancer.
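To make the PROXY protocol point concrete, here is a rough sketch of how a backend might read the human-readable version 1 header that the balancer prepends to each TCP stream. The `ProxyInfo` type and `parse_proxy_v1` function are hypothetical names for this example; it covers only the text-based v1 format and skips the binary v2 format and address validation.

```python
from typing import NamedTuple, Optional, Tuple


class ProxyInfo(NamedTuple):
    family: str      # "TCP4" or "TCP6"
    src_ip: str      # real client address
    dst_ip: str
    src_port: int
    dst_port: int


def parse_proxy_v1(data: bytes) -> Tuple[Optional[ProxyInfo], bytes]:
    """Split a PROXY protocol v1 line off the front of a TCP stream.

    Returns (info, remaining_bytes). info is None for "PROXY UNKNOWN".
    """
    line, sep, rest = data.partition(b"\r\n")
    if not sep or not line.startswith(b"PROXY "):
        raise ValueError("missing PROXY protocol v1 header")
    parts = line.decode("ascii").split(" ")
    if parts[1] == "UNKNOWN":
        return None, rest
    if parts[1] not in ("TCP4", "TCP6") or len(parts) != 6:
        raise ValueError("malformed PROXY protocol v1 header")
    family, src_ip, dst_ip, src_port, dst_port = parts[1:]
    return ProxyInfo(family, src_ip, dst_ip, int(src_port), int(dst_port)), rest
```

A backend should only accept this header from balancers it trusts, since any client could otherwise forge its own source address.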

Remember this

L4 load balancers are the throughput champions — use them when speed and protocol flexibility matter.

Layer 7 (Application) Load Balancing

An L7 load balancer understands HTTP. It terminates SSL, reads the Host header, URL path, cookies, and query parameters to decide where to route each request. Route /api/* to your API cluster, /admin/* to internal servers, and everything else to the frontend. Add a header check to send mobile clients to a different backend version.

This enables powerful patterns: path-based routing, host-based routing (api.example.com vs www.example.com), cookie-based sticky sessions, request rewriting, and WAF integration. AWS Application Load Balancer (ALB), NGINX, and Traefik all operate at L7. The trade-off is higher latency per request and lower maximum throughput compared to L4.

Client → L7 Load Balancer: /api → API, /static → CDN
L7: route by URL path and headers
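The routing decisions described in this section can be sketched as a first-match rule table. This is an illustrative model only, not the rule engine of ALB, NGINX, or Traefik; the `Rule` type, the pool names, and the `X-Client` header are assumptions invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Rule:
    pool: str
    host: Optional[str] = None          # exact Host header match, if set
    path_prefix: Optional[str] = None   # URL path prefix match, if set
    header: Optional[tuple] = None      # (name, value) that must be present


def route(rules, host: str, path: str, headers: Dict[str, str], default: str) -> str:
    """Return the pool of the first rule whose conditions all match."""
    lowered = {k.lower(): v for k, v in headers.items()}
    for rule in rules:
        if rule.host is not None and rule.host != host:
            continue
        if rule.path_prefix is not None and not path.startswith(rule.path_prefix):
            continue
        if rule.header is not None:
            name, value = rule.header
            if lowered.get(name.lower()) != value:
                continue
        return rule.pool
    return default


# More specific rules come first because the first match wins.
RULES = [
    Rule(pool="api-mobile", path_prefix="/api/", header=("X-Client", "mobile")),
    Rule(pool="api", path_prefix="/api/"),
    Rule(pool="admin-internal", path_prefix="/admin/"),
    Rule(pool="api", host="api.example.com"),
]
```

For example, `route(RULES, "www.example.com", "/api/users", {}, "frontend")` selects the `api` pool, while a request for `/index.html` falls through to the `frontend` default.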

Quick reference

  • Best for: HTTP/HTTPS APIs, microservices routing, SSL termination, content-based routing.
  • Strengths: URL/path routing, SSL termination, header manipulation, WAF integration, health checks on HTTP endpoints.
  • Weaknesses: higher latency, HTTP-only (without extensions), more complex configuration.
  • Route by path: /api → API servers, /static → CDN, /ws → WebSocket servers.
  • Use health check endpoints (/health) so the balancer removes unhealthy backends automatically.
  • Combine with CDN for static assets — do not load-balance what CloudFront can cache.
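The health check behaviour above can be sketched as a small state tracker that takes a backend out of rotation after several failed probes and puts it back after several successes. The `fall` and `rise` thresholds, the class name, and the `http_probe` helper are assumptions for this example; real balancers expose similar knobs under their own names.

```python
import http.client
import urllib.request
from typing import Callable, Dict, List


def http_probe(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with status 200."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False


class HealthChecker:
    """Track backend health with rise/fall thresholds."""

    def __init__(self, backends: List[str], probe: Callable[[str], bool],
                 fall: int = 3, rise: int = 2):
        self.probe = probe
        self.fall = fall      # consecutive failures before removal
        self.rise = rise      # consecutive successes before re-adding
        self.healthy: Dict[str, bool] = {b: True for b in backends}
        self._streak: Dict[str, int] = {b: 0 for b in backends}

    def run_once(self) -> None:
        for backend, is_healthy in self.healthy.items():
            ok = self.probe(backend)
            if ok == is_healthy:
                self._streak[backend] = 0   # state confirmed, reset counter
                continue
            self._streak[backend] += 1
            threshold = self.rise if ok else self.fall
            if self._streak[backend] >= threshold:
                self.healthy[backend] = ok
                self._streak[backend] = 0

    def in_rotation(self) -> List[str]:
        return [b for b, ok in self.healthy.items() if ok]
```

In use, something like `HealthChecker(backends, probe=http_probe)` would have `run_once()` called on a timer, and the balancer would choose only from `in_rotation()`. The thresholds keep a single slow response from flapping a backend in and out of the pool.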

Remember this

L7 load balancers are the smart routers — use them when HTTP-aware routing and SSL termination are needed.

Combining L4 and L7 in Production

Most production architectures use both layers. An L4 load balancer such as NLB sits at the edge for high-throughput entry and DDoS protection, sometimes with DNS round-robin in front of it to spread clients across entry points. Behind it, L7 load balancers (ALB or NGINX Ingress) handle SSL termination, path routing, and service discovery in a Kubernetes cluster.

In Kubernetes, the typical flow is: Internet → Cloud Load Balancer (L4) → Ingress Controller (L7) → Service → Pod. The Ingress Controller reads the URL and routes to the correct microservice. For gRPC or database traffic that bypasses HTTP, an L4 service load balancer handles distribution directly.
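The flow above can be modelled as three small stages chained together. This is a deliberately simplified sketch: the names `l4_pick`, `Ingress`, `Service`, and `handle` are hypothetical, and real Kubernetes components pick Pods differently (for example, kube-proxy does not do strict round-robin in every mode).

```python
import hashlib
import itertools


def l4_pick(ingresses, src_ip, src_port, dst_ip, dst_port):
    """L4 tier: hash the connection tuple; no HTTP data is consulted."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(ingresses)
    return ingresses[index]


class Ingress:
    """L7 tier: map a URL path prefix to a Service name."""

    def __init__(self, name, rules, default):
        self.name = name
        self.rules = rules          # list of (prefix, service) pairs
        self.default = default

    def route(self, path):
        for prefix, service in self.rules:
            if path.startswith(prefix):
                return service
        return self.default


class Service:
    """Service tier: spread requests over Pods in rotation."""

    def __init__(self, pods):
        self._pods = itertools.cycle(pods)

    def pick_pod(self):
        return next(self._pods)


def handle(conn, path, ingresses, services):
    """Internet -> L4 -> Ingress (L7) -> Service -> Pod."""
    ingress = l4_pick(ingresses, *conn)
    service = ingress.route(path)
    return ingress.name, service, services[service].pick_pod()
```

The point of the split is visible in the signatures: the L4 stage never sees `path`, and the L7 stage never needs the connection tuple.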

Quick reference

  • Edge: L4 NLB for DDoS protection and high-throughput entry.
  • Application: L7 ALB/Ingress for SSL, path routing, and microservice dispatch.
  • Internal: L4 for service-to-service gRPC or database connections.
  • Health checks: L4 checks that the TCP port accepts connections; L7 checks HTTP /health; both remove unhealthy nodes from rotation.
  • Auto-scaling: attach target groups to ASG/EKS so new instances join the pool automatically.
  • Global: use GeoDNS or AWS Global Accelerator (L4) for multi-region entry points.

Remember this

Put L4 at the edge for throughput and L7 inside for smart HTTP routing; they complement each other.

Key takeaway

Load balancing is not one-size-fits-all. Use L4 when you need raw speed and protocol flexibility. Use L7 when HTTP routing, SSL termination, and content-aware decisions matter. In cloud-native systems, you almost always need both — an L4 entry point and L7 routing inside the cluster. Understanding the OSI layer your balancer operates at is the first step to designing a scalable, resilient architecture.
