Rate Limiting: Token Bucket vs Sliding Window
Every public API needs rate limiting — to prevent abuse, protect downstream services, and ensure fair usage across tenants. Without it, one aggressive client can overwhelm your database while others get timeouts. The two most common algorithms are the token bucket and the sliding window log — each with different burst tolerance and precision characteristics.
A fixed window counter is simple but allows burst spikes at window boundaries. A leaky bucket smooths traffic but adds latency. Token buckets allow controlled bursts while maintaining an average rate. Sliding windows count requests in a rolling interval for precise limits. This article compares the algorithms, shows how to implement them on top of Redis, and explains where API gateways like Kong and AWS API Gateway apply each.
Token Bucket
The token bucket algorithm maintains a bucket with a maximum capacity (burst limit) and a refill rate (sustained limit). Each incoming request consumes one token. If tokens are available, the request proceeds; if the bucket is empty, the request is rejected with HTTP 429 Too Many Requests. Tokens refill continuously at the configured rate — e.g. 100 tokens per minute with a bucket size of 20 allows 20 immediate requests, then throttles to ~1.67 per second.
This burst-friendly behavior makes token buckets ideal for APIs where occasional spikes are acceptable, such as a dashboard loading ten endpoints at once. AWS API Gateway and Stripe both describe their throttling as token bucket variants. Implementation is simple: store {tokens, lastRefillTime} in Redis and compute the refilled tokens on each request, doing the read-modify-write atomically (for example in a Lua script) so concurrent requests cannot spend the same token.
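The refill-on-request logic described above can be sketched in a few lines. This is a minimal in-memory sketch, not a production limiter: the class name TokenBucket, its parameters, and the injectable clock are assumptions made for illustration. A Redis-backed version would keep the same two fields under a per-client key.

```python
import time
from typing import Callable


class TokenBucket:
    """In-memory token bucket (hypothetical helper for illustration)."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained rate
        self.clock = clock                    # injectable so tests can fake time
        self.tokens = capacity                # the bucket starts full
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Tokens accrue continuously but never exceed the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller would respond with HTTP 429
```

With capacity=20 and refill_per_sec=100/60, this reproduces the example from the text: 20 requests pass immediately, after which requests pass at roughly 1.67 per second.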
Quick reference
- Best for: APIs with acceptable burst traffic, user-facing endpoints, payment APIs.
- Strengths: allows controlled bursts, smooth average rate, simple Redis implementation.
- Weaknesses: burst can still overwhelm downstream if bucket size is too large.
- Parameters: bucket size (max burst) + refill rate (sustained requests per second).
- Return Retry-After header on 429 so clients know when to retry.
- Use per-user and per-IP limits — a user on a shared IP should not block others.
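For a token bucket, the Retry-After value follows from the refill rate: the wait is the token deficit divided by the rate, rounded up because the header carries whole seconds. A small sketch, assuming the limiter exposes its current token count (the function name retry_after_seconds is hypothetical):

```python
import math


def retry_after_seconds(tokens: float, refill_per_sec: float,
                        cost: float = 1.0) -> int:
    """Whole seconds until `cost` tokens are available again."""
    if tokens >= cost:
        return 0  # the request could proceed right now
    deficit = cost - tokens
    # Round up so the client never retries slightly too early.
    return math.ceil(deficit / refill_per_sec)
```

At 100 tokens per minute an empty bucket needs 0.6 seconds for the next token, so the header would read Retry-After: 1.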
Remember this
Token buckets are the default for most APIs — they balance burst tolerance with sustained rate control.
Sliding Window
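The sliding window algorithm counts requests within a rolling time window. For a limit of 100 requests per minute, it counts all requests in the last 60 seconds, not just those in the current clock minute. When a new request arrives, first remove timestamps older than 60 seconds. If 100 timestamps remain, the limit has been reached and the request is rejected; otherwise record the new timestamp and let the request through.
This eliminates the boundary spike problem of fixed windows, where 100 requests at 00:59 and another 100 at 01:00 both pass a 100/min limit even though all 200 arrive within seconds of each other. The cost is memory: you must store a timestamp for each request in the window. Redis sorted sets (ZADD, ZREMRANGEBYSCORE, ZCARD) or an in-process circular buffer implement this efficiently. Not every popular limiter is a sliding window, though: NGINX's limit_req module uses a leaky bucket, and redis-cell implements GCRA (the generic cell rate algorithm), which is also a leaky-bucket variant.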
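The evict-count-record cycle can be illustrated with an in-process log. This is a sketch under assumptions: the class name SlidingWindowLog is hypothetical, timestamps are passed in by the caller, and a deque stands in for the Redis sorted set that a shared deployment would use.

```python
from collections import deque


class SlidingWindowLog:
    """In-memory sliding window log (hypothetical helper for illustration)."""

    def __init__(self, limit: int, window_sec: float) -> None:
        self.limit = limit
        self.window_sec = window_sec
        self.timestamps = deque()  # accepted request times, oldest first

    def allow(self, now: float) -> bool:
        cutoff = now - self.window_sec
        # Evict entries that have fallen out of the rolling window.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False  # window is full; caller would respond with HTTP 429
        self.timestamps.append(now)
        return True
```

With limit=100 and window_sec=60, 100 requests at t=59 fill the window and every request at t=60 is rejected, which is the boundary case a fixed window lets through. Only accepted requests are logged here, so rejected retries do not extend the lockout; logging them as well would be a stricter design choice.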
Quick reference
- Best for: strict rate limits, anti-abuse, login endpoints, SMS/email sending.
- Strengths: precise limits, no boundary spikes, fair across time.
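- Weaknesses: higher memory usage (one timestamp per request), no burst allowance beyond the window limit.
- Redis implementation: ZREMRANGEBYSCORE key 0 (now - window) to evict old entries; ZCARD key to count; if under the limit, ZADD key now member, where member is unique per request (a bare timestamp collides when two requests share one). Run the steps in a Lua script so they execute atomically.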
- Sliding window counter (approximate) reduces memory by dividing the window into sub-buckets.
- Combine with exponential backoff on the client for retry storms.
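The sub-bucket approximation mentioned in the list above trades precision for memory: it keeps one counter per sub-bucket instead of one timestamp per request. A minimal sketch, assuming six sub-buckets per window (the class name and the bucket count are illustrative choices, not a standard):

```python
class SlidingWindowCounter:
    """Approximate sliding window built from fixed sub-bucket counters."""

    def __init__(self, limit: int, window_sec: float, buckets: int = 6) -> None:
        self.limit = limit
        self.buckets = buckets
        self.bucket_sec = window_sec / buckets
        self.counts = {}  # sub-bucket index -> accepted requests

    def allow(self, now: float) -> bool:
        current = int(now // self.bucket_sec)
        oldest = current - self.buckets + 1
        # Drop counters for sub-buckets that are no longer in the window.
        for index in [i for i in self.counts if i < oldest]:
            del self.counts[index]
        if sum(self.counts.values()) >= self.limit:
            return False
        self.counts[current] = self.counts.get(current, 0) + 1
        return True
```

Because eviction happens a whole sub-bucket at a time, the effective window is between five and six sub-buckets long in this configuration, so the limiter can admit slightly more than an exact log would. More sub-buckets tighten the approximation at the cost of more counters.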
Remember this
Sliding windows enforce precise limits — use them where burst tolerance is a security risk, not a feature.
Implementing Rate Limits in Production
Rate limiting belongs at the edge — API Gateway, NGINX, or a dedicated middleware — not deep inside business logic. Centralize limits in Redis so all app instances share the same counters. Return standard HTTP 429 with Retry-After and X-RateLimit-Remaining headers so clients can adapt.
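Tier limits by user plan: the free tier gets 100 req/min, pro gets 1000. Use different limits per endpoint: login gets 5/min (anti-brute-force), read endpoints get 1000/min. Log rate limit hits to detect abuse patterns. For multi-region deployments, weigh a single global limiter (for example, one Redis Cluster shared by all regions) against per-region limits: the global limiter enforces one exact quota but adds cross-region latency to every request, while per-region limits stay fast at the cost of a looser overall cap.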
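One way to express plan tiers and per-endpoint overrides is a pair of lookup tables consulted before the limiter runs. The sketch below uses the numbers from the text; the table names, the resolve_limit function, the fallback to the free tier, and the rule that the stricter limit wins are all assumptions for illustration.

```python
from typing import NamedTuple


class Limit(NamedTuple):
    requests: int
    window_sec: int


# Hypothetical configuration tables using the figures from the text.
PLAN_LIMITS = {"free": Limit(100, 60), "pro": Limit(1000, 60)}
ENDPOINT_LIMITS = {"/login": Limit(5, 60)}


def resolve_limit(plan: str, path: str) -> Limit:
    """Pick the stricter of the plan limit and any endpoint override."""
    plan_limit = PLAN_LIMITS.get(plan, PLAN_LIMITS["free"])  # unknown plan: free
    endpoint_limit = ENDPOINT_LIMITS.get(path)
    if endpoint_limit is None:
        return plan_limit
    # Compare by requests per second so different windows stay comparable.
    return min(plan_limit, endpoint_limit,
               key=lambda limit: limit.requests / limit.window_sec)
```

Under these assumptions a pro user is still held to 5/min on /login, while the same user gets 1000/min elsewhere.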
Quick reference
- Place rate limiting at API Gateway or reverse proxy — before requests hit application code.
- Use Redis for shared state across multiple app instances.
- Return headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After.
- Different limits per endpoint: strict on /login, generous on /health.
- Tier by plan: free, pro, enterprise with different bucket sizes and refill rates.
- Monitor 429 rates — a spike may indicate a bug, abuse, or need to raise limits.
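The headers listed above can be assembled in one place so every endpoint reports limits the same way. A sketch with stated assumptions: the function name is hypothetical, X-RateLimit-Reset is emitted as a Unix timestamp in seconds (some APIs send a delta instead), and Retry-After is included only when the caller passes a wait time for a rejected request.

```python
from typing import Dict, Optional


def rate_limit_headers(limit: int, remaining: int, reset_epoch: int,
                       retry_after_sec: Optional[int] = None) -> Dict[str, str]:
    """Build rate limit response headers as strings."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        # Never report a negative remainder to the client.
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if retry_after_sec is not None:
        # Sent alongside HTTP 429 so clients know how long to wait.
        headers["Retry-After"] = str(max(0, retry_after_sec))
    return headers
```

A successful response would call it without retry_after_sec; a 429 response would pass the wait computed by whichever limiter rejected the request.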
Remember this
Centralize rate limiting at the edge with Redis-backed counters and clear HTTP 429 responses.
Token buckets and sliding windows solve different problems. Use token buckets when controlled bursts are acceptable and you want simple implementation. Use sliding windows when precise, strict limits matter — login endpoints, SMS gateways, and anti-abuse scenarios. Most production APIs combine both: token bucket for general API traffic and sliding window for sensitive endpoints. The algorithm matters less than consistent enforcement, clear error responses, and monitoring.