
Caching Strategies: Redis, CDN, and Multi-Level Caches

July 4, 2026 · 14 min read

Caching is often the fastest way to scale a system without changing your database or adding servers. Done right, it can cut database load by 90%, drop response times from 200ms to 2ms, and absorb traffic spikes that would otherwise topple your stack. Done wrong, it serves stale data, hides real bugs, and adds invisible complexity that bites you at 2am.

This article covers every caching layer a production system uses — application-level cache-aside, read-through and write-through patterns, Redis data structures, CDN edge caching, and how to layer them together. We also tackle the hard problem: cache invalidation.

What you'll learn

Cache read-heavy, slow-changing data. Measure hit ratio — below 70% means your strategy or TTLs need tuning.
Cache-aside is the safest default. It handles cold starts gracefully, and with a TTL on every key, staleness is bounded by that TTL.
Use write-through when stale reads are unacceptable. Use read-through when you want the cache to own miss logic.
Match the Redis data structure to the access pattern. Hashes let you update one field without rewriting the whole object; Sorted Sets make range queries trivial.
CDN caching is the highest-leverage layer: on a hit, the request never reaches your origin. Master Cache-Control headers.
Write-invalidate (delete on write, repopulate on next read) is the safest strategy when data must be fresh. TTL alone is only for data where brief staleness is acceptable.
Layer caches from fastest to slowest: in-process → Redis → CDN. Invalidation must cascade through all layers — a key deleted in Redis may still live in process memory.

Why Caching Matters

A database query that takes 50ms is slow. A Redis lookup that returns in well under a millisecond on a local network is fast. When 1,000 users request the same product page simultaneously, serving it from cache can cost one database round trip instead of a thousand. That's the core trade-off: spend a little memory to save a lot of computation.

Cache hits improve latency and throughput. They also protect databases during traffic spikes: the cache absorbs demand that would otherwise queue up, time out, and cascade into full outages. Understanding which data benefits from caching is the first step. Read-heavy, infrequently changing data (product catalogs, user profiles, configuration) is a perfect fit. Write-heavy, highly personalized, or financial data (account balances, order state) needs careful handling.

Quick reference

Cache-hit ratio: the percentage of requests served from cache. Aim for 80–95% for read-heavy workloads.
Cache miss: request falls through to the origin (database or API). Every cold start is a miss.
Hot data: the small subset of records that account for most requests — often follows a power-law distribution.
Cold data: rarely-accessed records that waste cache memory if kept. Eviction policies handle this.
Cache warming: pre-populating cache at startup to avoid a wave of misses under initial load.
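To make the hit-ratio figures concrete, here is a small sketch of the arithmetic. The function names are illustrative, and the 2ms and 200ms latencies are borrowed from the introduction rather than measured.

```typescript
// Hit ratio: the fraction of requests served from cache.
function hitRatio(hits: number, misses: number): number {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}

// Expected latency per request: hits pay the cache cost, misses pay the origin cost.
// Simplification: a real miss also pays for the failed cache lookup.
function expectedLatencyMs(ratio: number, cacheMs: number, originMs: number): number {
  return ratio * cacheMs + (1 - ratio) * originMs;
}

const ratio = hitRatio(900, 100);                 // 0.9
const latency = expectedLatencyMs(ratio, 2, 200); // 0.9 * 2 + 0.1 * 200, roughly 21.8
```

Even at a 90% hit ratio, the remaining misses contribute about 20 of those 21.8 milliseconds, which is why the last few points of hit ratio tend to matter most.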

Remember this

Cache read-heavy, slow-changing data. Measure hit ratio — below 70% means your strategy or TTLs need tuning.

Cache-Aside (Lazy Loading)

Cache-aside is the most widely used pattern. The application checks the cache first. On a miss, it fetches from the database, writes the result to cache, then returns it. The cache is populated lazily — only the data that is actually requested gets cached.

This pattern works well because the cache only holds data that users actually need. It's resilient: if the cache goes down, the application falls back to the database without crashing. The downside is that the first request for any key is always slow (a cache miss), and if you have many simultaneous first requests for the same key (a thundering herd), you can hammer the database.

Quick reference

When populating a key after a miss, add jitter to the TTL (e.g. 270–330s) so that keys written together do not all expire together.
For thundering herd: use a mutex lock (Redis SET with the NX option and an expiry) so only one request populates the key.
Cache-aside works with any backend — database, external API, or computed result.
If the cache node is unavailable, fall through to the database — don't let cache failures crash the app.
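The jitter and thundering-herd points above can be sketched as follows. Both helpers are hypothetical names. The coalescing shown here works only inside one process; protecting the database across several instances would still need a shared lock such as the Redis one mentioned in the list.

```typescript
// Spread expiries: pick a TTL uniformly in [base - spread, base + spread] seconds.
// `random` is injectable so the function can be tested deterministically.
function jitteredTtl(
  baseSeconds: number,
  spreadSeconds: number,
  random: () => number = Math.random,
): number {
  const offset = (random() * 2 - 1) * spreadSeconds;
  return Math.round(baseSeconds + offset);
}

// In-process single-flight: concurrent callers for the same key share one load.
const inFlight = new Map<string, Promise<unknown>>();

function singleFlight<T>(key: string, load: () => Promise<T>): Promise<T> {
  const pending = inFlight.get(key);
  if (pending) return pending as Promise<T>;

  // Forget the key once the load settles, whether it succeeded or failed.
  const started = load().finally(() => inFlight.delete(key));
  inFlight.set(key, started);
  return started;
}
```

With `jitteredTtl(300, 30)` the result falls in the 270–330s window from the list. Wrapping the database fetch of a cache-aside read in `singleFlight` means a burst of identical misses triggers one query per process instead of one per request.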

Before

Without cache — every request hits the database

```typescript
async function getProduct(id: string): Promise<Product> {
  // Every call hits the database
  return await db.products.findById(id);
}
```

After

Cache-aside — database only on first miss

```typescript
async function getProduct(id: string): Promise<Product> {
  const cacheKey = `product:${id}`;

  // 1. Check cache
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // 2. Cache miss: fetch from database
  const product = await db.products.findById(id);

  // 3. Populate cache with TTL (5 minutes)
  await redis.set(cacheKey, JSON.stringify(product), "EX", 300);

  return product;
}
```

Remember this

Cache-aside is the safest default. It handles cold starts gracefully, and with a TTL on every key, staleness is bounded by that TTL.

Read-Through and Write-Through

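Read-through differs from cache-aside in who owns the miss logic. The cache itself, not the application, fetches from the database when a key is absent. The application always talks to the cache layer and never directly to the database. Caching libraries with loader support, such as Caffeine's LoadingCache or a Hibernate second-level cache, implement this pattern. Plain Redis or Memcached (including managed offerings like Amazon ElastiCache) do not load data on their own, so read-through on top of them needs a wrapper layer that owns the loader.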

Write-through keeps cache and database in sync: writes go to the cache layer, which synchronously writes through to the database before confirming to the caller. Every write is slower (two writes instead of one), but as long as all writes take this path, reads see the latest committed value. This is useful when you can't tolerate stale reads: user account data, inventory counts, or permissions.
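A minimal sketch of both patterns in one wrapper may help. `BackingStore` and `ThroughCache` are hypothetical names, the in-memory `Map` stands in for a real cache, and persisting before updating the cache entry is one reasonable ordering, not the only one.

```typescript
// Hypothetical backing store; in practice this would wrap a database client.
interface BackingStore<V> {
  load(key: string): Promise<V | undefined>;
  save(key: string, value: V): Promise<void>;
}

class ThroughCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly store: BackingStore<V>;

  constructor(store: BackingStore<V>) {
    this.store = store;
  }

  // Read-through: the cache, not the caller, fetches on a miss.
  async get(key: string): Promise<V | undefined> {
    const hit = this.entries.get(key);
    if (hit !== undefined) return hit;

    const loaded = await this.store.load(key);
    if (loaded !== undefined) this.entries.set(key, loaded);
    return loaded;
  }

  // Write-through: persist, update the cached copy, and only then confirm.
  async set(key: string, value: V): Promise<void> {
    await this.store.save(key, value);
    this.entries.set(key, value);
  }
}
```

Callers only ever see `get` and `set`; they never touch the store directly. If `save` rejects, the cached copy is left unchanged, so a failed write does not leave the cache ahead of the database.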

Quick reference

Read-through: useful when the cache provider supports loaders (Redis with a sidecar, Spring Cache).
Write-through avoids cache misses on data you just wrote — the next read finds it immediately.
Write-through doubles write latency. Acceptable for low-write, high-read scenarios.
Write-behind (async): write to cache immediately, flush to DB asynchronously — higher risk, faster writes.
Refresh-ahead: proactively reload cache keys before TTL expires to avoid latency spikes on expiry.
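The write-behind trade-off in the list above is easier to see in code. This is a sketch under simplifying assumptions: `WriteBehindBuffer` is a hypothetical name, and `persist` is synchronous here, whereas a real flush would be an asynchronous batch write to the database.

```typescript
class WriteBehindBuffer<V> {
  private readonly cache = new Map<string, V>();
  private readonly dirty = new Set<string>();
  private readonly persist: (batch: Array<[string, V]>) => void;

  constructor(persist: (batch: Array<[string, V]>) => void) {
    this.persist = persist;
  }

  // Writes land in the cache immediately; the store is updated later.
  set(key: string, value: V): void {
    this.cache.set(key, value);
    this.dirty.add(key); // repeated writes to one key collapse into one flush entry
  }

  get(key: string): V | undefined {
    return this.cache.get(key);
  }

  // Intended to run on a timer or when the buffer grows.
  // Anything still dirty is lost if the process dies before this runs.
  flush(): number {
    const batch: Array<[string, V]> = [];
    for (const key of this.dirty) {
      const value = this.cache.get(key);
      if (value !== undefined) batch.push([key, value]);
    }
    if (batch.length > 0) this.persist(batch);
    this.dirty.clear(); // reached only if persist did not throw
    return batch.length;
  }
}
```

The faster writes come from batching and from not waiting on the database. The higher risk is the window between `set` and `flush`, during which the only copy of the new value lives in memory.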

Remember this

Use write-through when stale reads are unacceptable. Use read-through when you want the cache to own miss logic.

Redis Data Structures for Caching

Redis isn't just a key-value store — its native data structures let you cache complex access patterns efficiently without deserializing entire objects.

Strings store serialized JSON for simple object caching. Hashes store individual fields of a record, so you can update one field without fetching and rewriting the full object. Sorted Sets let you maintain a leaderboard or a time-ordered feed in cache. Sets support fast membership checks — 'is user X in the beta group?' answered in O(1). Lists power queues and recent-activity feeds. Choosing the right structure directly reduces memory usage and eliminates unnecessary round trips.

Quick reference

String (GET/SET): best for full-object caching. Use JSON.stringify / JSON.parse.
Hash (HSET/HGET): best for records with many fields you update independently.
Sorted Set (ZADD/ZRANGE): leaderboards, time-series feeds, priority queues.
Set (SADD/SISMEMBER): membership checks — rate limiting, feature flags, dedup.
List (LPUSH/LRANGE): recent activity, job queues. Use Streams for durable queues.
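Set a TTL on every key with EXPIRE or the EX option. Keys without a TTL accumulate until Redis reaches maxmemory, at which point it either evicts keys or rejects writes with out-of-memory errors, depending on the eviction policy.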
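As an illustration of the Sorted Set entry above, here is a leaderboard sketch written against a narrow, hypothetical `SortedSetClient` interface modelled on the Redis ZADD and ZREVRANGE commands. The in-memory class is a stand-in so the example runs without a server; it ignores negative indexes and Redis's tie-breaking rules.

```typescript
// Assumed shape of the two commands used; a real Redis client would satisfy this.
interface SortedSetClient {
  zadd(key: string, score: number, member: string): Promise<number>;
  zrevrange(key: string, start: number, stop: number): Promise<string[]>;
}

// Stand-in for Redis, for demonstration only.
class InMemorySortedSets implements SortedSetClient {
  private readonly sets = new Map<string, Map<string, number>>();

  async zadd(key: string, score: number, member: string): Promise<number> {
    const set = this.sets.get(key) ?? new Map<string, number>();
    const isNew = !set.has(member);
    set.set(member, score);
    this.sets.set(key, set);
    return isNew ? 1 : 0; // ZADD reports how many members were newly added
  }

  async zrevrange(key: string, start: number, stop: number): Promise<string[]> {
    const set = this.sets.get(key) ?? new Map<string, number>();
    const ordered = [...set.entries()]
      .sort((a, b) => b[1] - a[1])
      .map(([member]) => member);
    return ordered.slice(start, stop + 1); // stop is inclusive, as in Redis
  }
}

async function recordScore(
  client: SortedSetClient, board: string, player: string, score: number,
): Promise<void> {
  await client.zadd(board, score, player);
}

async function topPlayers(
  client: SortedSetClient, board: string, count: number,
): Promise<string[]> {
  return client.zrevrange(board, 0, count - 1);
}
```

The point of the structure is that ranking happens inside the cache: the application asks for the top N and never fetches or sorts the full set itself.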

Before

Naive — serialize full user on every partial update

```typescript
// Update just the display name: must fetch and rewrite entire object
const user = JSON.parse(await redis.get(`user:${id}`) ?? "{}");
user.displayName = newName;
await redis.set(`user:${id}`, JSON.stringify(user));
```

After

Hash — update only the changed field

```typescript
// Hash: store each field separately
await redis.hset(`user:${id}`, {
  displayName: newName,
  // Other fields unchanged
});

// Read one field
const name = await redis.hget(`user:${id}`, "displayName");

// Read all fields
const user = await redis.hgetall(`user:${id}`);
```
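One practical detail when moving from JSON strings to hashes: Redis stores every hash field as a string, and reading all fields of a missing key yields an empty result, not null. A small pair of conversion helpers keeps that out of the rest of the code. The `UserProfile` shape and both function names are assumptions for illustration.

```typescript
interface UserProfile {
  displayName: string;
  loginCount: number;
  isAdmin: boolean;
}

// Encode: every value becomes a string before it is written with HSET.
function toHashFields(user: UserProfile): Record<string, string> {
  return {
    displayName: user.displayName,
    loginCount: String(user.loginCount),
    isAdmin: user.isAdmin ? "1" : "0",
  };
}

// Decode: an empty object means the key did not exist, so report a miss.
function fromHashFields(fields: Record<string, string>): UserProfile | null {
  if (Object.keys(fields).length === 0) return null;
  return {
    displayName: fields["displayName"] ?? "",
    loginCount: Number(fields["loginCount"] ?? "0"),
    isAdmin: fields["isAdmin"] === "1",
  };
}
```

The output of `toHashFields` can be passed to the `hset` call above, and the result of `hgetall` can be passed to `fromHashFields` to get typed values back.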

Remember this

Match the Redis data structure to the access pattern. Hashes let you update one field without rewriting the whole object; Sorted Sets make range queries trivial.

CDN Edge Caching

A CDN (Content Delivery Network) caches responses at edge nodes physically close to users. A user in London hitting a CDN node in London receives a cached response in ~5ms; the same request without CDN might travel to a US origin server and back in 180ms. CDNs are the outermost caching layer — they sit in front of everything.

For static assets (JS, CSS, images), CDN caching is straightforward: set long Cache-Control max-age headers and bust them with content hashes in filenames. For API responses, CDN caching is trickier: you must set Surrogate-Control or Cache-Control headers on the response, vary correctly on headers that affect content (Accept-Language, Accept-Encoding), and purge keys when content changes.

Quick reference

Cache-Control: public, max-age=31536000, immutable — for versioned static assets (safe for 1 year).
Cache-Control: public, s-maxage=60, stale-while-revalidate=600 — for API responses (CDN holds 60s, serves stale for 10min while revalidating).
Vary: Accept-Encoding — tells the CDN to cache separate copies for gzip vs br responses.
Cache-Control: private — prevents CDN from caching user-specific responses.
Purge by tag (Cloudflare Cache Tags, Fastly Surrogate-Key) to invalidate groups of keys on content change.
stale-while-revalidate: serve the old response while fetching a fresh one — eliminates latency on expiry.
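The header values in the list above can be produced from a small policy type so that routes do not hand-write them. This is a sketch: `CachePolicy` and `cacheControl` are hypothetical names, and the three policies mirror only the cases listed here.

```typescript
type CachePolicy =
  | { kind: "versioned-asset" }
  | { kind: "shared-api"; edgeSeconds: number; staleSeconds: number }
  | { kind: "per-user" };

function cacheControl(policy: CachePolicy): string {
  switch (policy.kind) {
    case "versioned-asset":
      // The filename carries a content hash, so the body never changes under this URL.
      return "public, max-age=31536000, immutable";
    case "shared-api":
      // s-maxage applies to shared caches such as a CDN.
      return `public, s-maxage=${policy.edgeSeconds}, stale-while-revalidate=${policy.staleSeconds}`;
    case "per-user":
      // Shared caches must not store this response.
      return "private";
  }
}

const apiHeader = cacheControl({ kind: "shared-api", edgeSeconds: 60, staleSeconds: 600 });
// "public, s-maxage=60, stale-while-revalidate=600"
```

Centralizing the strings also makes it harder to ship a user-specific response with a public header by accident, since the policy has to be chosen explicitly.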

Remember this

CDN caching is the highest-leverage layer: on a hit, the request never reaches your origin. Master Cache-Control headers.

Cache Invalidation: The Hard Part

Phil Karlton famously said there are only two hard problems in computer science: cache invalidation and naming things. The difficulty is that cached data has two owners — the cache and the source of truth — and keeping them in sync without serving stale data or over-evicting is genuinely hard.

The three main strategies are TTL-based expiry (simplest, always eventually consistent), event-driven invalidation (purge specific keys when the underlying data changes — complex but precise), and versioning (never invalidate; instead use new keys when data changes — cache:user:v2:{id}).
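Two of these strategies fit in a few lines. The sketch below uses in-memory maps as stand-ins for a database table and a cache, and all names are hypothetical. It shows a versioned key builder and a write path that deletes the cached copy instead of updating it.

```typescript
// Stand-ins for a database table and a cache.
const database = new Map<string, string>();
const cacheStore = new Map<string, string>();

// Versioned key: bumping `version` makes every old entry unreachable without deleting it.
function versionedKey(resource: string, version: number, id: string): string {
  return `cache:${resource}:v${version}:${id}`;
}

// Event-driven invalidation at its simplest: update the source of truth,
// then delete the cached copy.
function updateName(id: string, name: string): void {
  database.set(id, name);
  cacheStore.delete(versionedKey("user", 1, id));
}

// Cache-aside read that repopulates after an invalidation.
function readName(id: string): string | undefined {
  const key = versionedKey("user", 1, id);
  const cached = cacheStore.get(key);
  if (cached !== undefined) return cached;

  const fresh = database.get(id);
  if (fresh !== undefined) cacheStore.set(key, fresh);
  return fresh;
}
```

After `updateName("42", "Grace")`, the next `readName("42")` misses and reloads from the database. Changing the version argument from 1 to 2 would have the same effect for every user at once, leaving the old entries to age out.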

Quick reference

TTL expiry: simple but accepts eventual consistency. Choose TTL based on how stale you can tolerate.
Event-driven: subscribe to DB change events (Debezium, DB triggers) and delete affected cache keys.
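Write-invalidate pattern: delete the cache key on write and let the next read repopulate it. Because a delete is idempotent, two concurrent writers cannot leave their values in the cache in the wrong order, which is the main risk when writers update the cache directly.
Versioned keys (cache:resource:v{n}:{id}): no explicit invalidation. Bump the version when the data or its schema changes; old keys become unreachable and expire via their TTL.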
Cache stampede: when a TTL expires under load, many requests simultaneously miss and hammer the DB. Fix: probabilistic early expiration or mutex locks.
Never cache mutable state without a clear invalidation plan — 'just set a short TTL' is not a plan.
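Probabilistic early expiration, mentioned in the stampede entry above, can be sketched as a single decision function. This follows the commonly described approach in which each reader independently decides to refresh slightly early, with a probability that rises as expiry approaches. The `TimedEntry` shape, the parameter names, and the default `beta` of 1 are assumptions for illustration.

```typescript
interface TimedEntry<V> {
  value: V;
  expiresAtMs: number; // absolute expiry time
  computeMs: number;   // how long the last recomputation took
}

// Returns true when this caller should refresh the entry, possibly before it expires.
// Slower recomputations and a larger beta both make early refreshes more likely.
function shouldRefresh<V>(
  entry: TimedEntry<V>,
  nowMs: number,
  beta: number = 1,
  random: () => number = Math.random,
): boolean {
  const r = 1 - random(); // in (0, 1], which avoids log(0)
  const headStartMs = -entry.computeMs * beta * Math.log(r); // always >= 0
  return nowMs + headStartMs >= entry.expiresAtMs;
}
```

Because each request draws its own random number, refreshes are spread over the period before expiry instead of all landing at the instant the TTL runs out. The caller that gets `true` recomputes and rewrites the key while everyone else keeps reading the existing value.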

Remember this

Write-invalidate (delete on write, repopulate on next read) is the safest strategy when data must be fresh. TTL alone is only for data where brief staleness is acceptable.

Multi-Level Cache Architecture

Production systems rarely use a single cache. A typical architecture has three layers: an in-process cache (memory inside the application process), a distributed cache (Redis shared across all instances), and a CDN or reverse proxy cache (Nginx, Varnish, or a cloud CDN).

In-process cache (e.g. a ConcurrentDictionary in .NET or an LRU Map in Node) is the fastest, at nanoseconds per lookup, but it is per-instance and loses state on restart. Distributed cache (Redis) costs a network round trip, typically from a fraction of a millisecond to a few milliseconds, and is shared across instances. CDN is the outermost layer and absorbs the most traffic. A request that misses one layer falls through to the next: CDN → in-process → Redis → origin database.
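For the in-process layer, a bounded LRU can be built on a plain `Map`, relying on the fact that a `Map` iterates in insertion order. This is a minimal sketch without TTLs, and `LruCache` is a hypothetical name, not a specific library.

```typescript
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Bounding the size matters here because this memory is shared with the application itself: an unbounded in-process cache grows until the process runs out of heap.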

Quick reference

L1 (in-process): fastest (ns), zero network. Use for hot config, static lookup tables, and values reused within a single request.
L2 (Redis): shared across instances (ms). The main application cache tier for dynamic data.
L3 (CDN/reverse proxy): absorbs edge traffic before it hits your servers (5–50ms from user).
Invalidation across layers: deleting a Redis key doesn't clear in-process caches on other instances. Use Redis Pub/Sub to broadcast invalidation events.
Memory limits: set maxmemory on Redis and choose an eviction policy (allkeys-lru for cache workloads).
Monitoring: track cache hit rate, eviction rate, memory usage, and key TTL distribution in dashboards.
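The cross-layer invalidation problem from the list above can be reproduced in a few lines. In this sketch `SharedTier` is an in-memory stand-in for Redis plus a publish/subscribe channel, and `AppInstance` is one application process with its own L1 map. All names are hypothetical.

```typescript
type InvalidationListener = (key: string) => void;

// Stand-in for Redis: shared data plus a channel for invalidation messages.
class SharedTier {
  readonly data = new Map<string, string>();
  private readonly listeners: InvalidationListener[] = [];

  subscribe(listener: InvalidationListener): void {
    this.listeners.push(listener);
  }

  // Delete the shared copy and tell every instance to drop its local copy.
  invalidate(key: string): void {
    this.data.delete(key);
    for (const listener of this.listeners) listener(key);
  }
}

class AppInstance {
  private readonly local = new Map<string, string>(); // L1, private to this instance
  private readonly shared: SharedTier;
  private readonly origin: Map<string, string>;

  constructor(shared: SharedTier, origin: Map<string, string>) {
    this.shared = shared;
    this.origin = origin;
    shared.subscribe((key) => { this.local.delete(key); });
  }

  get(key: string): string | undefined {
    const l1 = this.local.get(key);
    if (l1 !== undefined) return l1;

    let value = this.shared.data.get(key); // L2
    if (value === undefined) {
      value = this.origin.get(key);        // origin database
      if (value !== undefined) this.shared.data.set(key, value);
    }
    if (value !== undefined) this.local.set(key, value);
    return value;
  }
}
```

Deleting the key from `shared.data` alone leaves every instance serving its old L1 copy; calling `invalidate` clears both tiers. Real Redis Pub/Sub delivers messages at most once, so an instance that is disconnected at the wrong moment can miss one, which is why a short TTL on L1 entries is a sensible backstop.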

Remember this

Layer caches from fastest to slowest: in-process → Redis → CDN. Invalidation must cascade through all layers — a key deleted in Redis may still live in process memory.

Key takeaway

Caching is a spectrum, not a switch. Start with cache-aside and Redis for your most-read endpoints. Add TTLs that match your data's change frequency — product prices every 5 minutes, user profiles every hour, static configuration at startup. Layer a CDN in front of your API for public, read-heavy routes. Measure hit ratio from day one.

The moment you add a cache, you introduce eventual consistency. That's a trade-off you must own consciously. Write-invalidate keeps you honest: delete on write, repopulate on read, and you'll never wonder why users are seeing yesterday's data at tomorrow's price.
