
Cache Strategies in Distributed Systems


The Bug That Took Us Two Days to Find: Cache Stampede in Production

At my previous company, users were getting silently logged out.

Support tickets were piling up.

The frontend team believed it was a backend issue. The backend team suspected something was wrong on the frontend.

For two days, the teams kept investigating different parts of the system.

Eventually, we discovered the real issue.

All our Redis TTLs were expiring at the same time.

Because of time constraints, we needed an immediate solution. We decided to add jitter to the TTL values, so that cache entries would expire at different times instead of simultaneously.

Later, while studying system design in more depth, I realized this issue has a well-known name:

Cache Stampede (also known as the Thundering Herd Problem).

It turns out this is a very common issue in distributed systems, and many large-scale companies have faced it at some point.


Why Basic TTL Is Not Enough

A basic caching strategy might look something like this:

cache -> user_profile:123
TTL -> 60 seconds

After 60 seconds, the cache entry expires. The next request fetches fresh data from the database and rebuilds the cache.
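The flow above can be sketched as a minimal read-through cache with a fixed TTL. This is illustrative only: `fetchUserProfile` stands in for a hypothetical database call, and the in-memory `Map` stands in for Redis.

```typescript
// Minimal read-through cache with a fixed TTL (illustrative sketch).
const store = new Map<string, { value: string; expiresAt: number }>();
const TTL_MS = 60_000; // 60 seconds

async function getUserProfile(
  id: string,
  fetchUserProfile: (id: string) => Promise<string>, // hypothetical DB call
): Promise<string> {
  const key = `user_profile:${id}`;
  const entry = store.get(key);
  if (entry && entry.expiresAt > Date.now()) {
    return entry.value; // cache hit
  }
  // Cache miss (or expired): query the database and rebuild the entry
  const fresh = await fetchUserProfile(id);
  store.set(key, { value: fresh, expiresAt: Date.now() + TTL_MS });
  return fresh;
}
```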

For small applications with low traffic, this approach works perfectly fine.

However, in large-scale systems, this simple strategy can create serious problems.


The Biggest Problem: Cache Stampede (Thundering Herd)

Imagine your application receives 10,000+ requests for a popular endpoint.

If the cache expires at the same moment:

  • The cache entry disappears
  • All requests miss the cache
  • Every request directly queries the database

Cache expires → 10,000 cache misses → 10,000 database queries

The database suddenly receives a massive spike in traffic.

This can lead to:

  • Database overload
  • Increased latency
  • Request timeouts
  • Cascading system failures

This phenomenon is known as the Cache Stampede or Thundering Herd problem.

The root cause is synchronized cache expiration.


Why Synchronized Expiration Is Dangerous

With a fixed TTL, entries cached at the same moment all expire at exactly the same time.

If many requests depend on the same cache entry, they will all attempt to rebuild the cache simultaneously.

This creates traffic spikes and unnecessary pressure on the database.

To avoid this problem, engineers use different strategies. Each approach comes with its own trade-offs.


Strategies to Prevent Cache Stampede

1. TTL Jitter (The Approach We Used in Production)

Instead of assigning the same TTL to every cache entry, we introduce randomness.

ttlSeconds = 60 + Math.floor(Math.random() * 60) // 60–119 seconds

Now cache entries expire at slightly different times, which spreads the load over time.

Instead of thousands of requests hitting the database simultaneously, the traffic is distributed more evenly.

This is one of the simplest and most effective solutions.
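As a sketch, the jitter formula above can be wrapped in a small helper. The base TTL and jitter window here are illustrative values, and the Redis call in the comment assumes a typical client API rather than any specific library.

```typescript
// TTL jitter: a fixed base plus a random offset, so entries written at the
// same moment expire at different times. Values below are illustrative.
const BASE_TTL_SECONDS = 60;
const MAX_JITTER_SECONDS = 60;

function ttlWithJitter(
  base: number = BASE_TTL_SECONDS,
  maxJitter: number = MAX_JITTER_SECONDS,
): number {
  return base + Math.floor(Math.random() * maxJitter);
}

// Usage with a hypothetical Redis client:
// await redis.set("user_profile:123", payload, "EX", ttlWithJitter());
```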


2. Mutex Locking

In this approach, when the cache expires and multiple requests arrive:

  • One request acquires a lock
  • Other requests wait

Request A -> acquires lock
Request B -> blocked
Request C -> blocked

Request A then:

  1. Fetches data from the database
  2. Rebuilds the cache
  3. Releases the lock

After that, the blocked requests read the data directly from the cache.

This ensures only one database query happens per cache miss.
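A minimal sketch of this idea for a single Node.js process is below. `loadFromDb` is a hypothetical database call, and the lock is just an in-memory set; in a multi-node deployment you would use a distributed lock instead (for example, Redis `SET lock:key <id> NX PX <ttl>`).

```typescript
// Per-key mutex sketch: one request rebuilds the cache, the rest wait
// and then read the rebuilt value from the cache.
const profileCache = new Map<string, string>();
const locked = new Set<string>();

async function getWithMutex(
  key: string,
  loadFromDb: () => Promise<string>, // hypothetical DB call
): Promise<string> {
  while (true) {
    const cached = profileCache.get(key);
    if (cached !== undefined) return cached; // cache hit

    if (!locked.has(key)) {
      locked.add(key); // this request acquires the lock
      try {
        const value = await loadFromDb(); // only one database query
        profileCache.set(key, value);
        return value;
      } finally {
        locked.delete(key); // release the lock
      }
    }

    // Lock held by another request: wait briefly, then re-check the cache.
    await new Promise((resolve) => setTimeout(resolve, 10));
  }
}
```

The polling wait keeps the sketch short; a production version would typically subscribe to the lock release or reuse the holder's promise instead of sleeping.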


3. Cache Coalescing (Request Coalescing)

Cache coalescing takes a slightly different approach.

Instead of blocking requests with locks, identical requests are grouped together.

One request fetches data from the database, and the response is shared with all waiting requests.

100 requests → 1 database query → response shared with all

This reduces duplicate backend calls and improves overall efficiency.
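In a single Node.js process, coalescing can be sketched by sharing one in-flight promise per key: concurrent misses for the same key join the promise instead of issuing their own query. `fetchFresh` is a hypothetical loader.

```typescript
// Request coalescing sketch: identical in-flight requests share one promise,
// so N concurrent cache misses for the same key produce one backend call.
const inFlight = new Map<string, Promise<string>>();

function coalesce(
  key: string,
  fetchFresh: () => Promise<string>, // hypothetical loader
): Promise<string> {
  const pending = inFlight.get(key);
  if (pending) return pending; // join the request already in flight

  const p = fetchFresh().finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}
```

The difference from the mutex approach is that waiters never touch the cache or a lock; they simply receive the same response the first request produced.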


4. Probability-Based Early Re-computation

Another technique is refreshing the cache before it expires using a probability function.

Instead of waiting for the TTL to reach zero, some requests will refresh the cache earlier.

// the closer the entry is to expiry, the higher the refresh probability
if (random() < refreshProbability(ttlRemaining)) {
  refreshCache();
}

This spreads cache refresh operations across time and avoids sudden spikes.

Trade-off: The cache might be recomputed earlier than necessary, which increases compute usage.
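One well-known way to pick that probability is to refresh when the entry is within a randomized distance of its expiry, scaled by how expensive the rebuild is. The sketch below follows that idea; `recomputeCostMs` and `beta` are tuning assumptions, not parameters from any specific library.

```typescript
// Probabilistic early recomputation sketch: the closer an entry gets to
// expiry, the more likely a request is to refresh it before the TTL hits zero.
function shouldRefreshEarly(
  nowMs: number,
  expiryMs: number,
  recomputeCostMs: number, // roughly how long a rebuild takes (assumption)
  beta: number = 1.0, // > 1 refreshes more eagerly (assumption)
): boolean {
  // -ln(random) is an exponentially distributed factor >= 0, so each request
  // independently "jumps ahead" by a random multiple of the rebuild cost.
  const earlyByMs = recomputeCostMs * beta * -Math.log(Math.random());
  return nowMs + earlyByMs >= expiryMs;
}
```

Because each request draws its own random factor, refreshes are scattered over the window before expiry instead of piling up at the moment the TTL runs out.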


Key Takeaway

Caching seems simple:

set(key, value, TTL);

But in high-traffic systems, expiration strategies become extremely important.

A fixed TTL that works fine at small scale can silently destroy your system at scale. Start with TTL jitter, understand the other strategies, and choose based on your traffic patterns and tolerance for complexity.
