
How to Cache?

Yash Sachdeva
Software Engineer | Turning Complex Problems into Simple Solutions
Software engineer by trade, problem solver by nature. I write about the systems I build in my free time and the experiences that shape them.

Modern APIs frequently access databases or run complex business logic that introduces significant latency and consumes CPU and I/O resources. Without caching, every request pays the full cost of database queries, network calls, and computation. This can lead to slow response times and poor scalability as traffic increases.

Caching is a technique for storing expensive-to-compute or slow-to-fetch data in a faster storage layer so repeated requests can be served quickly without hitting the original source each time. For Java APIs, a solid caching strategy can drastically improve latency, throughput, and reliability, especially under heavy load.

Caching is especially valuable for:

  • Expensive database queries that aggregate or join large tables, such as analytics dashboards or complex search queries.
  • Frequently accessed but rarely changing data, such as reference data, configuration, product catalogs or content metadata.
  • API responses from third-party services that are rate-limited or charge per call, where caching reduces cost and protects against throttling.
  • CPU-intensive computations like report generation, HTML rendering, or recommendation results that can be reused across many requests.

A very prevalent choice for caching is Redis.

What Redis Brings to Caching

Redis is an open source, in-memory key-value data store that keeps data in RAM rather than on disk, making access orders of magnitude faster than disk-backed databases.

For caching, Redis offers several critical features:

  • In-memory storage for very low read and write latency.
  • Configurable Time To Live (TTL) for automatic cache eviction.
  • Eviction policies such as LRU, LFU, and random eviction to remove the least useful data when memory is full.
  • Atomic operations for thread-safe cache updates.
  • Optional persistence and replication, allowing data to survive restarts.

Typical manual caching steps include:

  1. On a read, compute a cache key (for example, user:123:profile) and query Redis. If present, deserialize and return the value.
  2. If not present, fetch from the database, serialize and store in Redis with a TTL, then return the value.
  3. On writes, either update the cache entry (write-through) or invalidate/delete the key and allow it to be repopulated on the next read (write-invalidate/cache-aside).

Core Caching Principles

1. What to Cache (Data Selection)

Not all data should be cached. Good candidates include:

  • Data that is read frequently and updated infrequently, such as user profiles, product details, or configuration settings.
  • Expensive-to-calculate results such as complex reports, aggregations, or multi-step workflows.
  • Responses from external APIs where rate limits or cost constraints apply.

Poor candidates include highly volatile data that changes very frequently or data with strong transactional consistency requirements that cannot tolerate staleness.

2. Read-Write Patterns and Workload

Understanding the API’s read/write ratio, access distribution, and data shape is essential for determining cache value.

  • Read-heavy workloads benefit greatly from caching because a single cached object may serve thousands of requests.
  • Write-heavy or strongly consistent workloads may require more conservative caching (short TTLs, write-through, or event-driven invalidation) to avoid stale responses.
  • Skewed traffic patterns (e.g. hot keys, popular items) can be handled well with caching but also pose risks of cache stampedes when keys expire.

3. Key Design and Namespacing

Cache keys must be designed so that they uniquely and predictably identify cached items.

  • Keys typically incorporate entity type, identifier and sometimes version or locale, such as user:123:profile:v1.
  • Namespacing keys by domain or service (for example, catalog:product:42) simplifies bulk invalidation and avoids collisions across teams.
  • Versioning keys allows mass invalidation when schemas or interpretation logic change: simply bump a version prefix or suffix instead of individually deleting old entries (see the helper sketch below).

Clear conventions for key construction should be documented and enforced to avoid subtle cache bugs.

4. Consistency and Freshness Requirements

A core design trade-off in caching is between consistency and data freshness.

  • TTL-based expiration: Each key is given a TTL; once expired, the entry is refetched on the next request. This provides eventual consistency and predictable staleness windows.
  • Write-through: Updates are written to the cache at write time, either updating or invalidating entries immediately, providing stronger consistency at the cost of increased write-path complexity.
  • Event-driven invalidation: Cache invalidation is triggered by events from the data source (e.g., database change events, message queue updates) rather than relying solely on TTLs. This allows more immediate cache updates and reduces staleness in write-heavy workloads (see the sketch below).

The strategy should specify, per data type, the acceptable staleness window, whether stale reads are allowed, and how conflicts are resolved when outdated data is served.

5. Handling Cache Stampede

A cache stampede occurs when many requests simultaneously miss the cache for the same key, causing a surge of requests to the underlying data source. Mitigation techniques include:

  • Request Coalescing: Only one request is allowed to fetch the data from the database and populate the cache, while others wait for the cache to be populated.
  • Staggered Expiration: Instead of setting the same TTL for all keys, add random jitter to each key’s TTL to avoid simultaneous expiration (see the sketch after this list).
  • Cache Warming: Pre-populate the cache with frequently accessed data to avoid cache misses.

6. Serialization Format and Object Size

Java objects must be serialized before being stored in Redis.

  • Common formats include JSON, which is easy to debug and interoperable, and more compact binary formats (a serializer configuration sketch follows this list).
  • Large objects hurt performance by increasing network latency and memory usage. It is often preferable to store only the fields required by the API, or to partition a large object across multiple keys.

7. Observability and Tuning

A caching strategy is not complete without observability.

  • Track cache hit/miss ratios, latency distributions, error rates, and eviction counts to understand whether the cache is effective.
  • Use insights from monitoring to adjust TTLs, eviction policies, and which data is cached as access patterns evolve.

Real World Example

Now let’s consider a real-world example of caching in a Java API. Consider an e-commerce API with the endpoint below:

  • GET /products/{id}: Fetch product details by ID.

The backing store is a relational database (e.g., Postgres or SQL Server), with Redis used as a distributed cache.

High-level strategy

  1. What to cache? We can cache the individual product details (GET /products/{id}). These are read-heavy, not updated frequently.

  2. Read/Write Patterns

    • Read: First check the cache; if the data is not found, retrieve it from the database, store it in the cache, and return the value.
    • Write: Every time a product is updated or deleted, delete the cache key.
  3. Key Design

    • Single Product: product:{id}
  4. TTL: The product details are not updated frequently, so we can use a long TTL. If changes to your product catalog are seasonal, you could go as high as 30 days.

  5. Stampede protection: Use simple request coalescing, with a lock that allows only one repopulation per key.

  6. Serialization format and object size: If the product object is too big and you see cache fetches taking longer for some products, populate only the relevant fields returned by the API.

  7. Observability and tuning: Track cache hit ratios, time taken for cache fetches, cache evictions, and error rates. Use these insights to tune TTLs, eviction policies, and which data is cached as access patterns evolve.

Talk is cheap, so let’s look at the above high-level strategy in action.

First, let’s look at our product API without stampede protection:

import java.util.concurrent.TimeUnit;

import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class ProductService {
    private final ProductRepository repository;
    private final RedisTemplate<String, Product> redisTemplate;
    
    public ProductService(ProductRepository repository, RedisTemplate<String, Product> redisTemplate) {
        this.repository = repository;
        this.redisTemplate = redisTemplate;
    }

    public Product getProduct(Long id) {
        String key = "product:" + id;
        
        Product cached = null;
        try {
            cached = redisTemplate.opsForValue().get(key);
        } catch (Exception e) {
            // Logging
        }
        
        if (cached != null) {
            return cached;
        }
        
        Product product = repository.findById(id).orElse(null);
        if (product == null) {
            return null;
        }
        try {
            redisTemplate.opsForValue().set(key, product, 30, TimeUnit.DAYS); // TimeUnit has no MONTH constant; 30 days approximates one month
        } catch (Exception e) {
            // Logging
        }
        return product;
    }

    public Product upsertProduct(Product product){
        Product saved = repository.save(product);
        String key = "product:" + saved.getId();
        try {
            redisTemplate.delete(key);
        } catch (Exception e){
            // Logging / logic to retry deletion
        }
        return saved;
    }
}

Now let’s try to add stampede protection to the above get method. One subtlety: the lock token is a plain string, which doesn’t fit the Product-typed RedisTemplate, so the version below assumes a StringRedisTemplate (injected alongside the product template, not shown above) is used for the lock key.

    public Product getProduct(Long id) {
        String key = "product:" + id;

        Product cached = null;
        try {
            cached = redisTemplate.opsForValue().get(key);
        } catch (Exception e) {
            // Logging
        }

        if (cached != null) {
            return cached;
        }

        String lockKey = "lock:" + key;
        String lockValue = UUID.randomUUID().toString();
        boolean lockAcquired = false;
        try {
            // stringRedisTemplate is an assumed StringRedisTemplate field for lock keys
            lockAcquired = Boolean.TRUE.equals(stringRedisTemplate.opsForValue().setIfAbsent(lockKey, lockValue, 10, TimeUnit.SECONDS)); // tune as needed
        } catch (Exception e) {
            // Logging
        }

        if (lockAcquired) {
            try {
                // Double check in case another request loaded the cache key and released the lock just before this request tried to acquire the lock
                try {
                    cached = redisTemplate.opsForValue().get(key);
                } catch (Exception e) {
                    // Logging
                }
                if (cached != null) {
                    return cached;
                }

                // Populate cache
                Product product = repository.findById(id).orElse(null);
                if (product == null) {
                    return null;
                }
                try {
                    redisTemplate.opsForValue().set(key, product, 30, TimeUnit.DAYS);
                } catch (Exception e) {
                    // Logging
                }
                return product;
            } finally {
                // Release the lock; ideally verify lockValue ownership first (see the note below)
                try {
                    stringRedisTemplate.delete(lockKey);
                } catch (Exception e) {
                    // Logging
                }
            }
        }

        // Lock not acquired: wait briefly for the lock holder to populate the cache
        try {
            Thread.sleep(1000); // small backoff, tune as necessary
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // Re-check the cache after waiting
        try {
            cached = redisTemplate.opsForValue().get(key);
        } catch (Exception e) {
            // Logging
        }
        if (cached != null) {
            return cached;
        }

        // Fallback: just hit the db if the cache is still not populated
        return repository.findById(id).orElse(null);
    }

In the above implementation, we coalesce requests so that only one of them goes to the db while the others sleep for 1000 ms and then re-check the cache. If the cache still isn’t populated, or if any failure occurs while setting it, the fallback is to hit the db directly.
