Staying out of TTL hell

Cache strategies that don't trade off correctness for speed


Seven in the bed and the little one said: "roll over". So they all rolled over - and one fell out!

Coming up with a TTL for some cached data ("time to live", or how long to retain it) can become a kind of quack numerology for programmers. Caching by TTL gives up correctness to gain speed. But how much correctness can you give up? How long can you show the wrong value somewhere before a user is confused? How long before that user suspects a problem in their account and becomes a burden on customer services?

Caching matters because it's a big speedup. A program with naive code but which uses a cache well is normally quicker than a program with clever code that doesn't. Using a cache often changes the implicit complexity class of the system - downwards, and towards constant time.

But it's fair to say that caching is uncool. There's a snobbery about application-level caches. It's more respectable to rewrite the program in a "systems programming language" (read: fast, compiled, for the serious) than to apply Memcache liberally to your PHP app (read: a "band-aid", apache2, used only by jokers). That's a shame, particularly because often the rewriting never begins (or never finishes) and the PHP app stays in prod, steadfast and immovable, with easy optimisations forgone.

In this article I will detail caching strategies that don't give up correctness to gain speed. These strategies all do away with TTLs and instead use active invalidation.

Strategy 1: Just never invalidate

The simplest strategy is to just never invalidate. Some data doesn't go bad. The contents of CSV upload #42345 won't change. Neither will the HTML conversion of some markdown-formatted product data. One reason people don't consider this strategy is that they wrongly worry that the cache will "fill up". Volatile caches don't work like that.

You don't need to manage the contents of a volatile cache via the rubber lever of a TTL. A volatile cache will manage its own contents: by eviction.

Application caches, whether Memcache or appropriately configured Redis (see below), operate under a "least recently used" (LRU) algorithm. Infrequently used - "cold" - data will be evicted as necessary in order to make space for more frequently used - "hot" - data. This ensures that the cache as a whole is as hot as possible, given the space available. The aim is to have a "full" cache at all times.

diagram of eviction from a volatile cache

When new data arrives, older (colder) data is moved down the pecking order. If the cache is full - as it should be - the coldest data is rudely evicted. I've added a puff of smoke for emphasis.

The result is a cache that is chock full of data your application is likely to need.

Don't mix up eviction (the LRU part) with expiry (the TTL part). Data can be evicted before it has expired. And it often will be: most caches don't consider the TTL when looking for stuff to evict.
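
To make eviction concrete, here is a toy LRU sketch in Python - purely illustrative, since real caches like Memcache do this in C and account for bytes rather than entry counts:

from collections import OrderedDict

class TinyLRUCache:
    """A toy LRU cache: touching a key keeps it hot, the coldest key gets evicted."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None  # a miss: the caller falls back to the real data source
        self.entries.move_to_end(key)  # reading a key makes it "hot" again
        return self.entries[key]

    def set(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the coldest entry (puff of smoke)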

Strategy 2: Update-on-write

Sometimes you want to cache something that will change. For example, your application's User object: username, email address, postal address, etc. This data is bound to change over time but is also accessed extremely frequently, often on every request or as part of every operation.

The best approach here is to update the cache whenever you update the data in your persistent store. That way the cache stays up to date without needing any fuzzy TTLs and all the associated guesses at how long something will stay valid. Brill.

Under this strategy, reads will predominantly come from the cache and only writes need go to the permanent store or database. Cache evictions are not a problem: when the cache doesn't hold something, you fall back to a read from the database (a requirement for every strategy I discuss on this page).
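
A minimal sketch of update-on-write, assuming redis-py, JSON-serialisable user data and hypothetical load_user_from_db/save_user_to_db functions standing in for your data layer:

import json
import redis

r = redis.Redis()

def get_user(user_id):
    cached = r.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    user = load_user_from_db(user_id)  # hypothetical: the fallback on a miss or eviction
    r.set(f"user:{user_id}", json.dumps(user))
    return user

def update_user(user_id, user):
    save_user_to_db(user_id, user)  # hypothetical: the database remains the source of truth
    r.set(f"user:{user_id}", json.dumps(user))  # keep the cache in step - no TTL needed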

A little bonus is that many databases can do writes faster when writers aren't contending with readers for locks.

Strategy 3: Invalidate-on-write

Sometimes a change to data can affect multiple things in the cache. For example: you might hold the User object in the cache under both its user_id and email_address. Whenever you change the user you'll have everything in memory already, so you can easily update both cached values. You can encapsulate this nicely in the one place in your code-base where users are edited. Happy days.

The trouble comes when you don't have on hand all the data you'd need to do those updates. For example, when a user likes an article then the cache of their list of most liked article authors needs to be updated, but you don't have all of the data for that in memory when you're just processing a single like. You could pull it all in each time in order to recalculate this list in the cache—or you could simply invalidate the list in the cache and regenerate it when it's next needed.

It can vary from scenario to scenario but usually invalidating is quicker than pulling extra data into the program just to feed the cache. It's no slower overall to regenerate the value next time it's required and, you never know, you might be lucky: perhaps the user won't look at it before it changes yet again.
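
As a sketch (same assumptions as before: redis-py, hypothetical data-layer functions and key names), the like handler simply deletes the cached aggregate and the read path rebuilds it on demand:

import json
import redis

r = redis.Redis()

def record_like(user_id, article_id):
    insert_like_into_db(user_id, article_id)  # hypothetical write to the real store
    r.delete(f"user:{user_id}:top-authors")  # invalidate: cheap, and needs no extra data

def get_top_authors(user_id):
    cached = r.get(f"user:{user_id}:top-authors")
    if cached is not None:
        return json.loads(cached)
    top_authors = calculate_top_authors_from_db(user_id)  # hypothetical, relatively expensive
    r.set(f"user:{user_id}:top-authors", json.dumps(top_authors))
    return top_authors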

This strategy comes up often with linked data, and all the time with aggregated data. Invalidate-on-write can be applied pretty widely but has a big limitation: you need to know the cache keys in order to invalidate them. That isn't always possible.

Strategy 4: Namespacing

When there is no finite, knowable list of keys to invalidate as the knock-on effect of a change, the best solution is cache "namespacing".

Namespacing works like this: you have a special namespace value that you include as part of the key to other cached data. Often, but not always, a namespace value is a monotonically increasing timestamp of the last change.

A worked example:

/namespaces/user657 will store a monotonically increasing timestamp of the last time user #657 changed their profile, say 1514937600.

Then, keys regarding that user include that timestamp somewhere, for example /user657/1514937600/likes_by_author.

Each logical cache lookup now requires two real lookups - first for the namespace key and then for the actual key. When the user changes something, you update the contents of their namespace key and all of the keys that were under the old namespace key are now unreachable and so de facto invalidated. When they cool down enough, they will be evicted.

The obvious downside: each namespaced cache access now needs two round-trips instead of one. It's often worth it. Firstly because namespacing allows for a much wider use of caching than without - you're caching stuff you couldn't otherwise. Secondly because two cache round trips are usually still much faster than regenerating or retrieving whatever it is. A cache hit is hopefully much more than twice as fast as a cache miss.
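
A sketch of the two-lookup dance for the worked example above, again assuming redis-py and a hypothetical calculate_likes_by_author_from_db function:

import json
import time
import redis

r = redis.Redis()

def user_namespace(user_id):
    ns = r.get(f"/namespaces/user{user_id}")
    if ns is None:
        ns = str(int(time.time()))  # first use: establish a namespace value
        r.set(f"/namespaces/user{user_id}", ns)
        return ns
    return ns.decode()

def get_likes_by_author(user_id):
    key = f"/user{user_id}/{user_namespace(user_id)}/likes_by_author"  # lookup one, then two
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    likes_by_author = calculate_likes_by_author_from_db(user_id)  # hypothetical
    r.set(key, json.dumps(likes_by_author))
    return likes_by_author

def on_user_change(user_id):
    # bump the namespace: every key under the old value becomes unreachable
    r.set(f"/namespaces/user{user_id}", str(int(time.time())))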

Be careful: there are many ways to naively "improve on" this strategy in ways that don't work. I won't go into detail on the dubiously "improved" versions, except to point out two invariants that are required for a scheme like this to work correctly:

  1. The namespace key needs to be at least as hot as the hottest key under it, else it's in danger of being persistently evicted ahead of those keys. Beware "optimisations" that avoid having to retrieve it - that will cause LRU caches to perceive it as cold.
  2. De facto invalidations of this kind assume that there are no other ways to access stale subkeys. Beware attempts to replace the hierarchical namespace with tag-based approaches, which allow for these stale subkeys to continue to be retrieved after an intended invalidation.

Strategy 5: HTTP - and PUSH and PURGE

HTTP has a caching system built in, but it's TTL-only. Broadly speaking though, strategy #1 continues to work: just set an absurdly long TTL and browsers and CDNs will obey it - to the extent they care to.

Strategy #4 also works well in HTTP, except in this context it's typically called "cache busting". A common approach is to put version strings or release timestamps into URL paths or query strings to ensure that the browser downloads only the latest JavaScript bundle.
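
As a sketch, the two pieces are a far-future TTL on the asset responses and a version (or release timestamp) baked into the URLs your pages emit - the names here are made up:

RELEASE = "1514937600"  # hypothetical: a release timestamp baked in at deploy time

def asset_url(path):
    # a new RELEASE value changes every asset URL, "busting" the old cached copies
    return f"/static/{path}?v={RELEASE}"

# while the asset responses themselves carry something like:
#   Cache-Control: public, max-age=31536000, immutable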

Some CDNs provide explicit purge APIs, though this is mainly a convenience for people deploying changes to static files. Traditional CDN purges tend to take minutes to complete and so are probably not something to rely on from the application level, though something like fly.io, where you run a bit of your code inside the CDN, may work better.

Caching reverse proxies that you self-host, like Varnish and Apache Traffic Server, can use non-standard PUSH and PURGE verbs that let you explicitly control the cache contents. If you can invalidate explicitly you can use strategies #2 and #3. If you have the file on hand, why not also populate your reverse proxy's cache?
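
A sketch of an explicit purge, assuming the requests library and a reverse proxy configured to accept PURGE from your application hosts (it won't be, out of the box):

import requests

def purge_from_proxy(path):
    # PURGE is a non-standard verb: requests will happily send it,
    # but your Varnish/ATS config has to be set up to honour it
    response = requests.request("PURGE", f"https://www.example.com{path}")
    response.raise_for_status()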

The nice thing overall about caching at the HTTP level is that it takes work off the application server's plate.

Tips and pitfalls

Caching isn't all sunshine and rainbows. Here are some problems to avoid and some tips to consider.

Don't put non-cache data in the cache

One practical issue is that occasionally programmers succumb to baser software architectures and start using the cache as a database and putting real data in there.

Some people commit this sin because getting a new data store approved by the organisation's Central Galactic Technical Advisory Board is a bureaucratic nightmare. Meanwhile the cache server is sitting there, looking like a whacking great hash table that you can just jam stuff into. Consequently it appears to be a great place to keep some static data without needing to ask anyone for permission. "Surely this data will always be hot enough not to get evicted", someone thinks. But eventually their special data will go cold and there will be great wailing and gnashing of teeth when Memcache vents their stuff into space. "Stupid cache! How dare you!"

Other times, the cause is tempting extra features that the cache has. Redis has a huge number of built-in features, far beyond caching: it can serve as a message bus, a service finder, a work queue and even as a kind of non-relational database. However, Redis's memory limits and eviction policies are set globally for the whole instance and the defaults are not right for caching - they are aimed at the non-volatile uses of Redis.

Redis's default maximum memory setting (maxmemory) is unlimited, meaning that the instance will expand forever - even beyond physical memory - without ever evicting a thing. For caching, set a max memory limit that is strictly smaller than physical memory. The default eviction policy (maxmemory-policy) is not to evict but instead to raise errors when the memory is full. Again, not right for caching where you want eviction. Set this to allkeys-lru.
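
For example, via redis-py (the same two settings can live in redis.conf; the 1gb figure is just an illustration - pick something comfortably below physical memory):

import redis

r = redis.Redis()

# cap memory usage below what the machine actually has...
r.config_set("maxmemory", "1gb")

# ...and evict the least recently used key, whether or not it has a TTL
r.config_set("maxmemory-policy", "allkeys-lru")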

Most managed services, like AWS's ElastiCache, have a different default for the eviction policy: volatile-lru. This setting considers for eviction only those keys that have a TTL set. It is intended to be "clever" and to allow extra use of the cache instance for non-cache data by overloading the TTL flag. In fact, volatile-lru is a booby trap with a lot of surprising failure modes. For one, it encourages programmers to put non-cache data into a server that the sysadmins think of as volatile and transient. For another, programmers who don't know about this special setting are apt to fill the instance by mistake by repeatedly inserting cache data without a TTL.

You can either set Redis up as a "data-structures" server or you can set it up properly as a cache - you can't do both. If you choose to use Redis as your cache, ensure that the cache instance is only serving as your cache. Your inter-system message bus should be on a different Redis with a different configuration.

Never require a cache hit

It's important never to require a cache hit - even as a soft requirement. Evictions can happen at inconvenient times - such as when some other part of the system is under load - and there mustn't be any negative consequences to a cache miss.

One particularly naughty thing to do is to store web-sessions only in the cache. A cache miss then becomes an involuntary logout for the hapless user who has done nothing wrong and transgressed no one. Instead, use strategy #2 above and store the web-sessions in your database, using your cache just as a speedup.

To shake out issues like this one, it's worth running your automated tests with special caches: ones that have a 0% hit rate or a random hit rate. Nothing should be broken by a cache miss, so any tests that fail are worthy of investigation.
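
A sketch of such a test double, assuming your code goes through a cache client with the usual get/set/delete shape:

import random

class AlwaysMissCache:
    """A cache with a 0% hit rate - nothing should break."""

    def get(self, key):
        return None

    def set(self, key, value):
        pass

    def delete(self, key):
        pass

class FlakyCache:
    """Wraps a real cache but randomly 'misses' on read, to shake out hidden cache-hit requirements."""

    def __init__(self, real_cache, hit_rate=0.5):
        self.real_cache = real_cache
        self.hit_rate = hit_rate

    def get(self, key):
        if random.random() > self.hit_rate:
            return None  # simulate an eviction at an inconvenient time
        return self.real_cache.get(key)

    def set(self, key, value):
        self.real_cache.set(key, value)

    def delete(self, key):
        self.real_cache.delete(key)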

Decorator-style APIs

In Python land specifically, people love decorators. For example:

from functools import lru_cache

@lru_cache(maxsize=1000)
def get_slow_thing_v1(thing_id):
    thing = get_slow_thing_from_a_database_layer(thing_id)
    return thing

There's nothing wrong with the above code but it only allows for strategy #1: never invalidate. If you need to invalidate or update, the code is necessarily more involved:

from pyappcache import RedisCache

my_cache = RedisCache()

def get_slow_thing_v2(thing_id):
    thing = my_cache.get_by_str(thing_id)

    if thing is None:
        thing = get_slow_thing_from_a_database_layer(thing_id)
        my_cache.set_by_str(thing_id, thing)

    return thing

def update_something_slow(thing_id, new_thing):
    set_thing_in_a_database_layer(new_thing)
    my_cache.set_by_str(thing_id, new_thing)

The second version is wordier but makes it possible to update the cache when setting something.

The other advantage is that you can retrieve the value from the cache in other places by key - it's no longer tied to get_slow_thing_v1. This doesn't matter much in trivial cases but does start to matter in bigger systems. Caching by function arguments is fine when it fits, but it's best not to develop a fixation on it - caching purely by arguments can be a bit limiting.

Mentioning my own library

If you're working in Python and want to cache stuff, consider using my library: pyappcache (code here, docs here).

It has many whizzy features, including:

There are other good libraries for Python, including dogpile.cache and cachew, and you can of course use redis-py or pylibmc directly.

Caching is undeniably extra faff - but sometimes worth it

None of this is to imply that using a cache is always worthwhile. It's more work to program the caching, an extra backing service to run and, of course, more space for bugs to hide. The "Rashomon effect" of cache bugs makes them particularly frustrating both to experience and to debug.

All the same, there are a huge number of systems out there which could go down a couple of AWS instance sizes if a spot of Memcache was applied. Think of the environment, if that motivates you.

One tip: it's best to start by thinking about invalidation. To paraphrase someone: "ask not what you can cache - ask what you can invalidate". If you can't invalidate something reliably, it's unlikely that you can cache it in the first place. Before you start trying to cache something, check that you can invalidate it in all the places where it needs to be invalidated.


See also

I like this explanation from the Memcache maintainer on why it's not a good idea just to store sessions in your cache.

Apache (née Inktomi) Traffic Server supports PUSH and PURGE.

I think Varnish supports both too but you have to configure it.

Two pages I like that describe the (mildly outdated, big picture) design of Memcache: Tinou's Memcache for dummies and Joshua Thijssen's Memcache Internals. Memcache's documentation is good and, while clearly unfinished, includes some tricks I haven't mentioned.

Redis's docs on using it as an LRU cache are required reading before trying to do that. As with most things in Redis, there are sharp edges. It's important to know that the defaults cause issues for the caching use-case.