Niko's Project Corner

Caching and perf. monitoring with Redis and Python

Description An easy-to-use @cached decorator in Python.
Languages Python
Tags Databases
Duration Spring 2016
Modified 10th October 2016

When implementing real-time APIs, server load can often be greatly reduced by caching frequently accessed and rarely modified data, or re-usable calculation results. Luckily Python has several features which make it easy to add new constructs and wrappers to the language, for example *args and **kwargs function arguments, first-class functions, decorators and so forth. Thus it doesn't take too much effort to implement a @cached decorator with business-specific logic for cache invalidation. Redis is a perfect fit for the job thanks to its high-performance, binary-friendly key-value store with TTLs and configurable data-eviction policies, and its support for other data structures, which make it trivial to store additional key metrics there.

There already exist many Redis libraries and even complete caching solutions, but as caching is fundamentally tricky to get right and has many edge cases I decided to write my own (NIH :/). It also made it easy to add nice-to-have features such as incrementing a value in the computation-time histogram when a cache miss occurs. This histogram is a "sufficient statistic" for calculating the total number of cache misses, the average time taken, its variance and any percentiles, including the frequently used 50th percentile, the median.
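The histogram idea can be sketched roughly as follows; the bucketing scheme and function names here are illustrative assumptions, not the original code, and a plain dict stands in for a Redis hash (where the increment would be a single HINCRBY):

```python
import math

def histogram_bucket(seconds):
    """Map a computation time to a logarithmic bucket label,
    e.g. 0.123 s -> '1e-01'. (Hypothetical bucketing scheme.)"""
    if seconds <= 0:
        return '0'
    return '1e%+03d' % math.floor(math.log10(seconds))

def record_miss(histogram, seconds):
    """Increment the bucket's counter on a cache miss; with Redis this
    would be one HINCRBY on a per-function hash."""
    bucket = histogram_bucket(seconds)
    histogram[bucket] = histogram.get(bucket, 0) + 1
    return histogram
```

From the resulting bucket counts one can derive the total miss count, an approximate mean and variance, and any percentile, without storing individual timings.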

To-be-cached functions might have arguments which need to be skipped when calculating the cache key (such as a database connection), a non-default TTL, a flag for whether to update the TTL on a cache hit, and so forth. Thus @cached was implemented as a function which takes these configuration arguments and returns a "wrapper" function which is the actual decorator. Python decorators are simply functions which take a single argument (the decorated function) and return a new "wrapped" function.

There are a few conditions under which the cache code is skipped, and instead the original function is evaluated directly and its result returned. One case is when Redis is not configured, for example in a dev environment. Another case is when the cache key calculation fails (more on that later); ideally that would never happen. A third case is when a specific X-Skip-Redis-Cache HTTP header is present in the API query. This is useful when running integration tests, as it is important to exercise the full code path and not get results from the cache. It can also be used as an optimization when the client knows it is not going to benefit from caching.
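The bypass logic boils down to a small check along these lines (the header name is from the article; the function itself and the plain-dict headers argument are illustrative assumptions):

```python
def should_skip_cache(redis_client, headers):
    """Decide whether to bypass caching entirely and call the
    original function directly. (Illustrative sketch.)"""
    if redis_client is None:                 # Redis not configured, e.g. dev env
        return True
    if headers.get('X-Skip-Redis-Cache'):    # client explicitly opts out
        return True
    return False
```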

As Redis is a key-value store, a "cache key" has to be generated from the function arguments. The key is first used to check whether the result already exists in Redis; if not, it is calculated and stored there. A basic implementation has many race conditions, but in practice they should be very rare and not too critical. Function arguments can be quite large, but it is better for Redis if keys are relatively short; this is easily achieved by calculating a hash and representing it in base-64. The hash can only be calculated from a binary representation of the arguments, and as they can have arbitrary types, and item order may or may not be relevant (lists and tuples vs. dicts and sets), they need to be normalized somehow. I chose to encode them as JSON with alphabetically ordered keys instead of, for example, pickling, as it is a straightforward process to implement and understand. It doesn't handle all corner cases but gets the job done for now.
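The key generation described above can be sketched as follows (function name and the choice of SHA-1 are assumptions; the essential steps are JSON with sorted keys, then hash, then base-64):

```python
import base64
import hashlib
import json

def cache_key(func_name, args, kwargs):
    """Normalize arguments to JSON with sorted dict keys, hash the
    bytes and return a short base-64 key. As noted in the article,
    this doesn't handle all corner cases (sets, custom objects)."""
    payload = json.dumps([func_name, args, kwargs], sort_keys=True,
                         separators=(',', ':')).encode('utf-8')
    digest = hashlib.sha1(payload).digest()
    return 'cache:' + base64.urlsafe_b64encode(digest).decode('ascii')
```

Thanks to sort_keys=True, dicts with the same contents in different insertion order produce the same key, while list order still matters.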

Cache invalidation is known to be difficult to get right, even more so when multiple versions of the software are running concurrently, configured for different customers' environments (dev vs. QA vs. production). Thus the config file's contents and the latest git commit's hash affect the generated cache key, which avoids the issue.
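One way to fold deployment identity into the key is a short version prefix derived from both inputs; this helper is a hypothetical sketch (obtaining the commit hash, e.g. via `git rev-parse HEAD`, is left out):

```python
import hashlib

def key_version(config_text, git_commit):
    """Derive a short deployment-specific prefix from the config file's
    contents and the latest git commit hash, so that different versions
    and environments never share cache entries. (Illustrative.)"""
    h = hashlib.md5((config_text + '\n' + git_commit).encode('utf-8'))
    return h.hexdigest()[:8]
```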

Results are pickled and zlib-compressed to minimize memory requirements on Redis. Also, on a cache miss various metrics are updated, such as the time it took to calculate the result, the number of hits vs. misses, how large the pickled and compressed binaries were, and so forth. This makes it easy to inspect these in retrospect and to detect issues such as low hit rates, or huge objects being cached for no gain.
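The pickle + zlib round-trip is only a few lines with the standard library (function names are illustrative):

```python
import pickle
import zlib

def serialize(value):
    """Pickle and compress a result before storing it in Redis."""
    return zlib.compress(pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL))

def deserialize(blob):
    """Reverse the steps on a cache hit."""
    return pickle.loads(zlib.decompress(blob))
```

Both the pickled and the compressed sizes are available at this point, which is what makes recording them as metrics essentially free.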

Related blog posts: