Usage Guide
Overview
At the time of this writing, popular key/value servers include Memcached, Redis and many others. While these tools all have different usage focuses, they all have in common that the storage model is based on the retrieval of a value based on a key; as such, they are all potentially suitable for caching, particularly Memcached which is first and foremost designed for caching.
With a caching system in mind, dogpile.cache provides an interface to a particular Python API targeted at that system.
A dogpile.cache configuration consists of the following components:
- A region, which is an instance of CacheRegion, and defines the configuration details for a particular cache backend. The CacheRegion can be considered the “front end” used by applications.
- A backend, which is an instance of CacheBackend, describing how values are stored and retrieved from a backend. This interface specifies only get(), set() and delete(). The actual kind of CacheBackend in use for a particular CacheRegion is determined by the underlying Python API being used to talk to the cache, such as Pylibmc. The CacheBackend is instantiated behind the scenes and not directly accessed by applications under normal circumstances.
- Value generation functions. These are user-defined functions that generate new values to be placed in the cache. While dogpile.cache offers the usual “set” approach of placing data into the cache, the usual mode of usage is to only instruct it to “get” a value, passing it a creation function which will be used to generate a new value if and only if one is needed. This “get-or-create” pattern is the entire key to the “Dogpile” system, which coordinates a single value creation operation among many concurrent get operations for a particular key, eliminating the issue of an expired value being redundantly re-generated by many workers simultaneously.
Rudimentary Usage
dogpile.cache includes a Pylibmc backend. A basic configuration looks like:
from dogpile.cache import make_region

region = make_region().configure(
    'dogpile.cache.pylibmc',
    expiration_time = 3600,
    arguments = {
        'url': ["127.0.0.1"],
    }
)

@region.cache_on_arguments()
def load_user_info(user_id):
    return some_database.lookup_user_by_id(user_id)
Above, we create a CacheRegion using the make_region() function, then apply the backend configuration via the CacheRegion.configure() method, which returns the region. The name of the backend is the only argument required by CacheRegion.configure() itself, in this case dogpile.cache.pylibmc. However, in this specific case, the pylibmc backend also requires that the URL of the memcached server be passed within the arguments dictionary.
The configuration is separated into two sections. Upon construction via make_region(), the CacheRegion object is available, typically at module import time, for usage in decorating functions. Additional configuration details passed to CacheRegion.configure() are typically loaded from a configuration file and therefore not necessarily available until runtime, hence the two-step configuration process.
Key arguments passed to CacheRegion.configure() include expiration_time, which is the expiration time passed to the Dogpile lock, and arguments, which are arguments used directly by the backend - in this case we are using arguments that are passed directly to the pylibmc module.
Region Configuration
The make_region() function currently calls the CacheRegion constructor directly.
- class dogpile.cache.region.CacheRegion(name: typing.Optional[str] = None, function_key_generator: typing.Callable[[...], typing.Callable[[...], str]] = <function function_key_generator>, function_multi_key_generator: typing.Callable[[...], typing.Callable[[...], typing.Sequence[str]]] = <function function_multi_key_generator>, key_mangler: typing.Optional[typing.Callable[[str], str]] = None, serializer: typing.Optional[typing.Callable[[typing.Any], bytes]] = None, deserializer: typing.Optional[typing.Callable[[bytes], typing.Any]] = None, async_creation_runner: typing.Optional[typing.Callable[[dogpile.cache.region.CacheRegion, str, typing.Callable[[], typing.Any], dogpile.cache.api.CacheMutex], None]] = None)
A front end to a particular cache backend.
- Parameters
name – Optional, a string name for the region. This isn’t used internally but can be accessed via the .name parameter, helpful for configuring a region from a config file.
function_key_generator –
Optional. A function that will produce a “cache key” given a data creation function and arguments, when using the CacheRegion.cache_on_arguments() method. The structure of this function should be two levels: given the data creation function, return a new function that generates the key based on the given arguments. Such as:
def my_key_generator(namespace, fn, **kw):
    fname = fn.__name__
    def generate_key(*arg):
        return namespace + "_" + fname + "_".join(str(s) for s in arg)
    return generate_key

region = make_region(
    function_key_generator = my_key_generator
).configure(
    "dogpile.cache.dbm",
    expiration_time=300,
    arguments={
        "filename":"file.dbm"
    }
)
The namespace is that passed to CacheRegion.cache_on_arguments(). It’s not consulted outside this function, so in fact can be of any form. For example, it can be passed as a tuple, used to specify arguments to pluck from **kw:
def my_key_generator(namespace, fn):
    def generate_key(*arg, **kw):
        return ":".join(
            [kw[k] for k in namespace] +
            [str(x) for x in arg]
        )
    return generate_key
Where the decorator might be used as:
@my_region.cache_on_arguments(namespace=('x', 'y'))
def my_function(a, b, **kw):
    return my_data()
See also
function_key_generator() - default key generator
kwarg_function_key_generator() - optional gen that also uses keyword arguments
function_multi_key_generator –
Optional. Similar to function_key_generator parameter, but it’s used in CacheRegion.cache_multi_on_arguments(). Generated function should return list of keys. For example:
def my_multi_key_generator(namespace, fn, **kw):
    namespace = fn.__name__ + (namespace or '')

    def generate_keys(*args):
        return [namespace + ':' + str(a) for a in args]

    return generate_keys
key_mangler – Function which will be used on all incoming keys before passing to the backend. Defaults to None, in which case the key mangling function recommended by the cache backend will be used. A typical mangler is the SHA1 mangler found at sha1_mangle_key() which coerces keys into a SHA1 hash, so that the string length is fixed. To disable all key mangling, set to False. Another typical mangler is the built-in Python function str, which can be used to convert non-string or Unicode keys to bytestrings, which is needed when using a backend such as bsddb or dbm under Python 2.x in conjunction with Unicode keys.
serializer –
function which will be applied to all values before passing to the backend. Defaults to None, in which case the serializer recommended by the backend will be used. Typical serializers include pickle.dumps and json.dumps.
New in version 1.1.0.
deserializer –
function which will be applied to all values returned by the backend. Defaults to None, in which case the deserializer recommended by the backend will be used. Typical deserializers include pickle.loads and json.loads.
New in version 1.1.0.
async_creation_runner –
A callable that, when specified, will be passed to and called by dogpile.lock when there is a stale value present in the cache. It will be passed the mutex and is responsible for releasing that mutex when finished. This can be used to defer the computation of expensive creator functions to later points in the future by way of, for example, a background thread, a long-running queue, or a task manager system like Celery.
For a specific example using async_creation_runner, new values can be created in a background thread like so:
import threading

def async_creation_runner(cache, somekey, creator, mutex):
    ''' Used by dogpile.core:Lock when appropriate '''
    def runner():
        try:
            value = creator()
            cache.set(somekey, value)
        finally:
            mutex.release()

    thread = threading.Thread(target=runner)
    thread.start()

region = make_region(
    async_creation_runner=async_creation_runner,
).configure(
    'dogpile.cache.memcached',
    expiration_time=5,
    arguments={
        'url': '127.0.0.1:11211',
        'distributed_lock': True,
    }
)
Remember that the first request for a key with no associated value will always block; async_creation_runner will not be invoked. However, subsequent requests for cached-but-expired values will still return promptly. They will be refreshed by whatever asynchronous means the provided async_creation_runner callable implements.
By default the async_creation_runner is disabled and is set to None.
New in version 0.4.2: added the async_creation_runner feature.
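The two-level structure of a key generator can be exercised on its own, without a region. The sketch below is a hypothetical generator (kwarg_plucking_key_generator is not part of dogpile.cache) that treats the namespace as a tuple of keyword names; it is called directly here, in the same two stages a region would use: once with the namespace and function, then once per invocation with the call arguments.

```python
def kwarg_plucking_key_generator(namespace, fn, **kw):
    """Hypothetical generator: `namespace` is a tuple of keyword names."""
    prefix = fn.__name__

    def generate_key(*arg, **kwargs):
        # Keyword values named in `namespace` come first, then positionals.
        parts = [str(kwargs[name]) for name in namespace]
        parts += [str(value) for value in arg]
        return prefix + "|" + ":".join(parts)

    return generate_key


def my_function(a, b, **kw):
    return a + b


# Stage 1: build the key function for one decorated function.
key_for = kwarg_plucking_key_generator(("x", "y"), my_function)
# Stage 2: build a key for one particular call.
key = key_for(1, 2, x="left", y="right")
```

A generator of this shape could then be passed as function_key_generator to make_region().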
Once you have a CacheRegion, the CacheRegion.cache_on_arguments() method can be used to decorate functions, but the cache itself can’t be used until CacheRegion.configure() is called. The interface for that method is as follows:
- CacheRegion.configure(backend: str, expiration_time: Optional[Union[float, datetime.timedelta]] = None, arguments: Optional[Mapping[str, Any]] = None, _config_argument_dict: Optional[Mapping[str, Any]] = None, _config_prefix: Optional[str] = None, wrap: Sequence[Union[dogpile.cache.proxy.ProxyBackend, Type[dogpile.cache.proxy.ProxyBackend]]] = (), replace_existing_backend: bool = False, region_invalidator: Optional[dogpile.cache.region.RegionInvalidationStrategy] = None) dogpile.cache.region.CacheRegion
Configure a CacheRegion.
The CacheRegion itself is returned.
- Parameters
backend – Required. This is the name of the CacheBackend to use, and is resolved by loading the class from the dogpile.cache entrypoint.
expiration_time –
Optional. The expiration time passed to the dogpile system. May be passed as an integer number of seconds, or as a datetime.timedelta value.
The CacheRegion.get_or_create() method as well as the CacheRegion.cache_on_arguments() decorator (though note: not the CacheRegion.get() method) will call upon the value creation function after this time period has passed since the last generation.
arguments – Optional. The structure here is passed directly to the constructor of the CacheBackend in use, though is typically a dictionary.
wrap –
Optional. A list of ProxyBackend classes and/or instances, each of which will be applied in a chain to ultimately wrap the original backend, so that custom functionality augmentation can be applied.
New in version 0.5.0.
See also
Changing Backend Behavior
replace_existing_backend –
if True, the existing cache backend will be replaced. Without this flag, an exception is raised if a backend is already configured.
New in version 0.5.7.
region_invalidator –
Optional. Override default invalidation strategy with custom implementation of RegionInvalidationStrategy.
New in version 0.6.2.
The CacheRegion can also be configured from a dictionary, using the CacheRegion.configure_from_config() method:
- CacheRegion.configure_from_config(config_dict, prefix)
Configure from a configuration dictionary and a prefix.
Example:
local_region = make_region()
memcached_region = make_region()

# regions are ready to use for function
# decorators, but not yet for actual caching

# later, when config is available
myconfig = {
    "cache.local.backend":"dogpile.cache.dbm",
    "cache.local.arguments.filename":"/path/to/dbmfile.dbm",
    "cache.memcached.backend":"dogpile.cache.pylibmc",
    "cache.memcached.arguments.url":"127.0.0.1, 10.0.0.1",
}
local_region.configure_from_config(myconfig, "cache.local.")
memcached_region.configure_from_config(myconfig, "cache.memcached.")
Using a Region
The CacheRegion object is our front-end interface to a cache. It includes the following methods:
- CacheRegion.get(key, expiration_time=None, ignore_expiration=False)
Return a value from the cache, based on the given key.
If the value is not present, the method returns the token NO_VALUE. NO_VALUE evaluates to False, but is separate from None to distinguish between a cached value of None and no cached value at all.
By default, the configured expiration time of the CacheRegion, or alternatively the expiration time supplied by the expiration_time argument, is tested against the creation time of the retrieved value versus the current time (as reported by time.time()). If stale, the cached value is ignored and the NO_VALUE token is returned. Passing the flag ignore_expiration=True bypasses the expiration time check.
Changed in version 0.3.0: CacheRegion.get() now checks the value’s creation time against the expiration time, rather than returning the value unconditionally.
The method also interprets the cached value in terms of the current “invalidation” time as set by the invalidate() method. If a value is present, but its creation time is older than the current invalidation time, the NO_VALUE token is returned. Passing the flag ignore_expiration=True bypasses the invalidation time check.
New in version 0.3.0: Support for the CacheRegion.invalidate() method.
- Parameters
key – Key to be retrieved. While it’s typical for a key to be a string, it is ultimately passed directly down to the cache backend, before being optionally processed by the key_mangler function, so can be of any type recognized by the backend or by the key_mangler function, if present.
expiration_time –
Optional expiration time value which will supersede that configured on the CacheRegion itself.
Note
The CacheRegion.get.expiration_time argument is not persisted in the cache and is relevant only to this specific cache retrieval operation, relative to the creation time stored with the existing cached value. Subsequent calls to CacheRegion.get() are not affected by this value.
New in version 0.3.0.
ignore_expiration –
if True, the value is returned from the cache if present, regardless of configured expiration times or whether or not invalidate() was called.
New in version 0.3.0.
- CacheRegion.get_or_create(key: str, creator: Callable[[...], Any], expiration_time: Optional[float] = None, should_cache_fn: Optional[Callable[[Any], bool]] = None, creator_args: Optional[Tuple[Any, Mapping[str, Any]]] = None) Any
Return a cached value based on the given key.
If the value does not exist or is considered to be expired based on its creation time, the given creation function may or may not be used to recreate the value and persist the newly generated value in the cache.
Whether or not the function is used depends on if the dogpile lock can be acquired or not. If it can’t, it means a different thread or process is already running a creation function for this key against the cache. When the dogpile lock cannot be acquired, the method will block if no previous value is available, until the lock is released and a new value available. If a previous value is available, that value is returned immediately without blocking.
If the invalidate() method has been called, and the retrieved value’s timestamp is older than the invalidation timestamp, the value is unconditionally prevented from being returned. The method will attempt to acquire the dogpile lock to generate a new value, or will wait until the lock is released to return the new value.
Changed in version 0.3.0: The value is unconditionally regenerated if the creation time is older than the last call to invalidate().
- Parameters
key – Key to be retrieved. While it’s typical for a key to be a string, it is ultimately passed directly down to the cache backend, before being optionally processed by the key_mangler function, so can be of any type recognized by the backend or by the key_mangler function, if present.
creator – function which creates a new value.
creator_args –
optional tuple of (args, kwargs) that will be passed to the creator function if present.
New in version 0.7.0.
expiration_time –
optional expiration time which will override the expiration time already configured on this CacheRegion if not None. To set no expiration, use the value -1.
Note
The CacheRegion.get_or_create.expiration_time argument is not persisted in the cache and is relevant only to this specific cache retrieval operation, relative to the creation time stored with the existing cached value. Subsequent calls to CacheRegion.get_or_create() are not affected by this value.
should_cache_fn –
optional callable function which will receive the value returned by the “creator”, and will then return True or False, indicating if the value should actually be cached or not. If it returns False, the value is still returned, but isn’t cached. E.g.:
def dont_cache_none(value):
    return value is not None

value = region.get_or_create("some key",
                    create_value,
                    should_cache_fn=dont_cache_none)
Above, the function returns the value of create_value() if the cache is invalid, however if the return value is None, it won’t be cached.
New in version 0.4.3.
See also
CacheRegion.cache_on_arguments() - applies get_or_create() to any function using a decorator.
CacheRegion.get_or_create_multi() - multiple key/value version
- CacheRegion.set(key: str, value: Any) None
Place a new value in the cache under the given key.
- CacheRegion.delete(key: str) None
Remove a value from the cache.
This operation is idempotent (can be called multiple times, or on a non-existent key, safely)
- CacheRegion.cache_on_arguments(namespace: typing.Optional[str] = None, expiration_time: typing.Optional[typing.Union[float, typing.Callable[[], float]]] = None, should_cache_fn: typing.Optional[typing.Callable[[typing.Any], bool]] = None, to_str: typing.Callable[[typing.Any], str] = <class 'str'>, function_key_generator: typing.Optional[typing.Callable[[...], typing.Callable[[...], str]]] = None) Callable[[Callable[[...], Any]], Callable[[...], Any]]
A function decorator that will cache the return value of the function using a key derived from the function itself and its arguments.
The decorator internally makes use of the CacheRegion.get_or_create() method to access the cache and conditionally call the function. See that method for additional behavioral details.
E.g.:
@someregion.cache_on_arguments()
def generate_something(x, y):
    return somedatabase.query(x, y)
The decorated function can then be called normally, where data will be pulled from the cache region unless a new value is needed:
result = generate_something(5, 6)
The function is also given an attribute invalidate(), which provides for invalidation of the value. Pass to invalidate() the same arguments you’d pass to the function itself to represent a particular value:
generate_something.invalidate(5, 6)
Another attribute set() is added to provide extra caching possibilities relative to the function. This is a convenience method for CacheRegion.set() which will store a given value directly without calling the decorated function. The value to be cached is passed as the first argument, and the arguments which would normally be passed to the function should follow:
generate_something.set(3, 5, 6)
The above example is equivalent to calling generate_something(5, 6), if the function were to produce the value 3 as the value to be cached.
New in version 0.4.1: Added set() method to decorated function.
Similar to set() is refresh(). This attribute will invoke the decorated function and populate a new value into the cache with the new value, as well as returning that value:
newvalue = generate_something.refresh(5, 6)
New in version 0.5.0: Added refresh() method to decorated function.
original(), on the other hand, will invoke the decorated function without any caching:
newvalue = generate_something.original(5, 6)
New in version 0.6.0: Added original() method to decorated function.
Lastly, the get() method returns either the value cached for the given key, or the token NO_VALUE if no such key exists:
value = generate_something.get(5, 6)
New in version 0.5.3: Added get() method to decorated function.
The default key generation will use the name of the function, the module name for the function, the arguments passed, as well as an optional “namespace” parameter in order to generate a cache key.
Given a function one inside the module myapp.tools:
@region.cache_on_arguments(namespace="foo")
def one(a, b):
    return a + b
Above, calling one(3, 4) will produce a cache key as follows:
myapp.tools:one|foo|3 4
The key generator will ignore an initial argument of self or cls, making the decorator suitable (with caveats) for use with instance or class methods. Given the example:
class MyClass:
    @region.cache_on_arguments(namespace="foo")
    def one(self, a, b):
        return a + b
Above, calling MyClass().one(3, 4) will again produce the same cache key of myapp.tools:one|foo|3 4 - the name self is skipped.
The namespace parameter is optional, and is used normally to disambiguate two functions of the same name within the same module, as can occur when decorating instance or class methods as below:
class MyClass:
    @region.cache_on_arguments(namespace='MC')
    def somemethod(self, x, y):
        ""

class MyOtherClass:
    @region.cache_on_arguments(namespace='MOC')
    def somemethod(self, x, y):
        ""
Above, the namespace parameter disambiguates between somemethod on MyClass and MyOtherClass. Python class declaration mechanics otherwise prevent the decorator from having awareness of the MyClass and MyOtherClass names, as the function is received by the decorator before it becomes an instance method.
The function key generation can be entirely replaced on a per-region basis using the function_key_generator argument present on make_region() and CacheRegion. It defaults to function_key_generator().
- Parameters
namespace – optional string argument which will be established as part of the cache key. This may be needed to disambiguate functions of the same name within the same source file, such as those associated with classes - note that the decorator itself can’t see the parent class on a function as the class is being declared.
expiration_time –
if not None, will override the normal expiration time.
May be specified as a callable, taking no arguments, that returns a value to be used as the expiration_time. This callable will be called whenever the decorated function itself is called, in caching or retrieving. Thus, this can be used to determine a dynamic expiration time for the cached function result. Example use cases include “cache the result until the end of the day, week or time period” and “cache until a certain date or time passes”.
should_cache_fn – passed to CacheRegion.get_or_create().
to_str – callable, will be called on each function argument in order to convert to a string. Defaults to str(). If the function accepts non-ascii unicode arguments on Python 2.x, the unicode() builtin can be substituted, but note this will produce unicode cache keys which may require key mangling before reaching the cache.
function_key_generator – a function that will produce a “cache key”. This function will supersede the one configured on the CacheRegion itself.
Creating Backends
Backends are located using the setuptools entrypoint system. To make life easier for writers of ad-hoc backends, a helper function is included which registers any backend in the same way as if it were part of the existing sys.path.
For example, to create a backend called DictionaryBackend, we subclass CacheBackend:
from dogpile.cache.api import CacheBackend, NO_VALUE

class DictionaryBackend(CacheBackend):
    def __init__(self, arguments):
        self.cache = {}

    def get(self, key):
        return self.cache.get(key, NO_VALUE)

    def set(self, key, value):
        self.cache[key] = value

    def delete(self, key):
        # a default avoids KeyError, so deleting a missing key is safe
        self.cache.pop(key, None)
Then make sure the class is available underneath the entrypoint dogpile.cache. If we did this in a setup.py file, it would be in setup() as:
entry_points="""
  [dogpile.cache]
  dictionary = mypackage.mybackend:DictionaryBackend
  """
Alternatively, if we want to register the plugin in the same process space without bothering to install anything, we can use register_backend:
from dogpile.cache import register_backend
register_backend("dictionary", "mypackage.mybackend", "DictionaryBackend")
Our new backend would be usable in a region like this:
from dogpile.cache import make_region

region = make_region("myregion")

region.configure("dictionary")

region.set("somekey", "somevalue")
The values we receive for the backend here are instances of CachedValue. This is a tuple subclass of length two, of the form:
(payload, metadata)
Where “payload” is the thing being cached, and “metadata” is information we store in the cache - a dictionary which currently has just the “creation time” and a “version identifier” as key/values. If the cache backend requires serialization, pickle or similar can be used on the tuple - the “metadata” portion will always be a small and easily serializable Python structure.
Changing Backend Behavior
The ProxyBackend is a decorator class provided to easily augment existing backend behavior without having to extend the original class. Using a decorator class is also advantageous as it allows us to share the altered behavior between different backends.
Proxies are added to the CacheRegion object using the CacheRegion.configure() method. Only the overridden methods need to be specified and the real backend can be accessed with the self.proxied object from inside the ProxyBackend.
For example, a simple class to log all calls to .set() would look like this:
from dogpile.cache.proxy import ProxyBackend

import logging
log = logging.getLogger(__name__)

class LoggingProxy(ProxyBackend):
    def set(self, key, value):
        log.debug('Setting Cache Key: %s' % key)
        self.proxied.set(key, value)
ProxyBackend can be configured to optionally take arguments (as long as the ProxyBackend.__init__() method is called properly, either directly or via super()). In the example below, the RetryDeleteProxy class accepts a retry_count parameter on initialization. In the event of an exception on delete(), it will retry this many times before returning:
from dogpile.cache.proxy import ProxyBackend

class RetryDeleteProxy(ProxyBackend):
    def __init__(self, retry_count=5):
        super(RetryDeleteProxy, self).__init__()
        self.retry_count = retry_count

    def delete(self, key):
        retries = self.retry_count
        while retries > 0:
            retries -= 1
            try:
                self.proxied.delete(key)
                return
            except Exception:
                # swallow the error and try again until retries run out
                pass
The wrap parameter of CacheRegion.configure() accepts a list which can contain any combination of instantiated proxy objects as well as uninstantiated proxy classes.
Putting the two examples above together would look like this:
from dogpile.cache import make_region

retry_proxy = RetryDeleteProxy(5)

region = make_region().configure(
    'dogpile.cache.pylibmc',
    expiration_time = 3600,
    arguments = {
        'url':["127.0.0.1"],
    },
    wrap = [ LoggingProxy, retry_proxy ]
)
In the above example, the LoggingProxy object would be instantiated by the CacheRegion and applied to wrap requests on behalf of the retry_proxy instance; that proxy in turn wraps requests on behalf of the original dogpile.cache.pylibmc backend.
New in version 0.4.4: Added support for the ProxyBackend class.
Configuring Logging
New in version 0.9.0.
CacheRegion includes logging facilities that will emit debug log messages when key cache events occur, including when keys are regenerated as well as when hard invalidations occur. Using the Python logging module, set the log level of the dogpile.cache logger to logging.DEBUG:
import logging

logging.basicConfig()
logging.getLogger("dogpile.cache").setLevel(logging.DEBUG)
Debug logging will indicate time spent regenerating keys as well as when keys are missing:
DEBUG:dogpile.cache.region:No value present for key: '__main__:load_user_info|2'
DEBUG:dogpile.cache.region:No value present for key: '__main__:load_user_info|1'
DEBUG:dogpile.cache.region:Cache value generated in 0.501 seconds for keys: ['__main__:load_user_info|2', '__main__:load_user_info|3', '__main__:load_user_info|4', '__main__:load_user_info|5']
DEBUG:dogpile.cache.region:Hard invalidation detected for key: '__main__:load_user_info|3'
DEBUG:dogpile.cache.region:Hard invalidation detected for key: '__main__:load_user_info|2'