This project presents one possible solution to the following problem:
Suppose there exists a pure function

struct Background;
struct State;

template <typename In>
auto f (State&, In&&, std::shared_ptr<Background>);
So f modifies a state based on some input and some background
knowledge.
The customer calls f multiple times with the same parameters for
Background, State and In. The running time of f is quite high,
so the customer got the idea to implement a "cache" within the
Background. The goal of the cache is to speed up calls to f by
re-using results that have been computed in earlier calls to f.
Please describe how you would approach that task. In your solution:
- What changes to the interface are required?
- What requirements on the types Background, State and In are implied?
Prefer to formulate your answer in terms of code. Certainly the customer will ask about the resource consumption (time, memory) of the cached version in comparison with the uncached version. Please also take into account that the background knowledge exists for a long time and across multiple sessions and must not accumulate memory indefinitely.
The core idea is to use a map that maps the input arguments to the computed result. For a call that is already cached, this reduces the complexity to logarithmic in the number of cached function calls.
This is implemented by the class Function_cache.
With this implementation no changes to the existing interface are necessary. This enables an update without the need for the customer to adapt the code that has already been written.
Note that this implementation does not take up the customer's idea to implement the cache within the Background. This decision was made based on the following:
- The Background is part of the function call; implementing the cache within it would entail storing the cache itself for every cached function call, which would increase the memory usage drastically.
- The functionality of a function cache can easily be encapsulated. This improves code maintainability and allows the cache to be used for other functions as well.
- It enables the unchanged interface.
The changes to the existing code are all within the function f:
- The costly computation was moved into a local lambda, which turns the pure in-place update into a function that returns its result. This makes the handling of the cache easier, since the State is both input and output, and both values have to be stored separately.
- The input of type In is converted to std::any. This ensures that all map keys have the same type.
- The cache object is a static object within f.
See bin/function_cache.cpp
Implied requirements for these types can be deduced from the usage:
- They are used as keys in a std::map, so they have to be:
  - copyable
  - assignable
  - comparable
- In has to be convertible into a std::any object.
- The cache is persisted to disk, so they have to be:
  - serializable
If the function call was cached, the complexity is logarithmic in the number of cached entries. One entry consists of the input and output parameters and two map iterators. Otherwise the call is in the same complexity class as the function f on its own.
The memory usage for the cache amounts to the following:
cache_.size() * (2*sizeof(State) + sizeof(In) + sizeof(Background)) + 2*sizeof(Cache_t::iterator)
The maximum number of cached entries, and thus the allowed memory usage, can be adapted by setting the constant FC_MAX_CACHE_SIZE at compile time.
The location where the persistent cache file is stored can be set by FC_CACHE_FILEPATH at compile time.
Since the cache should be used across multiple sessions, the Function_cache loads an existing cache file from disk when constructed. The destructor updates the same file with the updated cache.
To prevent unlimited memory usage, the number of cached function calls is limited to a specified amount. Each time a new function call is cached, the size of the map is checked, and if it exceeds the specified maximum, the oldest cached entry is removed from the map. See Function_cache::check_cache_size()
In order to maximize the chance of a cache hit, the cache is reordered with every call to the function: entries run from the most recently used call down to the call that was made the longest time ago. This is implemented in the function Function_cache::make_newest_entry. The two ends of this order are saved as member variables (Function_cache::newest_cache_entry_ and its counterpart for the oldest entry). To maintain the correct order of all cached entries, each entry points to its immediate predecessor and successor; see Function_cache::Cache_entry::prev_ & Function_cache::Cache_entry::next_. This adds the functionality of a doubly linked list to the map.
This seemed to be an acceptable increase in memory usage and complexity, since it does not change the complexity class of the function call (it is still O(log n)).
Some things have been left out that should be part of a complete project, since they are not at the core of the task:
- The reading and writing functions are left as placeholders.
- This project does not contain any tests.
- Proper exception handling was left out.