187 changes: 187 additions & 0 deletions content/develop/connect/clients/client-side-caching.md

Comment (Member):
Do you envision a quick start with the tabbed examples in this file?

Comment (Contributor Author):
@mortensi I wanted to check exactly what you want in the tabbed samples. Are you thinking of stuff from the Google doc like:

client.set("hola", "mundo");
client.set("hello", "world");
client.mget("hola", "hello"); // read from the server
client.mget("hola", "hello"); // cache hit
client.mget("hello", "hola"); // read from server, the order matters
.
.

...or are there other use cases you want to have samples for? I know the Google doc has quite a few examples for Python and Jedis but I'm not sure if we need them in the doc page. Maybe we could have a concrete example at the bottom of some of the entries in this list for extra clarity? I'm not sure that having tabbed samples for these in each language is necessary either, because the samples don't actually do anything useful in themselves, so people wouldn't normally copy/paste them.

However, if we do decide that tabbed samples are a good thing here then I'm confident we can get them ready in good time for the release.

@@ -0,0 +1,187 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
description: Server-assisted, client-side caching in Redis
linkTitle: Client-side caching
title: Client-side caching introduction
weight: 20
---

*Client-side caching* (CSC) is a technique to reduce network traffic between
a Redis client and the server. This generally improves performance.

By default, an [application server](https://en.wikipedia.org/wiki/Application_server)
(which sits between the user app and the database) contacts the
Redis database server through the client library for every read request.
The diagram below shows the flow of communication from the user app,
through the application server to the database and back again:

{{< image filename="images/csc/CSCNoCache.drawio.svg" >}}

When you use CSC, the client library
maintains its own local cache of data items as it retrieves them
from the database. When the same items are needed again, the client
can satisfy the read requests from the cache instead of the database:

{{< image filename="images/csc/CSCWithCache.drawio.svg" >}}

Accessing the cache is much faster than communicating with the database over the
network. Also, this technique reduces the load on the database server, so you may
be able to run it using fewer nodes.

Comment (Member):
Using fewer resources? In Enterprise and CE there are different approaches to
scalability (more shards, more nodes). Also, we could mention that CSC lowers
bandwidth consumption and costs.


As with other forms of [caching](https://en.wikipedia.org/wiki/Cache_(computing)),
CSC works well in the very common use case where a small subset of the data
gets accessed much more frequently than the rest of the data (according
to the [Pareto principle](https://en.wikipedia.org/wiki/Pareto_principle)).

## Updating the cache when the data changes

All caching systems must implement a scheme to update data in the cache
when the corresponding data changes in the main database. Redis uses an
approach called *tracking*.

When CSC is enabled, the Redis server remembers or *tracks* the set of keys
that each client connection has previously read. This includes cases where the client
reads data directly, as with the [`GET`]({{< relref "/commands/get" >}})
command, and also where the server calculates values from the stored data,
as with [`STRLEN`]({{< relref "/commands/strlen" >}}). When any client

Comment (Contributor):
This paragraph seems a bit long. Maybe try to find a way to break it up?

writes new data to a tracked key, the server sends an invalidation message
to all clients that have accessed that key previously. This message warns
the clients that their cached copies of the data are no longer valid and the clients
will evict the stale data in response. Next time a client reads from
the same key, it will access the database directly and refresh its cache
with the updated data.

The sequence diagram below shows how two clients might interact as they
access and update the same key:

{{< image filename="images/csc/CSCSeqDiagram.drawio.svg" >}}

## Which commands can cache data?

All read-only commands (with the `@read`
[ACL category]({{< relref "/operate/oss_and_stack/management/security/acl" >}}))
will use cached data, except for the following:

- Any commands for

Comment (Contributor):
This would be easier to read if you indent the explanations under the bullet point for the command type.

[probabilistic data types]({{< relref "/develop/data-types/probabilistic" >}}).
These types are designed to be updated frequently, which means that caching
them gives little or no benefit.
- Non-deterministic commands such as [`HSCAN`]({{< relref "/commands/hscan" >}})
and [`ZRANDMEMBER`]({{< relref "/commands/zrandmember" >}}). By design, these commands

Comment (Member):
I think we can mention HGETALL, being a popular one. @uglide can you validate this?

give different results each time they are called.
- Search and query commands (with the `FT.*` prefix), such as
[`FT.SEARCH`]({{< baseurl >}}/commands/ft.search).
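
As a brief sketch (assuming `r` is a redis-py connection with CSC enabled, as
shown in the [client libraries]({{< relref "/develop/connect/clients" >}})
pages), the comments below indicate which calls can be served from the cache
when they are repeated:

```python
r.get("city")              # @read and deterministic: cached after the first call.
r.get("city")              # Can be a cache hit.

r.zrandmember("ranking")   # Non-deterministic: always read from the server.
r.pfcount("visitors")      # Probabilistic data type: always read from the server.
```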

You can use the [`MONITOR`]({{< relref "/commands/monitor" >}}) command to
check the server's behavior when you are using CSC. Because `MONITOR` only
reports activity from the server, you should find that the first cacheable
access to a key causes a response from the server. However, subsequent
accesses are satisfied by the cache, and so `MONITOR` should report no
server activity if CSC is working correctly.
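
For example, if you run `redis-cli MONITOR` in a separate terminal while a
CSC-enabled client reads the same key twice, you would expect to see just one
`GET` line (timestamp and connection details abbreviated):

```
1723109720.269681 [...] "GET" "city"
```

The second read is served from the client's local cache, so it produces no
further `MONITOR` output.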

## What data gets cached for a command?

Broadly speaking, the data from the *specific response* to a command invocation
gets cached after it is used for the first time. Subsets of that data
or values calculated from it are retrieved from the server as usual and
then cached separately. For example:

- The whole string retrieved by [`GET`]({{< relref "/commands/get" >}})
is added to the cache. Parts of the same string retrieved by
[`SUBSTR`]({{< relref "/commands/substr" >}}) are calculated on the
server the first time and then cached separately from the original
string.
- Using [`GETBIT`]({{< relref "/commands/getbit" >}}) or
[`BITFIELD`]({{< relref "/commands/bitfield" >}}) on a string
caches the returned values separately from the original string.
- For composite data types accessed by keys
([hash]({{< relref "/develop/data-types/hashes" >}}),
[JSON]({{< relref "/develop/data-types/json" >}}),
[set]({{< relref "/develop/data-types/sets" >}}), and
[sorted set]({{< relref "/develop/data-types/sorted-sets" >}})),
the whole object is cached separately from the individual fields.
So the results of `HGETALL mykey` and `HGET mykey myfield` create
separate entries in the cache.
- Ranges from [lists]({{< relref "/develop/data-types/lists" >}}),
[streams]({{< relref "/develop/data-types/streams" >}}),
and [sorted sets]({{< relref "/develop/data-types/sorted-sets" >}})
are cached separately from the object they form a part of. Likewise,
subsets returned by [`SINTER`]({{< relref "/commands/sinter" >}}) and
[`SDIFF`]({{< relref "/commands/sdiff" >}}) create separate cache entries.
- For multi-key read commands such as [`MGET`]({{< relref "/commands/mget" >}}),
the ordering of the keys is significant. For example, `MGET name:1 name:2` is
cached separately from `MGET name:2 name:1` because the server returns the
values in the order you specify (see the sketch after this list).
- Boolean or numeric values calculated from data types (for example,
[`SISMEMBER`]({{< relref "/commands/sismember" >}}) and
[`LLEN`]({{< relref "/commands/llen" >}})) are cached separately from the
object they refer to.
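
The short redis-py sketch below pulls together a few of the cases above
(assuming `r` is a connection with CSC enabled, as shown in the
[client libraries]({{< relref "/develop/connect/clients" >}}) pages). The
comments describe the cache entries you would expect each call to create:

```python
r.set("bio", "Hello world")

r.get("bio")                 # Whole string cached.
r.substr("bio", 0, 4)        # Calculated on the server, then cached
                             # separately from the whole string.

r.mset({"name:1": "Alice", "name:2": "Bob"})

r.mget("name:1", "name:2")   # Read from the server and cached.
r.mget("name:1", "name:2")   # Cache hit.
r.mget("name:2", "name:1")   # Different key order: a separate cache entry,
                             # so this is read from the server again.
```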


## Enabling CSC

Comment (Member):
This is not required because the client library enables tracking when the
connection is established, so you only need to configure the connection.


Use the [`CLIENT TRACKING`]({{< relref "/commands/client-tracking" >}})
command to enable CSC from [`redis-cli`]({{< relref "/develop/connect/cli" >}}).
You can also enable CSC when you connect to a server from one of the Redis
[client libraries]({{< relref "/develop/connect/clients" >}}).
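
As a minimal sketch, you could try this from `redis-cli` (the `-3` option
starts the session with the RESP3 protocol, which tracking requires):

```
$ redis-cli -3
127.0.0.1:6379> CLIENT TRACKING ON
OK
127.0.0.1:6379> SET city "New York"
OK
127.0.0.1:6379> GET city
"New York"
```

After the `GET`, the server tracks the `city` key for this connection and
sends an invalidation message if another client changes it.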

## Usage recommendations

Like any caching system, CSC has some limitations:

- The cache has only a limited amount of memory available. When the limit
is reached, the client must *evict* potentially useful items from the
cache to make room for new ones.
- Cache misses, tracking, and invalidation messages always add a slight
performance penalty.

Below are a few guidelines to help you use CSC efficiently, within these
limitations:

- **Use a separate connection for data that is not cache-friendly**:
Caching gives the most benefit
for keys that are read frequently and updated infrequently. However, you
may also have data such as counters and scoreboards that receive frequent
updates. In cases like this, the performance overhead of the invalidation
messages can be greater than the savings made by caching. Avoid this problem
by using a separate connection *without* CSC for any data that is
not cache-friendly (see the sketch after this list).
- **Set a maximum time-to-live (TTL) for cached items**: The client libraries

Comment (Member):
The TTL is not supported in this phase, so it has been removed recently. cc @uglide

let you set a *time-to-live (TTL)* value for items in the cache. This is
the maximum time between cache reads for any particular key. If a key
isn't accessed during its TTL period, it will be deleted from the cache
automatically and the memory that it uses will be freed. Re-using cache
memory like this is more efficient than evicting items when the cache
gets full. The exact TTL value depends on your specific requirements, but
bear in mind that the whole purpose of the cache is to optimize reads for
data that you access frequently. By definition, data that needs a long
TTL isn't accessed frequently.
- **Estimate how many items you can cache**: The client libraries let you
specify the maximum number of items you want to hold in the cache. You
can calculate an estimate for this number by dividing the
maximum desired size of the
cache in memory by the average size of the items you want to store
(use the [`MEMORY USAGE`]({{< relref "/commands/memory-usage" >}})
command to get the memory footprint of a key). So, if you had
10MB (or 10485760 bytes) available for the cache, and the average
size of an item was 80 bytes, you could fit approximately
10485760 / 80 = 131072 items in the cache. Monitor memory usage
on your server with a realistic test load to adjust your estimate
up or down.
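
The sketch below shows both of these ideas in redis-py (the connection
parameter names follow the client library examples and are an assumption
here): a second connection without CSC for data that isn't cache-friendly,
plus the item-count arithmetic from the last guideline:

```python
import redis

# CSC-enabled connection for keys that are read often and updated rarely.
cached = redis.Redis(protocol=3, cache_enabled=True, cache_ttl=180,
                     decode_responses=True)

# Plain connection, without CSC, for data that isn't cache-friendly.
plain = redis.Redis(decode_responses=True)

plain.incr("page:views")        # Frequently updated counter: not worth caching.
bio = cached.get("user:1:bio")  # Read-mostly data: a good caching candidate.

# Estimate how many items fit in the cache:
# 10MB available / 80 bytes per item (from MEMORY USAGE) = 131072 items.
max_items = (10 * 1024 * 1024) // 80
print(max_items)  # 131072
```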

## Reference

The Redis server implements extra features for CSC that are not used by
the main Redis clients, but may be useful for custom clients and other
advanced applications. See
[Client-side caching reference]({{< relref "/develop/reference/client-side-caching" >}})
for a full technical guide to all the options available for CSC.
62 changes: 62 additions & 0 deletions content/develop/connect/clients/python/redis-py.md

Comment (Member):
Overall, this is a great and simple addition. The example will be validated when the clients are stable (TTL has been removed). For the client library example, we may propose the following examples:

1. Connection instantiation and configuration
2. Pool instantiation and configuration
3. Three examples with string, hash, and JSON
4. Cache flush
5. Remove one entry from the cache

Comment (Contributor Author):
@mortensi For 1) and 2): Are the pool and cluster connections any different, aside from protocol: 3 and the cache_xxx parameters? I've mentioned that you can use the same parameters with all of the different connection variants. I'd prefer not to have all combinations of all the connection options, but if you think there is something significantly different about this then I'll certainly add it.

For 3): There's already an example for string, just to give a simple demonstration of how you can see the cache working using MONITOR. I'm not sure it would help to have hash and JSON examples here, given that they are the same as the usual commands from a coding point of view (again, if I'm wrong about this then please let me know!). There is already a link to the CSC intro, where I discuss the way different datatypes/commands affect the cache (e.g., hash fields cached separately from the whole object).

For 4) and 5): Yes, definitely - if there are other cache commands then we should document them. I was also thinking that the options for supplying your own cache implementation and selecting an eviction policy should be documented somewhere. Maybe in an "advanced" page or an advanced section in the intro?

@@ -119,6 +119,68 @@ r.get('foo')
```
For more information, see [redis-py TLS examples](https://redis-py.readthedocs.io/en/stable/examples/ssl_connection_examples.html).

## Connect using client-side caching (CSC)

*Client-side caching* is a technique to reduce network traffic between
the client and server, resulting in better performance. See
[Client-side caching introduction]({{< relref "/develop/connect/clients/client-side-caching" >}})
for more information about how CSC works and how to use it effectively.

To enable CSC, you simply need to add a few extra parameters when you connect
to the server:

- `protocol`: (Required) You must pass a value of `3` here because
CSC requires the [RESP3]({{< relref "/develop/reference/protocol-spec#resp-versions" >}})
protocol.
- `cache_enabled`: (Required) Pass a `True` value here to enable CSC.
- `cache_ttl`: (Recommended) Time-to-live (TTL) between reads for items in the cache,
measured in seconds. This defaults to zero (indicating that items in the cache never expire)
but it is strongly recommended that you choose a realistic TTL for
your needs. See
[Usage recommendations]({{< relref "/develop/connect/clients/client-side-caching#usage-recommendations" >}})
for more information.

The example below shows the simplest CSC connection to the default host and port,
`localhost:6379`.
All of the connection variants described above accept these parameters, so you can
use CSC with a connection pool or a cluster connection in exactly the same way.

```python
r = redis.Redis(
protocol=3,
cache_enabled=True,
cache_ttl=180,
decode_responses=True
)

r.set("city", "New York")
cityNameAttempt1 = r.get("city") # Retrieved from Redis server and cached
cityNameAttempt2 = r.get("city") # Retrieved from cache
```

You can see the cache working if you connect to the same Redis database
with [`redis-cli`]({{< relref "/develop/connect/cli" >}}) and run the
[`MONITOR`]({{< relref "/commands/monitor" >}}) command. If you run the
code above with the `cache_enabled` line commented out, you should see
the following in the CLI among the output from `MONITOR`:

```
1723109720.268903 [...] "SET" "city" "New York"
1723109720.269681 [...] "GET" "city"
1723109720.270205 [...] "GET" "city"
```

This shows that the server responds to both `get("city")` calls.
If you run the code again with `cache_enabled` uncommented, you will see:

```
1723110248.712663 [...] "SET" "city" "New York"
1723110248.713607 [...] "GET" "city"
```

This shows that the first `get("city")` call contacted the server but the second
call was satisfied by the cache.
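
As noted above, the other connection variants accept the same parameters. For
example, the sketch below (assuming the pool passes these parameters on to its
connections) enables CSC through a connection pool:

```python
import redis

pool = redis.ConnectionPool(
    protocol=3,
    cache_enabled=True,
    cache_ttl=180,
    decode_responses=True
)

r = redis.Redis(connection_pool=pool)
r.get("city")  # Uses the cache in the same way as the direct connection.
```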

## Example: Indexing and querying JSON documents

Make sure that you have Redis Stack and `redis-py` installed. Import dependencies:
@@ -13,10 +13,17 @@ description: 'Server-assisted, client-side caching in Redis

'
linkTitle: Client-side caching
title: Client-side caching in Redis
title: Client-side caching reference
aliases: /develop/use/client-side-caching/
weight: 2
---

{{<note>}}This document is intended as an in-depth reference for
client-side caching. See
[Client-side caching introduction]({{< relref "/develop/connect/clients/client-side-caching" >}})
for general usage guidelines.
{{</note>}}

Client-side caching is a technique used to create high performance services.
It exploits the memory available on application servers, servers that are
usually distinct computers compared to the database nodes, to store some subset
4 changes: 4 additions & 0 deletions static/images/csc/CSCNoCache.drawio.svg
4 changes: 4 additions & 0 deletions static/images/csc/CSCSeqDiagram.drawio.svg
4 changes: 4 additions & 0 deletions static/images/csc/CSCWithCache.drawio.svg