@@ -23,7 +23,7 @@ The types fall into two basic categories:
2323- [ Set operations] ( #set-operations ) : These types let you calculate (approximately)
2424 the number of items in a set of distinct values, and whether or not a given value is
2525 a member of a set.
26- - [ Numeric data calculations ] ( #numeric-data ) : These types give you an approximation of
26+ - [ Statistics ] ( #statistics ) : These types give you an approximation of
2727 statistics such as the percentile, rank, and frequency of numeric data points in a list.
2828
2929To see why these approximate calculations would be useful, consider the task of
@@ -97,6 +97,7 @@ add. The following example adds some names to a Bloom filter representing
9797a list of users and checks for the presence or absence of users in the list.
9898Note that you must use the ` bf() ` method to access the Bloom filter commands.
9999
100+ <!--
100101```py
101102res1 = r.bf().madd("recorded_users", "andy", "cameron", "david", "michelle")
102103print(res1) # >>> [1, 1, 1, 1]
@@ -107,12 +108,16 @@ print(res2) # >>> 1
107108res3 = r.bf().exists("recorded_users", "kaitlyn")
108109print(res3) # >>> 0
109110```
111+ -->
112+ {{< clients-example home_prob_dts bloom Python >}}
113+ {{< /clients-example >}}
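To make the membership check above more concrete, the following is a minimal pure-Python sketch of the general Bloom filter idea. It is not how Redis implements the type: the `TinyBloomFilter` class, its default sizes, and the SHA-256 based hashing are all assumptions chosen for illustration. It shows why a negative answer is always correct while a positive answer can occasionally be a false positive.

```python
import hashlib


class TinyBloomFilter:
    """Illustrative Bloom filter (hypothetical class, not the Redis one)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the code readable; real filters pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def exists(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


users = TinyBloomFilter()
for name in ("andy", "cameron", "david", "michelle"):
    users.add(name)

print(users.exists("cameron"))  # >>> True
```

Because the filter stores only bit positions and never the items themselves, its memory use is fixed by `num_bits` regardless of how many names you add.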
110114
111115A Cuckoo filter has similar features to a Bloom filter, but also supports
112116a deletion operation to remove hashes from a set, as shown in the example
113117below. Note that you must use the ` cf() ` method to access the Cuckoo filter
114118commands.
115119
120+ <!--
116121```py
117122res4 = r.cf().add("other_users", "paolo")
118123print(res4) # >>> 1
@@ -132,6 +137,9 @@ print(res8)
132137res9 = r.cf().exists("other_users", "paolo")
133138print(res9) # >>> 0
134139```
140+ -->
141+ {{< clients-example home_prob_dts cuckoo Python >}}
142+ {{< /clients-example >}}
135143
136144Which of these two data types you choose depends on your use case.
137145Bloom filters are generally faster than Cuckoo filters when adding new items,
@@ -143,11 +151,14 @@ reference pages for more information and comparison between the two types.
143151
144152### Set cardinality
145153
146- A HyperLogLog object doesn't support the set membership operation but
147- instead is specialized to calculate the cardinality of the set. You can
148- also merge two or more HyperLogLogs to find the cardinality of the
154+ A [ HyperLogLog] ({{< relref "/develop/data-types/probabilistic/hyperloglogs" >}})
155+ object calculates the cardinality of a set. As you add
156+ items, the HyperLogLog tracks the number of distinct set members but
157+ doesn't let you retrieve them or query which items have been added.
158+ You can also merge two or more HyperLogLogs to find the cardinality of the
149159union of the sets they represent.
150160
161+ <!--
151162```py
152163res10 = r.pfadd("group:1", "andy", "cameron", "david")
153164print(res10) # >>> 1
@@ -167,19 +178,50 @@ print(res14) # >>> True
167178res15 = r.pfcount("both_groups")
168179print(res15) # >>> 7
169180```
181+ -->
182+ {{< clients-example home_prob_dts hyperloglog Python >}}
183+ {{< /clients-example >}}
170184
171185The main benefit that HyperLogLogs offer is their very low
172186memory usage. They can count up to 2^64 items with less than
173- 1% standard error using a maximum 12KB of memory.
187+ 1% standard error using a maximum 12KB of memory. This makes
188+ them very useful for counting things like the number of distinct
189+ IP addresses that access a website or the number of distinct
190+ bank card numbers that make purchases within a day.
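
The memory and error figures above can be checked with a little arithmetic. The sketch below assumes a dense representation of 16384 registers of 6 bits each and the standard HyperLogLog error estimate of 1.04 divided by the square root of the register count. Neither value is stated on this page, so treat them as assumptions.

```python
import math

REGISTERS = 2 ** 14       # assumed number of registers (16384)
BITS_PER_REGISTER = 6     # assumed width of each register

# Total memory for the registers, in bytes.
memory_bytes = REGISTERS * BITS_PER_REGISTER // 8

# Standard HyperLogLog estimate of the relative standard error.
standard_error = 1.04 / math.sqrt(REGISTERS)

print(memory_bytes)             # >>> 12288
print(f"{standard_error:.2%}")  # >>> 0.81%
```

Under these assumptions, the registers occupy 12288 bytes (12KB) and the standard error stays below 1%, which matches the figures quoted above.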
174191
175- ## Numeric data
192+ ## Statistics
176193
177194Redis supports several approximate statistical calculations
178195on numeric data sets:
179196
180- - Frequency: The Count-min sketch data type lets you find the
181- approximate frequency of a labeled item in a data stream.
197+ - [ Frequency] ( #frequency ) : The Count-min sketch data type lets you
198+ find the approximate frequency of a labeled item in a data stream.
182199- Percentiles: The t-digest data type estimates the percentile
183200 of a supplied value in a data stream.
184201- Ranking: The Top-K data type estimates the ranking of items
185202 by frequency in a data stream.
203+
204+ ### Frequency
205+
206+ A [ Count-min sketch] ({{< relref "/develop/data-types/probabilistic/count-min-sketch" >}})
207+ (CMS) object keeps count of a set of related items represented by
208+ string labels. The count is approximate, but you can specify
209+ how close you want to keep the count to the true value (as a fraction)
210+ and the acceptable probability of failing to keep it in this
211+ desired range. For example, you can request that the count should
212+ stay within 0.1% of the true value and have a 0.05% probability
213+ of going outside this limit.
214+
215+ {{< clients-example home_prob_dts cms Python >}}
216+ {{< /clients-example >}}
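
As a rough illustration of how the error and probability settings relate to the size of the sketch, here is a minimal pure-Python Count-min sketch. It is not the Redis implementation: the `TinyCountMinSketch` class, the SHA-256 based hashing, and the textbook sizing formulas are assumptions, and Redis may size its sketch differently. In the textbook formulation, the error bound is a fraction of the total of all counts added to the sketch.

```python
import hashlib
import math


class TinyCountMinSketch:
    """Illustrative Count-min sketch (hypothetical class, not the Redis one)."""

    def __init__(self, error, probability):
        # Textbook sizing: a smaller error needs more columns, and a
        # smaller failure probability needs more rows.
        self.width = math.ceil(math.e / error)
        self.depth = math.ceil(math.log(1 / probability))
        self.rows = [[0] * self.width for _ in range(self.depth)]

    def _column(self, row, item):
        # Use a differently salted hash for each row.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def incrby(self, item, amount=1):
        for row in range(self.depth):
            self.rows[row][self._column(row, item)] += amount

    def query(self, item):
        # Collisions can only inflate a counter, so the minimum across
        # rows is the closest estimate and never undercounts.
        return min(
            self.rows[row][self._column(row, item)]
            for row in range(self.depth)
        )


# 0.1% error and 0.05% failure probability, as in the text above.
sketch = TinyCountMinSketch(error=0.001, probability=0.0005)
sketch.incrby("apples", 5)
sketch.incrby("oranges", 3)
sketch.incrby("apples", 2)

print(sketch.width, sketch.depth)  # >>> 2719 8
print(sketch.query("apples"))      # at least 7; never an undercount
```

The table size depends only on the two settings and not on the number of distinct items, which is why the memory usage stays fixed as more items are counted.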
217+
218+ The advantage of using a CMS over keeping an exact count with a
219+ [ sorted set] ({{< relref "/develop/data-types/sorted-sets" >}})
220+ is that a CMS has very low and fixed memory usage, even for
221+ large numbers of items. Use CMS objects to keep daily counts of
222+ items sold, accesses to individual web pages on your site, and
223+ other similar statistics.
224+
225+ ### Percentiles
226+
227+