@@ -99,16 +99,45 @@ add. The following example adds some names to a Bloom filter representing
a list of users and checks for the presence or absence of users in the list.
Note that you must use the `bf()` method to access the Bloom filter commands.

- {{< clients-example home_prob_dts bloom Python >}}
- {{< /clients-example >}}
+ ```py
+ res1 = r.bf().madd("recorded_users", "andy", "cameron", "david", "michelle")
+ print(res1)  # >>> [1, 1, 1, 1]
+
+ res2 = r.bf().exists("recorded_users", "cameron")
+ print(res2)  # >>> 1
+
+ res3 = r.bf().exists("recorded_users", "kaitlyn")
+ print(res3)  # >>> 0
+ ```
+ <!-- < clients-example home_prob_dts bloom Python >}}
+ < /clients-example >}} -->
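
The Bloom filter example relies on a property worth seeing in code: `exists()` can report a false positive but never a false negative. The sketch below is a minimal pure-Python Bloom filter for illustration only, not how Redis implements its Bloom filter commands. The class name `TinyBloom`, the 1024-bit array, the four hash positions per item, and the SHA-256 double-hashing scheme are all assumptions chosen for the example.

```python
import hashlib


class TinyBloom:
    """Hypothetical minimal Bloom filter: m bits, k hash positions per item."""

    def __init__(self, m=1024, k=4):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item):
        # Double hashing: derive k positions from two 64-bit slices of a SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        # Return 1 if at least one bit changed (the item was definitely new), else 0.
        changed = 0
        for pos in self._positions(item):
            if not self.bits[pos]:
                self.bits[pos] = 1
                changed = 1
        return changed

    def exists(self, item):
        # All k bits set -> "probably present"; any bit clear -> "definitely absent".
        return int(all(self.bits[pos] for pos in self._positions(item)))


bf = TinyBloom()
for name in ["andy", "cameron", "david", "michelle"]:
    bf.add(name)

print(bf.exists("cameron"))  # 1: an added item is always reported present
print(bf.exists("kaitlyn"))  # usually 0, but could be a false positive
```

Because adding an item only ever sets bits, checking an added item always finds all of its bits set; a missing item is reported present only when other items happen to have set all of its bits.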

A Cuckoo filter has similar features to a Bloom filter, but also supports
a deletion operation to remove hashes from a set, as shown in the example
below. Note that you must use the `cf()` method to access the Cuckoo filter
commands.

- {{< clients-example home_prob_dts cuckoo Python >}}
- {{< /clients-example >}}
+ ```py
+ res4 = r.cf().add("other_users", "paolo")
+ print(res4)  # >>> 1
+
+ res5 = r.cf().add("other_users", "kaitlyn")
+ print(res5)  # >>> 1
+
+ res6 = r.cf().add("other_users", "rachel")
+ print(res6)  # >>> 1
+
+ res7 = r.cf().mexists("other_users", "paolo", "rachel", "andy")
+ print(res7)  # >>> [1, 1, 0]
+
+ res8 = r.cf().delete("other_users", "paolo")
+ print(res8)  # >>> 1
+
+ res9 = r.cf().exists("other_users", "paolo")
+ print(res9)  # >>> 0
+ ```
+ <!-- < clients-example home_prob_dts cuckoo Python >}}
+ < /clients-example >}} -->
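
Deletion is possible in a Cuckoo filter because each item is stored as a short fingerprint in one of two candidate buckets, rather than as bits shared with other items. The following minimal sketch uses partial-key cuckoo hashing to illustrate this. The class name `TinyCuckoo`, the bucket count and size, the 8-bit fingerprints, the eviction limit, and the SHA-256-based hashing are assumptions; Redis's own implementation differs in its details.

```python
import hashlib
import random


class TinyCuckoo:
    """Hypothetical minimal Cuckoo filter using partial-key cuckoo hashing."""

    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=500):
        # The XOR trick in _alt_index() needs a power-of-two bucket count.
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(0)  # seeded so evictions are repeatable

    @staticmethod
    def _hash(data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _alt_index(self, index, fp):
        # XOR with a hash of the fingerprint; applying it twice gives back `index`.
        return index ^ (self._hash(bytes([fp])) % self.num_buckets)

    def _locate(self, item):
        raw = item.encode("utf-8")
        fp = self._hash(b"fp:" + raw) & 0xFF  # 8-bit fingerprint
        i1 = self._hash(b"ix:" + raw) % self.num_buckets
        return fp, i1, self._alt_index(i1, fp)

    def add(self, item):
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return 1
        # Both candidate buckets are full: evict a stored fingerprint and move
        # it to its own alternate bucket, repeating up to max_kicks times.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return 1
        return 0  # treated as full; the last evicted fingerprint is lost

    def exists(self, item):
        fp, i1, i2 = self._locate(item)
        return int(fp in self.buckets[i1] or fp in self.buckets[i2])

    def delete(self, item):
        # Remove one copy of the item's fingerprint. Other added items keep their
        # own copies, so this is safe as long as `item` was actually added.
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return 1
        return 0


cf = TinyCuckoo()
for name in ["paolo", "kaitlyn", "rachel"]:
    cf.add(name)

print(cf.exists("paolo"))  # 1
cf.delete("paolo")
print(cf.exists("paolo"))  # usually 0; a fingerprint collision could still give 1
print(cf.exists("kaitlyn"), cf.exists("rachel"))  # 1 1
```

Deleting an item that was never added is the one unsafe case: it can remove a colliding fingerprint that belongs to a different item.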

Which of these two data types you choose depends on your use case.
Bloom filters are generally faster than Cuckoo filters when adding new items,
@@ -128,8 +157,27 @@ You can also merge two or more HyperLogLogs to find the cardinality of the
[union](https://en.wikipedia.org/wiki/Union_(set_theory)) of the sets they
represent.

- {{< clients-example home_prob_dts hyperloglog Python >}}
- {{< /clients-example >}}
+ ```py
+ res10 = r.pfadd("group:1", "andy", "cameron", "david")
+ print(res10)  # >>> 1
+
+ res11 = r.pfcount("group:1")
+ print(res11)  # >>> 3
+
+ res12 = r.pfadd("group:2", "kaitlyn", "michelle", "paolo", "rachel")
+ print(res12)  # >>> 1
+
+ res13 = r.pfcount("group:2")
+ print(res13)  # >>> 4
+
+ res14 = r.pfmerge("both_groups", "group:1", "group:2")
+ print(res14)  # >>> True
+
+ res15 = r.pfcount("both_groups")
+ print(res15)  # >>> 7
+ ```
+ <!-- < clients-example home_prob_dts hyperloglog Python >}}
+ < /clients-example >}} -->
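
Merging works because a HyperLogLog is just an array of registers, each holding the largest "rank" (the position of the first 1-bit in a hash) seen among the items that map to it. The registers for the union of two sets are therefore the element-wise maximum of the two arrays. The minimal sketch below shows this. The class name `TinyHLL`, the 1024 registers, the SHA-256-based 64-bit hash, and the simplified estimator (with only the small-range correction) are assumptions; Redis uses its own register layout and estimator.

```python
import hashlib
import math


class TinyHLL:
    """Hypothetical minimal HyperLogLog with 2**p registers."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        x = int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")
        index = x >> (64 - self.p)             # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting over empty registers).
            estimate = self.m * math.log(self.m / zeros)
        return estimate

    def merge(self, *others):
        # The union's registers are the element-wise maximum of the inputs.
        merged = TinyHLL(self.p)
        merged.registers = [
            max(rs) for rs in zip(self.registers, *(o.registers for o in others))
        ]
        return merged


group_1, group_2 = TinyHLL(), TinyHLL()
for name in ["andy", "cameron", "david"]:
    group_1.add(name)
for name in ["kaitlyn", "michelle", "paolo", "rachel"]:
    group_2.add(name)

both = group_1.merge(group_2)
print(round(both.count()))  # close to 7 for such small sets
```

Merging the two sketches gives exactly the same registers as adding all seven names to a single sketch, which is why the merged count estimates the size of the union.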

The main benefit that HyperLogLogs offer is their very low
memory usage. They can count up to 2^64 items with less than
@@ -169,8 +217,35 @@ a Count-min sketch object, add data to it, and then query it.
Note that you must use the `cms()` method to access the Count-min
sketch commands.

- {{< clients-example home_prob_dts cms Python >}}
- {{< /clients-example >}}
+ ```py
+ # Specify that you want estimates to exceed the true count by
+ # no more than 0.01 (1%) of the total count, with a 0.005 (0.5%)
+ # chance of going outside this limit.
+ res16 = r.cms().initbyprob("items_sold", 0.01, 0.005)
+ print(res16)  # >>> True
+
+ # The parameters for `incrby()` are two lists. The count
+ # for each item in the first list is incremented by the
+ # value at the same index in the second list.
+ res17 = r.cms().incrby(
+     "items_sold",
+     ["bread", "tea", "coffee", "beer"],  # Items sold
+     [300, 200, 200, 100]
+ )
+ print(res17)  # >>> [300, 200, 200, 100]
+
+ res18 = r.cms().incrby(
+     "items_sold",
+     ["bread", "coffee"],
+     [100, 150]
+ )
+ print(res18)  # >>> [400, 350]
+
+ res19 = r.cms().query("items_sold", "bread", "tea", "coffee", "beer")
+ print(res19)  # >>> [400, 200, 350, 100]
+ ```
+ <!-- < clients-example home_prob_dts cms Python >}}
+ < /clients-example >}} -->
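
A Count-min sketch is a small table of counters with one row per hash function. Incrementing an item adds to one counter in every row, and a query returns the minimum of those counters, so collisions can make an estimate too high but never too low. The sketch below uses the classic sizing rule, width = ⌈e / error⌉ and depth = ⌈ln(1 / probability)⌉. RedisBloom may size its sketch differently, and the class name `TinyCMS` and the SHA-256-based row hashing are assumptions.

```python
import hashlib
import math


class TinyCMS:
    """Hypothetical minimal Count-min sketch (classic width/depth sizing)."""

    def __init__(self, error, probability):
        # Classic sizing: width = ceil(e / error), depth = ceil(ln(1 / probability)).
        self.width = math.ceil(math.e / error)
        self.depth = math.ceil(math.log(1 / probability))
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _cells(self, item):
        # One hash per row, derived by salting the item with the row number.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def incrby(self, items, increments):
        for item, inc in zip(items, increments):
            for row, col in self._cells(item):
                self.table[row][col] += inc
        return [self.query(item) for item in items]

    def query(self, item):
        # Collisions only ever add to a counter, so the minimum is the tightest estimate.
        return min(self.table[row][col] for row, col in self._cells(item))


cms = TinyCMS(0.01, 0.005)
print(cms.width, cms.depth)  # 272 6
cms.incrby(["bread", "tea", "coffee", "beer"], [300, 200, 200, 100])
cms.incrby(["bread", "coffee"], [100, 150])
print([cms.query(x) for x in ["bread", "tea", "coffee", "beer"]])
# each value is at least the true count: [400, 200, 350, 100]
```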

The advantage of using a CMS over keeping an exact count with a
[sorted set]({{< relref "/develop/data-types/sorted-sets" >}})
@@ -202,8 +277,52 @@ shows how to merge two or more t-digest objects to query the combined
data set. Note that you must use the `tdigest()` method to access the
t-digest commands.

- {{< clients-example home_prob_dts tdigest Python >}}
- {{< /clients-example >}}
+ ```py
+ res20 = r.tdigest().create("male_heights")
+ print(res20)  # >>> True
+
+ res21 = r.tdigest().add(
+     "male_heights",
+     [175.5, 181, 160.8, 152, 177, 196, 164]
+ )
+ print(res21)  # >>> OK
+
+ res22 = r.tdigest().min("male_heights")
+ print(res22)  # >>> 152.0
+
+ res23 = r.tdigest().max("male_heights")
+ print(res23)  # >>> 196.0
+
+ res24 = r.tdigest().quantile("male_heights", 0.75)
+ print(res24)  # >>> [181]
+
+ # Note that the CDF value for 181 is not exactly
+ # 0.75. Both values are estimates.
+ res25 = r.tdigest().cdf("male_heights", 181)
+ print(res25)  # >>> [0.7857142857142857]
+
+ res26 = r.tdigest().create("female_heights")
+ print(res26)  # >>> True
+
+ res27 = r.tdigest().add(
+     "female_heights",
+     [155.5, 161, 168.5, 170, 157.5, 163, 171]
+ )
+ print(res27)  # >>> OK
+
+ res28 = r.tdigest().quantile("female_heights", 0.75)
+ print(res28)  # >>> [170]
+
+ res29 = r.tdigest().merge(
+     "all_heights", 2, "male_heights", "female_heights"
+ )
+ print(res29)  # >>> OK
+
+ res30 = r.tdigest().quantile("all_heights", 0.75)
+ print(res30)  # >>> [175.5]
+ ```
+ <!-- < clients-example home_prob_dts tdigest Python >}}
+ < /clients-example >}} -->
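
The `cdf()` result in the t-digest example can be checked by hand. With only seven values, it matches the empirical CDF that counts the values below 181 in full and the value equal to 181 as one half: (5 + 0.5) / 7 = 5.5 / 7 ≈ 0.7857. The hypothetical helper `midpoint_cdf` below computes that figure in plain Python. It is an exact calculation, not a t-digest, and the assumption that Redis applies this convention when each centroid holds a single value is inferred from the matching number rather than taken from the Redis documentation.

```python
def midpoint_cdf(values, x):
    """Empirical CDF that counts values equal to x as half below, half above."""
    below = sum(1 for v in values if v < x)
    equal = sum(1 for v in values if v == x)
    return (below + 0.5 * equal) / len(values)


male_heights = [175.5, 181, 160.8, 152, 177, 196, 164]
print(midpoint_cdf(male_heights, 181))  # about 0.7857, that is, 5.5 / 7
```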

A t-digest object also supports several other related commands, such
as querying by rank. See the
@@ -225,5 +344,47 @@ top *k* items and query whether or not a given item is in the
list. Note that you must use the `topk()` method to access the
Top-K commands.

- {{< clients-example home_prob_dts topk Python >}}
- {{< /clients-example >}}
+ ```py
+ # The `reserve()` method creates the Top-K object with
+ # the given key. The parameters are the number of items
+ # in the ranking and values for `width`, `depth`, and
+ # `decay`, described in the Top-K reference page.
+ res31 = r.topk().reserve("top_3_songs", 3, 7, 8, 0.9)
+ print(res31)  # >>> True
+
+ # The parameters for `incrby()` are two lists. The count
+ # for each item in the first list is incremented by the
+ # value at the same index in the second list.
+ res32 = r.topk().incrby(
+     "top_3_songs",
+     [
+         "Starfish Trooper",
+         "Only one more time",
+         "Rock me, Handel",
+         "How will anyone know?",
+         "Average lover",
+         "Road to everywhere"
+     ],
+     [
+         3000,
+         1850,
+         1325,
+         3890,
+         4098,
+         770
+     ]
+ )
+ print(res32)
+ # >>> [None, None, None, 'Rock me, Handel', 'Only one more time', None]
+
+ res33 = r.topk().list("top_3_songs")
+ print(res33)
+ # >>> ['Average lover', 'How will anyone know?', 'Starfish Trooper']
+
+ res34 = r.topk().query(
+     "top_3_songs", "Starfish Trooper", "Road to everywhere"
+ )
+ print(res34)  # >>> [1, 0]
+ ```
+ <!-- < clients-example home_prob_dts topk Python >}}
+ < /clients-example >}} -->