Merged
content/develop/clients/redis-py/prob.md (191 changes: 12 additions & 179 deletions)
@@ -99,47 +99,16 @@ add. The following example adds some names to a Bloom filter representing
a list of users and checks for the presence or absence of users in the list.
Note that you must use the `bf()` method to access the Bloom filter commands.

```py
res1 = r.bf().madd("recorded_users", "andy", "cameron", "david", "michelle")
print(res1) # >>> [1, 1, 1, 1]

res2 = r.bf().exists("recorded_users", "cameron")
print(res2) # >>> 1

res3 = r.bf().exists("recorded_users", "kaitlyn")
print(res3) # >>> 0
```

<!-- < clients-example home_prob_dts bloom Python >}}
< /clients-example >}} -->
{{< clients-example home_prob_dts bloom Python >}}
{{< /clients-example >}}
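To see why `exists()` can report a false positive but never a false negative, here is a minimal in-memory Bloom filter in plain Python. It is a sketch under simple assumptions (a fixed bit array and one salted SHA-256 digest per hash function), not how Redis implements the data type, and `ToyBloomFilter` and its methods are hypothetical names chosen to mirror the `bf()` calls above.

```python
import hashlib


class ToyBloomFilter:
    """Hypothetical in-memory Bloom filter (illustration only, not Redis's)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Returns 1 if any bit was newly set, 0 if the item may already be present.
        newly_set = 0
        for pos in self._positions(item):
            if not self.bits[pos]:
                self.bits[pos] = 1
                newly_set = 1
        return newly_set

    def exists(self, item):
        # 1 means "possibly present" (false positives can occur);
        # 0 means "definitely absent" (there are no false negatives).
        return int(all(self.bits[pos] for pos in self._positions(item)))


bf = ToyBloomFilter()
added = [bf.add(name) for name in ["andy", "cameron", "david", "michelle"]]
print(added)
print(bf.exists("cameron"))  # 1: an added item is always found
print(bf.exists("kaitlyn"))  # almost certainly 0, but a false positive is possible
```

The sketch also shows why a Bloom filter can't support deletion: clearing an item's bits could clear bits that other items share.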

A Cuckoo filter has similar features to a Bloom filter, but also supports
a deletion operation to remove hashes from a set, as shown in the example
below. Note that you must use the `cf()` method to access the Cuckoo filter
commands.

```py
res4 = r.cf().add("other_users", "paolo")
print(res4) # >>> 1

res5 = r.cf().add("other_users", "kaitlyn")
print(res5) # >>> 1

res6 = r.cf().add("other_users", "rachel")
print(res6) # >>> 1

res7 = r.cf().mexists("other_users", "paolo", "rachel", "andy")
print(res7) # >>> [1, 1, 0]

res8 = r.cf().delete("other_users", "paolo")
print(res8) # >>> 1

res9 = r.cf().exists("other_users", "paolo")
print(res9) # >>> 0
```

<!-- < clients-example home_prob_dts cuckoo Python >}}
< /clients-example >}} -->
{{< clients-example home_prob_dts cuckoo Python >}}
{{< /clients-example >}}
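Deletion works because a Cuckoo filter stores a short fingerprint of each item in one of two candidate buckets instead of setting shared bits, so removing an item removes exactly one stored fingerprint. The sketch below is a hypothetical, simplified Cuckoo filter in plain Python; the fixed size, 8-bit fingerprints, SHA-256 hashing, seeded random eviction, and the `ToyCuckooFilter` name are all assumptions, and Redis's implementation differs in detail.

```python
import hashlib
import random


class ToyCuckooFilter:
    """Hypothetical in-memory Cuckoo filter (illustration only, not Redis's)."""

    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=500):
        # num_buckets must be a power of two so the XOR trick stays in range.
        if num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(0)  # fixed seed so evictions are reproducible

    @staticmethod
    def _hash(data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item):
        # Store a small 8-bit fingerprint (1..255) instead of the item itself.
        return self._hash(b"fp:" + item.encode()) % 255 + 1

    def _alt_index(self, index, fp):
        # The alternate bucket depends only on the current bucket and the
        # fingerprint, so a stored fingerprint can be moved without the item.
        return index ^ (self._hash(bytes([fp])) % self.num_buckets)

    def _candidates(self, item):
        fp = self._fingerprint(item)
        i1 = self._hash(item.encode()) % self.num_buckets
        return fp, i1, self._alt_index(i1, fp)

    def add(self, item):
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return 1
        # Both buckets are full: evict a random fingerprint and move it to
        # its own alternate bucket, repeating until a free slot is found.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return 1
        raise RuntimeError("filter is full")  # a real filter would resize or report failure

    def exists(self, item):
        fp, i1, i2 = self._candidates(item)
        return int(fp in self.buckets[i1] or fp in self.buckets[i2])

    def delete(self, item):
        # Remove one stored copy of the item's fingerprint, if present.
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return 1
        return 0


cf = ToyCuckooFilter()
for name in ["paolo", "kaitlyn", "rachel"]:
    cf.add(name)

print([cf.exists(n) for n in ["paolo", "rachel", "andy"]])
deleted = cf.delete("paolo")
print(deleted)  # 1: paolo's fingerprint was found and removed
print(cf.exists("paolo"))
```

Like a Bloom filter, this can still return false positives when two items share a fingerprint and a bucket.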

Which of these two data types you choose depends on your use case.
Bloom filters are generally faster than Cuckoo filters when adding new items,
@@ -159,28 +128,8 @@ You can also merge two or more HyperLogLogs to find the cardinality of the
[union](https://en.wikipedia.org/wiki/Union_(set_theory)) of the sets they
represent.

```py
res10 = r.pfadd("group:1", "andy", "cameron", "david")
print(res10) # >>> 1

res11 = r.pfcount("group:1")
print(res11) # >>> 3

res12 = r.pfadd("group:2", "kaitlyn", "michelle", "paolo", "rachel")
print(res12) # >>> 1

res13 = r.pfcount("group:2")
print(res13) # >>> 4

res14 = r.pfmerge("both_groups", "group:1", "group:2")
print(res14) # >>> True

res15 = r.pfcount("both_groups")
print(res15) # >>> 7
```

<!-- < clients-example home_prob_dts hyperloglog Python >}}
< /clients-example >}} -->
{{< clients-example home_prob_dts hyperloglog Python >}}
{{< /clients-example >}}
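A HyperLogLog keeps an array of small registers, each recording the largest position of the leftmost 1-bit seen among the hashes routed to it. Merging is therefore a register-wise maximum, which is why `pfmerge()` can combine the groups without the original items. The sketch below is a simplified plain-Python illustration (2^10 registers, SHA-256 as the 64-bit hash, the standard small-range correction); the `ToyHyperLogLog` class is hypothetical and does not reproduce Redis's encoding or error characteristics.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Hypothetical HyperLogLog with 2**p registers (illustration only)."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash: the top p bits pick a register, the rest give the rank.
        x = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        index = x >> (64 - self.p)
        rest = x & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: use linear counting instead.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)

    def merge(self, *others):
        # The union of the underlying sets is the register-wise maximum.
        merged = ToyHyperLogLog(self.p)
        merged.registers = [
            max(regs) for regs in zip(self.registers, *(o.registers for o in others))
        ]
        return merged


g1, g2 = ToyHyperLogLog(), ToyHyperLogLog()
for name in ["andy", "cameron", "david"]:
    g1.add(name)
for name in ["kaitlyn", "michelle", "paolo", "rachel"]:
    g2.add(name)

both = g1.merge(g2)
print(g1.count(), g2.count(), both.count())
```

Adding all seven names to a single object produces exactly the same registers as `both`, so the merged count matches a direct count of the union.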

The main benefit that HyperLogLogs offer is their very low
memory usage. They can count up to 2^64 items with less than
@@ -220,36 +169,8 @@ a Count-min sketch object, add data to it, and then query it.
Note that you must use the `cms()` method to access the Count-min
sketch commands.

```py
# Specify that you want to keep the counts within 0.01
# (1%) of the true value with a 0.005 (0.5%) chance
# of going outside this limit.
res16 = r.cms().initbyprob("items_sold", 0.01, 0.005)
print(res16) # >>> True

# The parameters for `incrby()` are two lists. The count
# for each item in the first list is incremented by the
# value at the same index in the second list.
res17 = r.cms().incrby(
    "items_sold",
    ["bread", "tea", "coffee", "beer"], # Items sold
    [300, 200, 200, 100]
)
print(res17) # >>> [300, 200, 200, 100]

res18 = r.cms().incrby(
    "items_sold",
    ["bread", "coffee"],
    [100, 150]
)
print(res18) # >>> [400, 350]

res19 = r.cms().query("items_sold", "bread", "tea", "coffee", "beer")
print(res19) # >>> [400, 200, 350, 100]
```

<!-- < clients-example home_prob_dts cms Python >}}
< /clients-example >}} -->
{{< clients-example home_prob_dts cms Python >}}
{{< /clients-example >}}
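The `initbyprob()` parameters determine the dimensions of the sketch: a table of counters with one row per hash function. Each row can only overcount (colliding items share a counter), so a query returns the minimum across rows. The plain-Python sketch below assumes the textbook sizing, `width = ceil(e / epsilon)` and `depth = ceil(ln(1 / delta))`; Redis may compute its dimensions differently, and `ToyCountMinSketch` is a hypothetical name.

```python
import hashlib
import math


class ToyCountMinSketch:
    """Hypothetical Count-min sketch (illustration only, not Redis's)."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    @classmethod
    def from_error(cls, epsilon, delta):
        # Assumed textbook sizing: error <= epsilon * total with probability 1 - delta.
        return cls(math.ceil(math.e / epsilon), math.ceil(math.log(1 / delta)))

    def _column(self, row, item):
        # One independent-looking hash per row, via a row-specific salt.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def incrby(self, items, increments):
        # Add increments[i] to items[i] in every row, then return the estimates.
        for item, inc in zip(items, increments):
            for row in range(self.depth):
                self.table[row][self._column(row, item)] += inc
        return self.query(*items)

    def query(self, *items):
        # Collisions only ever add to a counter, so the minimum is the best estimate.
        return [
            min(self.table[row][self._column(row, item)] for row in range(self.depth))
            for item in items
        ]


cms = ToyCountMinSketch.from_error(0.01, 0.005)
print(cms.width, cms.depth)  # 272 6
cms.incrby(["bread", "tea", "coffee", "beer"], [300, 200, 200, 100])
cms.incrby(["bread", "coffee"], [100, 150])
estimates = cms.query("bread", "tea", "coffee", "beer")
print(estimates)  # never below the true counts [400, 200, 350, 100]
```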

The advantage of using a CMS over keeping an exact count with a
[sorted set]({{< relref "/develop/data-types/sorted-sets" >}})
@@ -281,53 +202,8 @@ shows how to merge two or more t-digest objects to query the combined
data set. Note that you must use the `tdigest()` method to access the
t-digest commands.

```py
res20 = r.tdigest().create("male_heights")
print(res20) # >>> True

res21 = r.tdigest().add(
    "male_heights",
    [175.5, 181, 160.8, 152, 177, 196, 164]
)
print(res21) # >>> OK

res22 = r.tdigest().min("male_heights")
print(res22) # >>> 152.0

res23 = r.tdigest().max("male_heights")
print(res23) # >>> 196.0

res24 = r.tdigest().quantile("male_heights", 0.75)
print(res24) # >>> [181]

# Note that the CDF value for 181 is not exactly
# 0.75. Both values are estimates.
res25 = r.tdigest().cdf("male_heights", 181)
print(res25) # >>> [0.7857142857142857]

res26 = r.tdigest().create("female_heights")
print(res26) # >>> True

res27 = r.tdigest().add(
    "female_heights",
    [155.5, 161, 168.5, 170, 157.5, 163, 171]
)
print(res27) # >>> OK

res28 = r.tdigest().quantile("female_heights", 0.75)
print(res28) # >>> [170]

res29 = r.tdigest().merge(
    "all_heights", 2, "male_heights", "female_heights"
)
print(res29) # >>> OK

res30 = r.tdigest().quantile("all_heights", 0.75)
print(res30) # >>> [175.5]
```

<!-- < clients-example home_prob_dts tdigest Python >}}
< /clients-example >}} -->
{{< clients-example home_prob_dts tdigest Python >}}
{{< /clients-example >}}
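The t-digest results above are estimates. For comparison, the plain-Python sketch below computes exact values from the raw heights, assuming the nearest-rank definition of a quantile and a CDF that counts values equal to the query point as half. These conventions are assumptions and Redis does not necessarily use them; on this small data set they happen to produce the same numbers as the t-digest outputs shown above. Concatenating the raw lists plays the role of `merge()`, and the helper names are hypothetical.

```python
import math


def nearest_rank_quantile(values, q):
    # Exact q-quantile by the nearest-rank method: the ceil(q * n)-th smallest value.
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def midpoint_cdf(values, x):
    # Fraction of values below x, counting values equal to x as half.
    below = sum(v < x for v in values)
    equal = sum(v == x for v in values)
    return (below + 0.5 * equal) / len(values)


male = [175.5, 181, 160.8, 152, 177, 196, 164]
female = [155.5, 161, 168.5, 170, 157.5, 163, 171]

print(nearest_rank_quantile(male, 0.75))           # 181
print(midpoint_cdf(male, 181))                     # 0.7857142857142857
print(nearest_rank_quantile(female, 0.75))         # 170
print(nearest_rank_quantile(male + female, 0.75))  # 175.5
```

Unlike this exact approach, a t-digest doesn't need to keep every value, which is what keeps its memory use bounded on large data sets.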

A t-digest object also supports several other related commands, such
as querying by rank. See the
@@ -349,48 +225,5 @@ top *k* items and query whether or not a given item is in the
list. Note that you must use the `topk()` method to access the
Top-K commands.

```py
# The `reserve()` method creates the Top-K object with
# the given key. The parameters are the number of items
# in the ranking and values for `width`, `depth`, and
# `decay`, described in the Top-K reference page.
res31 = r.topk().reserve("top_3_songs", 3, 7, 8, 0.9)
print(res31) # >>> True

# The parameters for `incrby()` are two lists. The count
# for each item in the first list is incremented by the
# value at the same index in the second list.
res32 = r.topk().incrby(
    "top_3_songs",
    [
        "Starfish Trooper",
        "Only one more time",
        "Rock me, Handel",
        "How will anyone know?",
        "Average lover",
        "Road to everywhere"
    ],
    [
        3000,
        1850,
        1325,
        3890,
        4098,
        770
    ]
)
print(res32)
# >>> [None, None, None, 'Rock me, Handel', 'Only one more time', None]

res33 = r.topk().list("top_3_songs")
print(res33)
# >>> ['Average lover', 'How will anyone know?', 'Starfish Trooper']

res34 = r.topk().query(
    "top_3_songs", "Starfish Trooper", "Road to everywhere"
)
print(res34) # >>> [1, 0]
```

<!-- < clients-example home_prob_dts topk Python >}}
< /clients-example >}} -->
{{< clients-example home_prob_dts topk Python >}}
{{< /clients-example >}}
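The values that `incrby()` returns are the items expelled from the top *k* list as new items displace them. The plain-Python sketch below models that behavior with an exact count for every item; the real Top-K structure keeps only approximate counts (controlled by `width`, `depth`, and `decay`), so its results can differ on larger or more skewed inputs. `ToyTopK` is a hypothetical name. On the song data from the example, this exact model gives the same replies.

```python
class ToyTopK:
    """Hypothetical Top-K tracker using exact counts (illustration only)."""

    def __init__(self, k):
        self.k = k
        self.counts = {}  # exact count for every item ever seen
        self.top = set()  # items currently in the top k

    def incrby(self, items, increments):
        # Returns, per item, the item it expelled from the top k (or None).
        expelled = []
        for item, inc in zip(items, increments):
            self.counts[item] = self.counts.get(item, 0) + inc
            dropped = None
            if item not in self.top:
                if len(self.top) < self.k:
                    self.top.add(item)
                else:
                    weakest = min(self.top, key=self.counts.__getitem__)
                    if self.counts[item] > self.counts[weakest]:
                        self.top.remove(weakest)
                        self.top.add(item)
                        dropped = weakest
            expelled.append(dropped)
        return expelled

    def list(self):
        # Items currently in the top k, highest count first.
        return sorted(self.top, key=self.counts.__getitem__, reverse=True)

    def query(self, *items):
        return [int(item in self.top) for item in items]


topk = ToyTopK(3)
expelled = topk.incrby(
    ["Starfish Trooper", "Only one more time", "Rock me, Handel",
     "How will anyone know?", "Average lover", "Road to everywhere"],
    [3000, 1850, 1325, 3890, 4098, 770],
)
print(expelled)
# [None, None, None, 'Rock me, Handel', 'Only one more time', None]
print(topk.list())
# ['Average lover', 'How will anyone know?', 'Starfish Trooper']
print(topk.query("Starfish Trooper", "Road to everywhere"))  # [1, 0]
```

Keeping exact counts for every item is precisely the memory cost that the probabilistic Top-K structure avoids.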