@@ -2,28 +2,12 @@ defmodule GenStage.PartitionDispatcher do
@moduledoc """
A dispatcher that sends events according to partitions.

- This dispatcher assumes that partitions are evenly distributed.
- If the data is uneven for long periods of time, then you may
- buffer excessive data from busy partitions for long periods of
- time. This happens because the producer is unable to distinguish
- from which particular consumer/partition demand arrives.
-
- Let's see an example. Imagine you have three consumers/partitions:
- A, B, and C. Let's assume 60% of the data goes to A, 20% to B, and
- 20% to C. Let's also say `max_demand` is 10 and `min_demand` is 5.
- When they initially request data, 10 events for each, A will receive
- 18 while B and C receive 6 each. After processing 5 events (min demand),
- they request additional 5 events each, which at this point will be 9
- additional elements for A, and 3 additional elements for B and C.
- At the end of these two rounds, we will have:
-
- A = 18 - 5 + 9 = 22 events
- B = 6 - 5 + 3 = 4 events
- C = 6 - 5 + 3 = 4 events
+ This dispatcher assumes that partitions are *evenly distributed*.
+ See the ["Even distribution"](#module-even-distribution) section for
+ more information.

- Furthermore, as B and C request more items, A will only go further
- behind. This behaviour is fine for spikes that should quickly
- resolve, but it can be problematic if the data is consistently uneven.
+ When multiple consumers subscribe to one partition, the producer
+ behaves like a `GenStage.DemandDispatcher` *within that partition*.

## Options

@@ -35,14 +19,14 @@ defmodule GenStage.PartitionDispatcher do
is named from 0 up to `integer - 1`. For example, `partitions: 4`
will contain four partitions named `0`, `1`, `2` and `3`.

- It may also be an *enumerable* that specifies the name of every partition.
+ It may also be an *enumerable* that specifies the name of each partition.
For instance, `partitions: [:odd, :even]` will build two partitions,
named `:odd` and `:even`.

* `:hash` - the hashing algorithm. It's a function of type
`t:hash_function/0`, which receives the event and returns a tuple with two
- elements, the event to be dispatched as first argument and the partition
- as second. The function can also return `:none`, in which case the event
+ elements: the event to be dispatched and the partition to dispatch it to.
+ The function can also return `:none`, in which case the event
is discarded. The partition must be one of the partitions specified in
`:partitions` above. The default uses:

@@ -76,6 +60,35 @@ defmodule GenStage.PartitionDispatcher do

GenStage.sync_subscribe(consumer, to: producer, partition: 0)

+ ## Even distribution
+
+ This dispatcher assumes that partitions are *evenly distributed*.
+ If the data is uneven for long periods of time, then you may
+ buffer excessive data from busy partitions for long periods of
+ time. This happens because the producer is unable to distinguish
+ from which particular consumer/partition demand arrives.
+
+ Let's see an example. Imagine you have three consumers, each
+ for one partition: `A`, `B`, and `C`.
+
+ Let's assume 60% of the data goes to `A`, 20% to `B`, and 20% to
+ `C`. Let's also say that `max_demand` is `10` and `min_demand` is
+ `5`. When the consumers initially request data (`10` events each),
+ the producer receives a total demand of `30`. `A` will receive `18` of
+ those (60%), while `B` and `C` receive `6` each (20%). After
+ processing `5` events (the `min_demand`), each consumer requests
+ an additional `5` events, for a total of `15` additional events. At
+ this point, that will be `9` additional elements for `A`, and `3`
+ additional elements for `B` and `C`. At the end of these two rounds,
+ we will have:
+
+ A = 18 - 5 + 9 = 22 events
+ B = 6 - 5 + 3 = 4 events
+ C = 6 - 5 + 3 = 4 events
+
+ Furthermore, as `B` and `C` request more items, `A` will only go
+ further behind. This behaviour is fine for spikes that should quickly
+ resolve, but it can be problematic if the data is consistently uneven.
"""

@typedoc """
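
The `:hash` contract touched by the second hunk (receive an event, return the event plus its target partition, or `:none` to discard) can be modeled outside Elixir. Below is a hypothetical Python sketch of that contract; the `route_by_parity` and `dispatch` names are illustrative and not part of GenStage, and Python's `None` stands in for Elixir's `:none`:

```python
# Hypothetical model of the `:hash` contract (illustration only; GenStage
# itself is Elixir). A hash function receives an event and returns either
# (event, partition) or None -- None stands in for `:none` ("discard").

def route_by_parity(event):
    """Route integer events to the 'odd'/'even' partitions; drop anything else."""
    if not isinstance(event, int):
        return None                       # corresponds to returning :none
    return event, ("odd" if event % 2 else "even")

def dispatch(events, hash_fun):
    """Bucket events per partition, discarding those the hash function rejects."""
    buckets = {"odd": [], "even": []}
    for event in events:
        result = hash_fun(event)
        if result is None:
            continue                      # event was discarded
        event, partition = result
        buckets[partition].append(event)
    return buckets

print(dispatch([1, 2, 3, "boom", 4], route_by_parity))
# {'odd': [1, 3], 'even': [2, 4]}
```

Note that, as in the documented contract, the hash function may also rewrite the event before it is dispatched; this sketch simply returns it unchanged.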
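
The arithmetic in the new "Even distribution" section can be double-checked with a short simulation. This is an illustrative sketch, not GenStage code; it assumes the producer splits each batch of incoming demand by the 60/20/20 traffic ratios from the example, and all names in it are hypothetical:

```python
# Illustrative simulation of the "Even distribution" example: three
# consumers, one per partition, with a 60/20/20 traffic split.

ratios = {"A": 0.6, "B": 0.2, "C": 0.2}
buffers = {name: 0 for name in ratios}  # events buffered per partition

def split(total_demand):
    """Split one batch of incoming demand across partitions by traffic share."""
    return {name: round(total_demand * share) for name, share in ratios.items()}

# Round 1: each of the 3 consumers asks for max_demand = 10 events.
for name, events in split(3 * 10).items():
    buffers[name] += events             # A: 18, B: 6, C: 6

# Each consumer processes min_demand = 5 events, then asks for 5 more.
for name in buffers:
    buffers[name] -= 5
for name, events in split(3 * 5).items():
    buffers[name] += events             # A: 18 - 5 + 9 = 22, B and C: 4

print(buffers)                          # {'A': 22, 'B': 4, 'C': 4}
```

Running further rounds of the same loop shows `A`'s buffer growing while `B` and `C` stay near zero, which is the backlog behaviour the section warns about.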