
Strange weighted round robin behaviour with special configuration of weights #44

@kalindudc

Description


tl;dr: Using a test script, we can observe a momentary dip in traffic to a node down to zero, caused by the weighted round robin algorithm.

Setup

I am attempting to reproduce this behaviour using a test Lua script to investigate the behaviour of the weighted round robin (WRR) algorithm. The ingress_nginx balancer uses round robin (RR) as its default load balancing algorithm, and that implementation is provided by openresty.

Weights Configuration

-- config 1
local NODES = {}
NODES["10.0.0.1"] = 100
NODES["10.0.0.2"] = 100
NODES["10.0.0.3"] = 25

With this, the next step is to use the WRR algorithm to pick a node for a number of cycles and investigate the distribution of picked nodes. We can repeat this with the second set of weights below to observe how the distribution changes.

-- config 2
local NODES = {}
NODES["10.0.0.1"] = 100
NODES["10.0.0.2"] = 100
NODES["10.0.0.3"] = 66

Observations

For this test I am using a Lua script to explicitly call the WRR algorithm for 300 cycles. The distribution of picks for the first weight configuration is steady with little variation, which is expected, and the overall distribution of traffic follows the configured weights.
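
For reference, this is a condensed form of the test script included at the end of this issue (shown here with the first weight configuration):

local rr = require "resty.roundrobin"

-- first weight configuration
local NODES = { ["10.0.0.1"] = 100, ["10.0.0.2"] = 100, ["10.0.0.3"] = 25 }
local COUNTS = { ["10.0.0.1"] = 0, ["10.0.0.2"] = 0, ["10.0.0.3"] = 0 }

local rr_instance = rr:new(NODES)

-- call the WRR algorithm for a fixed number of cycles and tally the picks
for _ = 1, 300 do
  local node = rr_instance:find()
  COUNTS[node] = COUNTS[node] + 1
end

for node, count in pairs(COUNTS) do
  print(node, count / 300 * 100, "%")
end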

[Graph: cumulative picks per node over 300 cycles, first weight configuration]

For the next weight configuration, we can observe that although the overall distribution of traffic adheres to the relative weights, the distribution over time is not steady. Node 10.0.0.3 is not picked by WRR for some time, and then the algorithm picks each node as if they were equal in weight. We can also see that this pattern repeats: after some time, node 10.0.0.3 is again not picked for a stretch.

[Graph: cumulative picks per node over 300 cycles, second weight configuration]

We can observe similar results regardless of the number of cycles. The fault in this implementation is more apparent when the number of cycles is smaller than the number of cycles the algorithm needs before it picks node 10.0.0.3 at all.

[Graphs: the same test repeated with smaller cycle counts]

Openresty WRR implementation

The openresty implementation is an accurate implementation of WRR. However, it is only accurate over large numbers of requests distributed to weighted nodes; it is not viable for real time traffic with varying weights. The algorithm is an interleaved WRR implementation which uses the greatest common divisor (GCD) of the weights to determine how quickly a node becomes eligible to be picked based on its weight.

The algorithm starts with the current threshold cw at the maximum weight, and on each pick only nodes with weights >= cw qualify.

-- cycle and only pick a node, where node.weights >= cw
local last_id, cw, weight = self.last_id, self.cw

The last picked node is randomized on initialization, so the first pick starts iterating from a random position in the node table.

On each pick the algorithm iterates through the set of nodes until one qualifies. On every full iteration over all nodes the threshold is lowered by the GCD, which increases the chance a node is picked by (GCD / MAX_WEIGHT) * 100%.

cw = cw - self.gcd
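
Paraphrased and simplified (the single-node only_key shortcut is omitted), the find() loop in the upstream roundrobin.lua is roughly:

local function find(self)
    local nodes = self.nodes
    local last_id, cw, weight = self.last_id, self.cw

    while true do
        -- walk the node table once, starting after the last picked node
        while true do
            last_id, weight = next(nodes, last_id)
            if not last_id then
                break  -- completed a pass over the nodes without a pick
            end

            -- only nodes whose weight reaches the current threshold qualify
            if weight >= cw then
                self.cw = cw
                self.last_id = last_id
                return last_id
            end
        end

        -- after an empty pass, lower the threshold by the GCD;
        -- once it reaches zero, reset it to the maximum weight
        cw = cw - self.gcd
        if cw <= 0 then
            cw = self.max_weight
        end
    end
end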

For example, for the first configuration of nodes we have GCD = 25 and MAX_WEIGHT = 100, so we pick each node in the following order.

Number of cycles: 20
Pick: 10.0.0.1, Weight: 100, Pick threshold: 75%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 75%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 50%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 50%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 25%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 25%
Pick: 10.0.0.3, Weight: 25, Pick threshold: 25%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 100%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 100%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 75%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 75%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 50%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 50%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 25%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 25%
Pick: 10.0.0.3, Weight: 25, Pick threshold: 25%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 100%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 100%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 75%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 75%

Distribution...
10.0.0.1: 45%
10.0.0.2: 45%
10.0.0.3: 10%

Although node 10.0.0.3 only has a weight of 25 compared to 100 for the other two nodes, the algorithm quickly increases its chance of being picked (by 25% every complete cycle). However, for the second configuration of nodes we have GCD = 2 and MAX_WEIGHT = 100, which only allows an increase of 2% per complete cycle over the nodes. This results in the following pattern, where node 10.0.0.3 is not picked for a long stretch.

Number of cycles: 34
Pick: 10.0.0.2, Weight: 100, Pick threshold: 98%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 98%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 96%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 96%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 94%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 94%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 92%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 92%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 90%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 90%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 88%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 88%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 86%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 86%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 84%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 84%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 82%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 82%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 80%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 80%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 78%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 78%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 76%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 76%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 74%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 74%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 72%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 72%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 70%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 70%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 68%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 68%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 66%
Pick: 10.0.0.3, Weight: 66, Pick threshold: 66%

Distribution...
10.0.0.1: 47.058823529412%
10.0.0.2: 50%
10.0.0.3: 2.9411764705882%
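
To make the difference concrete, here is a small standalone illustration (my own, not part of the library) of the GCD for both configurations and of how long the second configuration starves the low-weight node:

-- illustration only: why config 2 starves the low-weight node for so long
local function gcd(a, b)
  while b ~= 0 do
    a, b = b, a % b
  end
  return a
end

-- config 1: the threshold drops 100 -> 75 -> 50 -> 25,
-- so node 10.0.0.3 qualifies after only 3 passes
print(gcd(100, gcd(100, 25)))   --> 25

-- config 2: the threshold must fall from 100 to 66 in steps of 2,
-- i.e. (100 - 66) / 2 = 17 passes in which only the two weight-100 nodes
-- are picked, so node 10.0.0.3 does not appear until roughly the 34th pick,
-- matching the trace above
print(gcd(100, gcd(100, 66)))   --> 2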

After the initial pick of node 10.0.0.3, we can see a roughly uniform distribution across all nodes. After a while, when the threshold drops to 0%, it resets back to MAX_WEIGHT and this pattern repeats.

Number of cycles: 200
Pick: 10.0.0.1, Weight: 100, Pick threshold: 98%

...

Pick: 10.0.0.2, Weight: 100, Pick threshold: 70%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 68%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 68%
Pick: 10.0.0.3, Weight: 66, Pick threshold: 66%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 66%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 66%
Pick: 10.0.0.3, Weight: 66, Pick threshold: 64%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 64%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 64%
Pick: 10.0.0.3, Weight: 66, Pick threshold: 62%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 62%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 62%
Pick: 10.0.0.3, Weight: 66, Pick threshold: 60%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 60%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 60%
Pick: 10.0.0.3, Weight: 66, Pick threshold: 58%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 58%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 58%
Pick: 10.0.0.3, Weight: 66, Pick threshold: 56%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 56%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 56%

...

Pick: 10.0.0.2, Weight: 100, Pick threshold: 6%
Pick: 10.0.0.3, Weight: 66, Pick threshold: 4%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 4%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 4%
Pick: 10.0.0.3, Weight: 66, Pick threshold: 2%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 2%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 2%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 100%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 100%
Pick: 10.0.0.1, Weight: 100, Pick threshold: 98%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 98%

...

Pick: 10.0.0.1, Weight: 100, Pick threshold: 68%
Pick: 10.0.0.2, Weight: 100, Pick threshold: 68%
Pick: 10.0.0.3, Weight: 66, Pick threshold: 66%

Distribution...
10.0.0.1: 39%
10.0.0.2: 38.5%
10.0.0.3: 22.5%

When running this algorithm for a large number of cycles the overall weighted distribution is correct; however, this still leads to an incorrect distribution of nodes over smaller intervals.
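
Counting one full threshold cycle by hand (my own arithmetic) shows why the long-run average still comes out right even though every cycle starts with a burst that excludes the low-weight node:

-- one full threshold cycle of the original algorithm for config 2 (GCD = 2):
-- cw = 100, 98, ..., 68  -> 17 passes, only the two weight-100 nodes qualify -> 34 picks
-- cw =  66, 64, ...,  2  -> 33 passes, all three nodes qualify               -> 99 picks
-- total: 133 picks per cycle, of which 10.0.0.1 gets 50, 10.0.0.2 gets 50, 10.0.0.3 gets 33
-- that is ~37.6% / ~37.6% / ~24.8%, exactly proportional to 100 : 100 : 66,
-- but the first 34 picks of every cycle go only to the two heavy nodes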

Possible Solution

I am suggesting a slight modification to this algorithm to avoid this scenario: instead of the GCD, we can use the smallest weight to better spread out the picked nodes.

-- Replace get_gcd(nodes) https://github.com/openresty/lua-resty-balancer/blob/a7a8b625c6d79d203702709983b736137be2a9bd/lib/resty/roundrobin.lua#L28
-- (also requires `local max = math.max` near the top of the file)
local function get_lowest_weight(nodes)
  local first_id, max_weight = next(nodes)
  if not first_id then
    return error("empty nodes")
  end

  local only_key = first_id
  local lowest = max_weight
  for id, weight in next, nodes, first_id do
    only_key = nil
    lowest = weight < lowest and weight or lowest
    max_weight = weight > max_weight and weight or max_weight
  end

  return only_key, max(lowest, 1), max_weight
end

-- ================================

-- Replace self.gcd with self.lowest
function _M.new(_, nodes)
  ...
  local only_key, lowest, max_weight = get_lowest_weight(newnodes)

  local self = {
      ...
      lowest = lowest,
      ...
  }
  return setmetatable(self, mt)
end

Now, instead of resetting the threshold back to max_weight as soon as it would drop below 0%, we clamp it at 0 so that every node is guaranteed a pick on that pass, and only then reset to max_weight. This avoids the cases where a node could be skipped entirely when using the lowest weight to increase the pick chance.

local function find(self)
  ...

  while true do
    while true do
      ...
    end

    -- New logic
    if cw == 0 then
      cw = self.max_weight
    else
      cw = cw - self.lowest
      if cw < 0 then
        cw = 0
      end
    end
  end
end

With this solution I have conducted the same test for 20, 300, and 1000 cycles using the second weight configuration.

20 Cycles

[Graph: cumulative picks per node over 20 cycles with the modified algorithm]

Distribution...
10.0.0.1: 40%
10.0.0.2: 35%
10.0.0.3: 25%

300 Cycles

[Graph: cumulative picks per node over 300 cycles with the modified algorithm]

Distribution...
10.0.0.1: 37.666666666667%
10.0.0.2: 37.333333333333%
10.0.0.3: 25%

1000 Cycles

[Graph: cumulative picks per node over 1000 cycles with the modified algorithm]

Distribution...
10.0.0.1: 37.5%
10.0.0.2: 37.5%
10.0.0.3: 25%

We can see that the overall distribution of nodes still follows their respective weights, and we no longer have the pattern where node 10.0.0.3 is not picked for long periods of time. Even over small intervals (20 cycles) the distribution of nodes is correct.
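
The steady state of the modified algorithm for the second configuration can also be worked out by hand (my own arithmetic), which explains the 37.5% / 37.5% / 25% split above:

-- one full cycle of the modified algorithm for config 2 (lowest = 66, max_weight = 100):
-- cw = 100:            only the two weight-100 nodes qualify -> 2 picks (.1, .2)
-- cw = 100 - 66 = 34:  all three nodes qualify               -> 3 picks (.1, .2, .3)
-- cw = 34 - 66 -> 0:   clamped to 0, all three qualify       -> 3 picks (.1, .2, .3)
-- cw == 0: reset to max_weight = 100 and repeat
--
-- per 8 picks: 10.0.0.1 gets 3, 10.0.0.2 gets 3, 10.0.0.3 gets 2,
-- i.e. 37.5% / 37.5% / 25%, matching the 1000-cycle distribution above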

Some Limitations

Even with the above solution we can still have configurations where this issue persists. For example, the following configuration will still produce a similar distribution of nodes even with the modified algorithm.

local NODES = {}
NODES["10.0.0.1"] = 100
NODES["10.0.0.2"] = 100
NODES["10.0.0.3"] = 66
NODES["10.0.0.4"] = 1

We can avoid this by introducing an offset to self.lowest or by setting a minimum step value, as sketched below. However, this will distort the relative weight distribution over small numbers of cycles.
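
A minimal sketch of that idea, assuming a hypothetical MIN_STEP_RATIO knob (illustrative only, not part of the library):

-- hypothetical tweak: never let the step fall below a fraction of the
-- maximum weight, so a stray weight of 1 cannot make the threshold decay
-- painfully slowly again
local MIN_STEP_RATIO = 0.1  -- illustrative value

local function get_step(lowest, max_weight)
  local floor_step = math.ceil(max_weight * MIN_STEP_RATIO)
  return math.max(lowest, floor_step)
end

-- in _M.new / _M.reinit, instead of storing the raw lowest weight:
--   self.lowest = get_step(lowest, max_weight)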

Conclusion

I do not have a perfect solution to this problem, but it is affecting real world traffic that relies on this algorithm to distribute requests accurately across a set of nodes. Something to keep in mind is that nginx's implementation of the WRR algorithm is vastly different from the one implemented here, and on first inspection it does not look like that algorithm suffers from this same limitation. It is possible that algorithm could be adopted here; a rough sketch of the idea follows below. I will include a real world test with production data as a comment below in this issue.
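
For reference, nginx uses a "smooth" weighted round robin; this is my own paraphrase of that idea in Lua (the names peers, current_weight and smooth_wrr_pick are illustrative, not from nginx or this library):

-- sketch of nginx-style smooth weighted round robin: every pick adjusts
-- per-node current weights, so low-weight nodes are interleaved instead of
-- being starved for long stretches
local function smooth_wrr_pick(peers)
  local total = 0
  local best

  for _, peer in ipairs(peers) do
    peer.current_weight = (peer.current_weight or 0) + peer.weight
    total = total + peer.weight

    if not best or peer.current_weight > best.current_weight then
      best = peer
    end
  end

  -- the picked node "pays back" the total weight, pushing it down the queue
  best.current_weight = best.current_weight - total
  return best.id
end

-- usage sketch with the second weight configuration
local peers = {
  { id = "10.0.0.1", weight = 100 },
  { id = "10.0.0.2", weight = 100 },
  { id = "10.0.0.3", weight = 66 },
}
for _ = 1, 10 do
  print(smooth_wrr_pick(peers))
end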

Test script and setup

Setup

$ git clone git@github.com:openresty/lua-resty-balancer.git
$ touch lib/test.lua
Script
local rr = require("resty.roundrobin")

local BALANCE_CALLS = 1000

local function dump(o)
  if type(o) == 'table' then
    local s = '{ '
    for k,v in pairs(o) do
        if type(k) ~= 'number' then k = '"'..k..'"' end
        s = s .. '['..k..'] = ' .. dump(v) .. ','
    end
    return s .. '} '
  else
    return tostring(o)
  end
end

print("Setup Nodes ...")

local NODES = {}
NODES["10.0.0.1"] = 100
NODES["10.0.0.2"] = 100
NODES["10.0.0.3"] = 66

print(dump(NODES))

local NODE_COUNTS = {}
NODE_COUNTS["10.0.0.1"] = 0
NODE_COUNTS["10.0.0.2"] = 0
NODE_COUNTS["10.0.0.3"] = 0

print ("Setup roundrobin ...")
local rr_instance = rr:new(NODES)

print("Number of cycles: ", BALANCE_CALLS)

local out = io.open('data.csv', 'w')
for i = 1, BALANCE_CALLS do
  if (rr.DEBUG) then
    print(" ")
  end
  local node = rr_instance:find()
  print("Pick: ", node, ", Weight: ", NODES[node], ", Pick threshold: ", (rr_instance.cw / rr_instance.max_weight) * 100, "%")
  NODE_COUNTS[node] = NODE_COUNTS[node] + 1
  out:write(i .. "," .. NODE_COUNTS["10.0.0.1"] .. "," .. NODE_COUNTS["10.0.0.2"] .. "," .. NODE_COUNTS["10.0.0.3"] .. "\n")
end
out:close()

print("\nDistribution...")
print("10.0.0.1: ", NODE_COUNTS["10.0.0.1"] / BALANCE_CALLS * 100, "%")
print("10.0.0.2: ", NODE_COUNTS["10.0.0.2"] / BALANCE_CALLS * 100, "%")
print("10.0.0.3: ", NODE_COUNTS["10.0.0.3"] / BALANCE_CALLS * 100, "%")
Modified `roundrobin.lua`
local pairs = pairs
local next = next
local tonumber = tonumber
local setmetatable = setmetatable
local math_random = math.random
local error = error
local max = math.max

local utils = require "resty.balancer.utils"

local copy = utils.copy
local nkeys = utils.nkeys
local new_tab = utils.new_tab

local _M = {}
local mt = { __index = _M }


local function get_lowest_weight(nodes)
    local first_id, max_weight = next(nodes)
    if not first_id then
        return error("empty nodes")
    end

    local only_key = first_id
    local lowest = max_weight
    for id, weight in next, nodes, first_id do
        only_key = nil
        lowest = weight < lowest and weight or lowest
        max_weight = weight > max_weight and weight or max_weight
    end

    return only_key, max(lowest, 1), max_weight
end

local function get_random_node_id(nodes)
    local count = nkeys(nodes)

    local id = nil
    local random_index = math_random(count)

    for _ = 1, random_index do
        id = next(nodes, id)
    end

    return id
end


function _M.new(_, nodes)
    local newnodes = copy(nodes)
    local only_key, lowest, max_weight = get_lowest_weight(newnodes)
    local last_id = get_random_node_id(nodes)

    local self = {
        nodes = newnodes,  -- it's safer to copy one
        only_key = only_key,
        max_weight = max_weight,
        lowest = lowest,
        cw = max_weight,
        last_id = last_id,
    }
    return setmetatable(self, mt)
end


function _M.reinit(self, nodes)
    local newnodes = copy(nodes)
    self.only_key, self.lowest, self.max_weight = get_lowest_weight(newnodes)

    self.nodes = newnodes
    self.last_id = get_random_node_id(nodes)
    self.cw = self.max_weight
end


local function _delete(self, id)
    local nodes = self.nodes

    nodes[id] = nil

    self.only_key, self.lowest, self.max_weight = get_lowest_weight(nodes)

    if id == self.last_id then
        self.last_id = nil
    end

    if self.cw > self.max_weight then
        self.cw = self.max_weight
    end
end
_M.delete = _delete


local function _decr(self, id, weight)
    local weight = tonumber(weight) or 1
    local nodes = self.nodes

    local old_weight = nodes[id]
    if not old_weight then
        return
    end

    if old_weight <= weight then
        return _delete(self, id)
    end

    nodes[id] = old_weight - weight

    self.only_key, self.lowest, self.max_weight = get_lowest_weight(nodes)

    if self.cw > self.max_weight then
        self.cw = self.max_weight
    end
end
_M.decr = _decr


local function _incr(self, id, weight)
    local weight = tonumber(weight) or 1
    local nodes = self.nodes

    nodes[id] = (nodes[id] or 0) + weight

    self.only_key, self.lowest, self.max_weight = get_lowest_weight(nodes)
end
_M.incr = _incr



function _M.set(self, id, new_weight)
    local new_weight = tonumber(new_weight) or 0
    local old_weight = self.nodes[id] or 0

    if old_weight == new_weight then
        return
    end

    if old_weight < new_weight then
        return _incr(self, id, new_weight - old_weight)
    end

    return _decr(self, id, old_weight - new_weight)
end


local function find(self)
    local only_key = self.only_key
    if only_key then
        return only_key
    end

    local nodes = self.nodes
    local last_id, cw, weight = self.last_id, self.cw

    while true do
        while true do
            last_id, weight = next(nodes, last_id)
            if not last_id then
                break
            end

            if weight >= cw then
                self.cw = cw
                self.last_id = last_id
                return last_id
            end
        end

        if cw == 0 then
            cw = self.max_weight
        else
            cw = cw - self.lowest
            if cw < 0 then
                cw = 0
            end
        end
    end
end
_M.find = find
_M.next = find


return _M

Execute with

$ resty lib/test.lua

Data used in graphs: https://docs.google.com/spreadsheets/d/1ba570tbELUbG_N-Q-5vq15EyOP0x6wCd6X2Pu1ZAPrQ/edit?usp=sharing
