Conversation

@josibake
Collaborator

Tell k8s up front the min and max CPU and memory we want for a tank. This can be overridden from a graph file if a tank needs more/less.

Did some testing, and this allows k8s to do cluster autoscaling when trying to start a network: if a cluster does not have enough nodes and you try to start a network with 50 tanks, k8s will keep the pods in Pending until it adds more nodes to the cluster. This allows for much more graceful cluster management.

Annoyingly, once a pod is already scheduled and its workload increases, k8s can't do cluster autoscaling anymore, but there might be another way to tackle this via vertical scaling. As of today, the best way to handle that case would be to set much higher limits, which should help? (not tested)
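
For reference, this is roughly how requests and limits map onto a container spec with the Kubernetes Python client. A minimal sketch only: the default values and the build_container helper are illustrative, not the exact code in this PR.

from kubernetes import client

# Illustrative defaults; a graph file could override these per tank.
DEFAULT_RESOURCES = {
    "requests": {"cpu": "500m", "memory": "500Mi"},  # what the scheduler reserves
    "limits": {"cpu": "1000m", "memory": "1500Mi"},  # hard runtime ceiling
}

def build_container(name: str, image: str, resources: dict | None = None) -> client.V1Container:
    res = resources or DEFAULT_RESOURCES
    return client.V1Container(
        name=name,
        image=image,
        resources=client.V1ResourceRequirements(
            requests=res["requests"],
            limits=res["limits"],
        ),
    )

Since the scheduler only looks at requests, declaring them up front is what lets the cluster autoscaler see that Pending pods need new nodes.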

Testing

Tested on a DigitalOcean cluster with autoscaling. Has not been tested on Minikube or Docker Desktop, but should work, since requests and limits are standard k8s config options rather than anything provider-specific.

@josibake force-pushed the add-resource-requests branch from 4197899 to df625d5 on August 19, 2024 12:21
@m3dwards
Collaborator

Ran on Docker Desktop's Kubernetes and could see:

"resources": {
    "limits": {
        "cpu": "1",
        "memory": "1500Mi"
    },
    "requests": {
        "cpu": "500m",
        "memory": "500Mi"
    }
},
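
To double-check the same values on a live pod, something like this with the Kubernetes Python client should work (pod name and namespace here are placeholders):

from kubernetes import client, config

config.load_kube_config()  # uses the current kubectl context
v1 = client.CoreV1Api()

# Placeholder name/namespace; substitute a real tank pod.
pod = v1.read_namespaced_pod(name="warnet-tank-0", namespace="warnet")
for c in pod.spec.containers:
    print(c.name, c.resources)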

@m3dwards
Collaborator

CI failed, perhaps because minikube doesn't think it has enough resources to start the tanks.

@josibake force-pushed the add-resource-requests branch 9 times, most recently from 8765aeb to ff4ddab on August 19, 2024 16:24
@willcl-ark
Contributor

Added a commit to:

  • fix schema
  • validate generated graphs during creation
  • apply default resource limits to all created graphs (@josibake do we want this?)

@willcl-ark
Copy link
Contributor

An alternative using user-friendly "profiles": willcl-ark#24

Contributor

@pinheadmz left a comment


Can't we just set lower resource usage in the CI so the tests can run as written without any changes?


-def create_cycle_graph(n: int, version: str, bitcoin_conf: str | None, random_version: bool):
+def create_cycle_graph(
+    n: int, version: str, bitcoin_conf: str | None, random_version: bool, resources: dict | None
Contributor

resources isn't being used in the function?
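
For illustration, one way the new parameter could actually be used, assuming the function builds a networkx graph and that the per-node dict has to be serialized to survive GraphML export (both are assumptions, not taken from the diff):

import json
import networkx as nx

def create_cycle_graph(
    n: int, version: str, bitcoin_conf: str | None, random_version: bool, resources: dict | None
):
    # version / bitcoin_conf / random_version handling omitted for brevity
    graph = nx.cycle_graph(n, create_using=nx.DiGraph)
    if resources:
        for node in graph.nodes():
            # GraphML attribute values must be scalars, so store the dict as JSON
            graph.nodes[node]["resources"] = json.dumps(resources)
    return graph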

@pinheadmz
Contributor

Error on Docker Desktop, warcli network start (so, default.graphml)

2024-08-20 12:04:09 | DEBUG   | k8s      | Deploying pods
2024-08-20 12:04:09 | DEBUG   | k8s      | Creating bitcoind container for tank 0
2024-08-20 12:04:09 | ERROR   | server   | Unhandled exception starting warnet: 'NoneType' object is not subscriptable
Traceback (most recent call last):
  File "/root/warnet/src/warnet/server.py", line 469, in thread_start
    wn.warnet_up()
  File "/root/warnet/src/warnet/warnet.py", line 176, in warnet_up
    self.container_interface.up(self)
  File "/root/warnet/src/warnet/backend/kubernetes_backend.py", line 57, in up
    self.deploy_pods(warnet)
  File "/root/warnet/src/warnet/backend/kubernetes_backend.py", line 682, in deploy_pods
    bitcoind_container = self.create_bitcoind_container(tank)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/warnet/src/warnet/backend/kubernetes_backend.py", line 410, in create_bitcoind_container
    requests=tank.resources["requests"],
             ~~~~~~~~~~~~~~^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
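
Until the schema default actually lands on every node, one defensive option at the call site would be a guard like the sketch below (DEFAULT_RESOURCES is a hypothetical fallback, not code from this branch):

from kubernetes import client

# Hypothetical fallback so a graph without a "resources" field still deploys.
DEFAULT_RESOURCES = {
    "requests": {"cpu": "100m", "memory": "250Mi"},
    "limits": {"cpu": "250m", "memory": "500Mi"},
}

def resource_requirements(tank_resources: dict | None) -> client.V1ResourceRequirements:
    resources = tank_resources or DEFAULT_RESOURCES
    return client.V1ResourceRequirements(
        requests=resources["requests"],
        limits=resources["limits"],
    )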


@pinheadmz
Contributor

I think the bug I'm getting is in the default set in schema.json

@pinheadmz
Contributor

example fix:

diff --git a/src/warnet/graph_schema.json b/src/warnet/graph_schema.json
index aa1518f..4b178a4 100644
--- a/src/warnet/graph_schema.json
+++ b/src/warnet/graph_schema.json
@@ -69,25 +69,13 @@
         "comment": "A string of arguments for the lightning network node in command-line format, e.g. '--protocol.wumbo-channels --bitcoin.timelockdelta=80'"
       },
       "resources": {
-            "type": "object",
-            "comment": "Kubernetes resource requests and limits for the node",
-            "properties": {
-              "requests": {
-                "type": "object",
-                "properties": {
-                  "cpu": {"type": "string", "default": "500m"},
-                  "memory": {"type": "string", "default": "500Mi"}
-                }
-              },
-              "limits": {
-                "type": "object",
-                "properties": {
-                  "cpu": {"type": "string", "default":"1000m"},
-                  "memory": {"type": "string", "default":"1500Mi"}
-                }
-              }
-            }
+        "type": "object",
+        "comment": "Kubernetes resource requests and limits for the node",
+        "default": {
+          "requests": {"cpu": "100m", "memory": "250Mi"},
+          "limits": {"cpu": "250m", "memory": "500Mi"}
         }
+      }
     },
     "additionalProperties": false,
     "oneOf": [

@josibake force-pushed the add-resource-requests branch 3 times, most recently from 92e2c92 to 7c38f2d on August 20, 2024 13:54
now that we are explicitly requesting resources,
pods will not get scheduled unless the cluster
has enough resources to satisfy the initial request.

this was causing our current tests to fail, so bump
minikube's resources. the tests will be scaled down
in the next commits.

test with 6 nodes instead of 8.
also replace version with image: this sets us up for a later
change to remove version from the config completely.
we use image for everything now, and version is more of a leftover
from before we had a docker registry set up. in a later
commit, we will completely remove version from the jsonschema file.
@josibake force-pushed the add-resource-requests branch 7 times, most recently from 8056b39 to 90ebfd6 on August 20, 2024 20:47
@josibake force-pushed the add-resource-requests branch from 90ebfd6 to 5862a25 on August 20, 2024 20:54
@josibake
Collaborator Author

Closing as this is now obsolete. We can now configure everything in one place using the network.yaml/node-defaults.yaml and namespaces.yaml/namespace-defaults.yaml helm files.

Only thing currently missing is good documentation and examples on how to use the yaml configuration files, which I think we will need before TABConf; opened a backlog issue for that here: #496

@josibake closed this Aug 26, 2024