# Starvation and Tuning

All Cats Effect applications constructed via `IOApp` have an automatic mechanism which periodically checks to see if the application runtime is starving for compute resources. If you ever see warnings which look like the following, they are the result of this mechanism automatically detecting that the responsiveness of your application runtime is below the configured threshold. Note that the timestamp is the time when the starvation was detected, which is not precisely the time when starvation (or the task that is responsible) began.

```
2023-01-28T00:16:24.101Z [WARNING] Your app's responsiveness to a new asynchronous
event (such as a new connection, an upstream response, or a timer) was in excess
of 40 milliseconds. Your CPU is probably starving. Consider increasing the
granularity of your delays or adding more cedes. This may also be a sign that you
are unintentionally running blocking I/O operations (such as File or InetAddress)
```
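
The checker itself is configurable from `IOApp`. The following is a minimal sketch, assuming the `cpuStarvationCheck*` settings that `IORuntimeConfig` exposes in Cats Effect 3.4 and later:

```scala
import scala.concurrent.duration._

import cats.effect.{IO, IOApp}
import cats.effect.unsafe.IORuntimeConfig

object Example extends IOApp.Simple {

  // probe for starvation every 5 seconds rather than at the default interval
  override def runtimeConfig: IORuntimeConfig =
    super.runtimeConfig.copy(cpuStarvationCheckInterval = 5.seconds)

  def run: IO[Unit] = IO.unit
}
```

Setting `cpuStarvationCheckInitialDelay` to `Duration.Inf` in the same way effectively disables the checker entirely, which is **not** recommended in most cases; if the warning is caused by blocking I/O, wrapping those calls in `IO.blocking` is usually the better fix.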

```scala
// A quick-and-dirty experiment: time a single trivial IO stage to establish a
// baseline, then compare the cost of the expensive operation against it
val expensiveThing: IO[A] = ???

IO.unit.timed flatMap {
  case (baseline, _) =>
    IO.println(s"baseline stage cost is $baseline") >> expensiveThing.timed flatMap {
      case (cost, result) =>
        if (cost / baseline > 1024)
          IO.println("expensiveThing is very expensive").as(result)
        else
```
Of course, it's never as simple as doubling the number of vCPUs and halving the number of instances. Scaling is complicated, and you'll likely need to adjust other resources such as memory, connection limits, file handle counts, and autoscaling signals. Overall, though, a good rule of thumb is to consider 8 vCPUs to be the minimum that should be available to a Cats Effect application at scale. 16 or even 32 vCPUs is likely to improve performance even further, and it is very much worth experimenting with these types of tuning parameters.
#### Not Enough Threads - Running in Kubernetes
One cause of "not enough threads" can be that the application is running inside Kubernetes without a `cpu_quota` configured. When the CPU limit is not configured, the JVM detects the number of available processors as 1, which severely restricts what the runtime is able to do.
This guide on [containerizing Java applications for Kubernetes](https://learn.microsoft.com/en-us/azure/developer/java/containers/kubernetes#understand-jvm-available-processors) goes into more detail on the mechanism involved.
**All Cats Effect applications running in Kubernetes should either have a `cpu_quota` configured or use the JVM's `-XX:ActiveProcessorCount` argument to explicitly tell the JVM how many cores to use.**
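
A quick way to confirm what the runtime will actually see is to log the detected processor count at startup. The `CpuCheck` object below is only an illustrative sketch, but `Runtime.getRuntime.availableProcessors()` is the value Cats Effect consults when sizing its default compute pool:

```scala
import cats.effect.{IO, IOApp}

object CpuCheck extends IOApp.Simple {

  // inside a container without a CPU limit (and without
  // -XX:ActiveProcessorCount), this can report 1
  def run: IO[Unit] =
    IO(Runtime.getRuntime.availableProcessors()).flatMap { n =>
      IO.println(s"JVM availableProcessors = $n")
    }
}
```

Running this once in the target environment makes it easy to verify that a `cpu_quota` or `-XX:ActiveProcessorCount=<n>` setting is actually taking effect.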
### Too Many Threads
In a sense, this scenario is like the correlated inverse of the "Not Enough CPUs" option, and it happens surprisingly frequently in conventional JVM applications. Consider the thread list from the previous section (assuming 8 CPUs):
This can be accomplished in some cases by using `IO.executionContext` or `IO.executor`...

- The source of the rogue threads (e.g. another library) must have some initialization mechanism which accepts an `ExecutionContext` or `Executor`
- The source of the rogue threads must not ever *block* on its rogue threads: they must only be used for compute
  - The exception to this is if the library in question is a well-behaved Scala library, often from the Akka ecosystem, which wraps its blocking in `scala.concurrent.blocking(...)`. In this case, it is safe to use the Cats Effect compute pool, and the results will be similar to what happens with `IO.blocking`.
Determining both of these factors often takes some investigation, usually of the "thread dumps and async profiler" variety, trying to catch the rogue threads in a blocked state. Alternatively, you can just read the library source code, though this can be very time-consuming and error-prone.
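
When the library in question does expose such an initialization hook, the handoff can look roughly like the following sketch, in which `RogueClient` and its `fromExecutionContext` constructor are hypothetical stand-ins for whatever is spawning the extra threads:

```scala
import scala.concurrent.ExecutionContext

import cats.effect.IO

// hypothetical third-party client; only the shape of its builder matters here
trait RogueClient
object RogueClient {
  def fromExecutionContext(ec: ExecutionContext): RogueClient = ???
}

// hand the library Cats Effect's own compute pool so that it stops creating
// its own threads; this is only safe if the library never blocks on these
// threads (or wraps its blocking in scala.concurrent.blocking(...))
val client: IO[RogueClient] =
  IO.executionContext.map(RogueClient.fromExecutionContext)
```

If the library wants a `java.util.concurrent.Executor` instead, `IO.executor` serves the same purpose.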
The solution is to eliminate this over-provisioning. If a scheduled container is...

As a very concrete example of this, if you have a cluster of 16 host instances, each of which has 64 CPUs, that gives you a total of 1024 vCPUs to work with. If you configure each application container to use 4 vCPUs, you can support up to 256 application instances simultaneously (without resizing the cluster). Over-provisioning by a factor of 100% would suggest that you can support up to 512 application instances. **Do not do this.** Instead, resize the application instances to use either 8 or 16 vCPUs each. If you take the latter approach, your cluster will support up to 64 application instances simultaneously. This *seems* like a downgrade, but these taller instances should (absent other constraints) support more than 4x more traffic than the smaller instances, meaning that the overall cluster is much more efficient.
#### Kubernetes CPU Pinning
Even if you have followed the above advice and avoided over-provisioning, the Linux kernel scheduler is unfortunately not aware of the Cats Effect scheduler and will likely actively work against it by moving Cats Effect worker threads between different CPUs, thereby destroying CPU cache locality. In certain environments we can prevent this by configuring Kubernetes to pin an application to a given set of CPUs:
1. Set the [CPU Manager Policy to static](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy)
2. Ensure that your pod is in the [Guaranteed QoS class](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#guaranteed)
3. Request an integral number of CPUs for your Cats Effect application
You should be able to see the CPU assignment updates reflected in the kubelet logs.
### Process Contention
All of the advice in this page is targeted towards the (very common) scenario in which your Cats Effect application `java` process is the only meaningfully active process on a given instance. In other words, there are no additional applications on the same server, no databases, nothing. If this is *not* the case, then a lot of the advice about having too many threads applies, but *worse*.