1 | | -# Status & Monitoring |
2 | | - |
3 | | -## Runners Status |
4 | | - |
5 | | -The status of a Runner is displayed in the **Runners** page. The following table describes the different statuses: |
6 | | - |
7 | | -| **Runner Status** | **Description** | |
8 | | -|-------------------|---------------------------------------------------------------------------------------------------------------------------------| |
9 | | -| **New** | A new Replica has been created for this Runner but has not yet been installed or sent a heartbeat. | |
10 | | -| **Healthy** | All Replicas for this Runner are sending heartbeats and are available for tasks. | |
11 | | -| **Unhealthy** | At least one Replica is unavailable or Down, but at least one other Replica _is_ still available for tasks. | |
12 | | -| **Unknown** | No Replica of the Runner has sent a heartbeat in over 30 seconds, but the Replicas have not yet been declared Down. | |
13 | | -| **Down** | No Replica of the Runner has sent a heartbeat in the past 120 seconds, so all Replicas are declared Down. | |
14 | | - |
15 | | -## Replica Status |
16 | | - |
17 | | -The status of Replicas can be seen by navigating to the **Replicas** tab of the Runner. The status of each Replica is shown in the **Last Active** column. The status can be one of the following: |
18 | | -| **Replica Status** | **Description**| |
19 | | -|-------------------|---------------------------------------------------------------------------------------------------------------------------------| |
20 | | -| **New** | The Replica has been created but not yet started. Heartbeats are sent from the Replica every 2 seconds.| |
21 | | -| **Healthy** | The Replica is currently running and available for tasks.| |
22 | | -| **Unhealthy** | The Replica has connected to Runbook Automation but is experiencing a high workload. This status is set to safeguard the execution times and tells Runbook Automation to utilize another Replica - if available.| |
23 | | -| **Unknown** | The server has not heard from the Replica in 30 seconds. Tasks will not be assigned to this Replica.| |
24 | | -| **Down** | The Replica has not been heard from in 120 seconds. Tasks will not be assigned to this Replica.| |
25 | | - |
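The 30-second and 120-second thresholds in the table above can be sketched as a small classifier. This is an illustrative Python sketch, not the product's implementation: the function name, the constants, and the handling of the exact boundary values are assumptions, and the workload-based **Unhealthy** status is left out because it does not depend on heartbeat age.

```python
# Thresholds taken from the status table; the names are hypothetical.
UNKNOWN_AFTER_SECONDS = 30
DOWN_AFTER_SECONDS = 120


def status_from_heartbeat_age(seconds_since_heartbeat):
    """Map the time since the last heartbeat to a Replica status.

    Pass None for a Replica that has never sent a heartbeat.
    Assumption: a Replica exactly at a threshold already counts as
    having crossed it.
    """
    if seconds_since_heartbeat is None:
        return "New"
    if seconds_since_heartbeat < 0:
        raise ValueError("heartbeat age cannot be negative")
    if seconds_since_heartbeat >= DOWN_AFTER_SECONDS:
        return "Down"
    if seconds_since_heartbeat >= UNKNOWN_AFTER_SECONDS:
        return "Unknown"
    return "Healthy"
```

With heartbeats arriving every 2 seconds, a running Replica normally stays well below the first threshold, so roughly 15 consecutive missed heartbeats pass before it would be treated as **Unknown**.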
26 | | -## Tuning Replicas |
27 | | - |
28 | | -Replicas are equipped to execute multiple tasks concurrently - such as executing multiple Jobs simultaneously or targeting multiple nodes within a Job in parallel. By default, a Replica can handle 50 concurrent task executions. |
29 | | - |
30 | | -- An **Unhealthy** status for a Replica is declared when that Replica can no longer accept new tasks because it has reached the concurrency threshold. You can check the number of concurrent operations via the API endpoint [Get runner information](/api/index.md#get-runner-information) under the variable **runningOperations**. |
31 | | -- The maximum number of concurrent executions can be tuned using the parameter ` -Drunner.operations.maxRunning=<EXEC_LIMIT>` when deploying a Replica. However, please note the following: |
32 | | - - The execution limit is linked to the available resources set for the Replica process. Although a maximum number of executions can be established via this parameter, the Replica will throttle the number of executions based on the available resources (CPU, Memory, Stack Memory and Heap Space in Java) as well as the number of tasks associated with that execution. |
33 | | - - It is recommended to review the allocated resources to the machine and the Replica process when it is reporting as **Unhealthy**. While Replicas can be scaled vertically by allocating additional compute resources to the Java process, note that the Runner feature is intentionally designed to scale horizontally by deploying additional Replicas. |
34 | | - |
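As a rough illustration of how **runningOperations** relates to the concurrency limit, the Python sketch below computes the remaining headroom from a runner-information response. It assumes the response is a JSON object with a top-level `runningOperations` field, which this page does not confirm; the function name and sample payload are hypothetical. The default of 50 comes from the text above.

```python
import json

# Default concurrency limit stated above; override it if the Replica was
# started with a different -Drunner.operations.maxRunning value.
DEFAULT_MAX_RUNNING = 50


def remaining_capacity(runner_info_json, max_running=DEFAULT_MAX_RUNNING):
    """Return how many more concurrent operations fit under the limit.

    Assumes `runner_info_json` is a JSON object containing a numeric
    `runningOperations` field (an assumption about the response shape).
    """
    info = json.loads(runner_info_json)
    running = int(info["runningOperations"])
    # Never report negative headroom, even if the count exceeds the limit.
    return max(max_running - running, 0)


# Hypothetical payload: 48 operations in flight against the default limit.
sample = '{"runningOperations": 48}'
headroom = remaining_capacity(sample)
```

A headroom of zero corresponds to the point at which the Replica would stop accepting new tasks. Because the Replica also throttles on available resources, it can report **Unhealthy** before this number reaches zero.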
35 | | -## Ping Replicas |
36 | | - |
37 | | -Users can check that a Replica is available via an ad hoc "ping" operation: |
38 | | - |
39 | | -1. When managing a Runner - either at the Project or System level - click on the **Replicas** tab. |
40 | | -2. Select the **Actions** menu and click on **+ Ping**: |
41 | | - <br> |
42 | | -3. After a few seconds, the response will appear in the upper right. |
43 | | -4. If the Runner is available, the response shows that the message was received: |
44 | | - <br> |
45 | | -5. If the Runner is unavailable, the response will show that the ping response timed out: |
46 | | - <br> |
47 | | - |
48 | | -## Monitoring Replicas |
| 1 | +# Monitoring Runners |
49 | 2 |
50 | 3 | The Enterprise Runner is a lightweight JVM process. It can therefore be monitored with standard JMX monitoring tools. |
51 | 4 |
52 | | -The Replica exposes a number of JMX MBeans that can be used to monitor the Replica's health and performance. |
| 5 | +The Runner exposes a number of JMX MBeans that can be used to monitor the Runner's health and performance. |
53 | 6 |
54 | | -To expose the JMX MBeans, you can start the Replica with the following Java options: |
| 7 | +To expose the JMX MBeans, you can start the Runner with the following Java options:
55 | 8 |
56 | 9 | - `-Dcom.sun.management.jmxremote` - This enables remote JMX monitoring. |
57 | 10 | - `-Dcom.sun.management.jmxremote.port` - This sets the port that the JMX MBeans will be exposed on.
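The two options listed above can be assembled programmatically when scripting a Runner launch. The following Python sketch only builds the option strings. The helper name and the port value are hypothetical, and the rest of the launch command (for example, the path to the Runner JAR) is deployment-specific and not shown.

```python
def jmx_java_options(port):
    """Build the JVM options that enable remote JMX on the given port.

    Only the two options named in the list above are produced; any further
    JMX settings a deployment needs must be added separately.
    """
    if not isinstance(port, int) or not 0 < port < 65536:
        raise ValueError("port must be an integer between 1 and 65535")
    return [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
    ]


# Example: 9010 is an arbitrary port chosen for illustration.
options = jmx_java_options(9010)
```

The resulting list can be placed before `-jar` in the `java` command used to start the Runner.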