@cplaursen
Contributor

Proof of concept for XAPI throttling.
This allows users to specify in xapi.conf which clients are rate limited, identified by user agent. Each rate limited client consumes from a token bucket whenever it makes a request, and has to wait for the bucket to refill if it exceeds its capacity.
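
As a rough illustration of the token bucket idea, here is a minimal Python sketch. It is not the xapi code (which is OCaml); the class name, the `try_consume` method and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Illustrative token bucket; not the xapi implementation."""

    def __init__(self, capacity, fill_rate, clock=time.monotonic):
        self.capacity = capacity      # maximum number of tokens held
        self.fill_rate = fill_rate    # tokens added per second
        self.clock = clock            # injectable so the sketch is testable
        self.tokens = capacity        # the bucket starts full
        self.last = clock()

    def _refill(self):
        # Add the tokens earned since the last refill, capped at capacity.
        now = self.clock()
        earned = (now - self.last) * self.fill_rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now

    def try_consume(self, amount=1.0):
        """Take `amount` tokens if available and report whether it worked."""
        self._refill()
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False
```

A caller that gets `False` back would have to wait for the bucket to refill before retrying.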

stephenchengCloud and others added 30 commits September 12, 2025 09:25
This change introduces a new pool-level parameter that restricts VNC console access
to a single active session per VM/host.
This prevents multiple users from connecting to the same VM console simultaneously,
so that one user cannot 'watch' another user operating a session.
When `limit_console_sessions` is true:
- Enforce a single active VNC console connection per VM/host
- Disable connections to the websocket

Signed-off-by: Stephen Cheng <[email protected]>
…api-project#6660)

This change introduces a new pool-level parameter that restricts VNC
console access to a single active session per VM/host.
This prevents multiple users from connecting to the same VM console
simultaneously, so that one user cannot 'watch' another user operating a
session. When `limit_console_sessions` is true:
- Enforce a single active VNC console connection per VM/host
- Disable connections to the websocket
The field sets the maximum time (in seconds) that a VM's console can be idle
before it is automatically disconnected. The default value, 0, means the console never times out.
This setting applies only to VM consoles;
for host consoles, use the separate parameter 'host.console_idle_timeout'.

Signed-off-by: Stephen Cheng <[email protected]>
The parser only parses the message types of client-to-server messages,
aiming to identify message types from clients.

Signed-off-by: Stephen Cheng <[email protected]>
Two commands are used to set max_cstate: xenpm sets it at runtime,
and xen-cmdline sets it in the grub conf file so that it takes effect
after reboot.

Signed-off-by: Changlei Li <[email protected]>
A string is used to represent max_cstate and max_sub_cstate:
"" -> unlimited
"N" -> max cstate CN
"N,M" -> max cstate CN with max sub state M
This follows the xen-cmdline max_cstate format, see
https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#max_cstate-x86

Signed-off-by: Changlei Li <[email protected]>
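
The string encoding described in the commit above can be sketched as a small parser. This is a Python illustration only; the function name `parse_max_cstate` and the choice of `None` for "unlimited" are assumptions, not the representation used in xapi.

```python
def parse_max_cstate(value):
    """Return None for unlimited, else (cstate, sub_cstate or None)."""
    if value == "":
        return None                       # "" -> unlimited
    parts = value.split(",")
    valid = all(p.isascii() and p.isdigit() for p in parts)
    if len(parts) > 2 or not valid:
        raise ValueError(f"invalid max_cstate: {value!r}")
    cstate = int(parts[0])                # "N"   -> max cstate CN
    sub = int(parts[1]) if len(parts) == 2 else None   # "N,M" -> sub state M
    return (cstate, sub)
```

Rejecting anything other than non-negative integers mirrors the `xenpm set-max-cstate -1` failure shown in the PR description, though the exact validation in xapi may differ.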
C-states are power management states for CPUs where higher numbered
states represent deeper sleep modes with lower power consumption but
higher wake-up latency. The max_cstate parameter controls the deepest
C-state that CPUs are allowed to enter.

Common C-state values:
- C0: CPU is active (not a sleep state)
- C1: CPU is halted but can wake up almost instantly
- C2: CPU caches are flushed, slightly longer wake-up time
- C3+: Deeper sleep states with progressively longer wake-up times

To set max_cstate on the dom0 host, two commands are used: `xenpm` sets it at
runtime, and `xen-cmdline` sets it in the grub conf file so that it takes effect
after reboot.
xenpm examples:
```
   # xenpm set-max-cstate 0 0
   max C-state set to C0
   max C-substate set to 0 succeeded
   # xenpm set-max-cstate 0
   max C-state set to C0
   max C-substate set to unlimited succeeded
   # xenpm set-max-cstate unlimited
   max C-state set to unlimited
   # xenpm set-max-cstate -1
   Missing, excess, or invalid argument(s)
```
xen-command-line examples:
```
/opt/xensource/libexec/xen-cmdline --get-xen max_cstate
     "" -> unlimited
     "max_cstate=N" -> max cstate N
     "max_cstate=N,M" -> max cstate N, max c-sub-state M
/opt/xensource/libexec/xen-cmdline --set-xen max_cstate=1
/opt/xensource/libexec/xen-cmdline --set-xen max_cstate=1,0
/opt/xensource/libexec/xen-cmdline --delete-xen max_cstate
```

[xen-command-line.max_cstate](https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#max_cstate-x86).

This PR adds a new field `host.max_cstate` to manage the host's max_cstate.
`host.set_max_cstate` uses the two commands mentioned above to configure it.
During dbsync on xapi start, the field is synced from `xen-cmdline
--get-xen max_cstate`
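
To make the pairing of the two commands concrete, here is a hedged Python sketch that builds both argument lists for one setting. The command names and flags are the ones shown in the examples above; the helper `max_cstate_commands` is hypothetical, and mapping "unlimited" to `--delete-xen` is an assumption based on the empty `--get-xen` result meaning unlimited.

```python
XEN_CMDLINE = "/opt/xensource/libexec/xen-cmdline"


def max_cstate_commands(cstate=None, sub_cstate=None):
    """Return (runtime argv, boot-time argv) for one max_cstate setting."""
    if cstate is None:
        # Unlimited: lift the runtime limit and drop the boot parameter.
        return (["xenpm", "set-max-cstate", "unlimited"],
                [XEN_CMDLINE, "--delete-xen", "max_cstate"])
    runtime = ["xenpm", "set-max-cstate", str(cstate)]
    boot_arg = f"max_cstate={cstate}"
    if sub_cstate is not None:
        runtime.append(str(sub_cstate))
        boot_arg += f",{sub_cstate}"
    return runtime, [XEN_CMDLINE, "--set-xen", boot_arg]
```

The lists are only built here, not executed, so the sketch can be inspected without touching a host.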
- write ntp servers to chrony.conf
- interaction with dhclient
  - handle /run/chrony-dhcp/$interface.sources
  - handle chrony.sh
- restart/enable/disable chronyd

Signed-off-by: Changlei Li <[email protected]>
This commit adds an idle timeout feature for VNC console connections.

Key changes:
- Add idle timeout detection by monitoring RFB keyEvent and
  pointerEvent.
- Add callback function to `proxy` to parse the RFB messages and
  determine if the connection is idle or not.

Signed-off-by: Stephen Cheng <[email protected]>
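
The idle detection in the commit above could look roughly like the following Python sketch. The message type values 4 (KeyEvent) and 5 (PointerEvent) come from the RFB protocol; the `IdleTracker` class, its method names and the injectable clock are assumptions for illustration and do not reflect the actual `proxy` callback.

```python
import time

# Client-to-server message type bytes defined by the RFB protocol.
KEY_EVENT = 4
POINTER_EVENT = 5


class IdleTracker:
    """Tracks the time of the last user input on one console connection."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout        # seconds; 0 means never time out
        self.clock = clock
        self.last_input = clock()

    def on_client_message(self, message_type):
        # Only keyboard and pointer input count as activity.
        if message_type in (KEY_EVENT, POINTER_EVENT):
            self.last_input = self.clock()

    def is_idle(self):
        if self.timeout <= 0:
            return False
        return self.clock() - self.last_input >= self.timeout
```

Other client messages, such as framebuffer update requests (type 3), are deliberately ignored here because a viewer sends them without any user interaction.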
This commit adds an idle timeout feature for VNC console connections.

Key changes:
- Add idle timeout detection by monitoring RFB keyEvent and
  pointerEvent.
- Add callback function to `proxy` to parse the RFB messages and
  determine if the connection is idle or not.
Add a detailed reason to the HTTP response when console
connection limits are exceeded.

Signed-off-by: Stephen Cheng <[email protected]>
At XAPI start, check the actual NTP config to determine the
ntp mode, whether ntp is enabled, and the custom ntp servers, and
store them in the xapi DB.

Signed-off-by: Changlei Li <[email protected]>
New fields: `host.ntp_mode`, `host.ntp_custom_servers`
New APIs: `host.set_ntp_mode`, `host.set_ntp_custom_servers`,
`host.get_ntp_mode`, `host.get_ntp_custom_servers`,
`host.get_ntp_servers_status`.

**ntp_mode_dhcp**: In this mode, ntp uses the DHCP-assigned ntp servers
as sources. In Dom0, dhclient triggers `chrony.sh` to update the ntp
servers when a network event happens. It writes the ntp servers to
`/run/chrony-dhcp/$interface.sources`, and the directory `/run/chrony-dhcp` is
included in `chrony.conf`. dhclient also stores the DHCP lease in
`/var/lib/xcp/dhclient-$interface.leases`, see
https://github.com/xapi-project/xen-api/blob/v25.31.0/ocaml/networkd/lib/network_utils.ml#L925.
When switching the ntp mode to dhcp, XAPI checks the lease file, finds the ntp
servers, and fills in the chrony-dhcp file. It also adds the exec permission
to `chrony.sh`. When switching the ntp mode from dhcp to another mode, XAPI removes the
chrony-dhcp files and the exec permission of `chrony.sh`. This is the same
behaviour as xsconsole:
https://github.com/xapi-project/xsconsole/blob/v11.1.1/XSConsoleData.py#L593.
xsconsole will later be changed to use XenAPI to manage ntp,
to avoid conflicts.

**ntp_mode_custom**: In this mode, ntp uses `host.ntp_custom_servers` as
sources. This is implemented by changing `chrony.conf` and restarting
chronyd. `host.ntp_custom_servers` is set by the user.

**ntp_mode_default**: In this mode, ntp uses the default-ntp-servers from the
XAPI config file.
For example, the legacy default ntp servers are
[0-3].centos.pool.ntp.org, and the current default
ntp servers are [0-3].xenserver.pool.ntp.org.
After an update or upgrade, the legacy default ntp
servers are recognized and changed to the current
default ntp servers. The mode is ntp_mode_default
in that case as well.

Signed-off-by: Changlei Li <[email protected]>
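
The recognition of legacy default servers described above can be sketched as follows. This Python example is an assumption-laden illustration: the function `classify_servers` is hypothetical, comparing server sets for equality is a guess at the matching rule, and the DHCP mode is left out for brevity.

```python
# Server names expanded from the [0-3] patterns quoted in the commit message.
CURRENT_DEFAULT = [f"{i}.xenserver.pool.ntp.org" for i in range(4)]
LEGACY_DEFAULT = [f"{i}.centos.pool.ntp.org" for i in range(4)]


def classify_servers(configured):
    """Return (mode, servers to use) for servers read from chrony.conf."""
    if set(configured) in (set(CURRENT_DEFAULT), set(LEGACY_DEFAULT)):
        # Legacy defaults are rewritten to the current defaults, and the
        # mode is reported as default in both cases.
        return "ntp_mode_default", list(CURRENT_DEFAULT)
    return "ntp_mode_custom", list(configured)
```

In xapi the two server lists come from configuration options rather than constants, so the hard-coded lists here only stand in for those values.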
For example, the legacy default ntp servers are
`[0-3].centos.pool.ntp.org`, and current default ntp servers are
`[0-3].xenserver.pool.ntp.org`. After update or upgrade, the legacy
default ntp servers are recognized and changed to current default ntp
servers. The mode is `ntp_mode_default` as well.
Add a new config option named legacy-default-ntp-servers. It will be
defined in xapi.conf.d/xenserver.conf (the same as
default-ntp-servers).
Signed-off-by: Christian Lindig <[email protected]>
This commit adds three fields to VM_metrics:
- numa_optimised: bool - whether a VM is optimised for NUMA
- numa_nodes: int - number of NUMA nodes associated with VM
- numa_node_memory: Map(int, int) - amount of VM memory in NUMA node X

Signed-off-by: Christian Pardillo Laursen <[email protected]>
Signed-off-by: Christian Pardillo Laursen <[email protected]>
Signed-off-by: Christian Pardillo Laursen <[email protected]>
Zero or negative rate limits can cause issues in the behaviour of rate
limiting. In particular, zero fill rate leads to a division by zero in time
calculations. Rather than account for this, we forbid the creation of token
buckets with a bad fill rate by returning None.

Signed-off-by: Christian Pardillo Laursen <[email protected]>
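
To show where the division by zero would come from, here is a small Python sketch of bucket creation and the wait-time calculation. The names `create_bucket` and `seconds_until` and the dictionary representation are assumptions; only the idea of returning None for a bad fill rate is taken from the commit message.

```python
def create_bucket(capacity, fill_rate):
    """Return a new full bucket, or None for a zero or negative fill rate."""
    if fill_rate <= 0:
        return None
    return {"capacity": capacity, "fill_rate": fill_rate, "tokens": capacity}


def seconds_until(bucket, amount):
    """Time until `amount` tokens are available.

    This divides by fill_rate, which is why a zero rate must never
    reach this point.
    """
    missing = amount - bucket["tokens"]
    return max(0.0, missing / bucket["fill_rate"])
```

Rejecting the bucket at creation keeps the check in one place, so the time calculation does not need its own guard.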
Signed-off-by: Christian Pardillo Laursen <[email protected]>
Make token bucket type abstract to hide Hashtbl.t
Use `replace` rather than `add` for adding a new bucket

Signed-off-by: Christian Pardillo Laursen <[email protected]>
The current implementation of rate limiting had severe fairness issues.
These have been resolved through the addition of a request queue, to
which rate limited requests are added. A worker thread sleeps until its
associated token bucket has enough tokens to handle the request at the
head of the queue, calls it, and sleeps until the next request is ready.

Signed-off-by: Christian Pardillo Laursen <[email protected]>
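
The queue-based scheme above can be sketched as a worker loop that always serves the head of the queue. This Python version is illustrative only: `run_worker`, the dictionary bucket and the injectable `now` and `sleep` functions are assumptions, and the loop assumes every cost fits within the bucket capacity.

```python
from collections import deque


def run_worker(queue, bucket, now, sleep):
    """Serve queued (cost, callback) pairs strictly in arrival order."""
    while queue:
        cost, callback = queue[0]          # always look at the head
        # Refill with the tokens earned since the last check.
        t = now()
        earned = (t - bucket["last"]) * bucket["fill_rate"]
        bucket["tokens"] = min(bucket["capacity"], bucket["tokens"] + earned)
        bucket["last"] = t
        if bucket["tokens"] < cost:
            # Sleep just long enough for the head request, then re-check.
            sleep((cost - bucket["tokens"]) / bucket["fill_rate"])
            continue
        bucket["tokens"] -= cost
        queue.popleft()
        callback()
```

Because later requests cannot overtake the head, a cheap request never starves an expensive one that arrived earlier, which is the fairness property the commit is after.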
Creating a token bucket fails if the rate limit supplied is 0 or
negative - this can lead to unexpected and undesirable behaviour, such as
division by 0 or negative token counts.

Signed-off-by: Christian Pardillo Laursen <[email protected]>
Signed-off-by: Christian Pardillo Laursen <[email protected]>
Signed-off-by: Christian Pardillo Laursen <[email protected]>
Signed-off-by: Christian Pardillo Laursen <[email protected]>
The rate limiting can no longer be set from xapi_globs. Instead, the rate
limiter is initialised from the database on startup now.

Signed-off-by: Christian Pardillo Laursen <[email protected]>
Signed-off-by: Christian Pardillo Laursen <[email protected]>
Synchronous requests can be long-running, which can cause issues if they are all
processed on the same worker thread.

This commit updates the code to process synchronous requests on the original
caller thread - the worker thread is now only responsible for signalling on a
provided channel to wake up the caller.

Signed-off-by: Christian Pardillo Laursen <[email protected]>
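
A minimal Python sketch of this hand-off is shown below, using a `threading.Event` as a stand-in for the channel mentioned in the commit. The function names are hypothetical and the token accounting is left out so that the signalling pattern stays visible.

```python
import queue
import threading


def worker(pending):
    """Wake callers in FIFO order; never runs the requests itself."""
    while True:
        ready = pending.get()
        if ready is None:              # sentinel: stop the worker
            return
        # A real worker would sleep here until the bucket has enough tokens.
        ready.set()


def rate_limited_call(pending, request):
    """Enqueue a wake-up signal, block until signalled, then run locally."""
    ready = threading.Event()
    pending.put(ready)
    ready.wait()
    return request()                   # runs on the caller's own thread
```

Since the worker only signals, a long-running synchronous request blocks its own caller but not the other requests waiting in the queue.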
Signed-off-by: Christian Pardillo Laursen <[email protected]>
Signed-off-by: Christian Pardillo Laursen <[email protected]>
Signed-off-by: Christian Pardillo Laursen <[email protected]>
Signed-off-by: Christian Pardillo Laursen <[email protected]>
When an async request is rate limited, we confirm receipt immediately
but enqueue the actual request, rather than rate limiting the first
response too.

Signed-off-by: Christian Pardillo Laursen <[email protected]>
Signed-off-by: Christian Pardillo Laursen <[email protected]>
Rather than have the same token costs for all calls, we want to
rate limit clients based on how expensive the functions they are
calling are.

I measured the average runtime of most xapi calls, which are stored
in a hashtable and are used as the token cost for each call. This does
not account for any overheads, and will need to be adjusted - it is
merely a starting point. We also use the median as a default.

Signed-off-by: Christian Pardillo Laursen <[email protected]>
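
A cost table with a median default might look like this Python sketch. The call names are real XenAPI calls, but the numbers are invented placeholders rather than the measurements from the commit, and the sketch assumes the median is applied to calls that are missing from the table.

```python
import statistics

# Hypothetical average runtimes in seconds; the real table is not shown here.
MEASURED_COST = {
    "VM.get_record": 0.002,
    "host.get_all": 0.004,
    "VM.start": 1.5,
}

# Calls without a measurement fall back to the median of the known costs.
DEFAULT_COST = statistics.median(MEASURED_COST.values())


def token_cost(call_name):
    """Number of tokens to charge for one invocation of `call_name`."""
    return MEASURED_COST.get(call_name, DEFAULT_COST)
```

The median is less sensitive than the mean to a few very slow calls, which keeps the default from being dominated by outliers.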
Signed-off-by: Christian Pardillo Laursen <[email protected]>
Make Bucket_table a functor that accepts a Map.OrderedType module
parameter, allowing the key type to be customised at instantiation.

Signed-off-by: Christian Pardillo Laursen <[email protected]>
Until now, the only way to identify rate limited hosts was through
their user agent. This commit adds the option to rate limit by IP
address.

Signed-off-by: Christian Pardillo Laursen <[email protected]>
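
The keyed bucket table from the two commits above is an OCaml functor in the PR; the closest Python analogue is a generic class, sketched here under that assumption. `BucketTable`, its methods and the stored tuple are hypothetical names chosen for the example.

```python
import ipaddress
from typing import Dict, Generic, Hashable, Optional, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)


class BucketTable(Generic[K]):
    """Maps a client key to (capacity, fill_rate) and hides the dict."""

    def __init__(self) -> None:
        self._buckets: Dict[K, Tuple[float, float]] = {}

    def add(self, key: K, capacity: float, fill_rate: float) -> bool:
        if fill_rate <= 0:
            return False                   # reject bad fill rates
        self._buckets[key] = (capacity, fill_rate)   # replaces an old entry
        return True

    def find(self, key: K) -> Optional[Tuple[float, float]]:
        return self._buckets.get(key)


# One table per kind of key: user agent strings and IP addresses.
by_user_agent: BucketTable[str] = BucketTable()
by_ip: BucketTable[ipaddress.IPv4Address] = BucketTable()
```

Keeping separate tables per key type means a user agent string can never be confused with an address, at the cost of two lookups per request.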