Skip to content

Timeout configuration logic is not very intuitive in Hyperlight #198

@simongdavies

Description

@simongdavies

When debugging I wanted intentionally to set high values for timeouts, to avoid threads being terminated while I keep them paused to explore memory state. When doing so I observed a somewhat surprising behavior, when I set max execution timeout to 5m, it instead was silently changed to 1s.

See https://github.com/hyperlight-dev/hyperlight/blob/main/src/hyperlight_host/src/sandbox/config.rs#L284-L301.

The current behavior seem to be as follows:

When you set timeout to 0, Hyperlight instead uses default timeout
When you set timeout to a value between 1 and maximum currently allowed timeout, Hyperlight clamps the timeout value within the range from minimmum sensible non-0 value (i.e. MIN_MAX_EXECUTION_TIME for the execution timeout) and maximum allowed value
When you set timeout to a value larger than the maximum allowed timeout, Hyperlight instead uses the default value.
This logic applies to all 3 timeouts (execution time, initialization time, cancel wait time).

Cases 1 and 2 seem reasonable, assuming that value 0 is often used as a sentinel special value and we can't use optional type within the SandboxConfiguration type. However case 3 seem surprising and, to me, is unlikely to align with the intent of the caller.

I suggest to change the behavior in case 3, to one of the following:

[preferred] When a value outside of the currently allowed range provided, use the largest allowed value instead
Return an error to indicate that large timeouts are not supported.
NOTE: Both option will change the current behavior, but it seems to me that the first option, while a change is likely less
breaking and probably aligns with the caller intent better than the current behavior, thus it's a preferred option.

Even better would be to allow disabling timeouts all together or support much larger timeouts. It might be not the primary use case for Hyperlight, but there is nothing inhertently wrong with a timeout of a few minutes.

One caveat here is that SandboxConfiguration is a repr(C) structure and timeouts are represented there as milliseconds in u8 and u16 types. Because of that, it might not be possible to change the current behavior without either breaking ABI or changing behavior of the existing APIs. So if we at least can change the behavior in the case 3 above, I think it would already be an improvement.

Metadata

Metadata

Assignees

No one assigned

    Labels

    lifecycle/duplicateThis issue is a duplicate of another issue

    Type

    No type

    Projects

    Status

    No status

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions