Feature: finalize the design of the async API #47

@ncoghlan

Description

There are two open questions behind the FutureWarning emitted when instantiating lmstudio.AsyncClient:

>>> import lmstudio as lms
>>> lms.AsyncClient()
/home/acoghlan/lmstudio/python-sdk/src/lmstudio/async_api.py:1444: FutureWarning: Note the async API is not yet stable and is expected to change in future releases

  warnings.warn(_ASYNC_API_STABILITY_WARNING, FutureWarning)

The first open question is relatively straightforward: should the async APIs accept callbacks as coroutines or as regular synchronous functions?

They're currently regular synchronous functions, and it seems best to retain this behaviour:

  • having some callbacks be coroutines and others regular functions would be hard to remember and hence annoying
  • having all callbacks be coroutines would disallow common conventions like on_message=chat.append
  • asyncio.Queue.put_nowait can be used to feed a synchronous callback parameter to a waiting coroutine (sketched below)

Conclusion: we should retain this aspect of the current async API design.
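
For example (a minimal, self-contained sketch rather than SDK code), a regular synchronous callback can feed an asyncio.Queue that a waiting coroutine consumes, so the callback itself never needs to be a coroutine:

import asyncio

async def main() -> None:
    queue: asyncio.Queue[str] = asyncio.Queue()

    def on_fragment(fragment: str) -> None:
        # A plain synchronous callback; put_nowait never blocks, so it is
        # safe to call from code running on the event loop thread.
        queue.put_nowait(fragment)

    async def consume() -> None:
        while True:
            fragment = await queue.get()
            if fragment == "<done>":
                break
            print(fragment, end="", flush=True)

    consumer = asyncio.create_task(consume())

    # Simulate a streaming prediction invoking the synchronous callback.
    for fragment in ("Hello", ", ", "world!", "<done>"):
        on_fragment(fragment)

    await consumer

asyncio.run(main())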

The second open question is more complex: should the design of the async API be changed to use a dedicated event loop running in a background thread, or should we add a third API client variant that works that way? (Tentative name for the latter: AsyncCompatClient, since it wouldn't be the truly native async client, but rather one that emulates a call-and-response API by communicating with a background thread instead of directly with the remote server.)

The reason this question comes up is that many AI-related network APIs are entirely call-and-response HTTP APIs. This means they don't introduce any resource management considerations related to structured concurrency (which keeps exception handling correct in async applications by requiring that all spawned tasks have terminated before a coroutine returns).

The native LM Studio network API is different: the reason it can offer features such as streaming individual prediction fragments as they are generated is that the client creates and maintains persistent websocket connections to the LM Studio instance, and then sets up streaming channels over those multiplexed connections.

This difference in network communications is largely transparent to users of the synchronous API, but for the asynchronous API, it means that trying to create the client in one coroutine and destroy it in another (required for integration with a variety of other async AI client frameworks written on the assumption of call-and-response style network interfaces) may fail with a cryptic exception about cancel scopes.
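
To illustrate (a hypothetical plain-asyncio sketch, not the SDK's actual implementation, which uses cancel scopes and hence produces the error mentioned above): a websocket-backed client owns background tasks, so its lifetime is naturally tied to the coroutine that opens and closes it.

import asyncio
import contextlib

class StreamingClient:
    """Stand-in for a client that keeps a persistent connection open."""

    async def __aenter__(self) -> "StreamingClient":
        self._queue: asyncio.Queue[str] = asyncio.Queue()
        # Background task standing in for the websocket reader.
        self._reader = asyncio.create_task(self._read_loop())
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        # The reader must be shut down before the client goes away, otherwise
        # exceptions raised inside it would be lost; this is the resource
        # management concern that call-and-response HTTP clients don't have.
        self._reader.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await self._reader

    async def _read_loop(self) -> None:
        while True:
            await asyncio.sleep(0.1)  # placeholder for "await websocket.recv()"
            self._queue.put_nowait("fragment")

    async def fragments(self):
        while True:
            yield await self._queue.get()

async def main() -> None:
    # Opening and closing the client in the same coroutine keeps its
    # background task inside a well-defined lifetime; splitting __aenter__
    # and __aexit__ across different coroutines is what breaks down.
    async with StreamingClient() as client:
        async for fragment in client.fragments():
            print(fragment)
            break

asyncio.run(main())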

While the extra code duplication is annoying, adding a third client type specifically for emulating a call-and-response async API seems like the best available option:

  • for async applications and libraries designed with structured concurrency in mind, the existing async API already works the way they expect, so we don't really want to change that
  • while the sync API can be externally wrapped to emulate a call-and-response async API, actually doing so is quite clumsy, and adds redundant thread synchronisation overhead that a dedicated async call-and-response emulation API could avoid (a rough sketch of such a wrapper follows this list)
  • the sync API already uses an async websocket in a background thread, so this would be an opportunity to improve the way those connections work (for example, the sync client currently creates a thread per websocket connection, but it should really be able to create a single thread that all the websockets share. Edit: that specific example was fixed in 1.3.0 via Websocket refactoring: per-client threads #77)
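
For context, a rough sketch of that external wrapping (hypothetical: the sync-side names used here, lms.Client, llm.model(), respond(), and close(), describe the general shape of the sync API rather than confirmed signatures):

import asyncio

import lmstudio as lms

class WrappedAsyncClient:
    """Fakes a call-and-response async API by pushing sync calls to threads."""

    def __init__(self) -> None:
        self._client: lms.Client | None = None

    async def open(self) -> None:
        # Creating the sync client in a worker thread keeps its websockets
        # (and their background thread) out of this event loop entirely.
        self._client = await asyncio.to_thread(lms.Client)

    async def respond(self, model_name: str, prompt: str) -> str:
        assert self._client is not None
        client = self._client

        def _call() -> str:
            # Illustrative sync API usage; names are assumptions.
            return str(client.llm.model(model_name).respond(prompt))

        # Each request is a thread hop: redundant synchronisation overhead
        # that a dedicated async emulation client could avoid.
        return await asyncio.to_thread(_call)

    async def close(self) -> None:
        if self._client is not None:
            await asyncio.to_thread(self._client.close)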
