
@awalker4 awalker4 commented May 21, 2025

The Issue

We discovered a behavior change in the Python SDK after we merged the platform/serverless API specs. The SDK-level server_url param silently stopped working, and we were forced to set custom URLs per function call.

# The passed URL is ignored and the request goes to our default Serverless URL.
# You suddenly get an invalid API key error if you expected to talk to, e.g., freemium.
s = UnstructuredClient(server_url="my_own_url")
s.general.partition()

# This does the right thing
s = UnstructuredClient()
s.general.partition(server_url="my_own_url")

We had to patch some generated code to keep backwards compatibility and set the SDK-level server_url the way we used to. This works! However, any file listed in .genignore does not get updated on regeneration, so the patched files drift from the rest of the generated code and eventually the SDK fails to generate. The better solution is to figure out why the generated code changed on us and fix it "upstream".

The Fix

Our SDK points to two services: the workflow API at platform.unstructuredapp.io and the older partition endpoint at api.unstructuredapp.io. We merged these two OpenAPI specs to generate a combined SDK, but this meant that URLs could only be resolved per operation. There is no longer a global default, so a statement like UnstructuredClient(server_url="my_own_url") is ambiguous.

The solution is to go back to one default server: the platform URL. The partition URL serves just one endpoint, so it's much easier to handle as a one-off. This restores the server_url behavior we had, without us having to fight the autogenerated code.

The Diff

This PR is huge because I regenerated the relevant files. Only a few changes drive all of it:

overlay_client.yaml

After merging the two openapi.yaml specs, remove all child (per-operation) servers blocks and keep just one global config. Now every endpoint is part of platform.unstructuredapp.io.
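As a rough illustration of what the overlay achieves, here is a minimal Python sketch that applies the same transformation to a spec loaded as a dict. The real change lives in overlay_client.yaml and is applied by the generator; the function name, the `https://` scheme, and the example paths below are assumptions made for illustration only.

```python
from copy import deepcopy

# Assumed default; the PR names platform.unstructuredapp.io as the global server.
PLATFORM_URL = "https://platform.unstructuredapp.io"

# Keys of an OpenAPI path item that hold operations.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def collapse_servers(spec: dict, global_url: str = PLATFORM_URL) -> dict:
    """Return a copy of `spec` with one top-level server and no nested ones.

    Hypothetical helper: it mimics the effect of the overlay, not its syntax.
    """
    result = deepcopy(spec)
    result["servers"] = [{"url": global_url}]
    for path_item in result.get("paths", {}).values():
        # Drop a path-level servers block, if any.
        path_item.pop("servers", None)
        for method, operation in path_item.items():
            # Drop operation-level servers blocks.
            if method in HTTP_METHODS and isinstance(operation, dict):
                operation.pop("servers", None)
    return result


# Hypothetical merged spec with a per-operation server on the partition endpoint.
merged = {
    "servers": [{"url": "https://platform.unstructuredapp.io"},
                {"url": "https://api.unstructuredapp.io"}],
    "paths": {
        "/general/v0/general": {
            "post": {"operationId": "partition",
                     "servers": [{"url": "https://api.unstructuredapp.io"}]},
        },
        "/api/v1/jobs": {"get": {"operationId": "list_jobs"}},
    },
}
collapsed = collapse_servers(merged)
```

After this step nothing in the spec says that partition lives on a different host, which is why general.py needs its own handling.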

general.py

This is now the only custom patch. In the partition (and partition_async) call, we need to swap to the partition URL (api.unstructuredapp.io). We do this only if the user has not already changed the default.
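The swap can be sketched roughly as follows. This is not the patched code from general.py; the function name, the constants, and the `https://` scheme are assumptions, and the real patch may detect "the user has not changed the default" differently.

```python
from typing import Optional

# Assumed values based on the hosts named in this PR.
PLATFORM_URL = "https://platform.unstructuredapp.io"  # global default server
PARTITION_URL = "https://api.unstructuredapp.io"      # older partition service


def resolve_partition_url(
    sdk_server_url: Optional[str],
    call_server_url: Optional[str] = None,
) -> str:
    """Pick the base URL for a partition call (hypothetical helper)."""
    # A URL passed to the call itself always wins.
    if call_server_url is not None:
        return call_server_url
    # The user left the SDK on its default, so swap to the partition service.
    if sdk_server_url is None or sdk_server_url.rstrip("/") == PLATFORM_URL:
        return PARTITION_URL
    # The user set a custom URL at SDK init; respect it.
    return sdk_server_url
```

The key design point is the middle branch: only an untouched default is rewritten, so a custom SDK-level server_url is never overridden.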

destinations.py, jobs.py, etc

These are the other endpoint files, which are no longer patched. After regenerating, you can see that the base_url logic cleans itself up: either the user passed a server_url in the call, or we fetch the globally configured URL.
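In outline, the regenerated logic behaves like the sketch below. The class and function names are hypothetical stand-ins rather than the generated code itself, and the default URL scheme is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SDKConfig:
    """Hypothetical stand-in for the SDK's global configuration."""

    server_url: Optional[str] = None  # set via UnstructuredClient(server_url=...)
    default_url: str = "https://platform.unstructuredapp.io"  # assumed default

    def get_server_url(self) -> str:
        # Fall back to the single global default when nothing was configured.
        return self.server_url or self.default_url


def resolve_base_url(config: SDKConfig, server_url: Optional[str] = None) -> str:
    """Per-call URL first, otherwise the globally configured one."""
    if server_url is not None:
        return server_url
    return config.get_server_url()
```

With a single global server in the spec there is no per-operation table to consult, so every platform operation can share this two-step lookup.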

test_server_urls.py

Made some tweaks to these test cases. They lock in our compatibility and assert that we always use the right URL. Users can set a custom URL at SDK init or at the operation. We need to cover this behavior within general.partition, since it has the special logic. Beyond that, we make sure both URL approaches work for the other platform operations.

@awalker4 awalker4 force-pushed the fix/platform-server-urls branch from 113b02b to 67602d3 Compare June 2, 2025 20:44
@awalker4 awalker4 changed the title from "chore: Remove per-operation server lists from the api spec" to "chore: Revert custom handling for multiple Unstructured base urls" Jul 23, 2025
@awalker4 awalker4 marked this pull request as ready for review July 23, 2025 18:57
@awalker4 awalker4 merged commit b0a005f into main Jul 23, 2025
14 checks passed
@awalker4 awalker4 deleted the fix/platform-server-urls branch July 23, 2025 21:24
awalker4 added a commit that referenced this pull request Jul 25, 2025
We have a generate failure because we've pulled `general.py` out for
some custom changes (see #270).
This will hit the occasional bump as new code is added to this file
upstream. The steps for fixing this:

- Remove `general.py` from .genignore
- Run speakeasy generate
- Stage and commit the autogenerated changes, working around our custom
code
- Commit without the `.genignore` change

I set `gen.yaml` to version 0.41.0. This will cause the next generate
job to propagate the new version and publish it.