chore: Revert custom handling for multiple Unstructured base urls #270
The Issue
We discovered a behavior change in the Python SDK after we merged the platform/serverless api specs. All of a sudden, the SDK level server_url param silently stopped working, and we were forced to set custom urls per function.
We had to patch some generated code in order to keep backwards compatibility and set the SDK-level server_url the way we used to. This works! However, any file in .genignore will not get updated, and eventually the SDK fails to generate because of drift. The better solution is to figure out why the generated code changed on us, and fix it "upstream".
The Fix
Our SDK points to two services - the workflow API at platform.unstructuredapp.io and the older partition endpoint at api.unstructuredapp.io. We merged these two openapi specs in order to generate a combined SDK, but this meant that urls could only be resolved per operation. There is no longer a global default, so a statement like UnstructuredClient(server_url="my_own_url") is ambiguous.
The solution to all this is to go back to one default server - the platform url. The partition url is just one endpoint, so it's much easier to handle as a one-off. This restores the server_url behavior we had, without us having to fight with the autogenerated code.
The Diff
This pr is huge because I regenerated the relevant files. There are only a few changes that drive all of it:
overlay_client.yaml: After merging the two openapi.yaml specs, remove all child servers blocks and just keep one global config. Now every endpoint is a part of platform.unstructuredapp.io.
general.py: This is now the only custom patch. In the partition (and partition_async) call, we need to swap to the right url. We do this only if the user has not already changed the default.
destinations.py, jobs.py, etc.: These are the other endpoint files that are no longer patched. After regenerating, you can see the base_url logic cleans itself up. Either the user passed a server_url in the call, or we fetch the globally configured url.
test_server_urls.py: Made some tweaks to these test cases. This locks in our compatibility and asserts that we always use the right url. Users can set a custom url at the SDK init, or at the operation. We need to cover this behavior within general.partition since this has the special logic. Otherwise, make sure both url approaches work for any of the other platform operations.
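For reviewers who want the url rules in one place, here is a minimal sketch of the resolution order described above. It is not the generated SDK code: the function names resolve_base_url and resolve_partition_url are hypothetical, and the https:// scheme on both hosts is an assumption.

```python
from typing import Optional

# Hosts as named in this description; the https:// scheme is an assumption.
PLATFORM_URL = "https://platform.unstructuredapp.io"
PARTITION_URL = "https://api.unstructuredapp.io"


def resolve_base_url(global_url: str, call_url: Optional[str] = None) -> str:
    """Hypothetical helper for the regular platform operations.

    A server_url passed to the call wins; otherwise the globally
    configured url is used.
    """
    return call_url if call_url is not None else global_url


def resolve_partition_url(global_url: str, call_url: Optional[str] = None) -> str:
    """Hypothetical helper for the partition special case.

    Swap to the partition host only when the user has not changed the
    default, either on the call or at SDK init.
    """
    if call_url is not None:
        return call_url
    if global_url == PLATFORM_URL:
        return PARTITION_URL
    return global_url
```

Under these assumptions, a client left on the default sends partition requests to the partition host, while a client initialized with a custom server_url keeps that url for every operation, partition included.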