-
Notifications
You must be signed in to change notification settings - Fork 175
Description
Description
The SDK has two major exception handling issues that violate standard API design principles:
-
VectorSearchClient uses generic exceptions: When resources don't exist,
VectorSearchClientraises a plainExceptionwith raw JSON error messages instead of proper SDK exceptions likeNotFound. This is inconsistent withWorkspaceClientand requires parsing exception strings to figure out what went wrong. -
Exceptions used for normal control flow: All
get()methods throw exceptions for non existent resources. This is a well-known anti-pattern in API design. Checking if a resource exists is not an exceptional circumstance it's a normal operation that happens constantly in normal usage. The SDK forces every existence check into a try/catch block, which is inefficient and makes code harder to read.
This goes against industry standard practices (look at Python's dict.get(), or pathlib.Path.exists()) where normal "not found" cases are handled through return values, not exceptions. Exceptions should signal actual errors like network failures, permission denials, bad requests... not expected/normal states.
Reproduction
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import NotFound, ResourceDoesNotExist
from databricks.vector_search.client import VectorSearchClient
w = WorkspaceClient()
vs_client = VectorSearchClient(
workspace_url=w.config.host,
personal_access_token=w.config.token,
disable_notice=True
)
# Issue 1: VectorSearchClient raises generic Exception
try:
vs_client.get_index(
endpoint_name="dummy",
index_name="nonexistent.catalog.index"
)
except Exception as e:
print(type(e).__name__) # Just "Exception"
print(str(e)) # Raw JSON: b'{"error_code":"RESOURCE_DOES_NOT_EXIST"...'
# Compare to WorkspaceClient (correct behavior)
try:
w.vector_search_endpoints.get_endpoint("nonexistent")
except NotFound: # Proper exception type
print("This works correctly")
# Issue 2: Anti-pattern - exceptions for control flow
# This shouldn't require a try/catch at all
def table_exists(table_name: str) -> bool:
try:
table = w.online_tables.get(name=table_name)
return table is not None
except (NotFound, ResourceDoesNotExist):
return FalseI ran tests across Unity Catalog, Online Tables, and Vector Search:
- Unity Catalog: Consistent, uses
NotFound - Online Tables: Consistent, uses
NotFound - Vector Search: Inconsistent
- WorkspaceClient endpoints: Uses
NotFoundcorrectly - VectorSearchClient indexes: Raises generic
Exceptionwith raw JSON like:Response content b'{"error_code":"RESOURCE_DOES_NOT_EXIST","message":"..."
- WorkspaceClient endpoints: Uses
Expected behavior
The SDK should follow industry-standard API design patterns:
Fix 1: VectorSearchClient must use proper exceptions
try:
vs_client.get_index(endpoint_name="dummy", index_name="nonexistent")
except NotFound: # Should be NotFound, not Exception
passFix 2: Stop using exceptions for control flow (critical)
Every major Python library handles "not found" without exceptions:
dict.get(key)returnsNone, doesn't throw KeyErrorPath.exists()returnsbool, doesn't throwos.path.exists()returnsbool, doesn't throwrequests.get()only throws on network errors, not 404s
The Databricks SDK should do the same:
# Industry standard approach - option 1
if w.online_tables.exists(table_name):
table = w.online_tables.get(table_name)
# Industry standard approach - option 2
table = w.online_tables.get(table_name)
if table is None:
# doesn't exist... handle gracefully
# Reserve exceptions for ACTUAL errors
try:
table = w.online_tables.get(table_name)
except PermissionDenied: # Real error
pass
except InvalidParameterValue: # Real error
passWhy this matters:
-
Performance: Exceptions have significant overhead compared to simple None checks. In loops or batch operations, this adds up.
-
Code readability: Using exceptions for control flow is widely considered an anti-pattern. It makes code harder to understand and maintain.
-
Expected behavior: When a method is called
get(), developers expect it to either return the resource orNone, not throw an exception. The current behavior violates the principle of least surprise. -
Standard practice: Look at any well-designed Python API - they don't throw exceptions for "resource doesn't exist." That's not an error, it's just a state.
Is it a regression?
No, this appears to have been the design from the beginning.
Debug Logs
Validated that the current workaround actually works:
def index_exists(endpoint_name: str, index_name: str) -> bool:
try:
vs_client.get_index(endpoint_name=endpoint_name, index_name=index_name)
return True
except (NotFound, ResourceDoesNotExist):
return False
except Exception as e:
if "RESOURCE_DOES_NOT_EXIST" in str(e) or "does not exist" in str(e):
return False
raise
# Test results:
# - Returns False for non-existent resources
# - Real errors (like AssertionError for invalid params) still propagateOther Information
- Environment: Databricks Serverless Notebook (Azure)
- Python: 3.10.12
Additional context
This design forces a lot of boilerplate throughout our codebase that wouldn't exist with proper API design:
# VectorSearchManager - string parsing workaround
def index_exists(self, index_name: str) -> bool:
try:
self.vs_client.get_index(endpoint_name=self.endpoint_name, index_name=index_name)
return True
except (NotFound, ResourceDoesNotExist):
return False
except Exception as e: # Forced to catch generic Exception
# String parsing because VectorSearchClient doesn't use proper exceptions
if "RESOURCE_DOES_NOT_EXIST" in str(e) or "does not exist" in str(e):
return False
raise
# OnlineTableManager - try/catch everywhere
def table_exists(self, table_name: str) -> bool:
try:
table = self.ws_client.online_tables.get(name=table_name)
return table is not None
except (NotFound, ResourceDoesNotExist):
return False
def delete_table(self, table_name: str) -> None:
try:
self.ws_client.online_tables.delete(name=table_name)
except (NotFound, ResourceDoesNotExist):
pass # Already goneEvery single resource manager class in our code has these same patterns repeated. This is a lot of unnecessary defensive coding that wouldn't be needed if the SDK followed standard API design practices.
Recommended changes:
- Immediate: VectorSearchClient should use
NotFoundlike the rest of the SDK - Short-term: Add
exists()methods to all resource managers - Long-term: Have
get()returnNonefor missing resources (align with Python conventions)
Good examples and SDKs to reference
- boto3 (AWS)
- stripe-python
- Python's standard library
dict.get(key) # Returns None if missing
dict[key] # Throws KeyError if missing
Path.exists() # Returns bool- Industry-standard SDKs like Google Cloud's Python library provides
exists()methods andlookup_*()methods that returnNonefor missing resources.
I understand the SDK is auto-generated from OpenAPI specs, so this likely requires changes at the spec or generator level.