Skip to content

[ISSUE] Inconsistent Exception Handling Requires Control Flow Anti-patterns #1117

@msetsma

Description

@msetsma

Description

The SDK has two major exception handling issues that violate standard API design principles:

  1. VectorSearchClient uses generic exceptions: When resources don't exist, VectorSearchClient raises a plain Exception with raw JSON error messages instead of proper SDK exceptions like NotFound. This is inconsistent with WorkspaceClient and requires parsing exception strings to figure out what went wrong.

  2. Exceptions used for normal control flow: All get() methods throw exceptions for non existent resources. This is a well-known anti-pattern in API design. Checking if a resource exists is not an exceptional circumstance it's a normal operation that happens constantly in normal usage. The SDK forces every existence check into a try/catch block, which is inefficient and makes code harder to read.

This goes against industry standard practices (look at Python's dict.get(), or pathlib.Path.exists()) where normal "not found" cases are handled through return values, not exceptions. Exceptions should signal actual errors like network failures, permission denials, bad requests... not expected/normal states.

Reproduction

from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import NotFound, ResourceDoesNotExist
from databricks.vector_search.client import VectorSearchClient

w = WorkspaceClient()
vs_client = VectorSearchClient(
    workspace_url=w.config.host, 
    personal_access_token=w.config.token,
    disable_notice=True
)

# Issue 1: VectorSearchClient raises generic Exception
try:
    vs_client.get_index(
        endpoint_name="dummy", 
        index_name="nonexistent.catalog.index"
    )
except Exception as e:
    print(type(e).__name__)  # Just "Exception"
    print(str(e))  # Raw JSON: b'{"error_code":"RESOURCE_DOES_NOT_EXIST"...'

# Compare to WorkspaceClient (correct behavior)
try:
    w.vector_search_endpoints.get_endpoint("nonexistent")
except NotFound:  # Proper exception type
    print("This works correctly")

# Issue 2: Anti-pattern - exceptions for control flow
# This shouldn't require a try/catch at all
def table_exists(table_name: str) -> bool:
    try:
        table = w.online_tables.get(name=table_name)
        return table is not None
    except (NotFound, ResourceDoesNotExist):
        return False

I ran tests across Unity Catalog, Online Tables, and Vector Search:

  • Unity Catalog: Consistent, uses NotFound
  • Online Tables: Consistent, uses NotFound
  • Vector Search: Inconsistent
    • WorkspaceClient endpoints: Uses NotFound correctly
    • VectorSearchClient indexes: Raises generic Exception with raw JSON like:
      Response content b'{"error_code":"RESOURCE_DOES_NOT_EXIST","message":"..."
      

Expected behavior

The SDK should follow industry-standard API design patterns:

Fix 1: VectorSearchClient must use proper exceptions

try:
    vs_client.get_index(endpoint_name="dummy", index_name="nonexistent")
except NotFound:  # Should be NotFound, not Exception
    pass

Fix 2: Stop using exceptions for control flow (critical)

Every major Python library handles "not found" without exceptions:

  • dict.get(key) returns None, doesn't throw KeyError
  • Path.exists() returns bool, doesn't throw
  • os.path.exists() returns bool, doesn't throw
  • requests.get() only throws on network errors, not 404s

The Databricks SDK should do the same:

# Industry standard approach - option 1
if w.online_tables.exists(table_name):
    table = w.online_tables.get(table_name)

# Industry standard approach - option 2  
table = w.online_tables.get(table_name)
if table is None:
    # doesn't exist... handle gracefully

# Reserve exceptions for ACTUAL errors
try:
    table = w.online_tables.get(table_name)
except PermissionDenied:  # Real error
    pass
except InvalidParameterValue:  # Real error
    pass

Why this matters:

  1. Performance: Exceptions have significant overhead compared to simple None checks. In loops or batch operations, this adds up.

  2. Code readability: Using exceptions for control flow is widely considered an anti-pattern. It makes code harder to understand and maintain.

  3. Expected behavior: When a method is called get(), developers expect it to either return the resource or None, not throw an exception. The current behavior violates the principle of least surprise.

  4. Standard practice: Look at any well-designed Python API - they don't throw exceptions for "resource doesn't exist." That's not an error, it's just a state.

Is it a regression?

No, this appears to have been the design from the beginning.

Debug Logs

Validated that the current workaround actually works:

def index_exists(endpoint_name: str, index_name: str) -> bool:
    try:
        vs_client.get_index(endpoint_name=endpoint_name, index_name=index_name)
        return True
    except (NotFound, ResourceDoesNotExist):
        return False
    except Exception as e:
        if "RESOURCE_DOES_NOT_EXIST" in str(e) or "does not exist" in str(e):
            return False
        raise

# Test results:
# - Returns False for non-existent resources
# - Real errors (like AssertionError for invalid params) still propagate

Other Information

  • Environment: Databricks Serverless Notebook (Azure)
  • Python: 3.10.12

Additional context

This design forces a lot of boilerplate throughout our codebase that wouldn't exist with proper API design:

# VectorSearchManager - string parsing workaround
def index_exists(self, index_name: str) -> bool:
    try:
        self.vs_client.get_index(endpoint_name=self.endpoint_name, index_name=index_name)
        return True
    except (NotFound, ResourceDoesNotExist):
        return False
    except Exception as e:  # Forced to catch generic Exception
        # String parsing because VectorSearchClient doesn't use proper exceptions
        if "RESOURCE_DOES_NOT_EXIST" in str(e) or "does not exist" in str(e):
            return False
        raise

# OnlineTableManager - try/catch everywhere
def table_exists(self, table_name: str) -> bool:
    try:
        table = self.ws_client.online_tables.get(name=table_name)
        return table is not None
    except (NotFound, ResourceDoesNotExist):
        return False

def delete_table(self, table_name: str) -> None:
    try:
        self.ws_client.online_tables.delete(name=table_name)
    except (NotFound, ResourceDoesNotExist):
        pass  # Already gone

Every single resource manager class in our code has these same patterns repeated. This is a lot of unnecessary defensive coding that wouldn't be needed if the SDK followed standard API design practices.

Recommended changes:

  1. Immediate: VectorSearchClient should use NotFound like the rest of the SDK
  2. Short-term: Add exists() methods to all resource managers
  3. Long-term: Have get() return None for missing resources (align with Python conventions)

Good examples and SDKs to reference

  • boto3 (AWS)
  • stripe-python
  • Python's standard library
dict.get(key)           # Returns None if missing
dict[key]               # Throws KeyError if missing
Path.exists()           # Returns bool
  • Industry-standard SDKs like Google Cloud's Python library provides exists() methods and lookup_*() methods that return None for missing resources.

I understand the SDK is auto-generated from OpenAPI specs, so this likely requires changes at the spec or generator level.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions