Description
Summary
When calling get_source() or similar methods that return a SourceResponse, the configuration field is incorrectly deserialized to SourceAirtable regardless of the actual source type. This prevents users from accessing source-specific configuration fields like bucket, streams, or globs for S3 sources.
Root Cause
The SourceConfiguration type is a Python Union of 500+ source configuration types. When deserializing JSON responses:
- `dataclasses_json` tries each type in the Union in order until one succeeds
- `SourceAirtable` appears early in the Union (position 8) and has all optional fields
- The `@dataclass_json(undefined=Undefined.EXCLUDE)` decorator ignores unknown fields
- Since `SourceAirtable` has no required fields, it successfully deserializes ANY JSON payload
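The failure mode can be reproduced without the SDK. The sketch below is a stdlib-only model of the behavior described above, not `dataclasses_json` itself: `AirtableLike`, `S3Like` and `first_match` are hypothetical stand-ins. Unknown keys are dropped before construction (mimicking `Undefined.EXCLUDE`), and the first class that constructs wins.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class AirtableLike:
    # Hypothetical stand-in for SourceAirtable: every field is optional,
    # so constructing it never fails.
    credentials: Optional[dict] = None


@dataclass
class S3Like:
    # Hypothetical stand-in for SourceS3: these fields are required.
    bucket: str
    streams: list


def first_match(candidates: List[type], payload: Dict[str, Any]) -> Any:
    """Return an instance of the first candidate that can be constructed.

    Keys the candidate does not declare are silently dropped, modelling
    undefined=Undefined.EXCLUDE.
    """
    for cls in candidates:
        known = {f.name for f in fields(cls)}
        try:
            return cls(**{k: v for k, v in payload.items() if k in known})
        except TypeError:
            # Missing required fields: try the next type in the Union.
            continue
    raise ValueError("no candidate type matched the payload")


s3_payload = {"bucket": "my-bucket", "streams": [{"globs": ["*.csv"]}]}
result = first_match([AirtableLike, S3Like], s3_payload)
print(type(result).__name__)       # AirtableLike
print(hasattr(result, "bucket"))   # False
```

Reversing the candidate order makes the same payload decode to `S3Like`, which shows that the result depends on Union ordering rather than on the data.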
The OpenAPI spec uses oneOf without a discriminator field, so there is no way for the SDK to determine the correct type to deserialize to.
Reproduction
```python
from airbyte_api import AirbyteAPI
from airbyte_api.models import Security

client = AirbyteAPI(security=Security(bearer_auth="..."))

# Get an S3 source
response = client.sources.get_source(source_id="1f5ca207-5c40-48d6-b9d1-6667de9fe427")
source = response.source_response

print(f"Source type: {source.source_type}")  # "s3"
print(f"Config type: {type(source.configuration).__name__}")  # "SourceAirtable" (WRONG!)
print(f"Has bucket? {hasattr(source.configuration, 'bucket')}")  # False (WRONG!)
```

Expected Behavior
The configuration field should be deserialized to SourceS3 when source_type is "s3".
Workaround
Users can access the raw configuration dict or manually deserialize to the correct type:
```python
import json

from airbyte_api import utils
from airbyte_api.models import SourceS3

# Option 1: Access raw config dict
raw_config = response.raw_response.json()["configuration"]
print(raw_config["bucket"])  # Works
print(raw_config["streams"][0]["globs"])  # Works

# Option 2: Manually deserialize to correct type
config_json = json.dumps(response.raw_response.json()["configuration"])
s3_config = utils.unmarshal_json(config_json, SourceS3)
print(s3_config.bucket)  # Works
print(s3_config.streams[0].globs)  # Works
```

Potential Fix
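For the original use case (changing `streams.globs` at runtime), Option 1 can be wrapped in a small helper that edits a copy of the raw dict. The sketch below assumes the raw configuration has the `streams[i]["globs"]` shape used above; `with_stream_globs` is a hypothetical helper, not part of `airbyte_api`, and sending the updated dict back through the API is not shown.

```python
import copy
import json
from typing import Any, Dict, List


def with_stream_globs(config: Dict[str, Any], stream_index: int, globs: List[str]) -> Dict[str, Any]:
    """Return a deep copy of a raw S3 configuration with one stream's globs replaced.

    The input dict is left untouched, so the original configuration can
    still be compared or logged after the change.
    """
    updated = copy.deepcopy(config)
    updated["streams"][stream_index]["globs"] = list(globs)
    return updated


# Stand-in for response.raw_response.json()["configuration"]
raw_config = json.loads('{"bucket": "my-bucket", "streams": [{"globs": ["old/*.csv"]}]}')
new_config = with_stream_globs(raw_config, 0, ["new/**/*.csv"])
print(new_config["streams"][0]["globs"])  # ['new/**/*.csv']
print(raw_config["streams"][0]["globs"])  # ['old/*.csv']
```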
A proper fix would require either:
- Adding a `discriminator` field to the OpenAPI spec for `SourceConfiguration` using the `sourceType` property
- Modifying Speakeasy's generation to handle discriminated unions based on a sibling field
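Either fix amounts to selecting the configuration class from the `source_type` value instead of trying each Union member in turn. A stdlib-only sketch of that dispatch follows; `CONFIG_BY_SOURCE_TYPE`, `decode_configuration` and the stand-in classes are hypothetical and do not reflect what Speakeasy would actually generate.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class AirtableLike:
    # Hypothetical stand-in for SourceAirtable.
    credentials: Optional[dict] = None


@dataclass
class S3Like:
    # Hypothetical stand-in for SourceS3.
    bucket: str
    streams: list


# Hypothetical discriminator mapping: sibling source_type value -> config class.
CONFIG_BY_SOURCE_TYPE: Dict[str, type] = {
    "airtable": AirtableLike,
    "s3": S3Like,
}


def decode_configuration(source_type: str, payload: Dict[str, Any]) -> Any:
    """Decode payload into the class selected by source_type.

    Unknown keys are still dropped, but the class is chosen by the
    discriminator, so an all-optional type can no longer win by accident.
    """
    cls = CONFIG_BY_SOURCE_TYPE.get(source_type)
    if cls is None:
        raise ValueError(f"unknown source_type: {source_type!r}")
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in payload.items() if k in known})


config = decode_configuration("s3", {"bucket": "my-bucket", "streams": [{"globs": ["*.csv"]}]})
print(type(config).__name__)  # S3Like
print(config.bucket)          # my-bucket
```

Failing loudly on an unmapped `source_type` is a deliberate choice here: silently falling back to a default type would reproduce the original bug.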
Since this is a generated SDK, any direct code changes would be overwritten on regeneration.
Context
This issue was reported by a customer trying to modify streams.globs at runtime from Airflow. Investigation requested by @iherdt-airbyte.
Related: Commit 87f7e7ba removed some discriminators from the OpenAPI spec in September 2024.