SourceConfiguration Union type deserializes to wrong type (e.g., SourceAirtable for S3 sources) #135

@devin-ai-integration

Summary

When calling get_source() or similar methods that return a SourceResponse, the configuration field is incorrectly deserialized to SourceAirtable regardless of the actual source type. This prevents users from accessing source-specific configuration fields like bucket, streams, or globs for S3 sources.

Root Cause

The SourceConfiguration type is a Python Union of 500+ source configuration types. When deserializing JSON responses:

  1. dataclasses_json tries each type in the Union in order until one succeeds
  2. SourceAirtable appears early in the Union (position 8) and all of its fields are optional
  3. The @dataclass_json(undefined=Undefined.EXCLUDE) decorator ignores unknown fields
  4. Since SourceAirtable has no required fields, it successfully deserializes ANY JSON payload, so the later Union members (including SourceS3) are never tried

The OpenAPI spec uses oneOf without a discriminator field, so there is no way for the SDK to determine the correct type to deserialize to.
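The failure mode described above can be reproduced without the SDK. The following is a minimal sketch using only the standard library; ToyAirtable, ToyS3, decode_ignoring_unknown and decode_first_match are hypothetical stand-ins written for illustration, not dataclasses_json or airbyte_api code. It assumes a decoder that tries Union members in declaration order and drops unknown keys, as the list above describes.

```python
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args


@dataclass
class ToyAirtable:
    # Hypothetical stand-in: every field is optional, like SourceAirtable in this report.
    credentials: Optional[dict] = None


@dataclass
class ToyS3:
    # Hypothetical stand-in with required fields.
    bucket: str
    streams: list


ToyConfiguration = Union[ToyAirtable, ToyS3]


def decode_ignoring_unknown(cls, payload: dict):
    """Build cls from payload, dropping unknown keys (mimics Undefined.EXCLUDE)."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in payload.items() if k in known})


def decode_first_match(union_type, payload: dict):
    """Try each Union member in declaration order; return the first that constructs."""
    for candidate in get_args(union_type):
        try:
            return decode_ignoring_unknown(candidate, payload)
        except TypeError:
            continue  # a required field was missing; try the next member
    raise ValueError("no Union member accepted the payload")


s3_payload = {"bucket": "my-bucket", "streams": [{"globs": ["*.csv"]}]}
decoded = decode_first_match(ToyConfiguration, s3_payload)
print(type(decoded).__name__)      # the all-optional type wins
print(hasattr(decoded, "bucket"))  # the S3 fields were silently dropped
```

Because ToyAirtable accepts an empty set of known keys, the loop never reaches ToyS3, which mirrors the behavior reported for SourceAirtable.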

Reproduction

from airbyte_api import AirbyteAPI
from airbyte_api.models import Security

client = AirbyteAPI(security=Security(bearer_auth="..."))

# Get an S3 source
response = client.sources.get_source(source_id="1f5ca207-5c40-48d6-b9d1-6667de9fe427")
source = response.source_response

print(f"Source type: {source.source_type}")  # "s3"
print(f"Config type: {type(source.configuration).__name__}")  # "SourceAirtable" (WRONG!)
print(f"Has bucket? {hasattr(source.configuration, 'bucket')}")  # False (WRONG!)

Expected Behavior

The configuration field should be deserialized to SourceS3 when source_type is "s3".

Workaround

Users can access the raw configuration dict or manually deserialize to the correct type:

import json
from airbyte_api import utils
from airbyte_api.models import SourceS3

# Option 1: Access raw config dict
raw_config = response.raw_response.json()["configuration"]
print(raw_config["bucket"])  # Works
print(raw_config["streams"][0]["globs"])  # Works

# Option 2: Manually deserialize to correct type
config_json = json.dumps(response.raw_response.json()["configuration"])
s3_config = utils.unmarshal_json(config_json, SourceS3)
print(s3_config.bucket)  # Works
print(s3_config.streams[0].globs)  # Works

Potential Fix

A proper fix would require either:

  1. Adding a discriminator field to the OpenAPI spec for SourceConfiguration using the sourceType property
  2. Modifying Speakeasy's generation to handle discriminated unions based on a sibling field

Since this is a generated SDK, any direct code changes would be overwritten on regeneration.
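For illustration, here is a rough sketch of what discriminator-based selection on the sibling sourceType field could look like. It is not Speakeasy output or airbyte_api code: the toy classes, the CONFIG_BY_SOURCE_TYPE registry and decode_by_source_type are hypothetical names, and the response dict shape is an assumption made for the example.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToyAirtable:
    # Hypothetical stand-in for an all-optional configuration type.
    credentials: Optional[dict] = None


@dataclass
class ToyS3:
    # Hypothetical stand-in for a configuration type with required fields.
    bucket: str
    streams: list


# Hypothetical registry: discriminator value -> configuration class.
CONFIG_BY_SOURCE_TYPE = {"airtable": ToyAirtable, "s3": ToyS3}


def decode_by_source_type(source_type: str, payload: dict):
    """Pick the configuration class from the discriminator instead of guessing."""
    try:
        cls = CONFIG_BY_SOURCE_TYPE[source_type]
    except KeyError:
        raise ValueError(f"unknown source type: {source_type!r}") from None
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in payload.items() if k in known})


# Assumed response shape, for the example only.
response = {
    "sourceType": "s3",
    "configuration": {"bucket": "my-bucket", "streams": [{"globs": ["*.csv"]}]},
}
config = decode_by_source_type(response["sourceType"], response["configuration"])
print(type(config).__name__, config.bucket)
```

With an explicit mapping, the order of the Union members no longer matters, and an unrecognized source type fails loudly instead of silently becoming the first all-optional type.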

Context

This issue was reported by a customer trying to modify streams.globs at runtime from Airflow. Investigation requested by @iherdt-airbyte.

Related: Commit 87f7e7ba removed some discriminators from the OpenAPI spec in September 2024.
