
Commit 62ba126

feat: Add InferredSchemaLoader for runtime schema inference
Add a new InferredSchemaLoader component that infers JSON schemas by reading a sample of records from the stream at discover time. This lets declarative connectors generate schemas automatically for streams whose schema is not known in advance or changes dynamically.

Key features:
- Reads up to record_sample_size records (default 100) from the stream
- Uses SchemaInferrer to generate a JSON schema from the sampled records
- Handles errors gracefully by returning an empty schema
- Fully integrated with the declarative component schema and model factory
- Includes comprehensive unit tests

Requested by: AJ Steers ([email protected]) @aaronsteers
Co-Authored-By: AJ Steers <[email protected]>
1 parent 6504148 commit 62ba126
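The sampling behavior described in the commit message (read at most `record_sample_size` records, infer a schema, fall back to an empty schema on error) can be sketched in plain Python. This is a simplified, self-contained stand-in, not the CDK's implementation: `get_json_schema`, `infer_schema`, and the `read_records` callable are hypothetical names, and the toy per-field type collection replaces the CDK's `SchemaInferrer`.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, List, Mapping, Set

# Simplified stand-in for type detection (NOT the CDK's SchemaInferrer).
# bool is checked before int because bool is a subclass of int in Python.
_TYPE_NAMES = [
    (bool, "boolean"),
    (int, "integer"),
    (float, "number"),
    (str, "string"),
    (list, "array"),
    (dict, "object"),
]


def _json_type(value: Any) -> str:
    """Map a Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    for py_type, name in _TYPE_NAMES:
        if isinstance(value, py_type):
            return name
    return "string"  # fallback for anything unrecognized


def infer_schema(records: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Toy inference: collect the set of observed types per top-level field."""
    seen: Dict[str, Set[str]] = {}
    for record in records:
        for field, value in record.items():
            seen.setdefault(field, set()).add(_json_type(value))
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {field: {"type": sorted(types)} for field, types in seen.items()},
    }


def get_json_schema(
    read_records: Callable[[], Iterable[Mapping[str, Any]]],
    record_sample_size: int = 100,
) -> Dict[str, Any]:
    """Read at most `record_sample_size` records and infer a schema from them.

    Any error raised while reading or inferring yields an empty schema.
    """
    try:
        # islice stops pulling from the source once the sample is full,
        # so no more than record_sample_size records are read.
        sample: List[Mapping[str, Any]] = list(islice(read_records(), record_sample_size))
        return infer_schema(sample)
    except Exception:
        return {}
```

Using a lazy iterator with `islice` means a stream with thousands of records is only read up to the sample size, which keeps discover fast; the trade-off is that fields or types that first appear later in the stream are missed.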


6 files changed (+516, -60 lines)


airbyte_cdk/sources/declarative/declarative_component_schema.yaml

Lines changed: 36 additions & 0 deletions
```diff
@@ -1548,13 +1548,15 @@ definitions:
           loaders defined first taking precedence in the event of a conflict.
         anyOf:
           - "$ref": "#/definitions/InlineSchemaLoader"
+          - "$ref": "#/definitions/InferredSchemaLoader"
           - "$ref": "#/definitions/DynamicSchemaLoader"
           - "$ref": "#/definitions/JsonFileSchemaLoader"
           - title: Multiple Schema Loaders
             type: array
             items:
               anyOf:
                 - "$ref": "#/definitions/InlineSchemaLoader"
+                - "$ref": "#/definitions/InferredSchemaLoader"
                 - "$ref": "#/definitions/DynamicSchemaLoader"
                 - "$ref": "#/definitions/JsonFileSchemaLoader"
                 - "$ref": "#/definitions/CustomSchemaLoader"
```
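The context line at the top of this hunk says that when several schema loaders are listed, loaders defined first take precedence on conflicts. A minimal sketch of that rule, assuming each loader yields a JSON schema dict and that conflicts are resolved per top-level property; `merge_schemas` is a hypothetical helper, not the CDK's actual merge logic.

```python
from typing import Any, Dict, Mapping, Sequence


def merge_schemas(schemas: Sequence[Mapping[str, Any]]) -> Dict[str, Any]:
    """Merge top-level properties from several loaders' schemas.

    Loaders are processed in the order they are defined; when two loaders
    describe the same property, the first definition is kept.
    """
    merged: Dict[str, Any] = {}
    for schema in schemas:
        # .get tolerates an empty schema, e.g. one returned after a failed inference
        for name, definition in schema.get("properties", {}).items():
            # setdefault never overwrites, so earlier loaders take precedence
            merged.setdefault(name, definition)
    return {"type": "object", "properties": merged}


# Hypothetical output of an InlineSchemaLoader listed before an InferredSchemaLoader.
inline = {"properties": {"id": {"type": "integer"}}}
inferred = {"properties": {"id": {"type": ["null", "string"]}, "name": {"type": ["string"]}}}
combined = merge_schemas([inline, inferred])
```

Under this assumed rule, listing an InlineSchemaLoader before an InferredSchemaLoader would let hand-written types override inferred ones, while inference still fills in any fields the inline schema omits.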
```diff
@@ -2462,6 +2464,40 @@ definitions:
       $parameters:
         type: object
         additionalProperties: true
+  InferredSchemaLoader:
+    title: Inferred Schema Loader
+    description: Infers a JSON Schema by reading a sample of records from the stream at discover time. This is useful for streams where the schema is not known in advance or changes dynamically.
+    type: object
+    required:
+      - type
+      - retriever
+    properties:
+      type:
+        type: string
+        enum: [InferredSchemaLoader]
+      retriever:
+        title: Retriever
+        description: Component used to coordinate how records are extracted across stream slices and request pages.
+        anyOf:
+          - "$ref": "#/definitions/SimpleRetriever"
+          - "$ref": "#/definitions/AsyncRetriever"
+          - "$ref": "#/definitions/CustomRetriever"
+      record_sample_size:
+        title: Record Sample Size
+        description: The maximum number of records to read for schema inference. Defaults to 100.
+        type: integer
+        default: 100
+        example:
+          - 100
+          - 500
+          - 1000
+      stream_name:
+        title: Stream Name
+        description: The name of the stream for which to infer the schema. If not provided, it will be inferred from the stream context.
+        type: string
+      $parameters:
+        type: object
+        additionalProperties: true
   InlineSchemaLoader:
     title: Inline Schema Loader
     description: Loads a schema that is defined directly in the manifest file.
```
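The new definition requires `type` and `retriever`, defaults `record_sample_size` to 100, and leaves `stream_name` optional. The sketch below mirrors those constraints in plain Python, assuming the manifest component has already been parsed into a dict. `resolve_inferred_loader_config` and `stream_name_from_context` are hypothetical names; this is not how the CDK's model factory actually builds the component.

```python
from typing import Any, Dict, Mapping, Optional

DEFAULT_RECORD_SAMPLE_SIZE = 100  # matches `default: 100` in the schema


def resolve_inferred_loader_config(
    component: Mapping[str, Any], stream_name_from_context: Optional[str] = None
) -> Dict[str, Any]:
    """Check the schema's constraints and fill in defaults (hypothetical helper)."""
    for field in ("type", "retriever"):  # both are listed under `required`
        if field not in component:
            raise ValueError(f"InferredSchemaLoader is missing required field '{field}'")
    if component["type"] != "InferredSchemaLoader":  # `enum: [InferredSchemaLoader]`
        raise ValueError(f"unexpected component type: {component['type']!r}")

    resolved = dict(component)
    resolved.setdefault("record_sample_size", DEFAULT_RECORD_SAMPLE_SIZE)
    size = resolved["record_sample_size"]
    # JSON Schema `integer` does not accept booleans, so reject them explicitly.
    if isinstance(size, bool) or not isinstance(size, int):
        raise ValueError("record_sample_size must be an integer")

    # stream_name is optional; assume the stream can supply a fallback name.
    if resolved.get("stream_name") is None and stream_name_from_context is not None:
        resolved["stream_name"] = stream_name_from_context
    return resolved


# A manifest fragment expressed as a Python dict. The retriever is an
# abbreviated placeholder, not a complete SimpleRetriever definition.
loader = {
    "type": "InferredSchemaLoader",
    "retriever": {"type": "SimpleRetriever"},
    "record_sample_size": 500,
}
config = resolve_inferred_loader_config(loader, stream_name_from_context="users")
```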
