feat(source-datagen): add wide schema flavor and fix bugs #75542

Draft
sophiecuiy wants to merge 2 commits into master from sophie/datagen-wide-schema-flavor

@sophiecuiy
Contributor

Summary

  • New "wide" flavor: Adds a configurable wide schema stream that generates 1–1000 columns (default 50) cycling through all 12 Airbyte data types (integer, string, boolean, number, big_integer, big_decimal, date, time_with_tz, time_without_tz, timestamp_with_tz, timestamp_without_tz, json). Column 0 is always id (primary key).
  • Fix unsafe !! assertions: Replaced two !! non-null assertions in DataGenPartitionReader with safe alternatives — mapNotNull for resource filtering and an ?: throw IllegalStateException with a clear error message for the record acceptor lookup.
  • Cache codec references: Extracted repeated as codec casts in TypesDataGenerator to class-level properties, reducing per-record overhead.
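The type cycling described for the "wide" flavor can be pictured with a minimal sketch. Everything here is hypothetical: the names WideColumnType, WideColumn, and wideColumns, the column naming scheme, the choice of integer for id, and the assumption that the cycle starts at column 1 are illustrative and are not taken from the PR's WideDataGenerator.

```kotlin
// Hypothetical sketch; not the PR's actual WideDataGenerator.
// The 12 types are listed in the order given in the summary.
enum class WideColumnType {
    INTEGER, STRING, BOOLEAN, NUMBER, BIG_INTEGER, BIG_DECIMAL,
    DATE, TIME_WITH_TZ, TIME_WITHOUT_TZ, TIMESTAMP_WITH_TZ, TIMESTAMP_WITHOUT_TZ, JSON,
}

data class WideColumn(val name: String, val type: WideColumnType)

// Builds the column list for a given column_count.
// Column 0 is always "id"; the remaining columns cycle through the 12 types.
fun wideColumns(columnCount: Int): List<WideColumn> {
    require(columnCount in 1..1000) { "column_count must be in 1..1000, got $columnCount" }
    val types = WideColumnType.values()
    return (0 until columnCount).map { index ->
        if (index == 0) {
            // Assumption: the id column is an integer; the PR text only says it is the primary key.
            WideColumn("id", WideColumnType.INTEGER)
        } else {
            // Assumption: the cycle restarts every 12 columns, beginning at column 1.
            val type = types[(index - 1) % types.size]
            WideColumn("col_${index}_${type.name.lowercase()}", type)
        }
    }
}
```

Under these assumptions, wideColumns(1) yields only the id column, and column 13 repeats the type of column 1.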

Test plan

  • Verify the connector builds (note: DataGenStreamState.kt has a pre-existing compilation error on master due to unresolved OpaqueStateValue — unrelated to this PR)
  • Configure with {"flavor": {"data_type": "wide", "column_count": 20}, "max_records": 10} and verify 20 columns appear with correct type cycling
  • Verify existing increment and types flavors still work as expected
  • Test edge cases: column_count=1 (id only), column_count=1000
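The bounds and default exercised by this test plan could be enforced along the following lines. This is a sketch under stated assumptions: WideFlavorSpec and wideFlavorFrom are hypothetical names, and the flavor object is modeled as a plain Map rather than the connector's real configuration classes, which this page does not show.

```kotlin
// Hypothetical sketch of column_count validation; not the connector's real config model.
data class WideFlavorSpec(val columnCount: Int = DEFAULT_COLUMN_COUNT) {
    init {
        require(columnCount in MIN_COLUMN_COUNT..MAX_COLUMN_COUNT) {
            "column_count must be between $MIN_COLUMN_COUNT and $MAX_COLUMN_COUNT, got $columnCount"
        }
    }

    companion object {
        const val MIN_COLUMN_COUNT = 1
        const val MAX_COLUMN_COUNT = 1000
        const val DEFAULT_COLUMN_COUNT = 50
    }
}

// Reads the "flavor" object from an already-parsed config map.
// A missing column_count falls back to the default of 50.
fun wideFlavorFrom(flavor: Map<String, Any?>): WideFlavorSpec {
    require(flavor["data_type"] == "wide") { "expected data_type \"wide\"" }
    val raw = flavor["column_count"] ?: return WideFlavorSpec()
    val count = (raw as? Number)?.toInt()
        ?: throw IllegalArgumentException("column_count must be a number, got $raw")
    return WideFlavorSpec(count)
}
```

With this shape, the two edge cases in the plan (1 and 1000) are accepted, while 0 and 1001 fail fast with a descriptive message.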

🤖 Generated with Claude Code

… count

Add a new "wide" flavor that generates a configurable number of columns
(1-1000, default 50) cycling through all 12 Airbyte data types. Also fix
unsafe !! null assertions in DataGenPartitionReader and cache codec
references in TypesDataGenerator to reduce per-record overhead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
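
The two smaller fixes named in this commit message follow common Kotlin patterns, sketched below. The names resolveResources, acceptorFor, Codec, LongCodec, and CachedCodecGenerator are hypothetical stand-ins; the actual DataGenPartitionReader and TypesDataGenerator code is not shown on this page.

```kotlin
// Hypothetical stand-ins for the patterns described; not the PR's actual code.

// Instead of registry[key]!!, mapNotNull silently skips keys with no entry.
fun <T : Any> resolveResources(keys: List<String>, registry: Map<String, T>): List<T> =
    keys.mapNotNull { registry[it] }

// Instead of acceptors[streamName]!!, fail with a message that names the missing stream.
fun <A : Any> acceptorFor(streamName: String, acceptors: Map<String, A>): A =
    acceptors[streamName]
        ?: throw IllegalStateException("No record acceptor registered for stream '$streamName'")

interface Codec<T> {
    fun encode(value: T): String
}

object LongCodec : Codec<Long> {
    override fun encode(value: Long): String = value.toString()
}

class CachedCodecGenerator(codecs: Map<String, Codec<*>>) {
    // The cast happens once, when the class is constructed, rather than once per record.
    @Suppress("UNCHECKED_CAST")
    private val longCodec: Codec<Long> =
        codecs["integer"] as? Codec<Long>
            ?: throw IllegalStateException("No codec registered for 'integer'")

    fun encodeId(id: Long): String = longCodec.encode(id)
}
```

The elvis-plus-throw form keeps the failure mode (an exception) but replaces a bare NullPointerException with an error that says what was missing.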
@octavia-bot
Contributor

octavia-bot bot commented Mar 27, 2026

Note

📝 PR Converted to Draft

Thank you for creating this PR. As a policy to protect our engineers' time, Airbyte requires all PRs to be created first in draft status. Your PR has been automatically converted to draft status in accordance with this policy.

As soon as your PR is ready for formal review, you can proceed to convert the PR to "ready for review" status by clicking the "Ready for review" button at the bottom of the PR page.

To skip draft status in future PRs, please include [ready] in your PR title or add the skip-draft-status label when creating your PR.

@octavia-bot octavia-bot bot marked this pull request as draft March 27, 2026 21:11
@github-actions
Contributor

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • 🛠️ Quick Fixes
    • /format-fix - Fixes most formatting issues.
    • /bump-version - Bumps connector versions, scraping changelog description from the PR title.
  • ❇️ AI Testing and Review (internal link: AI-SDLC Docs):
    • /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
    • /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
    • /ai-review - AI-powered PR review for connector safety and quality gates.
  • 🚀 Connector Releases:
    • /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
    • /bump-progressive-rollout-version - Bumps connector version with an RC suffix (2.16.10-rc.1) for progressive rollouts (enableProgressiveRollout: true).
      • Example: /bump-progressive-rollout-version changelog="Add new feature for progressive rollout"
  • ☕️ JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      • Example: /update-connector-cdk-version connector=destination-bigquery
  • 🐍 Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
  • ⚙️ Admin commands:
    • /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
      • Example: /force-merge reason="CI is flaky, tests pass locally"

@github-actions
Contributor

github-actions bot commented Mar 27, 2026

source-datagen Connector Test Results

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit 71b21df.

♻️ This comment has been updated with latest results.

…ndency

Published CDK artifacts in the 0.x line are missing dependency metadata,
so bulk-cdk-core-base (containing AirbyteSourceRunner, ConfigErrorException,
OpaqueStateValue, etc.) was not resolved transitively via core-extract.
Version 1.0.1 includes proper Gradle module metadata that declares
core-base as a transitive dependency. Also fix field.name -> field.id
in WideDataGenerator to match the Field data class API.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

Note

Detected that there are differences in the Gradle dependencies.

@sophiecuiy
Contributor Author

sophiecuiy commented Mar 27, 2026

/format-fix

Format-fix job started... Check job output.

🟦 Job completed successfully (no changes).
