feat: add phase prompt tools, fix pydantic validation, add output models #159

pnilan · 2025-10-28T20:37:15Z

What

Adds output Pydantic models for the agent tool definitions
Fixes DeclarativeSource model parsing
Adds retry loop on agent.run for 4xx/5xx model errors
Adds primary key to connector readiness report
Adds new phase1, phase2, and phase3 tools which the manager agent is directed to call after completion of the prior phase. These return the respective phase prompts.
Closes Add primary keys check into the Readiness Report tool and prompt as needed to get it working #93

Summary by CodeRabbit

New Features
- Structured manager→developer phase workflow with explicit phase-start actions and richer progress/status reporting; per-stream primary key surfaced in readiness reports.
Bug Fixes
- Automatic retry for transient model/HTTP errors for more reliable runs.
- Improved manifest preprocessing and reference resolution for more robust validation and clearer failures.
Documentation
- Expanded phase guidance, objectives, success criteria, and added pagination/connector doc references.
Tests
- Updated tests to use referenced manifest components and structured progress outputs.

github-actions · 2025-10-28T20:37:30Z

👋 Welcome to the Airbyte Connector Builder MCP!

Thank you for your contribution! Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP

To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:

{
  "mcpServers": {
    "connector-builder-mcp-dev": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@pnilan/agents/re-add-output-models", "connector-builder-mcp"]
    }
  }
}

Testing This Branch via CLI

You can test this version of the MCP Server using the following CLI snippet:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@pnilan/agents/re-add-output-models#egg=airbyte-connector-builder-mcp' --help

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/autofix - Fixes most formatting and linting issues
/build-connector - Builds the default connector on-demand using the AI builder
/build-connector prompt="<your prompt>" - Builds a connector on-demand using the AI builder
/poe <command> - Runs any poe command in the uv virtual environment

AI Builder Evaluations

AI builder evaluations run automatically under the following conditions:

When a PR is marked as "ready for review"
When a PR is reopened

A set of standardized evaluations also run on a schedule (Mon/Wed/Fri at midnight UTC) and can be manually triggered via workflow dispatch.

Helpful Resources

If you have any questions, feel free to ask in the PR comments or join our Slack community.

📝 Edit this welcome message.

coderabbitai · 2025-10-28T20:37:40Z

📝 Walkthrough

Walkthrough

Reworks manager–developer build flow into explicit Phase 1/2/3 prompts and tools, adds structured handoff Pydantic models and manager phase-start tools, introduces manifest preprocessing and transformers in evaluators, surfaces primary_key through smoke tests and reports, and adds retry-on-ModelHTTPError logic to the manager run loop.

Changes

Cohort / File(s)	Summary
Prompts `connector_builder_agents/prompts/phase-1-stream-read.md`, `connector_builder_agents/prompts/phase-2-pagination.md`, `connector_builder_agents/prompts/phase-3-remaining-streams.md`, `connector_builder_agents/prompts/root-prompt.md`, `connector_builder_agents/prompts/root-prompt-headless.md`	Editorial updates to phase wording and flow, added objectives (primary key, API path), new process steps, added MCP tool references, and minor wording/formatting fixes.
Agent workflow & models `connector_builder_agents/src/agents.py`	Added Pydantic models `DelegatedDeveloperTask`, `ManagerHandoffInput`, `ManagerTaskOutput`; set explicit agent output types; refactored delegation to use structured inputs/outputs; changed delegate return types; added `start_phase_2`/`start_phase_3` manager tools; removed legacy single-phase entry and adjusted logging/typing.
Prompt constants & guidance `connector_builder_agents/src/constants.py`, `connector_builder_agents/src/guidance.py`	Added PHASE_1/PHASE_2/PHASE_3 prompt file path constants; rewrote manager prompt template to explicit phases, progress checks, and exit criteria.
Evaluator refactor & helpers `connector_builder_agents/src/evals/evaluators.py`, `connector_builder_agents/src/evals/helpers.py`, `connector_builder_agents/src/evals/task.py`	Introduced manifest preprocessing pipeline (ManifestReferenceResolver, ManifestComponentTransformer) before DeclarativeSource parsing; centralized ConnectorBuilderEvalTaskOutput creation via `create_connector_builder_eval_task_output`; changed `readiness_eval` return type to int; updated parsing/error handling.
Runtime / tooling `connector_builder_agents/src/run.py`, `connector_builder_agents/src/tools.py`	Added retry loop around manager_agent.run to handle ModelHTTPError with capped attempts; removed `tool_prefix="filesystem"` from MCP filesystem server creation; docstring tweaks.
Manifest validation & testing `connector_builder_mcp/_util.py`, `connector_builder_mcp/validation_testing.py`	Preprocess manifests with `ManifestReferenceResolver` during validation; added `primary_key` field to `StreamSmokeTest`, extract/propagate per-stream primary_key into readiness reports and per-stream outputs.
MCP docs & scaffold `connector_builder_mcp/connector_builder.py`, `connector_builder_mcp/manifest_scaffold.py`	Minor docstring/content edits (header formatting, paginator example, scaffold note).
Tests `connector_builder_agents/tests/test_evaluators.py`, `connector_builder_agents/tests/test_task.py`	Tests updated to use `$ref` definitions for authenticator/paginator and to expect structured `ManagerTaskOutput` as `final_output`; adjusted mocks and assertions accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant Manager
    participant Developer
    participant Tools
    participant Evaluators

    Note over Manager: Phase 1 (manifest scaffold & initial stream)
    Manager->>Developer: delegate DelegatedDeveloperTask (assignment_description)
    Developer->>Tools: create/describe manifest stream (primary_key, API path)
    Developer-->>Manager: return ManagerHandoffInput (progress)

    Note over Manager: Phase 2 (pagination)
    Manager->>Developer: start_phase_2 / delegate pagination task
    Developer->>Tools: implement/test pagination
    Developer-->>Manager: return ManagerHandoffInput (progress)

    Note over Manager: Phase 3 (remaining streams & validation)
    Manager->>Developer: start_phase_3 / delegate remaining streams
    Developer->>Tools: add streams, run smoke tests (primary_key)
    Developer-->>Manager: return ManagerHandoffInput (final)

    Manager->>Evaluators: run evaluation (manifest)
    Evaluators->>Evaluators: preprocess manifest ($ref resolve, transform)
    Evaluators->>Evaluators: parse DeclarativeSource & validate
    Evaluators-->>Manager: return ManagerTaskOutput (phase flags, status)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Areas needing extra attention:
- connector_builder_agents/src/agents.py — new Pydantic models, changed delegation signatures, manager tool entrypoints, typing/return adjustments.
- connector_builder_agents/src/evals/evaluators.py & connector_builder_agents/src/evals/helpers.py — manifest preprocessing/transformer usage and factory-based output creation.
- connector_builder_mcp/validation_testing.py — primary_key extraction and per-stream propagation/reporting.
- Tests — ensure mocks/fixtures and assertions align with structured ManagerTaskOutput and new helper behavior.

Possibly related PRs

feat(agents): Replace openai-agents with pydantic-ai implementation #139 — Overlaps agents.py changes and handoff/delegation flow; potential conflicts in delegation design.
feat: add primary_key eval #156 — Related evaluator pipeline refactors and task output/manifest parsing changes.
feat: add internal monologue and smaller steps #113 — Also modifies agents.py by introducing delegation models and changing manager→developer handoff logic.

Suggested reviewers

pedroslopez
aaronsteers

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title accurately reflects the three primary architectural changes in the pull request: (1) adding new phase prompt tools (start_phase_2, start_phase_3 structured as manager tools), (2) fixing Pydantic validation through ManifestReferenceResolver preprocessing and output model restructuring, and (3) adding new output models (DelegatedDeveloperTask, ManagerHandoffInput, ManagerTaskOutput). These three items directly correspond to the stated PR objectives and represent the main changes across the codebase. The title is specific and avoids vague terminology.
Docstring Coverage	✅ Passed	Docstring coverage is 96.30% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch pnilan/agents/re-add-output-models

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-10-28T20:39:05Z

PyTest Results (Fast)

0 tests ±0 0 ✅ ±0 0s ⏱️ ±0s
0 suites ±0 0 💤 ±0
0 files ±0 0 ❌ ±0

Results for commit 7b7fffc. ± Comparison against base commit b2164c2.

♻️ This comment has been updated with latest results.

github-actions · 2025-10-28T20:39:18Z

PyTest Results (Full)

0 tests ±0 0 ✅ ±0 0s ⏱️ ±0s
0 suites ±0 0 💤 ±0
0 files ±0 0 ❌ ±0

Results for commit 7b7fffc. ± Comparison against base commit b2164c2.

♻️ This comment has been updated with latest results.

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

connector_builder_agents/src/evals/task.py (1)

45-46: Empty results should not count as success.

Use truthiness to ensure [] is treated as failure.
-    success = build_result is not None
+    success = bool(build_result)

connector_builder_agents/src/evals/evaluators.py (2)

62-66: Guard against non-mapping YAML results.

safe_load can return None/list/str. Add a mapping check before preprocessing.

     try:
         manifest_dict = yaml.safe_load(manifest)
     except yaml.YAMLError as e:
         logger.warning(f"Failed to parse manifest YAML: {e}")
         return 0
+    if not isinstance(manifest_dict, dict):
+        logger.warning("Manifest YAML did not parse to a mapping/object")
+        return 0

128-131: Add YAML mapping guard in streams_eval.

     try:
         manifest_dict = yaml.safe_load(manifest_str)
     except yaml.YAMLError as e:
         logger.warning(f"Failed to parse manifest YAML: {e}")
         return 0.0
+    if not isinstance(manifest_dict, dict):
+        logger.warning("Manifest YAML did not parse to a mapping/object")
+        return 0.0

connector_builder_agents/src/agents.py (1)

121-155: Return type mismatch: tool returns ManagerHandoffInput but annotation says DelegatedDeveloperTask.

This breaks type contracts and tool schema. Update the return type and docstring.

-    async def delegate_to_developer(
+    async def delegate_to_developer(
         ctx: RunContext[SessionState],
         delegated_developer_task: DelegatedDeveloperTask,
-    ) -> DelegatedDeveloperTask:
-        """Delegate work to the developer agent.
-
-        Args:
-            assignment_title: Short title or key for this developer assignment.
-            assignment_description: Detailed description of the task assigned to the developer,
-                including all context and success criteria they need to complete it.
-        """
+    ) -> ManagerHandoffInput:
+        """Delegate work to the developer agent.
+
+        Args:
+            delegated_developer_task: Task envelope including title and description.
+
+        Returns:
+            ManagerHandoffInput: Developer’s handoff back to the manager.
+        """
@@
-        return result.output
+        return result.output

🧹 Nitpick comments (14)

connector_builder_agents/src/run.py (1)

219-226: Consider adding delay between retries.

The retry logic immediately retries on ModelHTTPError without any backoff delay. Consider adding an exponential backoff or at least a brief delay to avoid overwhelming the service during temporary outages.

Example with exponential backoff:
except ModelHTTPError as e:
    retry_count += 1
    if retry_count > max_retries:
        update_progress_log(f"\n❌ Max retries reached: {max_retries}", session_state)
        raise e from e
    else:
        delay = 2 ** retry_count  # Exponential backoff: 2, 4, 8 seconds
        update_progress_log(
            f"\n⚠️ Caught retryable error: {e}. Retrying in {delay}s (attempt {retry_count}/{max_retries})",
            session_state
        )
        await asyncio.sleep(delay)
    continue
Note: You'll need to import asyncio at the top of the file.

connector_builder_agents/prompts/phase-2-pagination.md (1)

32-32: Typo: remove “the”.

Small grammar fix.

-Once pagination is working correctly, the you will begin Phase 3 to add all remaining streams.
+Once pagination is working correctly, you will begin Phase 3 to add all remaining streams.

connector_builder_mcp/validation_testing.py (1)

354-359: Docstring and flags don’t match behavior (records/raw responses always None).

The function advertises returning record and raw response data and has include_* flags, but the returned StreamTestResult forcibly sets records=None and raw_api_responses=None.

If intentionally removing heavy payloads: update the docstring and parameter descriptions to note that payloads are suppressed, and consider deprecating include_records_data/include_raw_responses_data (no-ops).

Alternatively, honor the flags and return data when requested (perhaps behind a strict size cap).

Proposed minimal doc fix:
-    Return record data and/or raw request/response metadata from the stream test.
-    We attempt to automatically sanitize raw data to prevent exposure of secrets.
-    We do not attempt to sanitize record data, as it is expected to be user-defined.
+    Note: This function does not return record payloads or raw HTTP data in the result to reduce memory/telemetry size.
+    The `include_records_data` and `include_raw_responses_data` parameters are currently no-ops.
Also applies to: 462-475

connector_builder_agents/src/evals/task.py (4)

7-11: Unify imports to avoid dual module paths.

Import everything from .helpers to prevent duplicate module identities and keep style consistent.
-from connector_builder_agents.src.evals.helpers import ConnectorBuilderEvalTaskOutput
-...
-from .helpers import create_connector_builder_eval_task_output, get_artifact
+from .helpers import (
+    ConnectorBuilderEvalTaskOutput,
+    create_connector_builder_eval_task_output,
+    get_artifact,
+)
24-24: Sanitize session_id to avoid unsafe path chars.

connector_name can carry spaces/slashes. Slug it before composing session_id.
+import re
 ...
-    session_id = f"eval-{connector_name}-{int(time.time())}"
+    safe_name = re.sub(r"[^A-Za-z0-9_.-]+", "-", connector_name)
+    session_id = f"eval-{safe_name}-{int(time.time())}"
54-54: Log message contradicts success flag.

Remove “successfully” or include success in the message.
-    logger.info(f"Task completed successfully for connector '{connector_name}'")
+    logger.info(f"Task completed for connector '{connector_name}' (success={success})")
56-67: Artifact/output block LGTM; minor ergonomics.

Looks good. Optional: include execution log as an artifact for debugging parity with SessionState.
         "artifacts": {
             "readiness_report": readiness_report_content,
             "manifest": manifest_content,
+            "execution_log": get_artifact(workspace_dir, "automated-execution-log.md", logger),
         },

connector_builder_agents/src/evals/evaluators.py (5)

57-57: Avoid repeated factory calls; cache once per function.

Minor perf/clarity: create once, then read artifacts.

-    manifest = create_connector_builder_eval_task_output(output).artifacts.get("manifest", None)
+    task_output = create_connector_builder_eval_task_output(output)
+    manifest = task_output.artifacts.get("manifest", None)

90-96: Cache artifacts and early-return handling.

Same caching suggestion as above for readability.

-    readiness_report = create_connector_builder_eval_task_output(output).artifacts.get(
-        "readiness_report", None
-    )
+    task_output = create_connector_builder_eval_task_output(output)
+    readiness_report = task_output.artifacts.get("readiness_report", None)

109-111: Normalize label string to avoid whitespace/case issues.

Prevent false negatives on “PASSED ”, etc.

-    label = eval_df["label"][0]
-    score = 1 if label.upper() == "PASSED" else 0
+    label = str(eval_df["label"][0]).strip().upper()
+    score = 1 if label == "PASSED" else 0

122-122: Same caching nit for artifacts.

-    manifest_str = create_connector_builder_eval_task_output(output).artifacts.get("manifest", None)
+    task_output = create_connector_builder_eval_task_output(output)
+    manifest_str = task_output.artifacts.get("manifest", None)

179-189: Cache artifacts and add YAML mapping guard in primary_key_eval.

-    manifest_str = create_connector_builder_eval_task_output(output).artifacts.get("manifest")
+    task_output = create_connector_builder_eval_task_output(output)
+    manifest_str = task_output.artifacts.get("manifest")
@@
-    try:
-        manifest_dict = yaml.safe_load(manifest_str)
-    except yaml.YAMLError as e:
-        logger.warning(f"Failed to parse manifest YAML: {e}")
-        return 0.0
+    try:
+        manifest_dict = yaml.safe_load(manifest_str)
+    except yaml.YAMLError as e:
+        logger.warning(f"Failed to parse manifest YAML: {e}")
+        return 0.0
+    if not isinstance(manifest_dict, dict):
+        logger.warning("Manifest YAML did not parse to a mapping/object")
+        return 0.0

connector_builder_agents/src/agents.py (2)

25-58: Tighten Pydantic validation by forbidding unknown fields.

Given “fix pydantic validation,” consider forbidding extras to catch schema drift early.

If using Pydantic v2, apply:

-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, ConfigDict
@@
 class DelegatedDeveloperTask(BaseModel):
+    model_config = ConfigDict(extra="forbid")
@@
 class ManagerHandoffInput(BaseModel):
+    model_config = ConfigDict(extra="forbid")
@@
 class ManagerTaskOutput(BaseModel):
+    model_config = ConfigDict(extra="forbid")

If on Pydantic v1, use class Config: extra = "forbid" inside each model instead.

156-179: Harden phase tool file reads.

Guard against missing prompt files; return a clear message for the agent instead of crashing.

     @manager_agent.tool
     async def start_phase_1(
         ctx: RunContext[SessionState],
     ) -> str:
         """Start phase 1 of the connector build. Returns the prompt for phase 1."""
-        update_progress_log("🔧 [Manager] MCP Tool call: start_phase_1", ctx.deps)
-        return PHASE_1_PROMPT_FILE_PATH.read_text(encoding="utf-8")
+        update_progress_log("🔧 [Manager] MCP Tool call: start_phase_1", ctx.deps)
+        try:
+            return PHASE_1_PROMPT_FILE_PATH.read_text(encoding="utf-8")
+        except FileNotFoundError:
+            update_progress_log("⚠️ Phase 1 prompt file not found.", ctx.deps)
+            return "Phase 1 prompt unavailable."
@@
     async def start_phase_2(
         ctx: RunContext[SessionState],
     ) -> str:
         """Start phase 2 of the connector build. Returns the prompt for phase 2."""
-        update_progress_log("🔧 [Manager] MCP Tool call: start_phase_2", ctx.deps)
-        return PHASE_2_PROMPT_FILE_PATH.read_text(encoding="utf-8")
+        update_progress_log("🔧 [Manager] MCP Tool call: start_phase_2", ctx.deps)
+        try:
+            return PHASE_2_PROMPT_FILE_PATH.read_text(encoding="utf-8")
+        except FileNotFoundError:
+            update_progress_log("⚠️ Phase 2 prompt file not found.", ctx.deps)
+            return "Phase 2 prompt unavailable."
@@
     async def start_phase_3(
         ctx: RunContext[SessionState],
     ) -> str:
         """Start phase 3 of the connector build. Returns the prompt for phase 3."""
-        update_progress_log("🔧 [Manager] MCP Tool call: start_phase_3", ctx.deps)
-        return PHASE_3_PROMPT_FILE_PATH.read_text(encoding="utf-8")
+        update_progress_log("🔧 [Manager] MCP Tool call: start_phase_3", ctx.deps)
+        try:
+            return PHASE_3_PROMPT_FILE_PATH.read_text(encoding="utf-8")
+        except FileNotFoundError:
+            update_progress_log("⚠️ Phase 3 prompt file not found.", ctx.deps)
+            return "Phase 3 prompt unavailable."

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2164c2 and 9dd3dc8.

📒 Files selected for processing (18)

connector_builder_agents/prompts/phase-1-stream-read.md (2 hunks)
connector_builder_agents/prompts/phase-2-pagination.md (3 hunks)
connector_builder_agents/prompts/phase-3-remaining-streams.md (1 hunks)
connector_builder_agents/prompts/root-prompt.md (1 hunks)
connector_builder_agents/src/agents.py (6 hunks)
connector_builder_agents/src/constants.py (1 hunks)
connector_builder_agents/src/evals/evaluators.py (7 hunks)
connector_builder_agents/src/evals/helpers.py (2 hunks)
connector_builder_agents/src/evals/task.py (2 hunks)
connector_builder_agents/src/guidance.py (1 hunks)
connector_builder_agents/src/run.py (2 hunks)
connector_builder_agents/src/tools.py (1 hunks)
connector_builder_agents/tests/test_evaluators.py (5 hunks)
connector_builder_agents/tests/test_task.py (3 hunks)
connector_builder_mcp/_util.py (2 hunks)
connector_builder_mcp/connector_builder.py (1 hunks)
connector_builder_mcp/manifest_scaffold.py (1 hunks)
connector_builder_mcp/validation_testing.py (7 hunks)

🧰 Additional context used

🧬 Code graph analysis (7)

connector_builder_agents/src/run.py (1)

connector_builder_agents/src/tools.py (3)

is_complete (50-52)

is_complete (213-215)

update_progress_log (225-249)

connector_builder_agents/tests/test_task.py (1)

connector_builder_agents/src/agents.py (1)

ManagerTaskOutput (49-57)

connector_builder_agents/src/evals/helpers.py (1)

connector_builder_agents/src/agents.py (1)

ManagerTaskOutput (49-57)

connector_builder_agents/src/evals/task.py (2)

connector_builder_agents/src/evals/helpers.py (3)

ConnectorBuilderEvalTaskOutput (13-18)

create_connector_builder_eval_task_output (33-49)

get_artifact (21-30)

connector_builder_agents/src/run.py (2)

get_workspace_dir (35-39)

run_connector_build (42-109)

connector_builder_agents/src/agents.py (1)

connector_builder_agents/src/tools.py (2)

SessionState (19-52)

update_progress_log (225-249)

connector_builder_agents/src/evals/evaluators.py (1)

connector_builder_agents/src/evals/helpers.py (1)

create_connector_builder_eval_task_output (33-49)

connector_builder_mcp/validation_testing.py (1)

connector_builder_mcp/_util.py (1)

is_valid_declarative_source_manifest (166-185)

🪛 GitHub Actions: Linters

connector_builder_mcp/validation_testing.py

[error] 264-264: Mypy type error: Argument 1 to 'append' of 'list' has incompatible type 'str | None'; expected 'str'.

[error] 1-1: Mypy: 1 error in 1 file (checked 8 source files).

🪛 LanguageTool

connector_builder_agents/prompts/phase-1-stream-read.md

[style] ~17-~17: In American English, abbreviations like “etc.” require a period.
Context: ....g. RecordSelector, DpathExtractor, etc) - find_connectors_by_class_name - Fi...

(ETC_PERIOD)

connector_builder_agents/prompts/root-prompt.md

[style] ~20-~20: Consider a more concise word here.
Context: ...efer to send a path and not the string, in order to speed up the process and to reduce cont...

(IN_ORDER_TO_PREMIUM)

[style] ~20-~20: Consider using a different verb to strengthen your wording.
Context: ... a path and not the string, in order to speed up the process and to reduce context and t...

(SPEED_UP_ACCELERATE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Run Evals (Single Connector)

🔇 Additional comments (21)

connector_builder_mcp/manifest_scaffold.py (2)

236-237: Docstring update appropriately clarifies tool scope.

The added guidance clearly communicates that this tool is reserved for initial connector setup, which aligns with the new phase-driven workflow. This is helpful for tool consumers and prevents misuse.

211-270: Function logic remains sound with no behavioral changes.

The validation, manifest generation, and error handling flows are unchanged and appropriate. Exception handling with logging ensures observability of generation failures.

connector_builder_mcp/connector_builder.py (3)

238-258: Good documentation improvement.

Adding the "DefaultPaginator" example enhances clarity and provides a practical reference for users of the tool. The example aligns well with the existing format and documentation style.

259-296: Solid implementation with proper input validation and resource management.

The function logic is correct:

Defensive input validation (lines 259–264) handles empty/malformed inputs cleanly.

CSV parsing and feature indexing (lines 273–281) is straightforward.

Set intersection logic (lines 285–294) correctly computes connectors that match all requested class names.

Resource cleanup via with statement is proper.

102-115: Solid HTTP error handling and timeout management.

All HTTP calls employ a consistent pattern:

Explicit timeout=30 prevents request hangs.

Try-except blocks with logging capture failures gracefully.

User-facing error messages are informative.

This approach balances robustness with observability, reducing risk of silent failures or timeouts cascading through the system.

Also applies to: 127-155, 192-203, 224-235

connector_builder_agents/src/tools.py (1)

256-262: LGTM - Documentation clarification.

The docstring update improves clarity by distinguishing between "task" completion and the requirement for all three phases to succeed. The function logic remains unchanged.

connector_builder_agents/prompts/phase-3-remaining-streams.md (1)

3-3: LGTM - Wording aligns with manager role.

The change from "working on" to "managing" is consistent with the manager-driven workflow architecture introduced in this PR.

connector_builder_mcp/_util.py (1)

177-179: LGTM - Manifest preprocessing added before validation.

The addition of ManifestReferenceResolver.preprocess_manifest() introduces reference resolution before validation. This is a functional change that aligns with the broader manifest preprocessing improvements in the PR. The existing exception handling appropriately covers any preprocessing errors.

connector_builder_agents/tests/test_task.py (1)

28-38: LGTM - Test updated for structured output.

The test now correctly creates a mock output dictionary matching the ManagerTaskOutput schema and validates it properly at line 70. This aligns with the transition from plain string outputs to structured output models.

connector_builder_agents/src/constants.py (1)

13-15: LGTM - Phase prompt constants follow established patterns.

The new PHASE_*_PROMPT_FILE_PATH constants follow the same pattern as ROOT_PROMPT_FILE_PATH and properly reference the phase prompt files included in this PR.

connector_builder_agents/prompts/phase-2-pagination.md (2)

15-16: Tool references look good.

Adding get_connector_builder_docs and find_connectors_by_class_name aligns with the Phase 2 workflow and should help the agent find DefaultPaginator examples.

3-3: Clearer role phrasing — LGTM.

Rewording to “managing” better matches the manager prompt flow introduced elsewhere.

connector_builder_agents/tests/test_evaluators.py (2)

30-43: Manifest references and paginator/authenticator wiring — LGTM.

Using #/definitions/authenticator under requester.authenticator and #/definitions/paginator under retriever.paginator matches Declarative manifest structure.

Offsets with OffsetIncrement and _start parameter look consistent for simple paginated APIs.

Also applies to: 56-58, 63-65, 76-78, 83-85, 96-98, 103-105

286-294: Structured final_output in test helper — LGTM, but ensure factory path.

The structured final_output matches ManagerTaskOutput. Ensure the factory in code paths (create_connector_builder_eval_task_output) correctly parses this dict into a model instance when used by evaluators.

If not already covered by tests, please add a small assertion that create_connector_builder_eval_task_output(...).final_output is a ManagerTaskOutput with the expected flags.

connector_builder_agents/src/guidance.py (1)

29-31: Manager prompt updates — LGTM.

Explicit Phase 1–3 steps with start_phase_* tools and validation/test read loops are clear and align with the new phase prompts.

connector_builder_mcp/validation_testing.py (1)

706-706: Summary line improvement — LGTM.

“Streams Tested: X tested out of Y total available streams” is clearer for partial runs.

connector_builder_agents/src/evals/evaluators.py (3)

69-76: Manifest preprocessing flow LGTM.

Reference resolution + type/param propagation before DeclarativeSource.parse_obj is correct.

134-145: Preprocess+parse LGTM; fallback on root is correct.

190-201: Parsing block LGTM; error path OK.

connector_builder_agents/src/agents.py (2)

68-88: Developer agent construction LGTM.

Output type set to ManagerHandoffInput and toolset wiring looks correct.

118-119: Manager agent output_type LGTM.

Matches ManagerTaskOutput model fields.

connector_builder_agents/prompts/phase-1-stream-read.md

connector_builder_agents/prompts/root-prompt.md

connector_builder_agents/src/evals/helpers.py

connector_builder_agents/src/run.py

connector_builder_mcp/validation_testing.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

connector_builder_mcp/validation_testing.py (2)

626-645: Critical: UnboundLocalError and incorrect primary_key type handling.

This code has multiple critical issues:

UnboundLocalError: If stream_config is None (stream not found), primary_key is never defined, causing a crash at line 644.
Type handling: Line 636 assumes primary_key is an iterable of strings, but:
- If it's a string like "id", iteration yields characters: ["i", "d"]
- If it's a nested list like [["id", "type"]] (composite key), iteration doesn't flatten correctly
Display issues: str(primary_key) at line 644 produces "[]" instead of None for empty keys, and "None" (string) instead of actual None.

Apply this diff to normalize primary_key and prevent crashes:

                 stream_config = next(
                     (s for s in available_streams if s.get("name") == stream_name),
                     None,
                 )
+                primary_key_display = None
+                primary_key_list: list[str] = []
                 if stream_config:
-                    primary_key = stream_config.get("primary_key", [])
+                    primary_key_raw = stream_config.get("primary_key", [])
+                    # Normalize to flat list of strings
+                    if isinstance(primary_key_raw, str):
+                        primary_key_list = [primary_key_raw]
+                    elif isinstance(primary_key_raw, list):
+                        flat: list[str] = []
+                        for item in primary_key_raw:
+                            if isinstance(item, str):
+                                flat.append(item)
+                            elif isinstance(item, list):
+                                # Composite key: join with dot notation
+                                flat.append(".".join(str(x) for x in item))
+                        primary_key_list = flat
                     if not primary_key_list:
                         field_count_warnings.append("No primary key defined in manifest")
                     elif result.record_stats:
                         properties = result.record_stats.get("properties", {})
-                        missing_pk_fields = [pk for pk in primary_key if pk not in properties]
+                        missing_pk_fields = [pk for pk in primary_key_list if pk not in properties]
                         if missing_pk_fields:
                             field_count_warnings.append(
                                 f"Primary key field(s) missing from records: {', '.join(missing_pk_fields)}"
                             )
+                primary_key_display = ", ".join(primary_key_list) if primary_key_list else None
 
                 smoke_test_result = StreamSmokeTest(
                     stream_name=stream_name,
-                    primary_key=str(primary_key),
+                    primary_key=primary_key_display,
                     success=True,

261-269: Handle the case when validation fails without an error message.

If is_valid_declarative_source_manifest returns (False, None), the current code neither appends an error nor returns early, allowing the function to continue despite validation failure.

Apply this diff:

         try:
             is_valid, error = is_valid_declarative_source_manifest(manifest_dict)
-            if not is_valid and error:
-                errors.append(error)
+            if not is_valid:
+                errors.append(error or "Manifest is not a valid DeclarativeSource.")
                 return ManifestValidationResult(is_valid=False, errors=errors, warnings=warnings)
         except Exception as e:

🧹 Nitpick comments (1)

connector_builder_mcp/validation_testing.py (1)
737-737: Improve primary key display for missing values.

When primary_key is None, the report displays None as the value. Consider using a more user-friendly placeholder.

Apply this diff:
-                    f"- **Primary Key**: {smoke_result.primary_key}",
+                    f"- **Primary Key**: {smoke_result.primary_key or '—'}",

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9dd3dc8 and d9b0be4.

📒 Files selected for processing (1)

connector_builder_mcp/validation_testing.py (7 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

connector_builder_mcp/validation_testing.py (1)

connector_builder_mcp/_util.py (1)

is_valid_declarative_source_manifest (166-185)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pytest (All, Python 3.11, Ubuntu)

🔇 Additional comments (1)

connector_builder_mcp/validation_testing.py (1)

706-706: LGTM: Improved clarity.

The updated summary text is more descriptive and clearly distinguishes tested streams from total available streams.

connector_builder_mcp/validation_testing.py

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

connector_builder_mcp/validation_testing.py (1)

642-648: Use normalized primary_key_list for display.

After normalizing primary_key in the previous block, use the normalized list for display instead of the raw value.

Apply this diff:

                 smoke_test_result = StreamSmokeTest(
                     stream_name=stream_name,
-                    primary_key=str(primary_key) if primary_key else None,
+                    primary_key=", ".join(primary_key_list) if primary_key_list else None,
                     success=True,
                     records_read=records_read,
                     duration_seconds=stream_duration,
                 )

♻️ Duplicate comments (2)

connector_builder_mcp/validation_testing.py (2)

261-269: Handle the case where is_valid=False but error=None.

The condition if not is_valid and error: will skip appending an error if the validation fails but error is None. According to the function's type signature, error is str | None, so this case must be handled.

Apply this diff to ensure validation failures are always reported:

         try:
             is_valid, error = is_valid_declarative_source_manifest(manifest_dict)
-            if not is_valid and error:
-                errors.append(error)
+            if not is_valid:
+                errors.append(error or "Manifest is not a valid DeclarativeSource.")
                 return ManifestValidationResult(is_valid=False, errors=errors, warnings=warnings)
         except Exception as e:
             errors.append(f"Error validating manifest: {e}")
             return ManifestValidationResult(is_valid=False, errors=errors, warnings=warnings)

626-641: Fix UnboundLocalError and normalize primary_key types.

Multiple issues:

UnboundLocalError: If stream_config is None, primary_key is undefined, causing a crash at line 644.
Type handling: primary_key can be a string, list, or nested list. Line 636's membership check assumes it's an iterable of strings.
Display: Line 644's str(primary_key) renders [] as "[]" and treats the string "None" as a value.

Apply this diff to fix all issues:

                 stream_config = next(
                     (s for s in available_streams if s.get("name") == stream_name),
                     None,
                 )
+                primary_key_list: list[str] = []
                 if stream_config:
                     primary_key = stream_config.get("primary_key", [])
-                    if not primary_key:
+                    # Normalize to flat list of strings
+                    if isinstance(primary_key, str):
+                        primary_key_list = [primary_key]
+                    elif isinstance(primary_key, list):
+                        for item in primary_key:
+                            if isinstance(item, str):
+                                primary_key_list.append(item)
+                            elif isinstance(item, list):
+                                primary_key_list.extend(str(x) for x in item)
+                    
+                    if not primary_key_list:
                         field_count_warnings.append("No primary key defined in manifest")
                     elif result.record_stats:
                         properties = result.record_stats.get("properties", {})
-                        missing_pk_fields = [pk for pk in primary_key if pk not in properties]
+                        missing_pk_fields = [pk for pk in primary_key_list if pk not in properties]
                         if missing_pk_fields:
                             field_count_warnings.append(
                                 f"Primary key field(s) missing from records: {', '.join(missing_pk_fields)}"
                             )

🧹 Nitpick comments (1)

connector_builder_mcp/validation_testing.py (1)
737-737: Improve display formatting for missing primary keys.

When primary_key is None, display a placeholder like — or "(none)" for better readability.

Apply this diff:
-                    f"- **Primary Key**: {smoke_result.primary_key}",
+                    f"- **Primary Key**: {smoke_result.primary_key or '—'}",

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9b0be4 and 41ef93f.

📒 Files selected for processing (3)

connector_builder_agents/src/evals/helpers.py (2 hunks)
connector_builder_agents/src/run.py (2 hunks)
connector_builder_mcp/validation_testing.py (6 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

connector_builder_mcp/validation_testing.py (1)

connector_builder_mcp/_util.py (1)

is_valid_declarative_source_manifest (166-185)

connector_builder_agents/src/run.py (1)

connector_builder_agents/src/tools.py (3)

is_complete (50-52)

is_complete (213-215)

update_progress_log (225-249)

connector_builder_agents/src/evals/helpers.py (1)

connector_builder_agents/src/agents.py (1)

ManagerTaskOutput (49-57)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (Fast)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)

🔇 Additional comments (6)

connector_builder_agents/src/run.py (1)

9-9: LGTM! Import added for retry logic.

The ModelHTTPError import is correctly added to support the new retry mechanism.

connector_builder_agents/src/evals/helpers.py (2)

4-19: LGTM! Well-structured evaluation output model.

The imports and ConnectorBuilderEvalTaskOutput model are cleanly defined with appropriate types. The use of Mapping[str, Any] for artifacts provides immutability guarantees.

33-53: LGTM! Past review addressed correctly.

The factory function now properly addresses the concerns from the previous review:

Creates a shallow copy to prevent input mutation (line 37)

Always pops final_output to avoid duplicate kwarg errors (line 40)

Correctly handles falsy final_output values (line 42)

Gracefully handles unexpected types by falling back to None

The implementation is clean and defensive.

connector_builder_mcp/validation_testing.py (3)

38-38: LGTM: Import correctly added.

The import of is_valid_declarative_source_manifest is appropriate and used correctly at line 262.

82-82: LGTM: Field addition is appropriate.

The primary_key field is correctly typed as optional and properly integrated into the report flow.

706-706: LGTM: Improved summary text clarity.

The updated summary text clearly distinguishes between tested streams and total available streams.

connector_builder_agents/src/run.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

connector_builder_agents/prompts/root-prompt-headless.md (1)
27-27: Optional style improvement: simplify "in order to" phrasing.

The guidance is clear and functional. Optionally, consider tightening the wording for conciseness. Per static analysis, "in order to" can be replaced with simpler constructions like "to".

Apply this optional diff:
-    - Many of your tools will accept either a manifest.yaml path or a yaml string. You should prefer to send a path and not the string, in order to speed up the process and to reduce context and token usage.
+    - Many of your tools will accept either a manifest.yaml path or a yaml string. You should prefer to send a path to speed up the process and reduce context and token usage.
connector_builder_agents/prompts/root-prompt.md (1)
20-20: Typo fix confirmed; optional style improvement.

The "except" → "accept" typo has been corrected (addressing prior review feedback). Additionally, manifest.yaml is now wrapped in backticks for consistency with formatting conventions.

Optionally, consider simplifying "in order to" as suggested by static analysis:
-    - Many of your tools will accept either a `manifest.yaml` path or a yaml string. You should prefer to send a path and not the string, in order to speed up the process and to reduce context and token usage.
+    - Many of your tools will accept either a `manifest.yaml` path or a yaml string. You should prefer to send a path to speed up the process and reduce context and token usage.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41ef93f and 7b7fffc.

📒 Files selected for processing (4)

connector_builder_agents/prompts/phase-1-stream-read.md (2 hunks)
connector_builder_agents/prompts/root-prompt-headless.md (1 hunks)
connector_builder_agents/prompts/root-prompt.md (1 hunks)
connector_builder_agents/src/run.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

connector_builder_agents/src/run.py

🧰 Additional context used

🪛 LanguageTool

connector_builder_agents/prompts/root-prompt-headless.md

[style] ~27-~27: Consider a more concise word here.
Context: ...efer to send a path and not the string, in order to speed up the process and to reduce cont...

(IN_ORDER_TO_PREMIUM)

[style] ~27-~27: Consider using a different verb to strengthen your wording.
Context: ... a path and not the string, in order to speed up the process and to reduce context and t...

(SPEED_UP_ACCELERATE)

connector_builder_agents/prompts/root-prompt.md

[style] ~20-~20: Consider a more concise word here.
Context: ...efer to send a path and not the string, in order to speed up the process and to reduce cont...

(IN_ORDER_TO_PREMIUM)

[style] ~20-~20: Consider using a different verb to strengthen your wording.
Context: ... a path and not the string, in order to speed up the process and to reduce context and t...

(SPEED_UP_ACCELERATE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Pytest (Fast)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)

🔇 Additional comments (6)

connector_builder_agents/prompts/root-prompt.md (1)

19-19: Clear new instruction on file naming.

Line 19 adds helpful clarity about using exact file names—good addition to reduce ambiguity.

connector_builder_agents/prompts/phase-1-stream-read.md (5)

3-3: Clarified Phase 1 scope in opening.

Updated phrasing improves clarity about the phase management context.

8-9: Primary key and API path criteria now explicit.

Lines 8–9 add clear objectives about defining the primary key (if applicable) and identifying the correct API response path. These are good clarifications that align with the PR's structured handoff models.

17-18: Punctuation fixed; tool references expanded.

Line 17: "etc.)" punctuation has been corrected (addressing prior review feedback). Lines 17–18 add two new tool references (get_connector_builder_docs, find_connectors_by_class_name) that provide concrete examples and documentation lookups—useful expansions for Phase 1 guidance.

23-27: Success criteria strengthened with primary key and validation checks.

Added two new success criteria:

Line 23: "Primary key is defined, if applicable"

Line 27: "Manifest validates successfully"

These align well with the PR's focus on structured output models and validation. Together with the existing criteria, they form a more complete Phase 1 checklist.

38-38: Phase 2 handoff wording updated.

Clarified that "you will begin Phase 2 to add pagination support," making the phase progression more explicit and supporting the PR's phased workflow structure.

add phase prompt tools, fix pydantic validation, add output models

1dd8789

github-actions bot added the enhancement New feature or request label Oct 28, 2025

add pk to readiness report

9dd3dc8

pnilan marked this pull request as ready for review October 28, 2025 21:08

chore: type check

d9b0be4

coderabbitai bot reviewed Oct 28, 2025

View reviewed changes

connector_builder_mcp/validation_testing.py Outdated Show resolved Hide resolved

fix issues per comments

41ef93f

coderabbitai bot reviewed Oct 28, 2025

View reviewed changes

connector_builder_agents/src/run.py Outdated Show resolved Hide resolved

fix retry loop

7b7fffc

coderabbitai bot reviewed Oct 28, 2025

View reviewed changes

pnilan merged commit 272b1f2 into main Oct 28, 2025
17 checks passed

pnilan deleted the pnilan/agents/re-add-output-models branch October 28, 2025 21:46

This was referenced Nov 7, 2025

feat: Add session-based manifest editing tools and MCP resource #168

Merged

feat: Add manifest history tracking with versioning, checkpointing, recall, and diff #176

Merged

feat: add phase prompt tools, fix pydantic validation, add output models #159

feat: add phase prompt tools, fix pydantic validation, add output models #159

Uh oh!

Conversation

pnilan commented Oct 28, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Summary by CodeRabbit

Uh oh!

github-actions bot commented Oct 28, 2025

👋 Welcome to the Airbyte Connector Builder MCP!

Testing This Branch via MCP

Testing This Branch via CLI

PR Slash Commands

AI Builder Evaluations

Helpful Resources

Uh oh!

coderabbitai bot commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

github-actions bot commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Fast)

Uh oh!

github-actions bot commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Full)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pnilan commented Oct 28, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 28, 2025 •

edited

Loading

github-actions bot commented Oct 28, 2025 •

edited

Loading

github-actions bot commented Oct 28, 2025 •

edited

Loading