feat: add primary_key eval #156
📝 Walkthrough
Adds manifest-aware evaluations (DeclarativeSource parsing, manifest validation, primary-key checks), refactors task output to a Pydantic model, extends manifest utilities and tests, updates guidance and tool retry config, and adds/adjusts test and runtime dependencies and test tooling.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Runner as Test / Runner
participant Task as run_connector_build_task
participant Build as run_connector_build()
participant Art as Artifact store
participant EvalMgr as EVALUATORS
participant DSrc as DeclarativeSource
Runner->>Task: invoke(dataset_row)
Task->>Build: run_connector_build(...)
Build-->>Task: final_result, turn_count, artifacts
Task->>Art: read readiness_report, manifest
alt manifest present & valid
Task->>DSrc: DeclarativeSource.parse_obj(manifest)
DSrc-->>EvalMgr: parsed manifest model
EvalMgr->>EvalMgr: manifest_validation_eval()
EvalMgr->>EvalMgr: streams_eval(expected, model)
EvalMgr->>EvalMgr: primary_key_eval(expected, model)
else manifest missing/invalid
EvalMgr->>EvalMgr: fallback evaluations (readiness-only / diagnostics)
end
Task->>Task: assemble ConnectorBuilderEvalTaskOutput
Task-->>Runner: ConnectorBuilderEvalTaskOutput
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
Areas to focus on:
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
👋 Welcome to the Airbyte Connector Builder MCP!
Thank you for your contribution! Here are some helpful tips and reminders for your convenience.
Testing This Branch via MCP
To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:
```json
{
  "mcpServers": {
    "connector-builder-mcp-dev": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@pnilan/evals/primary-key", "connector-builder-mcp"]
    }
  }
}
```
Testing This Branch via CLI
You can test this version of the MCP Server using the following CLI snippet:
```bash
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@pnilan/evals/primary-key#egg=airbyte-connector-builder-mcp' --help
```
PR Slash Commands
Airbyte Maintainers can execute the following slash commands on your PR:
AI Builder Evaluations
AI builder evaluations run automatically under the following conditions:
A set of standardized evaluations also run on a schedule (Mon/Wed/Fri at midnight UTC) and can be manually triggered via workflow dispatch.
Helpful Resources
If you have any questions, feel free to ask in the PR comments or join our Slack community.
PyTest Results (Full)
0 tests 0 ✅ 0s ⏱️
Results for commit 361954b.
♻️ This comment has been updated with latest results.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
connector_builder_agents/src/evals/evaluators.py (1)
141-144: Apply vacuous-truth when no expectations are defined.
When `expected_stream_names` is empty, return `1.0` instead of `0.0` to avoid penalizing missing expectations. Based on learnings.
```diff
-    if not expected_stream_names:
-        logger.warning("No expected streams found")
-        return 0.0
+    if not expected_stream_names:
+        logger.info("No expected streams defined; treating as pass (vacuous truth)")
+        return 1.0
```
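A minimal sketch of the vacuous-truth rule described above, assuming a simplified scoring function. The name `stream_match_score` and its signature are hypothetical and are not the PR's actual `streams_eval`.

```python
def stream_match_score(expected: list, actual: list) -> float:
    """Return the fraction of expected stream names present in `actual`.

    Hypothetical helper for illustration only.
    """
    expected_set = set(expected)
    if not expected_set:
        # Vacuous truth: with zero expectations, zero checks failed.
        return 1.0
    matched = expected_set & set(actual)
    return len(matched) / len(expected_set)
```

With this rule, a dataset row that defines no expected streams scores `1.0` rather than dragging the average down.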
🧹 Nitpick comments (11)
connector_builder_mcp/_guidance.py (1)
183-189: Consider removing "please" for consistency.
The new Rules section provides valuable guidance. However, line 185 uses "please" ("please pass it without parsing secrets yourself"), which is inconsistent with the imperative tone established elsewhere in this file (e.g., line 168 where "please" was removed).
Apply this diff for consistency:
```diff
-- All MCP tools support receiving .env file path - please pass it without parsing secrets yourself.
+- All MCP tools support receiving .env file path - pass it without parsing secrets yourself.
```
connector_builder_mcp/_util.py (1)
163-180: Consider the breadth of exception handling.
The function catches all exceptions with `except Exception as e`, which is quite broad. While this ensures that any parsing error is caught and reported, it may also hide programming errors or unexpected issues.
For validation purposes, this broad catch may be intentional to handle any failure gracefully. However, consider whether specific exception types from `DeclarativeSource.parse_obj()` should be caught instead.
Additionally, the docstring format (opening triple quotes on a separate line) differs from other functions in this file. Consider aligning with the single-line opening style used elsewhere for consistency.
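To illustrate the trade-off, here is a rough sketch that catches only one specific error type and returns the `(is_valid, error)` tuple shape. Everything in it is a stand-in: `ManifestParseError` and `parse_manifest` are hypothetical and replace the real `DeclarativeSource.parse_obj()` call, whose exact exception types are not stated in this thread.

```python
from typing import Optional, Tuple


class ManifestParseError(ValueError):
    """Hypothetical stand-in for the specific error a real parser raises."""


def parse_manifest(manifest: dict) -> dict:
    # Hypothetical parser used only for this sketch.
    if "streams" not in manifest:
        raise ManifestParseError("missing required field: streams")
    return manifest


def validate(manifest: dict) -> Tuple[bool, Optional[str]]:
    try:
        parse_manifest(manifest)
    except ManifestParseError as e:
        # Only the expected validation failure is converted to a message;
        # unrelated bugs (e.g. a NameError) still propagate.
        return False, f"Invalid manifest: {e}"
    return True, None
```

Catching the narrow type keeps genuine programming errors visible, at the cost of having to know which exceptions the parser can raise.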
connector_builder_agents/src/tools.py (1)
82-82: Reduced logging may impact debugging.
The removal of tool arguments from the log message reduces verbosity but may make debugging tool failures more difficult. Consider whether argument details are needed for troubleshooting.
connector_builder_agents/tests/test_evaluators.py (1)
385-385: Remove duplicate import.
`pytest` is already imported at line 8. The duplicate import at line 385 is redundant.
Apply this diff:
```diff
-import pytest
-
-
 @pytest.mark.parametrize(
```
connector_builder_agents/src/evals/task.py (2)
48-50: Preserve stack traces on failures.
Use `logger.exception(...)` to capture the traceback; re-raise unchanged.
```diff
-    except Exception as e:
-        logger.error(f"Failed to build connector '{connector_name}': {e}")
-        raise
+    except Exception:
+        logger.exception(f"Failed to build connector '{connector_name}'")
+        raise
```
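The difference can be seen in a small standalone sketch using only the standard library. The function names `build` and `run_build` and the logger name are hypothetical, chosen for the demonstration.

```python
import io
import logging


def build(name: str) -> None:
    # Hypothetical build step that always fails, for demonstration.
    raise RuntimeError(f"cannot build {name}")


def run_build(name: str, logger: logging.Logger) -> None:
    try:
        build(name)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Failed to build connector '%s'", name)
        raise


# Capture log output in memory so it can be inspected.
stream = io.StringIO()
demo_logger = logging.getLogger("eval_task_demo")
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.propagate = False

try:
    run_build("demo", demo_logger)
except RuntimeError:
    pass  # The exception is still re-raised to the caller.

log_text = stream.getvalue()
```

Here `log_text` contains both the message and the full traceback, which a plain `logger.error(f"...: {e}")` call would omit.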
55-66: Success computation and log message clarity.
- Consider `success = bool(build_result)` so an empty list isn't treated as success.
- Avoid unconditional "successfully" wording; include the status in the log.
```diff
-    final_result = build_result[-1] if build_result else None
-    success = build_result is not None
-    num_turns = len(build_result) if build_result else 0
-
-    logger.info(f"Build completed - Success: {success}, Turns: {num_turns}")
+    final_result = build_result[-1] if build_result else None
+    success = bool(build_result)
+    num_turns = len(build_result) if build_result else 0
+
+    logger.info(f"Build completed - success={success}, turns={num_turns}")
@@
-    logger.info(f"Task completed successfully for connector '{connector_name}'")
+    logger.info(f"Task completed for connector '{connector_name}' (success={success})")
```
connector_builder_agents/src/evals/evaluators.py (5)
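As a quick illustration of the `task.py` success-computation comment above, the sketch below shows why `bool(build_result)` and `build_result is not None` disagree for an empty list. The `summarize` function is hypothetical and only mirrors the three assignments in the suggested diff.

```python
def summarize(build_result):
    """Return (final_result, success, num_turns) for a build result list.

    Hypothetical helper mirroring the suggested assignments.
    """
    final_result = build_result[-1] if build_result else None
    success = bool(build_result)  # False for both None and []
    num_turns = len(build_result) if build_result else 0
    return final_result, success, num_turns


# An empty list is "not None", so the identity check alone would
# report success even though no turns were produced.
empty_is_not_none = [] is not None
```

With `bool(...)`, a build that returns no turns is reported as a failure, matching the `num_turns == 0` it logs.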
74-77: Keep return types consistent (ints for readiness_eval).
Function is annotated to return int; return `0` instead of `0.0`.
```diff
-    if output is None:
-        logger.warning("No output provided - build task likely failed")
-        return 0.0
+    if output is None:
+        logger.warning("No output provided - build task likely failed")
+        return 0
```
113-116: Return float type consistently.
`streams_eval` returns `float`; use `0.0`.
```diff
-    if not manifest_str:
-        logger.warning("No manifest found or manifest is empty")
-        return 0
+    if not manifest_str:
+        logger.warning("No manifest found or manifest is empty")
+        return 0.0
```
190-191: Avoid print; use logger and lower log level.
Replace prints with `logger.debug` to keep logs structured.
```diff
-    print(f"Actual primary key: {actual_pk}")
-    print(f"Expected primary key: {expected_by_name[name]}")
+    logger.debug(f"Actual primary key for {name}: {actual_pk}")
+    logger.debug(f"Expected primary key for {name}: {expected_by_name[name]}")
```
186-195: Normalize primary key shapes before comparing.
Treat `str`, `List[str]`, and `List[List[str]]` forms equivalently and compare on a canonical representation (order-insensitive). This avoids false negatives due to formatting differences.
```diff
-    correct = 0
-    for name in matched_names:
-        pk = getattr(available[name], "primary_key", None)
-        actual_pk = getattr(pk, "__root__", pk)
-        logger.debug(f"Actual primary key for {name}: {actual_pk}")
-        logger.debug(f"Expected primary key for {name}: {expected_by_name[name]}")
-        if actual_pk == expected_by_name[name]:
-            correct += 1
+    def _canon(pk):
+        if pk is None:
+            return None
+        # Coerce to list-of-lists
+        if isinstance(pk, str):
+            pk = [[pk]]
+        elif isinstance(pk, list) and pk and all(isinstance(x, str) for x in pk):
+            pk = [pk]
+        elif not isinstance(pk, list):
+            return pk
+        # Sort inner lists and outer list for order-insensitive compare
+        return tuple(sorted(tuple(sorted(inner)) for inner in pk))
+
+    correct = 0
+    for name in matched_names:
+        pk = getattr(available[name], "primary_key", None)
+        actual_pk = getattr(pk, "__root__", pk)
+        if _canon(actual_pk) == _canon(expected_by_name[name]):
+            correct += 1
```
If you prefer strict order-sensitive equality, skip the sorting and only coerce shapes.
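The canonicalization suggested above can be tried in isolation. The sketch below follows the same shape coercion as the suggested `_canon`; the function name `canonical_primary_key` is hypothetical. Note the assumption baked into that coercion: a flat `List[str]` is treated as one composite key, which may or may not match how the CDK interprets that form.

```python
def canonical_primary_key(pk):
    """Coerce a primary key to a sorted tuple-of-tuples for comparison."""
    if pk is None:
        return None
    if isinstance(pk, str):
        pk = [[pk]]  # "id" -> [["id"]]
    elif isinstance(pk, list) and pk and all(isinstance(x, str) for x in pk):
        pk = [pk]  # ["a", "b"] -> [["a", "b"]] (assumed: one composite key)
    elif not isinstance(pk, list):
        return pk  # Unknown shape: leave untouched
    # Sort inner and outer levels so ordering does not affect equality.
    return tuple(sorted(tuple(sorted(inner)) for inner in pk))
```

Under this sketch, `"id"`, `["id"]`, and `[["id"]]` all compare equal, while `["a", "b"]` and `[["a"], ["b"]]` do not.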
120-125: Optional: future-proof DeclarativeSource parsing across Pydantic v1/v2.
Airbyte CDK models may migrate to Pydantic v2 (`model_validate`). Consider a helper that tries both.
```python
def _parse_declarative_source(obj):
    try:
        return DeclarativeSource.parse_obj(obj)
    except AttributeError:
        # Pydantic v2
        return DeclarativeSource.model_validate(obj)
```
Use `_parse_declarative_source(manifest_dict)` in both `streams_eval` and `manifest_validation_eval`.
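A variant of the same idea, sketched with stub classes so it runs without Pydantic installed: it prefers `model_validate` when the class provides it and falls back to `parse_obj` otherwise. `V1StyleModel`, `V2StyleModel`, and `parse_model` are hypothetical names; the stubs only imitate the two method names and do no validation.

```python
class V1StyleModel:
    """Stub exposing only the Pydantic v1-style entry point."""

    @classmethod
    def parse_obj(cls, obj):
        return ("v1", obj)


class V2StyleModel:
    """Stub exposing only the Pydantic v2-style entry point."""

    @classmethod
    def model_validate(cls, obj):
        return ("v2", obj)


def parse_model(model_cls, obj):
    # Prefer the v2 method when present; otherwise use the v1 method.
    validate = getattr(model_cls, "model_validate", None)
    if validate is not None:
        return validate(obj)
    return model_cls.parse_obj(obj)
```

Checking for `model_validate` first avoids relying on an `AttributeError`, which may never be raised if a v2 model still exposes `parse_obj` as a deprecated alias.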
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `connector_builder_agents/uv.lock` is excluded by `!**/*.lock`
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (12)
- connector_builder_agents/pyproject.toml (1 hunks)
- connector_builder_agents/src/evals/data/connectors.yaml (1 hunks)
- connector_builder_agents/src/evals/evaluators.py (4 hunks)
- connector_builder_agents/src/evals/phoenix_run.py (4 hunks)
- connector_builder_agents/src/evals/task.py (3 hunks)
- connector_builder_agents/src/tools.py (4 hunks)
- connector_builder_agents/tests/test_evaluators.py (1 hunks)
- connector_builder_agents/tests/test_task.py (1 hunks)
- connector_builder_mcp/_guidance.py (3 hunks)
- connector_builder_mcp/_util.py (4 hunks)
- poe_tasks.toml (1 hunks)
- pyproject.toml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-11T05:20:31.622Z
Learnt from: aaronsteers
PR: airbytehq/connector-builder-mcp#136
File: connector_builder_agents/src/evals/evaluators.py:0-0
Timestamp: 2025-10-11T05:20:31.622Z
Learning: In connector builder evaluator functions (e.g., primary_keys_eval, records_eval, stream_names_eval), when there are zero expectations to validate against (total_expected_streams == 0), the evaluator should return 1.0 (representing vacuous truth: all zero tests passed) rather than 0.0 (failure). This prevents penalizing scenarios where no validation criteria are defined.
Applied to files:
- connector_builder_agents/tests/test_evaluators.py
- connector_builder_agents/src/evals/evaluators.py
🧬 Code graph analysis (5)
connector_builder_agents/tests/test_evaluators.py (2)
connector_builder_agents/src/evals/task.py (1)
ConnectorBuilderEvalTaskOutput (23-28)
connector_builder_agents/src/evals/evaluators.py (4)
manifest_validation_eval (42-68), primary_key_eval (153-195), readiness_eval (71-100), streams_eval (103-150)
connector_builder_agents/src/evals/task.py (2)
connector_builder_agents/src/run.py (1)
get_workspace_dir (34-38)
connector_builder_agents/src/evals/helpers.py (1)
get_artifact (4-13)
connector_builder_agents/src/evals/evaluators.py (1)
connector_builder_agents/src/evals/task.py (1)
ConnectorBuilderEvalTaskOutput(23-28)
connector_builder_agents/tests/test_task.py (1)
connector_builder_agents/src/evals/task.py (2)
ConnectorBuilderEvalTaskOutput (23-28), run_connector_build_task (31-76)
connector_builder_agents/src/evals/phoenix_run.py (1)
connector_builder_agents/src/evals/task.py (1)
run_connector_build_task(31-76)
🔇 Additional comments (13)
connector_builder_mcp/_guidance.py (2)
14-14: Grammar improvement looks good.
The change from "Not include" to "Do not include" corrects the grammar and improves clarity.
168-168: Imperative tone is appropriate for checklist instructions.
The removal of "please" makes the instruction more direct and consistent with standard checklist formatting.
connector_builder_mcp/_util.py (2)
114-117: LGTM!
The try-except block improves error handling by providing a clearer error message when YAML parsing fails. The exception chaining with `from e` is appropriate and maintains the stack trace.
11-12: ✓ airbyte_cdk dependency is properly declared.
The import of `DeclarativeSource` from `airbyte_cdk` is supported. The root `pyproject.toml` includes `"airbyte-cdk>=6.61.6,<7.0"` in its dependencies, which governs the `connector_builder_mcp` package.
connector_builder_agents/pyproject.toml (1)
17-17: LGTM! Dependency additions support new evaluation features.
The new dependencies appropriately support:
- `airbyte-cdk>=7.3.8,<8.0`: Enables DeclarativeSource parsing for manifest validation
- `pytest>=8.4.2` and `pytest-asyncio>=0.23.0`: Support the new test files added in this PR
Also applies to: 24-25
pyproject.toml (1)
12-12: LGTM! Runtime dependency additions are appropriate.
The dependencies support core agent functionality:
- `emoji>=2.0.0,<3.0`: Used for emoji detection in progress logging
- `pydantic-ai-slim[openai,duckduckgo]>=0.0.14,<1.0`: Provides agent framework capabilities
Also applies to: 17-17
connector_builder_agents/src/tools.py (1)
121-121: LGTM! Retry configuration improves resilience.
Adding `max_retries=5` to MCP server configurations provides appropriate resilience against transient failures in tool execution.
Also applies to: 150-150, 176-176
poe_tasks.toml (1)
20-20: LGTM! New test task enables agent test execution.
The `test-agents` task provides a convenient way to run the new agent tests with verbose output, complementing the existing `test` task for the main codebase.
connector_builder_agents/src/evals/data/connectors.yaml (1)
7-18: LGTM! Stream structure enhanced to support primary key evaluation.
The transition from plain string stream names to structured objects with `name` and `primary_key` fields enables the new primary key evaluator. The primary key values are appropriate:
- JSONPlaceholder and Rick & Morty APIs correctly use `["id"]`
- StarWars API correctly uses `["url"]` as the unique identifier
Also applies to: 24-35, 41-46
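As a rough sketch of how structured entries like these might be consumed, the snippet below builds a name-to-primary-key lookup. The `name` and `primary_key` keys come from the comment above; the stream names (`posts`, `people`), the sample list, and the helper `expected_primary_keys` are hypothetical.

```python
def expected_primary_keys(expected_streams: list) -> dict:
    """Map each expected stream name to its expected primary key."""
    return {s["name"]: s.get("primary_key") for s in expected_streams}


# Hypothetical sample mirroring the structured shape described above.
sample_streams = [
    {"name": "posts", "primary_key": ["id"]},
    {"name": "people", "primary_key": ["url"]},
]
expected_by_name = expected_primary_keys(sample_streams)
```

A lookup keyed by stream name lets an evaluator compare each parsed stream's primary key against its expectation in one pass.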
connector_builder_agents/src/evals/phoenix_run.py (2)
28-28: LGTM! Refactoring improves maintainability.
The centralization of evaluators into a single `EVALUATORS` collection from the evaluators module improves maintainability and makes it easier to add new evaluators in the future.
Also applies to: 62-62, 70-70
44-50: LGTM! Improved documentation of Phoenix instrumentation.
The added comments clearly explain the automatic instrumentation mechanism and the environment variables used for configuration, improving developer understanding.
connector_builder_agents/tests/test_task.py (1)
connector_builder_agents/tests/test_task.py (1)
17-77: LGTM! Comprehensive test coverage for task execution.
The test effectively validates:
- Correct construction of `ConnectorBuilderEvalTaskOutput`
- Proper handling of task artifacts (readiness report and manifest)
- Correct propagation of session ID, API name, and model configurations
- Expected interaction with mocked dependencies
The test structure is well-organized with clear mocking and assertions.
connector_builder_agents/tests/test_evaluators.py (1)
300-327: LGTM! Comprehensive test coverage for evaluators.
The test suite thoroughly validates:
- readiness_eval: Both passing and failing scenarios with mocked LLM classification
- streams_eval: Perfect match, partial match, no match, and extra streams scenarios
- primary_key_eval: Various primary key formats (None, string, single list, nested list, composite keys), no match, and partial match cases
The parametrized tests for primary key formats effectively cover the different formats supported by DeclarativeStream.
Based on learnings
Also applies to: 330-382, 388-437
Actionable comments posted: 2
♻️ Duplicate comments (3)
connector_builder_agents/tests/test_evaluators.py (1)
11-16: Import path inconsistency (already flagged).
The imports use a relative path `src.evals.evaluators` instead of the full path `connector_builder_agents.src.evals.evaluators`. This inconsistency has been flagged in previous reviews.
connector_builder_mcp/_util.py (1)
142-160: Improved error reporting with tuple return.
The updated signature returning `tuple[bool, str | None]` is a good improvement that provides detailed error messages to callers. Note: There's a typo "stucture" on line 158 that was flagged in a previous review.
connector_builder_agents/src/evals/evaluators.py (1)
178-180: Apply vacuous truth when no expected streams (already flagged).
When there are zero expected streams, the function should return `1.0` (vacuous truth: all zero tests passed) rather than `0.0`. This issue was flagged in a previous review. Based on learnings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- connector_builder_agents/src/evals/evaluators.py (4 hunks)
- connector_builder_agents/src/evals/task.py (3 hunks)
- connector_builder_agents/tests/test_evaluators.py (1 hunks)
- connector_builder_mcp/_util.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-11T05:20:31.622Z
Learnt from: aaronsteers
PR: airbytehq/connector-builder-mcp#136
File: connector_builder_agents/src/evals/evaluators.py:0-0
Timestamp: 2025-10-11T05:20:31.622Z
Learning: In connector builder evaluator functions (e.g., primary_keys_eval, records_eval, stream_names_eval), when there are zero expectations to validate against (total_expected_streams == 0), the evaluator should return 1.0 (representing vacuous truth: all zero tests passed) rather than 0.0 (failure). This prevents penalizing scenarios where no validation criteria are defined.
Applied to files:
- connector_builder_agents/tests/test_evaluators.py
- connector_builder_agents/src/evals/evaluators.py
🧬 Code graph analysis (3)
connector_builder_agents/tests/test_evaluators.py (1)
connector_builder_agents/src/evals/evaluators.py (4)
manifest_validation_eval (42-68), primary_key_eval (157-199), readiness_eval (71-100), streams_eval (103-154)
connector_builder_agents/src/evals/task.py (2)
connector_builder_agents/src/run.py (1)
get_workspace_dir(34-38)connector_builder_agents/src/evals/helpers.py (1)
get_artifact(4-13)
connector_builder_agents/src/evals/evaluators.py (1)
connector_builder_agents/src/evals/task.py (1)
ConnectorBuilderEvalTaskOutput(23-28)
🔇 Additional comments (11)
connector_builder_mcp/_util.py (3)
11-11: LGTM!
The import of `DeclarativeSource` is appropriate for the new validation function added below.
114-117: LGTM!
Adding try-except around YAML parsing with a clear error message improves error handling and user experience.
163-180: LGTM!
The new validation function properly leverages `DeclarativeSource.parse_obj` for schema validation and provides helpful error guidance. The return signature is consistent with `validate_manifest_structure`.
connector_builder_agents/src/evals/task.py (3)
6-10: LGTM!
The imports are correct, including `Path` from the standard library `pathlib` module. The necessary types for the new `ConnectorBuilderEvalTaskOutput` model are properly imported.
23-28: LGTM!
The `ConnectorBuilderEvalTaskOutput` model provides a clean, well-typed structure for task results. Using `Mapping[str, Any]` for artifacts provides flexibility while maintaining type safety.
31-76: LGTM!
The refactored function provides better error handling and returns a structured, type-safe output. The try-except block with re-raise preserves the exception context while allowing for logging.
connector_builder_agents/src/evals/evaluators.py (5)
13-14: LGTM!
The new imports are necessary for DeclarativeSource-based manifest parsing and the structured task output model.
42-68: LGTM!
The manifest validation function properly handles all error cases with appropriate try-except blocks for both YAML parsing and DeclarativeSource parsing. This addresses concerns from previous reviews.
71-100: LGTM!
The `readiness_eval` function properly integrates with the new `ConnectorBuilderEvalTaskOutput` model and maintains appropriate error handling throughout.
103-154: LGTM!
The `streams_eval` function now includes proper YAML error handling (as requested in previous reviews) and correctly parses the manifest as a DeclarativeSource to extract available streams.
202-202: LGTM!
All evaluators, including the newly added `primary_key_eval`, are properly registered in the `EVALUATORS` list. This resolves concerns from previous reviews.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
connector_builder_agents/src/evals/evaluators.py (1)
145-147: Apply vacuous-truth when no expected streams.
When there are zero expectations to validate against, the evaluator should return `1.0` (all zero tests passed) rather than `0.0` (failure). Based on learnings.
Apply this diff:
```diff
     if not expected_stream_names:
-        logger.warning("No expected streams found")
-        return 0.0
+        logger.info("No expected streams defined; treating as pass (vacuous truth)")
+        return 1.0
```
♻️ Duplicate comments (1)
connector_builder_agents/src/evals/evaluators.py (1)
178-180: Apply vacuous-truth when no expected streams.
When there are zero expectations to validate against, the evaluator should return `1.0` (all zero tests passed) rather than `0.0` (failure). Based on learnings.
Apply this diff:
```diff
     if not expected_streams:
-        logger.warning("No expected streams found")
-        return 0.0
+        logger.info("No expected streams defined; treating as pass (vacuous truth)")
+        return 1.0
```
🧹 Nitpick comments (2)
tests/unit/test_util.py (1)
71-71: Consider verifying error messages in negative test cases.
While the boolean checks are correct, these negative test cases could also verify that appropriate error messages are returned in the second tuple element. This would ensure the error reporting is working as expected.
For example:
```diff
 def test_missing_required_fields(self):
     """Test validation fails for missing required fields."""
     manifest = {"version": "0.1.0", "type": "DeclarativeSource"}
-    assert validate_manifest_structure(manifest)[0] is False
+    is_valid, error_msg = validate_manifest_structure(manifest)
+    assert is_valid is False
+    assert error_msg is not None
+    assert "required fields" in error_msg.lower()
```
Also applies to: 77-77
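To make the tuple contract concrete, here is a self-contained sketch of a validator returning `(is_valid, error_message)` that such a test could exercise. The function `validate_structure_sketch`, the `REQUIRED_FIELDS` tuple, and the error wording are all assumptions for illustration; the real `validate_manifest_structure` in `_util.py` may check different fields.

```python
from typing import Optional, Tuple

# Assumed set of required top-level keys, for illustration only.
REQUIRED_FIELDS = ("version", "type", "streams")


def validate_structure_sketch(manifest: dict) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if valid, else (False, <human-readable reason>)."""
    missing = [field for field in REQUIRED_FIELDS if field not in manifest]
    if missing:
        return False, "Manifest missing required fields: " + ", ".join(missing)
    return True, None
```

Asserting on the message as well as the boolean catches regressions where validation still fails but the explanation becomes empty or misleading.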
connector_builder_agents/src/evals/evaluators.py (1)
117-129: Consider extracting manifest parsing into a helper function.
The manifest parsing logic (YAML parse → DeclarativeSource parse → `__root__` unwrap) is duplicated in both `streams_eval` and `primary_key_eval`. Consider extracting this into a reusable helper function to reduce duplication and improve maintainability.
Example:
```python
def parse_manifest_to_declarative_source(manifest_str: str) -> DeclarativeSource | None:
    """Parse manifest string to DeclarativeSource, handling __root__ unwrapping."""
    try:
        manifest_dict = yaml.safe_load(manifest_str)
        declarative_source = DeclarativeSource.parse_obj(manifest_dict)
        return getattr(declarative_source, "__root__", declarative_source)
    except Exception as e:
        logger.error(f"Failed to parse manifest: {e}")
        return None
```
Also applies to: 169-175
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- connector_builder_agents/src/evals/evaluators.py (4 hunks)
- pyproject.toml (1 hunks)
- tests/unit/test_util.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-11T05:20:31.622Z
Learnt from: aaronsteers
PR: airbytehq/connector-builder-mcp#136
File: connector_builder_agents/src/evals/evaluators.py:0-0
Timestamp: 2025-10-11T05:20:31.622Z
Learning: In connector builder evaluator functions (e.g., primary_keys_eval, records_eval, stream_names_eval), when there are zero expectations to validate against (total_expected_streams == 0), the evaluator should return 1.0 (representing vacuous truth: all zero tests passed) rather than 0.0 (failure). This prevents penalizing scenarios where no validation criteria are defined.
Applied to files:
connector_builder_agents/src/evals/evaluators.py
🧬 Code graph analysis (2)
connector_builder_agents/src/evals/evaluators.py (1)
connector_builder_agents/src/evals/task.py (1)
ConnectorBuilderEvalTaskOutput(23-28)
tests/unit/test_util.py (1)
connector_builder_mcp/_util.py (1)
validate_manifest_structure(142-160)
🔇 Additional comments (4)
tests/unit/test_util.py (1)
65-65: LGTM!
The test correctly handles the new tuple return type from `validate_manifest_structure`.
pyproject.toml (1)
74-75: This review comment is incorrect.
The `norecursedirs = ["connector_builder_agents"]` setting is not redundant. The directory `connector_builder_agents/tests/` exists and contains actual test files (`test_evaluators.py`, `test_task.py`).
While `testpaths = ["tests"]` restricts pytest discovery to the top-level `tests/` directory under normal circumstances, `norecursedirs` serves as a necessary safeguard to prevent pytest from discovering and running tests in `connector_builder_agents/tests/` if:
- The configuration is bypassed or changed
- pytest is invoked with different arguments
- The testpaths constraint is not applied in certain CI contexts
This is a legitimate and defensive configuration choice.
Likely an incorrect or invalid review comment.
connector_builder_agents/src/evals/evaluators.py (2)
42-68: LGTM! Manifest validation is well-structured.
The function correctly handles all error cases (missing output, missing manifest, YAML parse errors, DeclarativeSource parse errors) and returns consistent integer values (0 or 1).
200-200: LGTM! All evaluators properly exported.
The `EVALUATORS` list now correctly includes the newly added `primary_key_eval` function. This addresses a previous review comment.
```python
if hasattr(declarative_source, "__root__"):
    declarative_source = declarative_source.__root__
```
I'm not familiar with `__root__`. Does this just dump as a dict? I think `model_dump()` will do the same.
https://docs.pydantic.dev/latest/concepts/serialization/#python-mode
Our DeclarativeSource model is defined as a Union of DeclarativeSource1 and DeclarativeSource2. It’s a bit awkward, but __root__ is just how you access the underlying declarative source model. This pattern was auto-generated when dynamic streams were added to the CDK.
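The unwrapping pattern discussed in this thread can be sketched with plain stub classes, without the CDK or Pydantic. `RootWrapper`, `InnerSource`, and `unwrap_root` are hypothetical names; the stubs only imitate an object that exposes its underlying model on a `__root__` attribute.

```python
class InnerSource:
    """Stub for the underlying declarative source model."""

    def __init__(self, streams):
        self.streams = streams


class RootWrapper:
    """Stub for a union-style wrapper that stores the real model on __root__."""

    def __init__(self, root):
        self.__root__ = root


def unwrap_root(model):
    # Return the wrapped model if present, otherwise the object itself.
    return getattr(model, "__root__", model)
```

The point of the pattern is access to the inner model object, not serialization: `unwrap_root` returns the model itself rather than a dictionary.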
aaronsteers
left a comment
@pnilan - Looks great. A couple non-blocking comments. Feel free to merge when ready!
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
connector_builder_agents/src/evals/evaluators.py (1)
145-147: Apply vacuous truth when no expected streams defined.
When `expected_stream_names` is empty (zero expectations), the evaluator should return 1.0 (vacuous truth: all zero tests passed) rather than 0.0. Based on learnings.
Apply this diff:
```diff
     if not expected_stream_names:
-        logger.warning("No expected streams found")
-        return 0.0
+        logger.info("No expected streams defined; treating as pass (vacuous truth)")
+        return 1.0
```
♻️ Duplicate comments (1)
connector_builder_agents/src/evals/evaluators.py (1)
177-179: Apply vacuous truth when no expected streams defined.
When `expected_streams` is empty (zero expectations), the evaluator should return 1.0 (vacuous truth: all zero tests passed) rather than 0.0. This aligns with the vacuous-truth pattern used in other evaluators and prevents penalizing scenarios where no validation criteria are defined. Based on learnings.
Apply this diff:
```diff
     if not expected_streams:
-        logger.warning("No expected streams found")
-        return 0.0
+        logger.info("No expected streams defined; treating as pass (vacuous truth)")
+        return 1.0
```
🧹 Nitpick comments (2)
connector_builder_agents/src/evals/evaluators.py (2)
125-126: Prefer consistent `getattr()` pattern for `__root__` handling.
For consistency with `primary_key_eval` (line 171), consider using the more concise `getattr()` pattern instead of `hasattr()` check.
Apply this diff for consistency:
```diff
-    if hasattr(declarative_source, "__root__"):
-        declarative_source = declarative_source.__root__
+    declarative_source = getattr(declarative_source, "__root__", declarative_source)
```
112-129: Consider extracting manifest parsing to reduce duplication.
The manifest parsing logic (YAML loading + DeclarativeSource parsing + `__root__` handling) is duplicated between `streams_eval` and `primary_key_eval`. Extracting this to a helper function would improve maintainability.
Example helper function:
```python
def _parse_manifest_as_declarative_source(output: dict) -> DeclarativeSource | None:
    """Parse manifest from output into DeclarativeSource, returning None on error."""
    connector_builder_eval_task_output = ConnectorBuilderEvalTaskOutput(**output)
    manifest_str = connector_builder_eval_task_output.artifacts.get("manifest")
    if not manifest_str:
        logger.warning("No manifest found or manifest is empty")
        return None

    try:
        manifest_dict = yaml.safe_load(manifest_str)
    except yaml.YAMLError as e:
        logger.warning(f"Failed to parse manifest YAML: {e}")
        return None

    try:
        declarative_source = DeclarativeSource.parse_obj(manifest_dict)
        return getattr(declarative_source, "__root__", declarative_source)
    except Exception as e:
        logger.error(f"Failed to parse DeclarativeSource from manifest: {e}")
        return None
```
Also applies to: 163-174
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- connector_builder_agents/src/evals/evaluators.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-11T05:20:31.622Z
Learnt from: aaronsteers
PR: airbytehq/connector-builder-mcp#136
File: connector_builder_agents/src/evals/evaluators.py:0-0
Timestamp: 2025-10-11T05:20:31.622Z
Learning: In connector builder evaluator functions (e.g., primary_keys_eval, records_eval, stream_names_eval), when there are zero expectations to validate against (total_expected_streams == 0), the evaluator should return 1.0 (representing vacuous truth: all zero tests passed) rather than 0.0 (failure). This prevents penalizing scenarios where no validation criteria are defined.
Applied to files:
connector_builder_agents/src/evals/evaluators.py
🧬 Code graph analysis (1)
connector_builder_agents/src/evals/evaluators.py (1)
connector_builder_agents/src/evals/task.py (1)
ConnectorBuilderEvalTaskOutput(23-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (3)
connector_builder_agents/src/evals/evaluators.py (3)
42-68: LGTM! Manifest validation implementation is solid.
The function properly validates the manifest by parsing it as a DeclarativeSource with comprehensive error handling for YAML parsing and DeclarativeSource instantiation.
71-100: LGTM! Readiness evaluation properly returns float.
The function signature and return values are now type-consistent (all return float), and the LLM-based classification approach is appropriate for readiness evaluation.
199-199: LGTM! All evaluators properly exported.
The `EVALUATORS` list now includes all four evaluator functions, including the newly added `primary_key_eval`. This addresses the previous review comment about missing exports.
Summary
Introduces a primary key evaluation workflow for declarative connectors and expands evaluator coverage in the Phoenix evaluation suite.
Changes
- `expected_primary_key` field to `connectors.yaml`.
- `airbyte-cdk` dependency to `connector-builder-agents`.
- `ConnectorBuilderEvalTaskOutput` model to manage build artifacts and metadata.
- `DeclarativeStream`.
- `output` argument (required by Phoenix) as a dictionary.
Implementation Details
- `DeclarativeStream` primary key formats as defined in `declarative_component_schema.yaml`:
  - `None`
  - `str`
  - `List[str]`
  - `List[List[str]]`
Scope
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores
Tests