fix: (CDK) (Connector Builder) - refactor the MessageGrouper > TestRead#332
fix: (CDK) (Connector Builder) - refactor the MessageGrouper > TestRead#332Baz (bazarnov) merged 11 commits intomainfrom
MessageGrouper > TestRead#332Conversation
📝 WalkthroughWalkthroughThis pull request refactors the code for test reading and message grouping. The main changes include updating dataclass import statements and replacing the Changes
Sequence Diagram(s)sequenceDiagram
participant TS as Test Suite
participant TR as TestReader
participant DS as DeclarativeSource
participant UH as Utility Helpers
TS->>TR: run_test_read(source, config, catalog, state, limit)
TR->>DS: _read_stream(...)
DS-->>TR: Iterator of AirbyteMessages
TR->>UH: Invoke helper functions (_categorise_groups, schema inference, etc.)
UH-->>TR: Processed message groups
TR-->>TS: Returns StreamRead object with slices, log messages, and auxiliary requests
Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🔭 Outside diff range comments (2)
airbyte_cdk/connector_builder/test_reader/helpers.py (1)
556-582: 🛠️ Refactor suggestionPotential optimization for frequent record fields.
_handle_record_messageaccumulates both schema and datetime formats on every record. If a stream is large, we might slow performance. Any interest in offering a caching approach or a sampling strategy (e.g., only inferring every N records)? Wdyt?unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py (1)
19-31: 🛠️ Refactor suggestionConsider consolidating duplicate fixtures, wdyt?
This fixture is identical to the one in
test_json_decoder.py. Consider moving it to a shared test utilities module to avoid duplication:# test_utils.py @pytest.fixture(name="large_events_response") def large_event_response_fixture(): # ... shared implementation ...Then import and use it in both test files.
🧹 Nitpick comments (9)
airbyte_cdk/connector_builder/test_reader/message_grouper.py (2)
40-82: Docstring clarity check.This docstring is very thorough, but would you like to clarify in the docstring how partial slices/pages are handled if the loop ends early (e.g., due to record limit)? It might help future readers. Wdyt?
118-124: Consolidate duplicate slice parsing logic?We call
_parse_slice_description(message.log.message)at lines 119 and 124. Would you consider extracting that repeated call into a small helper to avoid duplication? Wdyt?- yield _processed_slice() - current_slice_descriptor = _parse_slice_description(message.log.message) # type: ignore + yield _processed_slice() + current_slice_descriptor = _parse_current_slice_descriptor(message)airbyte_cdk/connector_builder/test_reader/reader.py (4)
71-80: Constructor param naming.The constructor uses
_max_pages_per_slice,_max_slices,_max_record_limit. Any interest in clarifying them as well in the docstring or inline comment? For example, clarifying if they apply to all streams, or only a single stream read. Wdyt?
301-336: Typo in function name.The function
_get_infered_schemahas a small spelling mismatch ("infered" vs. "inferred"). Would you consider renaming it to_get_inferred_schemafor clarity? Wdyt?-def _get_infered_schema( +def _get_inferred_schema(
239-300: Add test coverage for unknown message group types.In
_categorise_groups, an unknown message group type raises aValueError. Would you consider adding a unit test to confirm that scenario is handled gracefully and the error is raised as expected? Wdyt?
410-442: Validate multiple limits more explicitly.
_has_reached_limitchecks slices, pages, and record counts. Is it worth adding a short docstring note or logging line to clarify which limit was triggered if the function returns True? For large codebases, that might simplify debugging. Wdyt?airbyte_cdk/connector_builder/test_reader/helpers.py (2)
174-200: Consider merging_close_pageand_close_current_page.Currently,
_close_pageand_close_current_pagecompose similar operations. Would you like to merge them into a single function that closes a page and returns reset objects, avoiding duplication? Wdyt?
474-521: Refine approach for unknown log messages.In
_handle_log_message, only HTTP-based logs are processed in detail, while everything else is appended as a raw log. Would you consider capturing more metadata for non-HTTP logs (e.g., partial parsing) to aid debugging? Wdyt?unit_tests/sources/declarative/decoders/test_json_decoder.py (1)
51-63: Consider extracting magic numbers into constants, wdyt?The fixture uses magic numbers for the file size. Consider extracting these into named constants for better maintainability:
+LINES_IN_RESPONSE = 2_000_000 # ≈ 58 MB of response +TEST_EMAIL = "email1@example.com" @pytest.mark.slow @pytest.fixture(name="large_events_response") def large_event_response_fixture(): - data = {"email": "email1@example.com"} - jsonl_string = f"{json.dumps(data)}\n" - lines_in_response = 2_000_000 # ≈ 58 MB of response + data = {"email": TEST_EMAIL} + jsonl_string = f"{json.dumps(data)}\n"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
airbyte_cdk/connector_builder/connector_builder_handler.py(3 hunks)airbyte_cdk/connector_builder/message_grouper.py(0 hunks)airbyte_cdk/connector_builder/test_reader/__init__.py(1 hunks)airbyte_cdk/connector_builder/test_reader/helpers.py(1 hunks)airbyte_cdk/connector_builder/test_reader/message_grouper.py(1 hunks)airbyte_cdk/connector_builder/test_reader/reader.py(1 hunks)airbyte_cdk/connector_builder/test_reader/types.py(1 hunks)unit_tests/connector_builder/test_connector_builder_handler.py(2 hunks)unit_tests/connector_builder/test_message_grouper.py(32 hunks)unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py(1 hunks)unit_tests/sources/declarative/decoders/test_json_decoder.py(1 hunks)
💤 Files with no reviewable changes (1)
- airbyte_cdk/connector_builder/message_grouper.py
✅ Files skipped from review due to trivial changes (1)
- airbyte_cdk/connector_builder/test_reader/init.py
🔇 Additional comments (8)
airbyte_cdk/connector_builder/test_reader/message_grouper.py (2)
106-116: Consider verifying the page's partial content.Inside the
while records_count < limitloop, we close the page unconditionally once_need_to_close_pageis true. Would you like to add an extra check to handle a scenario where the page might have zero records? Or is that an intentional design choice? Wdyt?
157-166: Ensure final slice is yielded if no records are processed.In the
elseblock, we only finalize the page and slice ifcurrent_page_request or current_page_response or current_page_recordsis truthy. Could a slice with zero records and no HTTP logs be missed? Would you like to confirm that scenario? Wdyt?airbyte_cdk/connector_builder/test_reader/helpers.py (1)
263-283: Graceful error handling for slice JSON parse.
_parse_slice_descriptioncould potentially fail if the log message is malformed or missing the prefix. Do you want to catch JSON errors here to yield a more explanatory log message or fallback? Wdyt?airbyte_cdk/connector_builder/test_reader/types.py (1)
1-76: Great job on the type definitions and documentation!The type aliases are well-organized, comprehensive, and thoroughly documented. The docstrings provide clear explanations of each type's purpose and structure.
airbyte_cdk/connector_builder/connector_builder_handler.py (1)
6-9: Clean refactoring of imports and TestReader integration!The changes improve code organization through better import structure and the transition to TestReader is clean and well-implemented.
Also applies to: 77-86
unit_tests/connector_builder/test_message_grouper.py (2)
19-20: LGTM! The imports look good.The imports are correctly updated to use the new
TestReaderclass and its helper functions.
148-148: LGTM! The test cases are correctly updated.The test cases are properly updated to use
TestReaderinstead ofMessageGrouper, and the method calls are changed fromget_message_groupstorun_test_read. The test coverage is maintained.Also applies to: 206-207, 290-292, 343-343, 446-449, 504-506, 617-619, 661-663, 689-691, 708-709, 733-734, 763-764, 791-792, 808-809, 822-823, 848-850, 877-879
unit_tests/connector_builder/test_connector_builder_handler.py (1)
553-553: LGTM! The mock patch paths are correctly updated.The mock patch paths are properly updated to use
TestReader.run_test_readinstead ofMessageGrouper.get_message_groups. The test coverage is maintained.Also applies to: 1172-1172
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (8)
airbyte_cdk/connector_builder/test_reader/helpers.py (5)
65-83: Consider removing nested keys with__prefixes as well?Currently,
clean_configonly deletes top-level keys that start with__. Would you like to remove nested keys with__prefixes too, ensuring thorough cleanup of internal or meta-data fields? wdyt?
86-128: Default to an empty dictionary for headers?
create_request_from_log_messagesetsheadersto whateverrequest.get("headers")provides. For safety, especially if headers can be absent, would you consider defaulting it to an empty dict to avoidNoneTypeissues? wdyt?
151-173: Log malformed JSON errors for visibility?
parse_jsonsilently returnsNoneonJSONDecodeError. Would you consider logging a debug/info message to indicate that JSON parsing failed (for easier troubleshooting)? wdyt?
198-228: Fix docstring reference to_is_page_http_request?The docstring in
should_close_pagereferences_is_page_http_request, but the actual function isis_page_http_request. Want to update it for clarity? wdyt?
434-464: Return a more descriptive structure than(None, None)?
handle_current_pagereturns(None, None)to clear references. Would you consider returning a small named container (e.g., a tuple with fields) or an explicit indicator to improve readability? wdyt?airbyte_cdk/connector_builder/test_reader/message_grouper.py (1)
133-146: Ensure final slice is yielded on manual break.In the
else:block after thewhile, we only finalize pages if the loop finishes naturally. If we decide to break early in the future, we might skip yielding. Is that acceptable or do we need a fallback ensuring the final slice is yielded? wdyt?unit_tests/connector_builder/test_message_grouper.py (2)
206-207: Refactored usage ofTestReaderin test functions.All these line changes consistently replace
MessageGrouperusage withTestReadercalls. This aligns with the new code structure.
Would you like to renameconnector_builder_handlertotest_reader_handlerin tests for clarity? wdyt?Also applies to: 290-292, 314-314, 343-344, 422-423, 458-459, 504-507
647-671: General test flow and structure.The series of tests confirm boundary conditions (maximum slices, pages, no slices, error streams, multiple control messages, etc.). Everything appears logically consistent.
Would you like to unify repeated test scaffolding (e.g., source set up) with a fixture or helper? wdyt?Also applies to: 673-699, 701-719, 722-772, 774-800, 802-817, 819-888
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/connector_builder/test_reader/helpers.py(1 hunks)airbyte_cdk/connector_builder/test_reader/message_grouper.py(1 hunks)airbyte_cdk/connector_builder/test_reader/reader.py(1 hunks)unit_tests/connector_builder/test_message_grouper.py(32 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/connector_builder/test_reader/reader.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (4)
airbyte_cdk/connector_builder/test_reader/message_grouper.py (1)
78-78: Confirm processing logic forlimit == 0.The loop condition is
while records_count < limit and (message := next(messages, None)):. Iflimitis 0, the loop never runs. Is that intended behavior, or should we handle that differently? wdyt?unit_tests/connector_builder/test_message_grouper.py (3)
19-20: Imports look correct.The new import references
TestReaderfromairbyte_cdk.connector_builder.test_reader. This aligns with the recent refactor. All good here!
148-219: Use@patchconsistently for the test?The test at lines 148-219 correctly patches
AirbyteEntrypoint.read. All logic appears valid. Perhaps confirm if we need additional patches or mocks for external network calls? wdyt?
592-645: Spot check large test coverage.The test from lines 592-645 adds multiple slices verifying correct grouping. The scenario thoroughly tests partial slices, multiple slices, and the final state message. Looks solid!
There was a problem hiding this comment.
Baz (@bazarnov) - This is a very big improvement for readability and maintainability. Thank you!! I added a couple questions inline, and one suggestion regarding module design in helpers.py - but nothing I could see that should be blocking.
|
Since Baz tested locally, me happy. Ship it! |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (6)
airbyte_cdk/connector_builder/test_reader/helpers.py (4)
67-86: Consider deeper key removal?Right now,
clean_config()only removes keys at the top-level. Would you consider recursively removing keys starting with__, if nested? wdyt?
88-130: Validate the HTTP method?Currently, we assume the method is correct. Would you consider normalizing or validating the method (e.g., matching known HTTP verbs)? wdyt?
515-563: Introduce a typed return object?We return a 4-tuple for (page_flag, request, response, log). What if we used a NamedTuple or dataclass for better readability? wdyt?
310-331: Additional fallback logic?We rely on
http.is_auxiliary. Ifmessage.get("http")is missing these keys, we default toFalse. Would you prefer logging a warning if the structure is unexpected? wdyt?airbyte_cdk/connector_builder/test_reader/message_grouper.py (1)
148-161: Reduce duplication?The logic here mirrors the page-closing and slice-handling code found earlier in the loop. Would you consider extracting it into a reusable function to keep it DRY? wdyt?
unit_tests/connector_builder/test_message_grouper.py (1)
646-647: Consider test clarity?We use
[slice_message(), request_response_log_message(...)] * maximum_number_of_slicesto replicate slices. Would you consider constructing them more explicitly for clarity and future debugging? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/connector_builder/test_reader/helpers.py(1 hunks)airbyte_cdk/connector_builder/test_reader/message_grouper.py(1 hunks)airbyte_cdk/connector_builder/test_reader/reader.py(1 hunks)unit_tests/connector_builder/test_message_grouper.py(32 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/connector_builder/test_reader/reader.py
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (7)
airbyte_cdk/connector_builder/test_reader/helpers.py (1)
467-513: Verify the message fields?The code forcibly casts
titleanddescriptionto strings. Should we handle scenarios where they might be missing or of a different type, instead of defaulting silently? wdyt?airbyte_cdk/connector_builder/test_reader/message_grouper.py (2)
93-93: Validate negative or zero limit?In the
whileloop, we rely onrecords_count < limit. What iflimitis zero or negative? Should we raise an early error here to fail fast? wdyt?
132-134: Confirm the error logging approach?When encountering a trace with error, we simply yield
message.trace. Would you like to also log or handle the error scenario for better debugging? wdyt?unit_tests/connector_builder/test_message_grouper.py (4)
148-149: Mock side effect coverage?We patch
AirbyteEntrypoint.readbut don't assert side effects onmock_entrypoint_read. Would you consider verifying call arguments for thorough test coverage? wdyt?
313-314: Negative record limits?We test some record limits, but not negative ones. Would you consider adding a negative limit test to confirm correct error handling? wdyt?
421-422: Good coverage for zero limit.It's great we confirm that limit=0 raises a ValueError, ensuring consistent usage. This test is well-defined.
802-814: Test partial auxiliary data?We only test well-formed auxiliary requests. Are you open to adding a scenario with missing or invalid data to ensure correct behavior? wdyt?
What
Resolving:
How
The pull request refactors the
airbyte_cdk/connector_builder/message_grouper.pymodule by replacing theMessageGrouperclass with the newTestReaderclass. Key changes include:Imports and Class Replacements:
airbyte_cdk/connector_builder/test_readerdir holds theTestReadand related logic implementation:readerholds the logic of how we actually make a test read.message_grouperholds the logic of how we group the emitted source messages.helpersholds a bunch of functions to re-use inreaderandmessage_groupertypesholds the specific type-collection used to mark the output from certain methods used inreaderandmessage_grouperdataclassand field from thedataclasses.Unit Tests Updates:
MessageGroupertoTestReader.New Files Added:
test_reader, including:__init__.py,helpers.py,message_grouper.py,reader.py, andtypes.py.Test Markers:
Added
@pytest.mark.slowto some test cases to indicate they may take longer to run, when running locally.These changes aim to improve the code's readability and maintainability by introducing the TestReader class, which has enhanced functionality and updated testing procedures.
User Impact
No impact is expected; the logic and behavior remain the same.
Local Testing using this guide:
Passed:
Summary by CodeRabbit
Summary by CodeRabbit
New Features
TestReaderfor performing test reads and validating configurations.Refactor
MessageGrouperclass to streamline message processing.Tests