
feat: enable async job configurability via config #435

Merged
Patrick Nilan (pnilan) merged 4 commits into main from
pnilan/async/enable-job-limit-configurability
Mar 19, 2025

Conversation


@pnilan Patrick Nilan (pnilan) commented Mar 19, 2025

Summary by CodeRabbit

  • New Features

    • Asynchronous job settings now support both numeric and string inputs for job limits, offering more flexible configuration options.
    • Enhanced schema clarity with updated examples for job limit configurations.
  • Documentation

    • Schema examples and formatting for asynchronous job configurations have been updated to clearly illustrate valid values.
  • Tests

    • Additional test coverage has been implemented to ensure reliable handling of diverse job limit inputs, particularly for string representations.

@github-actions github-actions bot added the enhancement New feature or request label Mar 19, 2025

coderabbitai bot commented Mar 19, 2025

📝 Walkthrough

This pull request modifies the job tracking and component schema logic. In the JobTracker data class, the limit attribute now accepts an interpolated string as well as an integer; a new config attribute supplies the values used for interpolation, and a reworked __post_init__ method handles evaluation errors by falling back to a limit of 1. The component schema YAML and model files are updated so that max_concurrent_async_job_count accepts both integer and string values, with new examples and some formatting adjustments. New unit tests validate this behavior.

Changes

File(s) Summary
airbyte_cdk/sources/declarative/async_job/job_tracker.py Converted JobTracker to a dataclass; updated limit from int to Union[int, str]; added a config attribute; revamped initialization with error handling.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml Modified max_concurrent_async_job_count to accept both integer and string; added examples; updated formatting for lazy_read_pointer and enum definitions.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py Updated max_concurrent_async_job_count in DeclarativeSource1 and DeclarativeSource2 classes to Optional[Union[int, str]] with example values.
unit_tests/sources/declarative/async_job/test_job_tracker.py Added parameterized tests for JobTracker to verify behavior when limit is provided as a string, ensuring fallback to 1 upon evaluation errors.

Sequence Diagram(s)

sequenceDiagram
    participant JT as JobTracker
    participant IS as InterpolatedString
    participant Log as Logger

    JT->>JT: __post_init__() invocation
    alt limit is string
        JT->>IS: Evaluate limit
        alt Evaluation successful
            IS-->>JT: Return numeric limit
            JT->>JT: Set _limit (ensure >= 1)
        else Evaluation fails
            IS-->>JT: Throw exception
            JT->>Log: Log warning message
            JT->>JT: Set _limit to 1
        end
    else limit is an integer
        JT->>JT: Validate and assign _limit (ensure >= 1)
    end
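The flow in the diagram can be sketched in plain Python. This is a minimal, self-contained approximation, not the CDK code: `resolve_job_limit` and `_interpolate` are hypothetical names, and `_interpolate` is a simplified stand-in for `InterpolatedString` that only understands the `{{ config['key'] }}` pattern (the real class renders full Jinja templates).

```python
import logging
import re
from typing import Any, Mapping, Union

LOGGER = logging.getLogger("job_tracker_sketch")

# Hypothetical stand-in for InterpolatedString: it only resolves the single
# pattern "{{ config['key'] }}"; any other string is returned unchanged.
_CONFIG_REF = re.compile(r"\{\{\s*config\['([^']+)'\]\s*\}\}")


def _interpolate(template: str, config: Mapping[str, Any]) -> Any:
    match = _CONFIG_REF.fullmatch(template.strip())
    if match is None:
        # Plain strings such as "4" are passed through as literals.
        return template
    return config[match.group(1)]  # raises KeyError when the key is missing


def resolve_job_limit(limit: Union[int, str], config: Mapping[str, Any]) -> int:
    """Mirror the diagram: interpolate string limits, fall back to 1, clamp to >= 1."""
    if isinstance(limit, str):
        try:
            limit = int(_interpolate(limit, config))
        except Exception as e:  # interpolation or int() conversion failed
            LOGGER.warning("Error interpolating max job count: %s. Setting to 1. %s", limit, e)
            limit = 1
    if limit < 1:
        LOGGER.warning("max_concurrent_async_job_count is less than 1: %s. Setting to 1.", limit)
    return limit if limit >= 1 else 1
```

With this sketch, a limit of `"{{ config['max_concurrent_async_job_count'] }}"` and a config value of 5 resolves to 5, while a value of "hello" or a missing key takes the fallback branch and resolves to 1; an integer below 1 is clamped to 1.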

Possibly related PRs

Suggested reviewers

  • brianjlai: Would involving Brian for his insights on the JobTracker changes be beneficial, wdyt?
  • maxi297: Would Maxi’s review on the updated configuration and schema types be helpful, wdyt?
  • bazarnov: How about Bazarnov’s input regarding the unit tests and error handling in __post_init__, wdyt?

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (5)
unit_tests/sources/declarative/async_job/test_job_tracker.py (1)

62-67: Test name matches behavior, but consider making it more specific.

This test correctly verifies that when interpolation fails, the limit is set to 1. The test name could be more specific about what makes interpolation fail (in this case, a non-integer value). wdyt?

-def test_given_interpolated_limit_and_empty_config_when_init_then_set_to_1():
+def test_given_non_integer_interpolated_value_when_init_then_set_to_1():
     tracker = JobTracker(
         "{{ config['max_concurrent_async_job_count'] }}",
         {"max_concurrent_async_job_count": "hello"},
     )
     assert tracker._limit == 1
airbyte_cdk/sources/declarative/async_job/job_tracker.py (1)

24-41: Well-implemented string interpolation with appropriate error handling.

The implementation for handling string limits and interpolation looks good. You're gracefully handling errors and providing clear warning messages. Small suggestion: consider adding a return type hint for __post_init__ for completeness, wdyt?

-    def __post_init__(self):
+    def __post_init__(self) -> None:
         self._jobs: Set[str] = set()
         self._lock = threading.Lock()
         if isinstance(self.limit, str):
             try:
                 self.limit = int(
                     InterpolatedString(self.limit, parameters={}).eval(config=self.config)
                 )
             except Exception as e:
                 LOGGER.warning(
                     f"Error interpolating max job count: {self.limit}. Setting to 1. {e}"
                 )
                 self.limit = 1
         if self.limit < 1:
             LOGGER.warning(
                 f"The `max_concurrent_async_job_count` property is less than 1: {self.limit}. Setting to 1. Please update the source manifest to set a valid value."
             )
         self._limit = self.limit if self.limit >= 1 else 1
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

50-55: Enhanced Flexibility for max_concurrent_async_job_count
You've updated the property to accept both integer and string types and added examples (including an interpolated configuration value), which nicely enables dynamic async job configurability. Would you consider adding some guidance or constraints (such as expected ranges or value formats) to further help users understand acceptable inputs? wdyt?


2902-2902: Consistent Default Formatting for lazy_read_pointer
The default value has been reformatted from [ ] to [], which improves consistency with our YAML style guidelines. Was this reformatting aimed at resolving any noted discrepancies in our defaults? wdyt?


3207-3207: Clean Enum Formatting in StateDelegatingStream
The enum formatting has been tightened to remove extra spaces (now [StateDelegatingStream]), which enhances readability and adheres to best practices. Do you think we should perform a similar review on other enum definitions to ensure uniform styling? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b1f325 and f2b6121.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/async_job/job_tracker.py (2 hunks)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2 hunks)
  • unit_tests/sources/declarative/async_job/test_job_tracker.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
unit_tests/sources/declarative/async_job/test_job_tracker.py (1)
airbyte_cdk/sources/declarative/async_job/job_tracker.py (1)
  • JobTracker (20-95)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/async_job/job_tracker.py

[error] 3-3: Ruff: Import block is un-sorted or un-formatted. Organize imports.

⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

1893-1897: LGTM! Good addition of string support for max_concurrent_async_job_count.

The change to support both integer and string types for the max_concurrent_async_job_count field along with the examples is well-implemented and aligns with the changes in the JobTracker class.


1926-1930: LGTM! Consistent implementation for DeclarativeSource2.

The changes are consistently applied to both DeclarativeSource1 and DeclarativeSource2 classes, which is good for maintaining code consistency.

airbyte_cdk/sources/declarative/async_job/job_tracker.py (1)

19-23: LGTM! Good use of dataclass for simplifying the code.

Converting JobTracker to a dataclass and adding type hints for the attributes is a great improvement. The default_factory for the config parameter ensures backward compatibility: existing callers that construct JobTracker with only a limit continue to work.
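To illustrate the backward-compatibility point, here is a reduced, hypothetical dataclass (`JobTrackerSketch`, not the real `JobTracker`, with no locking or job bookkeeping) that has the same two fields. Because `config` has a `default_factory`, the older call style that passes only `limit` still works.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Union


@dataclass
class JobTrackerSketch:
    """Hypothetical reduced version of JobTracker: fields only."""

    limit: Union[int, str]
    # default_factory builds a fresh empty dict per instance, so callers written
    # before the config field existed can keep passing only `limit`.
    config: Mapping[str, Any] = field(default_factory=dict)


legacy = JobTrackerSketch(2)  # old call style: positional limit only
configured = JobTrackerSketch(
    "{{ config['max_concurrent_async_job_count'] }}",
    {"max_concurrent_async_job_count": 4},
)
```

A `default_factory` is needed rather than `config: Mapping[str, Any] = {}` because dataclasses reject a plain dict default with a `ValueError`; the factory also guarantees that instances never share the same dict.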

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
unit_tests/sources/declarative/async_job/test_job_tracker.py (1)

62-67: Great error handling test case!

This test correctly verifies the fallback behavior when an interpolated string doesn't evaluate to a valid integer. The implementation will set _limit to 1 in this case, which is what you're testing.

As a suggestion - would it be worth adding another test case where the config is completely empty? Something like:

def test_given_interpolated_limit_and_missing_config_when_init_then_set_to_1():
    tracker = JobTracker("{{ config['max_concurrent_async_job_count'] }}", {})
    assert tracker._limit == 1

Just to cover that scenario too. wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4477538 and cc8ea41.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/async_job/test_job_tracker.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
unit_tests/sources/declarative/async_job/test_job_tracker.py (1)
airbyte_cdk/sources/declarative/async_job/job_tracker.py (1)
  • JobTracker (20-95)
🪛 GitHub Actions: Linters
unit_tests/sources/declarative/async_job/test_job_tracker.py

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Run 'ruff format' to fix code style issues in this file.

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
unit_tests/sources/declarative/async_job/test_job_tracker.py (2)

66-71: Nice error handling test!

This test properly validates the fallback behavior when interpolation fails to resolve to a valid integer. It aligns perfectly with the __post_init__ implementation that sets limit to 1 when an exception occurs during interpolation.

One suggestion: would it be valuable to add another test case where the interpolated string resolves to a negative value? This would verify that both direct and interpolated values fall back to 1 when the value is invalid. wdyt?


50-63: Verify type annotations

The parameters don't have type annotations. Do you think adding type hints like (limit: str, config: dict, expected_limit: int) would enhance code readability? This would align with the typing convention used in the existing test at line 45.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc8ea41 and c12df2c.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/async_job/test_job_tracker.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
unit_tests/sources/declarative/async_job/test_job_tracker.py (1)
airbyte_cdk/sources/declarative/async_job/job_tracker.py (1)
  • JobTracker (20-95)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)
🔇 Additional comments (1)
unit_tests/sources/declarative/async_job/test_job_tracker.py (1)

50-63: Good work on the updated parameterized test!

The test now correctly validates that string limits properly interpolate using config values, with clear parameters and assertions. This matches the implementation in job_tracker.py where string values are interpolated and converted to integers.

@pnilan Patrick Nilan (pnilan) merged commit 274d1f2 into main Mar 19, 2025
25 of 26 checks passed
@pnilan Patrick Nilan (pnilan) deleted the pnilan/async/enable-job-limit-configurability branch March 19, 2025 21:38