
feat(cdk): add option to make empty cells None when parsing csv with CsvParser #575

Merged
Aldo Gonzalez (aldogonzalez8) merged 2 commits into main from
ac8/add-option-to-set-empty-cells-to-none
Jun 3, 2025

Conversation

@aldogonzalez8
Contributor

@aldogonzalez8 Aldo Gonzalez (aldogonzalez8) commented Jun 2, 2025

This PR adds an option to default the value of an empty cell to None. When migrating to manifest and validating records, a record like this is just easier to work with:

{"type":"RECORD","record":{"stream":"labels","data":{"Type":"Label","Status":"Active","Id":"some_id","Modified Time":"04/27/2023 17:16:34.610","Description":"my label new label","Label":"integration-test-label","Color":"#F89CAF"},"emitted_at":1748901881365}}

Rather than:

{"type":"RECORD","record":{"stream":"labels","data":{"Type":"Label","Status":"Active","Id":"some_id","Parent Id":"","Campaign Id":"","Sub Type":"","Campaign":"","Ad Group":"","Asset Group":"","Website":"","Sync Time":"","Client Id":"","Modified Time":"04/27/2023 17:16:34.610","MSCLKID Auto Tagging Enabled":"","Include View Through Conversions":"","Profile Expansion Enabled":"","Features":"","Tracking Template":"","Final Url Suffix":"","Custom Parameter":"","Final Url":"","Mobile Final Url":"","Ad Click Parallel Tracking":"","Verified Tracking Setting":"","Verified Tracking Settings":"","Third Party Measurement Settings":"","Auto Apply Recommendations":"","Allow Image Auto Retrieve":"","Business Attributes":"","Blocked Segment Ids":"","Time Zone":"","Budget Id":"","Budget Name":"","Budget":"","Budget Type":"","Bid Strategy Id":"","Bid Strategy Name":"","Bid Strategy Type":"","Bid Strategy MaxCpc":"","Bid Strategy TargetCpa":"","Bid Strategy TargetRoas":"","Bid Strategy TargetAdPosition":"","Bid Strategy TargetImpressionShare":"","Bid Strategy MaxCpm":"","Inherited Bid Strategy Type":"","KeywordVariantMatchEnabled":"","Campaign Type":"","Campaign Sub Type":"","Priority":"","LocalInventoryAdsEnabled":"","Campaign Goal":"","Is Lead Gen Campaign":"","ShoppableAdsEnabled":"","RSA Auto Generated Assets Enabled":"","Predictive Targeting Enabled":"","Automated Call To Action Opt Out":"","Call To Action Opt Out":"","Destination Channel":"","Is Multi Channel Campaign":"","Is Broad Match Only Campaign":"","Is Deal Campaign":"","Use Campaign Level Dates":"","Should Serve On MSAN":"","Campaign Objective Type":"","Vanity Pharma Display URL Mode":"","Vanity Pharma Website Description":"","Enabled External Channel Sync":"","Start Date":"","End Date":"","Network Distribution":"","Ad Rotation":"","Cpc Bid":"","Cpm Bid":"","Cpv Bid":"","Mcpa Bid":"","Language":"","Target Setting":"","Privacy Status":"","Bid Option":"","Bid Boost Value":"","Ad Group Type":"","Hotel Ad Group 
Type":"","Percent Cpc Bid":"","Commission Rate":"","Placement":"","Canvas":"","Lead Gen SOV":"","Use Optimized Targeting":"","Use Predictive Targeting":"","Boost Publisher IDs":"","Boost Account IDs":"","Boost AdUnit IDs":"","Boost Trigger IDs":"","Frequency Cap Settings":"","First Party Bundles":"","Title":"","Text":"","Display Url":"","Domain":"","Destination Url":"","Business Name":"","Phone Number":"","Promotion":"","Editorial Status":"","Editorial Location":"","Editorial Term":"","Editorial Reason Code":"","Editorial Appeal Status":"","Editorial Entity Id":"","Device Preference":"","Ad Format Preference":"","Title Part 1":"","Title Part 2":"","Title Part 3":"","Text Part 2":"","Path 1":"","Path 2":"","Source Ad Id":"","Keyword":"","Match Type":"","Bid":"","Param1":"","Param2":"","Param3":"","Target":"","Physical Intent":"","Bid Adjustment":"","Radius Target Id":"","Name":"","OS Names":"","Radius":"","Unit":"","Business Id":"","From Hour":"","From Minute":"","To Hour":"","To Minute":"","Min Target Value":"","Max Target Value":"","Version":"","Ad Schedule":"","Use Searcher Time Zone":"","Sitelink Extension Order":"","Sitelink Extension Link Text":"","Sitelink Extension Destination Url":"","Sitelink Extension Description1":"","Sitelink Extension Description2":"","Geo Code Status":"","Map Icon":"","Business Icon":"","Address Line 1":"","Address Line 2":"","Postal Code":"","City":"","State Or Province Code":"","Province Name":"","Latitude":"","Longitude":"","StoreCode":"","SundayHours":"","MondayHours":"","TuesdayHours":"","WednesdayHours":"","ThursdayHours":"","FridayHours":"","SaturdayHours":"","SpecialHours":"","LogoPhotoURL":"","GoogleIdentifier":"","Country Code":"","Call Only":"","Call Tracking Enabled":"","Toll Free":"","Alternative Text":"","Media Ids":"","Display Text":"","Layouts":"","Publisher Countries":"","Store Id":"","Product Operator 1":"","Product Operator 2":"","Product Operator 3":"","Product Operator 4":"","Product Operator 5":"","Product 
Operator 6":"","Product Operator 7":"","Product Operator 8":"","Product Condition 1":"","Product Value 1":"","Product Condition 2":"","Product Value 2":"","Product Condition 3":"","Product Value 3":"","Product Condition 4":"","Product Value 4":"","Product Condition 5":"","Product Value 5":"","Product Condition 6":"","Product Value 6":"","Product Condition 7":"","Product Value 7":"","Action Text":"","Callout Text":"","Feed Id":"","Feed Type Id":"","Flyer Name":"","Media Urls":"","Action Name":"","Action Description":"","Corporate Image":"","Media Url":"","Form Headline":"","Form Business Name":"","Form Description":"","Form Policy Url":"","Form Questions":"","Confirmation Message":"","Confirmation Description":"","Confirmation Action":"","Confirmation Url":"","Lead Delivery":"","Lead Emails":"","Lead Webhook Url":"","Lead Webhook Key":"","Price Extension Type":"","Header 1":"","Header 2":"","Header 3":"","Header 4":"","Header 5":"","Header 6":"","Header 7":"","Header 8":"","Price Description 1":"","Price Description 2":"","Price Description 3":"","Price Description 4":"","Price Description 5":"","Price Description 6":"","Price Description 7":"","Price Description 8":"","Final Url 1":"","Final Url 2":"","Final Url 3":"","Final Url 4":"","Final Url 5":"","Final Url 6":"","Final Url 7":"","Final Url 8":"","Final Mobile Url 1":"","Final Mobile Url 2":"","Final Mobile Url 3":"","Final Mobile Url 4":"","Final Mobile Url 5":"","Final Mobile Url 6":"","Final Mobile Url 7":"","Final Mobile Url 8":"","Price 1":"","Price 2":"","Price 3":"","Price 4":"","Price 5":"","Price 6":"","Price 7":"","Price 8":"","Currency Code 1":"","Currency Code 2":"","Currency Code 3":"","Currency Code 4":"","Currency Code 5":"","Currency Code 6":"","Currency Code 7":"","Currency Code 8":"","Price Unit 1":"","Price Unit 2":"","Price Unit 3":"","Price Unit 4":"","Price Unit 5":"","Price Unit 6":"","Price Unit 7":"","Price Unit 8":"","Price Qualifier 1":"","Price Qualifier 2":"","Price Qualifier 
3":"","Price Qualifier 4":"","Price Qualifier 5":"","Price Qualifier 6":"","Price Qualifier 7":"","Price Qualifier 8":"","Promotion Target":"","Discount Modifier":"","Percent Off":"","Money Amount Off":"","Promotion Code":"","Orders Over Amount":"","Occasion":"","Promotion Start":"","Promotion End":"","Currency Code":"","Is Exact":"","Video Id":"","Thumbnail Id":"","Video Status":"","Video Url":"","Business Logo":"","Domain Name":"","Structured Snippet Header":"","Structured Snippet Values":"","AdExtension Header Type":"","Texts":"","FeedLabel":"","Spend":"","Impressions":"","Clicks":"","CTR":"","Avg CPC":"","Avg CPM":"","Avg position":"","Conversions":"","CPA":"","Quality Score":"","Keyword Relevance":"","Landing Page Relevance":"","Landing Page User Experience":"","App Platform":"","App Id":"","Tracking Enabled":"","App Status":"","Error":"","Error Number":"","Field Path":"","Error Detail":"","Is Excluded":"","Parent Criterion Id":"","Audience":"","Audience Id":"","Scope":"","Membership Duration":"","UET Tag Id":"","Description":"my label new label","Remarketing Rule":"","Audience Search Size":"","Audience Network Size":"","Product Audience Type":"","Supported Campaign Types":"","Source Id":"","Domain Language":"","Source":"","Dynamic Description Enabled":"","Dynamic Ad Target Condition 1":"","Dynamic Ad Target Condition 2":"","Dynamic Ad Target Condition 3":"","Dynamic Ad Target Condition Operator 1":"","Dynamic Ad Target Condition Operator 2":"","Dynamic Ad Target Condition Operator 3":"","Dynamic Ad Target Value 1":"","Dynamic Ad Target Value 2":"","Dynamic Ad Target Value 3":"","Label":"integration-test-label","Color":"#F89CAF","Microsoft Click Id":"","Conversion Name":"","Conversion Value":"","Conversion Time":"","Conversion Currency Code":"","Adjustment Value":"","Adjustment Time":"","Adjustment Currency Code":"","Adjustment Type":"","External Attribution Credit":"","External Attribution Model":"","Hashed Phone Number":"","Hashed Email 
Address":"","Transaction Id":"","Maximum Bid":"","Bid Multiplier Source":"","Call To Action":"","Call To Action Language":"","Call To Action Text":"","Headline":"","Long Headline":"","Landscape Image Media Id":"","Square Image Media Id":"","Landscape Logo Media Id":"","Square Logo Media Id":"","Images":"","Impression Tracking Urls":"","Videos":"","Ad Sub Type":"","Ad Strength":"","Ad Strength Action Items":"","Headlines":"","Long Headlines":"","Descriptions":"","Search Theme":"","HotSpots":"","Boost Anchors":"","Asset AI Enhancement Optout":"","Profile":"","Profile Id":"","Traffic Split Percent":"","Base Campaign Id":"","Experiment Campaign Id":"","Experiment Id":"","Experiment Type":"","Asset Group Target Condition 1":"","Asset Group Target Condition 2":"","Asset Group Target Condition 3":"","Asset Group Target Condition Operator 1":"","Asset Group Target Condition Operator 2":"","Asset Group Target Condition Operator 3":"","Asset Group Target Value 1":"","Asset Group Target Value 2":"","Asset Group Target Value 3":"","Category Id":"","Target Option Id":"","Feed Name":"","Custom Attributes":"","Page Feed Ids":"","Target Campaign Id":"","Target Ad Group Id":"","Schedule":"","Disclaimer Ads Enabled":"","Disclaimer Title":"","Disclaimer Name":"","Disclaimer Layout":"","Disclaimer Popup Text":"","Disclaimer Line Text":"","Ad Schedule Use Searcher Time Zone":"","Action Type":"","Combination Rule":"","Url":"","Height":"","Width":"","Aspect Ratio":"","Source Url":"","Thumbnail Url":"","Duration In Milliseconds":"","Video Bit Rate":"","Video File Size":"","Video Format":"","Cashback Percent":"","Cashback Monthly Budget":"","Cashback Scope":"","Personalized Offers Enabled":"","Personalized Coupons Enabled":"","Is Promotions For Brands":"","App Store":"","Multi Media Ad Bid Adjustment":"","Bid Strategy TargetCostPerSale":"","Bid Strategy PercentMaxCpc":"","Bid Strategy CommissionRate":"","Bid Strategy ManualCpi":"","Bid Strategy ManualCpc":"","Goal Id":"","AdCustomizer 
DataType":"","AdCustomizer AttributeValue":"","Smart Listing":"","Url Expansion Opt Out":"","Use MaxClicks":"","Auto Generated Text Assets Opt Out":"","Auto Generated Image Assets Opt Out":"","Cost Per Sale Opt Out":"","Condition 1":"","Condition 2":"","Condition 3":"","Value 1":"","Value 2":"","Value 3":"","Condition Operator 1":"","Condition Operator 2":"","Condition Operator 3":"","Is BSR Enabled":"","BSR Ad Distribution":"","Audience Group Id":"","Audience Group Name":"","Audiences":"","Age Ranges":"","Gender Types":"","Negative Audiences":"","Company Name":"","Job Function":"","Industry":"","Percent Bid":"","Hotel Attribute":"","Hotel Attribute Value":"","Parent Listing Group Id":"","Brand Id":"","Brand Name":"","Brand Url":"","Editorial Status Date":"","Attribution Model Type":"","Conversion Window In Minutes":"","Count Type":"","Exclude From Bidding":"","Goal Category":"","Is Enhanced Conversions Enabled":"","Revenue Type":"","Revenue Value":"","Tracking Status":"","View Through Conversion Window In Minutes":"","Minimum Duration In Second":"","Action Expression":"","Action Operator":"","Category Expression":"","Category Operator":"","Label Expression":"","Label Operator":"","Event Value":"","Event Value Operator":"","Is Externally Attributed":"","Minimum Pages Viewed":"","URL Expression":"","URL Operator":"","Seasonality Adjustment":"","Data Exclusion":"","Device Type":"","Campaign Associations":"","Impression Campaign Id":"","Impression Ad Group Id":"","Entity Type":"","Additional Conversion Value":"","New Customer Acquisition Goal Id":"","New Customer Acquisition Bid Only Mode":"","Conversion Value Rule Value":"","Conversion Value Rule Operator":"","Included Locations":"","Excluded Locations":"","Included Location Intent":"","Excluded Location Intent":"","Square Logos":"","Landscape Logos":"","Palettes":"","Fonts":"","Site List Item Url":"","Brand Color":"","Brand Logo":""},"emitted_at":1748907915040}}

Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/13199

Summary by CodeRabbit

  • New Features
    • Added an option to treat empty cells in CSV files as null values during data import. Users can now configure whether empty strings in CSV data are interpreted as empty strings or as nulls.
  • Tests
    • Introduced new tests to verify correct handling of empty cells in CSV data based on the new configuration option.

@github-actions github-actions bot added the enhancement New feature or request label Jun 2, 2025
@aldogonzalez8 Aldo Gonzalez (aldogonzalez8) changed the title from "feat(cdk): add option to make empty cells None when reading csv" to "feat(cdk): add option to make empty cells None when parsing csv with CsvParser" Jun 2, 2025
@coderabbitai
Contributor

coderabbitai bot commented Jun 2, 2025

📝 Walkthrough


A new boolean property, set_empty_cell_to_none, was introduced to control whether empty cells in CSV data are interpreted as None or as empty strings. This property is added to the schema, model, parser, and is now tested to ensure correct behavior based on its value.
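As a rough illustration of the behavior described in the walkthrough, here is a minimal sketch built on the standard library `csv` module. `SketchCsvParser` is a hypothetical stand-in, not the CDK's `CsvParser`; only the `set_empty_cell_to_none` flag name and its `False` default come from this PR, and the real class may differ in signature and details.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class SketchCsvParser:
    """Hypothetical stand-in for CsvParser; not the CDK implementation."""

    encoding: str = "utf-8"
    delimiter: str = ","
    # Defaults to False so existing connectors keep receiving empty strings.
    set_empty_cell_to_none: bool = False

    def parse(self, data: io.BufferedIOBase) -> Iterator[Dict[str, Optional[str]]]:
        # Decode the byte stream lazily and read it row by row.
        text = io.TextIOWrapper(data, encoding=self.encoding, newline="")
        for row in csv.DictReader(text, delimiter=self.delimiter):
            if self.set_empty_cell_to_none:
                # Replace only exact empty strings; every other value is kept.
                row = {k: (None if v == "" else v) for k, v in row.items()}
            yield row


raw = b"id,name,gender\n1,John,\n"
print(list(SketchCsvParser().parse(io.BytesIO(raw))))
print(list(SketchCsvParser(set_empty_cell_to_none=True).parse(io.BytesIO(raw))))
```

With the flag left at its default, the `gender` cell comes back as `""`; with the flag enabled it comes back as `None`, which serializes to `null` in the emitted record.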

Changes

| File(s) | Change Summary |
|---|---|
| airbyte_cdk/sources/declarative/declarative_component_schema.yaml | Added set_empty_cell_to_none boolean property (default: false) to CsvDecoder schema. |
| airbyte_cdk/sources/declarative/models/declarative_component_schema.py | Added set_empty_cell_to_none optional boolean field (default: False) to CsvDecoder class. |
| airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py | Added set_empty_cell_to_none to CsvParser; updated parsing logic to handle empty strings. |
| airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py | Updated _get_parser to pass set_empty_cell_to_none from model to CsvParser. |
| unit_tests/sources/declarative/decoders/test_composite_decoder.py | Enhanced generate_csv to add empty strings; added tests for set_empty_cell_to_none logic. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CsvDecoder
    participant CsvParser

    User->>CsvDecoder: Configure set_empty_cell_to_none (True/False)
    CsvDecoder->>CsvParser: Instantiate with set_empty_cell_to_none
    CsvParser->>CsvParser: Parse CSV rows
    alt set_empty_cell_to_none is True
        CsvParser->>CsvParser: Convert empty strings to None
    else set_empty_cell_to_none is False
        CsvParser->>CsvParser: Keep empty strings as is
    end
    CsvParser-->>User: Yield parsed rows

Would you like me to create a similar sequence diagram illustrating the test flow as well, or does this cover your needs? Wdyt?


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

54-62: ⚠️ Potential issue

Fix the CSV generation logic to include the gender field.

When add_empty_strings=True, the gender field is added to the data but not to the fieldnames in the CSV writer, causing a ValueError during CSV writing. Would you like to update the fieldnames dynamically? wdyt?

-    output = StringIO()
-    writer = csv.DictWriter(output, fieldnames=["id", "name", "age"], delimiter=delimiter)
-    writer.writeheader()
-    for row in data:
-        writer.writerow(row)
+    output = StringIO()
+    fieldnames = ["id", "name", "age"]
+    if add_empty_strings:
+        fieldnames.append("gender")
+    writer = csv.DictWriter(output, fieldnames=fieldnames, delimiter=delimiter)
+    writer.writeheader()
+    for row in data:
+        writer.writerow(row)
🧰 Tools
🪛 GitHub Actions: Pytest (Fast)

[error] 62-62: ValueError: dict contains fields not in fieldnames: 'gender' during CSV writing in test_composite_raw_decoder_parse_empty_strings
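The failure reported above can be reproduced with the standard library alone. The sketch below uses a made-up row and a hypothetical `write_csv` helper rather than the repository's `generate_csv`. It relies on `csv.DictWriter` raising `ValueError` by default (`extrasaction="raise"`) when a row contains a key that is missing from `fieldnames`.

```python
import csv
from io import StringIO

# Illustrative row; the "gender" key mimics the extra empty field added by the test helper.
rows = [{"id": 1, "name": "John", "age": 28, "gender": ""}]


def write_csv(fieldnames):
    """Hypothetical helper: serialize `rows` using the given header."""
    output = StringIO()
    writer = csv.DictWriter(output, fieldnames=fieldnames, delimiter=",")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return output.getvalue()


try:
    # "gender" is in the row but not in fieldnames, so DictWriter refuses to write it.
    write_csv(["id", "name", "age"])
except ValueError as exc:
    print(f"ValueError: {exc}")

# Declaring the extra column makes the same rows writable.
fixed = write_csv(["id", "name", "age", "gender"])
print(fixed)
```

Extending `fieldnames` whenever the extra column is present, as the suggested diff does, keeps the header and the rows in sync.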

🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

3633-3635: Add description for set_empty_cell_to_none and update CsvDecoder description
I noticed the new flag doesn’t have its own description and isn’t mentioned in the top-level CsvDecoder description. Could we add both for clarity? wdyt?

Example diff:

@@ -3620,7 +3620,8 @@
   description: "Select 'CSV' for response data that is formatted as CSV (comma-separated values). Can specify an encoding (default: 'utf-8') and a delimiter (default: ',')."
-  type: object
+  description: "Select 'CSV' for response data that is formatted as CSV (comma-separated values). Can specify an encoding (default: 'utf-8'), a delimiter (default: ','), and optionally treat empty cells as None when `set_empty_cell_to_none` is enabled."
+  type: object
   required:
     - type
@@ -3632,3 +3632,6 @@
     delimiter:
       type: string
       default: ","
+    set_empty_cell_to_none:
+      type: boolean
+      description: Interpret empty CSV cells as null values instead of empty strings.
+      default: false
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

2649-2653: Great integration of set_empty_cell_to_none flag!

The new boolean is correctly passed from the CsvDecoderModel to CsvParser, enabling the option to treat empty CSV cells as None.

Would you consider adding a brief docstring or inline comment above this branch in _get_parser to highlight the new behavior for future maintainers? wdyt?

unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

274-274: Consider adding timeout to requests.get call.

Static analysis flagged that the requests.get call is missing a timeout argument, which could cause the test to hang. Would you like to add a reasonable timeout for test stability? wdyt?

-    response = requests.get("https://airbyte.io/", stream=True)
+    response = requests.get("https://airbyte.io/", stream=True, timeout=10)
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 274-274: Missing timeout argument for method 'requests.get' can cause your program to hang indefinitely

(W3101)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2841384 and f2ff9fa.

📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
  • airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (2 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
  • unit_tests/sources/declarative/decoders/test_composite_decoder.py (2 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
unit_tests/sources/declarative/decoders/test_composite_decoder.py

[convention] 268-268: Missing function or method docstring

(C0116)


[warning] 274-274: Missing timeout argument for method 'requests.get' can cause your program to hang indefinitely

(W3101)


[error] 285-285: Undefined variable 'expected_recordp'

(E0602)


[warning] 284-284: Unused variable 'expected_record'

(W0612)

🪛 GitHub Actions: Pytest (Fast)
unit_tests/sources/declarative/decoders/test_composite_decoder.py

[error] 62-62: ValueError: dict contains fields not in fieldnames: 'gender' during CSV writing in test_composite_raw_decoder_parse_empty_strings

⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

1386-1386:

Details

✅ Verification successful

Ensure persistence of generated models via YAML schema update
Since this file is generated by datamodel-codegen, manual edits here will be overwritten on the next generation. Have we updated the source declarative_component_schema.yaml to include the new set_empty_cell_to_none property and re-run the generator? wdyt?


🏁 Script executed:

#!/bin/bash
set -eo pipefail

echo "=== Python file header (first 20 lines) ==="
head -n 20 airbyte_cdk/sources/declarative/models/declarative_component_schema.py || true
echo

echo "=== Locate YAML schema files named declarative_component_schema.yaml/yml ==="
find airbyte_cdk -type f \( -iname "declarative_component_schema.yaml" -o -iname "declarative_component_schema.yml" \) || echo "No declarative_component_schema YAML files found"
echo

echo "=== Searching for 'set_empty_cell_to_none' in all YAML files ==="
rg -n "set_empty_cell_to_none" -g "*.yaml" -g "*.yml" || echo "Property 'set_empty_cell_to_none' not found in any YAML schemas"

Length of output: 1313


🏁 Script executed:

#!/bin/bash
set -eo pipefail

echo "=== Context around 'set_empty_cell_to_none' in YAML schema ==="
rg -n -C5 "set_empty_cell_to_none" airbyte_cdk/sources/declarative/declarative_component_schema.yaml

Length of output: 596


No further action needed – schema already updated
The declarative_component_schema.yaml (around line 3633) already defines set_empty_cell_to_none, so regenerating the models will persist this change.

airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (2)

106-106: LGTM! Good backward compatibility consideration.

The new set_empty_cell_to_none attribute is well-designed with a sensible default value of False to maintain backward compatibility. Nice work!


125-127: Clean and efficient implementation!

The conditional logic to convert empty strings to None is well-implemented using a dictionary comprehension. The approach is both readable and efficient. The logic correctly preserves all non-empty values while converting only empty strings to None when the flag is enabled.
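To make "only empty strings" concrete, here is a small sketch of such a comprehension. It assumes the comparison is against the exact empty string; the helper name `empty_cells_to_none` and the sample row are hypothetical, and the expression merged in the PR may be written differently.

```python
from typing import Dict, Optional


def empty_cells_to_none(row: Dict[str, str]) -> Dict[str, Optional[str]]:
    # Assumption: compare against "" rather than relying on truthiness,
    # so values such as "0" or a single space are left untouched.
    return {key: (None if value == "" else value) for key, value in row.items()}


row = {"Id": "some_id", "Parent Id": "", "Clicks": "0", "Note": " "}
print(empty_cells_to_none(row))
```

Under that assumption, `"Parent Id"` becomes `None` while `"0"` and `" "` pass through unchanged, so whitespace-only cells would still need separate handling if a connector wants them treated as missing.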

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

269-291: Great test coverage for the new feature! Could you address a couple of minor improvements?

The test logic is well-structured and correctly validates both behaviors of the set_empty_cell_to_none flag. I noticed the syntax error from the previous review has been fixed - nice work!

However, there are a couple of improvements that would be helpful:

  1. Could you add a docstring explaining what this test does? Something like documenting that it tests whether empty CSV cells are correctly handled based on the flag, wdyt?

  2. The static analysis tool suggests adding a timeout to the requests.get call to prevent hanging - would you mind adding that for robustness?

@pytest.mark.parametrize("set_empty_cell_to_none", [True, False])
def test_composite_raw_decoder_parse_empty_strings(requests_mock, set_empty_cell_to_none: bool):
+    """Test that empty CSV cells are correctly handled based on the set_empty_cell_to_none flag.
+    
+    When set_empty_cell_to_none is True, empty cells should be parsed as None.
+    When False, they should remain as empty strings.
+    """
    requests_mock.register_uri(
        "GET",
        "https://airbyte.io/",
        content=generate_csv(should_compress=False, add_empty_strings=True),
    )
-    response = requests.get("https://airbyte.io/", stream=True)
+    response = requests.get("https://airbyte.io/", stream=True, timeout=30)
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 270-270: Missing function or method docstring

(C0116)


[warning] 276-276: Missing timeout argument for method 'requests.get' can cause your program to hang indefinitely

(W3101)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2ff9fa and af60064.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/decoders/test_composite_decoder.py (2 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
unit_tests/sources/declarative/decoders/test_composite_decoder.py

[convention] 270-270: Missing function or method docstring

(C0116)


[warning] 276-276: Missing timeout argument for method 'requests.get' can cause your program to hang indefinitely

(W3101)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Build Python Package
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

44-61: Nice implementation of the CSV generation enhancement!

The addition of the add_empty_strings parameter and its implementation looks solid. The logic correctly adds empty "gender" fields to the data and updates the fieldnames accordingly. Good use of a sensible default to maintain backward compatibility.

Collaborator


:shipit:

@aldogonzalez8 Aldo Gonzalez (aldogonzalez8) merged commit 8b534b0 into main Jun 3, 2025
32 checks passed
@aldogonzalez8 Aldo Gonzalez (aldogonzalez8) deleted the ac8/add-option-to-set-empty-cells-to-none branch June 3, 2025 12:07