Skip to content

Conversation

@devin-ai-integration
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot commented Jul 30, 2025

Add disable_samples column configuration flag

Summary

This PR implements a new disable_samples configuration flag that allows users to prevent sample collection for specific columns at the column level. When disable_samples: true is set on a column in schema.yml, that column will not be sampled during test failures, regardless of any PII tags or other configuration.

Key Changes:

  • Added is_sampling_disabled_for_column() macro to check column-level configuration
  • Modified query_test_result_rows() to skip sampling when disable_samples: true is configured
  • Added comprehensive integration tests covering prevent sampling, normal sampling, and PII tag override scenarios

Usage Example:

models:
  - name: user_data
    columns:
      - name: password_hash
        config:
          disable_samples: true  # Never sample this column
      - name: email
        config:
          tags: ['pii']  # Sample exclusion via PII tags
      - name: login_count
        # Will be sampled normally

Review & Testing Checklist for Human

  • Verify column config access pattern works - Test with a real schema.yml file to ensure the graph node access path parent_model.get('columns', {}).get(test_column_name, {}).get('config', {}) is correct
  • End-to-end functionality test - Create a failing test with disable_samples: true and verify no samples are collected in test_result_rows table
  • Regression testing - Verify existing sampling functionality still works for columns without the disable_samples flag
  • Integration test validation - Run the new integration tests to ensure they pass and actually test the intended functionality
  • Edge case testing - Test behavior with missing column names, non-existent models, and different test types beyond not_null

Recommended Test Plan:

  1. Create a model with mixed column configurations (some with disable_samples: true, some without)
  2. Run tests that fail and verify only the appropriate columns have samples collected
  3. Test with various dbt test types (not_null, unique, relationships, etc.)
  4. Verify the feature works with both generic and singular tests

Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    Schema["schema.yml<br/>disable_samples: true"]:::context
    TestMat["macros/edr/materializations/<br/>test/test.sql"]:::major-edit
    QueryRows["query_test_result_rows()"]:::major-edit
    CheckDisabled["is_sampling_disabled_for_column()"]:::major-edit
    GraphNodes["dbt graph.nodes<br/>column config access"]:::context
    IntTests["integration_tests/tests/<br/>test_disable_samples_config.py"]:::major-edit
    TestResults["test_result_rows table<br/>(sample storage)"]:::context

    Schema --> GraphNodes
    TestMat --> QueryRows
    QueryRows --> CheckDisabled
    CheckDisabled --> GraphNodes
    QueryRows --> TestResults
    IntTests --> TestMat
    
    subgraph Legend
        L1[Major Edit]:::major-edit
        L2[Minor Edit]:::minor-edit  
        L3[Context/No Edit]:::context
    end

    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF
Loading

Notes

  • Testing Limitation: Local integration tests failed due to environment setup issues, so the implementation couldn't be fully verified locally. CI testing will be critical to validate functionality.
  • Implementation Risk: The column configuration access pattern assumes a specific structure in dbt's graph nodes that may need adjustment based on actual dbt behavior.
  • Session Info: Implemented by Devin AI for @arbiv in session https://app.devin.ai/sessions/3838f375f01b48338b9937dbd22776e3

Link to Devin run: https://app.devin.ai/sessions/3838f375f01b48338b9937dbd22776e3
Requested by: Yosef Arbiv (@arbiv)

Summary by CodeRabbit

  • New Features

    • Added support for disabling sample row collection in test results when the "disable_samples" configuration is enabled for specific columns.
    • Sample rows are no longer collected or displayed for tests on columns with "disable_samples" set to true, including columns tagged as "pii".
  • Tests

    • Introduced new integration tests to verify correct behavior of the "disable_samples" configuration in various scenarios.

- Add is_sampling_disabled_for_column macro to check column config
- Modify query_test_result_rows to skip sampling when disable_samples=true
- Add integration tests for disable_samples functionality
- Test prevents sampling, allows normal sampling, and overrides PII tags

Co-Authored-By: Yosef Arbiv <[email protected]>
@devin-ai-integration
Copy link
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@coderabbitai
Copy link

coderabbitai bot commented Jul 30, 2025

Walkthrough

A new integration test module and supporting macro logic were introduced to verify and enforce the "disable_samples" configuration for dbt test columns. The tests check that sample row collection is correctly enabled or disabled based on this setting, including when columns are tagged as PII. Macro logic now conditionally skips sampling queries when the configuration is set.

Changes

Cohort / File(s) Change Summary
Integration Tests for Sampling Configuration
integration_tests/tests/test_disable_samples_config.py
Introduced three pytest integration tests to verify "disable_samples" config behavior, including interaction with PII-tagged columns.
Macro Logic for Disabling Sampling
macros/edr/materializations/test/test.sql
Added macro is_sampling_disabled_for_column and updated sampling logic to conditionally skip sampling queries based on config.

Sequence Diagram(s)

sequenceDiagram
    participant TestRunner as Test Runner
    participant DBT as dbt
    participant Macro as query_test_result_rows
    participant Config as is_sampling_disabled_for_column

    TestRunner->>DBT: Run test with disable_samples config
    DBT->>Macro: Call query_test_result_rows
    Macro->>Config: Check is_sampling_disabled_for_column(flattened_test)
    alt Sampling disabled
        Config-->>Macro: true
        Macro-->>DBT: Skip sample query, log debug
    else Sampling enabled
        Config-->>Macro: false
        Macro->>DBT: Execute sample query
    end
    DBT-->>TestRunner: Return test results (with/without samples)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Poem

In fields of tests, where samples grow,
A toggle tells them when to show.
If "disable_samples" whispers "no,"
The rows stay hidden, none to show.
But flip the switch, let samples free—
Five nulls parade for all to see!
A rabbit cheers this clever flow. 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc8fba5 and 6841531.

📒 Files selected for processing (2)
  • integration_tests/tests/test_disable_samples_config.py (1 hunks)
  • macros/edr/materializations/test/test.sql (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
integration_tests/tests/test_disable_samples_config.py (2)
integration_tests/tests/conftest.py (2)
  • dbt_project (88-89)
  • test_id (108-111)
integration_tests/tests/dbt_project.py (2)
  • DbtProject (42-279)
  • run_query (51-58)
🔇 Additional comments (7)
macros/edr/materializations/test/test.sql (2)

114-118: LGTM! Clean implementation of the disable_samples feature.

The conditional logic correctly checks for the disable_samples configuration and returns early with appropriate debug logging. The placement after the existing ignore_passed_tests check maintains logical flow.


129-145: is_sampling_disabled_for_column macro validated against DBT graph structure

Verified that elementary.get_node returns a node whose columns map contains entries with a config dict (as seen in flatten_column in macros/edr/dbt_artifacts/upload_dbt_columns.sql). The path

parent_model.get('columns', {}).get(test_column_name, {}).get('config', {})

and use of safe_get_with_default align with existing, tested patterns. No changes required.

integration_tests/tests/test_disable_samples_config.py (5)

8-20: LGTM! Well-structured query for retrieving test sample data.

The query correctly joins the elementary test results with sample rows and uses proper dbt ref() macros for table references. The logic to get the latest test result by created_at is appropriate.


23-54: Excellent test coverage for the primary disable_samples functionality.

The test effectively validates that when disable_samples: True is configured, no sample rows are collected even when the test fails. The test data setup with null values ensures the not_null test will fail, making this a robust end-to-end test.


57-87: Good regression test ensuring normal sampling behavior is preserved.

This test validates that when disable_samples: False, the sampling functionality works as expected, collecting the configured number of sample rows with correct content. The assertion on sample content adds valuable verification.


90-121: Important test for feature interaction and precedence.

This test effectively validates that the disable_samples configuration correctly overrides PII tag behavior, ensuring the new feature takes precedence as intended. This covers a critical edge case mentioned in the PR objectives.


1-122: Comprehensive test coverage with well-structured integration tests.

The test file provides excellent coverage of the disable_samples feature with three focused tests covering the main use case, regression testing, and feature interaction. The consistent test structure and data setup make the tests maintainable and reliable.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch devin/1722340304-disable-samples-column-config

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions
Copy link
Contributor

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@arbiv arbiv closed this Jul 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants