@gmuskan95

When values containing single quotes (e.g., "peter's") were used in validValues or missingValues, Soda would generate invalid SQL like WHERE field IN ('peter's'), causing syntax errors.

This fix escapes single quotes by doubling them (SQL standard) before generating SodaCL YAML, resulting in valid SQL: WHERE field IN ('peter''s').
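For illustration, the doubling step could look roughly like the sketch below. The helper names `escape_sql_single_quotes` and `build_in_clause` are hypothetical and are not the functions touched by this PR; the sketch only shows the escaping rule and the resulting clause.

```python
def escape_sql_single_quotes(value: str) -> str:
    """Double every single quote so the value can sit inside a '...' SQL literal."""
    return value.replace("'", "''")


def build_in_clause(field: str, values: list[str]) -> str:
    """Hypothetical helper: render a WHERE ... IN (...) clause from raw string values."""
    quoted = ", ".join(f"'{escape_sql_single_quotes(v)}'" for v in values)
    return f"WHERE {field} IN ({quoted})"


print(build_in_clause("field", ["peter's"]))
# prints: WHERE field IN ('peter''s')
```

Values without a single quote pass through unchanged, so the escaping is safe to apply to every entry.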

Checklist

  • Tests pass - Added 2 new tests and all 9 tests in test_data_contract_checks.py pass
  • ruff format - Applied formatting for the files modified only
  • README.md updated - Not needed (internal bug fix)
  • CHANGELOG.md entry added - Added under "Unreleased > Fixed"

Changes

  • Modified check_property_invalid_values() to escape single quotes in validValues
  • Modified check_property_missing_values() to escape single quotes in missingValues
  • Added tests to verify the fix

@Peterdha

Hi Muskan, thanks for taking this up :)
However, since we are using Databricks, escaping a single quote requires a backslash, so 'peter's' should be, in my case, 'peter\'s'. See also https://spark.apache.org/docs/latest/sql-ref-literals.html#string-literal
I am also wondering whether the YAML quote-escaping rules are going to interfere.
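To make the dialect concern concrete, here is a rough sketch of what dialect-aware escaping might look like. The function name `escape_string_literal` and the dialect labels `"standard"` and `"databricks"` are assumptions for illustration only; they are not part of Soda or of this PR.

```python
def escape_string_literal(value: str, dialect: str = "standard") -> str:
    """Escape a value for use inside a single-quoted SQL string literal.

    The `dialect` labels are hypothetical, chosen only for this sketch.
    """
    if dialect == "databricks":
        # Backslash is the escape character here, so escape existing
        # backslashes first, then the single quotes.
        return value.replace("\\", "\\\\").replace("'", "\\'")
    # Default: double the single quote.
    return value.replace("'", "''")


print(escape_string_literal("peter's"))                # peter''s
print(escape_string_literal("peter's", "databricks"))  # peter\'s
```

Whether something like this is needed depends on whether the generated SQL is passed through as written or translated per dialect.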

@Peterdha

Unless, of course, Soda takes care of translating to SQL dialects...
