Skip to content

Conversation

@richardwooding
Copy link
Contributor

Summary

Fixes #21 - Adds comprehensive field name validation to prevent SQL injection vulnerabilities in the CEL-to-SQL conversion pipeline.

Changes Made

1. Enhanced Field Name Validation (utils.go)

  • ✅ Enforces PostgreSQL's 63-character identifier length limit
  • ✅ Validates identifier format (must start with letter/underscore, alphanumeric + underscore only)
  • ✅ Rejects 60+ reserved SQL keywords (case-insensitive)
  • ✅ Prevents empty field names
  • ✅ Provides clear, actionable error messages

2. Added Validation to Conversion Pipeline (cel2sql.go)

  • visitSelect(): Validates field names in select expressions (e.g., obj.fieldname)
  • visitIdent(): Validates identifier names to prevent reserved keywords in SQL output

3. Comprehensive Test Coverage

  • utils_test.go: 43 unit tests covering all validation rules
  • field_validation_test.go: Integration tests through full CEL-to-SQL pipeline
  • ✅ Tests for: valid names, SQL injection attempts, reserved keywords, format violations, length limits, edge cases

Security Impact

This fix provides defense-in-depth protection against SQL injection:

  1. Blocks malicious patterns: Rejects field names containing semicolons, quotes, comments, etc.
  2. Prevents keyword confusion: Reserved SQL keywords like SELECT, DROP, UNION cannot be used as unquoted identifiers
  3. Enforces PostgreSQL limits: Prevents identifier truncation issues
  4. Multiple validation points: Validates at both identifier and field selection stages

Test Results

✅ All tests passing
✅ 56.5% code coverage in main package
✅ 88.6% coverage in pg package
✅ 43 validation test cases
✅ 7 integration test scenarios

Examples

Before (vulnerable):

// Would accept dangerous field names
celExpr := `obj.select == "test"`  // Reserved keyword
celExpr := `obj.field; DROP TABLE users`  // SQL injection attempt

After (protected):

// Now properly rejected with clear errorsfield name "select" is a reserved SQL keyword and cannot be used without quotinginvalid field name "field; DROP": must start with a letter or underscore and contain only alphanumeric characters and underscores

Testing Checklist

  • Unit tests pass (go test ./...)
  • Integration tests pass
  • Code coverage maintained
  • All linting checks pass
  • No breaking changes to public API

🤖 Generated with Claude Code

richardwooding and others added 2 commits October 22, 2025 10:54
This commit addresses issue #21 by implementing comprehensive field name
validation at multiple points in the CEL-to-SQL conversion pipeline.

Changes:
- Enhanced validateFieldName() in utils.go with proper PostgreSQL limits:
  * Maximum identifier length: 63 characters (PostgreSQL NAMEDATALEN-1)
  * Format validation: must start with letter/underscore, alphanumeric+underscore only
  * Reserved keyword checking: 60+ SQL keywords now rejected
  * Empty string validation

- Added validation calls in cel2sql.go:
  * visitSelect(): validates field names in select expressions
  * visitIdent(): validates identifier names to prevent reserved keywords in SQL

- Comprehensive test coverage:
  * utils_test.go: 43 unit tests for validateFieldName()
  * field_validation_test.go: Integration tests for full CEL-to-SQL pipeline
  * Tests cover: valid names, SQL injection attempts, reserved keywords,
    format violations, length limits, and edge cases

Security Impact:
- Prevents SQL injection through malicious field names
- Blocks use of reserved SQL keywords as unquoted identifiers
- Enforces PostgreSQL identifier naming conventions
- Defense-in-depth approach with validation at multiple pipeline stages

All tests passing with 56.5% coverage in main package.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds comprehensive field name validation to prevent SQL injection vulnerabilities in the CEL-to-SQL conversion pipeline. The changes implement defense-in-depth security by validating field names and identifiers at multiple points in the conversion process.

Key changes:

  • Added validateFieldName() function with PostgreSQL identifier rules, length limits, and reserved keyword checking
  • Integrated validation into visitSelect() and visitIdent() conversion methods
  • Added 43 unit tests and 7 integration tests covering validation rules, SQL injection patterns, and edge cases

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
utils.go Implements field name validation with PostgreSQL identifier rules, 63-character limit enforcement, and reserved keyword checking
cel2sql.go Integrates validation into identifier and select expression conversion to block invalid names early
utils_test.go Comprehensive unit tests covering valid names, SQL injection attempts, format violations, length limits, and reserved keywords
field_validation_test.go Integration tests verifying validation works through the full CEL-to-SQL conversion pipeline


var (
// fieldNameRegexp validates PostgreSQL identifier format
fieldNameRegexp = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
Copy link

Copilot AI Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The regexp is compiled at package initialization time but could be replaced with a more efficient character-by-character validation. For a security-critical validation function called frequently, consider replacing regex with explicit character validation loops to avoid regex engine overhead and improve performance.

Copilot uses AI. Check for mistakes.
Comment on lines +1081 to +1084
fieldName := sel.GetField()
if err := validateFieldName(fieldName); err != nil {
return fmt.Errorf("invalid field name in select expression: %w", err)
}
Copy link

Copilot AI Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field name is retrieved and validated, but sel.GetField() is still called multiple times later in the function (lines 1119, 1133). Consider reusing the fieldName variable throughout to avoid redundant method calls and improve code clarity.

Copilot uses AI. Check for mistakes.
Comment on lines 1123 to 1125
con.str.WriteString("->>'")
con.str.WriteString(fieldName)
con.str.WriteString("'")
Copy link

Copilot AI Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field names are written directly into SQL strings without escaping single quotes. While validateFieldName() prevents most injection patterns, if a field name containing a single quote somehow passes validation, it could break the SQL syntax. Consider adding explicit quote escaping or verifying the validation regex prevents single quotes.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

@richardwooding richardwooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed review! I'll address each point:

Comment 1 (Regex Performance):
Good suggestion for future optimization. The current regex is compiled once at package initialization and the pattern is simple (^[a-zA-Z_][a-zA-Z0-9_]*$), so the overhead is minimal. Since this is a security-critical fix being backported, I'd prefer to keep the clear, maintainable regex implementation. If profiling shows this is a bottleneck, we can optimize to character-by-character validation in a follow-up PR.

Comment 2 (Reuse fieldName variable):
Already done! The fieldName variable extracted at line 1081 is reused throughout the function at lines 1093, 1104, 1119, 1124, 1126, and 1133. The code doesn't call sel.GetField() again after the initial extraction and validation.

Comment 3 (Single quote escaping):
The validation regex ^[a-zA-Z_][a-zA-Z0-9_]*$ only allows letters, digits, and underscores, so single quotes are already prevented by the validation. The regex explicitly rejects any character that isn't alphanumeric or underscore, making quote injection impossible. This is by design - PostgreSQL identifiers that need special characters (including quotes) must be quoted with double quotes, which we intentionally don't support here to maintain security.

@richardwooding richardwooding merged commit b7cb500 into main Oct 22, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing Field Name Validation in visitSelect() and visitIdent() Creates SQL Injection Risk

1 participant