
fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries. #97

Open
tomerqodo wants to merge 4 commits into greptile_combined_20260121_qodo_grep_cursor_copilot_1_base_fixapi_refactors_the_sql_like_pattern_escaping_logic_to_use_a_centralized__utility_function_ensuring_consistent_and_secure_handling_of_specia from greptile_combined_20260121_qodo_grep_cursor_copilot_1_head_fixapi_refactors_the_sql_like_pattern_escaping_logic_to_use_a_centralized__utility_function_ensuring_consistent_and_secure_handling_of_specia

Conversation

@tomerqodo

Benchmark PR from qodo-benchmark#431

NeatGuyCoding and others added 4 commits January 21, 2026 15:54
…ralized

utility function, ensuring consistent and secure handling of special characters
across all database queries.

Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
…logic

Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
@greptile-apps

greptile-apps bot commented Jan 21, 2026

Greptile Summary

Refactored SQL LIKE pattern escaping to use a centralized escape_like_pattern() utility function in api/libs/helper.py, replacing inline escaping logic across 13 service files. The utility properly escapes backslash, percent, and underscore characters to prevent SQL injection via LIKE wildcards.

Major changes:

  • Added escape_like_pattern() utility with comprehensive documentation
  • Applied escaping to search queries in annotation, app, conversation, dataset, tag, and workflow services
  • Added unit tests covering all special characters and edge cases
  • Added integration tests verifying SQL injection prevention
  • Updated all LIKE queries to include escape="\\" parameter (except where inconsistent)
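
The escaping order named in the summary can be illustrated with a minimal sketch. It is assembled from the `helper.py` fragments quoted in the review comments, not copied from the PR; the type hints and the `escape_wrong_order` counter-example are assumptions added for illustration.

```python
def escape_like_pattern(pattern: str) -> str:
    """Escape backslash, percent, and underscore for use in a LIKE pattern."""
    if not pattern:
        return pattern
    # Backslash must be escaped first so that the backslashes added for
    # % and _ are not escaped a second time.
    return pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def escape_wrong_order(pattern: str) -> str:
    """Hypothetical counter-example: the backslash is escaped last."""
    return pattern.replace("%", "\\%").replace("_", "\\_").replace("\\", "\\\\")


print(escape_like_pattern("50%_off"))  # prints 50\%\_off
print(escape_wrong_order("50%_off"))   # prints 50\\%\\_off
```

With the wrong order, `\\%` reads as an escaped backslash followed by an unescaped wildcard, so the `%` would act as a wildcard again.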

Issues found:

  • iris_vector.py uses wrong ESCAPE character ('|' instead of '\')
  • dataset_retrieval.py missing escape parameter on one notlike() call
  • helper.py uses camelCase variable name instead of snake_case
  • workflow_app_service.py removed unicode escape encoding without explanation

Confidence Score: 3/5

  • This PR has critical bugs that will cause incorrect query behavior in production
  • Score reflects two critical logical errors that will cause runtime failures: wrong ESCAPE character in iris_vector.py means backslash escaping won't work, and missing escape parameter in dataset_retrieval.py creates inconsistent behavior. Additionally, naming convention violations require fixes. The core refactoring approach is sound and test coverage is excellent, but the implementation errors need correction before merging.
  • Pay close attention to api/core/rag/datasource/vdb/iris/iris_vector.py and api/core/rag/retrieval/dataset_retrieval.py - both contain logical errors that will cause incorrect query behavior

Important Files Changed

| Filename | Overview |
|---|---|
| api/libs/helper.py | Added centralized `escape_like_pattern()` utility function with comprehensive documentation and proper escaping order |
| api/core/rag/datasource/vdb/iris/iris_vector.py | Uses incorrect ESCAPE character (pipe instead of backslash) |
| api/core/rag/retrieval/dataset_retrieval.py | Missing `escape` parameter on `notlike()` call at line 1207; other similar calls include it |
| api/services/workflow_app_service.py | Removed unicode escape logic and applied proper LIKE escaping, but removed encoding/decoding without clear justification |
| api/tests/unit_tests/libs/test_helper.py | Comprehensive unit tests cover all special characters, edge cases, and correct escaping order |
| api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py | Applied escaping but uses quadruple backslash `ESCAPE '\\\\'`; verify this matches Clickzetta SQL syntax |
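
The quadruple backslash flagged for Clickzetta comes down to layered string escaping. The sketch below only shows how many backslashes survive the Python layer. Whether Clickzetta's SQL parser then treats backslash as an escape inside string literals is an assumption to check against its documentation, and the `clause_*` names are illustrative only.

```python
# Each pair of backslashes in a non-raw Python literal yields one character.
clause_double = "ESCAPE '\\'"    # SQL text received by the database: ESCAPE '\'
clause_quad = "ESCAPE '\\\\'"    # SQL text received by the database: ESCAPE '\\'

print(clause_double)
print(clause_quad)

# If the SQL dialect also treats backslash as an escape inside string
# literals (an assumption here), '\\' in the SQL text denotes a single
# backslash, while '\' would escape the closing quote instead.
```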

Sequence Diagram

sequenceDiagram
    participant Client
    participant Controller
    participant Service
    participant Helper
    participant Database

    Client->>Controller: Search request with keyword
    Controller->>Service: search(keyword)
    Service->>Helper: escape_like_pattern(keyword)
    Note over Helper: Escape backslash first<br/>Then escape % and _
    Helper-->>Service: escaped_keyword
    Service->>Service: Build SQL query with<br/>escaped_keyword and<br/>ESCAPE '\\'
    Service->>Database: Execute LIKE query
    Note over Database: Database interprets<br/>\ as escape char<br/>Treats %, _ as literals
    Database-->>Service: Query results
    Service-->>Controller: Filtered results
    Controller-->>Client: Response

@greptile-apps greptile-apps bot left a comment


18 files reviewed, 5 comments


  SELECT TOP {top_k} id, text, meta
  FROM {self.schema}.{self.table_name}
- WHERE text LIKE ?
+ WHERE text LIKE ? ESCAPE '|'

logic: Wrong ESCAPE character - should be '\' not '|'

The escape character must match what the utility function uses (backslash). Using '|' means the backslash escapes won't be recognized.

Suggested change
- WHERE text LIKE ? ESCAPE '|'
+ WHERE text LIKE ? ESCAPE '\\'
Prompt To Fix With AI
This is a comment left during a code review.
Path: api/core/rag/datasource/vdb/iris/iris_vector.py
Line: 298:298

Comment:
**logic:** Wrong ESCAPE character - should be '\\' not '|'

The escape character must match what the utility function uses (backslash). Using '|' means the backslash escapes won't be recognized.

```suggestion
                    WHERE text LIKE ? ESCAPE '\\'
```

How can I resolve this? If you propose a fix, please make it concise.
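
The effect of a mismatched ESCAPE character can be reproduced in a few lines. This is a sketch that uses an in-memory SQLite database as a stand-in, so it does not verify IRIS behavior; the `docs` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (text TEXT)")
conn.executemany("INSERT INTO docs VALUES (?)", [("50% off",), ("500 units",)])

# Keyword "50%" after backslash-based escaping, wrapped in wildcards.
pattern = "%" + "50\\%" + "%"

# ESCAPE '\' matches the escaping scheme: \% is a literal percent sign.
matching = conn.execute(
    "SELECT text FROM docs WHERE text LIKE ? ESCAPE '\\'", (pattern,)
).fetchall()

# ESCAPE '|' does not: the backslash becomes a literal character to match.
mismatched = conn.execute(
    "SELECT text FROM docs WHERE text LIKE ? ESCAPE '|'", (pattern,)
).fetchall()

print(matching)    # [('50% off',)]
print(mismatched)  # []
```

In this stand-in, the mismatched clause silently returns no rows because no stored text contains a literal backslash.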

  case "not contains":
-     filters.append(json_field.notlike(f"%{value}%"))
+     escaped_value = escape_like_pattern(str(value))
+     filters.append(json_field.notlike(f"%{escaped_value}%"))

logic: Missing escape parameter - inconsistent with other cases

All other LIKE calls include escape="\\", but this notlike() call doesn't.

Suggested change
- filters.append(json_field.notlike(f"%{escaped_value}%"))
+ filters.append(json_field.notlike(f"%{escaped_value}%", escape="\\"))
Prompt To Fix With AI
This is a comment left during a code review.
Path: api/core/rag/retrieval/dataset_retrieval.py
Line: 1207:1207

Comment:
**logic:** Missing `escape` parameter - inconsistent with other cases

All other LIKE calls include `escape="\\"`, but this `notlike()` call doesn't.

```suggestion
                filters.append(json_field.notlike(f"%{escaped_value}%", escape="\\"))
```

How can I resolve this? If you propose a fix, please make it concise.
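
Why the missing `escape` argument matters depends on the backend. The sketch below uses raw SQL against in-memory SQLite, which has no default LIKE escape character, as an assumed stand-in for the real database; the `meta` table and its values are hypothetical. Backends that default to backslash as the escape character, such as PostgreSQL, would behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (value TEXT)")
conn.executemany("INSERT INTO meta VALUES (?)", [("a_b",), ("axb",)])

pattern = "%a\\_b%"  # escaped form of the value "a_b", wrapped in wildcards

# With ESCAPE, \_ is a literal underscore, so only "a_b" is excluded.
with_escape = conn.execute(
    "SELECT value FROM meta WHERE value NOT LIKE ? ESCAPE '\\' ORDER BY value",
    (pattern,),
).fetchall()

# Without ESCAPE, the backslash is literal text that no row contains,
# so nothing matches the pattern and NOT LIKE keeps every row.
without_escape = conn.execute(
    "SELECT value FROM meta WHERE value NOT LIKE ? ORDER BY value",
    (pattern,),
).fetchall()

print(with_escape)     # [('axb',)]
print(without_escape)  # [('a_b',), ('axb',)]
```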

    if not pattern:
        return pattern
    # Escape backslash first, then percent and underscore
    escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

syntax: Variable name violates snake_case convention

Per compliance rule #16, Python variables should use snake_case naming.

Suggested change
- escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
+ escaped_pattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

Context Used: Context from dashboard - AGENTS.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: api/libs/helper.py
Line: 64:64

Comment:
**syntax:** Variable name violates snake_case convention

Per compliance rule #16, Python variables should use snake_case naming.

```suggestion
    escaped_pattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
```

**Context Used:** Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=1028e8c2-7220-42d3-94ac-c7ba65fc7240))

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

        return pattern
    # Escape backslash first, then percent and underscore
    escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escapedPattern

syntax: Variable name violates snake_case convention

Per compliance rule #16, Python variables should use snake_case naming.

Suggested change
- return escapedPattern
+ return escaped_pattern

Context Used: Context from dashboard - AGENTS.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: api/libs/helper.py
Line: 65:65

Comment:
**syntax:** Variable name violates snake_case convention

Per compliance rule #16, Python variables should use snake_case naming.

```suggestion
    return escaped_pattern
```

**Context Used:** Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=1028e8c2-7220-42d3-94ac-c7ba65fc7240))

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +89 to +93
from libs.helper import escape_like_pattern

# Escape special characters in keyword to prevent SQL injection via LIKE wildcards
keyword_trimmed = keyword[:30]
escaped_keyword = escape_like_pattern(keyword_trimmed)

style: Removed unicode escape encoding without clear justification

The old code used .encode('unicode_escape').decode('utf-8') to handle unicode characters. Verify this removal doesn't break searches with unicode/non-ASCII characters in workflow inputs/outputs. Was unicode escaping intentionally removed, or should it be preserved for non-ASCII character handling?

Prompt To Fix With AI
This is a comment left during a code review.
Path: api/services/workflow_app_service.py
Line: 89:93

Comment:
**style:** Removed unicode escape encoding without clear justification

The old code used `.encode('unicode_escape').decode('utf-8')` to handle unicode characters. Verify this removal doesn't break searches with unicode/non-ASCII characters in workflow inputs/outputs. Was unicode escaping intentionally removed, or should it be preserved for non-ASCII character handling?

How can I resolve this? If you propose a fix, please make it concise.
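
To make the question about the removed encoding concrete, here is a small sketch of what `.encode('unicode_escape').decode('utf-8')` does to a keyword. It assumes, without confirmation from the PR, that workflow inputs and outputs are stored as ASCII-escaped JSON text; the `query` key and sample strings are made up.

```python
import json

keyword = "你好"
encoded = keyword.encode("unicode_escape").decode("utf-8")
stored = json.dumps({"query": keyword})  # ensure_ascii=True by default

print(encoded)            # \u4f60\u597d
print(encoded in stored)  # True
print(keyword in stored)  # False

# Caveat: code points below U+0100 are escaped differently by the two.
print("é".encode("unicode_escape").decode("utf-8"))  # \xe9
print(json.dumps("é"))                                # "\u00e9"
```

If the stored text is ASCII-escaped in this way, a raw non-ASCII keyword would no longer match after the removal, which is the case the comment asks the author to verify.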
