fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries.#137
fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries.#137
Conversation
…ralized utility function, ensuring consistent and secure handling of special characters across all database queries. Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
…logic Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
Code Review by Qodo
1. escapedPattern violates snake_case
|
| escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") | ||
| return escapedPattern |
There was a problem hiding this comment.
1. escapedpattern violates snake_case 📘 Rule violation ✓ Correctness
• escape_like_pattern() introduces a local variable named escapedPattern using camelCase, violating the required Python naming conventions. • This reduces consistency/readability and is likely to be flagged by repository lint rules. • Rename the variable to escaped_pattern to comply with the standard.
Agent prompt
## Issue description
`escape_like_pattern()` uses a camelCase local variable (`escapedPattern`), which violates required Python naming conventions.
## Issue Context
Repository standards require snake_case for variables and functions.
## Fix Focus Areas
- api/libs/helper.py[61-65]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| # Escape special characters for LIKE clause to prevent SQL injection | ||
| from libs.helper import escape_like_pattern | ||
|
|
||
| escaped_query = escape_like_pattern(query) | ||
| query_pattern = f"%{escaped_query}%" | ||
| sql = f""" | ||
| SELECT TOP {top_k} id, text, meta | ||
| FROM {self.schema}.{self.table_name} | ||
| WHERE text LIKE ? | ||
| WHERE text LIKE ? ESCAPE '|' | ||
| """ | ||
| cursor.execute(sql, (query_pattern,)) |
There was a problem hiding this comment.
2. Iris like escape mismatch 📘 Rule violation ⛨ Security
• iris_vector.search_by_full_text() builds query_pattern using escape_like_pattern() (backslash-based escaping) but the SQL uses ESCAPE '|'. • Because the escape character in the SQL does not match the escape sequences produced by the helper, %/_ may still behave as wildcards, defeating the intended sanitization and causing incorrect matches. • Align the escaping strategy by using the same escape character in both the pattern escaping and the SQL ESCAPE clause.
Agent prompt
## Issue description
IRIS fallback LIKE query uses `ESCAPE '|'` while `escape_like_pattern()` escapes using backslashes, so wildcard escaping may not work.
## Issue Context
`escape_like_pattern()` is documented to be used with `ESCAPE '\\'`. The IRIS SQL currently uses a different escape character.
## Fix Focus Areas
- api/core/rag/datasource/vdb/iris/iris_vector.py[290-300]
- api/libs/helper.py[35-65]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| case "not contains": | ||
| filters.append(json_field.notlike(f"%{value}%")) | ||
| escaped_value = escape_like_pattern(str(value)) | ||
| filters.append(json_field.notlike(f"%{escaped_value}%")) |
There was a problem hiding this comment.
3. notlike() missing escape 📘 Rule violation ⛨ Security
• In process_metadata_filter_func(), the not contains branch uses
json_field.notlike(f"%{escaped_value}%") without specifying escape="\\".
• Because escaped_value contains backslash escapes, omitting the explicit escape clause can
cause the DB to treat %/_ as wildcards (depending on backend), weakening the intended input
sanitization.
• Add escape="\\" to the notlike() call for parity with the other LIKE cases.
Agent prompt
## Issue description
The `not contains` metadata filter uses backslash-escaped patterns but does not specify the escape character for `NOT LIKE`, which can make escaping ineffective.
## Issue Context
Other branches (`contains`, `start with`, `end with`) pass `escape="\\"`, but `not contains` does not.
## Fix Focus Areas
- api/core/rag/retrieval/dataset_retrieval.py[1201-1208]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| def test_get_annotation_list_by_app_id_with_special_characters_in_keyword( | ||
| self, db_session_with_containers, mock_external_service_dependencies | ||
| ): |
There was a problem hiding this comment.
4. New tests lack type hints 📘 Rule violation ✓ Correctness
• Newly added pytest test functions/methods omit type annotations for parameters and return values. • This violates the requirement that all Python function definitions include parameter and return type annotations (as enforced by basedpyright). • Add explicit annotations (use Any for pytest fixtures when concrete types are impractical) and -> None return types for tests.
Agent prompt
## Issue description
New pytest test functions/methods are missing type annotations for parameters and return values.
## Issue Context
The project requires type annotations for all Python function definitions to satisfy basedpyright.
## Fix Focus Areas
- api/tests/test_containers_integration_tests/services/test_annotation_service.py[447-449]
- api/tests/unit_tests/libs/test_helper.py[68-113]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| def escape_like_pattern(pattern: str) -> str: | ||
| """ | ||
| Escape special characters in a string for safe use in SQL LIKE patterns. | ||
|
|
||
| This function escapes the special characters used in SQL LIKE patterns: | ||
| - Backslash (\\) -> \\ | ||
| - Percent (%) -> \\% | ||
| - Underscore (_) -> \\_ | ||
|
|
||
| The escaped pattern can then be safely used in SQL LIKE queries with the | ||
| ESCAPE '\\' clause to prevent SQL injection via LIKE wildcards. | ||
|
|
||
| Args: | ||
| pattern: The string pattern to escape | ||
|
|
||
| Returns: | ||
| Escaped string safe for use in SQL LIKE queries | ||
|
|
||
| Examples: | ||
| >>> escape_like_pattern("50% discount") | ||
| '50\\% discount' | ||
| >>> escape_like_pattern("test_data") | ||
| 'test\\_data' | ||
| >>> escape_like_pattern("path\\to\\file") | ||
| 'path\\\\to\\\\file' | ||
| """ | ||
| if not pattern: | ||
| return pattern | ||
| # Escape backslash first, then percent and underscore | ||
| escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") | ||
| return escapedPattern |
There was a problem hiding this comment.
5. Helper type contract broken 🐞 Bug ✓ Correctness
• escape_like_pattern is annotated as str -> str, but it returns the input as-is for falsy values, meaning it can return None. • Unit tests explicitly call escape_like_pattern(None), which will be flagged by basedpyright. • The repo’s make type-check runs basedpyright, so this can break CI even if runtime tests pass.
Agent prompt
### Issue description
`escape_like_pattern` is typed as `str -> str` but can return `None`, and tests call it with `None`. This will likely fail `basedpyright` during `make type-check`.
### Issue Context
The function currently returns `pattern` on falsy input. That includes `None`, violating the current type annotations.
### Fix Focus Areas
- api/libs/helper.py[35-65]
- api/tests/unit_tests/libs/test_helper.py[68-113]
### Suggested fix
- Update signature to `def escape_like_pattern(pattern: str | None) -> str | None:`.
- (Optional) Rename `escapedPattern` to `escaped_pattern` for style consistency.
- Ensure tests and all call sites align with the new signature (or, alternatively, remove the None test and raise on None).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Benchmark PR from qodo-benchmark#431