-
Notifications
You must be signed in to change notification settings - Fork 0
fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries. #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devin_pr_code_review_bench_100_devin2_base_fixapi_refactors_the_sql_like_pattern_escaping_logic_to_use_a_centralized__utility_function_ensuring_consistent_and_secure_handling_of_special_characters_acr
Are you sure you want to change the base?
fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries. #25
Conversation
…ralized utility function, ensuring consistent and secure handling of special characters across all database queries. Signed-off-by: NeatGuyCoding <[email protected]>
…logic Signed-off-by: NeatGuyCoding <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| case "not contains": | ||
| filters.append(json_field.notlike(f"%{value}%")) | ||
| escaped_value = escape_like_pattern(str(value)) | ||
| filters.append(json_field.notlike(f"%{escaped_value}%")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 Missing escape parameter in notlike() call causes ineffective LIKE pattern escaping
The notlike() call in the "not contains" case is missing the escape="\\" parameter, making the LIKE pattern escaping ineffective.
Click to expand
Issue Details
In api/core/rag/retrieval/dataset_retrieval.py, the process_metadata_filter_func method properly escapes special characters using escape_like_pattern() for all LIKE operations. However, the "not contains" case at line 1207 is missing the escape="\\" parameter:
case "not contains":
escaped_value = escape_like_pattern(str(value))
filters.append(json_field.notlike(f"%{escaped_value}%")) # Missing escape parameter!Compare with the "contains" case which correctly includes the escape parameter:
case "contains":
escaped_value = escape_like_pattern(str(value))
filters.append(json_field.like(f"%{escaped_value}%", escape="\\")) # Correct!Actual vs Expected Behavior
Actual: When filtering with "not contains" condition and a value like 50%, the escaped pattern 50\% is passed to notlike() without specifying the escape character. The database will interpret \% as a literal backslash followed by a wildcard %, not as an escaped percent sign.
Expected: The notlike() call should include escape="\\" so the database correctly interprets \% as a literal percent character.
Impact
Metadata filtering with "not contains" condition will not work correctly when the filter value contains special SQL LIKE characters (%, _, \). This could lead to:
- Incorrect query results (filtering out wrong documents)
- Potential security issues if user input contains LIKE wildcards
Recommendation: Add the escape="\\" parameter to the notlike() call:
filters.append(json_field.notlike(f"%{escaped_value}%", escape="\\"))Was this helpful? React with 👍 or 👎 to provide feedback.
| SELECT TOP {top_k} id, text, meta | ||
| FROM {self.schema}.{self.table_name} | ||
| WHERE text LIKE ? | ||
| WHERE text LIKE ? ESCAPE '|' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 IRIS vector LIKE search uses wrong escape character causing ineffective escaping
The IRIS vector implementation uses ESCAPE '|' (pipe) in the SQL query but escape_like_pattern() escapes using backslash (\), making the escaping completely ineffective.
Click to expand
Issue Details
In api/core/rag/datasource/vdb/iris/iris_vector.py, the search_by_full_text method escapes the query using escape_like_pattern() which uses backslash as the escape character (see api/libs/helper.py:64):
# libs/helper.py:64
escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")However, the SQL query at line 298 specifies pipe (|) as the escape character:
escaped_query = escape_like_pattern(query) # Escapes with backslash
query_pattern = f"%{escaped_query}%"
sql = f"""
SELECT TOP {top_k} id, text, meta
FROM {self.schema}.{self.table_name}
WHERE text LIKE ? ESCAPE '|' # Expects pipe as escape character!
"""Actual vs Expected Behavior
Actual: When searching for 50%:
escape_like_pattern("50%")returns50\%- The SQL query uses
ESCAPE '|' - The database interprets
\%as a literal backslash followed by a wildcard% - The search matches any text containing
50\followed by any characters
Expected: The escape character in the SQL should match the one used by escape_like_pattern(), which is backslash (\).
Impact
Full-text search in IRIS vector database will not correctly handle queries containing special SQL LIKE characters (%, _, \). This could lead to:
- Incorrect search results
- Potential SQL injection via LIKE wildcards (the very issue this PR aims to fix)
Recommendation: Change the ESCAPE clause to use backslash to match escape_like_pattern():
WHERE text LIKE ? ESCAPE '\\'Note: In raw SQL strings, you may need to double the backslash depending on how IRIS handles escape characters.
Was this helpful? React with 👍 or 👎 to provide feedback.
Benchmark PR from qodo-benchmark#431