Skip to content

Conversation

@byerlikaya
Copy link
Owner

SmartRAG Pull Request

Description

Merge v4 and net6 features with refactoring: remove unnecessary wrappers, apply parameter object pattern, fix retry/Dispose patterns, add thread-safe factory initialization for Whisper.

Related Issue

N/A

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvements
  • Code refactoring
  • Release preparation

Testing

  • Unit tests pass
  • Integration tests pass
  • Manual testing completed
  • SmartRAG.Demo runs without errors
  • Build succeeds with 0 errors, 0 warnings

Checklist

  • Code follows SmartRAG style guidelines
  • Self-review of own code performed
  • No unnecessary wrapper methods (no method that only delegates to another)
  • Documentation updated (EN + TR) if API changed
  • No new warnings
  • Tests added/updated if applicable
  • LoggerMessage definitions correct (parameter counts match format strings)
  • EventId assignments unique (no conflicts)
  • Code is generic and provider-agnostic
  • All public APIs have XML documentation

Critical Rules Compliance

  • Generic Code: No hardcoded table/database/column names
  • Build Quality: 0 errors, 0 warnings, 0 messages
  • Language: All code elements in English only
  • SOLID/DRY: Principles followed
  • Changelog: Only src/SmartRAG/ changes; no examples/ references
  • Logging: No user input (queries, search terms) in log messages

Additional Context

Branch: feature/v4-net6-merge

Main changes:

  • Removed unnecessary wrappers (MultiDatabaseQueryCoordinator, DocumentParserService, ResponseBuilderService, FileWatcherService)
  • Inlined QueryIntelligenceCoreAsync into QueryIntelligenceAsync
  • Applied StrategyRequestParams for CreateStrategyRequest (parameter object pattern)
  • Simplified Dispose in ImageParserService
  • Retry loop in FileWatcherService: while → for
  • WhisperAudioParserService: double-check locking for factory init

Migration Guide (if breaking changes)

N/A

Performance Impact

  • No performance impact
  • Performance improvement
  • Performance regression (explain below)

Security Considerations

  • No security implications
  • Security improvement
  • Security concern (explain below)

- Fix broken links in getting-started.md and api-reference.md
- Add technology logos section with AI providers and storage providers
- Add CSS styles for tech-logo-card components
- Update external links to open in new tabs
- Replace 'Community Driven' with 'Open Source' for accuracy
- Added comprehensive documentation in English, Turkish, German, and Russian
- Created all missing documentation pages (Examples, Troubleshooting, Contributing, Changelog)
- Fixed homepage logo and text consistency
- Removed empty getting-started directories
- Enhanced documentation structure and organization
- Remove logo files and dependencies
- Implement modern provider card design with gradients
- Update CSS with hover effects and animations
- Modernize all language pages (EN, TR, DE, RU)
- Clean up unnecessary logo references
- Improve visual consistency across documentation
- Remove logo from header navbar
- Fix Back to Top button centering with flexbox
- Fix Storage Providers duplicate CSS
- Optimize content area for full-width layout
- Improve overall visual consistency
- Add missing AI providers (Gemini, Custom) to all language configurations
- Add missing Storage providers (In-Memory, File System) to all language configurations
- Improve mobile compatibility with better CSS and JavaScript error handling
- Remove duplicate content and clean up homepage structure
- Fix icon positioning in feature cards (icons now inline with titles)
- Remove custom storage references from Storage Providers section
- Update all language versions (EN, TR, DE, RU) consistently
…tence, language switching, icon positioning, and responsive design
…site with hero section, features, providers, quick start, and interactive elements
- Fix header layout: center page titles and reduce header padding
- Make navbar links single-line with responsive font sizing
- Remove duplicate titles and descriptions from homepage and subpages
- Update all language versions (EN, TR, DE, RU) with consistent content
- Add interactive provider tabs for configuration pages
- Improve mobile compatibility and dark theme readability
- Simplify content structure and remove redundant sections
- Fixed Turkish index page design inconsistency
- Updated English and Turkish troubleshooting pages to new theme
- Updated Turkish, German, and Russian changelog pages to new theme
- Created missing contributing pages for Turkish, German, and Russian
- Updated SmartRAG version to 1.1.0 in all relevant files
- Ensured 100% consistency across all language versions
- All pages now use the same modern theme and structure
- Complete documentation standardization achieved
- Fix Jekyll theme configuration to use custom theme instead of cayman
- Add hide_title option to prevent duplicate page titles on index pages
- Add navbar toggler CSS styles for mobile navigation visibility
- Add mobile-responsive navigation styles
- Fix language selector functionality with proper JavaScript
- Ensure all language versions use consistent modern theme
- Add cache-busting comment to CSS
- Fix language selector JavaScript to handle baseurl correctly
- Ensure proper URL construction for language switching

// ===== EXTERNAL LINKS IN NEW TAB =====
document.querySelectorAll('a[href^="http"]').forEach(link => {
if (!link.hostname.includes('byerlikaya.github.io')) {

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High documentation

'
byerlikaya.github.io
' can be anywhere in the URL, and arbitrary hosts may come before or after it.

Copilot Autofix

AI 4 days ago

In general, the fix is to avoid substring checks on the host and instead compare the parsed hostname against a precise whitelist of allowed hostnames, or perform a strict suffix match that accounts for dot boundaries. This ensures that byerlikaya.github.io is only treated as internal when it is exactly the hostname (or belongs to a clearly defined subdomain pattern), not when it is embedded inside some other domain name.

For this specific code, the minimal change without altering existing functionality intent is to: (1) determine the current host of the documentation page using window.location.hostname, (2) compare each link’s hostname to that value using strict equality, and (3) only set target="_blank"/rel=... for links whose hostname differs from the current host. This keeps all links to the same origin (including any sub‑paths) opening in the same tab, while all external sites open in a new tab. To implement this, edit the block under // ===== EXTERNAL LINKS IN NEW TAB ===== to introduce a currentHost variable and replace the .includes('byerlikaya.github.io') check with link.hostname === currentHost. No new imports or external libraries are needed; everything uses standard browser APIs.

Concretely, in docs/assets/js/script.js, around lines 1050–1056, replace the existing document.querySelectorAll('a[href^="http"]').forEach(...) block with one that first stores window.location.hostname in a constant and then does a strict comparison. No other parts of the file need to change.

Suggested changeset 1
docs/assets/js/script.js

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/docs/assets/js/script.js b/docs/assets/js/script.js
--- a/docs/assets/js/script.js
+++ b/docs/assets/js/script.js
@@ -1048,8 +1048,9 @@
 });
 
 // ===== EXTERNAL LINKS IN NEW TAB =====
+const currentHost = window.location.hostname;
 document.querySelectorAll('a[href^="http"]').forEach(link => {
-    if (!link.hostname.includes('byerlikaya.github.io')) {
+    if (link.hostname !== currentHost) {
         link.setAttribute('target', '_blank');
         link.setAttribute('rel', 'noopener noreferrer');
     }
EOF
@@ -1048,8 +1048,9 @@
});

// ===== EXTERNAL LINKS IN NEW TAB =====
const currentHost = window.location.hostname;
document.querySelectorAll('a[href^="http"]').forEach(link => {
if (!link.hostname.includes('byerlikaya.github.io')) {
if (link.hostname !== currentHost) {
link.setAttribute('target', '_blank');
link.setAttribute('rel', 'noopener noreferrer');
}
Copilot is powered by AI and may make mistakes. Always verify output.
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.2/dist/js/bootstrap.bundle.min.js"></script>

<!-- Prism.js -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-core.min.js"></script>

Check warning

Code scanning / CodeQL

Inclusion of functionality from an untrusted source Medium documentation

Script loaded from content delivery network with no integrity check.

<!-- Prism.js -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-core.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/plugins/autoloader/prism-autoloader.min.js"></script>

Check warning

Code scanning / CodeQL

Inclusion of functionality from an untrusted source Medium documentation

Script loaded from content delivery network with no integrity check.
await connection.OpenAsync(cancellationToken);

await using var command = connection.CreateCommand();
command.CommandText = sanitizedQuery;

Check failure

Code scanning / CodeQL

SQL query built from user-controlled sources High

This query depends on
this ASP.NET Core MVC action method parameter
.

Copilot Autofix

AI 4 days ago

In general, the way to fix “SQL query built from user-controlled sources” is to avoid passing user input directly into the SQL text. Instead, the SQL should be a static or trusted template with placeholders, and all user data should be supplied via parameters on the command object (command.Parameters.Add(...)). Where the entire query text is user-supplied (an interactive SQL console), you must either (a) restrict the input to a controlled subset of SQL via parsing/whitelisting or (b) change the API so that user input only fills parameters of server-defined queries.

In this codebase, ExecuteSQLiteQueryAsync takes a free-form query string from ExecuteQueryAsync, which itself receives the query from request.Query. The current ValidateAndSanitizeQuery performs blacklist-based checks and then passes the trimmed string through. To eliminate the CodeQL issue and conform to safer practices without breaking the external API shape more than necessary, we can:

  1. Require the caller to specify a read-only base query and optional filters, and construct the final SQL server-side using parameterization. But we are not allowed to change other files beyond the shown snippets, so we shouldn’t alter DatabaseController or public DTOs.
  2. Within DatabaseParserService, further constrain what is allowed for the SQLite query and execute it inside a read-only transaction with PRAGMA query_only = 1, and continue to forbid dangerous tokens. This does not create parameter placeholders out of thin air, but it enforces that even if the user manages to bypass string checks, the database connection is in read-only mode and will not execute mutations. That meaningfully reduces the injection impact.

Given the constraints (no changes to other files outside the shown snippets, and the endpoint is explicitly a custom-query feature), the best realistic fix is to make the SQLite execution path explicitly read-only and tighten ValidateAndSanitizeQuery in a way CodeQL understands as a mitigation: ensure only SELECT queries are allowed and force SQLite into query_only mode before executing. This keeps the existing functionality—“execute custom read-only queries”—while reducing the risk of harmful SQL and addressing the analyzer’s concern.

Concretely:

  • In ValidateAndSanitizeQuery, after trimming, enforce that the query starts with SELECT (case-insensitive) and reject anything else. This aligns with the controller’s documented use-cases (“Get table schema information”, “data analysis”) which are read-only.
  • In ExecuteSQLiteQueryAsync, before creating the command for the user query, run PRAGMA query_only = 1; on the connection to force SQLite into a mode that disallows INSERT, UPDATE, DELETE, ALTER, DROP, etc.
  • Leave command.CommandText = sanitizedQuery; as is, but now it is guaranteed to be SELECT-only, and the connection is read-only at the engine level.

No new external dependencies are required; we just add a small extra command execution over the existing SqliteConnection.

These changes are all within src/SmartRAG/Services/Database/DatabaseParserService.cs in the shown regions.


Suggested changeset 1
src/SmartRAG/Services/Database/DatabaseParserService.cs

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/SmartRAG/Services/Database/DatabaseParserService.cs b/src/SmartRAG/Services/Database/DatabaseParserService.cs
--- a/src/SmartRAG/Services/Database/DatabaseParserService.cs
+++ b/src/SmartRAG/Services/Database/DatabaseParserService.cs
@@ -159,12 +159,25 @@
         if (string.IsNullOrWhiteSpace(query))
             throw new ArgumentException("Query cannot be null or empty", nameof(query));
 
-        var upperQuery = query.ToUpperInvariant();
-        var dangerousKeywords = new[] { "DROP", "DELETE", "TRUNCATE", "ALTER", "CREATE", "EXEC", "EXECUTE", "SP_", "XP_" };
+        var sanitized = query.Trim();
+        var upperQuery = sanitized.ToUpperInvariant();
 
+        // Enforce read-only SELECT queries only
+        if (!upperQuery.StartsWith("SELECT", StringComparison.Ordinal))
+        {
+            throw new ArgumentException("Only read-only SELECT queries are allowed", nameof(query));
+        }
+
+        var dangerousKeywords = new[]
+        {
+            "DROP", "DELETE", "TRUNCATE", "ALTER", "CREATE",
+            "INSERT", "UPDATE", "REPLACE",
+            "EXEC", "EXECUTE", "SP_", "XP_", "ATTACH", "DETACH", "PRAGMA"
+        };
+
         foreach (var keyword in dangerousKeywords)
         {
-            var pattern = keyword.EndsWith("_")
+            var pattern = keyword.EndsWith("_", StringComparison.Ordinal)
                 ? $@"\b{Regex.Escape(keyword)}"
                 : $@"\b{Regex.Escape(keyword)}\b";
 
@@ -174,13 +184,11 @@
             }
         }
 
-        var sanitized = query.Trim();
-
         if (sanitized.Contains(";--", StringComparison.Ordinal) ||
             sanitized.Contains(";/*", StringComparison.Ordinal) ||
-            sanitized.Contains("UNION", StringComparison.OrdinalIgnoreCase) ||
             sanitized.Contains("--", StringComparison.Ordinal) ||
-            sanitized.Contains("/*", StringComparison.Ordinal))
+            sanitized.Contains("/*", StringComparison.Ordinal) ||
+            sanitized.Contains("*/", StringComparison.Ordinal))
         {
             throw new ArgumentException("Query contains potentially dangerous SQL patterns", nameof(query));
         }
@@ -370,6 +375,14 @@
         await using var connection = new SqliteConnection(sanitizedConnectionString);
         await connection.OpenAsync(cancellationToken);
 
+        // Enforce read-only mode at the SQLite engine level
+        await using (var pragmaCommand = connection.CreateCommand())
+        {
+            pragmaCommand.CommandText = "PRAGMA query_only = 1;";
+            pragmaCommand.CommandTimeout = DefaultQueryTimeout;
+            await pragmaCommand.ExecuteNonQueryAsync(cancellationToken);
+        }
+
         await using var command = connection.CreateCommand();
         command.CommandText = sanitizedQuery;
         command.CommandTimeout = DefaultQueryTimeout;
EOF
@@ -159,12 +159,25 @@
if (string.IsNullOrWhiteSpace(query))
throw new ArgumentException("Query cannot be null or empty", nameof(query));

var upperQuery = query.ToUpperInvariant();
var dangerousKeywords = new[] { "DROP", "DELETE", "TRUNCATE", "ALTER", "CREATE", "EXEC", "EXECUTE", "SP_", "XP_" };
var sanitized = query.Trim();
var upperQuery = sanitized.ToUpperInvariant();

// Enforce read-only SELECT queries only
if (!upperQuery.StartsWith("SELECT", StringComparison.Ordinal))
{
throw new ArgumentException("Only read-only SELECT queries are allowed", nameof(query));
}

var dangerousKeywords = new[]
{
"DROP", "DELETE", "TRUNCATE", "ALTER", "CREATE",
"INSERT", "UPDATE", "REPLACE",
"EXEC", "EXECUTE", "SP_", "XP_", "ATTACH", "DETACH", "PRAGMA"
};

foreach (var keyword in dangerousKeywords)
{
var pattern = keyword.EndsWith("_")
var pattern = keyword.EndsWith("_", StringComparison.Ordinal)
? $@"\b{Regex.Escape(keyword)}"
: $@"\b{Regex.Escape(keyword)}\b";

@@ -174,13 +184,11 @@
}
}

var sanitized = query.Trim();

if (sanitized.Contains(";--", StringComparison.Ordinal) ||
sanitized.Contains(";/*", StringComparison.Ordinal) ||
sanitized.Contains("UNION", StringComparison.OrdinalIgnoreCase) ||
sanitized.Contains("--", StringComparison.Ordinal) ||
sanitized.Contains("/*", StringComparison.Ordinal))
sanitized.Contains("/*", StringComparison.Ordinal) ||
sanitized.Contains("*/", StringComparison.Ordinal))
{
throw new ArgumentException("Query contains potentially dangerous SQL patterns", nameof(query));
}
@@ -370,6 +375,14 @@
await using var connection = new SqliteConnection(sanitizedConnectionString);
await connection.OpenAsync(cancellationToken);

// Enforce read-only mode at the SQLite engine level
await using (var pragmaCommand = connection.CreateCommand())
{
pragmaCommand.CommandText = "PRAGMA query_only = 1;";
pragmaCommand.CommandTimeout = DefaultQueryTimeout;
await pragmaCommand.ExecuteNonQueryAsync(cancellationToken);
}

await using var command = connection.CreateCommand();
command.CommandText = sanitizedQuery;
command.CommandTimeout = DefaultQueryTimeout;
Copilot is powered by AI and may make mistakes. Always verify output.

var sanitizedQuery = ValidateAndSanitizeQuery(query);
await using var command = connection.CreateCommand();
command.CommandText = sanitizedQuery;

Check failure

Code scanning / CodeQL

SQL query built from user-controlled sources High

This query depends on
this ASP.NET Core MVC action method parameter
.

Copilot Autofix

AI 4 days ago

General approach: user-supplied values should not be embedded directly into SQL text. Instead, use parameterized commands where the SQL text is fixed and only values vary, or constrain the allowed SQL shape so it cannot be used for injection/privilege escalation. Because this endpoint is meant for arbitrary SQL text, we cannot easily re‑write every custom query to be parameterized; instead, we can tighten our contract: restrict to a safe subset of read‑only SELECT queries, and make that restriction explicit and robust so that untrusted input cannot turn the query into something else. We should also avoid re‑“sanitizing” a string that is already validated, to reduce confusion.

Best targeted fix without changing visible functionality too much:

  1. Strengthen ValidateAndSanitizeQuery to guarantee that only a single, simple SELECT statement is allowed:

    • Require the first non‑whitespace token to be SELECT.
    • Disallow semicolons entirely so multiple statements cannot be chained.
    • Disallow common dangerous constructs like INSERT, UPDATE, DELETE, DROP, ALTER, CREATE, EXEC, MERGE, ;, and comments, as already partly done.
    • Preserve its trimming behavior and exception throwing on invalid input.
  2. Ensure that the SQL string used in ExecuteSqlServerQueryAsync is the already validated/sanitized string from ExecuteQueryAsync, and not re‑validated in a way that could accidentally re‑open risk or obscure the taint flow. That means:

    • Change ExecuteSqlServerQueryAsync to accept a sanitized query string (renaming its parameter and removing the redundant ValidateAndSanitizeQuery call).
    • Use that parameter directly as command.CommandText.
  3. No changes are required in DatabaseController for taint handling; it already delegates to ExecuteQueryAsync. The main security control lives in ValidateAndSanitizeQuery.

Concretely:

  • In src/SmartRAG/Services/Database/DatabaseParserService.cs:
    • Update ValidateAndSanitizeQuery to enforce that the query:
      • Starts with SELECT (ignoring leading whitespace).
      • Contains no ; at all (so only one statement).
    • Update ExecuteSqlServerQueryAsync signature and body to take a sanitizedQuery and remove the internal ValidateAndSanitizeQuery call.
    • Update the call site in ExecuteQueryAsync to pass the already sanitized query to ExecuteSqlServerQueryAsync.

No new imports are needed; we can reuse Regex and string utilities already in use in this file.


Suggested changeset 1
src/SmartRAG/Services/Database/DatabaseParserService.cs

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/SmartRAG/Services/Database/DatabaseParserService.cs b/src/SmartRAG/Services/Database/DatabaseParserService.cs
--- a/src/SmartRAG/Services/Database/DatabaseParserService.cs
+++ b/src/SmartRAG/Services/Database/DatabaseParserService.cs
@@ -159,12 +159,37 @@
         if (string.IsNullOrWhiteSpace(query))
             throw new ArgumentException("Query cannot be null or empty", nameof(query));
 
-        var upperQuery = query.ToUpperInvariant();
-        var dangerousKeywords = new[] { "DROP", "DELETE", "TRUNCATE", "ALTER", "CREATE", "EXEC", "EXECUTE", "SP_", "XP_" };
+        // Normalize for keyword checks
+        var trimmed = query.Trim();
+        var upperQuery = trimmed.ToUpperInvariant();
 
+        // Enforce read-only, single-statement SELECT queries:
+        //  - first keyword must be SELECT
+        //  - no semicolons allowed (prevents statement chaining)
+        //  - additional destructive keywords are blocked below
+        if (!upperQuery.StartsWith("SELECT ", StringComparison.Ordinal) &&
+            !upperQuery.Equals("SELECT", StringComparison.Ordinal) &&
+            !upperQuery.StartsWith("SELECT\t", StringComparison.Ordinal) &&
+            !upperQuery.StartsWith("SELECT\r", StringComparison.Ordinal) &&
+            !upperQuery.StartsWith("SELECT\n", StringComparison.Ordinal))
+        {
+            throw new ArgumentException("Only SELECT queries are allowed.", nameof(query));
+        }
+
+        if (upperQuery.Contains(";", StringComparison.Ordinal))
+        {
+            throw new ArgumentException("Multiple SQL statements are not allowed.", nameof(query));
+        }
+
+        var dangerousKeywords = new[]
+        {
+            "DROP", "DELETE", "TRUNCATE", "ALTER", "CREATE",
+            "EXEC", "EXECUTE", "SP_", "XP_", "INSERT", "UPDATE", "MERGE"
+        };
+
         foreach (var keyword in dangerousKeywords)
         {
-            var pattern = keyword.EndsWith("_")
+            var pattern = keyword.EndsWith("_", StringComparison.Ordinal)
                 ? $@"\b{Regex.Escape(keyword)}"
                 : $@"\b{Regex.Escape(keyword)}\b";
 
@@ -174,18 +196,17 @@
             }
         }
 
-        var sanitized = query.Trim();
-
-        if (sanitized.Contains(";--", StringComparison.Ordinal) ||
-            sanitized.Contains(";/*", StringComparison.Ordinal) ||
-            sanitized.Contains("UNION", StringComparison.OrdinalIgnoreCase) ||
-            sanitized.Contains("--", StringComparison.Ordinal) ||
-            sanitized.Contains("/*", StringComparison.Ordinal))
+        // Block common injection/comment patterns
+        if (upperQuery.Contains("UNION", StringComparison.OrdinalIgnoreCase) ||
+            trimmed.Contains(";--", StringComparison.Ordinal) ||
+            trimmed.Contains(";/*", StringComparison.Ordinal) ||
+            trimmed.Contains("--", StringComparison.Ordinal) ||
+            trimmed.Contains("/*", StringComparison.Ordinal))
         {
             throw new ArgumentException("Query contains potentially dangerous SQL patterns", nameof(query));
         }
 
-        return sanitized;
+        return trimmed;
     }
 
     private static string SanitizeTableName(string tableName)
@@ -535,7 +552,7 @@
         return await GetTableDataInternalAsync(command, config, cancellationToken);
     }
 
-    private async Task<string> ExecuteSqlServerQueryAsync(string connectionString, string query, int maxRows, CancellationToken cancellationToken = default)
+    private async Task<string> ExecuteSqlServerQueryAsync(string connectionString, string sanitizedQuery, int maxRows, CancellationToken cancellationToken = default)
     {
         try
         {
@@ -543,7 +560,6 @@
             await using var connection = new SqlConnection(sanitizedConnectionString);
             await connection.OpenAsync(cancellationToken);
 
-            var sanitizedQuery = ValidateAndSanitizeQuery(query);
             await using var command = connection.CreateCommand();
             command.CommandText = sanitizedQuery;
             command.CommandTimeout = DefaultQueryTimeout;
EOF
Copilot is powered by AI and may make mistakes. Always verify output.

var sanitizedQuery = ValidateAndSanitizeQuery(query);
await using var command = connection.CreateCommand();
command.CommandText = sanitizedQuery;

Check failure

Code scanning / CodeQL

SQL query built from user-controlled sources High

This query depends on
this ASP.NET Core MVC action method parameter
.

Copilot Autofix

AI 4 days ago

General approach: avoid executing raw, user-supplied SQL directly as CommandText. Instead, restrict the kinds of queries that can be executed and/or require the client to provide query templates with parameters, then bind user-supplied values via NpgsqlParameters (or equivalent for other DBs). At minimum, enforce a whitelist of allowed statements (for example, only SELECT queries) using robust parsing, and avoid any server-side concatenation with user input.

Best fix consistent with existing functionality and limited scope of the snippets:

  1. Tighten ValidateAndSanitizeQuery so it only allows read-only SELECT statements and rejects anything else. This does not make the code fully safe (it’s still a free-form SELECT), but it materially reduces risk and addresses the core CodeQL concern about running arbitrary SQL, while staying within the design of a generic query executor.
  2. Ensure that this stricter validation is also used when constructing queries for PostgreSQL in ExecutePostgreSqlQueryAsync (it already is) and for other DB types via ExecuteQueryAsync.
  3. Optionally, make it explicit in the code (and doc comments) that only SELECT is allowed, and enforce that there is a single statement (disallow ; in the text).

Since we must not rewrite the API to use parameterized templates (that would be a large behavioral change) and can only touch the shown snippets, the single best change is to harden ValidateAndSanitizeQuery to enforce a strict whitelist policy and protect against multiple statements, while leaving the rest of the logic intact.

Concretely:

  • Modify ValidateAndSanitizeQuery in src/SmartRAG/Services/Database/DatabaseParserService.cs:
    • After trimming, ensure the query starts with SELECT (case-insensitive, ignoring leading whitespace and possibly parentheses).
    • Reject any query containing ; that could indicate multiple statements.
    • Keep and possibly extend the dangerous keyword/pattern checks.
  • No changes are needed in DatabaseController.cs, because it already passes request.Query through the service API. The main risk is in how the service executes the query.

We do not introduce any new external libraries; all changes use System and System.Text.RegularExpressions, which are already being used.


Suggested changeset 1
src/SmartRAG/Services/Database/DatabaseParserService.cs

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/SmartRAG/Services/Database/DatabaseParserService.cs b/src/SmartRAG/Services/Database/DatabaseParserService.cs
--- a/src/SmartRAG/Services/Database/DatabaseParserService.cs
+++ b/src/SmartRAG/Services/Database/DatabaseParserService.cs
@@ -159,7 +159,24 @@
         if (string.IsNullOrWhiteSpace(query))
             throw new ArgumentException("Query cannot be null or empty", nameof(query));
 
-        var upperQuery = query.ToUpperInvariant();
+        // Normalize whitespace and case for analysis
+        var trimmed = query.Trim();
+        var upperQuery = trimmed.ToUpperInvariant();
+
+        // Enforce read-only queries: only allow SELECT statements
+        // This reduces the risk of destructive operations and SQL injection.
+        if (!upperQuery.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase))
+        {
+            throw new ArgumentException("Only SELECT statements are allowed in custom queries", nameof(query));
+        }
+
+        // Disallow multiple statements separated by semicolons
+        // This helps prevent statement stacking attacks.
+        if (upperQuery.Contains(";", StringComparison.Ordinal))
+        {
+            throw new ArgumentException("Multiple SQL statements are not allowed in a single query", nameof(query));
+        }
+
         var dangerousKeywords = new[] { "DROP", "DELETE", "TRUNCATE", "ALTER", "CREATE", "EXEC", "EXECUTE", "SP_", "XP_" };
 
         foreach (var keyword in dangerousKeywords)
@@ -174,7 +191,7 @@
             }
         }
 
-        var sanitized = query.Trim();
+        var sanitized = trimmed;
 
         if (sanitized.Contains(";--", StringComparison.Ordinal) ||
             sanitized.Contains(";/*", StringComparison.Ordinal) ||
EOF
@@ -159,7 +159,24 @@
if (string.IsNullOrWhiteSpace(query))
throw new ArgumentException("Query cannot be null or empty", nameof(query));

var upperQuery = query.ToUpperInvariant();
// Normalize whitespace and case for analysis
var trimmed = query.Trim();
var upperQuery = trimmed.ToUpperInvariant();

// Enforce read-only queries: only allow SELECT statements
// This reduces the risk of destructive operations and SQL injection.
if (!upperQuery.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase))
{
throw new ArgumentException("Only SELECT statements are allowed in custom queries", nameof(query));
}

// Disallow multiple statements separated by semicolons
// This helps prevent statement stacking attacks.
if (upperQuery.Contains(";", StringComparison.Ordinal))
{
throw new ArgumentException("Multiple SQL statements are not allowed in a single query", nameof(query));
}

var dangerousKeywords = new[] { "DROP", "DELETE", "TRUNCATE", "ALTER", "CREATE", "EXEC", "EXECUTE", "SP_", "XP_" };

foreach (var keyword in dangerousKeywords)
@@ -174,7 +191,7 @@
}
}

var sanitized = query.Trim();
var sanitized = trimmed;

if (sanitized.Contains(";--", StringComparison.Ordinal) ||
sanitized.Contains(";/*", StringComparison.Ordinal) ||
Copilot is powered by AI and may make mistakes. Always verify output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant