diff --git a/KNOWN-ISSUES.md b/KNOWN-ISSUES.md
index d322d638c..fcc945e2d 100644
--- a/KNOWN-ISSUES.md
+++ b/KNOWN-ISSUES.md
@@ -2,14 +2,22 @@
 
 ## Search Functionality
 
-### 1. NOT Operator Doesn't Exclude Matches
+### 1. NOT Operator Issues
 
 **Status:** Known bug, not yet fixed
 
-**Issue:** Queries like `"foo NOT bar"` should find documents containing "foo" but not "bar". Currently, it returns documents containing both.
+**Issue:** The NOT operator has two problems:
 
-**Example:**
+1. **Standalone NOT crashes:** `km search "NOT foo"` throws an FTS5 syntax error
+2. **NOT doesn't exclude:** `km search "foo NOT bar"` returns documents containing both terms instead of excluding those that contain "bar"
+
+**Examples:**
 ```bash
+# Problem 1: Standalone NOT crashes
+km search "NOT important"
+# Error: SQLite Error 1: 'fts5: syntax error near "NOT"'
+
+# Problem 2: NOT doesn't exclude
 km put "foo and bar together"
 km put "only foo here"
 km search "foo NOT bar"
@@ -18,17 +26,20 @@ km search "foo NOT bar"
 ```
 
 **Root Cause:**
-- FTS query extraction passes `"NOT (bar)"` to SQLite FTS5
-- SQLite FTS5's NOT operator support is limited/broken
+- FTS5 requires NOT to have a left operand (e.g., `foo NOT bar`); a standalone `NOT term` is invalid
+- Even when the syntax is valid, FTS query extraction passes `"NOT (bar)"` to SQLite FTS5, which does not exclude the term as expected
 - No LINQ post-filtering is applied to exclude NOT terms
 - The architecture assumes FTS handles all logic, but NOT needs LINQ filtering
 
-**Workaround:** None currently. Avoid using NOT operator.
+**Workaround:**
+- For literal text containing "NOT", use quotes: `km search '"NOT important"'`
+- Avoid using NOT as a boolean operator until this is fixed
 
 **Fix Required:**
-1. Split query: extract positive terms for FTS, negative terms for filtering
-2. Apply LINQ filter to FTS results using QueryLinqBuilder
-3. Filter out documents matching NOT terms
+1. Handle standalone NOT gracefully (either treat it as a literal term or return a clear error)
+2. Split query: extract positive terms for FTS, negative terms for filtering
+3. Apply LINQ filter to FTS results using QueryLinqBuilder
+4. Filter out documents matching NOT terms
 
 **Files Affected:**
 - `src/Core/Search/NodeSearchService.cs:190` - ExtractLogical NOT handling
@@ -36,65 +47,62 @@ km search "foo NOT bar"
 
 ---
 
-### 2. Quoted Phrases Don't Escape Operators
+### 2. Field Queries with Quoted Values Fail
 
 **Status:** Known bug, not yet fixed
 
-**Issue:** Cannot search for literal phrases containing reserved words like "AND", "OR", "NOT".
+**Issue:** Field-specific queries with quoted values containing special characters fail.
 
 **Example:**
 ```bash
-km put "Meeting with Alice AND Bob"
-km search '"Alice AND Bob"'
+km put "user:password format"
+km search 'content:"user:password"'
 # Expected: Find the document
-# Actual: Parser error or incorrect results
+# Actual: SQLite error "unknown special query"
 ```
 
 **Root Cause:**
-- Quoted strings should treat content literally
-- Current parser/tokenizer doesn't properly handle operator escaping within quotes
-- May be FTS query generation issue
-
-**Workaround:** Rephrase searches to avoid reserved words.
+- Quoted values after a field prefix (`content:"..."`) generate invalid FTS queries
+- FTS syntax may not support this pattern
+- FTS query generation needs investigation
 
-**Fix Required:** Investigate tokenizer and FTS query extraction for quoted phrases.
+**Workaround:** Search without the field prefix, or without quotes.
 
 ---
 
-### 3. Field Queries with Quoted Values Fail
+## Resolved Issues
 
-**Status:** Known bug, not yet fixed
+### Quoted Phrases Don't Escape Operators (Resolved)
 
-**Issue:** Field-specific queries with quoted values containing special characters fail.
+**Status:** Fixed
+
+**Issue:** Cannot search for literal phrases containing reserved words like "AND", "OR", "NOT".
 
 **Example:**
 ```bash
-km put "user:password format"
-km search 'content:"user:password"'
-# Expected: Find the document
-# Actual: SQLite error "unknown special query"
+km put "Meeting with Alice AND Bob"
+km search '"Alice AND Bob"'
+# Now works correctly and finds the document
 ```
 
-**Root Cause:**
-- Quoted values after field prefix (`content:"..."`) generate invalid FTS queries
-- FTS syntax may not support this pattern
-- Need investigation of FTS query generation
-
-**Workaround:** Search without field prefix or without quotes.
+**Resolution:**
+- The tokenizer correctly handles quoted strings and preserves them as literal text
+- The FTS query extractor properly quotes phrases containing reserved words
+- E2E regression tests added in `SearchEndToEndTests.cs` (`KnownIssue2_*`, named after this issue's original number)
 
 ---
 
 ## Testing Gaps
 
 These bugs were discovered through comprehensive E2E testing. Previous tests only verified:
-- ✅ AST structure correctness
-- ✅ LINQ expression building
-- ✅ Direct FTS calls
+- AST structure correctness
+- LINQ expression building
+- Direct FTS calls
 
 But did NOT test:
-- ❌ Full pipeline: Parse → Extract FTS → Search → Filter → Rank
-- ❌ Default settings (MinRelevance=0.3)
-- ❌ Actual result verification
+- Full pipeline: Parse -> Extract FTS -> Search -> Filter -> Rank
+- Default settings (MinRelevance=0.3)
+- Actual result verification
 
 **Lesson:** Exit code testing and structure testing are insufficient. Must test actual behavior with real data.
 
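The "Fix Required" steps for the NOT operator can be sketched in C# as follows. This is a minimal illustration under stated assumptions, not the project's implementation. The names `NotQuerySketch`, `BuildFtsQuery`, `QuoteFts5String`, and `ExcludeNegativeTerms` are hypothetical and are not `NodeSearchService` or `QueryLinqBuilder` APIs. The sketch also assumes the parser has already split the query into positive and negative single-word terms, and its regex tokenizer only approximates FTS5. It sends only the positive terms to FTS5, quoting each one as an FTS5 string (double quotes, with embedded quotes doubled) so reserved words stay literal. A query with no positive term is refused, because FTS5's NOT is a binary operator. Hits that contain a negative term are then dropped in a LINQ post-filter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: these names are illustrative, not part of NodeSearchService or QueryLinqBuilder.
public static class NotQuerySketch
{
    // FTS5 strings are wrapped in double quotes; an embedded quote is escaped by doubling it.
    // Quoted strings are matched as phrases, so reserved words such as AND/OR/NOT stay literal.
    public static string QuoteFts5String(string text) =>
        "\"" + text.Replace("\"", "\"\"") + "\"";

    // Builds the FTS5 MATCH expression from the positive terms only.
    // FTS5's NOT is binary, so a query made only of negative terms cannot be sent to FTS.
    public static string BuildFtsQuery(IReadOnlyList<string> positiveTerms)
    {
        if (positiveTerms.Count == 0)
        {
            throw new ArgumentException(
                "NOT needs a left operand, e.g. \"foo NOT bar\".", nameof(positiveTerms));
        }

        return string.Join(" AND ", positiveTerms.Select(QuoteFts5String));
    }

    // LINQ post-filter: drop any FTS hit whose content contains a negative term.
    // Matches whole tokens case-insensitively, so excluding "bar" does not drop "barn".
    public static List<T> ExcludeNegativeTerms<T>(
        IEnumerable<T> ftsResults,
        Func<T, string> getContent,
        IReadOnlyList<string> negativeTerms)
    {
        return ftsResults
            .Where(result =>
            {
                HashSet<string> tokens = Tokenize(getContent(result));
                return !negativeTerms.Any(tokens.Contains);
            })
            .ToList();
    }

    // Rough stand-in for the FTS tokenizer: split on anything that is not a letter or digit.
    private static HashSet<string> Tokenize(string text) =>
        new HashSet<string>(
            Regex.Split(text, @"[^\p{L}\p{N}]+").Where(token => token.Length > 0),
            StringComparer.OrdinalIgnoreCase);
}
```

Quoting every positive term keeps reserved words literal. However, any FTS5 syntax the extractor intends to pass through would have to be appended after quoting, for example a trailing `*` prefix marker, which belongs outside the quotes.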
diff --git a/tests/Core.Tests/Search/SearchEndToEndTests.cs b/tests/Core.Tests/Search/SearchEndToEndTests.cs
index 24442d2a9..04e2f7c50 100644
--- a/tests/Core.Tests/Search/SearchEndToEndTests.cs
+++ b/tests/Core.Tests/Search/SearchEndToEndTests.cs
@@ -606,4 +606,143 @@ public async Task RegressionTest_FieldSpecificEqualOperator_ExtractsFtsQuery()
     }
 
     #endregion
+
+    #region Known Issue 2: Quoted Phrases With Reserved Words
+
+    [Fact]
+    public async Task KnownIssue2_QuotedPhraseWithAND_FindsExactPhrase()
+    {
+        // Known Issue 2: Quoted phrases don't escape operators
+        // This test verifies that searching for "Alice AND Bob" (as a phrase)
+        // finds documents containing that exact phrase, not documents with "Alice" AND "Bob" separately
+
+        // Arrange
+        await this.InsertAsync("doc1", "Meeting with Alice AND Bob").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "Alice went to lunch and Bob stayed").ConfigureAwait(false);
+        await this.InsertAsync("doc3", "Just Alice here").ConfigureAwait(false);
+        await this.InsertAsync("doc4", "Just Bob here").ConfigureAwait(false);
+
+        // Act: Search for the exact phrase "Alice AND Bob" using quotes
+        var response = await this.SearchAsync("\"Alice AND Bob\"").ConfigureAwait(false);
+
+        // Assert: Should find only doc1 which contains the exact phrase
+        Assert.Equal(1, response.TotalResults);
+        Assert.Single(response.Results);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    [Fact]
+    public async Task KnownIssue2_QuotedPhraseWithOR_FindsExactPhrase()
+    {
+        // Known Issue 2: Quoted phrases don't escape operators
+        // This test verifies that "this OR that" searches for the literal phrase
+
+        // Arrange
+        await this.InsertAsync("doc1", "choose this OR that option").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "this is one option or that is another").ConfigureAwait(false);
+        await this.InsertAsync("doc3", "just this").ConfigureAwait(false);
+        await this.InsertAsync("doc4", "just that").ConfigureAwait(false);
+
+        // Act: Search for the exact phrase "this OR that"
+        var response = await this.SearchAsync("\"this OR that\"").ConfigureAwait(false);
+
+        // Assert: Should find only doc1 with the exact phrase
+        Assert.Equal(1, response.TotalResults);
+        Assert.Single(response.Results);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    [Fact]
+    public async Task KnownIssue2_QuotedPhraseWithNOT_FindsExactPhrase()
+    {
+        // Known Issue 2: Quoted phrases don't escape operators
+        // This test verifies that "this is NOT important" searches for the literal phrase
+
+        // Arrange
+        await this.InsertAsync("doc1", "this is NOT important notice").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "this is definitely important").ConfigureAwait(false);
+        await this.InsertAsync("doc3", "NOT a problem").ConfigureAwait(false);
+
+        // Act: Search for the exact phrase "this is NOT important"
+        var response = await this.SearchAsync("\"this is NOT important\"").ConfigureAwait(false);
+
+        // Assert: Should find only doc1 with the exact phrase
+        Assert.Equal(1, response.TotalResults);
+        Assert.Single(response.Results);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    [Fact]
+    public async Task KnownIssue2_QuotedReservedWordAND_FindsDocumentsContainingAND()
+    {
+        // Known Issue 2: Searching for just the word "AND" should work when quoted
+
+        // Arrange
+        await this.InsertAsync("doc1", "The word AND appears here").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "No reserved words").ConfigureAwait(false);
+
+        // Act: Search for the literal word "AND"
+        var response = await this.SearchAsync("\"AND\"").ConfigureAwait(false);
+
+        // Assert: Should find doc1 containing "AND"
+        Assert.Equal(1, response.TotalResults);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    [Fact]
+    public async Task KnownIssue2_QuotedReservedWordOR_FindsDocumentsContainingOR()
+    {
+        // Known Issue 2: Searching for just the word "OR" should work when quoted
+
+        // Arrange
+        await this.InsertAsync("doc1", "The word OR appears here").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "No reserved words").ConfigureAwait(false);
+
+        // Act: Search for the literal word "OR"
+        var response = await this.SearchAsync("\"OR\"").ConfigureAwait(false);
+
+        // Assert: Should find doc1 containing "OR"
+        Assert.Equal(1, response.TotalResults);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    [Fact]
+    public async Task KnownIssue2_QuotedReservedWordNOT_FindsDocumentsContainingNOT()
+    {
+        // Known Issue 2: Searching for just the word "NOT" should work when quoted
+
+        // Arrange
+        await this.InsertAsync("doc1", "The word NOT appears here").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "No reserved words").ConfigureAwait(false);
+
+        // Act: Search for the literal word "NOT"
+        var response = await this.SearchAsync("\"NOT\"").ConfigureAwait(false);
+
+        // Assert: Should find doc1 containing "NOT"
+        Assert.Equal(1, response.TotalResults);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    [Fact]
+    public async Task KnownIssue2_MixedQuotedPhraseAndOperator_WorksCorrectly()
+    {
+        // Known Issue 2: Mixing quoted phrases with actual operators should work
+        // Search: "Alice AND Bob" AND kubernetes
+        // This should find documents containing both the exact phrase "Alice AND Bob" AND the word "kubernetes"
+
+        // Arrange
+        await this.InsertAsync("doc1", "Meeting notes for Alice AND Bob about kubernetes").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "Meeting notes for Alice AND Bob about docker").ConfigureAwait(false);
+        await this.InsertAsync("doc3", "Meeting notes for Alice about kubernetes").ConfigureAwait(false);
+
+        // Act: Search for exact phrase AND another term
+        var response = await this.SearchAsync("\"Alice AND Bob\" AND kubernetes").ConfigureAwait(false);
+
+        // Assert: Should find only doc1
+        Assert.Equal(1, response.TotalResults);
+        Assert.Single(response.Results);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    #endregion
 }
diff --git a/tests/Main.Tests/Integration/ExamplesCommandE2ETests.cs b/tests/Main.Tests/Integration/ExamplesCommandE2ETests.cs
index 1d90f6227..cd9db684d 100644
--- a/tests/Main.Tests/Integration/ExamplesCommandE2ETests.cs
+++ b/tests/Main.Tests/Integration/ExamplesCommandE2ETests.cs
@@ -215,11 +215,10 @@ await this.ExecAsync($"search \"title:api AND (content:rest OR content:graphql)\
 
     #endregion
 
-    // NOTE: Escaping special characters tests disabled - Known limitations:
-    // 1. Quoted phrases like '"Alice AND Bob"' don't work - parser/FTS issues
-    // 2. Field queries with quoted values like 'content:"user:password"' fail with SQLite error
-    // 3. Literal reserved words like '"NOT"' cause parser errors
-    // These are known bugs that need investigation and fixes before examples can be shown to users
+    // NOTE: Some special-character escaping tests are still disabled. Known limitation:
+    // - Field queries with quoted values like 'content:"user:password"' fail with a SQLite error
+    // FIXED: Quoted phrases like '"Alice AND Bob"' now work (see SearchEndToEndTests.KnownIssue2_*)
+    // FIXED: Literal reserved words like '"NOT"' now work correctly
 
     #region MongoDB JSON Format
 
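One limitation remains: field queries with quoted values. FTS5's documented query syntax does allow a column filter in front of a quoted string (`colname : "phrase"`), and characters such as `:` inside the quotes go to the tokenizer rather than being parsed as syntax. One hypothesis worth checking is therefore whether the extractor emits this shape. The helper below is a hypothetical sketch of the target output only. `FieldQuerySketch`, `BuildColumnFilter`, and the assumed column names `title` and `content` do not come from the codebase.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the class, method, and column list are illustrative assumptions.
public static class FieldQuerySketch
{
    // Assumed FTS5 columns; the real table schema may use different names.
    private static readonly HashSet<string> FtsColumns =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "title", "content" };

    // Emits an FTS5 column filter of the form:  column : "value"
    // The value is always sent as a quoted FTS5 string (embedded quotes doubled),
    // so ':' or AND/OR/NOT inside it are not parsed as query syntax.
    public static string BuildColumnFilter(string column, string value)
    {
        if (!FtsColumns.Contains(column))
        {
            throw new ArgumentException($"Unknown FTS column '{column}'.", nameof(column));
        }

        string quoted = "\"" + value.Replace("\"", "\"\"") + "\"";
        return column.ToLowerInvariant() + " : " + quoted;
    }
}
```

Because the field name is checked against a known column list first, an unknown field fails with a clear message instead of surfacing as an FTS5 error.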