Commit b596a17

test: add regression tests for quoted phrases with reserved words (#1108)
## Summary

The known issue "Quoted Phrases Don't Escape Operators" was already fixed in a previous commit. This PR adds regression tests and documents the resolution.

## Changes

- **`tests/Core.Tests/Search/SearchEndToEndTests.cs`**: Added 7 E2E regression tests:
  - `KnownIssue2_QuotedPhraseWithAND_FindsExactPhrase`
  - `KnownIssue2_QuotedPhraseWithOR_FindsExactPhrase`
  - `KnownIssue2_QuotedPhraseWithNOT_FindsExactPhrase`
  - `KnownIssue2_QuotedReservedWordAND_FindsDocumentsContainingAND`
  - `KnownIssue2_QuotedReservedWordOR_FindsDocumentsContainingOR`
  - `KnownIssue2_QuotedReservedWordNOT_FindsDocumentsContainingNOT`
  - `KnownIssue2_MixedQuotedPhraseAndOperator_WorksCorrectly`
- **`KNOWN-ISSUES.md`**: Moved issue to "Resolved Issues" section
- **`tests/Main.Tests/Integration/ExamplesCommandE2ETests.cs`**: Updated comments

## How it works

The tokenizer in `InfixQueryParser.cs` correctly identifies quoted strings and preserves their content as literal text. The FTS query extractor properly quotes phrases and escapes reserved words when generating FTS5 queries.

## Test plan

- [x] All 537 tests pass (323 Core + 214 Main)
- [x] Zero skipped tests
- [x] Code coverage at 83.93% (above 80% threshold)
- [x] `build.sh` passes with 0 warnings
- [x] `format.sh` passes
- [x] `coverage.sh` passes
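
The quoting behavior described under "How it works" can be illustrated with a minimal sketch. This is not the code in `InfixQueryParser.cs` or the project's FTS query extractor, which this commit does not show; the class `FtsQuoting` and its methods are hypothetical names invented for illustration. The sketch assumes only the documented FTS5 rule that a double-quoted string is a literal phrase and that an embedded double quote is escaped by doubling it.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not the project's parser.
public static class FtsQuoting
{
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.Ordinal) { "AND", "OR", "NOT" };

    // Wraps text in double quotes and doubles embedded quotes (FTS5 string syntax).
    public static string ToFtsString(string text)
    {
        return "\"" + text.Replace("\"", "\"\"") + "\"";
    }

    // Splits a query into tokens; text inside double quotes stays one literal token.
    public static List<(string Text, bool Quoted)> Tokenize(string query)
    {
        var tokens = new List<(string Text, bool Quoted)>();
        var current = new StringBuilder();
        bool inQuotes = false;
        foreach (char c in query)
        {
            if (c == '"')
            {
                // A closing quote always emits a token; an opening quote flushes any pending word.
                if (inQuotes || current.Length > 0)
                {
                    tokens.Add((current.ToString(), inQuotes));
                    current.Clear();
                }
                inQuotes = !inQuotes;
            }
            else if (char.IsWhiteSpace(c) && !inQuotes)
            {
                if (current.Length > 0)
                {
                    tokens.Add((current.ToString(), false));
                    current.Clear();
                }
            }
            else
            {
                current.Append(c);
            }
        }
        if (current.Length > 0)
        {
            // An unterminated quote is treated as a quoted token (an assumption of this sketch).
            tokens.Add((current.ToString(), inQuotes));
        }
        return tokens;
    }

    // Unquoted AND/OR/NOT stay operators; every other token becomes a quoted FTS5 string.
    public static string ToFtsQuery(string query)
    {
        var parts = new List<string>();
        foreach (var (text, quoted) in Tokenize(query))
        {
            parts.Add(!quoted && Reserved.Contains(text) ? text : ToFtsString(text));
        }
        return string.Join(" ", parts);
    }
}
```

Under these assumptions, `FtsQuoting.ToFtsQuery("\"Alice AND Bob\" AND kubernetes")` yields `"Alice AND Bob" AND "kubernetes"`: the quoted `AND` travels inside the phrase, while the unquoted one remains an operator.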
1 parent 2dde39e commit b596a17

3 files changed: +190 -44 lines changed

KNOWN-ISSUES.md

Lines changed: 47 additions & 39 deletions
@@ -2,14 +2,22 @@
 
 ## Search Functionality
 
-### 1. NOT Operator Doesn't Exclude Matches
+### 1. NOT Operator Issues
 
 **Status:** Known bug, not yet fixed
 
-**Issue:** Queries like `"foo NOT bar"` should find documents containing "foo" but not "bar". Currently, it returns documents containing both.
+**Issue:** The NOT operator has two problems:
 
-**Example:**
+1. **Standalone NOT crashes:** `km search "NOT foo"` throws FTS5 syntax error
+2. **NOT doesn't exclude:** `km search "foo NOT bar"` returns documents containing both instead of excluding "bar"
+
+**Examples:**
 ```bash
+# Problem 1: Standalone NOT crashes
+km search "NOT important"
+# Error: SQLite Error 1: 'fts5: syntax error near "NOT"'
+
+# Problem 2: NOT doesn't exclude
 km put "foo and bar together"
 km put "only foo here"
 km search "foo NOT bar"
@@ -18,83 +26,83 @@ km search "foo NOT bar"
 ```
 
 **Root Cause:**
-- FTS query extraction passes `"NOT (bar)"` to SQLite FTS5
-- SQLite FTS5's NOT operator support is limited/broken
+- FTS5 requires NOT to have a left operand (e.g., `foo NOT bar`), standalone `NOT term` is invalid
+- Even when valid, FTS query extraction passes `"NOT (bar)"` to SQLite FTS5 which doesn't work as expected
 - No LINQ post-filtering is applied to exclude NOT terms
 - The architecture assumes FTS handles all logic, but NOT needs LINQ filtering
 
-**Workaround:** None currently. Avoid using NOT operator.
+**Workaround:**
+- For literal text containing "NOT", use quotes: `km search '"NOT important"'`
+- Avoid using NOT as a boolean operator
 
 **Fix Required:**
-1. Split query: extract positive terms for FTS, negative terms for filtering
-2. Apply LINQ filter to FTS results using QueryLinqBuilder
-3. Filter out documents matching NOT terms
+1. Handle standalone NOT gracefully (either treat as literal or provide clear error)
+2. Split query: extract positive terms for FTS, negative terms for filtering
+3. Apply LINQ filter to FTS results using QueryLinqBuilder
+4. Filter out documents matching NOT terms
 
 **Files Affected:**
 - `src/Core/Search/NodeSearchService.cs:190` - ExtractLogical NOT handling
 - Need to add LINQ filtering after line 89
 
 ---
 
-### 2. Quoted Phrases Don't Escape Operators
+### 2. Field Queries with Quoted Values Fail
 
 **Status:** Known bug, not yet fixed
 
-**Issue:** Cannot search for literal phrases containing reserved words like "AND", "OR", "NOT".
+**Issue:** Field-specific queries with quoted values containing special characters fail.
 
 **Example:**
 ```bash
-km put "Meeting with Alice AND Bob"
-km search '"Alice AND Bob"'
+km put "user:password format"
+km search 'content:"user:password"'
 # Expected: Find the document
-# Actual: Parser error or incorrect results
+# Actual: SQLite error "unknown special query"
 ```
 
 **Root Cause:**
-- Quoted strings should treat content literally
-- Current parser/tokenizer doesn't properly handle operator escaping within quotes
-- May be FTS query generation issue
-
-**Workaround:** Rephrase searches to avoid reserved words.
+- Quoted values after field prefix (`content:"..."`) generate invalid FTS queries
+- FTS syntax may not support this pattern
+- Need investigation of FTS query generation
 
-**Fix Required:** Investigate tokenizer and FTS query extraction for quoted phrases.
+**Workaround:** Search without field prefix or without quotes.
 
 ---
 
-### 3. Field Queries with Quoted Values Fail
+## Resolved Issues
 
-**Status:** Known bug, not yet fixed
+### Quoted Phrases Don't Escape Operators (Resolved)
 
-**Issue:** Field-specific queries with quoted values containing special characters fail.
+**Status:** Fixed
+
+**Issue:** Cannot search for literal phrases containing reserved words like "AND", "OR", "NOT".
 
 **Example:**
 ```bash
-km put "user:password format"
-km search 'content:"user:password"'
-# Expected: Find the document
-# Actual: SQLite error "unknown special query"
+km put "Meeting with Alice AND Bob"
+km search '"Alice AND Bob"'
+# Now works correctly and finds the document
 ```
 
-**Root Cause:**
-- Quoted values after field prefix (`content:"..."`) generate invalid FTS queries
-- FTS syntax may not support this pattern
-- Need investigation of FTS query generation
-
-**Workaround:** Search without field prefix or without quotes.
+**Resolution:**
+- The tokenizer correctly handles quoted strings and preserves them as literal text
+- The FTS query extractor properly quotes phrases containing reserved words
+- E2E tests added in `SearchEndToEndTests.cs` to prevent regression (tests: `KnownIssue2_*`)
 
 ---
 
 ## Testing Gaps
 
 These bugs were discovered through comprehensive E2E testing. Previous tests only verified:
-- AST structure correctness
-- LINQ expression building
-- Direct FTS calls
+- AST structure correctness
+- LINQ expression building
+- Direct FTS calls
 
 But did NOT test:
-- Full pipeline: Parse Extract FTS Search Filter Rank
-- Default settings (MinRelevance=0.3)
-- Actual result verification
+- Full pipeline: Parse -> Extract FTS -> Search -> Filter -> Rank
+- Default settings (MinRelevance=0.3)
+- Actual result verification
 
 **Lesson:** Exit code testing and structure testing are insufficient. Must test actual behavior with real data.
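
The "Fix Required" steps for the NOT operator in `KNOWN-ISSUES.md` (split the query, run FTS on the positive terms, filter out NOT terms with LINQ) could look roughly like the sketch below. It is an illustration under loud assumptions, not the planned fix: `NotPostFilter`, `Split`, and `Exclude` are hypothetical names, documents are plain strings, and the project's `QueryLinqBuilder` is not used because its API is not shown in this commit. Matching is a naive case-insensitive substring check, so `bar` would also exclude `barn`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of "positive terms to FTS, negative terms to a LINQ filter".
public static class NotPostFilter
{
    // Splits a flat query such as "foo NOT bar" into positive and negative terms.
    // Parentheses, quotes, and field prefixes are out of scope for this sketch.
    public static (List<string> Positive, List<string> Negative) Split(string query)
    {
        var positive = new List<string>();
        var negative = new List<string>();
        bool negateNext = false;
        foreach (var token in query.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            if (token == "NOT")
            {
                negateNext = true;
                continue;
            }
            (negateNext ? negative : positive).Add(token);
            negateNext = false;
        }
        return (positive, negative);
    }

    // Drops every FTS result that contains any negative term (case-insensitive substring match).
    public static IEnumerable<string> Exclude(IEnumerable<string> ftsResults, IEnumerable<string> negative)
    {
        var banned = negative.ToList();
        return ftsResults.Where(doc =>
            !banned.Any(term => doc.Contains(term, StringComparison.OrdinalIgnoreCase)));
    }
}
```

With the two documents from the example in `KNOWN-ISSUES.md`, `Split("foo NOT bar")` gives positive `foo` and negative `bar`, and `Exclude` keeps only "only foo here". A standalone `NOT foo` produces no positive terms, which is the case where a caller could return a clear error instead of sending an invalid query to FTS5.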

tests/Core.Tests/Search/SearchEndToEndTests.cs

Lines changed: 139 additions & 0 deletions
@@ -606,4 +606,143 @@ public async Task RegressionTest_FieldSpecificEqualOperator_ExtractsFtsQuery()
     }
 
     #endregion
+
+    #region Known Issue 2: Quoted Phrases With Reserved Words
+
+    [Fact]
+    public async Task KnownIssue2_QuotedPhraseWithAND_FindsExactPhrase()
+    {
+        // Known Issue 2: Quoted phrases don't escape operators
+        // This test verifies that searching for "Alice AND Bob" (as a phrase)
+        // finds documents containing that exact phrase, not documents with "Alice" AND "Bob" separately
+
+        // Arrange
+        await this.InsertAsync("doc1", "Meeting with Alice AND Bob").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "Alice went to lunch and Bob stayed").ConfigureAwait(false);
+        await this.InsertAsync("doc3", "Just Alice here").ConfigureAwait(false);
+        await this.InsertAsync("doc4", "Just Bob here").ConfigureAwait(false);
+
+        // Act: Search for the exact phrase "Alice AND Bob" using quotes
+        var response = await this.SearchAsync("\"Alice AND Bob\"").ConfigureAwait(false);
+
+        // Assert: Should find only doc1 which contains the exact phrase
+        Assert.Equal(1, response.TotalResults);
+        Assert.Single(response.Results);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    [Fact]
+    public async Task KnownIssue2_QuotedPhraseWithOR_FindsExactPhrase()
+    {
+        // Known Issue 2: Quoted phrases don't escape operators
+        // This test verifies that "this OR that" searches for the literal phrase
+
+        // Arrange
+        await this.InsertAsync("doc1", "choose this OR that option").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "this is one option or that is another").ConfigureAwait(false);
+        await this.InsertAsync("doc3", "just this").ConfigureAwait(false);
+        await this.InsertAsync("doc4", "just that").ConfigureAwait(false);
+
+        // Act: Search for the exact phrase "this OR that"
+        var response = await this.SearchAsync("\"this OR that\"").ConfigureAwait(false);
+
+        // Assert: Should find only doc1 with the exact phrase
+        Assert.Equal(1, response.TotalResults);
+        Assert.Single(response.Results);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    [Fact]
+    public async Task KnownIssue2_QuotedPhraseWithNOT_FindsExactPhrase()
+    {
+        // Known Issue 2: Quoted phrases don't escape operators
+        // This test verifies that "this is NOT important" searches for the literal phrase
+
+        // Arrange
+        await this.InsertAsync("doc1", "this is NOT important notice").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "this is definitely important").ConfigureAwait(false);
+        await this.InsertAsync("doc3", "NOT a problem").ConfigureAwait(false);
+
+        // Act: Search for the exact phrase "this is NOT important"
+        var response = await this.SearchAsync("\"this is NOT important\"").ConfigureAwait(false);
+
+        // Assert: Should find only doc1 with the exact phrase
+        Assert.Equal(1, response.TotalResults);
+        Assert.Single(response.Results);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    [Fact]
+    public async Task KnownIssue2_QuotedReservedWordAND_FindsDocumentsContainingAND()
+    {
+        // Known Issue 2: Searching for just the word "AND" should work when quoted
+
+        // Arrange
+        await this.InsertAsync("doc1", "The word AND appears here").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "No reserved words").ConfigureAwait(false);
+
+        // Act: Search for the literal word "AND"
+        var response = await this.SearchAsync("\"AND\"").ConfigureAwait(false);
+
+        // Assert: Should find doc1 containing "AND"
+        Assert.Equal(1, response.TotalResults);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    [Fact]
+    public async Task KnownIssue2_QuotedReservedWordOR_FindsDocumentsContainingOR()
+    {
+        // Known Issue 2: Searching for just the word "OR" should work when quoted
+
+        // Arrange
+        await this.InsertAsync("doc1", "The word OR appears here").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "No reserved words").ConfigureAwait(false);
+
+        // Act: Search for the literal word "OR"
+        var response = await this.SearchAsync("\"OR\"").ConfigureAwait(false);
+
+        // Assert: Should find doc1 containing "OR"
+        Assert.Equal(1, response.TotalResults);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    [Fact]
+    public async Task KnownIssue2_QuotedReservedWordNOT_FindsDocumentsContainingNOT()
+    {
+        // Known Issue 2: Searching for just the word "NOT" should work when quoted
+
+        // Arrange
+        await this.InsertAsync("doc1", "The word NOT appears here").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "No reserved words").ConfigureAwait(false);
+
+        // Act: Search for the literal word "NOT"
+        var response = await this.SearchAsync("\"NOT\"").ConfigureAwait(false);
+
+        // Assert: Should find doc1 containing "NOT"
+        Assert.Equal(1, response.TotalResults);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    [Fact]
+    public async Task KnownIssue2_MixedQuotedPhraseAndOperator_WorksCorrectly()
+    {
+        // Known Issue 2: Mixing quoted phrases with actual operators should work
+        // Search: "Alice AND Bob" AND kubernetes
+        // This should find documents containing both the exact phrase "Alice AND Bob" AND the word "kubernetes"
+
+        // Arrange
+        await this.InsertAsync("doc1", "Meeting notes for Alice AND Bob about kubernetes").ConfigureAwait(false);
+        await this.InsertAsync("doc2", "Meeting notes for Alice AND Bob about docker").ConfigureAwait(false);
+        await this.InsertAsync("doc3", "Meeting notes for Alice about kubernetes").ConfigureAwait(false);
+
+        // Act: Search for exact phrase AND another term
+        var response = await this.SearchAsync("\"Alice AND Bob\" AND kubernetes").ConfigureAwait(false);
+
+        // Assert: Should find only doc1
+        Assert.Equal(1, response.TotalResults);
+        Assert.Single(response.Results);
+        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
+    }
+
+    #endregion
 }

tests/Main.Tests/Integration/ExamplesCommandE2ETests.cs

Lines changed: 4 additions & 5 deletions
@@ -215,11 +215,10 @@ await this.ExecAsync($"search \"title:api AND (content:rest OR content:graphql)\
 
     #endregion
 
-    // NOTE: Escaping special characters tests disabled - Known limitations:
-    // 1. Quoted phrases like '"Alice AND Bob"' don't work - parser/FTS issues
-    // 2. Field queries with quoted values like 'content:"user:password"' fail with SQLite error
-    // 3. Literal reserved words like '"NOT"' cause parser errors
-    // These are known bugs that need investigation and fixes before examples can be shown to users
+    // NOTE: Some escaping special characters tests disabled - Known limitations:
+    // - Field queries with quoted values like 'content:"user:password"' fail with SQLite error
+    // FIXED: Quoted phrases like '"Alice AND Bob"' now work (see SearchEndToEndTests.KnownIssue2_*)
+    // FIXED: Literal reserved words like '"NOT"' now work correctly
 
     #region MongoDB JSON Format
 