86 changes: 47 additions & 39 deletions KNOWN-ISSUES.md
@@ -2,14 +2,22 @@

## Search Functionality

### 1. NOT Operator Doesn't Exclude Matches
### 1. NOT Operator Issues

**Status:** Known bug, not yet fixed

**Issue:** Queries like `"foo NOT bar"` should find documents containing "foo" but not "bar". Currently, it returns documents containing both.
**Issue:** The NOT operator has two problems:

**Example:**
1. **Standalone NOT crashes:** `km search "NOT foo"` throws an FTS5 syntax error
2. **NOT doesn't exclude:** `km search "foo NOT bar"` returns documents containing both terms instead of excluding those that contain "bar"

**Examples:**
```bash
# Problem 1: Standalone NOT crashes
km search "NOT important"
# Error: SQLite Error 1: 'fts5: syntax error near "NOT"'

# Problem 2: NOT doesn't exclude
km put "foo and bar together"
km put "only foo here"
km search "foo NOT bar"
@@ -18,83 +26,83 @@ km search "foo NOT bar"
```

**Root Cause:**
- FTS query extraction passes `"NOT (bar)"` to SQLite FTS5
- SQLite FTS5's NOT operator support is limited/broken
- FTS5 requires NOT to have a left operand (e.g., `foo NOT bar`); a standalone `NOT term` is invalid
- Even when the query is valid, FTS query extraction passes `"NOT (bar)"` to SQLite FTS5, which doesn't work as expected
- No LINQ post-filtering is applied to exclude NOT terms
- The architecture assumes FTS handles all logic, but NOT needs LINQ filtering

**Workaround:** None currently. Avoid using NOT operator.
**Workaround:**
- For literal text containing "NOT", use quotes: `km search '"NOT important"'`
- Avoid using NOT as a boolean operator

**Fix Required:**
1. Split query: extract positive terms for FTS, negative terms for filtering
2. Apply LINQ filter to FTS results using QueryLinqBuilder
3. Filter out documents matching NOT terms
1. Handle standalone NOT gracefully (either treat it as a literal or return a clear error)
2. Split query: extract positive terms for FTS, negative terms for filtering
3. Apply LINQ filter to FTS results using QueryLinqBuilder
4. Filter out documents matching NOT terms
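
The fix steps above can be sketched in a few lines of C#. This is a minimal illustration under stated assumptions, not the project's implementation: `NotQuerySketch`, `Split`, and `ExcludeNegative` are hypothetical names, the query is assumed to be a flat list of space-separated terms with no parentheses or quotes, and plain substring matching over in-memory strings stands in for whatever `QueryLinqBuilder` would actually generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not the project's QueryLinqBuilder.
// Assumption: it only handles flat queries such as `a b NOT c NOT d`.
public static class NotQuerySketch
{
    // Splits a flat query into terms to send to FTS and terms to exclude afterwards.
    public static (List<string> Positive, List<string> Negative) Split(string query)
    {
        var positive = new List<string>();
        var negative = new List<string>();
        var tokens = query.Split(' ', StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < tokens.Length; i++)
        {
            if (tokens[i] == "NOT")
            {
                // A trailing NOT has nothing to negate: report it clearly.
                if (i + 1 >= tokens.Length)
                {
                    throw new ArgumentException("NOT must be followed by a term.");
                }

                // Consume the next token as the excluded term.
                negative.Add(tokens[++i]);
            }
            else
            {
                positive.Add(tokens[i]);
            }
        }

        return (positive, negative);
    }

    // Drops every document whose text contains one of the negative terms.
    // Assumption: case-insensitive substring matching, so "bar" also matches "barn".
    public static IEnumerable<string> ExcludeNegative(
        IEnumerable<string> ftsResults,
        IReadOnlyCollection<string> negative)
    {
        return ftsResults.Where(doc =>
            !negative.Any(term => doc.Contains(term, StringComparison.OrdinalIgnoreCase)));
    }
}
```

With the example above, `Split("foo NOT bar")` yields `foo` as the only positive term and `bar` as the only negative term; sending `foo` to FTS and passing both documents through `ExcludeNegative` leaves only "only foo here". For a standalone `NOT foo`, the positive list comes back empty, so the caller can return a clear error instead of handing FTS5 an invalid expression.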

**Files Affected:**
- `src/Core/Search/NodeSearchService.cs:190` - ExtractLogical NOT handling
- Need to add LINQ filtering after line 89

---

### 2. Quoted Phrases Don't Escape Operators
### 2. Field Queries with Quoted Values Fail

**Status:** Known bug, not yet fixed

**Issue:** Cannot search for literal phrases containing reserved words like "AND", "OR", "NOT".
**Issue:** Field-specific queries with quoted values containing special characters fail.

**Example:**
```bash
km put "Meeting with Alice AND Bob"
km search '"Alice AND Bob"'
km put "user:password format"
km search 'content:"user:password"'
# Expected: Find the document
# Actual: Parser error or incorrect results
# Actual: SQLite error "unknown special query"
```

**Root Cause:**
- Quoted strings should treat content literally
- Current parser/tokenizer doesn't properly handle operator escaping within quotes
- May be FTS query generation issue

**Workaround:** Rephrase searches to avoid reserved words.
- Quoted values after field prefix (`content:"..."`) generate invalid FTS queries
- FTS syntax may not support this pattern
- FTS query generation needs investigation

**Fix Required:** Investigate tokenizer and FTS query extraction for quoted phrases.
**Workaround:** Search without field prefix or without quotes.

---

### 3. Field Queries with Quoted Values Fail
## Resolved Issues

**Status:** Known bug, not yet fixed
### Quoted Phrases Don't Escape Operators (Resolved)

**Issue:** Field-specific queries with quoted values containing special characters fail.
**Status:** Fixed

**Issue:** Cannot search for literal phrases containing reserved words like "AND", "OR", "NOT".

**Example:**
```bash
km put "user:password format"
km search 'content:"user:password"'
# Expected: Find the document
# Actual: SQLite error "unknown special query"
km put "Meeting with Alice AND Bob"
km search '"Alice AND Bob"'
# Now works correctly and finds the document
```

**Root Cause:**
- Quoted values after field prefix (`content:"..."`) generate invalid FTS queries
- FTS syntax may not support this pattern
- Need investigation of FTS query generation

**Workaround:** Search without field prefix or without quotes.
**Resolution:**
- The tokenizer correctly handles quoted strings and preserves them as literal text
- The FTS query extractor properly quotes phrases containing reserved words
- E2E tests added in `SearchEndToEndTests.cs` to prevent regression (tests: `KnownIssue2_*`)
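
One way to picture the quoting step: FTS5 treats a double-quoted string as a phrase, so reserved words inside it are matched as text, and an embedded double quote is escaped by doubling it. The helper below is a hypothetical sketch (`FtsPhraseSketch.QuotePhrase` is not a name from the codebase) of how a phrase could be quoted before it is placed in a MATCH expression; the actual extractor may differ.

```csharp
// Hypothetical helper illustrating FTS5 string quoting; not the project's extractor.
public static class FtsPhraseSketch
{
    // Wraps the phrase in double quotes and doubles any embedded double quote,
    // so that words such as AND, OR, and NOT are read as text, not operators.
    public static string QuotePhrase(string phrase)
    {
        return "\"" + phrase.Replace("\"", "\"\"") + "\"";
    }
}
```

Under this sketch, `QuotePhrase("Alice AND Bob")` produces `"Alice AND Bob"` with the surrounding double quotes included, which FTS5 reads as a three-token phrase.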

---

## Testing Gaps

These bugs were discovered through comprehensive E2E testing. Previous tests only verified:
- AST structure correctness
- LINQ expression building
- Direct FTS calls
- AST structure correctness
- LINQ expression building
- Direct FTS calls

But did NOT test:
- Full pipeline: Parse Extract FTS Search Filter Rank
- Default settings (MinRelevance=0.3)
- Actual result verification
- Full pipeline: Parse -> Extract FTS -> Search -> Filter -> Rank
- Default settings (MinRelevance=0.3)
- Actual result verification

**Lesson:** Exit-code testing and structure testing are insufficient. Tests must verify actual behavior with real data.

139 changes: 139 additions & 0 deletions tests/Core.Tests/Search/SearchEndToEndTests.cs
@@ -606,4 +606,143 @@ public async Task RegressionTest_FieldSpecificEqualOperator_ExtractsFtsQuery()
    }

    #endregion

    #region Known Issue 2: Quoted Phrases With Reserved Words

    [Fact]
    public async Task KnownIssue2_QuotedPhraseWithAND_FindsExactPhrase()
    {
        // Known Issue 2: Quoted phrases don't escape operators
        // This test verifies that searching for "Alice AND Bob" (as a phrase)
        // finds documents containing that exact phrase, not documents with "Alice" AND "Bob" separately

        // Arrange
        await this.InsertAsync("doc1", "Meeting with Alice AND Bob").ConfigureAwait(false);
        await this.InsertAsync("doc2", "Alice went to lunch and Bob stayed").ConfigureAwait(false);
        await this.InsertAsync("doc3", "Just Alice here").ConfigureAwait(false);
        await this.InsertAsync("doc4", "Just Bob here").ConfigureAwait(false);

        // Act: Search for the exact phrase "Alice AND Bob" using quotes
        var response = await this.SearchAsync("\"Alice AND Bob\"").ConfigureAwait(false);

        // Assert: Should find only doc1 which contains the exact phrase
        Assert.Equal(1, response.TotalResults);
        Assert.Single(response.Results);
        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
    }

    [Fact]
    public async Task KnownIssue2_QuotedPhraseWithOR_FindsExactPhrase()
    {
        // Known Issue 2: Quoted phrases don't escape operators
        // This test verifies that "this OR that" searches for the literal phrase

        // Arrange
        await this.InsertAsync("doc1", "choose this OR that option").ConfigureAwait(false);
        await this.InsertAsync("doc2", "this is one option or that is another").ConfigureAwait(false);
        await this.InsertAsync("doc3", "just this").ConfigureAwait(false);
        await this.InsertAsync("doc4", "just that").ConfigureAwait(false);

        // Act: Search for the exact phrase "this OR that"
        var response = await this.SearchAsync("\"this OR that\"").ConfigureAwait(false);

        // Assert: Should find only doc1 with the exact phrase
        Assert.Equal(1, response.TotalResults);
        Assert.Single(response.Results);
        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
    }

    [Fact]
    public async Task KnownIssue2_QuotedPhraseWithNOT_FindsExactPhrase()
    {
        // Known Issue 2: Quoted phrases don't escape operators
        // This test verifies that "this is NOT important" searches for the literal phrase

        // Arrange
        await this.InsertAsync("doc1", "this is NOT important notice").ConfigureAwait(false);
        await this.InsertAsync("doc2", "this is definitely important").ConfigureAwait(false);
        await this.InsertAsync("doc3", "NOT a problem").ConfigureAwait(false);

        // Act: Search for the exact phrase "this is NOT important"
        var response = await this.SearchAsync("\"this is NOT important\"").ConfigureAwait(false);

        // Assert: Should find only doc1 with the exact phrase
        Assert.Equal(1, response.TotalResults);
        Assert.Single(response.Results);
        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
    }

    [Fact]
    public async Task KnownIssue2_QuotedReservedWordAND_FindsDocumentsContainingAND()
    {
        // Known Issue 2: Searching for just the word "AND" should work when quoted

        // Arrange
        await this.InsertAsync("doc1", "The word AND appears here").ConfigureAwait(false);
        await this.InsertAsync("doc2", "No reserved words").ConfigureAwait(false);

        // Act: Search for the literal word "AND"
        var response = await this.SearchAsync("\"AND\"").ConfigureAwait(false);

        // Assert: Should find doc1 containing "AND"
        Assert.Equal(1, response.TotalResults);
        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
    }

    [Fact]
    public async Task KnownIssue2_QuotedReservedWordOR_FindsDocumentsContainingOR()
    {
        // Known Issue 2: Searching for just the word "OR" should work when quoted

        // Arrange
        await this.InsertAsync("doc1", "The word OR appears here").ConfigureAwait(false);
        await this.InsertAsync("doc2", "No reserved words").ConfigureAwait(false);

        // Act: Search for the literal word "OR"
        var response = await this.SearchAsync("\"OR\"").ConfigureAwait(false);

        // Assert: Should find doc1 containing "OR"
        Assert.Equal(1, response.TotalResults);
        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
    }

    [Fact]
    public async Task KnownIssue2_QuotedReservedWordNOT_FindsDocumentsContainingNOT()
    {
        // Known Issue 2: Searching for just the word "NOT" should work when quoted

        // Arrange
        await this.InsertAsync("doc1", "The word NOT appears here").ConfigureAwait(false);
        await this.InsertAsync("doc2", "No reserved words").ConfigureAwait(false);

        // Act: Search for the literal word "NOT"
        var response = await this.SearchAsync("\"NOT\"").ConfigureAwait(false);

        // Assert: Should find doc1 containing "NOT"
        Assert.Equal(1, response.TotalResults);
        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
    }

    [Fact]
    public async Task KnownIssue2_MixedQuotedPhraseAndOperator_WorksCorrectly()
    {
        // Known Issue 2: Mixing quoted phrases with actual operators should work
        // Search: "Alice AND Bob" AND kubernetes
        // This should find documents containing both the exact phrase "Alice AND Bob" AND the word "kubernetes"

        // Arrange
        await this.InsertAsync("doc1", "Meeting notes for Alice AND Bob about kubernetes").ConfigureAwait(false);
        await this.InsertAsync("doc2", "Meeting notes for Alice AND Bob about docker").ConfigureAwait(false);
        await this.InsertAsync("doc3", "Meeting notes for Alice about kubernetes").ConfigureAwait(false);

        // Act: Search for exact phrase AND another term
        var response = await this.SearchAsync("\"Alice AND Bob\" AND kubernetes").ConfigureAwait(false);

        // Assert: Should find only doc1
        Assert.Equal(1, response.TotalResults);
        Assert.Single(response.Results);
        Assert.Equal(this._insertedIds["doc1"], response.Results[0].Id);
    }

    #endregion
}
9 changes: 4 additions & 5 deletions tests/Main.Tests/Integration/ExamplesCommandE2ETests.cs
@@ -215,11 +215,10 @@ await this.ExecAsync($"search \"title:api AND (content:rest OR content:graphql)\

#endregion

// NOTE: Escaping special characters tests disabled - Known limitations:
// 1. Quoted phrases like '"Alice AND Bob"' don't work - parser/FTS issues
// 2. Field queries with quoted values like 'content:"user:password"' fail with SQLite error
// 3. Literal reserved words like '"NOT"' cause parser errors
// These are known bugs that need investigation and fixes before examples can be shown to users
// NOTE: Some special-character escaping tests are still disabled. Known limitations:
// - Field queries with quoted values like 'content:"user:password"' fail with SQLite error
// FIXED: Quoted phrases like '"Alice AND Bob"' now work (see SearchEndToEndTests.KnownIssue2_*)
// FIXED: Literal reserved words like '"NOT"' now work correctly

#region MongoDB JSON Format
