## Background

Most LSP (Language Server Protocol) capabilities require a precise code `Position` or `Range`. For LLM Agents, providing exact line and column numbers is difficult: Agents typically understand code based on semantics rather than precise character offsets.

### Problems with Traditional Approaches

**Option A: Direct Line/Column Specification**

```json
{"line": 42, "character": 15}
```

- Difficult for Agents to accurately calculate column numbers.
- Position becomes invalid after minor code changes.
- Lacks semantic expressiveness.

**Option B: Symbol Path Only**

```json
{"symbol_path": ["MyClass", "my_method"]}
```

- Can only locate symbol declarations.
- Cannot locate specific positions inside a symbol.
| `SymbolScope` | `None` | Position of the symbol's declared name |
| `SymbolScope` | With `<HERE>` | Marked position within the symbol body |
| `SymbolScope` | Without `<HERE>` | Start of matched text within the symbol body |
| `LineScope` | `None` | First non-whitespace character of the line |
| `LineScope` | With `<HERE>` | Marked position within the line |
| `LineScope` | Without `<HERE>` | Start of matched text within the line |
| `None` | With `<HERE>` | Global search, marked position |
| `None` | Without `<HERE>` | Global search, start of matched text |
| `None` | `None` | ❌ Invalid, validation fails |
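
The resolution rules in the table above can be sketched as a single function. This is a minimal illustration, not the project's implementation: `SymbolRegion`, `LineRegion`, and `resolve_offset` are hypothetical names, each scope is assumed to have already been resolved to a text slice with an absolute start offset, and `find` is matched as a literal string (the whitespace rules are deliberately ignored here).

```python
from dataclasses import dataclass
from typing import Optional, Union

MARKER = "<HERE>"


@dataclass
class SymbolRegion:
    """Hypothetical stand-in for a resolved SymbolScope."""
    start: int        # absolute offset of the symbol's text in the file
    text: str         # source text of the whole symbol
    name_offset: int  # offset of the declared name, relative to `start`


@dataclass
class LineRegion:
    """Hypothetical stand-in for a resolved LineScope."""
    start: int  # absolute offset of the line in the file
    text: str   # source text of the line


Region = Union[SymbolRegion, LineRegion]


def resolve_offset(file_text: str, region: Optional[Region], find: Optional[str]) -> int:
    """Return an absolute offset following the scope/find table (literal matching only)."""
    if region is None and find is None:
        raise ValueError("invalid locate: neither scope nor find was given")

    if find is None:
        if isinstance(region, SymbolRegion):
            return region.start + region.name_offset  # the symbol's declared name
        # LineScope without find: first non-whitespace character of the line
        return region.start + (len(region.text) - len(region.text.lstrip()))

    # Search inside the scope if one is given, otherwise across the whole file.
    base, haystack = (0, file_text) if region is None else (region.start, region.text)
    before, marker, after = find.partition(MARKER)
    idx = haystack.find(before + after)
    if idx < 0:
        raise LookupError(f"pattern {find!r} not found")
    # With a marker, point at the marker; without one, at the start of the match.
    return base + idx + (len(before) if marker else 0)
```

Checking the `None`/`None` case first mirrors the last table row: a request with neither a scope nor a pattern is rejected before any search runs.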

## Whitespace Handling

To balance flexibility and precision, the matching engine uses a **token-aware** whitespace strategy rather than exact string matching or full fuzzy matching.

### Tokenization Strategy

The search pattern is first tokenized into identifiers, operators, and explicit whitespace. Matching then follows these rules:

1. **Identifiers remain atomic**: Whitespace is never allowed within an identifier (e.g., `int` will not match `i n t`).
2. **Flexible operator spacing**: Zero or more whitespace characters (`\s*`) are allowed between identifiers and operators, or between operators.
3. **Mandatory explicit whitespace**: If the search pattern contains explicit whitespace, the source must contain at least one whitespace character (`\s+`) at that position.
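
The three rules can be compiled into a regular expression. The sketch below is a minimal illustration under stated assumptions, not the matcher's actual code: `pattern_to_regex` is a hypothetical name, identifiers are approximated by `\w+`, and every other non-whitespace character is treated as a single-character operator (so `==` becomes two operator tokens and would also match `= =`).

```python
import re

# Assumed token classes: identifiers (\w+), explicit whitespace runs, and
# single-character operators (any other non-whitespace character).
TOKEN_RE = re.compile(r"(?P<ident>\w+)|(?P<ws>\s+)|(?P<op>\S)")


def pattern_to_regex(pattern: str) -> str:
    """Translate a search pattern into a regex following the tokenization rules."""
    parts: list[str] = []
    prev_kind = None
    for tok in TOKEN_RE.finditer(pattern):
        kind = tok.lastgroup
        if kind == "ws":
            # Rule 3: explicit whitespace requires at least one whitespace char.
            parts.append(r"\s+")
        else:
            if prev_kind in ("ident", "op"):
                # Rule 2: optional whitespace between adjacent non-space tokens.
                parts.append(r"\s*")
            # Rule 1: the token is matched verbatim, so no whitespace can
            # appear inside an identifier.
            parts.append(re.escape(tok.group()))
        prev_kind = kind
    return "".join(parts)
```

For example, `foo(a,b)` compiles to `foo\s*\(\s*a\s*,\s*b\s*\)` and matches `foo( a, b )`, while `x = 1` requires whitespace on both sides of `=` and does not match `x=1`. Because `\s` also matches newlines, this sketch lets a match span lines; whether the real matcher restricts that is not stated here.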

An empty (or whitespace-only) `find` pattern with a marker returns:

- Offset 0 if the segments before and after the marker are both empty.
- Otherwise, the pattern is treated as a mandatory whitespace pattern (at least one whitespace character is required).
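
A minimal sketch of these two cases, assuming the marker token is `<HERE>` and that offsets are relative to the start of the search text. `locate_blank` is a hypothetical helper, and where the marker lands inside the whitespace run (at its end when whitespace precedes the marker, at its start otherwise) is an assumption the rules above do not specify.

```python
import re
from typing import Optional

MARKER = "<HERE>"


def locate_blank(text: str, find: str) -> Optional[int]:
    """Resolve a find pattern that is empty or whitespace-only apart from the marker."""
    before, marker, after = find.partition(MARKER)
    if not marker or (before + after).strip():
        raise ValueError("expected a blank pattern containing the marker")
    if before == "" and after == "":
        return 0  # bare marker: offset 0
    # Otherwise: a mandatory whitespace pattern, i.e. at least one whitespace char.
    run = re.search(r"\s+", text)
    if run is None:
        return None  # no whitespace in the search text, so no match
    # Assumption: whitespace before the marker puts it after the run;
    # whitespace only after the marker puts it at the start of the run.
    return run.end() if before else run.start()
```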

### Design Rationale

#### Why Not Exact String Matching?

Code formatting varies across teams and tools. Exact matching would break on variations in indentation (spaces vs. tabs), spacing around operators, or line continuations.

#### Why Not Full Fuzzy Matching?

Overly permissive matching creates ambiguity. For example, `int a` matching `inta` changes semantic meaning, and cross-line matches can accidentally hit unintended code structures.

#### Why Token-Based?

Token boundaries align with programming language semantics. Token-based matching preserves identifier integrity while allowing natural variations in operator spacing, which matches the developer's mental model of "what should match".

## LSP Capability Mapping

### Capabilities Requiring Position

| LSP Capability | Positioning Need | Locate Usage |
|----------------|------------------|--------------|
| `textDocument/definition` | Identifier position | `SymbolScope` or `find="<HERE>identifier"` |
| `textDocument/references` | Symbol declaration position | `SymbolScope(symbol_path=[...])` |
| `textDocument/rename` | Symbol declaration position | `SymbolScope(symbol_path=[...])` |
| `textDocument/hover` | Any identifier | `find="<HERE>target"` |
| `textDocument/completion` | Trigger point | `find="obj.<HERE>"` |
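
Once a locate request has been resolved to an offset in the file, it still has to be converted into an LSP `Position`. In the LSP specification, `line` and `character` are zero-based and `character` counts UTF-16 code units by default, so a Python string index cannot be passed through directly. The sketch below assumes `\n` line endings and the default UTF-16 encoding; `offset_to_position` and `position_params` are hypothetical helper names, while the `textDocument`/`position` shape is the specification's `TextDocumentPositionParams`.

```python
def offset_to_position(text: str, offset: int) -> dict:
    """Convert a code-point offset into a zero-based LSP Position (UTF-16 units)."""
    line = text.count("\n", 0, offset)            # newlines before the offset
    line_start = text.rfind("\n", 0, offset) + 1  # 0 when on the first line
    prefix = text[line_start:offset]
    # Characters outside the BMP (e.g. emoji) take two UTF-16 code units.
    character = len(prefix.encode("utf-16-le")) // 2
    return {"line": line, "character": character}


def position_params(uri: str, text: str, offset: int) -> dict:
    """Build TextDocumentPositionParams, as used by definition, hover, etc."""
    return {"textDocument": {"uri": uri}, "position": offset_to_position(text, offset)}
```

The same base shape is extended by `textDocument/references` (which adds a `context` with `includeDeclaration`) and `textDocument/rename` (which adds `newName`).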