Commit 1938e89

fix: add Unicode support for file path mentions in slash commands
- Added Unicode flag (u) to mentionRegex and mentionRegexGlobal to properly match Unicode characters
- Added comprehensive tests for various Unicode scripts (Chinese, Japanese, Korean, Arabic, Cyrillic, etc.)
- Updated documentation to clarify Unicode support in file paths
- Fixes #7240
1 parent 5cf78a4 commit 1938e89

2 files changed: +31 -15 lines changed

src/shared/__tests__/context-mentions.spec.ts

Lines changed: 15 additions & 0 deletions
@@ -25,6 +25,19 @@ describe("mentionRegex and mentionRegexGlobal", () => {
  { input: "@a1b2c3d", expected: ["@a1b2c3d"] }, // Git commit hash (short)
  { input: "@a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0", expected: ["@a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0"] }, // Git commit hash (long)

+ // Unicode file paths (Chinese, Japanese, Korean, Arabic, etc.)
+ { input: "@/路径/中文文件.txt", expected: ["@/路径/中文文件.txt"] }, // Chinese characters
+ { input: "@/パス/日本語ファイル.txt", expected: ["@/パス/日本語ファイル.txt"] }, // Japanese characters
+ { input: "@/경로/한국어파일.txt", expected: ["@/경로/한국어파일.txt"] }, // Korean characters
+ { input: "@/مسار/ملف_عربي.txt", expected: ["@/مسار/ملف_عربي.txt"] }, // Arabic characters
+ { input: "@/путь/русский_файл.txt", expected: ["@/путь/русский_файл.txt"] }, // Cyrillic characters
+ { input: "@/dossier/fichier_français.txt", expected: ["@/dossier/fichier_français.txt"] }, // French accents
+ { input: "@/carpeta/archivo_español.txt", expected: ["@/carpeta/archivo_español.txt"] }, // Spanish characters
+ { input: "@/文件夹/", expected: ["@/文件夹/"] }, // Chinese folder
+ { input: "@/mixed/中文_english_日本語.txt", expected: ["@/mixed/中文_english_日本語.txt"] }, // Mixed languages
+ { input: "@/emoji/file_😀_test.txt", expected: ["@/emoji/file_😀_test.txt"] }, // Emoji in filename
+ { input: "@/path/文件\\ with\\ 空格.txt", expected: ["@/path/文件\\ with\\ 空格.txt"] }, // Unicode with escaped spaces
+
  // Mentions within text
  {
  input: "Check file @/path/to/file\\ with\\ spaces.txt for details.",
@@ -33,6 +46,8 @@ describe("mentionRegex and mentionRegexGlobal", () => {
  { input: "See @problems and @terminal output.", expected: ["@problems", "@terminal"] },
  { input: "URL: @https://example.com.", expected: ["@https://example.com"] }, // Trailing punctuation
  { input: "Commit @a1b2c3d, then check @/file.txt", expected: ["@a1b2c3d", "@/file.txt"] },
+ { input: "Check @/文档/报告.pdf for details", expected: ["@/文档/报告.pdf"] }, // Unicode in sentence
+ { input: "Files: @/файл1.txt, @/файл2.txt", expected: ["@/файл1.txt", "@/файл2.txt"] }, // Multiple Unicode mentions

  // Negative cases (should not match or match partially)
  { input: "@/path/with unescaped space.txt", expected: ["@/path/with"] }, // Unescaped space
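The table-driven cases above can be exercised outside the test runner with a small script. This is only a sketch: the two regexes are copied from the `src/shared/context-mentions.ts` diff in this commit, while `findMentions` and the `cases` subset are hypothetical names introduced here for illustration and are not part of the repository.

```typescript
// Copied from the post-commit version of src/shared/context-mentions.ts.
const mentionRegex =
	/(?<!\\)@((?:\/|\w+:\/\/)(?:[^\s\\]|\\ )+?|[a-f0-9]{7,40}\b|problems\b|git-changes\b|terminal\b)(?=[.,;:!?]?(?=[\s\r\n]|$))/u
const mentionRegexGlobal = new RegExp(mentionRegex.source, "gu")

// Hypothetical helper (not in the repository): returns every full mention in `text`.
function findMentions(text: string): string[] {
	// String.prototype.match with a global regex returns all full matches, or null.
	return text.match(mentionRegexGlobal) ?? []
}

// A subset of the cases from the spec table above.
const cases: Array<{ input: string; expected: string[] }> = [
	{ input: "@/路径/中文文件.txt", expected: ["@/路径/中文文件.txt"] },
	{ input: "@/emoji/file_😀_test.txt", expected: ["@/emoji/file_😀_test.txt"] },
	{ input: "@/path/文件\\ with\\ 空格.txt", expected: ["@/path/文件\\ with\\ 空格.txt"] },
	{ input: "Files: @/файл1.txt, @/файл2.txt", expected: ["@/файл1.txt", "@/файл2.txt"] },
	{ input: "URL: @https://example.com.", expected: ["@https://example.com"] },
	{ input: "@/path/with unescaped space.txt", expected: ["@/path/with"] },
]

for (const { input, expected } of cases) {
	const actual = findMentions(input)
	const ok = JSON.stringify(actual) === JSON.stringify(expected)
	console.log(ok ? "PASS" : "FAIL", JSON.stringify(input), "->", JSON.stringify(actual))
}
```

The last case shows the "partial match" behaviour the spec labels as negative: the mention ends at the first unescaped space, so only `@/path/with` is returned.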

src/shared/context-mentions.ts

Lines changed: 16 additions & 15 deletions
@@ -1,31 +1,31 @@
  /*
  Mention regex:
- - **Purpose**:
- - To identify and highlight specific mentions in text that start with '@'.
- - These mentions can be file paths, URLs, or the exact word 'problems'.
+ - **Purpose**:
+ - To identify and highlight specific mentions in text that start with '@'.
+ - These mentions can be file paths (including Unicode characters), URLs, or specific keywords.
  - Ensures that trailing punctuation marks (like commas, periods, etc.) are not included in the match, allowing punctuation to follow the mention without being part of it.

  - **Regex Breakdown**:
- - `/@`:
+ - `/@`:
  - **@**: The mention must start with the '@' symbol.

  - `((?:\/|\w+:\/\/)[^\s]+?|problems\b|git-changes\b)`:
  - **Capturing Group (`(...)`)**: Captures the part of the string that matches one of the specified patterns.
- - `(?:\/|\w+:\/\/)`:
+ - `(?:\/|\w+:\/\/)`:
  - **Non-Capturing Group (`(?:...)`)**: Groups the alternatives without capturing them for back-referencing.
- - `\/`:
+ - `\/`:
  - **Slash (`/`)**: Indicates that the mention is a file or folder path starting with a '/'.
  - `|`: Logical OR.
  - `\w+:\/\/`:
  - **Protocol (`\w+://`)**: Matches URLs that start with a word character sequence followed by '://', such as 'http://', 'https://', 'ftp://', etc.
  - `(?:[^\s\\]|\\ )+?`:
  - **Non-Capturing Group (`(?:...)`)**: Groups the alternatives without capturing them.
- - **Non-Whitespace and Non-Backslash (`[^\s\\]`)**: Matches any character that is not whitespace or a backslash.
+ - **Non-Whitespace and Non-Backslash (`[^\s\\]`)**: Matches any character that is not whitespace or a backslash, including Unicode characters.
  - **OR (`|`)**: Logical OR.
  - **Escaped Space (`\\ `)**: Matches a backslash followed by a space (an escaped space).
  - **Non-Greedy (`+?`)**: Ensures the smallest possible match, preventing the inclusion of trailing punctuation.
  - `|`: Logical OR.
- - `problems\b`:
+ - `problems\b`:
  - **Exact Word ('problems')**: Matches the exact word 'problems'.
  - **Word Boundary (`\b`)**: Ensures that 'problems' is matched as a whole word and not as part of another word (e.g., 'problematic').
  - `|`: Logical OR.
@@ -34,28 +34,29 @@ Mention regex:
  - **Word Boundary (`\b`)**: Ensures that 'terminal' is matched as a whole word and not as part of another word (e.g., 'terminals').
  - `(?=[.,;:!?]?(?=[\s\r\n]|$))`:
  - **Positive Lookahead (`(?=...)`)**: Ensures that the match is followed by specific patterns without including them in the match.
- - `[.,;:!?]?`:
+ - `[.,;:!?]?`:
  - **Optional Punctuation (`[.,;:!?]?`)**: Matches zero or one of the specified punctuation marks.
- - `(?=[\s\r\n]|$)`:
+ - `(?=[\s\r\n]|$)`:
  - **Nested Positive Lookahead (`(?=[\s\r\n]|$)`)**: Ensures that the punctuation (if present) is followed by a whitespace character, a line break, or the end of the string.

  - **Summary**:
  - The regex effectively matches:
- - Mentions that are file or folder paths starting with '/' and containing any non-whitespace characters (including periods within the path).
- - File paths can include spaces if they are escaped with a backslash (e.g., `@/path/to/file\ with\ spaces.txt`).
+ - Mentions that are file or folder paths starting with '/' and containing any non-whitespace characters, including Unicode characters like Chinese, Japanese, Korean, etc.
+ - File paths can include spaces if they are escaped with a backslash (e.g., `@/path/to/file\ with\ spaces.txt` or `@/路径/中文文件.txt`).
  - URLs that start with a protocol (like 'http://') followed by any non-whitespace characters (including query parameters).
  - The exact word 'problems'.
  - The exact word 'git-changes'.
  - The exact word 'terminal'.
  - It ensures that any trailing punctuation marks (such as ',', '.', '!', etc.) are not included in the matched mention, allowing the punctuation to follow the mention naturally in the text.
+ - The 'u' flag enables full Unicode support, allowing the regex to properly match Unicode characters in file paths.

  - **Global Regex**:
- - `mentionRegexGlobal`: Creates a global version of the `mentionRegex` to find all matches within a given string.
+ - `mentionRegexGlobal`: Creates a global version of the `mentionRegex` with Unicode support to find all matches within a given string.

  */
  export const mentionRegex =
- /(?<!\\)@((?:\/|\w+:\/\/)(?:[^\s\\]|\\ )+?|[a-f0-9]{7,40}\b|problems\b|git-changes\b|terminal\b)(?=[.,;:!?]?(?=[\s\r\n]|$))/
- export const mentionRegexGlobal = new RegExp(mentionRegex.source, "g")
+ /(?<!\\)@((?:\/|\w+:\/\/)(?:[^\s\\]|\\ )+?|[a-f0-9]{7,40}\b|problems\b|git-changes\b|terminal\b)(?=[.,;:!?]?(?=[\s\r\n]|$))/u
+ export const mentionRegexGlobal = new RegExp(mentionRegex.source, "gu")

  // Regex to match command mentions like /command-name anywhere in text
  export const commandRegexGlobal = /(?:^|\s)\/([a-zA-Z0-9_\.-]+)(?=\s|$)/g
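
As a rough illustration of what the `u` flag added to `mentionRegex` changes, the sketch below relies on standard JavaScript regex semantics rather than on code from this commit. The flag switches matching from UTF-16 code units to Unicode code points. Characters inside the Basic Multilingual Plane (CJK, Cyrillic, Arabic, accented Latin) are matched by `[^\s\\]` with or without the flag, so the visible difference is for characters outside it, such as emoji. The variable names are hypothetical.

```typescript
// Character class taken from the path portion of mentionRegex.
const pathCharLegacy = /[^\s\\]/ // no `u` flag: consumes one UTF-16 code unit
const pathCharUnicode = /[^\s\\]/u // `u` flag: consumes one Unicode code point

const cjk = "中" // U+4E2D, a single code unit
const emoji = "😀" // U+1F600, stored as a surrogate pair (two code units)

const legacyCjk = cjk.match(pathCharLegacy)?.[0] ?? ""
const unicodeCjk = cjk.match(pathCharUnicode)?.[0] ?? ""
const legacyEmoji = emoji.match(pathCharLegacy)?.[0] ?? ""
const unicodeEmoji = emoji.match(pathCharUnicode)?.[0] ?? ""

// Both modes consume the whole CJK character.
console.log(legacyCjk.length, unicodeCjk.length) // 1 1

// Without `u` the class consumes only the high surrogate; with `u` it
// consumes the full emoji.
console.log(legacyEmoji.length, unicodeEmoji.length) // 1 2
console.log(unicodeEmoji === emoji) // true
```

Because the path portion is a repeated group (`(?:[^\s\\]|\\ )+?`), a pattern without the flag can still span both halves of a surrogate pair across two iterations. The flag's guarantee is that a match cannot begin or end in the middle of one.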
