Commit b0d99f3

Fix Freelinks Aho-Corasick: failure links, cache invalidation, longest-match, and Unicode safety (TiddlyWiki#9676)
* Update aho-corasick.js: fix transition logic; ensure complete outputs (via failure-output merge); clean up stats/build scoping; clarify CJK boundary behavior.
* Update text.js: implement global longest-match priority with overlap suppression; fix refresh invalidation to ignore $:/state and drafts; handle deletions precisely to avoid rebuilding on draft deletion; add defensive check for cached automaton presence.
* Update text.js: remove comment.
* Update aho-corasick.js: remove comment.
* Create #9672.tid
* Create #2026-0222.tid
* Delete editions/tw5.com/tiddlers/releasenotes/5.4.0/#2026-0222.tid
* Update text.js: remove \"
* Update and rename #9672.tid to #9676.tid: change to the right number.
* Update #9397.tid: update the existing release note with the new PR link instead of creating a new release note.
* Delete editions/tw5.com/tiddlers/releasenotes/5.4.0/#9676.tid: update the existing release note with the new PR link instead of creating a new release note.
* Rename #9397.tid to #9676.tid: update the existing release note with the new PR link instead of creating a new release note.
* Update and rename #9676.tid to #9397.tid: add link.
* Rename #9397.tid to #9676.tid
* Update tiddlywiki.info: add plugin for test build.
* Update tiddlywiki.info: revert change; ready to be merged.
1 parent 91e7a62 commit b0d99f3
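
The refresh-invalidation change named in the commit message (ignoring `$:/state` and drafts) can be pictured as a small predicate. This is only a sketch under assumptions: `needsRebuild`, `CONFIG_ALLOWLIST`, and the `isDraft` callback are hypothetical names rather than the plugin's code, `changedTiddlers` is assumed to be an object keyed by tiddler title, and the two config titles are the ones mentioned in the change note.

```javascript
// Hypothetical allowlist: system tiddlers whose changes should still
// invalidate the cached automaton (titles taken from the change note).
const CONFIG_ALLOWLIST = new Set([
  "$:/config/Freelinks/WordBoundary",
  "$:/config/Freelinks/PersistAhoCorasickCache"
]);

// Returns true if any changed title should trigger a rebuild.
// `isDraft` is an assumed callback that reports whether a title is a draft.
function needsRebuild(changedTiddlers, isDraft) {
  return Object.keys(changedTiddlers).some((title) => {
    if (CONFIG_ALLOWLIST.has(title)) return true;  // plugin config: rebuild
    if (title.startsWith("$:/")) return false;     // other system tiddlers: ignore
    if (isDraft(title)) return false;              // drafts: ignore
    return true;                                   // ordinary tiddler: rebuild
  });
}

// Assumed draft check for this example only.
const isDraft = (title) => title.startsWith("Draft of ");

console.log(needsRebuild({ "$:/state/tab-1": { modified: true } }, isDraft));
console.log(needsRebuild({ "New York City": { modified: true } }, isDraft));
```

With this shape, a tab click that only touches `$:/state/...` leaves the cache alone, while editing an ordinary tiddler or a listed config tiddler invalidates it.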

5 files changed, +292 -353 lines changed

editions/tw5.com/tiddlers/releasenotes/5.4.0/#9397.tid

Lines changed: 0 additions & 17 deletions
This file was deleted.
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
title: $:/changenotes/5.4.0/#9676
description: Fix critical freelinks bugs: first character loss and false positive matches in v5.4.0
release: 5.4.0
tags: $:/tags/ChangeNote
change-type: bugfix
change-category: plugin
github-links: https://github.com/TiddlyWiki/TiddlyWiki5/pull/9084 https://github.com/TiddlyWiki/TiddlyWiki5/pull/9397 https://github.com/TiddlyWiki/TiddlyWiki5/pull/9676
github-contributors: s793016

Fixes and optimizations to the Freelinks plugin's Aho-Corasick implementation following #9397.

Fixes:
* Failure Links Non-Functional (Critical): The failure link map used a plain object `{}` with trie nodes as keys. Since all JavaScript objects coerce to the same string `[object Object]`, every node resolved to the same map entry. Failure links were silently broken for all overlapping patterns. Fixed by replacing with `WeakMap`.
* Cache Rebuilt on Every UI Interaction (Performance): Any `$:/state/...` update (e.g. clicking tabs) would trigger a full Aho-Corasick rebuild, causing severe lag on large wikis. The `refresh` logic now ignores system tiddlers, with an explicit allowlist for plugin config tiddlers.
* Short Match Blocking Longer Match: A shorter title appearing earlier (e.g. "The New") could prevent a longer overlapping title (e.g. "New York City") from matching. Replaced left-to-right greedy selection with global length-first sorting and interval occupation tracking.
* Unicode Index Desync in ignoreCase Mode: Calling `toLowerCase()` on the full text before searching could change string length (e.g. Turkish "İ" expands), causing `substring()` to split Emoji surrogate pairs and produce garbage output. Case conversion is now done per-character during search.
* Removed Vestigial Regex Escaping: `escapeRegExp()` was called during trie construction but Aho-Corasick operates on literal character transitions, not regex. Removed.
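
The failure-link bug in the first fix comes down to how JavaScript treats object keys. The following is a minimal sketch, not the plugin's code: `root`, `nodeA`, and `nodeB` are hypothetical stand-ins for trie nodes, and only their object identity matters.

```javascript
// Hypothetical trie nodes; their contents are irrelevant to the demonstration.
const root = { children: {} };
const nodeA = { children: {} };
const nodeB = { children: {} };

// Broken approach: a plain object converts every object key to the
// string "[object Object]", so all nodes share one entry.
const brokenFailure = {};
brokenFailure[nodeA] = root;
brokenFailure[nodeB] = nodeA; // overwrites the entry written for nodeA

// Fixed approach: a WeakMap keys entries by object identity.
const failure = new WeakMap();
failure.set(nodeA, root);
failure.set(nodeB, nodeA);

console.log(Object.keys(brokenFailure));     // [ '[object Object]' ]
console.log(brokenFailure[nodeA] === nodeA); // true: nodeA's link was clobbered
console.log(failure.get(nodeA) === root);    // true
console.log(failure.get(nodeB) === nodeA);   // true
```

A `Map` would also key by identity; a `WeakMap` additionally lets discarded nodes be garbage-collected along with their entries.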

Impact:
* Overlapping titles now match correctly for the first time.
* No cache rebuilds during normal UI interactions on large wikis.
* Correct longest-match behavior for titles sharing substrings.
* Safe Emoji and complex Unicode handling in case-insensitive mode.
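
The Unicode index desync described among the fixes can be reproduced in a few lines. This sketch is illustrative only: `findIgnoreCase` is a hypothetical naive search, not the Aho-Corasick engine, and it is used here just to show why folding case one character at a time keeps indices aligned with the original text.

```javascript
const text = "İstanbul 😀 Wiki";
const lowered = text.toLowerCase(); // "İ" becomes "i" plus a combining dot

console.log(text.length, lowered.length); // 16 17

// Broken approach: an index found in the lowered copy is applied to the original.
const badIndex = lowered.indexOf("😀 wiki");
const garbled = text.substring(badIndex, badIndex + "😀 wiki".length);
// `garbled` now begins with the emoji's lone low surrogate.

// Per-character approach: compare folded characters while walking the
// original string, so every index refers to the original text.
function findIgnoreCase(haystack, pattern) {
  for (let start = 0; start + pattern.length <= haystack.length; start++) {
    let k = 0;
    while (k < pattern.length &&
           haystack[start + k].toLowerCase() === pattern[k].toLowerCase()) {
      k++;
    }
    if (k === pattern.length) return start;
  }
  return -1;
}

const goodIndex = findIgnoreCase(text, "😀 wiki");
console.log(text.substring(goodIndex, goodIndex + "😀 wiki".length)); // 😀 Wiki
```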


#9397
This note addresses two major bugs introduced in the Freelinks plugin with the v5.4.0 release:

Fixes:
* First Character Loss: The first character of a matched word would incorrectly disappear (e.g., "The" became "he"). This was fixed by correctly timing the filtering of the current tiddler's title during match validation, ensuring proper substring handling.
* False Positive Matches: Unrelated words (like "it is" or "Choose") would incorrectly link to a tiddler title. This was resolved by fixing wrong output merging in the Aho-Corasick failure-link handling, eliminating spurious matches from intermediate nodes, and adding cycle detection.
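
For readers unfamiliar with the terms used above, here is a textbook-style sketch of Aho-Corasick failure links and output merging. It is not the plugin's implementation: `buildAutomaton` and `search` are hypothetical names, and the sketch omits case folding, word-boundary checks, and the cycle detection mentioned in the note.

```javascript
// Build a trie, then compute failure links breadth-first.
function buildAutomaton(patterns) {
  const root = { next: new Map(), out: [] };
  const fail = new WeakMap(); // failure links keyed by node identity
  for (const pattern of patterns) {
    let node = root;
    for (let i = 0; i < pattern.length; i++) {
      const ch = pattern[i];
      if (!node.next.has(ch)) node.next.set(ch, { next: new Map(), out: [] });
      node = node.next.get(ch);
    }
    node.out.push(pattern); // only terminal nodes start with an output
  }
  const queue = [];
  for (const child of root.next.values()) {
    fail.set(child, root);
    queue.push(child);
  }
  while (queue.length > 0) {
    const node = queue.shift();
    for (const [ch, child] of node.next) {
      // Follow the parent's failure chain until a node with a `ch` edge is found.
      let f = fail.get(node);
      while (f !== root && !f.next.has(ch)) f = fail.get(f);
      const target = f.next.get(ch) || root;
      fail.set(child, target);
      // Output merge: patterns ending at the failure target also end here.
      child.out.push(...target.out);
      queue.push(child);
    }
  }
  return { root, fail };
}

// Scan the text once, reporting every pattern occurrence as [start, end).
function search(automaton, text) {
  const { root, fail } = automaton;
  const matches = [];
  let node = root;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    while (node !== root && !node.next.has(ch)) node = fail.get(node);
    node = node.next.get(ch) || root;
    for (const pattern of node.out) {
      matches.push({ start: i - pattern.length + 1, end: i + 1, pattern });
    }
  }
  return matches;
}

const found = search(buildAutomaton(["he", "she", "hers"]), "ushers");
console.log(found.map((m) => m.pattern)); // [ 'she', 'he', 'hers' ]
```

Because outputs are merged only from failure targets, intermediate nodes that end no pattern report nothing, which is the property the false-positive fix restores.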

Impact:
* Significantly improved correctness and reliability of automatic linking for all users, especially in multilingual and large wikis.


#9084
This change introduces a fully optimized override of the core text widget, integrating an enhanced Aho-Corasick algorithm for automatic linkification of tiddler titles within text (freelinks). The new implementation prioritizes performance for large wikis and correct support for non-Latin scripts such as Chinese.

Highlights:
- Full switch from regex-based matching to a custom, robust Aho-Corasick engine dedicated to rapid, multi-pattern title detection, drastically decreasing linkification time (tested: 1–5s reduced to 100–500ms on ~12,000 tiddlers).
- Handles extremely large title sets gracefully, including a chunked insertion process and use of a persistent cache (`$:/config/Freelinks/PersistAhoCorasickCache`) to further accelerate subsequent linking operations in large/active wikis.
- Improvements for CJK and non-Latin text: supports linking using long or full-width symbol titles such as ':' (U+FF1A) with no split or mismatch.
- Smart prioritization: longer titles are automatically matched before shorter, more ambiguous ones, preventing partial/incorrect linking.
- Actively skips self-linking in the current tiddler and prevents overlapping matches for clean, deterministic linkification.
- End users with large or multilingual wikis see massive performance boost and 100% accurate linking for complex, full-width, or multi-language titles.
- New options for persistent match cache and word boundary checking (`$:/config/Freelinks/WordBoundary`), both can be tuned based on wiki size and content language needs.
- Safe for gradual rollout: legacy behavior is preserved if the new freelinks override is not enabled.
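
The "longer titles first, no overlaps" behaviour in the highlights, which #9676 describes as global length-first sorting with interval occupation tracking, might look roughly like the sketch below. `selectLongestMatches` and the `{start, end, title}` match shape are assumptions made for illustration, and the quadratic overlap check is chosen for clarity rather than speed.

```javascript
// Pick non-overlapping matches, preferring longer ones globally.
// Each match covers the half-open interval [start, end) in the text.
function selectLongestMatches(matches) {
  // Longest first; ties go to the earlier start position.
  const byLength = [...matches].sort(
    (a, b) => (b.end - b.start) - (a.end - a.start) || a.start - b.start
  );
  const accepted = [];
  for (const m of byLength) {
    // Reject the candidate if it overlaps an already occupied interval.
    const overlaps = accepted.some((a) => m.start < a.end && a.start < m.end);
    if (!overlaps) accepted.push(m);
  }
  // Return in text order so links can be emitted left to right.
  return accepted.sort((a, b) => a.start - b.start);
}

// Candidate matches in "The New York City guide".
const candidates = [
  { start: 0, end: 7, title: "The New" },
  { start: 4, end: 17, title: "New York City" },
  { start: 18, end: 23, title: "guide" }
];
const chosen = selectLongestMatches(candidates);
console.log(chosen.map((m) => m.title)); // [ 'New York City', 'guide' ]
```

A left-to-right greedy pass would have accepted "The New" first and then rejected "New York City"; sorting by length before checking occupancy avoids that.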

editions/tw5.com/tiddlywiki.info

Lines changed: 2 additions & 2 deletions
@@ -4,10 +4,10 @@
  "tiddlywiki/browser-sniff",
  "tiddlywiki/confetti",
  "tiddlywiki/dynannotate",
- "tiddlywiki/tour",
  "tiddlywiki/internals",
  "tiddlywiki/menubar",
- "tiddlywiki/railroad"
+ "tiddlywiki/railroad",
+ "tiddlywiki/tour"
  ],
  "themes": [
  "tiddlywiki/vanilla",
