Commit 6f5b7a9

zxch3n and claude authored
fix: production panic regressions (#943)
* Fix stale DAG causal iteration after node splits
* Fix richtext cursor updates at fragment tails
* Add list import regression for split tracking
* Recover document locks after panic poisoning
* Add unpoisoned lock helpers across core internals
* Fail fast on poisoned core locks
* Return errors for snapshot encoding edge cases
* Fix loom lock helper usage
* Fix three March-2026 import panics (kim, mads, pr929 fixtures)

  Three production blobs in loro-debug reliably panic inside doc.import on loro-crdt@1.10.6:

  - DagCausalIter assertion at dag/iter.rs: when a peer has multiple node segments in the target span and the later segment's deps fall outside the span, both segments reach the initial stack with zero in-degree and LIFO pops the higher counter first. Fix: in DagCausalIter::new, after out_degrees and succ are built, synthesize per-peer ordering edges so the lower counter must drain before the higher one is released.
  - OnceCell::set(..).unwrap() double-set at oplog/loro_dag.rs:917 during ensure_vv_for on a diamond dep (#929): a shared ancestor gets pushed onto the iterative-DFS stack by multiple paths and the second pop tries to initialize an already-filled cell. Fix: skip nodes whose vv is already Some at the top of the loop and swallow the Err from the final set as a defensive measure.
  - list_state Index-out-of-range panic for the mads-bootstrap fixture: the ListDiffCalculator cold-starts its RichtextTracker via new_with_unknown() and never learns about the snapshot's real list content. During replay the tracker's per-change checkouts temporarily retreat some snapshot ops, so a new op's fugue anchor lands inside the unknown prefix. CrdtRope::get_diff() then emits Retain(N) where N is larger than ListState.len(). Fix: change ContainerState::apply_diff (and DocState::apply_diff, init_with_states_and_version) to return LoroResult<()>; ListState::apply_diff pre-validates the delta against current length and returns LoroError::internal(..) on mismatch so doc.import surfaces the error instead of panicking. Root cause in the tracker cold-start path still needs follow-up.

  Regression tests for all three live in crates/loro/tests/march_2026_panics.rs with the captured production blobs in fixtures_march_2026/. Additional targeted unit tests in dag/iter.rs and oplog/loro_dag.rs cover the smallest synthetic DAG shapes that triggered each bug.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Make internal locks fail fast
* chore: rm fixtures
* chore: changeset
* Fix awareness doctest lock usage
* Fix loom RwLock wrapper
* Update repository agent guidelines

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
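
The DagCausalIter item in the commit message can be pictured with a small standalone sketch. It is not the code in dag/iter.rs: `Segment`, `causal_order`, and the `add_peer_edges` flag are hypothetical names invented for illustration, and the real iterator tracks more state. The sketch only assumes a Kahn-style traversal driven by a LIFO stack, and shows how chaining a peer's segments with synthetic edges forces the lower counter out first.

```rust
use std::collections::HashMap;

/// One DAG node segment inside the iteration span (hypothetical, simplified type).
#[derive(Debug, Clone)]
struct Segment {
    peer: u64,
    counter: u32,
    /// Indices of dependencies that lie inside the span.
    /// Dependencies outside the span are omitted, so they add no in-degree.
    deps_in_span: Vec<usize>,
}

/// Kahn-style traversal with a LIFO stack.
/// When `add_peer_edges` is true, consecutive segments of the same peer are
/// linked by a synthetic edge (lower counter -> higher counter).
fn causal_order(nodes: &[Segment], add_peer_edges: bool) -> Vec<(u64, u32)> {
    let n = nodes.len();
    let mut in_degree = vec![0usize; n];
    let mut succ: Vec<Vec<usize>> = vec![Vec::new(); n];

    // Real dependency edges that fall inside the span.
    for (i, node) in nodes.iter().enumerate() {
        for &dep in &node.deps_in_span {
            succ[dep].push(i);
            in_degree[i] += 1;
        }
    }

    if add_peer_edges {
        // Group segments by peer, sort each group by counter, then chain neighbours.
        let mut by_peer: HashMap<u64, Vec<usize>> = HashMap::new();
        for (i, node) in nodes.iter().enumerate() {
            by_peer.entry(node.peer).or_default().push(i);
        }
        for segs in by_peer.values_mut() {
            segs.sort_by_key(|&i| nodes[i].counter);
            for pair in segs.windows(2) {
                succ[pair[0]].push(pair[1]);
                in_degree[pair[1]] += 1;
            }
        }
    }

    // Every zero in-degree segment starts on the stack; pops are LIFO.
    let mut stack: Vec<usize> = (0..n).filter(|&i| in_degree[i] == 0).collect();
    let mut out = Vec::with_capacity(n);
    while let Some(i) = stack.pop() {
        out.push((nodes[i].peer, nodes[i].counter));
        for &next in &succ[i] {
            in_degree[next] -= 1;
            if in_degree[next] == 0 {
                stack.push(next);
            }
        }
    }
    out
}
```

With three segments (peer 1 at counter 0, peer 1 at counter 5, peer 2 at counter 0) and no in-span deps, calling `causal_order(&nodes, false)` yields peer 1's counter 5 before its counter 0, which is the ordering the assertion rejected. Calling it with `true` yields counter 0 before counter 5.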
1 parent c8d0d60 commit 6f5b7a9

43 files changed

Lines changed: 1772 additions & 881 deletions

.changeset/honest-cycles-mix.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+---
+"loro-crdt-map": patch
+"loro-crdt": patch
+---
+
+Fix production panic regressions

AGENTS.md

Lines changed: 66 additions & 25 deletions
@@ -1,38 +1,79 @@
-# Agent Notes (Loro)
+# Repository Guidelines

-## Invariant: Flush Pending Events In `loro-wasm`
+## Project Structure & Module Organization

-In `crates/loro-wasm/src/lib.rs`, subscription callbacks (`subscribe*`, container `subscribe`, etc.)
-do not call user JS immediately. Instead, the binding enqueues JS calls into a global pending queue,
-and schedules a microtask check. If the microtask runs before `callPendingEvents()` flushes the
-queue, it will log:
+This is a Rust workspace with JS/WASM packaging around the core CRDT library.
+Key crates live under `crates/`: `loro` is the public Rust API, `loro-internal`
+contains core CRDT logic, `loro-wasm` exposes the WASM/TypeScript package, and
+`delta`, `rle`, `kv-store`, and `fractional_index` hold shared primitives.
+Integration and regression tests are mostly in `crates/loro/tests` and
+`crates/loro-internal/tests`; WASM tests and package files are in
+`crates/loro-wasm`. Examples live in `examples/` and `crates/examples`.

-- `[LORO_INTERNAL_ERROR] Event not called`
+## Build, Test, and Development Commands
+
+- `cargo build`: build the Rust workspace.
+- `cargo check -p loro-internal`: quickly validate core internals.
+- `cargo test -p loro-internal --doc`: run Rust doctests for internal APIs.
+- `pnpm test`: run the main Rust test suite via nextest plus doctests.
+- `pnpm check`: run clippy with all features and deny warnings.
+- `pnpm release-wasm`: sync versions and build the release WASM package.
+- `pnpm test-loom`: run loom concurrency tests for `crates/loro/tests/multi_thread_test.rs`.
+
+## Coding Style & Naming Conventions
+
+Use standard Rust formatting with `rustfmt`; keep imports and chained calls formatted
+by the tool. Prefer explicit, small APIs and existing crate-local helpers over new
+abstractions. Rust items use `snake_case` for functions/modules and `CamelCase` for
+types. JS/TS bindings in `loro-wasm` should preserve the established exported API
+names used by tests and docs.
+
+## Testing Guidelines

-This creates a strict invariant:
+Add regression tests near the behavior being fixed: Rust API tests in
+`crates/loro/tests`, internal tests in `crates/loro-internal/tests` or module tests,
+and WASM behavior in `crates/loro-wasm/tests`. For import/encoding bugs, prefer
+fixture-based tests with small binary fixtures. Run the narrow package test first,
+then `pnpm test` when the change affects shared behavior. For changes touching
+internal diff calculation, checkout, import, or state-replay logic, also consider
+the fuzz targets in `crates/fuzz`; ask whether to run the broader `fuzz all`
+target before spending the extra time.

-- **Any WASM-exposed API that can enqueue subscription events must flush pending events before
-returning control back to JS.**
+## Commit & Pull Request Guidelines

-To avoid adding overhead to every single op, we only wrap (decorate) a small allowlist of
-methods on the JS side. The wrapper calls `callPendingEvents()` in a `finally` block.
+History uses short imperative commits, often prefixed by scope such as `fix:`,
+`test:`, `chore:`, or `refactor:`. Keep commits focused and include fixtures or
+tests with fixes. PRs should describe what changed, why, validation commands, and
+linked issues or production traces when relevant. Add a changeset when publishing
+behavior or package output changes.

-### How To Maintain
+## Agent-Specific Notes
+
+### Invariant: Flush Pending Events In `loro-wasm`
+
+In `crates/loro-wasm/src/lib.rs`, subscription callbacks (`subscribe*`,
+container `subscribe`, etc.) do not call user JS immediately. The binding
+enqueues JS calls into a global pending queue and schedules a microtask check.
+If the microtask runs before `callPendingEvents()` flushes the queue, it logs:
+
+- `[LORO_INTERNAL_ERROR] Event not called`

-- When adding or changing a `#[wasm_bindgen]` API in `crates/loro-wasm/src/lib.rs` that can
-*mutate document state*:
-- If it can trigger an implicit commit / barrier (`commit`, `with_barrier` /
-`implicit_commit_then_stop`), emit events (`emit_events`), or applies diffs (e.g. `revertTo`,
-`applyDiff`), it typically **must** flush pending events.
-- Add its JS name to the allowlist in `crates/loro-wasm/index.ts` near the bottom:
-`decorateMethods(LoroDoc.prototype, [...])` (or the relevant prototype allowlist).
-- If it is a pure read/query API (no state mutation, no commit/barrier, no event emission),
-do **not** decorate it, to avoid unnecessary per-call cost.
+Any WASM-exposed API that can enqueue subscription events must flush pending
+events before returning control to JS. To avoid adding overhead to every op, only
+a small JS-side allowlist is wrapped; the wrapper calls `callPendingEvents()` in
+a `finally` block.

-### Quick Check
+When adding or changing a `#[wasm_bindgen]` API in `crates/loro-wasm/src/lib.rs`
+that can mutate document state, check whether it can trigger an implicit commit
+or barrier (`commit`, `with_barrier`, `implicit_commit_then_stop`), emit events
+(`emit_events`), or apply diffs (`revertTo`, `applyDiff`). If so, add its JS
+name to the allowlist near the bottom of `crates/loro-wasm/index.ts`:
+`decorateMethods(LoroDoc.prototype, [...])` or the relevant prototype allowlist.
+Pure read/query APIs should not be decorated.

-With active subscriptions (`doc.subscribe(...)` / container `subscribe(...)`), calling mutating APIs
-should not produce the error above. A recommended local check is:
+Quick check with active subscriptions (`doc.subscribe(...)` or container
+`subscribe(...)`): mutating APIs should not produce the error above. A useful
+local check is:

 ```sh
 pnpm -C crates/loro-wasm build-release

Cargo.lock

Lines changed: 1 addition & 1 deletion
