Merge pull request #1 from sigridjineth/fix/abi3-wheel

sigridjineth · web-flow · commit 9483c23e620a · 2025-12-14T17:46:07.000+09:00
release: abi3 wheels for py3.9+
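
The `abi3` tag in a wheel filename is what lets a single build serve several CPython versions. Below is a minimal sketch of that compatibility rule, assuming a hypothetical filename `agentjson-0.1.2-cp39-abi3-macosx_11_0_arm64.whl` (the published filenames are not shown in this PR). `parse_wheel_tags` and `is_abi3_compatible` are illustrative helper names, not part of `agentjson`, and the sketch ignores compressed tag sets and platform matching.

```python
import sys


def parse_wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform (5 or 6 components).
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    return tuple(parts[-3:])


def is_abi3_compatible(filename, version=None):
    """True if the wheel is abi3 and `version` meets its minimum CPython 3.x."""
    if version is None:
        version = sys.version_info[:2]
    python_tag, abi_tag, _platform = parse_wheel_tags(filename)
    minor = python_tag[3:]
    # Simplification: only a single "cp3X" tag is handled here.
    if abi_tag != "abi3" or not python_tag.startswith("cp3") or not minor.isdigit():
        return False
    return tuple(version) >= (3, int(minor))


# Hypothetical filename, used for illustration only.
WHEEL = "agentjson-0.1.2-cp39-abi3-macosx_11_0_arm64.whl"
print(is_abi3_compatible(WHEEL, (3, 12)))
print(is_abi3_compatible(WHEEL, (3, 8)))
```

Under these assumptions a `cp39-abi3` wheel is accepted on CPython 3.9 and later, while a version-specific wheel such as `cp312-cp312` is rejected by this check because its ABI tag is not `abi3`.
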
diff --git a/.cargo/config.toml b/.cargo/config.toml
@@ -0,0 +1,6 @@
+[target.aarch64-apple-darwin]
+rustflags = ["-C", "link-arg=-undefined", "-C", "link-arg=dynamic_lookup"]
+
+[target.x86_64-apple-darwin]
+rustflags = ["-C", "link-arg=-undefined", "-C", "link-arg=dynamic_lookup"]
+
diff --git a/BENCHMARK.md b/BENCHMARK.md
@@ -10,7 +10,7 @@ This document explains what `benchmarks/bench.py` measures and **why** the suite
 
 ## How to run
 
-Because `agentjson` ships a top-level `orjson` shim, you must use **two separate environments** if you want to compare with the real `orjson` package:
+Because `agentjson` provides a top-level `orjson` drop-in module, you must use **two separate environments** if you want to compare with the real `orjson` package:
 
 ```bash
 # Env A: real orjson
@@ -47,6 +47,19 @@ python benchmarks/bench.py
 - **PR‑101 (parallel delimiter indexer)**: use `large_root_array_suite` and increase `BENCH_LARGE_MB` (e.g. `200,1000`) to find the crossover where parallel indexing starts paying off.
 - **PR‑102 (nested huge value / corpus)**: use `nested_corpus_suite` to benchmark `scale_target_keys=["corpus"]` with `allow_parallel` on/off.
 
+### Example: CLI mmap suite (PR‑006)
+
+On macOS this suite records only **wall time**; the max-RSS measurement relies on `/usr/bin/time -v`, which is available on Linux (GNU time) but not on macOS.
+
+Example run (Env B, `BENCH_CLI_MMAP_MB=256`, 2025-12-14):
+
+| Mode | Elapsed |
+|---|---:|
+| `mmap(default)` | 1.27 s |
+| `read(--no-mmap)` | 1.38 s |
+
+Interpretation: mmap’s main win is avoiding **upfront heap allocation / extra copies** on huge files; it may or may not be faster depending on OS page cache and IO patterns.
+
 ## Suite 1 — LLM messy JSON suite (primary)
 
 ### What it tests
@@ -93,24 +106,24 @@ With `agentjson` as an `orjson` drop-in (same call site):
 
 ```python
 import os
-import orjson  # agentjson shim
+import orjson  # provided by agentjson (drop-in)
 
 os.environ["JSONPROB_ORJSON_MODE"] = "auto"
 orjson.loads('preface```json\n{"a":1}\n```suffix')   # -> {"a": 1}
 ```
 
 This is the core PR pitch: **don’t change code**, just switch the package and flip a mode when needed.
 
-### Example results (2025-12-13, Python 3.12.0, macOS 14.1 arm64)
+### Example results (2025-12-14, Python 3.12.0, macOS 14.1 arm64)
 
 | Library / mode | Success | Correct | Best time / case |
 |---|---:|---:|---:|
 | `json` (strict) | 0/10 | 0/10 | n/a |
 | `ujson` (strict) | 0/10 | 0/10 | n/a |
 | `orjson` (strict, real) | 0/10 | 0/10 | n/a |
-| `orjson` (auto, agentjson shim) | 10/10 | 10/10 | 45.9 µs |
-| `agentjson.parse(mode=auto)` | 10/10 | 10/10 | 39.8 µs |
-| `agentjson.parse(mode=probabilistic)` | 10/10 | 10/10 | 39.7 µs |
+| `agentjson` (drop-in `orjson.loads`, mode=auto) | 10/10 | 10/10 | 23.5 µs |
+| `agentjson.parse(mode=auto)` | 10/10 | 10/10 | 19.5 µs |
+| `agentjson.parse(mode=probabilistic)` | 10/10 | 10/10 | 19.5 µs |
 
 ## Suite 2 — Top‑K repair suite (secondary)
 
@@ -145,15 +158,15 @@ In the benchmark run, this case shows up exactly as:
 - **Top‑1 hit** misses (not the expected value),
 - but **Top‑K hit (K=5)** succeeds (the expected value is present in the candidate list).
 
-### Example results (2025-12-13, Python 3.12.0, macOS 14.1 arm64)
+### Example results (2025-12-14, Python 3.12.0, macOS 14.1 arm64)
 
 | Metric | Value |
 |---|---:|
 | Top‑1 hit rate | 7/8 |
 | Top‑K hit rate (K=5) | 8/8 |
 | Avg candidates returned | 1.25 |
 | Avg best confidence | 0.57 |
-| Best time / case | 92.7 µs |
+| Best time / case | 38.2 µs |
 
 ## Suite 3 — Large root-array parsing (big data angle)
 
@@ -169,15 +182,15 @@ and measures how long `loads(...)` takes for sizes like 5MB and 20MB.
 
 For comparing `json/ujson/orjson`, use **Env A (real orjson)**. In Env B, `import orjson` is the shim.
 
-### Example results (Env A: real `orjson`, 2025-12-13)
+### Example results (Env A: real `orjson`, 2025-12-14)
 
 | Library | 5 MB | 20 MB |
 |---|---:|---:|
-| `json.loads(str)` | 52.3 ms | 209.6 ms |
-| `ujson.loads(str)` | 42.2 ms | 176.1 ms |
-| `orjson.loads(bytes)` (real) | 24.6 ms | 115.9 ms |
+| `json.loads(str)` | 53.8 ms | 217.2 ms |
+| `ujson.loads(str)` | 45.9 ms | 173.7 ms |
+| `orjson.loads(bytes)` (real) | 27.0 ms | 116.2 ms |
 
-`benchmarks/bench.py` also measures `agentjson.scale(serial|parallel)` (Env B). On 5–20MB inputs the parallel path is slower due to overhead; it’s intended for much larger payloads (GB‑scale root arrays).
+`benchmarks/bench.py` also measures `agentjson.scale(serial|parallel)` (Env B). On 5–20MB inputs, whether the parallel path beats the serial one depends on your machine; the parallel path is intended for much larger payloads (GB‑scale root arrays).
 
 ## Suite 3b — Nested `corpus` suite (targeted huge value)
 
@@ -202,4 +215,3 @@ Important nuance:
 
 - This suite uses **DOM** mode (`scale_output="dom"`) so `split_mode` shows whether nested targeting triggered (see `rust/src/scale.rs::try_nested_target_split`).
 - Wiring nested targeting into **tape** mode (`scale_output="tape"`) is the next-step work for true “huge nested value without DOM” workloads.
-
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/Cargo.toml b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "agentjson"
-version = "0.1.1"
+version = "0.1.2"
 edition = "2021"
 license = "MIT OR Apache-2.0"
 description = "Probabilistic JSON repair library powered by Rust"
@@ -14,5 +14,5 @@ crate-type = ["cdylib"]
 path = "rust-pyo3/src/lib.rs"
 
 [dependencies]
-pyo3 = { version = "0.23", features = ["extension-module"] }
+pyo3 = { version = "0.23", features = ["extension-module", "abi3-py39"] }
 json_prob_parser = { package = "agentjson-core", path = "rust" }
diff --git a/README.md b/README.md
@@ -56,6 +56,8 @@ uv add agentjson
 # or: python -m pip install agentjson
 ```
 
+Note: `agentjson` ships **abi3** wheels (Python **3.9+**) so the same wheel works across CPython versions (e.g. 3.11, 3.12).
+
 ### Build from source (development)
 
 #### 1) Install Rust toolchain
@@ -221,9 +223,9 @@ cargo build --release
 - Disable mmap: `--no-mmap`
 - Reproducible beam ordering: `--deterministic-seed 42`
 
-## orjson Drop-in Shim
+## orjson Drop-in
 
-Most LLM/agent stacks already call `orjson.loads()` everywhere. `agentjson` bundles an `orjson`-compatible shim so you can keep those call sites unchanged and still recover from “near‑JSON” outputs:
+Most LLM/agent stacks already call `orjson.loads()` everywhere. `agentjson` provides an `orjson`-compatible drop-in module so you can keep those call sites unchanged and still recover from “near‑JSON” outputs:
 
 ```python
 import orjson
@@ -232,6 +234,12 @@ data = orjson.loads(b'{"a": 1}')
 blob = orjson.dumps({"a": 1})
 ```
 
+If you prefer to be explicit (or want to avoid `orjson` name conflicts), you can also do:
+
+```python
+import agentjson as orjson
+```
+
 By default the shim is strict (like real `orjson`). To enable repair/scale fallback without changing call sites:
 
 ```bash
@@ -255,9 +263,9 @@ This suite reflects the context: LLM outputs like “json입니다~ …”, mark
 | `json` (strict) | 0/10 | 0/10 | n/a |
 | `ujson` (strict) | 0/10 | 0/10 | n/a |
 | `orjson` (strict, real) | 0/10 | 0/10 | n/a |
-| `orjson` (auto, agentjson shim) | 10/10 | 10/10 | 45.9 µs |
-| `agentjson.parse(mode=auto)` | 10/10 | 10/10 | 39.8 µs |
-| `agentjson.parse(mode=probabilistic)` | 10/10 | 10/10 | 39.7 µs |
+| `agentjson` (drop-in `orjson.loads`, mode=auto) | 10/10 | 10/10 | 23.5 µs |
+| `agentjson.parse(mode=auto)` | 10/10 | 10/10 | 19.5 µs |
+| `agentjson.parse(mode=probabilistic)` | 10/10 | 10/10 | 19.5 µs |
 
 Key point: **drop-in call sites** (`import orjson; orjson.loads(...)`) can go from *0% success* → *100% success* just by setting `JSONPROB_ORJSON_MODE=auto`.
 
@@ -271,19 +279,19 @@ This suite checks whether the “intended” JSON object is recovered as the **b
 | Top‑K hit rate (K=5) | 8/8 |
 | Avg candidates returned | 1.25 |
 | Avg best confidence | 0.57 |
-| Best time / case | 92.7 µs |
+| Best time / case | 38.2 µs |
 
 ### 3) Large root-array parsing (big data angle)
 
 Valid JSON only (parsing a single large root array).
 
 | Library | 5 MB | 20 MB |
 |---|---:|---:|
-| `json.loads(str)` | 52.3 ms | 209.6 ms |
-| `ujson.loads(str)` | 42.2 ms | 176.1 ms |
-| `orjson.loads(bytes)` (real) | 24.6 ms | 115.9 ms |
+| `json.loads(str)` | 53.8 ms | 217.2 ms |
+| `ujson.loads(str)` | 45.9 ms | 173.7 ms |
+| `orjson.loads(bytes)` (real) | 27.0 ms | 116.2 ms |
 
-`agentjson` also benchmarks `agentjson.scale(serial|parallel)` in the same script. On 5–20MB inputs the parallel path is slower due to overhead; it’s intended for much larger payloads (GB‑scale root arrays).
+`agentjson` also benchmarks `agentjson.scale(serial|parallel)` in the same script. On 5–20MB inputs the crossover depends on your machine: in this run the parallel path was slower at 5MB and slightly faster at 20MB. The parallel path is intended for much larger payloads (GB‑scale root arrays).
 
 ### 3b) Nested `corpus` split (targeted huge value)
 
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "maturin"
 
 [project]
 name = "agentjson"
-version = "0.1.1"
+version = "0.1.2"
 description = "Probabilistic JSON repair library powered by Rust - fixes broken JSON from LLMs"
 readme = "README.md"
 requires-python = ">=3.9"
diff --git a/rust-pyo3/src/lib.rs b/rust-pyo3/src/lib.rs
@@ -1,7 +1,7 @@
 use pyo3::prelude::*;
-use pyo3::buffer::PyBuffer;
-use pyo3::types::{PyByteArray, PyBytes, PyDict, PyList};
+use pyo3::types::{PyBytes, PyDict, PyList};
 use pyo3::IntoPyObjectExt;
+use std::borrow::Cow;
 
 use json_prob_parser::beam;
 use json_prob_parser::json::JsonValue;
@@ -36,57 +36,23 @@ fn json_to_py(py: Python<'_>, v: &JsonValue) -> PyObject {
 
 #[pyfunction]
 fn strict_loads_py(py: Python<'_>, input: &Bound<'_, PyAny>) -> PyResult<PyObject> {
-    let parsed = if let Ok(s) = input.extract::<&str>() {
-        strict::strict_parse(s)
-            .map_err(|e| pyo3::exceptions::PyValueError::new_err((e.message, e.pos)))?
-    } else if let Ok(b) = input.downcast::<PyBytes>() {
-        let s = std::str::from_utf8(b.as_bytes()).map_err(|_| {
+    let parsed = if let Ok(s) = input.extract::<Cow<str>>() {
+        strict::strict_parse(s.as_ref())
+    } else if let Ok(b) = input.extract::<Cow<[u8]>>() {
+        let s = std::str::from_utf8(b.as_ref()).map_err(|_| {
             pyo3::exceptions::PyValueError::new_err((
                 "str is not valid UTF-8: surrogates not allowed".to_string(),
                 0_usize,
             ))
         })?;
         strict::strict_parse(s)
-            .map_err(|e| pyo3::exceptions::PyValueError::new_err((e.message, e.pos)))?
-    } else if let Ok(ba) = input.downcast::<PyByteArray>() {
-        let parsed = {
-            // SAFETY: We do not call back into Python while using this slice.
-            let bytes = unsafe { ba.as_bytes() };
-            let s = std::str::from_utf8(bytes).map_err(|_| {
-                pyo3::exceptions::PyValueError::new_err((
-                    "str is not valid UTF-8: surrogates not allowed".to_string(),
-                    0_usize,
-                ))
-            })?;
-            strict::strict_parse(s)
-        };
-        parsed.map_err(|e| pyo3::exceptions::PyValueError::new_err((e.message, e.pos)))?
-    } else if let Ok(buf) = PyBuffer::<u8>::get(input) {
-        let parsed = {
-            let cells = buf.as_slice(py).ok_or_else(|| {
-                pyo3::exceptions::PyValueError::new_err((
-                    "input buffer must be C-contiguous".to_string(),
-                    0_usize,
-                ))
-            })?;
-
-            // ReadOnlyCell<u8> is repr(transparent) over UnsafeCell<u8>, so this is safe.
-            let bytes = unsafe { std::slice::from_raw_parts(cells.as_ptr() as *const u8, cells.len()) };
-            let s = std::str::from_utf8(bytes).map_err(|_| {
-                pyo3::exceptions::PyValueError::new_err((
-                    "str is not valid UTF-8: surrogates not allowed".to_string(),
-                    0_usize,
-                ))
-            })?;
-            strict::strict_parse(s)
-        };
-        parsed.map_err(|e| pyo3::exceptions::PyValueError::new_err((e.message, e.pos)))?
     } else {
         return Err(pyo3::exceptions::PyValueError::new_err((
             "input must be bytes, bytearray, memoryview, or str".to_string(),
             0_usize,
         )));
-    };
+    }
+    .map_err(|e| pyo3::exceptions::PyValueError::new_err((e.message, e.pos)))?;
 
     Ok(json_to_py(py, &parsed))
 }
diff --git a/rust/Cargo.lock b/rust/Cargo.lock
diff --git a/src/agentjson/__init__.py b/src/agentjson/__init__.py