Commit 8ce7b19

zxqfd555 authored and Manul from Pathway committed
correctly format unicode contents of pw.Json columns in pw.debug.compute_and_print (#9617)
GitOrigin-RevId: 45660af95d117c35687517ae13d6c293be207d50
1 parent 6bed99d commit 8ce7b19

3 files changed: 34 additions, 2 deletions

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,9 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 - It is now possible to assign priorities to sources within a connector group. When a priority is set, it ensures that at any moment, the source is not lagging behind any other source with a higher priority in terms of the tracked column.
 - Connector groups can now be used in the multiprocess runs.
 
+### Changed
+- **BREAKING**: The `__str__` and `dumps` methods in `pw.Json` no longer enforce the result to be an ASCII string. This way, the behavior of `pw.debug.compute_and_print` is now consistent with other output connectors.
+
 ## [0.27.1] - 2025-12-08
 
 ### Added

python/pathway/internals/json.py

Lines changed: 2 additions & 2 deletions
@@ -52,7 +52,7 @@ class Json:
     _value: JsonValue
 
     def __str__(self) -> str:
-        return _json.dumps(self.value)
+        return _json.dumps(self.value, ensure_ascii=False)
 
     def __float__(self) -> float:
         return float(self.value)  # type:ignore[arg-type]
@@ -96,7 +96,7 @@ def parse(value: str | bytes | bytearray) -> Json:
 
     @staticmethod
     def dumps(obj: Any) -> str:
-        return _json.dumps(obj, cls=_JsonEncoder)
+        return _json.dumps(obj, cls=_JsonEncoder, ensure_ascii=False)
 
     def as_int(self) -> int:
         """Returns Json value as an int if possible.

python/pathway/tests/test_json.py

Lines changed: 29 additions & 0 deletions
@@ -3,6 +3,7 @@
 from __future__ import annotations
 
 import datetime
+import json
 import pathlib
 import re
 from typing import Any, Optional
@@ -1308,3 +1309,31 @@ class Schema(pw.Schema):
         match=".*can only be applied to JSON columns, but column has type <class 'int'>.",
     ):
         input.select(result=method(pw.this.data))
+
+
+def test_json_serialize_parse():
+    base_unicode_value = "żółć"
+    base = {"data": base_unicode_value}
+
+    initial_json = pw.Json(base)
+    serialized = pw.Json.dumps(initial_json)
+    roundtrip_json = pw.Json.parse(serialized)
+    assert roundtrip_json["data"] == pw.Json(base_unicode_value)
+
+    serialized_ensure_ascii = json.dumps(base, ensure_ascii=True)
+    roundtrip_json_old_format = pw.Json.parse(serialized_ensure_ascii)
+    assert roundtrip_json_old_format["data"] == pw.Json(base_unicode_value)
+
+
+def test_json_doesnt_enforce_ascii(capsys):
+    class InputSchema(pw.Schema):
+        data: pw.Json
+
+    rows = [
+        ({"data": "szczęśliwość"},),
+    ]
+    table = pw.debug.table_from_rows(schema=InputSchema, rows=rows)
+    pw.debug.compute_and_print(table, include_id=False)
+
+    captured = capsys.readouterr()
+    assert '{"data": "szczęśliwość"}' in captured.out
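
Note: `test_json_serialize_parse` checks that strings produced in the old ASCII-escaped form still parse to the same value as the new form. The same property can be sketched with the standard `json` module, using `json.loads` as a stand-in for `pw.Json.parse` (an assumption made only so the example runs without Pathway installed).

```python
import json

original = {"data": "szczęśliwość"}

# Old output format: non-ASCII characters escaped as \uXXXX.
old_format = json.dumps(original, ensure_ascii=True)
# New output format: characters kept verbatim.
new_format = json.dumps(original, ensure_ascii=False)

# The serialized strings differ...
assert old_format != new_format
# ...but both decode to the same Python value.
assert json.loads(old_format) == json.loads(new_format) == original
```

Data serialized before the change should therefore remain readable afterwards; only consumers that compare or store the raw serialized text are likely to notice the difference.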
