Use UTF-8 in item metadata and JSON serialization by ryvnf · Pull Request #17010 · luanti-org/luanti

ryvnf · 2026-03-09T22:27:55Z

Goal of the PR

Reduce the size of itemstrings that contain non-ASCII Unicode characters. On master, for example, "☺" is encoded as "\u00e2\u0098\u00ba" in itemstrings, which is 6 times larger (18 bytes instead of 3).
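The 6x figure follows from escaping each UTF-8 byte separately. Below is a minimal Python sketch of that arithmetic. It is not the engine code: `legacy_escape` is a hypothetical helper that only mimics the output quoted above, and it ignores every other escaping rule a real serializer applies.

```python
def legacy_escape(text: str) -> str:
    """Escape every non-ASCII UTF-8 byte as \\u00XX (hypothetical helper)."""
    out = []
    for byte in text.encode("utf-8"):
        if byte >= 0x80:
            # One byte turns into six ASCII characters: backslash, 'u', four hex digits.
            out.append("\\u%04x" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


smiley = "☺"
raw_size = len(smiley.encode("utf-8"))  # size when stored as plain UTF-8
escaped = legacy_escape(smiley)         # byte-wise escaped form
escaped_size = len(escaped)             # size of the escaped form

print(escaped, raw_size, escaped_size)
```

"☺" is three bytes in UTF-8 (e2 98 ba), so the escaped form takes 3 × 6 = 18 bytes.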

How does the PR work

Makes core.write_json emit UTF-8 by enabling jsoncpp's emitUTF8 setting.
Removes the old code that transformed each non-ASCII byte into \u00XX in item and node metadata.
core.serialize already preserves UTF-8 encoded text, so no change was necessary there.
ASCII control characters are still escaped in metadata as \u00XX, so that metadata containing separators such as \x01 and \x02 can still be nested.
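The points above can be illustrated with a rough Python sketch. It is not the engine implementation. `escape_control_chars` is a hypothetical helper, and it assumes "control characters" means code points below 0x20. It also leaves out backslash and quote escaping, which a real serializer needs for unambiguous decoding. Python's `ensure_ascii=False` is used only as an analogue of a JSON writer that emits UTF-8.

```python
import json


def escape_control_chars(text: str) -> str:
    """Escape only ASCII control characters; leave all other text untouched."""
    out = []
    for ch in text:
        if ord(ch) < 0x20:  # assumption: control characters are those below 0x20
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical inner metadata value that itself contains separator bytes.
inner = "name\x01☺\x02"
escaped_inner = escape_control_chars(inner)

# JSON analogue: ASCII-only output versus UTF-8 output.
json_ascii = json.dumps("☺")
json_utf8 = json.dumps("☺", ensure_ascii=False)

print(escaped_inner, json_ascii, json_utf8)
```

Because the escaped value contains no raw \x01 or \x02 bytes, it can sit inside an outer metadata string without being mistaken for the outer separators, while "☺" stays at its 3-byte UTF-8 size.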

Does it resolve any reported issue?

Implements what was suggested in #17007

Does this relate to a goal in the roadmap?

Probably not.

If not a bug fix, why is this PR needed? What usecases does it solve?

Makes it harder for users to cause lag and disrupt servers by spamming Unicode characters into item metadata, such as the text of books.
Makes the storage size of metadata that is limited with string.len consistent. For example, in Minetest Game the amount of text a user can enter into a book is limited to 10 kB using string.len. If a user fills the book with ☺, the resulting book itemstring contains about 60 kB instead of the 10 kB you would expect.
Generally reduces the amount of data that needs to be sent over the network.
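A back-of-the-envelope check of the book example, as a Python sketch. The `LIMIT` value is an assumption standing in for the 10 kB limit mentioned above, and the factor of 6 assumes every non-ASCII byte is escaped as a six-character \u00XX sequence.

```python
LIMIT = 10_000  # assumed byte limit, standing in for the 10 kB book limit

smiley_bytes = "☺".encode("utf-8")                   # 3 bytes per character
text = smiley_bytes * (LIMIT // len(smiley_bytes))   # as many smileys as fit in the limit

size_utf8 = len(text)        # what a byte-counting length check sees
size_legacy = 6 * len(text)  # every byte is non-ASCII, so each becomes 6 characters

print(size_utf8, size_legacy)
```

With UTF-8 output the stored size matches what the byte-length check measured. With byte-wise escaping it is six times larger.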

If you have used an LLM/AI to help with code or assets, you must disclose this.

I have not.

Todo

Fix failing unit test
Verify itemstrings with metadata do not get corrupted when nesting (tested using Mineclonia enchanted items inside shulker boxes)

How to test

Do the following in devtest:

Fetch the "Item Meta Editor" from the bag of everything.
Use it with an item placed next to it and add metadata with a key and value containing Unicode characters such as ☺♥★.
Verify it works.
Use the new dump_itemstring command and verify the output contains "☺" instead of "\u00e2\u0098\u00ba".
Use the "Node Meta Editor" on a node and store Unicode characters.
Quit the game, restart it, and verify that all data is still there.

There may be other things that need testing that I have not considered.

src/unittest/test_serialization.cpp

ryvnf · 2026-03-10T19:15:45Z

I also added a dump_itemstring command to devtest. I am not 100% sure whether it should be in master. I added it because without it you cannot check how the string is encoded. It could be removed after testing but before merging, if that is preferable.

ryvnf · 2026-03-10T21:09:57Z

The test that failed did so because of a 500 Internal Server Error from github.com. That is unrelated to the changes. I cannot restart it to fix it.

Edit: it appears to have fixed itself.

sfan5

LGTM

games/devtest/mods/util_commands/init.lua

SmallJoker

Works, tested with #12167 (comment) using the sample string:

local str = "baum\xe4\x01\xF5\xA3birne"

and libjsoncpp26, version 1.9.6-3

Result

it's the same
str:        	98	97	117	109	228	1	245	163	98	105	114	110	101
str_loaded: 	98	97	117	109	228	1	245	163	98	105	114	110	101

Works.
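For reference, the byte values in the result rows can be reproduced from the sample string with a short Python sketch. This only recomputes the expected bytes. It does not run the engine's serialization round trip, and the variable names are illustrative.

```python
# Same bytes as the Lua sample string above.
sample = b"baum\xe4\x01\xf5\xa3birne"

byte_values = list(sample)                  # decimal value of each byte
row = "\t".join(str(b) for b in byte_values)

print(row)
```

The sample is not valid UTF-8 (0xe4 is followed by 0x01, which is not a continuation byte), so it also exercises how arbitrary binary data is handled.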

ryvnf force-pushed the serialize-with-utf8 branch from 310a693 to 098db59 on March 9, 2026 22:33

ryvnf changed the title from "Use utf8 in metadata and JSON serialization" to "Use UTF-8 in metadata and JSON serialization" on Mar 9, 2026

sfan5 added the "Action / change needed" label (Code still needs changes (PR) / more information requested (Issues)) on Mar 9, 2026

ryvnf changed the title from "Use UTF-8 in metadata and JSON serialization" to "WIP: Use UTF-8 in metadata and JSON serialization" on Mar 9, 2026

ryvnf marked this pull request as draft on March 9, 2026 22:51

Zughy added the "Feature ✨" (PRs that add or enhance a feature), "Roadmap: Needs approval" (The change is not part of the current roadmap and needs to be approved by coredevs beforehand), and "@ Script API" labels on Mar 10, 2026

ryvnf force-pushed the serialize-with-utf8 branch from 098db59 to 2bbf30d on March 10, 2026 18:21

sfan5 reviewed Mar 10, 2026

Review comment on src/unittest/test_serialization.cpp (outdated, resolved)

ryvnf added 2 commits on March 10, 2026 19:52

Serialize metadata as UTF-8 (fe3ab01)

Make JSON serialization emit UTF-8 (803da3c)

ryvnf force-pushed the serialize-with-utf8 branch from 2bbf30d to 25cfddc on March 10, 2026 18:52

ryvnf marked this pull request as ready for review on March 10, 2026 19:30

ryvnf changed the title from "WIP: Use UTF-8 in metadata and JSON serialization" to "Use UTF-8 in metadata and JSON serialization" on Mar 10, 2026

sfan5 approved these changes on Mar 11, 2026

Review comment on games/devtest/mods/util_commands/init.lua (outdated, resolved)

sfan5 added the "One approval ✅ ◻️" label on Mar 11, 2026

ryvnf force-pushed the serialize-with-utf8 branch from 25cfddc to 803da3c on March 11, 2026 16:32

sfan5 changed the title from "Use UTF-8 in metadata and JSON serialization" to "Use UTF-8 in item metadata and JSON serialization" on Mar 11, 2026

SmallJoker approved these changes on Mar 15, 2026

SmallJoker added the ">= Two approvals ✅ ✅" label and removed the "One approval ✅ ◻️" label on Mar 15, 2026

ryvnf wants to merge 2 commits into luanti-org:master from ryvnf:serialize-with-utf8

4 participants