Description
https://github.com/toon-format/toon:
Token-Oriented Object Notation is a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage. It's intended for LLM input as a lossless, drop-in representation of JSON data.
TOON's sweet spot is uniform arrays of objects – multiple fields per row, same structure across items. It borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency in LLM contexts. For deeply nested or non-uniform data, JSON may be more efficient.
TOON achieves CSV-like compactness while adding explicit structure that helps LLMs parse and validate data reliably.
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
Sounds like it could be a nice addition, even if we only use it as a proof-of-concept/internally for now.
Especially in notebooks TOON could be useful to provide context to the AI assistant/Junie without using many tokens.
At the moment, JToon can only encode to TOON (and it uses JSON as an intermediary format, which seems less efficient), whereas toon4s can both encode and decode.
That said, it may be possible, and quicker, to write a converter ourselves. The output is pretty close to dataframe-json, but we wouldn't have to create a Map<String, Value> for each row in the dataframe.
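
To make the "write it ourselves" idea a bit more concrete, here is a rough, language-neutral sketch in Python of a column-oriented encoder for the tabular case only. It walks the columns in lockstep and writes each row directly, so no per-row map is built. Everything here is an assumption for illustration: `encode_table` and `encode_cell` are hypothetical names, the quoting rule is a conservative simplification rather than the full TOON spec, column names are assumed to be plain identifiers, and nested or non-uniform data is not handled.

```python
import re

# Conservative quoting rule (an assumption, simplified from the TOON README):
# quote anything that is empty, padded with whitespace, starts with "-",
# contains a structural character, or could be read as a number/boolean/null.
_STRUCTURAL = re.compile(r'[,:"\\\[\]{}\n\r\t]')
_LITERAL_LIKE = re.compile(r'^(true|false|null|-?\d+(\.\d+)?([eE][+-]?\d+)?)$')


def encode_cell(value):
    """Encode one cell value as a TOON row field (simplified)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        # repr() is used as-is; it may not match TOON's canonical number form
        # for very large or non-finite floats.
        return repr(value)
    text = str(value)
    needs_quotes = (
        text == ""
        or text != text.strip()
        or text.startswith("-")
        or _STRUCTURAL.search(text) is not None
        or _LITERAL_LIKE.match(text) is not None
    )
    if not needs_quotes:
        return text
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def encode_table(name, columns):
    """Encode a column-oriented table (dict of column name -> list of values)."""
    names = list(columns)
    lengths = {len(columns[n]) for n in names}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    row_count = lengths.pop() if lengths else 0
    fields = ",".join(names)
    lines = [f"{name}[{row_count}]{{{fields}}}:"]
    # zip over the columns yields one tuple per row without building row maps.
    for cells in zip(*(columns[n] for n in names)):
        lines.append("  " + ",".join(encode_cell(c) for c in cells))
    return "\n".join(lines)


print(encode_table("users", {
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "role": ["admin", "user"],
}))
```

For the `users` data above, this prints the same three lines as the README example. A real converter would presumably iterate the dataframe's columns the same way and would need to follow the spec's exact quoting and number rules.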