Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I wish I could use pandas to read and write TOON files directly as DataFrames. TOON (Token-Oriented Object Notation) is a compact, human-readable serialization format designed for LLM input; it is reported to reduce token usage by 30–60% compared to JSON. The existing pandas I/O formats (JSON, CSV, etc.) are verbose and token-expensive in LLM contexts.
Motivation:
Currently, pandas supports CSV, JSON, Excel, and other formats, but there is no native support for token-efficient, nested object data. TOON enables:
- Flat and nested structures in a single DataFrame.
- Inline nested tokens (student.name), preserving hierarchy.
- Efficient serialization/deserialization of token-based datasets.
- Compatibility with standard pandas workflows for reading, writing, and manipulating DataFrames.
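To make the dotted-column idea concrete, here is a minimal sketch of how a nested record could be flattened into student.name-style keys. The helper name `flatten_record` and the sample data are hypothetical and only illustrate the idea; pandas' existing `pd.json_normalize` already produces similar dotted column names for nested dictionaries.

```python
def flatten_record(record, sep=".", parent=""):
    """Flatten nested dicts into a single-level dict with dotted keys.

    Hypothetical helper for illustration; not part of pandas or any TOON library.
    """
    flat = {}
    for key, value in record.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse so that {"student": {"name": ...}} becomes "student.name".
            flat.update(flatten_record(value, sep=sep, parent=path))
        else:
            flat[path] = value
    return flat


# Sample nested record (made up for this example).
row = {"id": 1, "student": {"name": "Ana", "grades": {"math": 90}}}
flat_row = flatten_record(row)
```

Each flattened dict maps directly onto one DataFrame row, which is presumably what a flatten_nested option would do internally.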
Feature Description
Add a pd.read_toon function and a DataFrame.to_toon method to pandas:
# Read a TOON file/string
df = pd.read_toon("data.toon", flatten_nested=True, nested_depth=1, format_type="tabular_array")
# Convert a DataFrame to TOON
toon_str = df.to_toon(
    table_name="users",
    delimiter="|",
    format_type="auto",
    indent=2,
)
Key Parameters:
- encoding, compression, storage_options
- table_name, delimiter, length_marker, format_type, indent, nested_depth
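As a rough illustration of what table_name and indent might control, the sketch below writes a list of uniform records as a TOON-style tabular array. It assumes the `name[count]{fields}:` header layout that TOON uses for uniform objects, handles only the default comma delimiter, and skips quoting and escaping entirely. The function name `to_toon_table` is hypothetical and is not the proposed pandas API.

```python
def to_toon_table(rows, table_name, indent=2):
    """Encode uniform dict rows as a TOON-style tabular array (simplified).

    Assumptions: comma delimiter only, no quoting/escaping, non-empty input.
    """
    if not rows:
        raise ValueError("this sketch only handles non-empty tables")
    fields = list(rows[0])
    for row in rows:
        if list(row) != fields:
            raise ValueError("all rows must share the same keys in the same order")
    field_list = ",".join(fields)
    # Header declares the table name, the row count, and the field names once.
    header = f"{table_name}[{len(rows)}]{{{field_list}}}:"
    pad = " " * indent
    body = [pad + ",".join(str(row[name]) for name in fields) for row in rows]
    return "\n".join([header] + body)


# Sample records (made up for this example).
users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
toon_text = to_toon_table(users, table_name="users")
```

For a DataFrame, the rows could come from `df.to_dict(orient="records")`. Options such as delimiter and length_marker would change the header and row separators and are left out here.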
Features:
- Directly read/write TOON to/from pandas DataFrames
- Supports flat and nested structures, including arrays and dictionaries
- Compact, human-readable, token-efficient
- YAML-like indentation + CSV-style tabular arrays for uniform objects
- Round-trip serialization: df.to_toon() → pd.read_toon() preserves structure
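The reading side of the round trip can be sketched in the same spirit. The parser below accepts only a single comma-delimited tabular array with a `name[count]{fields}:` header, which is an assumption about the TOON layout rather than a full implementation of the format. The name `parse_toon_table` is hypothetical.

```python
import re

# Matches a simplified tabular-array header such as "users[2]{id,name}:".
_HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")


def parse_toon_table(text):
    """Parse one TOON-style tabular array into (table_name, list of dict rows).

    Assumptions: comma delimiter, no quoted values, exactly one table in the text.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")
    match = _HEADER.match(lines[0].strip())
    if match is None:
        raise ValueError("first line is not a tabular-array header")
    table_name = match.group(1)
    declared = int(match.group(2))
    fields = match.group(3).split(",")
    rows = [dict(zip(fields, line.strip().split(","))) for line in lines[1:]]
    # The declared row count lets the reader detect truncated input.
    if len(rows) != declared:
        raise ValueError(f"header declares {declared} rows, found {len(rows)}")
    return table_name, rows


sample = "users[2]{id,name}:\n  1,Alice\n  2,Bob"
name, rows = parse_toon_table(sample)
```

Every value comes back as a string in this sketch, so a real read_toon would still need type inference for the round trip to preserve dtypes. The parsed rows can be passed to `pd.DataFrame(rows)`.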
Alternative Solutions
Currently, users can rely on external packages like toon-python for TOON I/O. A typical workflow involves multiple steps:
- TOON → JSON conversion using decode from toon-python:
from toon_format import decode
with open("data.toon", "r") as f:
    toon_data = f.read()
json_data = decode(toon_data, options={"indent": 2, "strict": True})
- JSON → pandas DataFrame using pd.read_json():
import pandas as pd
df = pd.read_json(json_data)
- Optional flattening for nested structures using pd.json_normalize():
from pandas import json_normalize
df_flat = json_normalize(json_data)
- DataFrame → JSON → TOON using encode from toon-python:
from toon_format import encode
json_data = df.to_json(orient="records")
toon_str = encode(json_data, options={"delimiter": "\t", "indent": 4, "lengthMarker": "#"})
with open("output.toon", "w") as f:
    f.write(toon_str)
Problems with this approach:
- Multiple conversion steps (TOON → JSON → DataFrame → JSON → TOON) make the workflow verbose and error-prone.
- Time-consuming; there is no single-call way to go from a TOON file to a DataFrame or back.
- Integration with pandas I/O conventions (like .to_csv(), .to_json()) is not seamless.
- Round-trip serialization is not guaranteed without careful handling.
Advantages of native pandas TOON support:
- Directly read/write TOON to/from DataFrames in one step.
- Preserves nested arrays, objects, and inline fields.
- Fully compatible with pandas API and workflows.
- Maintains token-efficient encoding for LLM contexts.
Additional Context
No response