ENH: Add TOON (Token-Oriented Object Notation) I/O Support #63153

@raselmeya94

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish I could use pandas to read TOON files directly into DataFrames and write DataFrames back out as TOON. TOON (Token-Oriented Object Notation) is a compact, human-readable serialization format designed for LLM input, reducing token usage by 30–60% compared to JSON. The existing pandas text formats (JSON, CSV, etc.) are verbose and token-expensive in LLM contexts.

Motivation:
Currently, pandas supports CSV, JSON, Excel, and other formats, but there is no native support for token-efficient, nested object data. TOON enables:

  • Flat and nested structures in a single DataFrame.
  • Inline nested tokens (student.name), preserving hierarchy.
  • Efficient serialization/deserialization of token-based datasets.
  • Compatibility with standard pandas workflows for reading, writing, and manipulating DataFrames.
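
The dotted-column idea (student.name) can be illustrated with a small standard-library sketch. The helper name `flatten_record` and the `.` separator are assumptions made for this example, not part of pandas or of the TOON specification; `pd.json_normalize()` produces similar column names for nested records today.

```python
from typing import Any


def flatten_record(record: dict[str, Any], parent: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep`.

    Hypothetical helper for illustration only.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{parent}{sep}{key}" if parent else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts so each leaf gets a dotted path.
            flat.update(flatten_record(value, path, sep))
        else:
            # Scalars, lists, and empty dicts are kept as-is.
            flat[path] = value
    return flat


row = {"id": 1, "student": {"name": "Ada", "address": {"city": "London"}}}
flat_row = flatten_record(row)
print(flat_row)
# {'id': 1, 'student.name': 'Ada', 'student.address.city': 'London'}
```

Each flattened key would become one DataFrame column, which is how a nested record could sit alongside flat ones in a single table.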

Feature Description

Add a top-level pd.read_toon() function and a DataFrame.to_toon() method to pandas:

# Read a TOON file/string
df = pd.read_toon("data.toon", flatten_nested=True, nested_depth=1, format_type="tabular_array")

# Convert a DataFrame to TOON
toon_str = df.to_toon(
    table_name="users",
    delimiter="|",
    format_type="auto",
    indent=2
)

Key Parameters:

  • encoding, compression, storage_options
  • table_name, delimiter, length_marker, format_type, indent, nested_depth

Features:

  • Directly read/write TOON to/from pandas DataFrames
  • Supports flat and nested structures, including arrays and dictionaries
  • Compact, human-readable, token-efficient
  • YAML-like indentation + CSV-style tabular arrays for uniform objects
  • Round-trip serialization: df.to_toon() → pd.read_toon() preserves structure
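
To make the "CSV-style tabular arrays" point concrete, here is a minimal standard-library sketch of how uniform records could be rendered as a TOON-style table. The function names `format_value` and `to_toon_table` are hypothetical, the quoting rules are deliberately simplified, and only the comma delimiter is handled, so this should not be read as a conforming TOON encoder or as the proposed pandas implementation.

```python
import math
from typing import Any


def _looks_numeric(text: str) -> bool:
    """True if the string would be read back as a number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def format_value(value: Any) -> str:
    """Render one scalar as a token (simplified quoting, comma delimiter only)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, float) and not math.isfinite(value):
        return "null"  # NaN/inf have no numeric literal here
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    needs_quotes = (
        text == ""
        or text != text.strip()
        or any(ch in text for ch in ',:"\\\n')
        or text in ("true", "false", "null")
        or _looks_numeric(text)
    )
    if needs_quotes:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return text


def to_toon_table(records: list[dict[str, Any]], table_name: str, indent: int = 2) -> str:
    """Encode uniform flat records as a header line plus one row per record."""
    if not records:
        raise ValueError("this sketch only handles non-empty record lists")
    fields = list(records[0])
    if any(list(record) != fields for record in records):
        raise ValueError("tabular form needs identical keys in every record")
    field_list = ",".join(fields)
    header = f"{table_name}[{len(records)}]" + "{" + field_list + "}:"
    pad = " " * indent
    rows = [pad + ",".join(format_value(record[f]) for f in fields) for record in records]
    return "\n".join([header, *rows])


records = [
    {"id": 1, "name": "Alice", "active": True},
    {"id": 2, "name": "Bob, Jr.", "active": None},
]
toon_text = to_toon_table(records, "users")
print(toon_text)
# users[2]{id,name,active}:
#   1,Alice,true
#   2,"Bob, Jr.",null
```

The field names appear once in the header instead of once per record, which is where the token savings over JSON come from for uniform rows. With a DataFrame, `df.to_dict(orient="records")` would supply the input list.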

Alternative Solutions

Currently, users can rely on external packages like toon-python for TOON I/O. A typical workflow involves multiple steps:

  1. TOON → JSON-compatible Python objects using decode from toon-python:
from toon_format import decode

with open("data.toon", "r", encoding="utf-8") as f:
    toon_data = f.read()

# decode is assumed to return plain Python objects (dicts/lists), not a JSON string
data = decode(toon_data, options={"indent": 2, "strict": True})
  2. Decoded data → pandas DataFrame using pd.DataFrame():
import pandas as pd

# works when the decoded value is a list of flat records
df = pd.DataFrame(data)
  3. Optional flattening for nested structures using pd.json_normalize():
from pandas import json_normalize

df_flat = json_normalize(data)
  4. DataFrame → JSON → TOON using encode from toon-python:
import json

from toon_format import encode

# go through JSON so NaN and timestamps become JSON-safe values, then pass
# Python objects (not the JSON string itself) to encode
records = json.loads(df.to_json(orient="records"))
toon_str = encode(records, options={"delimiter": "\t", "indent": 4, "lengthMarker": "#"})

with open("output.toon", "w", encoding="utf-8") as f:
    f.write(toon_str)

Problems with this approach:

  • Multiple conversion steps (TOON → JSON → DataFrame → JSON → TOON) make the workflow verbose and error-prone.
  • Time-consuming: there is no single-call operation comparable to pd.read_csv().
  • Integration with pandas I/O conventions (like .to_csv(), .to_json()) is not seamless.
  • Round-trip serialization is not guaranteed without careful handling.
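
One reason round trips need care is that a reader has to re-infer types from text and check the row count declared in the header. The following standard-library sketch parses a single comma-delimited tabular block; the names `split_row`, `parse_token`, and `read_toon_table` are hypothetical, and the grammar handled here is a simplified subset rather than the full TOON format.

```python
import re
from typing import Any

# Header of the form name[count]{field1,field2,...}:
HEADER = re.compile(r"^(?P<name>\w+)\[(?P<count>\d+)\]\{(?P<fields>[^}]*)\}:$")


def split_row(line: str) -> list[tuple[str, bool]]:
    """Split on commas outside quotes; return (text, was_quoted) pairs."""
    tokens: list[tuple[str, bool]] = []
    buf: list[str] = []
    in_quotes = quoted = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == "\\" and i + 1 < len(line):
                nxt = line[i + 1]  # unescape \n, \" and \\
                buf.append("\n" if nxt == "n" else nxt)
                i += 1
            elif ch == '"':
                in_quotes = False
            else:
                buf.append(ch)
        elif ch == '"':
            in_quotes = quoted = True
        elif ch == ",":
            tokens.append(("".join(buf), quoted))
            buf, quoted = [], False
        else:
            buf.append(ch)
        i += 1
    tokens.append(("".join(buf), quoted))
    return tokens


def parse_token(text: str, quoted: bool) -> Any:
    """Quoted tokens stay strings; bare tokens are type-inferred."""
    if quoted:
        return text
    if text in ("null", "true", "false"):
        return {"null": None, "true": True, "false": False}[text]
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def read_toon_table(text: str) -> tuple[str, list[dict[str, Any]]]:
    """Parse one tabular block and verify the declared row count."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    match = HEADER.match(lines[0])
    if match is None:
        raise ValueError("not a tabular header")
    fields = match["fields"].split(",")
    rows = lines[1:]
    if len(rows) != int(match["count"]):
        raise ValueError("row count does not match the header")
    records = []
    for row in rows:
        tokens = split_row(row)
        if len(tokens) != len(fields):
            raise ValueError("field count does not match the header")
        records.append({f: parse_token(t, q) for f, (t, q) in zip(fields, tokens)})
    return match["name"], records


sample = 'users[2]{id,name,active}:\n  1,Alice,true\n  2,"Bob, Jr.",null'
name, parsed = read_toon_table(sample)
print(name, parsed)
```

Note that the quoted/unquoted distinction is the only thing separating the string "42" from the number 42 here, so a writer that quotes inconsistently would change column types on the way back in.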

Advantages of native pandas TOON support:

  • Directly read/write TOON to/from DataFrames in one step.
  • Preserves nested arrays, objects, and inline fields.
  • Fully compatible with pandas API and workflows.
  • Maintains token-efficient encoding for LLM contexts.

Additional Context

No response
