ENH: Add TOON (Token Object Oriented Notation) I/O Support

### Feature Type

- [x] Adding new functionality to pandas

- [ ] Changing existing functionality in pandas

- [ ] Removing existing functionality in pandas


### Problem Description

I wish I could use pandas to read and write TOON files directly into DataFrames. TOON (Token-Oriented Object Notation) is a **compact, human-readable serialization format** designed for LLM input, reducing token usage by 30–60% compared to JSON. Current pandas I/O (JSON, CSV, etc.) is verbose and token-expensive for LLM contexts.

**Motivation:**
Currently, pandas supports CSV, JSON, Excel, and other formats, but there is no **native support for token-efficient, nested object data**. TOON enables:

* Flat and nested structures in a single DataFrame.
* Inline nested tokens (`student.name`), preserving hierarchy.
* Efficient serialization/deserialization of token-based datasets.
* Compatibility with standard pandas workflows for reading, writing, and manipulating DataFrames.


### Feature Description


Add `read_toon` and `to_toon` methods to pandas:

```python
# Read a TOON file/string
df = pd.read_toon("data.toon", flatten_nested=True, nested_depth=1, format_type="tabular_array")

# Convert a DataFrame to TOON
toon_str = df.to_toon(
    table_name="users",
    delimiter="|",
    format_type="auto",
    indent=2
)
```

**Key Parameters:**

* `encoding`, `compression`, `storage_options`
* `table_name`, `delimiter`, `length_marker`, `format_type`, `indent`, `nested_depth`

**Features:**

* Directly read/write TOON to/from pandas DataFrames
* Supports flat and nested structures, including arrays and dictionaries
* Compact, human-readable, token-efficient
* YAML-like indentation + CSV-style tabular arrays for uniform objects
* Round-trip serialization: `df.to_toon()` → `pd.read_toon()` preserves structure

### Alternative Solutions


Currently, users can rely on external packages like [`toon-python`](https://pypi.org/project/toon-python/) for TOON I/O. A typical workflow involves multiple steps:

1. **TOON → JSON conversion using `decode` from `toon-python`:**

```python
from toon_format import decode

with open("data.toon", "r") as f:
    toon_data = f.read()

json_data = decode(toon_data , options={"indent": 2, "strict": True})
```

2. **JSON → pandas DataFrame using `pd.read_json()`:**

```python
import pandas as pd

df = pd.read_json(json_data)
```

3. **Optional flattening for nested structures using `pd.json_normalize()`:**

```python
from pandas import json_normalize

df_flat = json_normalize(json_data)
```

4. **DataFrame → JSON → TOON using `encode` from `toon-python`:**

```python
from toon_format import encode

json_data = df.to_json(orient="records")
toon_str = encode(json_data,  options={"delimiter": "\t", "indent": 4, "lengthMarker": "#"})

with open("output.toon", "w") as f:
    f.write(toon_str)
```

---

**Problems with this approach:**

* Multiple conversion steps (TOON → JSON → DataFrame → JSON → TOON) make the workflow **verbose and error-prone**.
* **Time-consuming**; not a one-click, actionable operation.
* Integration with pandas I/O conventions (like `.to_csv()`, `.to_json()`) is **not seamless**.
* Round-trip serialization is **not guaranteed** without careful handling.

---

**Advantage of native pandas TOON support:**

* Directly read/write TOON to/from DataFrames in **one step**.
* Preserves **nested arrays, objects, and inline fields**.
* Fully compatible with pandas **API and workflows**.
* Maintains **token-efficient encoding** for LLM contexts.



### Additional Context

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

ENH: Add TOON (Token Object Oriented Notation) I/O Support #63153

Feature Type

Problem Description

Feature Description

Alternative Solutions

Additional Context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

ENH: Add TOON (Token Object Oriented Notation) I/O Support #63153

Description

Feature Type

Problem Description

Feature Description

Alternative Solutions

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions