
@konard (Member) commented Jan 21, 2026

Summary

This PR implements UTF-8 character count benchmarks comparing Links Notation (lino) against JSON, YAML, and XML formats. The benchmarks are implemented in all six supported languages and include a GitHub Actions workflow for automatic report generation.
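A "UTF-8 character count" can mean different units. The text could be measured in UTF-8 bytes, in Unicode code points, or in UTF-16 code units. JavaScript's `length`, Java's `String.length()`, and C#'s `string.Length` count UTF-16 units. Python's `len` counts code points, and Go's `len` counts bytes. The PR does not say which unit the six implementations share. The minimal Rust sketch below shows how the three diverge on non-ASCII text. The helper names and the sample string are hypothetical.

```rust
/// Three different notions of "length" for the same text.
/// Which one the benchmark uses is an assumption; this only shows they differ.
fn utf8_bytes(s: &str) -> usize {
    // `str::len` returns the number of UTF-8 bytes, not characters.
    s.len()
}

fn code_points(s: &str) -> usize {
    // `chars()` iterates Unicode scalar values (code points).
    s.chars().count()
}

fn utf16_units(s: &str) -> usize {
    // What JavaScript `length`, Java `length()`, and C# `Length` report.
    s.encode_utf16().count()
}

fn main() {
    // Hypothetical sample: ASCII plus a precomposed "é" (U+00E9) and an emoji (U+1F680).
    let sample = "(name: Jos\u{e9} \u{1F680})";
    println!(
        "bytes={} code_points={} utf16_units={}",
        utf8_bytes(sample),
        code_points(sample),
        utf16_units(sample)
    );
}
```

For pure ASCII input all three agree. The numbers only drift apart once test data contains non-ASCII text, and cross-language comparisons are most likely to break on exactly that kind of input.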

Key Changes

  • New benchmark implementations in all 6 supported languages:

    • Rust (primary, used in CI/CD for auto-generating BENCHMARK_RESULTS.md)
    • JavaScript
    • Python
    • C#
    • Go
    • Java
  • 5 benchmark test cases covering different data structures:

    • employees - Employee records with nested structure
    • simple_doublets - Simple doublet links (2-tuples)
    • triplets - Triplet relations (3-tuples)
    • nested_structure - Deeply nested company structure
    • config - Application configuration
  • GitHub Actions workflow (benchmarks.yml) that:

    • Runs the Rust benchmark on push to main
    • Automatically commits updated BENCHMARK_RESULTS.md if results change
    • Validates all language implementations in CI
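The per-case measurement can be sketched as follows. The sketch assumes that each case stores the same data as four strings and that "characters" means Unicode code points. `BenchmarkCase`, `Counts`, and `sample_case` are hypothetical names. The Lino-style doublet text is an illustration, not a copy of the files in benchmarks/data/.

```rust
/// One benchmark case: the same data serialized in each format.
/// Hypothetical type; the PR's real cases come from files in benchmarks/data/.
struct BenchmarkCase {
    name: &'static str,
    lino: &'static str,
    json: &'static str,
    yaml: &'static str,
    xml: &'static str,
}

/// Character counts per format, measured in Unicode code points (assumed unit).
#[derive(Debug, PartialEq)]
struct Counts {
    lino: usize,
    json: usize,
    yaml: usize,
    xml: usize,
}

fn count(case: &BenchmarkCase) -> Counts {
    let n = |s: &str| s.chars().count();
    Counts {
        lino: n(case.lino),
        json: n(case.json),
        yaml: n(case.yaml),
        xml: n(case.xml),
    }
}

/// A tiny doublets-style case; the strings are illustrative, not the benchmark data.
fn sample_case() -> BenchmarkCase {
    BenchmarkCase {
        name: "simple_doublets (illustrative)",
        lino: "(1 2)\n(2 3)",
        json: "[[1,2],[2,3]]",
        yaml: "- [1, 2]\n- [2, 3]",
        xml: "<links><link>1 2</link><link>2 3</link></links>",
    }
}

fn main() {
    let case = sample_case();
    println!("{}: {:?}", case.name, count(&case));
}
```

Summing `Counts` over all five cases would give per-format totals of the kind shown in the results table.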

Benchmark Results

| Format | Total Characters | vs Lino |
|--------|-----------------:|--------:|
| Lino   | 734              | -       |
| JSON   | 1332             | +81.5%  |
| YAML   | 920              | +25.3%  |
| XML    | 1882             | +156.4% |
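The "vs Lino" column matches each format's total relative to Lino's: (other - lino) / lino * 100. For JSON that is (1332 - 734) / 734 ≈ +81.5%. The short sketch below reproduces the column from the totals. The function names are hypothetical.

```rust
/// Size of `other` relative to `lino`, as a signed percentage.
/// Formula inferred from the table: (other - lino) / lino * 100.
fn overhead_vs_lino(lino: usize, other: usize) -> f64 {
    (other as f64 - lino as f64) / lino as f64 * 100.0
}

/// Formats the overhead with an explicit sign and one decimal, e.g. "+81.5%".
fn format_overhead(lino: usize, other: usize) -> String {
    format!("{:+.1}%", overhead_vs_lino(lino, other))
}

fn main() {
    let lino = 734;
    for (name, total) in [("JSON", 1332), ("YAML", 920), ("XML", 1882)] {
        println!("{name}: {}", format_overhead(lino, total));
    }
}
```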

Average savings with Lino:

  • vs JSON: 47.9% fewer characters
  • vs YAML: 21.5% fewer characters
  • vs XML: 61.5% fewer characters
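Savings relative to another format are 1 - lino / other. Computed from the table totals, this gives about 44.9% vs JSON, 20.2% vs YAML, and 61.0% vs XML. These figures differ slightly from the averages listed above. A gap like this is what you would expect if the averages are a mean of per-case percentages rather than a ratio of totals. That is an assumption, because the per-case figures are not reproduced here. The sketch computes both kinds of average. The two-case input is hypothetical and only shows that the two can diverge.

```rust
/// Percentage of characters saved by Lino relative to another format.
fn savings(lino: usize, other: usize) -> f64 {
    (1.0 - lino as f64 / other as f64) * 100.0
}

/// Mean of per-case savings: every case weighs the same regardless of size.
/// `cases` must be non-empty.
fn mean_per_case_savings(cases: &[(usize, usize)]) -> f64 {
    let sum: f64 = cases.iter().map(|&(lino, other)| savings(lino, other)).sum();
    sum / cases.len() as f64
}

/// Savings computed from summed totals: larger cases dominate.
fn aggregate_savings(cases: &[(usize, usize)]) -> f64 {
    let lino: usize = cases.iter().map(|&(l, _)| l).sum();
    let other: usize = cases.iter().map(|&(_, o)| o).sum();
    savings(lino, other)
}

fn main() {
    // Totals from the results table.
    println!("vs JSON (from totals): {:.1}%", savings(734, 1332));
    // Hypothetical (lino, other) pairs, not benchmark data.
    let cases = [(10, 20), (90, 100)];
    println!("mean per case: {:.1}%", mean_per_case_savings(&cases));
    println!("aggregate:     {:.1}%", aggregate_savings(&cases));
}
```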

Files Changed

  • .github/workflows/benchmarks.yml - New CI workflow for benchmark automation
  • benchmarks/ - New benchmark directory with:
    • BENCHMARK_RESULTS.md - Generated results (auto-updated by CI)
    • README.md - Documentation for running benchmarks
    • data/ - Test data files in all formats
    • Language-specific benchmark implementations
  • rust/links-notation-benchmark/ - Rust benchmark crate
  • rust/Cargo.toml - Updated workspace members
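The workflow commits BENCHMARK_RESULTS.md only when the results change. The PR does not show how that check is done; it may well happen in benchmarks.yml through git. One way to support it on the generator side is to rewrite the report only when its contents differ. That avoids touching the file when nothing changed. A minimal sketch, where `update_if_changed` is a hypothetical helper:

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Writes `contents` to `path` only if it differs from what is already there.
/// Returns Ok(true) when the file was (re)written, Ok(false) when unchanged.
fn update_if_changed(path: &Path, contents: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        Ok(existing) if existing == contents => return Ok(false),
        Ok(_) => {}
        // A missing file counts as "changed" and is created below.
        Err(e) if e.kind() == ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    fs::write(path, contents)?;
    Ok(true)
}

fn main() -> io::Result<()> {
    // Hypothetical scratch location; the real report is benchmarks/BENCHMARK_RESULTS.md.
    let path = std::env::temp_dir().join("BENCHMARK_RESULTS.example.md");
    let report = "# Benchmark Results\n";
    println!("first call wrote: {}", update_if_changed(&path, report)?);
    println!("second call wrote: {}", update_if_changed(&path, report)?);
    fs::remove_file(&path)
}
```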

Test Plan

  • All 6 benchmark implementations produce consistent results
  • Rust benchmark tests pass (cargo test -p links-notation-benchmark)
  • Existing Rust library tests still pass
  • GitHub Actions CI passes for all benchmark validations
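The "consistent results" item could be checked by comparing each implementation's totals against the Rust reference. This sketch assumes the totals have already been parsed from each implementation's output; parsing is not shown. `Totals` and `inconsistent` are hypothetical names. The inputs in `main` simply restate the PR's claim that every implementation reproduces the table.

```rust
/// Totals reported by one implementation, in the order [lino, json, yaml, xml].
type Totals = [usize; 4];

/// Returns the names of implementations whose totals differ from the reference.
fn inconsistent<'a>(reference: Totals, results: &[(&'a str, Totals)]) -> Vec<&'a str> {
    results
        .iter()
        .filter(|(_, totals)| *totals != reference)
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    // Reference totals from the results table: Lino, JSON, YAML, XML.
    let reference: Totals = [734, 1332, 920, 1882];
    // Assumed inputs: the PR reports identical results for every implementation.
    let results = [
        ("rust", reference),
        ("javascript", reference),
        ("python", reference),
        ("csharp", reference),
        ("go", reference),
        ("java", reference),
    ];
    let bad = inconsistent(reference, &results);
    if bad.is_empty() {
        println!("all implementations agree");
    } else {
        println!("mismatched totals: {bad:?}");
    }
}
```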

Issue Reference

Closes #209


🤖 Generated with Claude Code

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: #209
@konard self-assigned this Jan 21, 2026
… XML

This implements UTF-8 character count benchmarks in all six supported languages:
- Rust (primary, used in CI/CD for auto-generating BENCHMARK_RESULTS.md)
- JavaScript
- Python
- C#
- Go
- Java

Features:
- Five benchmark test cases: employees, simple_doublets, triplets, nested_structure, config
- Detailed markdown report with summary and per-case results
- GitHub Actions workflow that automatically updates benchmark results on push to main
- Consistent benchmark implementation across all languages producing identical results

Results show Lino achieves on average:
- 47.9% fewer characters vs JSON
- 21.5% fewer characters vs YAML
- 61.5% fewer characters vs XML

Closes #209

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@konard changed the title from "[WIP] Add tokenization benchmarks comparing with YAML, XML, JSON" to "Add tokenization benchmarks comparing Links Notation with JSON, YAML, XML" Jan 21, 2026
Apply rustfmt formatting and fix clippy warning about redundant closure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
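The commit does not show which closure clippy flagged. As an illustration only: `clippy::redundant_closure` fires on a closure that just forwards its argument to a function, and the fix is to pass the function directly. `count_chars` and `counts` are hypothetical helpers.

```rust
/// Counts Unicode code points in a string (illustrative helper).
fn count_chars(s: &str) -> usize {
    s.chars().count()
}

/// Counts characters for each text.
fn counts(texts: Vec<&str>) -> Vec<usize> {
    // Before: `texts.into_iter().map(|t| count_chars(t))` triggers clippy::redundant_closure.
    // After: pass the function itself; the behavior is identical.
    texts.into_iter().map(count_chars).collect()
}

fn main() {
    println!("{:?}", counts(vec!["(1 2)", "[1,2]"]));
}
```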
@konard marked this pull request as ready for review January 21, 2026 10:38
@konard (Member Author) commented Jan 21, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $8.161039 USD
  • Calculated by Anthropic: $6.654238 USD
  • Difference: $-1.506801 (-18.46%)
📎 Log file uploaded as Gist (838KB)
🔗 View complete solution draft log

The working session has ended; feel free to review the solution draft and add any feedback.


Development

Successfully merging this pull request may close these issues.

Add tokenization benchmarks comparing with YAML, XML, JSON
