morph — Product Requirements Document

Vision

morph is a universal CLI data format converter with a built-in mapping language. It converts structured data between formats while giving users precise control over how fields are renamed, reshaped, filtered, and transformed — through a small, readable, purpose-built language.

The goal: one tool to replace the dozen format-specific converters people cobble together with scripts.

Problem Statement

Developers, ops engineers, and data workers constantly convert between data formats:

Config migration (YAML → TOML, JSON → YAML)
Data pipeline glue (CSV → JSON, JSON Lines → CSV)
API response reshaping (deeply nested JSON → flat CSV)
Legacy format interop (XML → JSON, INI-like → TOML)

Current solutions fall into three buckets:

One-off scripts — Python/Ruby/Node snippets. Work, but fragile, slow to write, hard to share.
jq / yq / xq family — Powerful but format-specific, steep learning curves, inconsistent syntax across tools.
Online converters — Can't handle sensitive data, no automation, no customization.

None of them offer a single tool that reads multiple formats, writes multiple formats, and provides a user-friendly transformation language in between.

Target Users

Primary

Backend / DevOps engineers — config file migration, CI/CD pipelines, infrastructure-as-code
Data engineers — quick format conversions in ETL scripts, data cleaning
CLI power users — people who live in the terminal and chain tools with pipes

Secondary

Technical writers — converting example data between formats for docs
Security researchers — parsing/reshaping structured data from various tools
Students / learners — understanding data format differences

Core Features

1. Multi-Format I/O

Read and write all supported formats through a single unified interface.

Supported formats (v1.0):

Format	Read	Write	Priority	Notes
JSON	✅	✅	P0	Streaming, pretty/compact output
YAML	✅	✅	P0	Multi-document, anchors/aliases
TOML	✅	✅	P0	Full spec compliance
CSV / TSV	✅	✅	P0	Headers, custom delimiters, quoting
XML	✅	✅	P1	Configurable attribute handling
JSON Lines	✅	✅	P0	Newline-delimited JSON
MessagePack	✅	✅	P1	Binary format
S-expressions	✅	✅	P2	Lisp-style
Query String	✅	✅	P2	URL-encoded
EDN	✅	✅	P2	Clojure data notation

Format detection:

By file extension (.json, .yaml, .yml, .toml, .csv, .xml, etc.)
By explicit flag (-f json, -t yaml)
Stdin requires explicit format flag

2. Zero-Config Simple Conversion

The simplest use case should require zero learning:

morph -i input.json -o output.yaml
# or
cat data.csv | morph -f csv -t json
# or
morph -i data.json -t yaml  # outputs to stdout

No mapping file, no config, no flags beyond format specification.

3. The Mapping Language

A purpose-built DSL for describing data transformations. Design goals:

Readable at first sight — someone unfamiliar should understand what a .morph file does
English-like keywords — rename, select, drop, where, set, cast
Minimal syntax — no brackets/parens unless necessary, arrow (->) for directionality
Composable — operations apply in order, top to bottom
No Turing-completeness — this is a data mapping language, not a programming language

3.1 Core Operations

rename <path> -> <path>        # rename a field
select <path>, <path>, ...     # keep only these fields
drop <path>, <path>, ...       # remove these fields
flatten <path> -> <paths...>   # unnest object into flat fields
nest <paths...> -> <path>      # group flat fields into object
set <path> = <expression>      # create/overwrite field
default <path> = <expression>  # set only if field is null/missing
cast <path> as <type>          # type coercion (int, float, bool, string)
where <condition>              # filter rows/elements
sort <path> [asc|desc]         # sort elements
each <path> { ... }            # iterate over array elements
when <condition> { ... }       # conditional block

3.2 Path Syntax

.field                   # top-level field
.parent.child            # nested field
.array[0]                # array index
.array[*]                # all array elements
.array[-1]               # last element
.["key with spaces"]     # quoted key

3.3 Expressions

# Literals
"hello", 42, 3.14, true, false, null

# Field references
.field_name

# Function calls
join(.first, " ", .last)
lower(.name)
len(.items)

# Arithmetic
.price * .quantity
.total - .discount

# String interpolation
"{.first} {.last}"

# Comparison
.age > 18
.status == "active"
.name != null

# Logical
.age > 18 and .status == "active"
.role == "admin" or .role == "superadmin"
not .deleted

3.4 Built-in Functions (v1.0)

String: join, split, lower, upper, trim, replace, starts_with, ends_with, contains, regex_match, regex_replace, substring, pad_left, pad_right

Math: round, ceil, floor, abs, min, max, sum

Collection: len, keys, values, flatten, unique, reverse, first, last, count, group_by

Type: type_of, is_null, is_array, is_object, is_string, is_number

Date/Time: now, parse_date, format_date

System: env (read environment variable)

Utility: coalesce (first non-null), if (ternary)

4. CLI Interface

morph [OPTIONS]

INPUT/OUTPUT:
  -i, --input <FILE>        Input file (or stdin if omitted)
  -o, --output <FILE>       Output file (or stdout if omitted)
  -f, --from <FORMAT>       Input format (auto-detected from extension)
  -t, --to <FORMAT>         Output format (auto-detected from extension)

MAPPING:
  -m, --map <FILE>          Mapping file (.morph)
  -e, --expr <EXPRESSION>   Inline mapping expression
      --dry-run             Parse and validate mapping without executing

OUTPUT CONTROL:
      --pretty              Pretty-print output (default for terminal)
      --compact             Compact output (default for pipe)
      --indent <N>          Indentation width (default: 2)
      --no-color            Disable colored output
  -q, --quiet               Suppress warnings

FORMAT-SPECIFIC:
      --csv-delimiter <C>   CSV delimiter (default: ',')
      --csv-no-header       CSV has no header row
      --csv-header <FIELDS> Explicit CSV headers
      --xml-root <NAME>     XML root element name
      --xml-attr-prefix <S> XML attribute prefix (default: '@')
      --yaml-multi          Treat input as multi-document YAML

META:
  -V, --version             Print version
  -h, --help                Print help
      --formats             List all supported formats
      --functions           List all built-in functions
      --completions <SHELL> Generate shell completions

5. Error Handling

Clear error messages with line numbers and context for mapping parse errors
Schema validation mode — optionally validate output against a JSON Schema
Graceful degradation — skip malformed records with --skip-errors, log to stderr
Type mismatch warnings — warn when a cast might lose data

6. Performance

Streaming for JSON, JSON Lines, CSV — don't buffer entire file in memory
Zero-copy parsing where the format allows it
Parallel processing for large files (optional, --parallel)
Target: >100MB/s throughput for simple JSON→YAML conversion on modern hardware

Architecture

┌─────────────┐     ┌──────────────┐     ┌──────────────┐     ┌─────────────┐
│   Reader     │────▶│  Universal   │────▶│  Transform   │────▶│   Writer    │
│  (format)    │     │  Value (UV)  │     │  Engine      │     │  (format)   │
└─────────────┘     └──────────────┘     └──────────────┘     └─────────────┘
                                               ▲
                                               │
                                         ┌─────┴─────┐
                                         │  Mapping   │
                                         │  Parser    │
                                         └───────────┘

Universal Value (UV)

The internal representation all formats are normalized to:

enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(String),
    Bytes(Vec<u8>),
    Array(Vec<Value>),
    Map(IndexMap<String, Value>),  // preserves insertion order
}

Key design decisions:

Ordered maps — preserves key order from input (important for config files)
Separate Int/Float — avoids precision loss
Bytes variant — supports binary formats like MessagePack
No format-specific metadata in UV — XML attributes are mapped to a convention (e.g., @attr)

Module Structure

morph/
├── src/
│   ├── main.rs              # CLI entry point
│   ├── lib.rs               # Library root
│   ├── value.rs             # Universal Value type
│   ├── error.rs             # Error types
│   ├── formats/
│   │   ├── mod.rs           # Format registry, auto-detection
│   │   ├── json.rs          # JSON reader/writer
│   │   ├── yaml.rs          # YAML reader/writer
│   │   ├── toml.rs          # TOML reader/writer
│   │   ├── csv.rs           # CSV/TSV reader/writer
│   │   ├── xml.rs           # XML reader/writer
│   │   ├── jsonl.rs         # JSON Lines reader/writer
│   │   ├── msgpack.rs       # MessagePack reader/writer
│   │   ├── sexpr.rs         # S-expressions reader/writer
│   │   ├── querystring.rs   # Query string reader/writer
│   │   └── edn.rs           # EDN reader/writer
│   ├── mapping/
│   │   ├── mod.rs           # Mapping module root
│   │   ├── lexer.rs         # Tokenizer
│   │   ├── parser.rs        # AST construction
│   │   ├── ast.rs           # AST types
│   │   ├── eval.rs          # Mapping evaluator
│   │   └── functions.rs     # Built-in function implementations
│   └── cli.rs               # Argument parsing (clap)
├── tests/
│   ├── formats/             # Per-format round-trip tests
│   ├── mapping/             # Mapping language tests
│   └── integration/         # End-to-end CLI tests
├── Cargo.toml
├── README.md
├── PRD.md
├── LICENSE
└── .github/
    └── workflows/
        └── ci.yml           # Test + lint + build

Milestones

v0.1.0 — Foundation (MVP)

v0.2.0 — Mapping Language

Lexer and parser
Core operations: rename, select, drop, set, default, cast
Path syntax (dot notation, array indexing)
-m and -e flags
String functions (join, split, lower, upper, trim, replace)
Mapping validation (--dry-run)

v0.3.0 — Advanced Mappings

v0.4.0 — Extended Formats

XML reader/writer
JSON Lines reader/writer
MessagePack reader/writer
Format-specific options (CSV delimiters, XML attributes)
Multi-document YAML

v0.5.0 — Polish & Performance

Streaming mode for large files
Shell completions
Comprehensive error messages with suggestions
--formats and --functions help commands
Performance benchmarks

v1.0.0 — Stable Release

Non-Goals (v1.0)

GUI — this is a CLI tool
Database connectivity — not a database client
Network protocols — no HTTP fetching built in (use curl | morph)
Turing-complete scripting — the mapping language is intentionally limited
Format-specific validation — morph converts, it doesn't validate schemas (except optionally)
Bidirectional sync — one-way conversion only

Competitive Landscape

Tool	Multi-format	Transform Language	Learning Curve	Streaming
jq	JSON only	jq DSL	Steep	✅
yq	YAML/JSON	jq-like	Steep	✅
xq	XML→JSON	jq-like	Steep	❌
csvkit	CSV only	SQL-like	Medium	Partial
miller	CSV/JSON/etc	Miller DSL	Medium	✅
dasel	JSON/YAML/TOML/XML	Path queries	Low	❌
morph	10+ formats	morph DSL	Low	✅

Key differentiator: morph combines the broadest format support with a readable, learnable transformation language. No need to learn jq's syntax — morph reads like English.

Technical Decisions

Why Rust?

Single binary, no runtime dependencies
Excellent performance for data processing
Strong ecosystem for parsing (serde, nom)
Cross-platform (Linux, macOS, Windows)
Memory safety without GC (important for streaming)

Why a custom DSL instead of embedding jq/Lua/etc?

Readability — the #1 design goal. jq is powerful but cryptic for newcomers
Safety — no arbitrary code execution, just data transformation
Learnability — someone should understand a .morph file without reading docs
Optimization — a purpose-built evaluator can optimize for common patterns

Why not just wrap existing libraries?

We do! Format readers/writers use battle-tested crates (serde_json, serde_yaml, toml, quick-xml, csv, rmp-serde)
The value is in the unified interface and mapping language, not in reimplementing parsers

Success Criteria

Usability: A developer can convert between any two supported formats in under 10 seconds
Learnability: The mapping language should be understood without reading docs for simple cases
Performance: Competitive with or faster than format-specific tools for simple conversions
Adoption: 500+ GitHub stars within 6 months of launch (indicates real-world utility)
Reliability: Zero data loss in format conversion for well-formed inputs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

morph — Product Requirements Document

Vision

Problem Statement

Target Users

Primary

Secondary

Core Features

1. Multi-Format I/O

2. Zero-Config Simple Conversion

3. The Mapping Language

3.1 Core Operations

3.2 Path Syntax

3.3 Expressions

3.4 Built-in Functions (v1.0)

4. CLI Interface

5. Error Handling

6. Performance

Architecture

Universal Value (UV)

Module Structure

Milestones

v0.1.0 — Foundation (MVP)

v0.2.0 — Mapping Language

v0.3.0 — Advanced Mappings

v0.4.0 — Extended Formats

v0.5.0 — Polish & Performance

v1.0.0 — Stable Release

Non-Goals (v1.0)

Competitive Landscape

Technical Decisions

Why Rust?

Why a custom DSL instead of embedding jq/Lua/etc?

Why not just wrap existing libraries?

Success Criteria

FilesExpand file tree

PRD.md

Latest commit

History

PRD.md

File metadata and controls

morph — Product Requirements Document

Vision

Problem Statement

Target Users

Primary

Secondary

Core Features

1. Multi-Format I/O

2. Zero-Config Simple Conversion

3. The Mapping Language

3.1 Core Operations

3.2 Path Syntax

3.3 Expressions

3.4 Built-in Functions (v1.0)

4. CLI Interface

5. Error Handling

6. Performance

Architecture

Universal Value (UV)

Module Structure

Milestones

v0.1.0 — Foundation (MVP)

v0.2.0 — Mapping Language

v0.3.0 — Advanced Mappings

v0.4.0 — Extended Formats

v0.5.0 — Polish & Performance

v1.0.0 — Stable Release

Non-Goals (v1.0)

Competitive Landscape

Technical Decisions

Why Rust?

Why a custom DSL instead of embedding jq/Lua/etc?

Why not just wrap existing libraries?

Success Criteria