morph is a universal CLI data format converter with a built-in mapping language. It converts structured data between formats while giving users precise control over how fields are renamed, reshaped, filtered, and transformed — through a small, readable, purpose-built language.
The goal: one tool to replace the dozen format-specific converters people cobble together with scripts.
Developers, ops engineers, and data workers constantly convert between data formats:
- Config migration (YAML → TOML, JSON → YAML)
- Data pipeline glue (CSV → JSON, JSON Lines → CSV)
- API response reshaping (deeply nested JSON → flat CSV)
- Legacy format interop (XML → JSON, INI-like → TOML)
Current solutions fall into three buckets:
- One-off scripts — Python/Ruby/Node snippets. Work, but fragile, slow to write, hard to share.
- jq / yq / xq family — Powerful but format-specific, steep learning curves, inconsistent syntax across tools.
- Online converters — Can't handle sensitive data, no automation, no customization.
None of them offer a single tool that reads multiple formats, writes multiple formats, and provides a user-friendly transformation language in between.
- Backend / DevOps engineers — config file migration, CI/CD pipelines, infrastructure-as-code
- Data engineers — quick format conversions in ETL scripts, data cleaning
- CLI power users — people who live in the terminal and chain tools with pipes
- Technical writers — converting example data between formats for docs
- Security researchers — parsing/reshaping structured data from various tools
- Students / learners — understanding data format differences
Read and write all supported formats through a single unified interface.
Supported formats (v1.0):
| Format | Read | Write | Priority | Notes |
|---|---|---|---|---|
| JSON | ✅ | ✅ | P0 | Streaming, pretty/compact output |
| YAML | ✅ | ✅ | P0 | Multi-document, anchors/aliases |
| TOML | ✅ | ✅ | P0 | Full spec compliance |
| CSV / TSV | ✅ | ✅ | P0 | Headers, custom delimiters, quoting |
| XML | ✅ | ✅ | P1 | Configurable attribute handling |
| JSON Lines | ✅ | ✅ | P0 | Newline-delimited JSON |
| MessagePack | ✅ | ✅ | P1 | Binary format |
| S-expressions | ✅ | ✅ | P2 | Lisp-style |
| Query String | ✅ | ✅ | P2 | URL-encoded |
| EDN | ✅ | ✅ | P2 | Clojure data notation |
Format detection:
- By file extension (
.json,.yaml,.yml,.toml,.csv,.xml, etc.) - By explicit flag (
-f json,-t yaml) - Stdin requires explicit format flag
The simplest use case should require zero learning:
morph -i input.json -o output.yaml
# or
cat data.csv | morph -f csv -t json
# or
morph -i data.json -t yaml # outputs to stdoutNo mapping file, no config, no flags beyond format specification.
A purpose-built DSL for describing data transformations. Design goals:
- Readable at first sight — someone unfamiliar should understand what a
.morphfile does - English-like keywords —
rename,select,drop,where,set,cast - Minimal syntax — no brackets/parens unless necessary, arrow (
->) for directionality - Composable — operations apply in order, top to bottom
- No Turing-completeness — this is a data mapping language, not a programming language
rename <path> -> <path> # rename a field
select <path>, <path>, ... # keep only these fields
drop <path>, <path>, ... # remove these fields
flatten <path> -> <paths...> # unnest object into flat fields
nest <paths...> -> <path> # group flat fields into object
set <path> = <expression> # create/overwrite field
default <path> = <expression> # set only if field is null/missing
cast <path> as <type> # type coercion (int, float, bool, string)
where <condition> # filter rows/elements
sort <path> [asc|desc] # sort elements
each <path> { ... } # iterate over array elements
when <condition> { ... } # conditional block
.field # top-level field
.parent.child # nested field
.array[0] # array index
.array[*] # all array elements
.array[-1] # last element
.["key with spaces"] # quoted key
# Literals
"hello", 42, 3.14, true, false, null
# Field references
.field_name
# Function calls
join(.first, " ", .last)
lower(.name)
len(.items)
# Arithmetic
.price * .quantity
.total - .discount
# String interpolation
"{.first} {.last}"
# Comparison
.age > 18
.status == "active"
.name != null
# Logical
.age > 18 and .status == "active"
.role == "admin" or .role == "superadmin"
not .deleted
String:
join, split, lower, upper, trim, replace, starts_with, ends_with, contains, regex_match, regex_replace, substring, pad_left, pad_right
Math:
round, ceil, floor, abs, min, max, sum
Collection:
len, keys, values, flatten, unique, reverse, first, last, count, group_by
Type:
type_of, is_null, is_array, is_object, is_string, is_number
Date/Time:
now, parse_date, format_date
System:
env (read environment variable)
Utility:
coalesce (first non-null), if (ternary)
morph [OPTIONS]
INPUT/OUTPUT:
-i, --input <FILE> Input file (or stdin if omitted)
-o, --output <FILE> Output file (or stdout if omitted)
-f, --from <FORMAT> Input format (auto-detected from extension)
-t, --to <FORMAT> Output format (auto-detected from extension)
MAPPING:
-m, --map <FILE> Mapping file (.morph)
-e, --expr <EXPRESSION> Inline mapping expression
--dry-run Parse and validate mapping without executing
OUTPUT CONTROL:
--pretty Pretty-print output (default for terminal)
--compact Compact output (default for pipe)
--indent <N> Indentation width (default: 2)
--no-color Disable colored output
-q, --quiet Suppress warnings
FORMAT-SPECIFIC:
--csv-delimiter <C> CSV delimiter (default: ',')
--csv-no-header CSV has no header row
--csv-header <FIELDS> Explicit CSV headers
--xml-root <NAME> XML root element name
--xml-attr-prefix <S> XML attribute prefix (default: '@')
--yaml-multi Treat input as multi-document YAML
META:
-V, --version Print version
-h, --help Print help
--formats List all supported formats
--functions List all built-in functions
--completions <SHELL> Generate shell completions
- Clear error messages with line numbers and context for mapping parse errors
- Schema validation mode — optionally validate output against a JSON Schema
- Graceful degradation — skip malformed records with
--skip-errors, log to stderr - Type mismatch warnings — warn when a cast might lose data
- Streaming for JSON, JSON Lines, CSV — don't buffer entire file in memory
- Zero-copy parsing where the format allows it
- Parallel processing for large files (optional,
--parallel) - Target: >100MB/s throughput for simple JSON→YAML conversion on modern hardware
┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐
│ Reader │────▶│ Universal │────▶│ Transform │────▶│ Writer │
│ (format) │ │ Value (UV) │ │ Engine │ │ (format) │
└─────────────┘ └──────────────┘ └──────────────┘ └─────────────┘
▲
│
┌─────┴─────┐
│ Mapping │
│ Parser │
└───────────┘
The internal representation all formats are normalized to:
enum Value {
Null,
Bool(bool),
Int(i64),
Float(f64),
String(String),
Bytes(Vec<u8>),
Array(Vec<Value>),
Map(IndexMap<String, Value>), // preserves insertion order
}Key design decisions:
- Ordered maps — preserves key order from input (important for config files)
- Separate Int/Float — avoids precision loss
- Bytes variant — supports binary formats like MessagePack
- No format-specific metadata in UV — XML attributes are mapped to a convention (e.g.,
@attr)
morph/
├── src/
│ ├── main.rs # CLI entry point
│ ├── lib.rs # Library root
│ ├── value.rs # Universal Value type
│ ├── error.rs # Error types
│ ├── formats/
│ │ ├── mod.rs # Format registry, auto-detection
│ │ ├── json.rs # JSON reader/writer
│ │ ├── yaml.rs # YAML reader/writer
│ │ ├── toml.rs # TOML reader/writer
│ │ ├── csv.rs # CSV/TSV reader/writer
│ │ ├── xml.rs # XML reader/writer
│ │ ├── jsonl.rs # JSON Lines reader/writer
│ │ ├── msgpack.rs # MessagePack reader/writer
│ │ ├── sexpr.rs # S-expressions reader/writer
│ │ ├── querystring.rs # Query string reader/writer
│ │ └── edn.rs # EDN reader/writer
│ ├── mapping/
│ │ ├── mod.rs # Mapping module root
│ │ ├── lexer.rs # Tokenizer
│ │ ├── parser.rs # AST construction
│ │ ├── ast.rs # AST types
│ │ ├── eval.rs # Mapping evaluator
│ │ └── functions.rs # Built-in function implementations
│ └── cli.rs # Argument parsing (clap)
├── tests/
│ ├── formats/ # Per-format round-trip tests
│ ├── mapping/ # Mapping language tests
│ └── integration/ # End-to-end CLI tests
├── Cargo.toml
├── README.md
├── PRD.md
├── LICENSE
└── .github/
└── workflows/
└── ci.yml # Test + lint + build
- Universal Value type
- JSON reader/writer
- YAML reader/writer
- TOML reader/writer
- CSV reader/writer
- CLI with
-i,-o,-f,-tflags - Format auto-detection by extension
- Pretty/compact output modes
- Basic error handling
- Lexer and parser
- Core operations:
rename,select,drop,set,default,cast - Path syntax (dot notation, array indexing)
-
-mand-eflags - String functions (
join,split,lower,upper,trim,replace) - Mapping validation (
--dry-run)
-
flatten,nestoperations -
wherefiltering -
sortoperation -
eachandwhenblocks - Arithmetic expressions
- String interpolation
- Collection functions (
keys,values,unique,group_by)
- XML reader/writer
- JSON Lines reader/writer
- MessagePack reader/writer
- Format-specific options (CSV delimiters, XML attributes)
- Multi-document YAML
- Streaming mode for large files
- Shell completions
- Comprehensive error messages with suggestions
-
--formatsand--functionshelp commands - Performance benchmarks
- S-expressions, Query String, EDN formats
- Schema validation
-
--skip-errorsmode - Parallel processing
- Comprehensive documentation
- Homebrew / cargo-binstall distribution
- GUI — this is a CLI tool
- Database connectivity — not a database client
- Network protocols — no HTTP fetching built in (use curl | morph)
- Turing-complete scripting — the mapping language is intentionally limited
- Format-specific validation — morph converts, it doesn't validate schemas (except optionally)
- Bidirectional sync — one-way conversion only
| Tool | Multi-format | Transform Language | Learning Curve | Streaming |
|---|---|---|---|---|
| jq | JSON only | jq DSL | Steep | ✅ |
| yq | YAML/JSON | jq-like | Steep | ✅ |
| xq | XML→JSON | jq-like | Steep | ❌ |
| csvkit | CSV only | SQL-like | Medium | Partial |
| miller | CSV/JSON/etc | Miller DSL | Medium | ✅ |
| dasel | JSON/YAML/TOML/XML | Path queries | Low | ❌ |
| morph | 10+ formats | morph DSL | Low | ✅ |
Key differentiator: morph combines the broadest format support with a readable, learnable transformation language. No need to learn jq's syntax — morph reads like English.
- Single binary, no runtime dependencies
- Excellent performance for data processing
- Strong ecosystem for parsing (serde, nom)
- Cross-platform (Linux, macOS, Windows)
- Memory safety without GC (important for streaming)
- Readability — the #1 design goal. jq is powerful but cryptic for newcomers
- Safety — no arbitrary code execution, just data transformation
- Learnability — someone should understand a
.morphfile without reading docs - Optimization — a purpose-built evaluator can optimize for common patterns
- We do! Format readers/writers use battle-tested crates (
serde_json,serde_yaml,toml,quick-xml,csv,rmp-serde) - The value is in the unified interface and mapping language, not in reimplementing parsers
- Usability: A developer can convert between any two supported formats in under 10 seconds
- Learnability: The mapping language should be understood without reading docs for simple cases
- Performance: Competitive with or faster than format-specific tools for simple conversions
- Adoption: 500+ GitHub stars within 6 months of launch (indicates real-world utility)
- Reliability: Zero data loss in format conversion for well-formed inputs