Commit d070b18

Initial release: AI-powered jq filter synthesis
1 parent a2bbc05 commit d070b18

28 files changed: 3804 additions, 252 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -10,3 +10,4 @@ dist/
 build/
 .pytest_cache/
 .claude/
+*.zip

Archive.zip

-303 KB
Binary file not shown.

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 nulone
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

Lines changed: 115 additions & 103 deletions
@@ -8,6 +8,8 @@ JQ-Synth automatically generates [jq](https://stedolan.github.io/jq/) filter exp
 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
+![Demo](demo.gif)
+
 ## Overview
 
 JQ-Synth solves a common developer problem: you know what JSON transformation you want, but writing the correct jq filter is tricky. Simply provide example input/output pairs, and JQ-Synth will synthesize the filter for you.
@@ -48,64 +50,6 @@ source .venv/bin/activate # On Windows: .venv\Scripts\activate
 pip install -e .
 ```
 
-## Supported Providers
-
-| Provider | Status | Note |
-|----------|--------|------|
-| OpenAI | Stable ✅ | Default, tested |
-| Anthropic | Beta ⚠️ | May have edge cases |
-| OpenRouter | Beta ⚠️ | Via OpenAI-compatible endpoint |
-| Ollama | Alpha 🧪 | Local only, requires setup |
-
-> Note: OpenAI is default and most tested. Others should work but report issues if found.
-
-### Provider Setup
-
-**OpenAI (Default)**
-
-```bash
-export OPENAI_API_KEY='sk-...'
-# Optional: specify model (default: gpt-4o)
-export LLM_MODEL='gpt-4o'
-```
-
-**Anthropic**
-
-```bash
-export LLM_PROVIDER='anthropic'
-export ANTHROPIC_API_KEY='sk-ant-...'
-# Optional: specify model (default: claude-sonnet-4-20250514)
-export LLM_MODEL='claude-sonnet-4-20250514'
-```
-
-**OpenRouter**
-
-```bash
-export LLM_BASE_URL='https://openrouter.ai/api/v1'
-export OPENAI_API_KEY='sk-or-...'
-export LLM_MODEL='anthropic/claude-3.5-sonnet'
-```
-
-**Local (Ollama)**
-
-```bash
-export LLM_BASE_URL='http://localhost:11434/v1'
-export LLM_MODEL='llama3'
-export OPENAI_API_KEY='dummy' # Ollama doesn't require a real key
-```
-
-**Together AI / Groq**
-
-```bash
-# Together AI
-export LLM_BASE_URL='https://api.together.xyz/v1'
-export OPENAI_API_KEY='...'
-
-# Groq
-export LLM_BASE_URL='https://api.groq.com/openai/v1'
-export OPENAI_API_KEY='gsk_...'
-```
-
 ## Quick Start
 
 ### Interactive Mode
@@ -245,6 +189,19 @@ jq-synth --base-url https://openrouter.ai/api/v1 --model anthropic/claude-3.5-so
 jq-synth --base-url http://localhost:11434/v1 --model llama3 --task nested-field
 ```
 
+## How It Works
+
+JQ-Synth uses a **deterministic oracle** approach:
+
+1. **Generation**: An LLM (GPT-4, Claude, or compatible model) generates candidate jq filters based on your examples and description
+2. **Verification**: Each filter is executed against the real jq binary with your input examples
+3. **Scoring**: A deterministic algorithm compares actual vs expected outputs, computing similarity scores (0.0 to 1.0)
+4. **Feedback**: The algorithm classifies errors (syntax, shape, missing/extra elements, order) and generates actionable feedback
+5. **Refinement**: The LLM receives the feedback and generates an improved filter
+6. **Iteration**: Steps 2-5 repeat until a perfect match is found or limits are reached
+
+This hybrid approach combines LLM creativity with deterministic verification, ensuring correctness while leveraging AI for filter synthesis.
+
 ## Architecture
 
 JQ-Synth follows a modular architecture with clear separation of concerns:
@@ -349,6 +306,104 @@ The reviewer classifies errors by priority (highest to lowest):
 - **Scalars**: Binary (1.0 for exact match, 0.0 for mismatch)
 - **Multiple examples**: Arithmetic mean of scores
 
+## Supported jq Patterns
+
+JQ-Synth works well with these common jq operations:
+
+- **Field extraction**: `.foo`, `.user.name`, `.data.items[0]`
+- **Array operations**: `.[]`, `.[0]`, `.[1:3]`, `.[-1]`
+- **Filtering**: `select(.active == true)`, `select(.age > 18)`
+- **Mapping**: `map(.name)`, `[.[] | .id]`
+- **Array construction**: `[.items[].name]`
+- **Object construction**: `{name: .user.name, email: .user.email}`
+- **Conditionals**: `if .status == "active" then .name else null end`
+- **Null handling**: `select(. != null)`, `.field // "default"`
+- **String operations**: String interpolation, concatenation
+- **Arithmetic**: Addition, subtraction, comparison operators
+- **Type checking**: `type`, `length`
+
+## Known Limitations
+
+JQ-Synth may struggle with these advanced jq features:
+
+- **Aggregations**: `group_by()`, `reduce`, `min_by()`, `max_by()`
+- **Complex recursion**: `recurse()`, `walk()`
+- **Variable bindings**: Complex `as $var` patterns
+- **Custom functions**: `def` statements (blocked for security)
+- **Advanced array operations**: `combinations()`, `transpose()`
+- **Path manipulation**: `getpath()`, `setpath()`, `delpaths()`
+- **Format strings**: `@csv`, `@json`, `@base64`
+
+For these cases, you may need to write the filter manually or break down the task into simpler steps.
+
+## Model recommendations
+
+| Task complexity | Recommended model | Speed |
+|-----------------|-------------------|-------|
+| Simple filters (extract, select) | GPT-4o-mini, Claude Haiku | Fast |
+| Medium (grouping, aggregation, recursion) | Claude Sonnet, GPT-4o | Fast |
+| Complex algorithms (graph traversal, sorting) | DeepSeek R1 | Slow (minutes) |
+
+> Note: DeepSeek R1 solved topological sort and Dijkstra's shortest path in jq. Most users won't need this — standard models handle 95%+ of real-world tasks.
+
+## Supported Providers
+
+| Provider | Status | Note |
+|----------|--------|------|
+| OpenAI | Stable ✅ | Default provider |
+| Anthropic | Beta ⚠️ | Different API format |
+| OpenRouter | Tested ✅ | OpenAI-compatible |
+| Ollama | Alpha 🧪 | Local only, requires setup |
+
+> Note: OpenAI is default and most tested. Others should work but report issues if found.
+
+### Provider Setup
+
+**OpenAI (Default)**
+
+```bash
+export OPENAI_API_KEY='sk-...'
+# Optional: specify model (default: gpt-4o)
+export LLM_MODEL='gpt-4o'
+```
+
+**Anthropic**
+
+```bash
+export LLM_PROVIDER='anthropic'
+export ANTHROPIC_API_KEY='sk-ant-...'
+# Optional: specify model (default: claude-sonnet-4-20250514)
+export LLM_MODEL='claude-sonnet-4-20250514'
+```
+
+**OpenRouter**
+
+```bash
+export LLM_BASE_URL='https://openrouter.ai/api/v1'
+export OPENAI_API_KEY='sk-or-...'
+export LLM_MODEL='anthropic/claude-3.5-sonnet'
+```
+
+**Local (Ollama)**
+
+```bash
+export LLM_BASE_URL='http://localhost:11434/v1'
+export LLM_MODEL='llama3'
+export OPENAI_API_KEY='dummy' # Ollama doesn't require a real key
+```
+
+**Together AI / Groq**
+
+```bash
+# Together AI
+export LLM_BASE_URL='https://api.together.xyz/v1'
+export OPENAI_API_KEY='...'
+
+# Groq
+export LLM_BASE_URL='https://api.groq.com/openai/v1'
+export OPENAI_API_KEY='gsk_...'
+```
+
 ## Task File Format
 
 Tasks are defined in JSON format:
@@ -604,7 +659,7 @@ mypy src && \
 pytest -m "not e2e"
 ```
 
-### Project Structure
+## Project Structure
 
 ```
 jq-synth/
@@ -650,7 +705,7 @@ Contributions are welcome! Please follow these steps:
 6. **Push** to your fork: `git push origin feature/my-feature`
 7. **Open** a Pull Request
 
-### Code Style
+## Code Style
 
 - Type hints required for all public functions
 - Docstrings required for all public functions and classes (Google style)
@@ -669,49 +724,6 @@ MIT License - see [LICENSE](LICENSE) for details.
 - [OpenAI](https://openai.com) - GPT models and API
 - [Anthropic](https://anthropic.com) - Claude models and API
 
-## Supported jq Patterns
-
-JQ-Synth works well with these common jq operations:
-
-- **Field extraction**: `.foo`, `.user.name`, `.data.items[0]`
-- **Array operations**: `.[]`, `.[0]`, `.[1:3]`, `.[-1]`
-- **Filtering**: `select(.active == true)`, `select(.age > 18)`
-- **Mapping**: `map(.name)`, `[.[] | .id]`
-- **Array construction**: `[.items[].name]`
-- **Object construction**: `{name: .user.name, email: .user.email}`
-- **Conditionals**: `if .status == "active" then .name else null end`
-- **Null handling**: `select(. != null)`, `.field // "default"`
-- **String operations**: String interpolation, concatenation
-- **Arithmetic**: Addition, subtraction, comparison operators
-- **Type checking**: `type`, `length`
-
-## Known Limitations
-
-JQ-Synth may struggle with these advanced jq features:
-
-- **Aggregations**: `group_by()`, `reduce`, `min_by()`, `max_by()`
-- **Complex recursion**: `recurse()`, `walk()`
-- **Variable bindings**: Complex `as $var` patterns
-- **Custom functions**: `def` statements (blocked for security)
-- **Advanced array operations**: `combinations()`, `transpose()`
-- **Path manipulation**: `getpath()`, `setpath()`, `delpaths()`
-- **Format strings**: `@csv`, `@json`, `@base64`
-
-For these cases, you may need to write the filter manually or break down the task into simpler steps.
-
-## How It Works
-
-JQ-Synth uses a **deterministic oracle** approach:
-
-1. **Generation**: An LLM (GPT-4, Claude, or compatible model) generates candidate jq filters based on your examples and description
-2. **Verification**: Each filter is executed against the real jq binary with your input examples
-3. **Scoring**: A deterministic algorithm compares actual vs expected outputs, computing similarity scores (0.0 to 1.0)
-4. **Feedback**: The algorithm classifies errors (syntax, shape, missing/extra elements, order) and generates actionable feedback
-5. **Refinement**: The LLM receives the feedback and generates an improved filter
-6. **Iteration**: Steps 2-5 repeat until a perfect match is found or limits are reached
-
-This hybrid approach combines LLM creativity with deterministic verification, ensuring correctness while leveraging AI for filter synthesis.
-
 ---
 
 **JQ-Synth** - Because life's too short to debug jq filters manually.

demo.gif

196 KB