|
1 | | -# Claude Assistant Memory |
| 1 | +# Claude Assistant Memory - JsonRemedy TDD Rewrite |
2 | 2 |
|
3 | | -This file contains information to help Claude understand and work with this project. |
| 3 | +This file tracks the ground-up TDD rewrite of JsonRemedy following the honest, pragmatic approach outlined in the critique and comprehensive test plans. |
4 | 4 |
|
5 | 5 | ## Project Overview |
6 | | -JSON Remedy - An Elixir library for parsing and fixing malformed JSON. |
| 6 | +JsonRemedy - A practical, multi-layered JSON repair library for Elixir that intelligently fixes malformed JSON strings commonly produced by LLMs, legacy systems, and data pipelines. |
| 7 | + |
| 8 | +**Architecture**: 5-layer pipeline approach where each layer uses the most appropriate tool for the job—regex for syntax fixes, state machines for context-aware repairs, and Jason.decode for clean parsing. |
| 9 | + |
| 10 | +## Current TDD Implementation Status |
| 11 | + |
| 12 | +### Phase 1: Foundation & Interfaces ✅ PLANNED |
| 13 | +- **Layer contracts defined** in test/04_API_CONTRACTS.md |
| 14 | +- **Test specifications** detailed in test/05_DETAILED_TEST_SPEC_AND_CASES.md |
| 15 | +- **TDD strategy** documented in test/03_TDD_STRATEGY.md |
| 16 | + |
| 17 | +### Phase 2: Layer 1 - Content Cleaning ✅ COMPLETED |
| 18 | +**Goal**: Remove non-JSON content and normalize encoding using regex (the right tool for this job) |
| 19 | + |
| 20 | +**Test Categories**: |
| 21 | +- ✅ Code fence removal (`test/unit/layer1_content_cleaning_test.exs`) |
| 22 | +- ✅ Comment stripping (// and /* */) |
| 23 | +- ✅ Wrapper text extraction (HTML, prose) |
| 24 | +- ✅ Encoding normalization |
| 25 | + |
| 26 | +**Status**: **TDD COMPLETE** - All 14 tests passing, code quality checks pass |
| 27 | + |
| 28 | +### Phase 3: Layer 2 - Structural Repair 📋 PLANNED |
| 29 | +**Goal**: Fix missing/extra delimiters using state machine for context tracking |
| 30 | + |
| 31 | +### Phase 4: Layer 3 - Syntax Normalization 📋 PLANNED |
| 32 | +**Goal**: Fix quotes, booleans, trailing commas using context-aware regex rules |
| 33 | + |
| 34 | +### Phase 5: Layer 4 - Validation 📋 PLANNED |
| 35 | +**Goal**: Attempt Jason.decode for fast path optimization |
| 36 | + |
| 37 | +### Phase 6: Layer 5 - Tolerant Parsing 📋 PLANNED |
| 38 | +**Goal**: Custom parser for edge cases with aggressive error recovery |
7 | 39 |
|
8 | 40 | ## Key Commands |
9 | | -- `mix test` - Run tests |
| 41 | +- `mix test` - Run all tests |
| 42 | +- `mix test test/unit/layer1_content_cleaning_test.exs` - Run Layer 1 tests |
| 43 | +- `mix test --only unit` - Run unit tests only |
| 44 | +- `mix test --only integration` - Run integration tests only |
| 45 | +- `mix test --only performance` - Run performance tests only |
| 46 | +- `mix test --only property` - Run property-based tests only |
10 | 47 | - `mix deps.get` - Install dependencies |
11 | | -- `iex -S mix` - Start interactive Elixir shell with project loaded |
12 | | -- `mix format` - Format code according to Elixir standards |
13 | | -- `mix credo --strict` - Run code quality checks |
14 | | -- `mix dialyzer` - Run static type analysis |
| 48 | +- `iex -S mix` - Start interactive Elixir shell |
| 49 | +- `mix format` - Format code |
| 50 | +- `mix credo --strict` - Code quality checks |
| 51 | +- `mix dialyzer` - Static type analysis |
| 52 | + |
| 53 | +## TDD Strategy - Red-Green-Refactor |
| 54 | + |
| 55 | +### Daily Workflow |
| 56 | +1. **Red Phase**: Write failing tests for new features (30 min) |
| 57 | +2. **Green Phase**: Implement minimal code to pass (60-90 min) |
| 58 | +3. **Refactor Phase**: Clean up code and improve design (30 min) |
| 59 | +4. **Integration**: Run full test suite and fix issues (30 min) |
| 60 | + |
| 61 | +### Success Metrics |
| 62 | +- **Test coverage**: 95%+ for core repair functions |
| 63 | +- **Layer success rates**: Layer1: 95%, Layer2: 85%, Layer3: 90% |
| 64 | +- **Performance targets**: Valid JSON <10μs, Simple repair <1ms, Complex <5ms |
15 | 65 |
|
16 | 66 | ## Project Structure |
17 | | -- `lib/` - Main library code |
18 | | - - `json_remedy.ex` - Main API module |
19 | | - - `json_remedy/binary_parser.ex` - Binary pattern matching parser |
20 | | - - `json_remedy/combinators.ex` - Parser combinator approach (future) |
21 | | - - `json_remedy/pipeline.ex` - Streaming pipeline approach (future) |
22 | | -- `test/` - Test files |
23 | | -- `mix.exs` - Project configuration and dependencies |
24 | | -- `.credo.exs` - Code quality configuration |
25 | | -- `libPorted/` - Legacy code (unused) |
26 | | - |
27 | | -## Testing |
28 | | -- Main test file: `test/json_remedy_test.exs` |
29 | | -- Test data: `test/support/` contains valid.json and invalid.json files |
30 | | -- Run specific tests with: `mix test test/json_remedy_test.exs` |
31 | | -- All tests pass: 10 doctests + 25 regular tests |
32 | | - |
33 | | -## Code Quality Standards |
34 | | -- All modules follow CODE_QUALITY.md standards |
35 | | -- All public functions have @spec type annotations |
36 | | -- All modules have proper @moduledoc and @doc documentation |
| 67 | +``` |
| 68 | +lib/ |
| 69 | +├── json_remedy.ex # Main API module |
| 70 | +├── json_remedy/ |
| 71 | +│ ├── layer1/ |
| 72 | +│ │ └── content_cleaning.ex # 🟡 CURRENT: Code fences, comments, encoding |
| 73 | +│ ├── layer2/ |
| 74 | +│ │ └── structural_repair.ex # Missing delimiters, state machine |
| 75 | +│ ├── layer3/ |
| 76 | +│ │ └── syntax_normalization.ex # Quotes, booleans, commas (context-aware) |
| 77 | +│ ├── layer4/ |
| 78 | +│ │ └── validation.ex # Jason.decode fast path |
| 79 | +│ ├── layer5/ |
| 80 | +│ │ └── tolerant_parsing.ex # Custom parser for edge cases |
| 81 | +│ ├── pipeline.ex # Orchestrates layers |
| 82 | +│ ├── error.ex # Standardized error handling |
| 83 | +│ └── config.ex # Configuration management |
| 84 | +
|
| 85 | +test/ |
| 86 | +├── unit/ # Layer-specific tests |
| 87 | +│ ├── layer1_content_cleaning_test.exs # 🟡 STARTING HERE |
| 88 | +│ ├── layer2_structural_repair_test.exs |
| 89 | +│ ├── layer3_syntax_normalization_test.exs |
| 90 | +│ ├── layer4_validation_test.exs |
| 91 | +│ └── layer5_tolerant_parsing_test.exs |
| 92 | +├── integration/ # End-to-end tests |
| 93 | +├── performance/ # Benchmarks and memory tests |
| 94 | +├── property/ # Property-based testing |
| 95 | +└── support/ # Test fixtures and utilities |
| 96 | +``` |
| 97 | + |
| 98 | +## Implementation Notes |
| 99 | + |
| 100 | +### Architectural Decisions |
| 101 | +- **NOT using binary pattern matching as primary approach** (per critique analysis) |
| 102 | +- **Regex for Layer 1**: Perfect tool for content cleaning and syntax fixes |
| 103 | +- **State machine for Layer 2**: Context-aware structural repairs |
| 104 | +- **Jason.decode for Layer 4**: Leverage battle-tested parser for fast path |
| 105 | +- **Custom parser for Layer 5**: Handle truly edge cases |
| 106 | + |
| 107 | +### Layer 1 Implementation Plan |
| 108 | +1. **Code fence removal** - regex-based with string content preservation |
| 109 | +2. **Comment stripping** - both `//` and `/* */` with context awareness |
| 110 | +3. **Wrapper extraction** - HTML tags, prose text around JSON |
| 111 | +4. **Encoding normalization** - UTF-8 handling |
| 112 | + |
| 113 | +### Context Awareness Strategy |
| 114 | +Each layer must distinguish between: |
| 115 | +- **JSON structure** (should be repaired) |
| 116 | +- **String content** (should be preserved) |
| 117 | + |
| 118 | +Example: `{message: "Don't change: True, None", active: True}` |
| 119 | +- Preserve `True, None` inside string |
| 120 | +- Repair `True` → `true` outside string |
| 121 | + |
| 122 | +## Quality Standards |
| 123 | +- All public functions have `@spec` type annotations |
| 124 | +- All modules have `@moduledoc` and `@doc` documentation |
37 | 125 | - Code passes Dialyzer static analysis |
38 | 126 | - Code passes Credo quality checks |
39 | | -- Code is formatted with `mix format` |
| 127 | +- Code formatted with `mix format` |
| 128 | +- All repairs logged with detailed context |
| 129 | + |
| 130 | +## Test Data Categories |
| 131 | +1. **Syntax fixes** (95% success rate): unquoted keys, wrong booleans, trailing commas |
| 132 | +2. **Structural repairs** (85% success rate): missing delimiters, nesting issues |
| 133 | +3. **Content cleaning** (98% success rate): code fences, comments, wrappers |
| 134 | +4. **Complex scenarios** (75% success rate): multiple issues combined |
| 135 | +5. **Edge cases** (50% success rate): severely malformed (graceful failure OK) |
| 136 | + |
| 137 | +## Next Steps |
| 138 | +1. ✅ **COMPLETED**: Layer 1 Content Cleaning with TDD (14/14 tests passing) |
| 139 | +2. 🟡 **NEXT**: Begin Layer 2 Structural Repair with state machine approach |
| 140 | +3. Create test fixtures for comprehensive scenarios |
| 141 | +4. Build context-aware syntax normalization for Layer 3 |
| 142 | +5. Add performance benchmarking |
| 143 | +6. Create comprehensive integration tests |
40 | 144 |
|
41 | | -## Notes |
42 | | -- This is an Elixir project using Mix build tool |
43 | | -- Focus on JSON parsing and error recovery functionality |
44 | | -- Binary pattern matching approach provides superior performance |
45 | | -- Type specifications ensure reliability and enable static analysis |
| 145 | +## Important Reminders |
| 146 | +- **Test-first approach**: Write failing tests before implementation |
| 147 | +- **Context preservation**: Never break valid JSON content inside strings |
| 148 | +- **Performance targets**: Maintain fast path for valid JSON |
| 149 | +- **Pragmatic over pure**: Use the right tool for each layer |
| 150 | +- **Comprehensive logging**: Track all repair actions for debugging |
0 commit comments