Skip to content

Commit 9114090

Browse files
NSHkrNSHkr
authored andcommitted
ph1 tests
1 parent 4deb1bf commit 9114090

File tree

4 files changed

+756
-34
lines changed

4 files changed

+756
-34
lines changed

CLAUDE.md

Lines changed: 139 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -1,45 +1,150 @@
1-
# Claude Assistant Memory
1+
# Claude Assistant Memory - JsonRemedy TDD Rewrite
22

3-
This file contains information to help Claude understand and work with this project.
3+
This file tracks the ground-up TDD rewrite of JsonRemedy following the honest, pragmatic approach outlined in the critique and comprehensive test plans.
44

55
## Project Overview
6-
JSON Remedy - An Elixir library for parsing and fixing malformed JSON.
6+
JsonRemedy - A practical, multi-layered JSON repair library for Elixir that intelligently fixes malformed JSON strings commonly produced by LLMs, legacy systems, and data pipelines.
7+
8+
**Architecture**: 5-layer pipeline approach where each layer uses the most appropriate tool for the job—regex for syntax fixes, state machines for context-aware repairs, and Jason.decode for clean parsing.
9+
10+
## Current TDD Implementation Status
11+
12+
### Phase 1: Foundation & Interfaces ✅ PLANNED
13+
- **Layer contracts defined** in test/04_API_CONTRACTS.md
14+
- **Test specifications** detailed in test/05_DETAILED_TEST_SPEC_AND_CASES.md
15+
- **TDD strategy** documented in test/03_TDD_STRATEGY.md
16+
17+
### Phase 2: Layer 1 - Content Cleaning ✅ COMPLETED
18+
**Goal**: Remove non-JSON content and normalize encoding using regex (the right tool for this job)
19+
20+
**Test Categories**:
21+
- ✅ Code fence removal (`test/unit/layer1_content_cleaning_test.exs`)
22+
- ✅ Comment stripping (// and /* */)
23+
- ✅ Wrapper text extraction (HTML, prose)
24+
- ✅ Encoding normalization
25+
26+
**Status**: **TDD COMPLETE** - All 14 tests passing, code quality checks pass
27+
28+
### Phase 3: Layer 2 - Structural Repair 📋 PLANNED
29+
**Goal**: Fix missing/extra delimiters using state machine for context tracking
30+
31+
### Phase 4: Layer 3 - Syntax Normalization 📋 PLANNED
32+
**Goal**: Fix quotes, booleans, trailing commas using context-aware regex rules
33+
34+
### Phase 5: Layer 4 - Validation 📋 PLANNED
35+
**Goal**: Attempt Jason.decode for fast path optimization
36+
37+
### Phase 6: Layer 5 - Tolerant Parsing 📋 PLANNED
38+
**Goal**: Custom parser for edge cases with aggressive error recovery
739

840
## Key Commands
9-
- `mix test` - Run tests
41+
- `mix test` - Run all tests
42+
- `mix test test/unit/layer1_content_cleaning_test.exs` - Run Layer 1 tests
43+
- `mix test --only unit` - Run unit tests only
44+
- `mix test --only integration` - Run integration tests only
45+
- `mix test --only performance` - Run performance tests only
46+
- `mix test --only property` - Run property-based tests only
1047
- `mix deps.get` - Install dependencies
11-
- `iex -S mix` - Start interactive Elixir shell with project loaded
12-
- `mix format` - Format code according to Elixir standards
13-
- `mix credo --strict` - Run code quality checks
14-
- `mix dialyzer` - Run static type analysis
48+
- `iex -S mix` - Start interactive Elixir shell
49+
- `mix format` - Format code
50+
- `mix credo --strict` - Code quality checks
51+
- `mix dialyzer` - Static type analysis
52+
53+
## TDD Strategy - Red-Green-Refactor
54+
55+
### Daily Workflow
56+
1. **Red Phase**: Write failing tests for new features (30 min)
57+
2. **Green Phase**: Implement minimal code to pass (60-90 min)
58+
3. **Refactor Phase**: Clean up code and improve design (30 min)
59+
4. **Integration**: Run full test suite and fix issues (30 min)
60+
61+
### Success Metrics
62+
- **Test coverage**: 95%+ for core repair functions
63+
- **Layer success rates**: Layer1: 95%, Layer2: 85%, Layer3: 90%
64+
- **Performance targets**: Valid JSON <10μs, Simple repair <1ms, Complex <5ms
1565

1666
## Project Structure
17-
- `lib/` - Main library code
18-
- `json_remedy.ex` - Main API module
19-
- `json_remedy/binary_parser.ex` - Binary pattern matching parser
20-
- `json_remedy/combinators.ex` - Parser combinator approach (future)
21-
- `json_remedy/pipeline.ex` - Streaming pipeline approach (future)
22-
- `test/` - Test files
23-
- `mix.exs` - Project configuration and dependencies
24-
- `.credo.exs` - Code quality configuration
25-
- `libPorted/` - Legacy code (unused)
26-
27-
## Testing
28-
- Main test file: `test/json_remedy_test.exs`
29-
- Test data: `test/support/` contains valid.json and invalid.json files
30-
- Run specific tests with: `mix test test/json_remedy_test.exs`
31-
- All tests pass: 10 doctests + 25 regular tests
32-
33-
## Code Quality Standards
34-
- All modules follow CODE_QUALITY.md standards
35-
- All public functions have @spec type annotations
36-
- All modules have proper @moduledoc and @doc documentation
67+
```
68+
lib/
69+
├── json_remedy.ex # Main API module
70+
├── json_remedy/
71+
│ ├── layer1/
72+
│ │ └── content_cleaning.ex # 🟡 CURRENT: Code fences, comments, encoding
73+
│ ├── layer2/
74+
│ │ └── structural_repair.ex # Missing delimiters, state machine
75+
│ ├── layer3/
76+
│ │ └── syntax_normalization.ex # Quotes, booleans, commas (context-aware)
77+
│ ├── layer4/
78+
│ │ └── validation.ex # Jason.decode fast path
79+
│ ├── layer5/
80+
│ │ └── tolerant_parsing.ex # Custom parser for edge cases
81+
│ ├── pipeline.ex # Orchestrates layers
82+
│ ├── error.ex # Standardized error handling
83+
│ └── config.ex # Configuration management
84+
85+
test/
86+
├── unit/ # Layer-specific tests
87+
│ ├── layer1_content_cleaning_test.exs # 🟡 STARTING HERE
88+
│ ├── layer2_structural_repair_test.exs
89+
│ ├── layer3_syntax_normalization_test.exs
90+
│ ├── layer4_validation_test.exs
91+
│ └── layer5_tolerant_parsing_test.exs
92+
├── integration/ # End-to-end tests
93+
├── performance/ # Benchmarks and memory tests
94+
├── property/ # Property-based testing
95+
└── support/ # Test fixtures and utilities
96+
```
97+
98+
## Implementation Notes
99+
100+
### Architectural Decisions
101+
- **NOT using binary pattern matching as primary approach** (per critique analysis)
102+
- **Regex for Layer 1**: Perfect tool for content cleaning and syntax fixes
103+
- **State machine for Layer 2**: Context-aware structural repairs
104+
- **Jason.decode for Layer 4**: Leverage battle-tested parser for fast path
105+
- **Custom parser for Layer 5**: Handle truly edge cases
106+
107+
### Layer 1 Implementation Plan
108+
1. **Code fence removal** - regex-based with string content preservation
109+
2. **Comment stripping** - both `//` and `/* */` with context awareness
110+
3. **Wrapper extraction** - HTML tags, prose text around JSON
111+
4. **Encoding normalization** - UTF-8 handling
112+
113+
### Context Awareness Strategy
114+
Each layer must distinguish between:
115+
- **JSON structure** (should be repaired)
116+
- **String content** (should be preserved)
117+
118+
Example: `{message: "Don't change: True, None", active: True}`
119+
- Preserve `True, None` inside string
120+
- Repair `True``true` outside string
121+
122+
## Quality Standards
123+
- All public functions have `@spec` type annotations
124+
- All modules have `@moduledoc` and `@doc` documentation
37125
- Code passes Dialyzer static analysis
38126
- Code passes Credo quality checks
39-
- Code is formatted with `mix format`
127+
- Code formatted with `mix format`
128+
- All repairs logged with detailed context
129+
130+
## Test Data Categories
131+
1. **Syntax fixes** (95% success rate): unquoted keys, wrong booleans, trailing commas
132+
2. **Structural repairs** (85% success rate): missing delimiters, nesting issues
133+
3. **Content cleaning** (98% success rate): code fences, comments, wrappers
134+
4. **Complex scenarios** (75% success rate): multiple issues combined
135+
5. **Edge cases** (50% success rate): severely malformed (graceful failure OK)
136+
137+
## Next Steps
138+
1.**COMPLETED**: Layer 1 Content Cleaning with TDD (14/14 tests passing)
139+
2. 🟡 **NEXT**: Begin Layer 2 Structural Repair with state machine approach
140+
3. Create test fixtures for comprehensive scenarios
141+
4. Build context-aware syntax normalization for Layer 3
142+
5. Add performance benchmarking
143+
6. Create comprehensive integration tests
40144

41-
## Notes
42-
- This is an Elixir project using Mix build tool
43-
- Focus on JSON parsing and error recovery functionality
44-
- Binary pattern matching approach provides superior performance
45-
- Type specifications ensure reliability and enable static analysis
145+
## Important Reminders
146+
- **Test-first approach**: Write failing tests before implementation
147+
- **Context preservation**: Never break valid JSON content inside strings
148+
- **Performance targets**: Maintain fast path for valid JSON
149+
- **Pragmatic over pure**: Use the right tool for each layer
150+
- **Comprehensive logging**: Track all repair actions for debugging

0 commit comments

Comments
 (0)