Commit 9d51861

nshkrdotcom and claude authored and committed
Release v0.1.4: Add hardcoded patterns from json_repair Python library
This release integrates battle-tested cleanup patterns ported from the Python json_repair library into JsonRemedy's Layer 3 processing pipeline.

## New Features

### Hardcoded Patterns Module (lib/json_remedy/layer3/hardcoded_patterns.ex)
- Smart quotes normalization: Converts “curly”, «guillemets», ‹angles› to standard quotes
- Doubled quotes repair: Fixes ""value"" → "value" while preserving empty strings
- Number format normalization: Removes thousands separators (1,234,567 → 1234567)
- Unicode/hex escape sequences: Converts \uXXXX and \xXX (opt-in via config)

### Feature Flags
- :enable_hardcoded_patterns (default: true) - Master switch for all patterns
- :enable_escape_normalization (default: false) - Opt-in for escape sequences

### Examples & Documentation
- New examples/hardcoded_patterns_examples.exs with 8 comprehensive examples
- README.md updated with dedicated hardcoded patterns section
- Full attribution to source library: https://github.com/mangiucugna/json_repair

## Testing & Quality
- 47 new tests (100% pass rate)
- Total test suite: 499 tests, 0 failures
- Dialyzer: 0 type warnings
- Credo: 0 issues in new code
- Full UTF-8 international character support

## Technical Details
- Context-aware processing preserves commas in string values
- Smart quotes use Unicode byte sequences to avoid syntax conflicts
- Regex-based optimizations with minimal overhead
- Clean integration as Layer 3 pre-processing step

## Files Changed

Modified:
- .gitignore (added json_repair_python/)
- CHANGELOG.md (comprehensive v0.1.4 entry)
- README.md (version + hardcoded patterns section)
- lib/json_remedy/layer3/syntax_normalization.ex (integration)
- mix.exs (version 0.1.4)

New:
- lib/json_remedy/layer3/hardcoded_patterns.ex (250 lines)
- test/unit/layer3_hardcoded_patterns_test.exs (310 lines, 47 tests)
- examples/hardcoded_patterns_examples.exs (318 lines, 8 examples)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
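To make the first two patterns concrete, here is a minimal sketch of how smart-quote normalization and doubled-quote repair could be written. It is not the code in `hardcoded_patterns.ex`: the module name `QuoteRepairSketch`, both function names, and the exact regex are assumptions made for illustration. Mapping every typographic quote to an ASCII double quote is likewise an assumption based on the `"standard"` target named in this commit.

```elixir
defmodule QuoteRepairSketch do
  # Hypothetical illustration; not the code in hardcoded_patterns.ex.

  # Typographic quote code points, written as \u escapes so this source
  # file contains no literal smart quotes.
  @smart_quotes [
    "\u201C", # left double quotation mark
    "\u201D", # right double quotation mark
    "\u00AB", # left guillemet
    "\u00BB", # right guillemet
    "\u2039", # single left angle quote
    "\u203A"  # single right angle quote
  ]

  # Replace every typographic quote with a plain ASCII double quote.
  def normalize_smart_quotes(input) when is_binary(input) do
    String.replace(input, @smart_quotes, "\"")
  end

  # Collapse ""value"" into "value". The first character after the opening
  # pair must not be a quote, whitespace, or JSON punctuation, so a genuine
  # empty string such as "" followed by a comma or brace is left alone.
  def repair_doubled_quotes(input) when is_binary(input) do
    Regex.replace(~r/""([^"\s,:\]}][^"]*)""/, input, "\"\\1\"")
  end
end

QuoteRepairSketch.normalize_smart_quotes("{\u201Cname\u201D: \u00ABAlice\u00BB}")
# => "{\"name\": \"Alice\"}"
```

This sketch rewrites quotes everywhere, including inside string values, and it will not repair a doubled-quote value that begins with whitespace. A production version would need to decide how to treat both cases.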
1 parent f4fef83 commit 9d51861

8 files changed, +1048 -7 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -10,3 +10,4 @@ erl_crash.dump
 .elixir_ls/
 *.tar
 repomix*.xml
+json_repair_python/

CHANGELOG.md

Lines changed: 42 additions & 1 deletion
@@ -7,6 +7,44 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.1.4] - 2025-10-07
+
+### Added
+- **Hardcoded Patterns Module**: New `JsonRemedy.Layer3.HardcodedPatterns` module with battle-tested cleanup patterns ported from Python's `json_repair` library
+  - **Smart quotes normalization**: Converts curly quotes (`“”`), guillemets (`«»`), and angle quotes (`‹›`) to standard JSON quotes
+  - **Doubled quotes repair**: Fixes `""value""` → `"value"` while preserving empty strings `""`
+  - **Number format normalization**: Removes thousands separators from numbers: `1,234,567` → `1234567`
+  - **Unicode escape sequences**: Converts `\u263a` → `☺` (opt-in via `:enable_escape_normalization`)
+  - **Hex escape sequences**: Converts `\x41` → `A` (opt-in via `:enable_escape_normalization`)
+- **Comprehensive examples**: New `examples/hardcoded_patterns_examples.exs` with 8 detailed examples demonstrating:
+  - Smart quotes with international text (French, Japanese, German)
+  - Doubled quotes edge cases
+  - Number format cleaning while preserving string content
+  - Unicode/hex escape handling
+  - Combined patterns for real-world LLM output
+  - Full Layer 3 pipeline integration
+- **Feature flags**: Configurable pattern processing with safe defaults
+  - `:enable_hardcoded_patterns` (default: `true`)
+  - `:enable_escape_normalization` (default: `false` for safety)
+
+### Enhanced
+- **Layer 3 integration**: Hardcoded patterns run as pre-processing step before main syntax normalization
+- **Context-aware processing**: Number format normalization preserves commas in string values
+- **International support**: Full UTF-8 support with smart quotes from multiple languages
+- **Documentation**: README updated with dedicated hardcoded patterns section and attribution to source library
+
+### Technical Details
+- **Test coverage**: 47 new tests for hardcoded patterns (100% pass rate)
+- **Source attribution**: Patterns ported from [json_repair](https://github.com/mangiucugna/json_repair) by Stefano Baccianella
+- **Architecture**: Cleanly organized as Layer 3 subsection with proper separation of concerns
+- **Type safety**: Full Dialyzer compliance with zero warnings
+- **Performance**: Regex-based optimizations with minimal overhead
+
+### Performance
+- **Test suite**: 499 total tests, 0 failures (added 47 new tests)
+- **Zero regressions**: All existing functionality preserved
+- **Efficient processing**: Smart quotes and number normalization use optimized regex patterns
+
 ## [0.1.3] - 2025-07-05
 
 ### Fixed
### Fixed
@@ -117,6 +155,9 @@ This is a **100% rewrite** - all previous code has been replaced with the new la
 - Minimal memory overhead (< 8KB for repairs)
 - All operations pass performance thresholds
 
-[Unreleased]: https://github.com/nshkrdotcom/json_remedy/compare/v0.1.1...HEAD
+[Unreleased]: https://github.com/nshkrdotcom/json_remedy/compare/v0.1.4...HEAD
+[0.1.4]: https://github.com/nshkrdotcom/json_remedy/compare/v0.1.3...v0.1.4
+[0.1.3]: https://github.com/nshkrdotcom/json_remedy/compare/v0.1.2...v0.1.3
+[0.1.2]: https://github.com/nshkrdotcom/json_remedy/compare/v0.1.1...v0.1.2
 [0.1.1]: https://github.com/nshkrdotcom/json_remedy/compare/v0.1.0...v0.1.1
 [0.1.0]: https://github.com/nshkrdotcom/json_remedy/releases/tag/v0.1.0
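
The changelog entry above says number format normalization is context-aware and preserves commas in string values. One way to get that behaviour is a single regex pass in which string literals are matched first and returned untouched. The sketch below shows this under stated assumptions: `NumberFormatSketch`, its function names, and the regex are hypothetical and are not taken from `JsonRemedy.Layer3.HardcodedPatterns`. It also assumes that only numbers directly following a colon are rewritten, so that array elements such as `[100,200]` are never merged.

```elixir
defmodule NumberFormatSketch do
  # Hypothetical illustration; not the code in hardcoded_patterns.ex.

  # One pass over the input with two alternatives:
  #   1. a complete string literal, which is consumed and returned unchanged
  #   2. a colon followed by digits grouped in threes, e.g. `: 1,234,567`
  # Built in a function rather than a module attribute so the sketch does not
  # depend on how a given Elixir/OTP version stores compiled regexes.
  defp token_pattern do
    ~r/"(?:[^"\\]|\\.)*"|:\s*\d{1,3}(?:,\d{3})+(?!\d|,\d)/
  end

  # Remove thousands separators from object values while leaving every
  # string literal exactly as it was.
  def strip_thousands_separators(input) when is_binary(input) do
    Regex.replace(token_pattern(), input, fn match ->
      if String.starts_with?(match, "\"") do
        # String literal: keep its commas.
        match
      else
        # Colon plus grouped number: drop the separators.
        String.replace(match, ",", "")
      end
    end)
  end
end

NumberFormatSketch.strip_thousands_separators(
  ~S({"price": 1,234,567, "note": "1,234 units"})
)
# => "{\"price\": 1234567, \"note\": \"1,234 units\"}"
```

Because the string alternative comes first, a grouped number inside a value such as `"1,234 units"` is consumed as part of the literal and never reaches the number branch. An unterminated string would not be protected this way, which is one reason to treat this as a sketch.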

README.md

Lines changed: 25 additions & 2 deletions
@@ -79,6 +79,17 @@ Standard JSON parsers fail completely on these inputs. JsonRemedy fixes them int
 - **Unescaped quotes**: `"text "quoted" text"` → proper escaping
 - **Trailing backslashes**: Streaming artifact cleanup
 
+#### 🔧 **Hardcoded Patterns** *(ported from [json_repair](https://github.com/mangiucugna/json_repair) Python library)*
+Layer 3 includes battle-tested cleanup patterns for edge cases commonly found in LLM output:
+
+- **Smart quotes**: `“curly”`, `«guillemets»`, `‹angles›` → `"standard"`
+- **Doubled quotes**: `""value""` → `"value"` (preserves empty strings `""`)
+- **Number formats**: `1,234,567` → `1234567` (removes thousands separators)
+- **Unicode escapes**: `\u263a` → `☺` (opt-in via `:enable_escape_normalization`)
+- **Hex escapes**: `\x41` → `A` (opt-in via `:enable_escape_normalization`)
+
+These patterns run as a pre-processing step and can be controlled via feature flags. See `examples/hardcoded_patterns_examples.exs` for demonstrations.
+
 ### 🚀 **Fast Path Validation (Layer 4)**
 - **Jason.decode optimization**: Valid JSON uses battle-tested parser
 - **Performance monitoring**: Automatic fallback for complex repairs
@@ -128,7 +139,7 @@ Add JsonRemedy to your `mix.exs`:
 ```elixir
 def deps do
   [
-    {:json_remedy, "~> 0.1.3"}
+    {:json_remedy, "~> 0.1.4"}
   ]
 end
 ```
@@ -228,11 +239,23 @@ mix run examples/basic_usage.exs
 ```
 Learn the fundamentals with step-by-step examples:
 - Fixing unquoted keys
-- Normalizing quote styles
+- Normalizing quote styles
 - Handling boolean/null variants
 - Repairing structural issues
 - Processing LLM outputs
 
+### 🔧 **Hardcoded Patterns Examples** *NEW*
+```bash
+mix run examples/hardcoded_patterns_examples.exs
+```
+Demonstrates advanced cleanup patterns ported from Python's `json_repair` library:
+- **Smart quotes normalization**: Curly quotes, guillemets, angle quotes
+- **Doubled quotes repair**: `""value""` → `"value"`
+- **Number format cleaning**: `1,234,567` → `1234567`
+- **Unicode/hex escapes**: `\u263a` → `☺`, `\x41` → `A`
+- **International text**: UTF-8 support with smart quotes
+- **Combined patterns**: Real-world LLM output examples
+
 ### 🌍 **Real-World Scenarios**
 ```bash
 mix run examples/real_world_scenarios.exs
