Commit 7a2b07a ("Release v0.1.8")
Parent: 1aa60aa

File tree: 12 files changed, +284 −36 lines


.github/workflows/elixir.yaml

Lines changed: 29 additions & 0 deletions
@@ -150,3 +150,32 @@ jobs:
       # Step: Execute the tests.
       - name: Run tests
         run: mix test
+
+  windows_test:
+    runs-on: windows-latest
+    name: Windows Test on OTP ${{matrix.otp}} / Elixir ${{matrix.elixir}}
+    strategy:
+      matrix:
+        otp: ['27.3.3']
+        elixir: ['1.18.3']
+    defaults:
+      run:
+        shell: pwsh
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v3
+
+      - name: Set up Elixir
+        uses: erlef/setup-beam@v1
+        with:
+          otp-version: ${{matrix.otp}}
+          elixir-version: ${{matrix.elixir}}
+
+      - name: Install dependencies
+        run: mix deps.get
+
+      - name: Compile without warnings
+        run: mix compile --warnings-as-errors
+
+      - name: Run tests
+        run: mix test

CHANGELOG.md

Lines changed: 14 additions & 1 deletion
@@ -7,6 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.1.8] - 2025-10-27
+
+### Added
+- **HTML metadata**: `HtmlHandlers.extract_html_content/2` now returns byte counts alongside grapheme totals so every caller gets accurate offsets without recomputing (#6).
+- **Examples**: New `html_metadata_examples.exs` and `windows_ci_examples.exs` scripts document the metadata workflow and Windows CI parity (#6, #7).
+
+### Fixed
+- **Syntax normalization**: Layer 3 consumes the shared metadata directly, removing duplicated byte math around HTML quoting (#6).
+
+### CI
+- **Windows coverage**: Introduced a `windows-latest` PowerShell job to the GitHub Actions matrix to run `mix deps.get`, `mix compile --warnings-as-errors`, and `mix test`, ensuring CRLF regressions are caught ahead of releases (#7).
+
 ## [0.1.7] - 2025-10-27
 
 ### Fixed
@@ -328,7 +340,8 @@ This is a **100% rewrite** - all previous code has been replaced with the new la
 - Minimal memory overhead (< 8KB for repairs)
 - All operations pass performance thresholds
 
-[Unreleased]: https://github.com/nshkrdotcom/json_remedy/compare/v0.1.7...HEAD
+[Unreleased]: https://github.com/nshkrdotcom/json_remedy/compare/v0.1.8...HEAD
+[0.1.8]: https://github.com/nshkrdotcom/json_remedy/compare/v0.1.7...v0.1.8
 [0.1.7]: https://github.com/nshkrdotcom/json_remedy/compare/v0.1.6...v0.1.7
 [0.1.6]: https://github.com/nshkrdotcom/json_remedy/compare/v0.1.5...v0.1.6
 [0.1.5]: https://github.com/nshkrdotcom/json_remedy/compare/v0.1.4...v0.1.5

README.md

Lines changed: 23 additions & 1 deletion
@@ -97,6 +97,7 @@ Runs **before** the main layer pipeline to handle complex patterns that would ot
 - DOCTYPE declarations, comments, void elements
 - Self-closing tags, nested structures
 - Proper escaping of quotes, newlines, special chars
+- Byte + grapheme metadata for consumer slices *(v0.1.8)*
 
 #### 🔧 **Hardcoded Patterns** *(ported from [json_repair](https://github.com/mangiucugna/json_repair) Python library)*
 Layer 3 includes battle-tested cleanup patterns for edge cases commonly found in LLM output:
@@ -158,7 +159,7 @@ Add JsonRemedy to your `mix.exs`:
 ```elixir
 def deps do
   [
-    {:json_remedy, "~> 0.1.7"}
+    {:json_remedy, "~> 0.1.8"}
   ]
 end
 ```
@@ -302,6 +303,15 @@ Demonstrates handling of unquoted HTML content in JSON values (common when APIs
 
 This example showcases the HTML detection and quoting capabilities added in v0.1.5, which handle real-world scenarios where API endpoints return HTML error pages instead of JSON.
 
+### 🧮 **HTML Metadata Examples** *NEW in v0.1.8*
+```bash
+mix run examples/html_metadata_examples.exs
+```
+Inspect the metadata returned when quoting HTML fragments:
+- Grapheme vs byte counts for emoji-rich HTML bodies
+- Byte-accurate offsets from non-zero starting positions
+- Guidance for integrating the metadata into downstream pipelines
+
 ### 🌍 **Real-World Scenarios**
 ```bash
 mix run examples/real_world_scenarios.exs
@@ -336,6 +346,15 @@ Verify reliability under load:
 - Large array processing
 - Memory usage stability
 
+### 🪟 **Windows CI Examples** *NEW in v0.1.8*
+```bash
+mix run examples/windows_ci_examples.exs
+```
+Validate the cross-platform pipeline:
+- Confirms the GitHub Actions workflow includes a Windows runner
+- Lists the PowerShell commands executed in CI
+- Helps contributors mirror the job locally when debugging CRLF issues
+
 ### 📊 **Example Output**
 
 Here's what you'll see when running the real-world scenarios:
@@ -1012,6 +1031,9 @@ mix dialyzer # Type analysis
 mix format --check-formatted # Code formatting
 mix test.coverage # Coverage analysis
 
+# Windows CI parity (PowerShell)
+mix run examples/windows_ci_examples.exs
+
 # Benchmarking
 mix run bench/comprehensive_benchmark.exs
 mix run bench/memory_profile.exs
examples/html_metadata_examples.exs

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+# HTML Metadata Examples for JsonRemedy
+#
+# Demonstrates the new metadata returned by HtmlHandlers.extract_html_content/2
+# so that consumers can rely on both grapheme and byte measurements.
+#
+# Run with: mix run examples/html_metadata_examples.exs
+
+defmodule HtmlMetadataExamples do
+  @moduledoc """
+  Shows how to inspect the grapheme and byte counts returned when JsonRemedy
+  wraps unquoted HTML fragments. Useful when integrating with systems that need
+  byte-accurate slicing (e.g., Windows CRLF payloads or emoji-rich HTML).
+  """
+
+  alias JsonRemedy.Layer3.HtmlHandlers
+
+  def run_all_examples do
+    IO.puts("=== JsonRemedy HTML Metadata Examples ===\n")
+
+    example_1_multibyte_html_metadata()
+    example_2_offset_html_metadata()
+
+    IO.puts("\n=== Completed HTML metadata examples! ===")
+  end
+
+  defp example_1_multibyte_html_metadata do
+    IO.puts("Example 1: Metadata for multi-byte HTML fragment")
+    IO.puts("==============================================")
+
+    fragment = "<div>café 🚀</div>,\"next\""
+
+    {html, graphemes, bytes} = HtmlHandlers.extract_html_content(fragment, 0)
+
+    IO.puts("Extracted HTML: #{html}")
+    IO.puts("Graphemes consumed: #{graphemes}")
+    IO.puts("Bytes consumed: #{bytes}")
+    IO.puts("Byte/Grapheme delta: #{bytes - graphemes}")
+    IO.puts("\n" <> String.duplicate("-", 80) <> "\n")
+  end
+
+  defp example_2_offset_html_metadata do
+    IO.puts("Example 2: Metadata from non-zero starting offset")
+    IO.puts("===============================================")
+
+    payload = ~s({"body":<section data-info="café 🚀">Line</section>,"ok":true})
+    start_position = String.length(~s({"body":))
+
+    {html, graphemes, bytes} = HtmlHandlers.extract_html_content(payload, start_position)
+
+    IO.puts("Extracted HTML: #{html}")
+    IO.puts("Graphemes consumed from offset #{start_position}: #{graphemes}")
+    IO.puts("Bytes consumed from offset #{start_position}: #{bytes}")
+    IO.puts("\n" <> String.duplicate("-", 80) <> "\n")
+  end
+end
+
+HtmlMetadataExamples.run_all_examples()

examples/windows_ci_examples.exs

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+# Windows CI Examples for JsonRemedy
+#
+# Highlights the new Windows job in the GitHub Actions matrix and how to
+# reproduce the steps locally on a Windows machine.
+#
+# Run with: mix run examples/windows_ci_examples.exs
+
+defmodule WindowsCIExamples do
+  @moduledoc """
+  Useful helpers for verifying JsonRemedy's Windows CI coverage:
+  - Confirms the workflow includes a `windows-latest` runner with PowerShell.
+  - Prints the exact commands the CI executes so contributors can mirror them.
+  """
+
+  @workflow_path ".github/workflows/elixir.yaml"
+
+  def run_all_examples do
+    IO.puts("=== JsonRemedy Windows CI Examples ===\n")
+
+    example_1_verify_windows_job()
+    example_2_windows_reproduction_steps()
+
+    IO.puts("\n=== Finished Windows CI examples! ===")
+  end
+
+  defp example_1_verify_windows_job do
+    IO.puts("Example 1: Verify Windows job exists in CI workflow")
+    IO.puts("===================================================")
+
+    workflow = File.read!(@workflow_path)
+    windows_job_present = String.contains?(workflow, "runs-on: windows-latest")
+    uses_pwsh = String.contains?(workflow, "shell: pwsh")
+
+    IO.puts("Windows runner configured? #{windows_job_present}")
+    IO.puts("PowerShell shell configured? #{uses_pwsh}")
+    IO.puts("\n" <> String.duplicate("-", 80) <> "\n")
+  end
+
+  defp example_2_windows_reproduction_steps do
+    IO.puts("Example 2: Commands executed on the Windows runner")
+    IO.puts("==================================================")
+
+    commands = [
+      "mix deps.get",
+      "mix compile --warnings-as-errors",
+      "mix test"
+    ]
+
+    Enum.each(commands, fn command ->
+      IO.puts("pwsh> #{command}")
+    end)
+
+    IO.puts(
+      "\nTip: Run the commands above inside a PowerShell session after installing Elixir via asdf or the official installer."
+    )
+
+    IO.puts("\n" <> String.duplicate("-", 80) <> "\n")
+  end
+end
+
+WindowsCIExamples.run_all_examples()

lib/json_remedy/layer3/character_parsers.ex

Lines changed: 4 additions & 2 deletions
@@ -233,7 +233,8 @@ defmodule JsonRemedy.Layer3.CharacterParsers do
       char == "<" and state.expecting == :value and
           HtmlHandlers.is_html_start?(content, state.position) ->
         # Start of HTML content - quote it
-        {html_iolist, chars_consumed, repairs} = HtmlHandlers.process_html_iolist(content, state)
+        {html_iolist, chars_consumed, _bytes_consumed, repairs} =
+          HtmlHandlers.process_html_iolist(content, state)
 
         %{
           state
@@ -373,7 +374,8 @@ defmodule JsonRemedy.Layer3.CharacterParsers do
       char == "<" and state.expecting == :value and
          HtmlHandlers.is_html_start?(content, state.position) ->
         # Start of HTML content - quote it
-        {html_string, chars_consumed, repairs} = HtmlHandlers.process_html_string(content, state)
+        {html_string, chars_consumed, _bytes_consumed, repairs} =
+          HtmlHandlers.process_html_string(content, state)
 
         %{
           state

lib/json_remedy/layer3/html_handlers.ex

Lines changed: 29 additions & 17 deletions
@@ -37,18 +37,30 @@ defmodule JsonRemedy.Layer3.HtmlHandlers do
 
   @doc """
   Extract HTML content starting from position until we hit a JSON structural delimiter.
-  Returns {html_content, chars_consumed}.
+  Returns {html_content, chars_consumed, bytes_consumed}.
 
   Strategy: Track HTML tag depth. Only stop at JSON delimiters when:
   - We're at HTML tag depth 0 (all tags closed)
   - We're at JSON depth 0 (no nested JSON-like braces)
   - We're not inside an HTML tag marker (between < and >)
   """
-  @spec extract_html_content(String.t(), non_neg_integer()) :: {String.t(), non_neg_integer()}
+  @spec extract_html_content(String.t(), non_neg_integer()) ::
+          {String.t(), non_neg_integer(), non_neg_integer()}
   def extract_html_content(content, start_position) do
     extract_html_content_recursive(content, start_position, start_position, 0, 0, false)
   end
 
+  defp finalize_html_result(content, start_pos, stop_pos) do
+    length = stop_pos - start_pos
+    raw_html = String.slice(content, start_pos, length)
+    trimmed_html = String.trim_trailing(raw_html)
+
+    chars_consumed = max(length, 0)
+    bytes_consumed = if length <= 0, do: 0, else: byte_size(raw_html)
+
+    {trimmed_html, chars_consumed, bytes_consumed}
+  end
+
   # Helper to find end of HTML comment
   defp find_comment_end(content, start_pos) do
     case :binary.match(content, "-->", scope: {start_pos, byte_size(content) - start_pos}) do
@@ -96,11 +108,11 @@ defmodule JsonRemedy.Layer3.HtmlHandlers do
         html_depth,
         inside_tag_marker
       ) do
-    if current_pos >= String.length(content) do
+    content_length = String.length(content)
+
+    if current_pos >= content_length do
       # Reached end of content
-      html = String.slice(content, start_pos..-1//1)
-      chars = String.length(html)
-      {String.trim_trailing(html), chars}
+      finalize_html_result(content, start_pos, content_length)
    else
      char = String.at(content, current_pos)
 
@@ -181,9 +193,7 @@ defmodule JsonRemedy.Layer3.HtmlHandlers do
 
      # Stop at JSON delimiters ONLY when all HTML tags are closed
      char in [",", "}", "]"] and json_depth == 0 and html_depth <= 0 and not inside_tag_marker ->
-        html = String.slice(content, start_pos..(current_pos - 1))
-        chars = current_pos - start_pos
-        {String.trim_trailing(html), chars}
+        finalize_html_result(content, start_pos, current_pos)
 
      # Track JSON-like depth (for data attributes with JSON)
      char == "{" and not inside_tag_marker ->
@@ -275,25 +285,27 @@ defmodule JsonRemedy.Layer3.HtmlHandlers do
 
   @doc """
   Process HTML content for IO list version.
-  Returns {html_iolist, chars_consumed, repairs}.
+  Returns {html_iolist, chars_consumed, bytes_consumed, repairs}.
   """
-  @spec process_html_iolist(String.t(), map()) :: {iodata(), non_neg_integer(), list()}
+  @spec process_html_iolist(String.t(), map()) ::
+          {iodata(), non_neg_integer(), non_neg_integer(), list()}
   def process_html_iolist(content, state) do
-    {html, chars_consumed} = extract_html_content(content, state.position)
+    {html, chars_consumed, bytes_consumed} = extract_html_content(content, state.position)
     {quoted_html, repairs} = quote_html_content(html, state.position)
 
-    {quoted_html, chars_consumed, repairs}
+    {quoted_html, chars_consumed, bytes_consumed, repairs}
   end
 
   @doc """
   Process HTML content for regular string version.
-  Returns {html_string, chars_consumed, repairs}.
+  Returns {html_string, chars_consumed, bytes_consumed, repairs}.
   """
-  @spec process_html_string(String.t(), map()) :: {String.t(), non_neg_integer(), list()}
+  @spec process_html_string(String.t(), map()) ::
+          {String.t(), non_neg_integer(), non_neg_integer(), list()}
  def process_html_string(content, state) do
-    {html, chars_consumed} = extract_html_content(content, state.position)
+    {html, chars_consumed, bytes_consumed} = extract_html_content(content, state.position)
     {quoted_html, repairs} = quote_html_content(html, state.position)
 
-    {quoted_html, chars_consumed, repairs}
+    {quoted_html, chars_consumed, bytes_consumed, repairs}
  end
 end
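The grapheme/byte distinction that motivates the new `bytes_consumed` return value can be seen with plain Elixir, independent of JsonRemedy. A standalone sketch (the fragment is illustrative, not from the test suite):

```elixir
# Standalone sketch: grapheme vs byte counts for a multi-byte fragment.
# In UTF-8, "é" encodes to 2 bytes and "🚀" to 4, so the two counts diverge.
fragment = "<div>café 🚀</div>"

graphemes = String.length(fragment)  # user-perceived characters
bytes = byte_size(fragment)          # UTF-8 bytes

IO.puts("graphemes=#{graphemes} bytes=#{bytes} delta=#{bytes - graphemes}")
```

Because `String.slice/3` counts graphemes while `binary_part/3` counts bytes, a caller slicing the original binary needs the byte figure; returning both from `extract_html_content/2` saves each caller from recomputing it.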

lib/json_remedy/layer3/syntax_normalization.ex

Lines changed: 7 additions & 14 deletions
@@ -806,25 +806,18 @@ defmodule JsonRemedy.Layer3.SyntaxNormalization do
       HtmlHandlers.is_html_start?(<<char::utf8, rest::binary>>, 0) ->
         # Start of HTML content - quote it
         fragment = <<char::utf8, rest::binary>>
-        {html_content, chars_consumed} = HtmlHandlers.extract_html_content(fragment, 0)
+
+        {html_content, chars_consumed, bytes_consumed} =
+          HtmlHandlers.extract_html_content(fragment, 0)
 
         {quoted_html, html_repairs} = HtmlHandlers.quote_html_content(html_content, pos)
 
-        consumed_fragment = String.slice(fragment, 0, chars_consumed)
-        bytes_consumed = byte_size(consumed_fragment)
-        bytes_for_rest = max(bytes_consumed - byte_size(<<char::utf8>>), 0)
+        fragment_size = byte_size(fragment)
 
         remaining =
-          if bytes_for_rest <= 0 do
-            rest
-          else
-            rest_size = byte_size(rest)
-
-            if bytes_for_rest >= rest_size do
-              <<>>
-            else
-              binary_part(rest, bytes_for_rest, rest_size - bytes_for_rest)
-            end
+          cond do
+            bytes_consumed >= fragment_size -> <<>>
+            true -> binary_part(fragment, bytes_consumed, fragment_size - bytes_consumed)
          end
 
        normalize_syntax_binary_simple(
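The simplified remainder computation above can be exercised in isolation: given a consumed byte count, `binary_part/3` slices off the unprocessed tail. A standalone sketch with an illustrative fragment (not the JsonRemedy API):

```elixir
# Standalone sketch: derive the unprocessed remainder once bytes_consumed is known.
fragment = "<b>hé</b>,rest"
bytes_consumed = byte_size("<b>hé</b>")  # 10 bytes; "é" is 2 bytes in UTF-8

remaining =
  cond do
    bytes_consumed >= byte_size(fragment) -> <<>>
    true -> binary_part(fragment, bytes_consumed, byte_size(fragment) - bytes_consumed)
  end

IO.puts(remaining)  # prints ",rest"
```

Slicing by bytes with `binary_part/3` avoids the grapheme-to-byte conversion the old code performed with `String.slice/3`, which is exactly the duplicated byte math this commit removes.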

mix.exs

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 defmodule JsonRemedy.MixProject do
   use Mix.Project
 
-  @version "0.1.7"
+  @version "0.1.8"
   @source_url "https://github.com/nshkrdotcom/json_remedy"
 
   def project do
