
Commit eaaf3d2

rolfedh and claude committed
feat: EntityReference rule now respects AsciiDoc subs attributes

- Check code block subs attribute to determine if entities should be processed
- Only fix entities in code blocks when subs includes "replacements"
- Respect subs="none", subs="attributes+", subs="normal", etc.
- Add comprehensive tests for different subs scenarios
- Add documentation explaining the behavior

This ensures that:
- Code examples preserve literal entities by default
- Users can opt in to entity processing with subs="replacements"
- Aditi follows AsciiDoc's substitution model correctly

Related to upstream issue jhradilek/asciidoctor-dita-vale#98
Addresses #13

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 9aa6c5c commit eaaf3d2


4 files changed: +455 additions, -17 deletions


CLAUDE.md

Lines changed: 4 additions & 4 deletions
@@ -308,25 +308,25 @@ A comprehensive test suite prevents Jekyll deployment failures:
 ## Recent Development Focus (July 2025)
 
 ### Statistics
-- Total commits: 220
+- Total commits: 221
 
 ### Latest Achievements
+- ✅ Entityreference rule now respects asciidoc subs attributes.
 - ✅ Implement single-source versioning.
 - ✅ Add intermediate recheck step and fix accurate fix counting in journey workflow.
 - ✅ Add vale configuration and update asciidocdita styles for improved validation.
 - ✅ Reintroduce claude.md updater workflow with enhanced commit parsing.
-- ✅ Improve directory selection ui for better user experience.
 
 ### Development Focus
 - **Ci/Cd**: 89 commits
 - **Features**: 23 commits
 - **Bug Fixes**: 18 commits
 - **Documentation**: 17 commits
-- **Testing**: 12 commits
+- **Testing**: 13 commits
 
 ### Most Active Files
 - `docs/_data/recent_commits.yml`: 85 changes
-- `CLAUDE.md`: 63 changes
+- `CLAUDE.md`: 64 changes
 - `src/aditi/commands/journey.py`: 19 changes
 
 <!-- /AUTO-GENERATED:RECENT -->
 

docs/ENTITY_REFERENCE_HANDLING.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+# EntityReference Rule - Code Block Handling
+
+This document explains how Aditi's EntityReference rule handles HTML entities in different contexts, particularly in code blocks.
+
+## Overview
+
+The EntityReference rule converts unsupported HTML entities (such as `&nbsp;` and `&copy;`) to DITA-compatible AsciiDoc attributes (such as `{nbsp}` and `{copy}`). However, entities in code contexts should often remain literal.
+
+## Code Block Behavior
+
+### Default Behavior
+
+By default, entities in delimited code blocks (`----` listing/source blocks and `....` literal blocks) are **NOT** converted:
+
+```asciidoc
+[source,html]
+----
+<p>Hello&nbsp;World</p> <!-- &nbsp; remains literal -->
+----
+```
+
+### With Substitutions Enabled
+
+When the block's `subs` attribute includes `replacements`, entities **ARE** converted:
+
+```asciidoc
+[source,html,subs="replacements"]
+----
+<p>Hello&nbsp;World</p> <!-- &nbsp; becomes {nbsp} -->
+----
+```
+
+### Common Substitution Patterns
+
+1. **`subs="attributes+"`** - Adds attribute substitution only; entities are NOT processed:
+   ```asciidoc
+   [source,terminal,subs="attributes+"]
+   ----
+   echo "Version {version}" # {version} is replaced
+   echo "Hello&nbsp;World" # &nbsp; remains literal
+   ----
+   ```
+
+2. **`subs="attributes+,replacements"`** - Processes both attribute references and entities:
+   ```asciidoc
+   [source,html,subs="attributes+,replacements"]
+   ----
+   <p>{product}&nbsp;v{version}</p> <!-- Both {product} and &nbsp; are processed -->
+   ----
+   ```
+
+3. **`subs="normal"`** - All normal substitutions, including `replacements`:
+   ```asciidoc
+   [listing,subs="normal"]
+   ----
+   Text with &copy; symbol # &copy; becomes {copy}
+   ----
+   ```
+
+4. **`subs="none"`** - No substitutions at all:
+   ```asciidoc
+   [source,html,subs="none"]
+   ----
+   <p>&trade; {version}</p> <!-- Nothing is processed -->
+   ----
+   ```
+
+## Inline Code
+
+Entities inside inline code (backticks) are **NEVER** converted:
+
+```asciidoc
+Use the `&nbsp;` entity in HTML. # &nbsp; remains literal
+```
+
+## Why This Matters
+
+This behavior ensures that:
+1. Code examples remain accurate and don't have their entities converted
+2. When you DO want entities processed in code blocks (e.g., for documentation), you can enable it with `subs="replacements"`
+3. Aditi respects AsciiDoc's substitution model
+
+## Known Limitations
+
+- Nested delimited blocks are not fully supported: inside a code block, any `----` or `....` line is treated as the closing delimiter, so a nested block can end the outer block early.
+- Complex substitution patterns (like conditional processing) follow AsciiDoc's standard rules.
+
+## Related Issues
+
+- [Vale Issue #98](https://github.com/jhradilek/asciidoctor-dita-vale/issues/98) - Vale incorrectly flags entities in code blocks
+- [Aditi Issue #13](https://github.com/rolfedh/aditi/issues/13) - Aditi correctly handles these cases
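The `subs` patterns listed in this document follow from how Asciidoctor resolves a block's substitution list. The sketch below models that resolution in Python so the cases can be checked mechanically; it is an illustration, not Aditi's code. `resolve_subs`, `entities_processed`, `SUB_GROUPS`, and `VERBATIM_DEFAULTS` are hypothetical names, only the `none`, `normal`, and `verbatim` groups are modeled, and the modifier semantics (`+name` appends to the block's defaults, `name+` prepends, `-name` removes, a bare name adds its substitutions without pulling in the defaults) are an assumption based on the Asciidoctor documentation for source, listing, and literal blocks, which default to `verbatim`.

```python
from typing import List, Optional

# Simplified model of Asciidoctor's substitution groups (hypothetical names).
SUB_GROUPS = {
    "none": [],
    "normal": ["specialcharacters", "quotes", "attributes",
               "replacements", "macros", "post_replacements"],
    "verbatim": ["specialcharacters", "callouts"],
}

# Assumption: source, listing, and literal blocks default to the verbatim group.
VERBATIM_DEFAULTS = SUB_GROUPS["verbatim"]


def resolve_subs(subs_value: Optional[str],
                 defaults: List[str] = VERBATIM_DEFAULTS) -> List[str]:
    """Resolve a block's subs attribute value into an ordered list of substitutions."""
    if subs_value is None:
        return list(defaults)  # no subs attribute: keep the block's defaults
    subs_value = subs_value.replace(" ", "")
    # Modifier syntax is only interpreted when the value contains + or - somewhere.
    modifiers_present = "+" in subs_value or "-" in subs_value
    candidates: Optional[List[str]] = None
    for key in subs_value.split(","):
        if not key:
            continue
        operation = None
        if modifiers_present:
            if key.startswith("+"):
                operation, key = "append", key[1:]
            elif key.startswith("-"):
                operation, key = "remove", key[1:]
            elif key.endswith("+"):
                operation, key = "prepend", key[:-1]
        # Group names are expanded wherever they appear in the list.
        resolved = SUB_GROUPS.get(key, [key])
        if operation is None:
            # A bare name adds to the list being built; defaults are not pulled in.
            if candidates is None:
                candidates = []
            candidates = candidates + resolved
        else:
            # The first modifier starts from the block's defaults.
            if candidates is None:
                candidates = list(defaults)
            if operation == "append":
                candidates = candidates + resolved
            elif operation == "prepend":
                candidates = resolved + candidates
            else:
                candidates = [s for s in candidates if s not in resolved]
    # Remove duplicates, keeping the first occurrence of each name.
    return list(dict.fromkeys(candidates or []))


def entities_processed(subs_value: Optional[str]) -> bool:
    """Return True if the block's resolved subs include replacements."""
    return "replacements" in resolve_subs(subs_value)
```

For example, `entities_processed("attributes+")` is false and `entities_processed("attributes+,replacements")` is true, matching patterns 1 and 2. One design point: group names are expanded wherever they appear, so `subs="quotes,normal"` also enables `replacements`, whereas the committed `_parse_subs_value` recognizes `normal`, `none`, and `verbatim` only when they form the entire value.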

src/aditi/rules/entity_reference.py

Lines changed: 184 additions & 13 deletions
@@ -127,23 +127,194 @@ def validate_fix(self, fix: Fix, file_content: str) -> bool:
         if not line_content:
             return False
 
-        # Check if we're inside a code block
-        # Look for code block delimiters before the current line
-        lines = file_content.splitlines()
-        in_code_block = False
-        for i in range(fix.violation.line - 1):
-            line = lines[i].strip()
-            if line == "----" or line == "....":
-                in_code_block = not in_code_block
-
-        if in_code_block:
-            return False
-
         # Check if we're inside inline code
         # Count backticks before the entity position
         before_text = line_content[:fix.violation.column - 1]
         backtick_count = before_text.count("`")
         if backtick_count % 2 != 0:  # Odd number means we're inside inline code
             return False
 
-        return True
+        # Check if we're inside a code block and if replacements are enabled
+        lines = file_content.splitlines()
+        code_block_info = self._get_code_block_context(lines, fix.violation.line - 1)
+
+        if code_block_info['in_code_block']:
+            # Check if replacements are enabled for this code block
+            if code_block_info['replacements_enabled']:
+                return True  # Entities should be processed
+            else:
+                return False  # Entities should remain literal
+
+        return True
+
+    def _get_code_block_context(self, lines: list, target_line_idx: int) -> dict:
+        """Determine if we're in a code block and check its substitution settings.
+
+        Args:
+            lines: List of all lines in the document
+            target_line_idx: Zero-based index of the target line
+
+        Returns:
+            Dict with 'in_code_block' and 'replacements_enabled' flags
+        """
+        in_code_block = False
+        block_type = None
+        block_start_line = -1
+        subs_value = None
+        pending_source_subs = None  # Store subs from [source] line
+
+        for i in range(min(target_line_idx + 1, len(lines))):
+            line = lines[i].strip()
+
+            # First check if this is a source attribute line
+            if line.startswith("[source"):
+                # Extract subs but don't mark as in block yet
+                pending_source_subs = self._extract_subs_from_line(line)
+                continue
+
+            # Check for listing/source block delimiters
+            if line == "----":
+                if not in_code_block:
+                    in_code_block = True
+                    block_type = "listing"
+                    block_start_line = i
+                    # Check if there was a source line just before
+                    if i > 0 and pending_source_subs is not None:
+                        subs_value = pending_source_subs
+                        block_type = "source"
+                    else:
+                        # Look for other attributes in previous lines
+                        subs_value = self._find_block_attributes(lines, i)
+                    pending_source_subs = None  # Reset
+                else:
+                    # Closing delimiter
+                    in_code_block = False
+                    block_type = None
+                    subs_value = None
+                    pending_source_subs = None
+            elif line == "....":
+                if not in_code_block:
+                    in_code_block = True
+                    block_type = "literal"
+                    block_start_line = i
+                    # Look for attributes in previous lines
+                    subs_value = self._find_block_attributes(lines, i)
+                else:
+                    # Closing delimiter
+                    in_code_block = False
+                    block_type = None
+                    subs_value = None
+            else:
+                # Any other line resets pending source
+                if line and not line.startswith("["):
+                    pending_source_subs = None
+
+        # Determine if replacements are enabled
+        replacements_enabled = False
+        if subs_value:
+            # Parse subs value
+            subs_list = self._parse_subs_value(subs_value)
+            replacements_enabled = 'replacements' in subs_list
+
+        return {
+            'in_code_block': in_code_block,
+            'replacements_enabled': replacements_enabled,
+            'block_type': block_type,
+            'subs': subs_value
+        }
+
+    def _find_block_attributes(self, lines: list, delimiter_idx: int) -> Optional[str]:
+        """Find block attributes that might contain subs setting.
+
+        Args:
+            lines: List of all lines
+            delimiter_idx: Index of the block delimiter line
+
+        Returns:
+            The subs value if found, None otherwise
+        """
+        # Look backwards for block attributes (up to 3 lines)
+        for i in range(delimiter_idx - 1, max(-1, delimiter_idx - 4), -1):
+            line = lines[i].strip()
+            # Check [source,...] or [listing,...] style attributes, nearest first
+            if line.startswith('[') and line.endswith(']') and self._extract_subs_from_line(line):
+                return self._extract_subs_from_line(line)
+        return None
+
+    def _extract_subs_from_line(self, line: str) -> Optional[str]:
+        """Extract subs value from an attribute line.
+
+        Args:
+            line: Line containing attributes like [source,java,subs="attributes+"]
+
+        Returns:
+            The subs value if found, None otherwise
+        """
+        import re
+
+        # Look for subs="value" or subs='value'
+        match = re.search(r'subs\s*=\s*["\']([^"\']+)["\']', line)
+        if match:
+            return match.group(1)
+
+        # Look for subs=value (without quotes)
+        match = re.search(r'subs\s*=\s*([^,\]]+)', line)
+        if match:
+            return match.group(1).strip()
+
+        return None
+
+    def _parse_subs_value(self, subs_value: str) -> list:
+        """Parse the subs attribute value into a list of substitutions.
+
+        Args:
+            subs_value: Value like "attributes+", "replacements", "+replacements,-attributes"
+
+        Returns:
+            List of active substitution types
+        """
+        if not subs_value:
+            return []
+
+        # Handle special values
+        if subs_value == 'normal':
+            # Normal substitutions
+            return ['specialcharacters', 'quotes', 'attributes', 'replacements', 'macros', 'post_replacements']
+        elif subs_value == 'none':
+            return []
+        elif subs_value == 'verbatim':
+            return ['specialcharacters']
+
+        # Code blocks default to verbatim subs, which never include replacements,
+        # so starting from an empty list gives the same 'replacements' answer
+        active_subs = []
+
+        # Parse comma-separated list with +/- modifiers
+        parts = [p.strip() for p in subs_value.split(',')]
+
+        for part in parts:
+            if not part:
+                continue
+
+            # Trailing + prepends to the defaults (e.g. "attributes+")
+            # Defaults are not tracked here, so this just adds the named sub
+            if part.endswith('+') and not part.startswith('+'):
+                sub_type = part[:-1]  # Remove trailing +
+                if sub_type and sub_type not in active_subs:
+                    active_subs.append(sub_type)
+            elif part.startswith('+'):
+                # Leading + appends to the defaults (e.g. "+attributes")
+                sub_type = part[1:]
+                if sub_type and sub_type not in active_subs:
+                    active_subs.append(sub_type)
+            elif part.startswith('-'):
+                # Remove from existing
+                sub_type = part[1:]
+                if sub_type in active_subs:
+                    active_subs.remove(sub_type)
+            else:
+                # Bare name: add it (a list of bare names replaces the defaults)
+                if part not in active_subs:
+                    active_subs.append(part)
+
+        return active_subs
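The delimiter scan that `_get_code_block_context` performs can be sketched as a standalone function. This is a hypothetical illustration, not Aditi's API: `code_block_context`, `find_subs`, `SUBS_RX`, and `DELIMITERS` are invented names, and the sketch assumes that only exact `----` and `....` lines delimit code blocks and that attribute lines sit directly above the opening delimiter (block titles between them are not handled). Unlike the committed method, it remembers which delimiter opened the block, so a `....` line inside a `----` block does not end it.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helpers for illustration; not part of Aditi's API.
SUBS_RX = re.compile(r"""subs\s*=\s*(?:"([^"]*)"|'([^']*)'|([^,\]]+))""")
DELIMITERS = ("----", "....")


def find_subs(attr_line: str) -> Optional[str]:
    """Return the subs value from a line like [source,java,subs="attributes+"], or None."""
    match = SUBS_RX.search(attr_line)
    if match is None:
        return None
    # Exactly one alternative (double-quoted, single-quoted, or bare) matched.
    value = next(g for g in match.groups() if g is not None)
    return value.strip()


def code_block_context(lines: List[str], target_idx: int) -> Tuple[bool, Optional[str]]:
    """Return (inside a ----/.... block?, that block's subs value) for lines[target_idx]."""
    open_delim: Optional[str] = None
    subs: Optional[str] = None
    for i in range(min(target_idx, len(lines))):
        line = lines[i].strip()
        if open_delim is None and line in DELIMITERS:
            open_delim = line
            subs = None
            # Walk back over the attribute lines stacked directly above the
            # delimiter; the nearest one that sets subs wins.
            j = i - 1
            while j >= 0 and lines[j].strip().startswith("["):
                if subs is None:
                    subs = find_subs(lines[j].strip())
                j -= 1
        elif open_delim is not None and line == open_delim:
            # Only the delimiter that opened the block can close it.
            open_delim, subs = None, None
    return open_delim is not None, subs
```

To mirror the committed rule, a caller would pass the returned subs value to logic like `_parse_subs_value` and fix the entity only when `'replacements'` is in the result.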
