Commit 3c29218

Merge pull request #161 from efargas/copilot/test-validacion-documentacion
fix(docs-validation): fix 6 bugs in validation scripts + add test fixtures and 42 new tests
2 parents dd82c02 + 0b047bb commit 3c29218

18 files changed: +971 -38 lines changed

docs/.test-fixtures/README.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# Documentation Validation Test Fixtures
+
+This directory contains markdown files used to test and validate the
+`scripts/validate-*.py` validation suite.
+
+## Structure
+
+| File | Purpose |
+|------|---------|
+| `valid-document.md` | Well-formed document: correct frontmatter, valid file refs, valid links |
+| `invalid-frontmatter.md` | Missing / malformed frontmatter fields |
+| `code-blocks.md` | Code blocks with escape chars, inline code, fenced fences |
+| `links.md` | Mix of valid, broken, external, anchor, and code-block links |
+| `file-references.md` | Mix of existing and missing file path references |
+
+## Usage
+
+```bash
+# Run frontmatter-specific tests
+pytest scripts/tests -k frontmatter -q
+
+# Run the full validation test suite (covers all fixture scenarios)
+pytest scripts/tests -q
+```
+
+> **Note:** This directory is excluded from the production validation runs
+> (`validate-frontmatter.py` and `validate-documentation.py` both skip `.test-fixtures/`
+> when scanning the main docs tree). Use the pytest suite above to exercise these
+> fixtures programmatically.
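
The note in this README says the production runs skip `.test-fixtures/` when scanning the docs tree. The commit page does not show how the scripts do that, so the following is only a minimal sketch of one way to express such an exclusion. The names `EXCLUDED_DIRS` and `iter_docs` are hypothetical and do not come from the repository.

```python
from pathlib import Path

# Assumption: the excluded directory name is taken from the README note;
# the real scripts may configure this differently.
EXCLUDED_DIRS = frozenset({".test-fixtures"})


def iter_docs(root: Path, excluded: frozenset = EXCLUDED_DIRS):
    """Yield markdown files under root, skipping excluded directories."""
    for path in sorted(root.rglob("*.md")):
        # Only inspect directory components below root, not the file name.
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part in excluded for part in parent_parts):
            continue
        yield path
```

Checking path components relative to `root` keeps the filter from firing on a parent directory that happens to share the excluded name.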

docs/.test-fixtures/code-blocks.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+---
+title: "Code Blocks Test"
+version: "1.0.0"
+created: "2026-01-01"
+last-updated: "2026-03-24"
+status: "current"
+tags: ["test", "code-blocks", "escape-chars"]
+---
+
+# Code Blocks Test
+
+This document exercises escape-character and code-block edge cases that
+the validation scripts must handle without false positives.
+
+## Fenced Code Blocks — No False Links
+
+The VB.NET snippet below calls `_random.[Next](0, count)`, which looks like a
+markdown link `](0, count)` if the code fence is not properly detected.
+
+```vbnet
+Private Function GenerateIndex() As Integer
+    Dim index As Integer = _random.[Next](0, _eventNames.Count)
+    Return _messages(_random.[Next](0, _messages.Count))
+End Function
+```
+
+A C# snippet with square-bracket indexers:
+
+```csharp
+var result = myList[index].ToString();
+var nested = dict["key"][0];
+int[] arr = new int[items.Count];
+```
+
+The paths below are inside a fenced block and must **not** be extracted as
+file references (they are hypothetical examples, not real paths):
+
+```text
+src/Does/Not/Exist.cs
+docs/imaginary/file.md
+tests/NonExistent/Test.cs
+```
+
+## Inline Code — Safe Backtick Paths
+
+These inline code paths **should** be extracted because they appear in prose:
+
+- `src/S7Tools/Services/Profiles/StandardProfileManager.cs`
+
+## Escaped Markdown Characters
+
+The following use backslash escapes and must not trigger validators:
+
+\[this is not a link\](not-a-target.md)
+\`not inline code\`
+
+## Mixed Fences
+
+Opening fence with extra backticks should be matched correctly:
+
+````markdown
+This is a code block with four backticks.
+](docs/fake-link.md)
+src/Fake/File.cs
+````
+
+## Tildes as Fences
+
+~~~python
+def method(self, arg):
+    return self.items[index](0, len(self.items))
+~~~
+
+## Nested Code in Lists
+
+- Item with `src/S7Tools/Services/Tasking/ResourceCoordinator.cs` inline ref.
+- Item without any code.
+
+## Links Inside Code Spans
+
+The text `[not a link](not-a-file.md)` written inside backticks should NOT
+produce a broken-link error.
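
The escaped-character and code-span cases in this fixture can be handled by masking both before running a link regex. The sketch below is an illustration under assumptions, not the repository's validator: `ESCAPED`, `INLINE_CODE`, `LINK`, and `extract_link_targets` are hypothetical names. It also simplifies CommonMark slightly, since it removes backslash escapes before code spans and would therefore mishandle a code span that itself contains a backslash.

```python
import re

# Backslash followed by an ASCII punctuation character that markdown lets you escape.
ESCAPED = re.compile(r"\\[\\`*_{}\[\]()#+\-.!]")
# A code span: a run of backticks, some content, then the same run again.
INLINE_CODE = re.compile(r"(`+).+?\1")
# [text](target) where the target has no spaces or closing parenthesis.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def extract_link_targets(line: str) -> list[str]:
    """Return link targets found in prose, ignoring escapes and code spans."""
    cleaned = ESCAPED.sub("", line)          # drop \[ \] \` and similar
    cleaned = INLINE_CODE.sub("", cleaned)   # drop `...` and ``...`` spans
    return [match.group(1) for match in LINK.finditer(cleaned)]
```

With these rules the escaped line and the backtick-wrapped link in this fixture yield no targets, while an ordinary `[text](file.md)` link still does.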

docs/.test-fixtures/file-references.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+---
+title: "File References Test"
+version: "1.0.0"
+created: "2026-01-01"
+last-updated: "2026-03-24"
+status: "current"
+tags: ["test", "file-references"]
+---
+
+# File References Test
+
+This document exercises file-reference extraction and resolution.
+
+## Existing References (should all resolve)
+
+Inline backtick paths that exist in the repository:
+
+- `src/S7Tools/Services/Profiles/StandardProfileManager.cs`
+- `src/S7Tools.Core/Interfaces/Services/IProfileManager.cs`
+- `src/S7Tools.Core/Interfaces/Services/IProfileBase.cs`
+- `src/S7Tools/Services/Socat/SocatService.cs`
+- `src/S7Tools/Services/PowerSupply/PowerSupplyService.cs`
+- `src/S7Tools/Services/Tasking/ResourceCoordinator.cs`
+
+## Paths Inside Code Blocks (must NOT be extracted as file references)
+
+The paths inside the fenced block below are hypothetical examples and should
+not be counted as file references by the validator:
+
+```text
+src/Does/Not/Exist.cs
+docs/imaginary-file.md
+tests/Fake/FakeTest.cs
+src/S7Tools/Hypothetical/MyService.cs
+```
+
+## Relative Markdown Links
+
+Link to sibling: [valid document](valid-document.md)
+
+## Path in Markdown Link Syntax
+
+[StandardProfileManager](../../src/S7Tools/Services/Profiles/StandardProfileManager.cs)
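
This fixture mixes project-relative backtick paths with a `../../` link, so extracted references need to be resolved to real locations. The sketch below reuses the three `path_type` labels that `extract_file_references` assigns in this commit (`relative`, `absolute`, `project_relative`). The resolution rules themselves are an assumption, since the resolving code is not part of the diff shown here, and `classify_path` and `resolve_reference` are hypothetical names.

```python
from pathlib import Path


def classify_path(referenced_path: str) -> str:
    """Return the path_type label used by the extractor in this commit."""
    if referenced_path.startswith("../"):
        return "relative"
    if referenced_path.startswith("/"):
        return "absolute"
    return "project_relative"


def resolve_reference(referenced_path: str, source_file: Path, repo_root: Path) -> Path:
    """Map a reference to a filesystem path (assumed resolution rules)."""
    kind = classify_path(referenced_path)
    if kind == "relative":
        # "../" segments are taken relative to the document's own directory.
        return (source_file.parent / referenced_path).resolve()
    if kind == "absolute":
        return Path(referenced_path)
    # Everything else is treated as relative to the repository root.
    return (repo_root / referenced_path).resolve()
```

Under this labelling a sibling link such as `valid-document.md` is classified as `project_relative`, so a real validator would probably need an extra rule for links relative to the source document.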

docs/.test-fixtures/invalid-frontmatter.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+---
+title: "Invalid Frontmatter Test"
+version: "not-semver"
+created: "01/01/2026"
+status: "unknown-status"
+tags: []
+---
+
+# Invalid Frontmatter Test
+
+This document deliberately contains frontmatter violations so the validator
+can be tested against known-bad input.
+
+## Expected Violations
+
+| Rule | Field | Problem |
+|------|-------|---------|
+| META-002 | version | `not-semver` is not valid semver (expects `X.Y.Z`) |
+| META-003 | created | `01/01/2026` is not `YYYY-MM-DD` format |
+| META-001 | last-updated | Field is missing entirely |
+| META-004 | status | `unknown-status` is not a recognised status value |
+| META-005 | tags | Empty array is not allowed |
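
The expected-violations table maps naturally onto a small rule checker. The sketch below is hypothetical and works on an already-parsed frontmatter dictionary. The rule IDs come from the table, but the required-field list is inferred from the valid fixtures. The allowed status set is an assumption too: only `current` appears in this commit, and `draft` and `deprecated` are placeholders.

```python
import re
from datetime import datetime

# Assumption: fields seen in the well-formed fixtures are all required.
REQUIRED_FIELDS = ("title", "version", "created", "last-updated", "status", "tags")
# Assumption: only "current" is confirmed by the fixtures; the rest are placeholders.
ALLOWED_STATUSES = {"current", "draft", "deprecated"}
SEMVER = re.compile(r"\d+\.\d+\.\d+")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value) -> bool:
    """True when value is a real calendar date written as YYYY-MM-DD."""
    text = str(value)
    if not ISO_DATE.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_frontmatter(meta: dict) -> list[tuple[str, str]]:
    """Return (rule_id, field) pairs for every violated rule."""
    violations = [("META-001", name) for name in REQUIRED_FIELDS if name not in meta]
    if "version" in meta and not SEMVER.fullmatch(str(meta["version"])):
        violations.append(("META-002", "version"))
    if "created" in meta and not is_iso_date(meta["created"]):
        violations.append(("META-003", "created"))
    if "status" in meta and meta["status"] not in ALLOWED_STATUSES:
        violations.append(("META-004", "status"))
    if "tags" in meta and not meta["tags"]:
        violations.append(("META-005", "tags"))
    return violations
```

Fed the frontmatter of this fixture, the checker reports one violation per row of the table.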

docs/.test-fixtures/links.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+---
+title: "Links Test"
+version: "1.0.0"
+created: "2026-01-01"
+last-updated: "2026-03-24"
+status: "current"
+tags: ["test", "links"]
+---
+
+# Links Test
+
+This document exercises link-extraction edge cases for the link validator.
+
+## Valid Internal Links
+
+- Sibling: [valid-document](valid-document.md)
+- Another sibling: [code-blocks](code-blocks.md)
+
+## Anchor-Only Links (must not trigger broken-link errors)
+
+- [Jump to section](#valid-internal-links)
+- [Another anchor](#code-snippets-in-links)
+
+## External Links (must be ignored by the internal-link validator)
+
+- [GitHub](https://github.com/efargas/S7-Tools)
+- [Avalonia UI](https://avaloniaui.net)
+- [Microsoft Docs](https://docs.microsoft.com)
+
+## Links with Anchors (file must exist, anchor is informational)
+
+- [valid doc section](valid-document.md#file-references)
+
+## Code Snippets in Links
+
+Links written **inside** backticks must not be treated as real links:
+
+The text `` [not a link](no-such-file.md) `` is inline code, not a link.
+
+And inside a fenced block they must also be ignored:
+
+```markdown
+[fake link inside fence](totally-imaginary.md)
+```
+
+## Link with URL-Encoded Characters
+
+- [url encoded](valid-document.md)
+
+## mailto and Protocol Links (must be ignored)
+
+- [Send email](mailto:user@example.com)
+- [FTP resource](ftp://example.com/resource)
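
The link categories in this fixture (anchor-only, external or protocol, internal with an optional anchor) can be told apart with the standard library alone. This is a minimal sketch, not the repository's link validator: `classify_link` and `split_target` are hypothetical names. It also assumes POSIX-style relative targets, since a Windows drive path such as `C:\docs` would be read as having a scheme.

```python
from urllib.parse import unquote, urlsplit


def classify_link(target: str) -> str:
    """Classify a markdown link target as 'anchor', 'external', or 'internal'."""
    if target.startswith("#"):
        return "anchor"
    # Any scheme (https, mailto, ftp, ...) marks the link as not a local file.
    if urlsplit(target).scheme:
        return "external"
    return "internal"


def split_target(target: str) -> tuple[str, str]:
    """Split an internal target into (decoded file path, anchor)."""
    path, _, anchor = target.partition("#")
    return unquote(path), anchor
```

Only targets classified as `internal` need an existence check, and only on the path half returned by `split_target`, which matches the fixture's statement that the anchor is informational.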

docs/.test-fixtures/valid-document.md

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+---
+title: "Valid Test Document"
+version: "1.0.0"
+created: "2026-01-01"
+last-updated: "2026-03-24"
+status: "current"
+tags: ["test", "fixture", "validation"]
+---
+
+# Valid Test Document
+
+This document has correct frontmatter and is used to verify that the
+validation scripts correctly accept well-formed documentation.
+
+## File References
+
+The following file references exist in the repository and should all resolve:
+
+- `src/S7Tools/Services/Profiles/StandardProfileManager.cs`
+- `src/S7Tools.Core/Interfaces/Services/IProfileManager.cs`
+- `src/S7Tools.Core/Interfaces/Services/IProfileBase.cs`
+- `src/S7Tools/Services/Socat/SocatService.cs`
+- `src/S7Tools/Services/PowerSupply/PowerSupplyService.cs`
+- `src/S7Tools/Services/Tasking/ResourceCoordinator.cs`
+
+## Code Snippets
+
+C# code inside fenced blocks must **not** generate false-positive file references
+or link targets.
+
+```csharp
+// Correct: StandardProfileManager implements the unified pattern
+public class MyProfileService : StandardProfileManager<MyProfile>
+{
+    protected override MyProfile CreateDefaultProfile() =>
+        new() { Name = "Default" };
+}
+```
+
+Inline code like `src/S7Tools/Services/Profiles/StandardProfileManager.cs` is fine
+because the validator intentionally resolves inline-backtick paths.
+
+```csharp
+// This is a simplified illustration
+// ... simplified
+public class SimplifiedExample
+{
+    // work omitted for brevity
+}
+```
+
+## Internal Links
+
+Link to a sibling fixture: [invalid-frontmatter](invalid-frontmatter.md)
+
+## Escaped Characters
+
+Markdown allows escape sequences: \*not bold\*, \[not a link\], \`not code\`.
+These must not be mistaken for real syntax by the validators.

scripts/entities.py

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@ class NamespaceValidation:
     def __post_init__(self):
         """Validate entity after initialization."""
         valid_categories = [
-            "Base", "Controls", "Dialogs", "Jobs", "Layout",
+            "Base", "Components", "Controls", "Dialogs", "Hex", "Jobs", "Layout",
             "Pages", "Profiles", "Settings", "Tasks"
         ]
 
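
The hunk above only shows the category list growing by `Components` and `Hex`; the fields of `NamespaceValidation` and its failure behaviour are not visible. As a rough illustration of the `__post_init__` validation pattern, the sketch below uses a hypothetical stand-in class `NamespaceCheck` with assumed `namespace` and `category` fields, and assumes that an unknown category raises `ValueError`.

```python
from dataclasses import dataclass

# Category names as they appear after this commit.
VALID_CATEGORIES = frozenset({
    "Base", "Components", "Controls", "Dialogs", "Hex", "Jobs", "Layout",
    "Pages", "Profiles", "Settings", "Tasks",
})


@dataclass
class NamespaceCheck:
    """Hypothetical stand-in for NamespaceValidation; field names are assumed."""
    namespace: str
    category: str

    def __post_init__(self):
        # Runs automatically after the generated __init__ assigns the fields.
        if self.category not in VALID_CATEGORIES:
            raise ValueError(
                f"Unknown category {self.category!r}; "
                f"expected one of {sorted(VALID_CATEGORIES)}"
            )
```

With the two new entries in place, `NamespaceCheck("Example.Hex", "Hex")` constructs normally, whereas before the change an equivalent check would have rejected it.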

scripts/extractors/markdown_parser.py

Lines changed: 47 additions & 14 deletions
@@ -116,6 +116,9 @@ def extract_file_references(self, content: str, source_file: str) -> list[FilePa
         - tests/...
         - Relative paths: ../path/to/file
 
+        Content inside fenced code blocks is excluded to avoid false positives
+        from code examples that reference hypothetical or template paths.
+
         Args:
             content: Markdown file content
             source_file: Path to source file
@@ -125,34 +128,64 @@ def extract_file_references(self, content: str, source_file: str) -> list[FilePa
         """
         references = []
 
-        # Regex patterns for file paths
-        patterns = [
+        # Inline code patterns – these target explicit backtick-wrapped paths in prose
+        inline_patterns = [
             r'`(src/[^`]+\.(cs|csproj|axaml|json))`',
             r'`(docs/[^`]+\.md)`',
             r'`(tests/[^`]+\.(cs|csproj))`',
-            r'\]\((\.\./[^)]+\.md)\)',  # Relative markdown links
+        ]
+
+        # Link patterns – match markdown link syntax [text](path)
+        link_patterns = [
+            r'\]\((\.\./[^)]+\.md)\)',  # Relative markdown links
             r'\]\(([^)]+\.(cs|md|json|axaml))\)',  # Any file in markdown links
         ]
 
-        for line_num, line in enumerate(content.split('\n'), start=1):
-            for pattern in patterns:
+        lines = content.split('\n')
+        in_code_fence = False
+
+        for line_num, line in enumerate(lines, start=1):
+            # Track fenced code block boundaries
+            stripped = line.strip()
+            if stripped.startswith('```') or stripped.startswith('~~~'):
+                in_code_fence = not in_code_fence
+                continue
+
+            # Skip all extraction while inside a fenced code block –
+            # paths and links there are illustrative/hypothetical, not real refs.
+            if in_code_fence:
+                continue
+
+            # Extract inline backtick patterns (prose references)
+            for pattern in inline_patterns:
                 for match in re.finditer(pattern, line):
                     referenced_path = match.group(1)
+                    path_type = "relative" if referenced_path.startswith('../') else (
+                        "absolute" if referenced_path.startswith('/') else "project_relative"
+                    )
+                    references.append(FilePathReference(
+                        source_file=source_file,
+                        line_number=line_num,
+                        referenced_path=referenced_path,
+                        path_type=path_type,
+                        exists=False
+                    ))
 
-                    # Determine path type
-                    if referenced_path.startswith('../'):
-                        path_type = "relative"
-                    elif referenced_path.startswith('/'):
-                        path_type = "absolute"
-                    else:
-                        path_type = "project_relative"
-
+            # Extract link patterns; strip inline code spans first to avoid
+            # matching syntax written inside backticks.
+            line_no_inline = re.sub(r'`[^`]+`', '', line)
+            for pattern in link_patterns:
+                for match in re.finditer(pattern, line_no_inline):
+                    referenced_path = match.group(1)
+                    path_type = "relative" if referenced_path.startswith('../') else (
+                        "absolute" if referenced_path.startswith('/') else "project_relative"
+                    )
                     references.append(FilePathReference(
                         source_file=source_file,
                         line_number=line_num,
                         referenced_path=referenced_path,
                         path_type=path_type,
-                        exists=False  # Will be validated later
+                        exists=False
                     ))
 
         return references
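
The fence tracking in this diff toggles on any line that starts with three backticks or tildes. That covers the fixtures in this commit, but a four-backtick block that contains a three-backtick line, or a tilde block that contains a backtick line, would flip the state early. The sketch below shows a stricter tracker based on the CommonMark rule that a closing fence must use the same character as the opening fence, be at least as long, and carry no info string. It is an alternative illustration rather than the code in this commit, and `FENCE` and `prose_lines` are hypothetical names.

```python
import re

# Up to three spaces of indentation, then a run of 3+ backticks or tildes.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def prose_lines(content: str):
    """Yield (line_number, line) for every line outside fenced code blocks."""
    open_char, open_len = "", 0
    for line_num, line in enumerate(content.split("\n"), start=1):
        match = FENCE.match(line)
        if open_len:
            # Inside a block: only a bare fence of the same character and
            # at least the opening length closes it.
            if (match and match.group(1)[0] == open_char
                    and len(match.group(1)) >= open_len
                    and not match.group(2).strip()):
                open_char, open_len = "", 0
            continue
        # A backtick fence may not have backticks in its info string.
        if match and not (match.group(1)[0] == "`" and "`" in match.group(2)):
            open_char, open_len = match.group(1)[0], len(match.group(1))
            continue
        yield line_num, line
```

An extraction loop could iterate over `prose_lines(content)` instead of maintaining its own boolean flag, keeping the original line numbers for reporting.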
