Commit 9626086

Refactor documentation for tools (#2949)
2 parents 018f463 + 80f7905 commit 9626086

File tree

16 files changed

+2396
-144
lines changed

Lines changed: 42 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,42 @@
1+
---
2+
applyTo: "docs/content/docs/**/_index.md"
3+
---
4+
5+
# Documentation Standards
6+
7+
All documentation content **must stay in sync with the codebase**.
8+
List pages show the `description` from the front matter, followed by a card for each sub-page. This layout is applied automatically.
9+
10+
---
11+
12+
## Front Matter
13+
14+
- Only edit the `description`.
15+
- The `description` must be **SEO/GEO friendly**:
16+
- Use clear, relevant keywords.
17+
- Keep it concise, about 150–160 characters.
18+
- Write for humans first.
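
The length guideline above can be checked mechanically. The following is a minimal sketch, not part of the repository's tooling; the function name `description_length_ok` and the treatment of 150 and 160 as hard bounds are assumptions made for illustration, since the rule itself says "about".

```python
def description_length_ok(description: str, low: int = 150, high: int = 160) -> bool:
    """Return True when the trimmed description length falls within [low, high]."""
    return low <= len(description.strip()) <= high


# Hypothetical descriptions used only to exercise the check.
in_range = description_length_ok("x" * 155)
too_short = description_length_ok("Tools reference.")
```

A check like this only covers length; keyword choice and readability still need a human review.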
19+
20+
---
21+
22+
## Content
23+
24+
The content appears **after the list of sub-pages**.
25+
It must describe the overall purpose of the collection or section, based on an **inspection of all current sub-pages**.
26+
27+
Write **concise generalisations** that:
28+
- Summarise common themes, capabilities, and patterns found across the sub-pages.
29+
- Call out notable variations only when they matter to readers choosing where to go next.
30+
- Link to one or two representative sub-pages when helpful, not an exhaustive list.
31+
32+
---
33+
34+
## Rules
35+
36+
- Content appears **after the auto-generated list of sub-pages**; do not add additional lists.
37+
- Use the content to give context and orientation, not item-level documentation.
38+
- Base statements on an actual review of all sub-pages in the section.
39+
- Keep wording concise and consistent with the page’s keywords, and ensure the `description` aligns with the content.
40+
- Update this content whenever sub-pages are added, removed, or materially changed.
41+
42+
---

.github/instructions/docs.content.docs.instructions.md renamed to .github/instructions/docs.content.docs.single.instructions.md

Lines changed: 5 additions & 3 deletions
@@ -1,11 +1,12 @@
11
---
2-
applyTo: "docs/content/docs/**"
2+
applyTo: "docs/content/docs/**/index.md"
33
---
44

55
# Documentation Standards
66

77
All documentation content **must stay in sync with the codebase**.
8-
Every page must accurately reflect the associated **data files** and **schemas**.
8+
Every page must accurately reflect the associated **data files** and **schemas**.
9+
Existing manually added content must be preserved. Reorganise or adapt it as needed, but do not remove it.
910

1011
---
1112

@@ -25,7 +26,8 @@ Each documentation file **may include** the following properties:
2526

2627
## Documentation Structure
2728

28-
Documentation files should generally include these sections (in order):
29+
Documentation files should generally include these sections (in order).
30+
Manually added content should be placed into the most relevant section, or reorganised if necessary.
2931

3032
1. **Overview**
3133
- Subsections: *How It Works*, *Use Cases*

docs/content/docs/reference/tools/stringmanipulatortool/index.md

Lines changed: 184 additions & 9 deletions
@@ -1,7 +1,8 @@
11
---
22
title: String Manipulator Tool
3-
description: Used to process the String fields of a work item. This is useful for cleaning up data. It will limit fields to a max length and apply regex replacements based on what is configured. Each regex replacement is applied in order and can be enabled or disabled.
3+
description: Processes and cleans up string fields in work items by applying regex patterns, length limitations, and text transformations. Essential for data cleanup and standardization during migration.
44
dataFile: reference.tools.stringmanipulatortool.yaml
5+
schemaFile: schema.tools.stringmanipulatortool.json
56
slug: string-manipulator-tool
67
aliases:
78
- /docs/Reference/Tools/StringManipulatorTool
@@ -12,13 +13,39 @@ date: 2025-06-24T12:07:31Z
1213
discussionId: 2643
1314
---
1415

15-
{{< class-description >}}
16+
## Overview
1617

17-
## Options
18+
The String Manipulator Tool provides powerful text processing capabilities for work item migration. It applies configurable string manipulations to all text fields in work items, enabling data cleanup, standardization, and format corrections during the migration process.
1819

19-
{{< class-options >}}
20+
The tool processes string fields through a series of regex-based manipulators that can remove invalid characters, standardize formats, replace text patterns, and enforce field length limits. Each manipulation is applied in sequence and can be individually enabled or disabled.
21+
22+
### How It Works
23+
24+
The String Manipulator Tool operates on all string fields within work items during migration:
25+
26+
1. **Field Processing**: The tool identifies all string-type fields in each work item
27+
2. **Sequential Application**: Each configured manipulator is applied in the order defined in the configuration
28+
3. **Regex Transformations**: Pattern-based replacements using regular expressions
29+
4. **Length Enforcement**: Truncates fields that exceed the maximum allowed length
30+
5. **Conditional Execution**: Each manipulator can be individually enabled or disabled
31+
32+
The tool is automatically invoked by migration processors and applies transformations before work items are saved to the target system.
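
The processing order described above can be sketched in a few lines. This is an illustration in Python rather than the tool's own implementation: the names `Manipulator` and `process_field` are hypothetical, Python's `re` module stands in for the tool's regex engine, and truncation is assumed to run after the regex replacements, following the order of the numbered list.

```python
import re
from dataclasses import dataclass


@dataclass
class Manipulator:
    """Illustrative stand-in for one entry in the Manipulators array."""
    pattern: str
    replacement: str
    enabled: bool = True


def process_field(value: str, manipulators: list[Manipulator], max_length: int) -> str:
    """Apply each enabled manipulator in order, then truncate to max_length."""
    for m in manipulators:
        if not m.enabled:
            continue  # disabled manipulators are skipped entirely
        value = re.sub(m.pattern, m.replacement, value)
    return value[:max_length]


manipulators = [
    Manipulator(pattern=r"[^\x20-\x7E\r\n\t]", replacement=""),
    Manipulator(pattern=r"\s+", replacement=" ", enabled=False),
]
# "é" and the NUL character are removed; the disabled whitespace rule is not applied.
result = process_field("Caf\u00e9  menu\x00", manipulators, max_length=8)
```

Because each manipulator receives the output of the previous one, reordering the list can change the final value.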
33+
34+
### Use Cases
35+
36+
Common scenarios where the String Manipulator Tool is essential:
2037

21-
## Samples
38+
- **Data Cleanup**: Removing invalid Unicode characters, control characters, or formatting artifacts
39+
- **Format Standardization**: Converting text patterns to consistent formats
40+
- **Length Compliance**: Ensuring field values don't exceed target system limits
41+
- **Character Encoding**: Fixing encoding issues from legacy systems
42+
- **Pattern Replacement**: Updating URLs, paths, or references to match target environment
43+
44+
## Configuration Structure
45+
46+
### Options
47+
48+
{{< class-options >}}
2249

2350
### Sample
2451

@@ -28,13 +55,161 @@ discussionId: 2643
2855

2956
{{< class-sample sample="defaults" >}}
3057

31-
### Classic
58+
### Basic Examples
59+
60+
The String Manipulator Tool is configured with an array of manipulators, each defining a specific text transformation:
61+
62+
```json
63+
{
64+
"StringManipulatorTool": {
65+
"Enabled": true,
66+
"MaxStringLength": 1000000,
67+
"Manipulators": [
68+
{
69+
"$type": "RegexStringManipulator",
70+
"Enabled": true,
71+
"Description": "Remove invalid characters",
72+
"Pattern": "[^\\x20-\\x7E\\r\\n\\t]",
73+
"Replacement": ""
74+
}
75+
]
76+
}
77+
}
78+
```
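
The doubled backslashes in `Pattern` are JSON escaping, not part of the regex. The sketch below, written in Python purely for illustration, parses the same configuration and shows the pattern the regex engine actually receives. The sample input string is made up, and Python's `re` module is assumed to behave like the tool's engine for this simple character class.

```python
import json
import re

config_text = r'''
{
  "StringManipulatorTool": {
    "Enabled": true,
    "MaxStringLength": 1000000,
    "Manipulators": [
      {
        "$type": "RegexStringManipulator",
        "Enabled": true,
        "Description": "Remove invalid characters",
        "Pattern": "[^\\x20-\\x7E\\r\\n\\t]",
        "Replacement": ""
      }
    ]
  }
}
'''

manipulator = json.loads(config_text)["StringManipulatorTool"]["Manipulators"][0]
# JSON decoding turns each "\\" into a single backslash.
pattern = manipulator["Pattern"]
# The bullet (U+2022) and the bell character (0x07) fall outside the allowed set.
cleaned = re.sub(pattern, manipulator["Replacement"], "Title\u2022 with\ttab\x07")
```

Writing a single backslash in the JSON file (for example `"\x20"`) would be rejected by a strict JSON parser, which is why the escapes are doubled.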
79+
80+
### Complex Examples
81+
82+
#### Manipulator Types
83+
84+
Currently, the tool supports the following manipulator types:
85+
86+
- **RegexStringManipulator**: Applies regular expression pattern matching and replacement
87+
88+
#### Manipulator Properties
89+
90+
Each manipulator supports these properties:
91+
92+
- **$type**: Specifies the manipulator type (e.g., "RegexStringManipulator")
93+
- **Enabled**: Boolean flag to enable/disable this specific manipulator
94+
- **Description**: Human-readable description of what the manipulator does
95+
- **Pattern**: Regular expression pattern to match text
96+
- **Replacement**: Text to replace matched patterns (can be empty string for removal)
97+
98+
## Common Scenarios
99+
100+
### Removing Invalid Characters
101+
102+
Remove non-printable characters that may cause issues in the target system:
103+
104+
```json
105+
{
106+
"$type": "RegexStringManipulator",
107+
"Description": "Remove non-printable characters",
108+
"Enabled": true,
109+
"Pattern": "[^( -~)\n\r\t]+",
110+
"Replacement": ""
111+
}
112+
```
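
The pattern above is unanchored, so it removes matching characters wherever they occur in a field. As a hedged check, the Python sketch below compares it with the hex-escaped pattern from the Basic Examples section over a small range of code points. Python's `re` module is used as a stand-in for the tool's regex engine, and the sample strings are invented.

```python
import re

# After JSON decoding, \n, \r and \t in the Pattern are real control characters.
# Inside a character class the parentheses are literals already covered by " -~".
unanchored = re.compile("[^( -~)\n\r\t]+")
hex_form = re.compile(r"[^\x20-\x7E\r\n\t]")

# Every code point from U+0000 to U+02FF, in order.
sample = "".join(chr(cp) for cp in range(0x300))
kept_a = unanchored.sub("", sample)
kept_b = hex_form.sub("", sample)

# Control characters are removed at the start, middle, and end alike.
scrubbed = unanchored.sub("", "\x00start mid\x01dle end\x02")
```

Both patterns keep tab, line feed, carriage return, and printable ASCII, and drop everything else in the sampled range.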
113+
114+
### Standardizing Line Endings
115+
116+
Convert all line endings to a consistent format:
117+
118+
```json
119+
{
120+
"$type": "RegexStringManipulator",
121+
"Description": "Standardize line endings to CRLF",
122+
"Enabled": true,
123+
"Pattern": "\r\n|\n|\r",
124+
"Replacement": "\r\n"
125+
}
126+
```
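
The order of the alternatives matters here: `\r\n` comes first so that an existing CRLF pair is matched as one unit instead of being expanded twice. The Python sketch below illustrates this with a made-up sample string; `to_crlf` is a hypothetical helper name.

```python
import re

# Same alternation as the manipulator above, with CRLF listed first.
LINE_ENDING = re.compile(r"\r\n|\n|\r")


def to_crlf(text: str) -> str:
    """Rewrite every LF, CR, or CRLF line break as CRLF."""
    return LINE_ENDING.sub("\r\n", text)


mixed = "one\ntwo\rthree\r\nfour"
normalised = to_crlf(mixed)
# Running the replacement again leaves already-normalised text unchanged.
twice = to_crlf(normalised)
```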
127+
128+
### Cleaning HTML Content
129+
130+
Remove or clean HTML tags from text fields:
131+
132+
```json
133+
{
134+
"$type": "RegexStringManipulator",
135+
"Description": "Remove HTML tags",
136+
"Enabled": true,
137+
"Pattern": "<[^>]*>",
138+
"Replacement": ""
139+
}
140+
```
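
A pattern this broad can also remove text that merely looks like a tag. The Python sketch below shows the intended case and one unintended match; both input strings are invented for illustration, and `strip_tags` is a hypothetical helper name.

```python
import re

TAG = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    """Remove every run that starts with '<' and ends at the next '>'."""
    return TAG.sub("", text)


clean = strip_tags("<p>Hello <b>world</b></p>")
# Plain comparison operators are treated as one long "tag" and removed.
surprise = strip_tags("if a < b and c > d")
```

Fields that may contain literal `<` and `>` characters are worth testing before enabling a manipulator like this one.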
141+
142+
### Fixing Encoding Issues
143+
144+
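Replace common encoding artifacts, such as curly quotes that were decoded with the wrong character set. This example maps every matched sequence, including the double-quote artifacts, to a plain apostrophe; use separate manipulators if single and double quotes need different replacements: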
145+
146+
```json
147+
{
148+
"$type": "RegexStringManipulator",
149+
"Description": "Fix common encoding issues",
150+
"Enabled": true,
151+
"Pattern": "’|“|â€\u009d",
152+
"Replacement": "'"
153+
}
154+
```
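
Artifacts like these typically appear when UTF-8 bytes are decoded as Windows-1252. The Python sketch below reproduces that for a right single quotation mark and then applies the same three alternatives as the pattern above. It is an illustration only: the sample word is made up, and the alternatives are written as Unicode escapes so the file encoding cannot alter them.

```python
import re

# The three alternatives from the Pattern above, written as escapes.
MOJIBAKE = re.compile("\u00e2\u20ac\u2122|\u00e2\u20ac\u0153|\u00e2\u20ac\u009d")

# UTF-8 bytes of U+2019, wrongly decoded as Windows-1252.
broken = "don\u2019t".encode("utf-8").decode("cp1252")
fixed = MOJIBAKE.sub("'", broken)
```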
155+
156+
## Good Practices
157+
158+
### Pattern Testing
159+
160+
- **Test regex patterns** thoroughly before applying to production data
161+
- **Use regex testing tools** to validate patterns against sample data
162+
- **Consider edge cases** and unintended matches in your patterns
163+
164+
### Performance Considerations
165+
166+
- **Order manipulators efficiently**: Place simpler patterns before complex ones
167+
- **Use specific patterns**: Avoid overly broad regex that may match unintended content
168+
- **Consider field length**: Set appropriate `MaxStringLength` to prevent excessive processing
169+
170+
### Data Safety
171+
172+
- **Backup source data**: Always maintain backups before applying string manipulations
173+
- **Test with sample data**: Validate manipulations on a subset before full migration
174+
- **Review results**: Check processed fields to ensure transformations are correct
175+
176+
### Configuration Management
177+
178+
- **Document patterns**: Include clear descriptions for each manipulator
179+
- **Version control**: Maintain configuration files in version control
180+
- **Incremental changes**: Test one manipulator at a time when developing complex transformations
181+
182+
## Troubleshooting
183+
184+
### Common Issues
185+
186+
**Manipulations Not Applied:**
187+
188+
- Verify the tool is enabled (`"Enabled": true`)
189+
- Check that individual manipulators are enabled
190+
- Review regex patterns for syntax errors
191+
- Ensure the tool is configured in the processor's tool list
192+
193+
**Unexpected Results:**
194+
195+
- Test regex patterns in isolation with sample data
196+
- Check the order of manipulators (they execute sequentially)
197+
- Verify escape sequences in JSON configuration
198+
- Review field content before and after processing
199+
200+
**Performance Issues:**
32201

33-
{{< class-sample sample="classic" >}}
202+
- Consider reducing `MaxStringLength` if processing very large fields
203+
- Optimize regex patterns to avoid catastrophic backtracking
204+
- Disable unnecessary manipulators
205+
- Process smaller batches of work items
34206

35-
## Metadata
207+
**Regex Pattern Errors:**
36208

37-
{{< class-metadata >}}
209+
- Validate regex syntax using online tools or testing utilities
210+
- Escape special characters properly in JSON configuration
211+
- Consider case sensitivity requirements
212+
- Test patterns against various input scenarios
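
A quick syntax pass over all configured patterns can catch unbalanced brackets and groups before a migration run. The sketch below uses Python's `re` module, whose syntax differs in places from the regex engine the tool uses, so treat it as a first check rather than a guarantee. `find_invalid_patterns` is a hypothetical helper, and the two broken patterns are deliberately malformed examples.

```python
import re


def find_invalid_patterns(patterns):
    """Return (pattern, error message) pairs for patterns that fail to compile."""
    problems = []
    for p in patterns:
        try:
            re.compile(p)
        except re.error as exc:
            problems.append((p, str(exc)))
    return problems


problems = find_invalid_patterns([r"<[^>]*>", r"[^\x20-\x7E", r"(unclosed"])
bad_patterns = [p for p, _ in problems]
```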
38213

39214
## Schema
40215
