cleanrepo/Options.cs: 13 additions & 0 deletions
@@ -53,6 +53,19 @@ class Options
     [Option("remove-hops", HelpText = "Clean redirection JSON file by replacing targets that are themselves redirected (daisy chains).")]
     public bool RemoveRedirectHops { get; set; }
 
+    [Option("catalog-images-with-text", Default = false, HelpText = "Map images to the markdown/YAML files that reference them, with all text found in images. Must set --ocr-model-directory path.")]
+    public bool CatalogImagesWithText { get; set; }
+
+    [Option("filter-images-for-text", Default = false, HelpText = "Filter images for text. Must set --ocr-model-directory and --filter-text-json-file paths.")]
+    public bool FilterImagesForText { get; set; }
+
+    [Option("ocr-model-directory", HelpText = "Directory that contains the OCR (Tesseract) models for image scanning.")]
+    public string? OcrModelDirectory { get; set; }
+
+    [Option("filter-text-json-file", HelpText = "JSON file of array of strings to filter OCR results with.")]
+    public string? FilterTextJsonFile { get; set; }
+
+
     //[Option("format-redirects", Required = false, HelpText = "Format the redirection JSON file by deserializing and then serializing with pretty printing.")]
 - Map images to the files that reference them and return text from those images.
 - Find and delete orphaned "shared" markdown files (includes).
 - Find and delete orphaned snippet (.cs, .vb, .cpp, .fs, and .xaml) files.
 - Find and replace links to redirected files.
 - Remove daisy chains (or hops) within the redirection files for the docset.
 - Replace site-relative links with file-relative links (includes image links).
+- Filter image list based on strings found in images.
 
 ## Usage
 
@@ -23,6 +24,10 @@ This command-line tool helps you clean up a DocFx-based content repo. It can:
 | --replace-redirects | Find backlinks to redirected files and replace with new target. |
 | --remove-hops | Remove daisy chains within the redirection files for the docset. |
 | --relative-links | Replace site-relative links with file-relative links. |
+| --catalog-images-with-text | Map images to the markdown/YAML files that reference them, with all text found in images. Must set --ocr-model-directory path. |
+| --filter-images-for-text | Filter images for text. Must set --ocr-model-directory and --filter-text-json-file paths. |
+| --ocr-model-directory | Directory that contains the OCR (Tesseract) models for image scanning. |
+| --filter-text-json-file | JSON file of array of strings to filter OCR results with. |
 
 ## Usage examples
 
@@ -43,3 +48,66 @@ This command-line tool helps you clean up a DocFx-based content repo. It can:
 ```
 CleanRepo.exe --orphaned-images
 ```
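
The new image options can be combined in the same way as the existing flags. The following is a minimal sketch, not taken from the PR: the helper function names, the `./tessdata` and `./filter-text.json` paths, and the search terms are all hypothetical placeholders. It writes a filter file in the documented shape (a JSON array of strings) and prints the two command lines instead of running them, on the assumption that `CleanRepo.exe` might not be on your `PATH`.

```shell
#!/bin/sh
# Sketch only: the paths and search terms below are hypothetical placeholders.

# Write a filter file: a JSON array of strings to look for in OCR results.
write_filter_file() {
    # $1 = output path
    cat > "$1" <<'EOF'
["Application Insights", "Deployment Center"]
EOF
}

# Build (but do not run) the command line for --catalog-images-with-text.
catalog_command() {
    # $1 = OCR model directory
    printf 'CleanRepo.exe --catalog-images-with-text --ocr-model-directory %s\n' "$1"
}

# Build (but do not run) the command line for --filter-images-for-text.
filter_command() {
    # $1 = OCR model directory, $2 = filter JSON file
    printf 'CleanRepo.exe --filter-images-for-text --ocr-model-directory %s --filter-text-json-file %s\n' "$1" "$2"
}

write_filter_file ./filter-text.json
catalog_command ./tessdata
filter_command ./tessdata ./filter-text.json
```

Printing the commands keeps the sketch safe to run anywhere; copy the printed line into your shell once the model directory and filter file exist.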
+
+## Text to image examples
+
+The `--catalog-images-with-text` and `--filter-images-for-text` options extract text from images (OCR) by using the [Tesseract](https://www.nuget.org/packages/tesseract/) NuGet package.
+
+### Get the Tesseract models
+
+Decide which Tesseract models you want to use and install them on your system. Tesseract models are generated per operating system and come in a variety of sizes. You also need to download the language data files for Tesseract 4.0.0 or later from [tesseract-tessdata](https://github.com/tesseract-ocr/tessdata/). Use the `--ocr-model-directory` option to set the path to the model directory.
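
One way to fetch a language data file is sketched below. This is an illustration under stated assumptions, not a step from the PR: it assumes `curl` is installed and that the tessdata repository serves `<lang>.traineddata` files from its `main` branch. The function names and the `./tessdata` directory are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes curl is installed and that the tessdata repository
# serves <lang>.traineddata files from its main branch.

# Print the assumed download URL for a language code such as "eng".
tessdata_url() {
    printf 'https://github.com/tesseract-ocr/tessdata/raw/main/%s.traineddata\n' "$1"
}

# Download one language data file into a model directory.
fetch_model() {
    # $1 = language code, $2 = target directory (hypothetical, e.g. ./tessdata)
    mkdir -p "$2" && curl -fsSL -o "$2/$1.traineddata" "$(tessdata_url "$1")"
}

# Example (not run here): fetch_model eng ./tessdata
# Then pass the directory to the tool: --ocr-model-directory ./tessdata
```

The download is wrapped in a function so that nothing is fetched until you call it explicitly with the language code and directory you want.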
"Value": "Function App\n\n\u00AE Overview\n\n\n\n| View Application Insights data G)\n\n\n\n\n\n\n\n\n\nActivity log Link to an Application Insights resource\n8. Access control (IAM)\n\u00A9 tes \u00A9 temepiseaieiin yt eb ise ea\n\n@ Diagnose and solve problems\n\n\u00A9 Microsoft Defender for Cloud @ totum Apptzation ihe of check that Applicaton nights OK ard the insramentaion key are removed rm your apliaton,\n\n\u0026 events (preview)\n\nFunctions O) \u00E9sarteg etiam caer toe Gorman Vier Tc home\nApplication Insights. You have the option to disable non-essential data collection, Learn more\n(A) Functions\n\u00A9 App keys\nChange your resource\nB App files\n\n\n\nDeployment\n\n= Deployment slots\n@ Deployment Center\nSettings\n\nHl Configuration\n\n\u0026\u0026 Authentication\n\n\u00AE Application insights\n\n\n"