 await XmlTokenizer.Create().ParseAsync(stream, onToken: token => { /* handle xml-tokens here */ });

+// kickoff html tokenizer*
+await HtmlTokenizer.Create().ParseAsync(stream, onToken: token => { /* handle html-tokens here */ });
+
 // kickoff yaml tokenizer
 await YamlTokenizer.Create().ParseAsync(stream, onToken: token => { /* handle yaml-tokens here */ });
```
+* You also have to handle the script and style Tokenizers. Check out the [docs](https://crwsolutions.github.io/ntokenizers/html) for more information.
+
 ## Overview

-NTokenizers is a .NET library written in C# that provides tokenizers for processing structured text formats like Markdown, JSON, XML, YAML, SQL, Typescript, CSS and CSharp. The `Tokenize` method is the core functionality that breaks down structured text into meaningful components (tokens) for processing. Its key feature is **stream processing capability** - it can handle data as it arrives in real-time, making it ideal for processing large files or streaming data without loading everything into memory at once.
+NTokenizers is a .NET library written in C# that provides tokenizers for processing structured text formats like Markdown, JSON, XML, HTML, YAML, SQL, Typescript, CSS and CSharp. The `Tokenize` method is the core functionality that breaks down structured text into meaningful components (tokens) for processing. Its key feature is **stream processing capability** - it can handle data as it arrives in real-time, making it ideal for processing large files or streaming data without loading everything into memory at once.

 > [!WARNING]
 >
@@ -75,6 +80,17 @@ The same principle applies to inline tokenizers such as Heading, Blockquote, Lis
 │       └─────────┘
 │
 │       ┌─────────┐
+├──────►│  html   │ ───► fire html tokens
+│       └─────────┘
+│            │
+│            ▼       ┌─────────┐
+│            ├──────►│   css   │ ───► fire css tokens
+│            │       └─────────┘
+│            │
+│            │       ┌─────────┐
+│            └──────►│ script  │ ───► fire typescript tokens
+│                    └─────────┘
+│       ┌─────────┐
 └──────►│  etc..  │ ───► etc
         └─────────┘
```
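
The delegation in the diagram above can be sketched without the library. The following is a minimal, hypothetical illustration and not NTokenizers' implementation: `NestedTokenizerSketch`, `TokenizeHtml` and `TokenizeCss` are invented names, the input is a plain string rather than a stream, and a token is reduced to a (tokenizer name, text) pair.

```csharp
using System;

// Hypothetical sketch: none of these types or methods come from NTokenizers.
public static class NestedTokenizerSketch
{
    // Characters treated as part of a CSS "word" (selectors, names, values).
    private static bool IsWordChar(char c) =>
        char.IsLetterOrDigit(c) || c == '-' || c == '#' || c == '.';

    // Splits CSS-like text into word and punctuation tokens and fires each one.
    public static void TokenizeCss(string css, Action<string, string> onToken)
    {
        int i = 0;
        while (i < css.Length)
        {
            char c = css[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }

            if (IsWordChar(c))
            {
                int start = i;
                while (i < css.Length && IsWordChar(css[i])) i++;
                onToken("css", css.Substring(start, i - start));
            }
            else
            {
                onToken("css", c.ToString());
                i++;
            }
        }
    }

    // Walks HTML-like text. Tags and text become "html" tokens; the body of
    // a <style> element is handed to the CSS tokenizer instead.
    public static void TokenizeHtml(string html, Action<string, string> onToken)
    {
        int i = 0;
        while (i < html.Length)
        {
            if (html[i] == '<')
            {
                int close = html.IndexOf('>', i);
                if (close < 0) break; // unterminated tag: stop the sketch here

                string tag = html.Substring(i, close - i + 1);
                onToken("html", tag);
                i = close + 1;

                if (tag == "<style>")
                {
                    // Delegate everything up to </style> to the nested tokenizer.
                    int end = html.IndexOf("</style>", i, StringComparison.Ordinal);
                    if (end < 0) end = html.Length;
                    TokenizeCss(html.Substring(i, end - i), onToken);
                    i = end;
                }
            }
            else
            {
                int next = html.IndexOf('<', i);
                if (next < 0) next = html.Length;
                string text = html.Substring(i, next - i).Trim();
                if (text.Length > 0) onToken("html", text);
                i = next;
            }
        }
    }
}
```

Calling `NestedTokenizerSketch.TokenizeHtml("<style>a{}</style>", (kind, text) => Console.WriteLine($"{kind}: {text}"))` reports the `<style>` tag as an html token, then `a`, `{` and `}` as css tokens, and finally `</style>` as an html token again, mirroring the html to css arrow in the diagram.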
@@ -84,12 +100,16 @@ The same principle applies to inline tokenizers such as Heading, Blockquote, Lis
 Here's a simple example showing how to use the `MarkdownTokenizer`:

```csharp
+using NTokenizers.Core;
+using NTokenizers.Css;
+using NTokenizers.Html;
 using NTokenizers.Json;
 using NTokenizers.Markdown;
 using NTokenizers.Markdown.Metadata;
 using NTokenizers.Typescript;
 using NTokenizers.Xml;
 using Spectre.Console;
+using System.Diagnostics;
 using System.IO.Pipes;
 using System.Text;

@@ -101,6 +121,14 @@ class Program
 Here is some **bold** text and some *italic* text.

 # NTokenizers Showcase
+
+## Css example
+```css
+.user {
+    color: #FFFFFF;
+    active: true;
+}
+```

 ## XML example
```xml
@@ -109,6 +137,28 @@ class Program
 </user>
```

+## HTML example
+```html
+<html>
+<head>
+<style>
+body { font-family: Arial, sans-serif; background-color: #f0f8ff; }