@@ -107,10 +132,16 @@ This one will extract content from generated "lorem ipsum" page
 cargo run --example check -- lorem-ipsum
 ```
 
-This one print node with highest density:
+This one prints node with highest density:
 
 ```bash
-cargo run --examples check -- test4
+cargo run --example check -- test4
+```
+
+Extract content as markdown from lorem ipsum (requires markdown feature):
+
+```bash
+cargo run --example check -- lorem-ipsum-markdown
 ```
 
 There is scoring example i'm trying to implement scoring.
@@ -148,7 +179,9 @@ Overall Performance:
 
 ## Binary Usage
 
-The crate includes a command-line binary tool `dce` (DOM Content Extraction) for extracting main content from HTML documents. It supports both local files and remote URLs as input sources.
+The crate includes a command-line binary tool `dce` (DOM Content Extraction) for
+extracting main content from HTML documents. It supports both local files and
+remote URLs as input sources.
 
 ### Installation
 
@@ -167,19 +200,35 @@ Options:
   -u, --url <URL>        URL to fetch HTML content from
   -f, --file <FILE>      Local HTML file to process
   -o, --output <FILE>    Output file (stdout if not specified)
+      --format <FORMAT>  Output format [default: text] [possible values: text, markdown]
   -h, --help             Print help
   -V, --version          Print version
 ```
 
 Note: Either `--url` or `--file` must be specified, but not both.
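
The note above says that exactly one of `--url` or `--file` must be given. A minimal sketch of that rule in Rust, using only the standard library, might look like the following. `Input` and `resolve_input` are hypothetical names chosen for illustration; they are not taken from the `dce` source, and the real binary may enforce the constraint differently.

```rust
/// Hypothetical input source for illustration; not taken from the `dce` source.
#[derive(Debug, PartialEq)]
enum Input {
    Url(String),
    File(String),
}

/// Accepts exactly one of `url` / `file`, mirroring the rule stated in the note.
/// Returns an error message when both or neither are supplied.
fn resolve_input(url: Option<&str>, file: Option<&str>) -> Result<Input, String> {
    match (url, file) {
        (Some(u), None) => Ok(Input::Url(u.to_string())),
        (None, Some(f)) => Ok(Input::File(f.to_string())),
        (Some(_), Some(_)) => Err("--url and --file cannot be used together".to_string()),
        (None, None) => Err("either --url or --file must be specified".to_string()),
    }
}
```

Matching on the pair of options keeps all four cases visible in one place, so the "both" and "neither" errors cannot be forgotten. If the binary is built on an argument-parsing crate, the same constraint is often declared there instead (for example with an argument group), but the diff does not show which approach `dce` takes.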
 
+### Markdown Output
+
+To extract content as markdown format, use the `--format markdown` option:
+
+```bash
+# Extract as markdown from URL
+cargo run --bin dce -- --url "https://example.com" --format markdown
+
+# Extract as markdown from file and save to output