@@ -113,10 +138,16 @@ This one will extract content from generated "lorem ipsum" page
 cargo run --example check -- lorem-ipsum
 ```
 
-This one print node with highest density:
+This one prints node with highest density:
 
 ```bash
-cargo run --examples check -- test4
+cargo run --example check -- test4
+```
+
+Extract content as markdown from lorem ipsum (requires markdown feature):
+
+```bash
+cargo run --example check -- lorem-ipsum-markdown
 ```
 
 There is scoring example i'm trying to implement scoring.
@@ -154,7 +185,9 @@ Overall Performance:
 
 ## Binary Usage
 
-The crate includes a command-line binary tool `dce` (DOM Content Extraction) for extracting main content from HTML documents. It supports both local files and remote URLs as input sources.
+The crate includes a command-line binary tool `dce` (DOM Content Extraction) for
+extracting main content from HTML documents. It supports both local files and
+remote URLs as input sources.
 
 ### Installation
 
@@ -173,19 +206,35 @@ Options:
 -u, --url <URL> URL to fetch HTML content from
 -f, --file <FILE> Local HTML file to process
 -o, --output <FILE> Output file (stdout if not specified)
+--format <FORMAT> Output format [default: text] [possible values: text, markdown]
 -h, --help Print help
 -V, --version Print version
 ```
 
 Note: Either `--url` or `--file` must be specified, but not both.
 
+### Markdown Output
+
+To extract content as markdown format, use the `--format markdown` option:
+
+```bash
+# Extract as markdown from URL
+cargo run --bin dce -- --url "https://example.com" --format markdown
+
+# Extract as markdown from file and save to output
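The usage text in the last hunk notes that exactly one of `--url` or `--file` must be given. As a minimal sketch of how such a rule could be checked, assuming nothing about how `dce` actually parses its arguments, the snippet below uses only the standard library. `InputSource` and `select_input` are hypothetical names introduced for illustration and do not come from the crate.

```rust
/// Where the HTML comes from.
/// Hypothetical type for illustration; not taken from the crate.
#[derive(Debug, PartialEq)]
enum InputSource {
    Url(String),
    File(String),
}

/// Enforce the documented rule: either `--url` or `--file`, but not both.
/// Hypothetical helper; the real binary may validate this differently.
fn select_input(url: Option<String>, file: Option<String>) -> Result<InputSource, String> {
    match (url, file) {
        // Exactly one source given: accept it.
        (Some(u), None) => Ok(InputSource::Url(u)),
        (None, Some(f)) => Ok(InputSource::File(f)),
        // Both given: reject, since the sources are mutually exclusive.
        (Some(_), Some(_)) => Err("specify either --url or --file, not both".to_string()),
        // Neither given: reject, since one source is required.
        (None, None) => Err("one of --url or --file is required".to_string()),
    }
}
```

Under these assumptions, `select_input(Some(url), None)` yields `Ok(InputSource::Url(..))`, while passing both options or neither yields an `Err` carrying a short message. Matching on the pair of options keeps all four cases visible in one place.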