@@ -30,6 +30,13 @@ for token in Tokenizer::new(html).infallible() {
3030assert_eq! (new_html , " <title>hello world</title>" );
3131```
3232
33+ ` html5gum ` provides multiple kinds of APIs:
34+
35+ * Iterating over tokens as shown above.
36+ * Implementing your own ` Emitter ` for maximum performance, see [ the ` custom_emitter.rs ` example] ( examples/custom_emitter.rs ) .
37+ * A callback-based API for a middle ground between convenience and performance, see [ the ` callback_emitter.rs ` example] ( examples/callback_emitter.rs ) .
38+ * With the ` tree-builder ` feature, html5gum can be integrated with ` html5ever ` and ` scraper ` . See [ the ` scraper.rs ` example] ( examples/scraper.rs ) .
39+
3340## What a tokenizer does and what it does not do
3441
3542` html5gum ` fully implements [ 13.2.5 of the WHATWG HTML
@@ -42,9 +49,6 @@ test suite](https://github.com/html5lib/html5lib-tests/tree/master/tokenizer). S
4249 gracefully from invalid UTF-8.
4350* ` html5gum ` ** does not** [ correct mis-nested
4451 tags.] ( https://html.spec.whatwg.org/#an-introduction-to-error-handling-and-strange-cases-in-the-parser )
45- * ` html5gum ` ** does not** recognize implicitly self-closing elements like
46- ` <img> ` , as a tokenizer it will simply emit a start token. It does however
47- emit a self-closing tag for ` <img .. /> ` .
4852* ` html5gum ` doesn't implement the DOM, and unfortunately in the HTML spec,
4953 constructing the DOM ("tree construction") influences how tokenization is
5054 done. For an example of which problems this causes see [ this example
@@ -54,23 +58,9 @@ test suite](https://github.com/html5lib/html5lib-tests/tree/master/tokenizer). S
5458 21] ( https://github.com/untitaker/html5gum/issues/21 ) .
5559
5660With those caveats in mind, ` html5gum ` can pretty much ~ parse~ _ tokenize_
57- anything that browsers can.
58-
59- ## The ` Emitter ` trait
60-
61- A distinguishing feature of ` html5gum ` is that you can bring your own token
62- datastructure and hook into token creation by implementing the ` Emitter ` trait.
63- This allows you to:
64-
65- * Rewrite all per-HTML-tag allocations to use a custom allocator or datastructure.
66-
67- * Efficiently filter out uninteresting categories data without ever allocating
68- for it. For example if any plaintext between tokens is not of interest to
69- you, you can implement the respective trait methods as noop and therefore
70- avoid any overhead creating plaintext tokens.
71-
72- See [ the ` custom_emitter ` example] [ examples/custom_emitter.rs ] for how this
73- looks like in practice.
61+ anything that browsers can. If you do need tree construction, the experimental
62+ ` tree-builder ` feature lets html5gum be integrated with ` html5ever ` and
63+ ` scraper ` . See [ the ` scraper.rs ` example] ( examples/scraper.rs ) .
7464
7565## Other features
7666
@@ -116,3 +106,5 @@ Licensed under the MIT license, see [`./LICENSE`][LICENSE].
116106[ LICENSE ] : ./LICENSE
117107[ examples/tokenize_with_state_switches.rs ] : ./examples/tokenize_with_state_switches.rs
118108[ examples/custom_emitter.rs ] : ./examples/custom_emitter.rs
109+ [ examples/callback_emitter.rs ] : ./examples/callback_emitter.rs
110+ [ examples/scraper.rs ] : ./examples/scraper.rs