Commit 9682b9c

refactor: carry out multiple refactors and create new code comments
1 parent fa06b05 commit 9682b9c

26 files changed

+486
-163
lines changed

README.md

Lines changed: 60 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -1,22 +1,43 @@
11
# go-lolhtml
22

3-
![GitHub Workflow Status](https://img.shields.io/github/workflow/status/coolspring8/go-lolhtml/Go) ![Codecov](https://img.shields.io/codecov/c/github/coolspring8/go-lolhtml) [![Go Report Card](https://goreportcard.com/badge/github.com/coolspring8/go-lolhtml)](https://goreportcard.com/report/github.com/coolspring8/go-lolhtml) [![PkgGoDev](https://pkg.go.dev/badge/github.com/coolspring8/go-lolhtml)](https://pkg.go.dev/github.com/coolspring8/go-lolhtml)
3+
![GitHub Workflow Status](https://img.shields.io/github/workflow/status/coolspring8/go-lolhtml/Go) [![codecov](https://codecov.io/gh/CoolSpring8/go-lolhtml/branch/main/graph/badge.svg)](https://codecov.io/gh/CoolSpring8/go-lolhtml) [![Go Report Card](https://goreportcard.com/badge/github.com/coolspring8/go-lolhtml)](https://goreportcard.com/report/github.com/coolspring8/go-lolhtml) [![PkgGoDev](https://pkg.go.dev/badge/github.com/coolspring8/go-lolhtml)](https://pkg.go.dev/github.com/coolspring8/go-lolhtml)
44

5-
Go bindings for the Rust library [cloudflare/lol-html](https://github.com/cloudflare/lol-html/), the *Low Output Latency streaming HTML rewriter/parser with CSS-selector based API*, talking via cgo.
5+
Go bindings for the Rust crate [cloudflare/lol-html](https://github.com/cloudflare/lol-html/), the *Low Output Latency streaming HTML rewriter/parser with CSS-selector based API*, talking via cgo.
66

7-
**Status:** All abilities provided by C-API implemented, except for customized user data in handlers. Tests are partially covered. The code is at its early stage and the API is therefore subject to change. If you have any ideas on how API can be better structured, feel free to open a PR or an issue.
7+
**Status:**
8+
9+
**All abilities provided by lol_html's c-api are available**, except for customized user data in handlers. The original tests included in the c-api package have also been translated to examine this binding's functionality.
10+
11+
The code is at its early stage and **breaking changes might be introduced**. If you have any ideas on how the public API can be better structured, feel free to open a PR or an issue.
12+
13+
* [go-lolhtml](#go-lolhtml)
14+
* [Installation](#installation)
15+
* [Features](#features)
16+
* [Getting Started](#getting-started)
17+
* [Examples](#examples)
18+
* [Documentation](#documentation)
19+
* [Other Bindings](#other-bindings)
20+
* [Versioning](#versioning)
21+
* [Help Wanted!](#help-wanted)
22+
* [License](#license)
23+
* [Disclaimer](#disclaimer)
824

925
## Installation
1026

11-
For Linux/macOS/Windows x86_64 platforms, installation is as simple as a single `go get`:
27+
For Linux/macOS/Windows x86_64 platform users, installation is as simple as a single `go get` command:
1228

1329
```shell
1430
$ go get github.com/coolspring8/go-lolhtml
1531
```
1632

17-
There is no need for you to install Rust. That's because lol-html could be prebuilt into static libraries, stored and shipped in `/build` folder, so that cgo can handle other matters naturally and smoothly.
33+
Installing Rust is not necessary: lol-html is prebuilt into static libraries that ship in the `/build` folder, so cgo can handle the rest of the compilation without intervention.
1834

19-
For other platforms, you'll have to compile it yourself.
35+
For other platforms, you will have to compile it yourself.
36+
37+
## Features
38+
39+
- Fast: A Go (cgo) wrapper built around the highly-optimized Rust HTML parsing crate lol_html.
40+
- Easy to use: Following Go's idiomatic I/O conventions, [lolhtml.Writer](https://pkg.go.dev/github.com/coolspring8/go-lolhtml#Writer) implements the [io.Writer](https://golang.org/pkg/io/#Writer) interface.
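
The practical benefit of implementing io.Writer is that a transforming writer can be driven by `io.Copy` and other standard helpers. The sketch below illustrates that idea with the standard library only. `upperWriter` and `transform` are hypothetical names invented for this illustration and are not part of go-lolhtml; the uppercase transform simply stands in for real HTML rewriting.

```go
package main

import (
	"bytes"
	"io"
	"log"
	"os"
	"strings"
)

// upperWriter is a hypothetical stand-in for a streaming rewriter:
// it transforms bytes as they are written and forwards them to a sink.
type upperWriter struct {
	dst io.Writer
}

// Write implements io.Writer, so upperWriter can be used with io.Copy.
func (u *upperWriter) Write(p []byte) (int, error) {
	if _, err := u.dst.Write(bytes.ToUpper(p)); err != nil {
		return 0, err
	}
	// Report the number of input bytes consumed, as io.Writer requires.
	return len(p), nil
}

// transform streams src through an upperWriter and returns the output.
func transform(src string) (string, error) {
	var out bytes.Buffer
	if _, err := io.Copy(&upperWriter{dst: &out}, strings.NewReader(src)); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	s, err := transform("Hello, <span>World</span>!")
	if err != nil {
		log.Fatal(err)
	}
	io.WriteString(os.Stdout, s+"\n")
}
```

Any source that implements io.Reader (a file, an HTTP response body, standard input) can be fed to such a writer in the same way.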
2041

2142
## Getting Started
2243

@@ -27,16 +48,18 @@ package main
2748

2849
import (
2950
"bytes"
30-
"github.com/coolspring8/go-lolhtml"
3151
"io"
3252
"log"
3353
"os"
54+
55+
"github.com/coolspring8/go-lolhtml"
3456
)
3557

3658
func main() {
3759
chunk := []byte("Hello, <span>World</span>!")
3860
r := bytes.NewReader(chunk)
3961
w, err := lolhtml.NewWriter(
62+
// output to stdout
4063
os.Stdout,
4164
&lolhtml.Handlers{
4265
ElementContentHandler: []lolhtml.ElementContentHandler{
@@ -56,26 +79,43 @@ func main() {
5679
if err != nil {
5780
log.Fatal(err)
5881
}
59-
defer w.Free()
6082

83+
// copy from the bytes reader to lolhtml writer
6184
_, err = io.Copy(w, r)
6285
if err != nil {
6386
log.Fatal(err)
6487
}
65-
66-
err = w.End()
88+
89+
// explicitly close the writer and flush the remaining content
90+
err = w.Close()
6791
if err != nil {
6892
log.Fatal(err)
6993
}
7094
// Output: Hello, <span>LOL-HTML</span>!
7195
}
7296
```
7397

74-
The above program takes the chunk `Hello, <span>World</span>!` as input, is configured to rewrite all texts in `span` tags to "LOL-HTML" and prints the result to standard output.
98+
The above program creates a new Writer configured to rewrite the text inside every `span` tag to "LOL-HTML". It takes the chunk `Hello, <span>World</span>!` as input and prints the result to standard output.
7599

76100
The result is `Hello, <span>LOL-HTML</span>!`.
77101

78-
For more examples, please see the `/examples` directory.
102+
## Examples
103+
104+
example_test.go contains two examples.
105+
106+
For more detailed examples, please visit the `/examples` subdirectory.
107+
108+
- defer-scripts
109+
110+
Usage: curl -NL https://git.io/JeOSZ | go run main.go
111+
112+
- mixed-content-rewriter
113+
114+
Usage: curl -NL https://git.io/JeOSZ | go run main.go
115+
116+
- web-scraper
117+
118+
A ported Go version of https://web.scraper.workers.dev/.
79119

80120
## Documentation
81121

@@ -86,6 +126,14 @@ Available at [pkg.go.dev](https://pkg.go.dev/github.com/coolspring8/go-lolhtml).
86126
- Rust (native), C, JavaScript - [cloudflare/lol-html](https://github.com/cloudflare/lol-html/)
87127
- Lua - [jdesgats/lua-lolhtml](https://github.com/jdesgats/lua-lolhtml/)
88128

129+
## Versioning
130+
131+
This package does not strictly follow [Semantic Versioning](https://semver.org/). The current strategy is to follow lol_html's major and minor version, while the patch version number is reserved for this binding's own updates so that Go Modules can upgrade correctly.
132+
133+
## Help Wanted!
134+
135+
There are a few interesting things on the [Projects](https://github.com/coolspring8/go-lolhtml/projects/1) panel that I have considered but have not yet implemented. Other contributions and suggestions are also welcome!
136+
89137
## License
90138

91139
BSD 3-Clause "New" or "Revised" License

attribute.go

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,24 +6,33 @@ package lolhtml
66
*/
77
import "C"
88

9-
// AttributeIterator cannot be iterated by "range" syntax. You should use AttributeIterator.Next() instead.
9+
// AttributeIterator can be used to iterate over all attributes of an element. The only way to
10+
// get an AttributeIterator is by calling AttributeIterator() on an Element. Note the "range" syntax is not
11+
// applicable here; use AttributeIterator.Next() instead.
1012
type AttributeIterator C.lol_html_attributes_iterator_t
13+
14+
// Attribute represents an HTML element attribute. Obtained by calling Next() on an AttributeIterator.
1115
type Attribute C.lol_html_attribute_t
1216

17+
// Free frees the memory held by the AttributeIterator.
1318
func (ai *AttributeIterator) Free() {
1419
C.lol_html_attributes_iterator_free((*C.lol_html_attributes_iterator_t)(ai))
1520
}
1621

22+
// Next advances the iterator and returns the next attribute.
23+
// Returns nil if the iterator has been exhausted.
1724
func (ai *AttributeIterator) Next() *Attribute {
1825
return (*Attribute)(C.lol_html_attributes_iterator_next((*C.lol_html_attributes_iterator_t)(ai)))
1926
}
2027

28+
// Name returns the name of the attribute.
2129
func (a *Attribute) Name() string {
2230
nameC := (str)(C.lol_html_attribute_name_get((*C.lol_html_attribute_t)(a)))
2331
defer nameC.Free()
2432
return nameC.String()
2533
}
2634

35+
// Value returns the value of the attribute.
2736
func (a *Attribute) Value() string {
2837
valueC := (str)(C.lol_html_attribute_value_get((*C.lol_html_attribute_t)(a)))
2938
defer valueC.Free()
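
The comments above describe an iterator that cannot be used with `range` and instead signals exhaustion by returning nil from Next(). A minimal pure-Go sketch of that calling pattern follows. `attr`, `attrIter`, and `collect` are hypothetical stand-ins written for this illustration; the real Attribute and AttributeIterator wrap C pointers and must also be freed with Free().

```go
package main

import "fmt"

// attr and attrIter are hypothetical stand-ins for the binding's
// Attribute and AttributeIterator types.
type attr struct{ name, value string }

type attrIter struct {
	items []attr
	pos   int
}

// Next returns the next attribute, or nil once the iterator is exhausted.
func (it *attrIter) Next() *attr {
	if it.pos >= len(it.items) {
		return nil
	}
	a := &it.items[it.pos]
	it.pos++
	return a
}

// collect drains the iterator using the Next-until-nil loop.
func collect(it *attrIter) []string {
	var out []string
	for a := it.Next(); a != nil; a = it.Next() {
		out = append(out, a.name+"="+a.value)
	}
	return out
}

func main() {
	it := &attrIter{items: []attr{{"href", "/"}, {"class", "nav"}}}
	fmt.Println(collect(it))
}
```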

benchmark_test.go

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -231,11 +231,10 @@ func BenchmarkNewWriter(b *testing.B) {
231231
b.Fatal(err)
232232
}
233233

234-
err = w.End()
234+
err = w.Close()
235235
if err != nil {
236236
b.Fatal(err)
237237
}
238-
w.Free()
239238
}
240239
})
241240
}

builder.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -130,7 +130,7 @@ func (rb *rewriterBuilder) Build(sink OutputSink, config Config) (*rewriter, err
130130
)
131131
if r != nil {
132132
rb.built = true
133-
return &rewriter{rw: r, pointers: rb.pointers}, nil
133+
return &rewriter{rewriter: r, pointers: rb.pointers}, nil
134134
}
135135
return nil, getError()
136136
}

comment.go

Lines changed: 47 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -7,16 +7,21 @@ package lolhtml
77
import "C"
88
import "unsafe"
99

10+
// Comment represents an HTML comment.
1011
type Comment C.lol_html_comment_t
1112

13+
// CommentHandlerFunc is a callback handler function to do something with a Comment.
14+
// Expected to return a RewriterDirective as instruction to continue or stop.
1215
type CommentHandlerFunc func(*Comment) RewriterDirective
1316

17+
// Text returns the comment's text.
1418
func (c *Comment) Text() string {
1519
textC := (str)(C.lol_html_comment_text_get((*C.lol_html_comment_t)(c)))
1620
defer textC.Free()
1721
return textC.String()
1822
}
1923

24+
// SetText sets the comment's text and returns an error if there is one.
2025
func (c *Comment) SetText(text string) error {
2126
textC := C.CString(text)
2227
defer C.free(unsafe.Pointer(textC))
@@ -36,18 +41,18 @@ const (
3641
commentReplace
3742
)
3843

39-
func (c *Comment) alter(content string, alter commentAlter, isHtml bool) error {
44+
func (c *Comment) alter(content string, alter commentAlter, isHTML bool) error {
4045
contentC := C.CString(content)
4146
defer C.free(unsafe.Pointer(contentC))
4247
contentLen := len(content)
4348
var errCode C.int
4449
switch alter {
4550
case commentInsertBefore:
46-
errCode = C.lol_html_comment_before((*C.lol_html_comment_t)(c), contentC, C.size_t(contentLen), C.bool(isHtml))
51+
errCode = C.lol_html_comment_before((*C.lol_html_comment_t)(c), contentC, C.size_t(contentLen), C.bool(isHTML))
4752
case commentInsertAfter:
48-
errCode = C.lol_html_comment_after((*C.lol_html_comment_t)(c), contentC, C.size_t(contentLen), C.bool(isHtml))
53+
errCode = C.lol_html_comment_after((*C.lol_html_comment_t)(c), contentC, C.size_t(contentLen), C.bool(isHTML))
4954
case commentReplace:
50-
errCode = C.lol_html_comment_replace((*C.lol_html_comment_t)(c), contentC, C.size_t(contentLen), C.bool(isHtml))
55+
errCode = C.lol_html_comment_replace((*C.lol_html_comment_t)(c), contentC, C.size_t(contentLen), C.bool(isHTML))
5156
default:
5257
panic("not implemented")
5358
}
@@ -57,34 +62,69 @@ func (c *Comment) alter(content string, alter commentAlter, isHtml bool) error {
5762
return getError()
5863
}
5964

65+
// InsertBeforeAsText inserts the given content before the comment.
66+
//
67+
// The rewriter will HTML-escape the content before insertion:
68+
//
69+
// `<` will be replaced with `&lt;`
70+
//
71+
// `>` will be replaced with `&gt;`
72+
//
73+
// `&` will be replaced with `&amp;`
6074
func (c *Comment) InsertBeforeAsText(content string) error {
6175
return c.alter(content, commentInsertBefore, false)
6276
}
6377

64-
func (c *Comment) InsertBeforeAsHtml(content string) error {
78+
// InsertBeforeAsHTML inserts the given content before the comment.
79+
// The content is inserted as is.
80+
func (c *Comment) InsertBeforeAsHTML(content string) error {
6581
return c.alter(content, commentInsertBefore, true)
6682
}
6783

84+
// InsertAfterAsText inserts the given content after the comment.
85+
//
86+
// The rewriter will HTML-escape the content before insertion:
87+
//
88+
// `<` will be replaced with `&lt;`
89+
//
90+
// `>` will be replaced with `&gt;`
91+
//
92+
// `&` will be replaced with `&amp;`
6893
func (c *Comment) InsertAfterAsText(content string) error {
6994
return c.alter(content, commentInsertAfter, false)
7095
}
7196

72-
func (c *Comment) InsertAfterAsHtml(content string) error {
97+
// InsertAfterAsHTML inserts the given content after the comment.
98+
// The content is inserted as is.
99+
func (c *Comment) InsertAfterAsHTML(content string) error {
73100
return c.alter(content, commentInsertAfter, true)
74101
}
75102

103+
// ReplaceAsText replaces the comment with the supplied content.
104+
//
105+
// The rewriter will HTML-escape the content:
106+
//
107+
// `<` will be replaced with `&lt;`
108+
//
109+
// `>` will be replaced with `&gt;`
110+
//
111+
// `&` will be replaced with `&amp;`
76112
func (c *Comment) ReplaceAsText(content string) error {
77113
return c.alter(content, commentReplace, false)
78114
}
79115

80-
func (c *Comment) ReplaceAsHtml(content string) error {
116+
// ReplaceAsHTML replaces the comment with the supplied content.
117+
// The content is kept as is.
118+
func (c *Comment) ReplaceAsHTML(content string) error {
81119
return c.alter(content, commentReplace, true)
82120
}
83121

122+
// Remove removes the comment.
84123
func (c *Comment) Remove() {
85124
C.lol_html_comment_remove((*C.lol_html_comment_t)(c))
86125
}
87126

127+
// IsRemoved returns whether the comment is removed or not.
88128
func (c *Comment) IsRemoved() bool {
89129
return (bool)(C.lol_html_comment_is_removed((*C.lol_html_comment_t)(c)))
90130
}
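
The doc comments in comment.go distinguish the AsText variants, which HTML-escape `<`, `>`, and `&`, from the AsHTML variants, which insert content as is. The sketch below reproduces only those three documented replacements using the standard library, to show what a caller should expect to see in the output. `escapeText` is a hypothetical helper for illustration; in the binding the escaping is performed by the rewriter itself, not by Go code like this.

```go
package main

import (
	"fmt"
	"strings"
)

// textEscaper applies the three replacements listed in the doc comments.
// strings.Replacer works in a single pass, so the "&" inside an inserted
// "&lt;" is not escaped a second time.
var textEscaper = strings.NewReplacer(
	"&", "&amp;",
	"<", "&lt;",
	">", "&gt;",
)

// escapeText is a hypothetical helper mirroring the AsText behavior.
func escapeText(s string) string {
	return textEscaper.Replace(s)
}

func main() {
	content := "<b>Tom & Jerry</b>"
	fmt.Println("as text:", escapeText(content))
	fmt.Println("as HTML:", content) // AsHTML variants keep content unchanged
}
```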
