Commit 4009cbf

Update docs (#28)
1 parent d846a83 commit 4009cbf

3 files changed

+164
-21
lines changed

README.md

Lines changed: 72 additions & 9 deletions
@@ -22,12 +22,23 @@ Additionally `simdjson-go` has the following features:
2222
- Support for [ndjson](http://ndjson.org/) (newline delimited json)
2323
- Pure Go (no need for cgo)
2424

25+
## Requirements
26+
27+
`simdjson-go` has the following requirements for parsing:
28+
29+
A CPU with both AVX2 and CLMUL is required (Haswell from 2013 onwards should do for Intel, for AMD a Ryzen/EPYC CPU (Q1 2017) should be sufficient).
30+
This can be checked using the provided [`SupportedCPU()`](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc#SupportedCPU) function.
31+
32+
The package does not provide a fallback for unsupported CPUs, but serialized data can still be deserialized on an unsupported CPU.
33+
34+
Building with `gccgo` will also always report an unsupported CPU, since it cannot compile the assembly.
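
Because there is no built-in fallback, callers that must also run on unsupported CPUs need their own. The following is a minimal sketch of one way to do that, falling back to the standard library's `encoding/json`. The names `fastParser` and `parseWithFallback` are hypothetical and not part of `simdjson-go`; in real code the `cpuOK` flag would presumably come from `SupportedCPU()`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fastParser is a hypothetical stand-in for an accelerated parser.
// It is NOT the simdjson-go API; it only marks where such a call would go.
type fastParser func(data []byte) (interface{}, error)

// parseWithFallback uses the fast parser when the CPU is supported and
// otherwise decodes with encoding/json from the standard library.
func parseWithFallback(data []byte, cpuOK bool, fast fastParser) (interface{}, error) {
	if cpuOK && fast != nil {
		return fast(data)
	}
	var v interface{}
	if err := json.Unmarshal(data, &v); err != nil {
		return nil, err
	}
	return v, nil
}

func main() {
	// Simulate an unsupported CPU: the standard library path is taken.
	v, err := parseWithFallback([]byte(`{"make":"Honda"}`), false, nil)
	fmt.Println(v, err)
}
```

The two paths return different representations in practice, so a real fallback would need to convert both into a common form.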
35+
2536
## Usage
2637

2738
Run the following command in order to install `simdjson-go`
2839

29-
```
30-
$ go get github.com/minio/simdjson-go
40+
```bash
41+
go get -u github.com/minio/simdjson-go
3142
```
3243

3344
In order to parse a JSON byte stream, you either call [`simdjson.Parse()`](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc#Parse)
@@ -83,10 +94,44 @@ Each type then has helpers to access the data. When you get a type you can use t
8394
| TypeArray | `Array()` |
8495
| TypeRoot | `Root()` |
8596

97+
You can also get the next value as an `interface{}` using the [Interface()](https://pkg.go.dev/github.com/minio/simdjson-go#Iter.Interface) method.
98+
99+
Note that arrays and objects that are null are always returned as `TypeNull`.
100+
86101
The complex types return helpers that help parse each of the underlying structures.
87102

88103
It is up to you to keep track of the nesting level you are operating at.
89104

105+
For any `Iter` it is possible to marshal the recursive content of the Iter using
106+
[`MarshalJSON()`](https://pkg.go.dev/github.com/minio/simdjson-go#Iter.MarshalJSON) or
107+
[`MarshalJSONBuffer(...)`](https://pkg.go.dev/github.com/minio/simdjson-go#Iter.MarshalJSONBuffer).
108+
109+
Currently, it is not possible to unmarshal into structs.
110+
111+
## Parsing Objects
112+
113+
If you are only interested in one key in an object you can use `FindKey` to quickly select it.
114+
115+
An object can be traversed manually by using `NextElement(dst *Iter) (name string, t Type, err error)`.
116+
The key of the element will be returned as a string and the type of the value will be returned
117+
and the provided `Iter` will contain an iterator which will allow access to the content.
118+
119+
There is a `NextElementBytes` which provides the same, but without the need to allocate a string.
120+
121+
All elements of the object can be retrieved using a pretty lightweight [`Parse`](https://pkg.go.dev/github.com/minio/simdjson-go#Object.Parse)
122+
which provides a map of all keys and all elements as a slice.
123+
124+
All elements of the object can be returned as `map[string]interface{}` using the `Map` method on the object.
125+
This will naturally perform allocations for all elements.
126+
127+
## Parsing Arrays
128+
129+
[Arrays](https://pkg.go.dev/github.com/minio/simdjson-go#Array) in JSON can have mixed types.
130+
To iterate over the array with mixed types use the [`Iter`](https://pkg.go.dev/github.com/minio/simdjson-go#Array.Iter)
131+
method to get an iterator.
132+
133+
There are methods that allow you to retrieve all elements as a single type,
134+
`[]int64`, `[]uint64`, `[]float64` and `[]string`.
90135

91136
## Parsing NDJSON stream
92137

@@ -163,6 +208,31 @@ func findHondas(r io.Reader) {
163208

164209
More examples can be found in the examples subdirectory and further documentation can be found at [godoc](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc).
165210

211+
## Serializing parsed JSON
212+
213+
It is possible to serialize parsed JSON for more compact storage and faster load time.
214+
215+
To create a new serializer use [NewSerializer](https://pkg.go.dev/github.com/minio/simdjson-go#NewSerializer).
216+
This serializer can be reused for several JSON blocks.
217+
218+
The serializer will provide string deduplication and compression of elements.
219+
This can be fine-tuned using the [`CompressMode`](https://pkg.go.dev/github.com/minio/simdjson-go#Serializer.CompressMode) setting.
220+
221+
To serialize a block of parsed data use the [`Serialize`](https://pkg.go.dev/github.com/minio/simdjson-go#Serializer.Serialize) method.
222+
223+
To read back use the [`Deserialize`](https://pkg.go.dev/github.com/minio/simdjson-go#Serializer.Deserialize) method.
224+
When deserializing, the compression mode does not need to match, since it is read from the stream.
225+
226+
Example of speed for serializer/deserializer on [`parking-citations-1M`](https://files.klauspost.com/compress/parking-citations-1M.json.zst).
227+
228+
| Compress Mode | % of JSON size | Serialize Speed | Deserialize Speed |
229+
|---------------|----------------|-----------------|-------------------|
230+
| None | 177.26% | 425.70 MB/s | 2334.33 MB/s |
231+
| Fast | 17.20% | 412.75 MB/s | 1234.76 MB/s |
232+
| Default | 16.85% | 411.59 MB/s | 1242.09 MB/s |
233+
| Best | 10.91% | 337.17 MB/s | 806.23 MB/s |
234+
235+
In some cases the speed difference and compression difference will be bigger.
166236

167237
## Performance vs simdjson
168238

@@ -234,13 +304,6 @@ BenchmarkFindStructuralBitsParallelLoop 7225.24 8302.96 1.15x
234304

235305
These benchmarks were generated on a c5.2xlarge EC2 instance with a Xeon Platinum 8124M CPU at 3.0 GHz.
236306

237-
## Requirements
238-
239-
`simdjson-go` has the following requirements:
240-
241-
- A CPU with both AVX2 and CLMUL is required (Haswell from 2013 onwards should do for Intel, for AMD a Ryzen/EPYC CPU (Q1 2017) should be sufficient).
242-
This can be checked using the provided [`SupportedCPU()`](https://pkg.go.dev/github.com/minio/simdjson-go?tab=doc#SupportedCPU`) function.
243-
244307
## Design
245308

246309
`simdjson-go` follows the same two stage design as `simdjson`.

parsed_array.go

Lines changed: 83 additions & 2 deletions
@@ -145,8 +145,8 @@ readArray:
145145
return dst, nil
146146
}
147147

148-
// AsInteger returns the array values as float.
149-
// Integers are automatically converted to float.
148+
// AsInteger returns the array values as int64 values.
149+
// Uints/Floats are automatically converted to int64 if they fit within the range.
150150
func (a *Array) AsInteger() ([]int64, error) {
151151
// Estimate length
152152
lenEst := (len(a.tape.Tape) - a.off - 1) / 2
@@ -197,6 +197,57 @@ readArray:
197197
return dst, nil
198198
}
199199

200+
// AsUint64 returns the array values as uint64 values.
201+
// Uints/Floats are automatically converted to uint64 if they fit within the range.
202+
func (a *Array) AsUint64() ([]uint64, error) {
203+
// Estimate length
204+
lenEst := (len(a.tape.Tape) - a.off - 1) / 2
205+
if lenEst < 0 {
206+
lenEst = 0
207+
}
208+
dst := make([]uint64, 0, lenEst)
209+
readArray:
210+
for {
211+
tag := Tag(a.tape.Tape[a.off] >> 56)
212+
a.off++
213+
switch tag {
214+
case TagFloat:
215+
if len(a.tape.Tape) <= a.off {
216+
return nil, errors.New("corrupt input: expected float, but no more values")
217+
}
218+
val := math.Float64frombits(a.tape.Tape[a.off])
219+
// As a float64, math.MaxUint64 rounds up to 2^64, which no longer fits in uint64.
if val >= math.MaxUint64 {
220+
return nil, errors.New("float value overflows uint64")
221+
}
222+
if val < 0 {
223+
return nil, errors.New("float value is negative")
224+
}
225+
dst = append(dst, uint64(val))
226+
case TagInteger:
227+
if len(a.tape.Tape) <= a.off {
228+
return nil, errors.New("corrupt input: expected integer, but no more values")
229+
}
230+
val := int64(a.tape.Tape[a.off])
231+
if val < 0 {
232+
return nil, errors.New("int64 value is negative")
233+
}
234+
dst = append(dst, uint64(val))
235+
case TagUint:
236+
if len(a.tape.Tape) <= a.off {
237+
return nil, errors.New("corrupt input: expected integer, but no more values")
238+
}
239+
240+
dst = append(dst, a.tape.Tape[a.off])
241+
case TagArrayEnd:
242+
break readArray
243+
default:
244+
return nil, fmt.Errorf("unable to convert type %v to integer", tag)
245+
}
246+
a.off++
247+
}
248+
return dst, nil
249+
}
250+
200251
// AsString returns the array values as a slice of strings.
201252
// No conversion is done.
202253
func (a *Array) AsString() ([]string, error) {
@@ -227,3 +278,33 @@ func (a *Array) AsString() ([]string, error) {
227278
}
228279
}
229280
}
281+
282+
// AsStringCvt returns the array values as a slice of strings.
283+
// Scalar types are converted.
284+
// Root, Object and Arrays are not supported and will return an error if found.
285+
func (a *Array) AsStringCvt() ([]string, error) {
286+
// Estimate length
287+
lenEst := len(a.tape.Tape) - a.off - 1
288+
if lenEst < 0 {
289+
lenEst = 0
290+
}
291+
dst := make([]string, 0, lenEst)
292+
i := a.Iter()
293+
var elem Iter
294+
for {
295+
t, err := i.AdvanceIter(&elem)
296+
if err != nil {
297+
return nil, err
298+
}
299+
switch t {
300+
case TypeNone:
301+
return dst, nil
302+
default:
303+
s, err := elem.StringCvt()
304+
if err != nil {
305+
return nil, err
306+
}
307+
dst = append(dst, s)
308+
}
309+
}
310+
}
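
As an illustration of what "scalar types are converted" could look like, here is a minimal standalone sketch using a type switch and `strconv`. It is an assumption about the general approach, not the behaviour of `StringCvt`: the function name `scalarToString`, the choice of `"null"` for nil, and the float formatting are all hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// scalarToString converts a scalar value to its string form and returns an
// error for anything else. Hypothetical helper, not part of simdjson-go.
func scalarToString(v interface{}) (string, error) {
	switch x := v.(type) {
	case string:
		return x, nil
	case int64:
		return strconv.FormatInt(x, 10), nil
	case uint64:
		return strconv.FormatUint(x, 10), nil
	case float64:
		return strconv.FormatFloat(x, 'g', -1, 64), nil
	case bool:
		return strconv.FormatBool(x), nil
	case nil:
		return "null", nil
	default:
		// Composite values (arrays, objects) are rejected, mirroring the doc comment above.
		return "", fmt.Errorf("cannot convert %T to string", v)
	}
}

func main() {
	for _, v := range []interface{}{"a", int64(-5), uint64(7), 1.5, true, nil, []int{1}} {
		s, err := scalarToString(v)
		fmt.Println(s, err)
	}
}
```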

simdjson_other.go

Lines changed: 9 additions & 10 deletions
@@ -26,7 +26,7 @@ import (
2626

2727
// SupportedCPU will return whether the CPU is supported.
2828
func SupportedCPU() bool {
29-
return false
29+
return false
3030
}
3131

3232
// Parse a block of data and return the parsed JSON.
@@ -62,13 +62,12 @@ type Stream struct {
6262
// There is no guarantee that elements will be consumed, so always use
6363
// non-blocking writes to the reuse channel.
6464
func ParseNDStream(r io.Reader, res chan<- Stream, reuse <-chan *ParsedJson) {
65-
go func() {
66-
res <- Stream{
67-
Value: nil,
68-
Error: fmt.Errorf("Unsupported platform"),
69-
}
70-
close(res)
71-
}()
72-
return
65+
go func() {
66+
res <- Stream{
67+
Value: nil,
68+
Error: fmt.Errorf("Unsupported platform"),
69+
}
70+
close(res)
71+
}()
72+
return
7373
}
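
The doc comment above asks callers to use non-blocking writes to the reuse channel. In Go that is typically done with a `select` that has a `default` case. The sketch below shows the pattern on its own; `buffer` is a hypothetical stand-in for a reusable result such as `*ParsedJson`, and `offerForReuse` is not part of the package.

```go
package main

import "fmt"

// buffer is a hypothetical stand-in for a reusable parse result.
type buffer struct{ data []byte }

// offerForReuse tries to hand b back for reuse without blocking.
// It reports whether the channel accepted the value.
func offerForReuse(reuse chan<- *buffer, b *buffer) bool {
	select {
	case reuse <- b:
		return true
	default:
		// Nobody is ready to take the value and the channel buffer is full:
		// drop it and let the garbage collector reclaim it.
		return false
	}
}

func main() {
	reuse := make(chan *buffer, 1)
	fmt.Println(offerForReuse(reuse, &buffer{})) // fits in the channel buffer
	fmt.Println(offerForReuse(reuse, &buffer{})) // channel is full, send is skipped
}
```

Dropping the value when the channel is full costs only a later allocation, whereas a blocking send could stall the consumer indefinitely if the parser has already stopped reading.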
74-
