Commit 0f609dd

Update to simdjson 3.9.5 (#83)
* switched to ondemand parser
* removed LIBS_PATH and debug symbols
* a bit smarter about `pushinteger` and `pushnumber` for Lua 5.3+
1 parent 4ea3cc3 commit 0f609dd

7 files changed: +166866 -16462 lines changed

Makefile

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ TARGET = simdjson.$(LIBEXT)
 all: $(TARGET)

 $(TARGET):
-	$(CXX) $(SRC) $(FLAGS) $(INCLUDE) $(LIBS_PATH) $(LIBS) -o $@
+	$(CXX) $(SRC) $(FLAGS) $(INCLUDE) $(LIBS) -o $@

 clean:
 	rm *.$(LIBEXT)

README.md

Lines changed: 21 additions & 20 deletions
@@ -1,9 +1,9 @@
-# lua-simdjson (WIP)
+# lua-simdjson
 [![Build Status](https://github.com/FourierTransformer/lua-simdjson/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/FourierTransformer/lua-simdjson/actions?query=branch%3Amaster)

-A basic lua binding to [simdjson](https://simdjson.org). The simdjson library is an incredibly fast JSON parser that uses SIMD instructions and fancy algorithms to parse JSON very quickly. It's been tested with LuaJIT 2.0/2.1 and Lua 5.1, 5.2, 5.3, and 5.4 on linux/osx. It has a general parsing mode and a lazy mode that uses a JSON pointer.
+A basic Lua binding to [simdjson](https://simdjson.org). The simdjson library is an incredibly fast JSON parser that uses SIMD instructions and fancy algorithms to parse JSON very quickly. It's been tested with LuaJIT 2.0/2.1 and Lua 5.1, 5.2, 5.3, and 5.4 on linux/osx/windows. It has a general parsing mode and a lazy mode that uses a JSON pointer.

-Current simdjson version: 0.5.0
+Current simdjson version: 3.9.5

 ## Installation
 If all the requirements are met, lua-simdjson can be install via luarocks with:
@@ -15,28 +15,29 @@ Otherwise it can be installed manually by pulling the repo and running luarocks

 ## Requirements
 * lua-simdjson only works on 64bit systems.
-* a lua build environment with support for C++11
+* a Lua build environment with support for C++11
 * g++ version 7+ and clang++ version 6+ or newer should work!

 ## Parsing
 There are two main ways to parse JSON in lua-simdjson:
-1. With `parse`: this parses JSON and returns a lua table with the parsed values
+1. With `parse`: this parses JSON and returns a Lua table with the parsed values
 2. With `open`: this reads in the JSON and keeps it in simdjson's internal format. The values can then be accessed using a JSON pointer (examples below)

 Both of these methods also have support to read files on disc with `parseFile` and `openFile` respectively. If handling JSON from disk, these methods should be used and are incredibly fast.

 ## Typing
-* lua-simdjson uses `simdjson.null` to represent `null` values from parsed JSON.
-* Any application should use that for comparison as needed.
-* it uses `lua_pushnumber` and `lua_pushinteger` for JSON floats and ints respectively, so your lua version may handle that slightly differently.
-* All other types map as expected.
+* lua-simdjson uses `simdjson.null` to represent `null` values from parsed JSON.
+* Any application should use that for comparison as needed.
+* it uses `lua_pushnumber` and `lua_pushinteger` for JSON floats and ints respectively, so your Lua version may handle that slightly differently.
+* `lua_pushinteger` uses signed ints. A number from JSON larger than `LUA_MAXINTEGER` will be represented as a float/number
+* All other types map as expected.

 ### Parse some JSON
-The `parse` methods will return a normal lua table that can be interacted with.
+The `parse` methods will return a normal Lua table that can be interacted with.
 ```lua
 local simdjson = require("simdjson")
 local response = simdjson.parse([[
-{
+{
     "Image": {
         "Width": 800,
         "Height": 600,
@@ -60,11 +61,11 @@ print(fileResponse["statuses"][1]["id"])
 ```

 ### Open some json
-The `open` methods currently require the use of a JSON pointer, but are very quick.
+The `open` methods currently require the use of a JSON pointer, but are very quick. They are best used when you only need a part of a response. In the example below, it could be useful for just getting the `Thumnail` object with `:atPointer("/Image/Thumbnail")` which will then only create a Lua table with those specific values.
 ```lua
 local simdjson = require("simdjson")
 local response = simdjson.open([[
-{
+{
     "Image": {
         "Width": 800,
         "Height": 600,
@@ -82,21 +83,21 @@ local response = simdjson.open([[
 print(response:atPointer("/Image/Width"))

 -- OR to parse a file from disk
-local fileResponse = simdjson.open("jsonexamples/twitter.json")
+local fileResponse = simdjson.openFile("jsonexamples/twitter.json")
 print(fileResponse:atPointer("/statuses/0/id")) --using a JSON pointer

 ```
-Starting with version 0.5.0, the the `atPointer` method is JSON pointer compliant. The previous pointer implementation is considered deprecated, but is still available with the `at` method.
+Starting with version 0.5.0, the `atPointer` method is JSON pointer compliant. The previous pointer implementation is considered deprecated, but is still available with the `at` method.

 The `open` and `parse` codeblocks should print out the same values. It's worth noting that the JSON pointer indexes from 0.

-This lazy style of using the simdjson data structure could also be used with array access in the future, and would result in ultra-fast JSON "parsing".
+This lazy style of using the simdjson data structure could also be used with array access in the future.

 ## Error Handling
 lua-simdjson will error out with any errors from simdjson encountered while parsing. They are very good at helping identify what has gone wrong during parsing.

 ## Benchmarks
-I ran some benchmarks against lua-cjson, rapidjson, and dkjson. For each test, I loaded the JSON into memory, and then had the parsers go through each file 100 times and took the average time it took to parse to a lua table. You can see all the results in the [benchmark](benchmark/) folder. I've included a sample output run via Lua (the LuaJIT graph looks very similar, also in the benchmark folder). The y-axis is logarithmic, so every half step down is twice as fast.
+I ran some benchmarks against lua-cjson, rapidjson, and dkjson. For each test, I loaded the JSON into memory, and then had the parsers go through each file 100 times and took the average time it took to parse to a Lua table. You can see all the results in the [benchmark](benchmark/) folder. I've included a sample output run via Lua (the LuaJIT graph looks very similar, also in the benchmark folder). The y-axis is logarithmic, so every half step down is twice as fast.

 ![Lua Performance Column Chart](benchmark/lua-perf.png)

@@ -109,13 +110,13 @@ All tested files are in the [jsonexamples folder](jsonexamples/).
 lua-simdjson, like the simdjson library performs better on more modern hardware. These benchmarks were run on a ninth-gen i7 processor. On an older processor, rapidjson may perform better.

 ## Caveats & Alternatives
-* there is no encoding/dumping a lua table to JSON (yet! Most other lua JSON libraries can handle this)
-* it only works on 64 bit systems (untested on Windows...)
+* there is no encoding/dumping a Lua table to JSON (yet! Most other lua JSON libraries can handle this)
+* it only works on 64 bit systems
 * it builds a large binary. On a modern linux system, it ended up being \~200k (lua-cjson comes in at 42k)
 * since it's an external module, it's not quite as easy to just grab the file and go (dkjson has you covered here!)

 ## Philosophy
-I plan to keep it fairly inline with what the original simdjson library is capable of doing, which really means not adding too many additional options. The big _thing_ that's missing so far is encoding a lua table to JSON. I may add in an encoder at some point (likely modified from an existing lua library). There are some rumours that simdjson _may_ support creating JSON structure in the future. If that happens, I would likely switch to it.
+I plan to keep it fairly inline with what the original simdjson library is capable of doing, which really means not adding too many additional options. The big _thing_ that's missing so far is encoding a lua table to JSON. I may add in an encoder at some point.

 ## Licenses
 * The jsonexamples, src/simdjson.cpp, src/simdjson.h are unmodified from the released version simdjson under the Apache License 2.0.
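
Note on the README's remark that the JSON pointer indexes from 0: a table returned by `parse` is an ordinary Lua table and is therefore indexed from 1, so the same element sits at a different index depending on which API is used. The sketch below is plain Lua and makes no lua-simdjson calls. `resolvePointer` is a hypothetical helper written only for illustration (it is not part of the library), and the sample table uses made-up data that mirrors the `/Image/IDs/1` assertion in the spec.

```lua
-- Hypothetical helper, NOT part of lua-simdjson: walks an RFC 6901 JSON
-- pointer through a plain Lua table such as the one `parse` returns.
local function resolvePointer(root, pointer)
    if pointer == "" then
        return root -- the empty pointer refers to the whole document
    end
    local current = root
    for token in pointer:gmatch("/([^/]*)") do
        if type(current) ~= "table" then
            return nil
        end
        -- RFC 6901 escapes: "~1" decodes to "/", then "~0" decodes to "~"
        token = token:gsub("~1", "/"):gsub("~0", "~")
        local value = current[token]
        if value == nil and token:match("^%d+$") then
            -- JSON pointer array indexes start at 0, Lua arrays at 1
            value = current[tonumber(token) + 1]
        end
        current = value
        if current == nil then
            return nil
        end
    end
    return current
end

-- Illustrative data only (assumed shape, not read from jsonexamples/)
local doc = { Image = { Width = 800, IDs = { 116, 943, 234, 38793 } } }

print(resolvePointer(doc, "/Image/IDs/1")) -- pointer index 1 ...
print(doc.Image.IDs[2])                    -- ... is Lua index 2
```

Both `print` calls refer to the same element, which is why `/statuses/0/id` in the `open` example matches `["statuses"][1]["id"]` in the `parse` example.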
Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 package="lua-simdjson"
-version="0.0.2-1"
+version="0.0.3-1"
 source = {
    url = "git://github.com/FourierTransformer/lua-simdjson",
-   tag = "0.0.2"
+   tag = "0.0.3"
 }
 description = {
    summary = "This is a simple Lua binding for simdjson",

spec/compile_spec.lua

Lines changed: 87 additions & 28 deletions
@@ -11,34 +11,93 @@ end


 local files = {
-    "apache_builds.json",
-    "canada.json",
-    "citm_catalog.json",
-    "github_events.json",
-    "google_maps_api_compact_response.json",
-    "google_maps_api_response.json",
-    "gsoc-2018.json",
-    "instruments.json",
-    "marine_ik.json",
-    "mesh.json",
-    "mesh.pretty.json",
-    "numbers.json",
-    "random.json",
-    "repeat.json",
-    "twitter_timeline.json",
-    "update-center.json",
-    "small/adversarial.json",
-    "small/demo.json",
-    "small/flatadversarial.json",
-    "small/smalldemo.json",
-    "small/truenull.json"
+    "apache_builds.json",
+    "canada.json",
+    "citm_catalog.json",
+    "github_events.json",
+    "google_maps_api_compact_response.json",
+    "google_maps_api_response.json",
+    "gsoc-2018.json",
+    "instruments.json",
+    "marine_ik.json",
+    "mesh.json",
+    "mesh.pretty.json",
+    "numbers.json",
+    "random.json",
+    "repeat.json",
+    "twitter_timeline.json",
+    "update-center.json",
+    "small/adversarial.json",
+    "small/demo.json",
+    "small/flatadversarial.json",
+    "small/smalldemo.json",
+    "small/truenull.json"
 }

-describe("Make sure everything compiled correctly", function()
-    for _, file in ipairs(files) do
-        it("should parse the file: " .. file, function()
-            local fileContents = loadFile("jsonexamples/" .. file)
-            assert.are.same(cjson.decode(fileContents), simdjson.parse(fileContents))
-        end)
-    end
+describe("Make sure it parses strings correctly", function()
+    for _, file in ipairs(files) do
+        it("should parse the file: " .. file, function()
+            local fileContents = loadFile("jsonexamples/" .. file)
+            local cjsonDecodedValues = cjson.decode(fileContents)
+            assert.are.same(cjsonDecodedValues, simdjson.parse(fileContents))
+        end)
+    end
 end)
+
+describe("Make sure it parses files correctly", function()
+    for _, file in ipairs(files) do
+        it("should parse the file: " .. file, function()
+            local fileContents = loadFile("jsonexamples/" .. file)
+            local cjsonDecodedValues = cjson.decode(fileContents)
+            assert.are.same(cjsonDecodedValues, simdjson.parseFile("jsonexamples/" .. file))
+        end)
+    end
+end)
+
+describe("Make sure json pointer works with a string", function()
+    it("should handle a string", function()
+        local fileContents = loadFile("jsonexamples/small/demo.json")
+        local decodedFile = simdjson.open(fileContents)
+        assert.are.same(800, decodedFile:atPointer("/Image/Width"))
+        assert.are.same(600, decodedFile:atPointer("/Image/Height"))
+        assert.are.same(125, decodedFile:atPointer("/Image/Thumbnail/Height"))
+        assert.are.same(943, decodedFile:atPointer("/Image/IDs/1"))
+    end)
+end)
+
+describe("Make sure json pointer works with openfile", function()
+    it("should handle opening a file", function()
+        local decodedFile = simdjson.openFile("jsonexamples/small/demo.json")
+        assert.are.same(800, decodedFile:atPointer("/Image/Width"))
+        assert.are.same(600, decodedFile:atPointer("/Image/Height"))
+        assert.are.same(125, decodedFile:atPointer("/Image/Thumbnail/Height"))
+        assert.are.same(943, decodedFile:atPointer("/Image/IDs/1"))
+    end)
+end)
+
+local major, minor = _VERSION:match('([%d]+)%.(%d+)')
+if tonumber(major) >= 5 and tonumber(minor) >= 3 then
+    describe("Make sure ints and floats parse correctly", function ()
+        it("should handle decoding numbers appropriately", function()
+
+            local numberCheck = simdjson.parse([[
+                {
+                    "float": 1.2,
+                    "min_signed_integer": -9223372036854775808,
+                    "max_signed_integer": 9223372036854775807,
+                    "one_above_max_signed_integer": 9223372036854775808,
+                    "min_unsigned_integer": 0,
+                    "max_unsigned_integer": 18446744073709551615
+                }
+            ]])
+
+            assert.are.same("float", math.type(numberCheck["float"]))
+            assert.are.same("integer", math.type(numberCheck["max_signed_integer"]))
+            assert.are.same("integer", math.type(numberCheck["min_signed_integer"]))
+            assert.are.same("float", math.type(numberCheck["one_above_max_signed_integer"]))
+            assert.are.same("integer", math.type(numberCheck["min_unsigned_integer"]))
+            assert.are.same("float", math.type(numberCheck["max_unsigned_integer"]))
+
+        end)
+    end)
+end
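
Note on the new number test: the float results for `one_above_max_signed_integer` and `max_unsigned_integer` match how Lua 5.3+ itself treats values beyond its signed 64-bit integer range. The sketch below is plain Lua and makes no lua-simdjson calls. `classify` is a hypothetical helper used only for illustration, and it probes for `math.type` directly rather than parsing `_VERSION`, which is one possible alternative to the version check in the spec and not what the commit does.

```lua
-- Plain Lua sketch; no simdjson calls are made here.
-- Feature check: math.type only exists when the interpreter has the
-- integer subtype (Lua 5.3+). Lua 5.1/5.2 and LuaJIT do not define it.
local hasIntegerSubtype = math.type ~= nil

-- Hypothetical helper: reports how this interpreter represents a number.
local function classify(n)
    if hasIntegerSubtype then
        return math.type(n) -- "integer" or "float"
    end
    return "number" -- single numeric type on older Lua versions
end

if hasIntegerSubtype then
    -- The largest signed 64-bit value still fits in an integer.
    print(classify(math.maxinteger))
    -- A decimal literal one above that limit is read as a float by Lua.
    print(classify(9223372036854775808))
    -- Integer arithmetic wraps around instead of promoting to float,
    -- which is why a binding has to choose the float path explicitly.
    print(math.maxinteger + 1 == math.mininteger)
end
```

The wraparound on the last line suggests why values above `LUA_MAXINTEGER` are better pushed with `lua_pushnumber`: forcing them through the integer path would not preserve the value.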
