Description
Given that this package represents a real effort to rethink plumber, I would love for the team to consider ways to improve the performance of JSON serialization and deserialization. While jsonlite is a tried-and-true choice for parsing and serializing JSON, the R community has created far more performant alternatives, most notably (I believe) yyjsonr, RcppSimdJson, and jsonify.
As motivation, below I compare yyjsonr against jsonlite inside the existing {plumber} package.
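For context, the relevant calls are nearly drop-in replacements for one another. Here is a minimal illustration (assuming R >= 4.5 for the built-in penguins data; plumber's default JSON serializer and parser are built on jsonlite::toJSON() and jsonlite::fromJSON()):

df <- head(penguins)                  # penguins ships with R >= 4.5
jsonlite::toJSON(df)                  # what plumber's default serializer builds on
yyjsonr::write_json_str(df)           # yyjsonr equivalent
jsonlite::fromJSON('[{"a":1}]')       # what plumber's default parser builds on
yyjsonr::read_json_str('[{"a":1}]')   # yyjsonr equivalent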
Serialization
For smaller data, jsonlite is a very solid performer. However, as the object we need to serialize grows, it falls further and further behind.
Below is the output of bench::press(), where the penguins dataset (shout out to R 4.5) is sampled at sizes of 10, 100, 1,000, 10,000, 100,000, and 1,000,000 rows. Each benchmark is run for 25 iterations.
Once we hit ~10,000 rows, the difference between the two packages becomes quite stark:
| expression | n | niter | min | median | itr/sec | mem_alloc | gc/sec | n_itr |
|---|---|---|---|---|---|---|---|---|
| jsonlite | 10 | 25 | 3.96ms | 4.56ms | 165.1419049 | 50.38KB | 0.0000000 | 25 |
| yyjsonr | 10 | 25 | 3.24ms | 3.34ms | 295.1780670 | 50.02KB | 12.2990861 | 24 |
| jsonlite | 100 | 25 | 4.38ms | 4.84ms | 204.9297927 | 62.88KB | 0.0000000 | 25 |
| yyjsonr | 100 | 25 | 3.42ms | 3.69ms | 269.8536134 | 61.67KB | 0.0000000 | 25 |
| jsonlite | 1000 | 25 | 7.61ms | 8.07ms | 123.3828804 | 190.94KB | 0.0000000 | 25 |
| yyjsonr | 1000 | 25 | 5.34ms | 5.5ms | 177.4808128 | 178.15KB | 0.0000000 | 25 |
| jsonlite | 10000 | 25 | 42.59ms | 45.1ms | 22.0107531 | 1.45MB | 0.9171147 | 24 |
| yyjsonr | 10000 | 25 | 26.12ms | 27.19ms | 36.5426133 | 1.31MB | 1.5226089 | 24 |
| jsonlite | 100000 | 25 | 458.22ms | 477.87ms | 2.0399509 | 14.13MB | 3.0599263 | 10 |
| yyjsonr | 100000 | 25 | 225.47ms | 255.6ms | 3.9622886 | 12.69MB | 3.6574972 | 13 |
| jsonlite | 1000000 | 25 | 5.71s | 6.18s | 0.1617885 | 141.83MB | 0.1617885 | 25 |
| yyjsonr | 1000000 | 25 | 2.45s | 2.73s | 0.3680920 | 126.49MB | 0.3680920 | 25 |
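For a sanity check outside of HTTP, here is a minimal local-only sketch of the same serialization comparison (my illustration, separate from the repro below; check = FALSE because the two packages return differently classed strings):

df <- penguins[sample(nrow(penguins), 1e5, replace = TRUE), ]
bench::mark(
  jsonlite = jsonlite::toJSON(df),
  yyjsonr  = yyjsonr::write_json_str(df),
  check = FALSE  # outputs differ in class and formatting details
)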
Deserialization
For deserialization, I've done the same thing, but for sample sizes of 10, 100, 1,000, 10,000, and 100,000 rows.
| expression | n | niter | min | median | itr/sec | mem_alloc | gc/sec | n_itr |
|---|---|---|---|---|---|---|---|---|
| jsonlite | 10 | 25 | 5.29ms | 5.99ms | 162.3904318 | 51.07KB | 0.0000000 | 25 |
| yyjsonr | 10 | 25 | 5.09ms | 5.42ms | 184.3409827 | 51KB | 0.0000000 | 25 |
| jsonlite | 100 | 25 | 6.46ms | 7.17ms | 139.5966556 | 115.97KB | 0.0000000 | 25 |
| yyjsonr | 100 | 25 | 5.65ms | 6.1ms | 163.2072773 | 114.72KB | 0.0000000 | 25 |
| jsonlite | 1000 | 25 | 16.49ms | 17.3ms | 57.8857935 | 705.23KB | 0.0000000 | 25 |
| yyjsonr | 1000 | 25 | 9.4ms | 10.2ms | 96.3886582 | 688.04KB | 0.0000000 | 25 |
| jsonlite | 10000 | 25 | 108.73ms | 113.81ms | 8.6441140 | 6.92MB | 0.3601714 | 24 |
| yyjsonr | 10000 | 25 | 44.97ms | 46.59ms | 20.8253656 | 6.38MB | 0.8677236 | 24 |
| jsonlite | 100000 | 25 | 1.09s | 1.16s | 0.8444085 | 80.15MB | 0.1151466 | 22 |
| yyjsonr | 100000 | 25 | 409.87ms | 435.56ms | 2.2831995 | 63.51MB | 0.3113454 | 22 |
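And the corresponding local-only sketch for parsing (again just an illustration; the parsed data frames can differ in column types, hence check = FALSE):

json <- yyjsonr::write_json_str(penguins[sample(nrow(penguins), 1e5, replace = TRUE), ])
bench::mark(
  jsonlite = jsonlite::fromJSON(json),
  yyjsonr  = yyjsonr::read_json_str(json),
  check = FALSE  # column types (e.g. factor vs character) may differ
)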
If you've made it this far, I would like to say Thank You For Coming To My TED Talk™. I think quality-of-life enhancements like this can make R seem like an even more viable tool for backend development. I'd like to prove all of the haters wrong.
Repro code
router <- callr::r_bg(\() {
  library(plumber)

  # Sample n rows of penguins. Query parameters arrive as character strings,
  # so coerce n before sampling.
  get_penguins <- function(n = 100) {
    n <- as.integer(n)
    np <- nrow(penguins)
    idx <- sample(1:np, n, replace = TRUE)
    penguins[idx, ]
  }

  # Touch the parsed body (forcing deserialization) and return the elapsed time.
  read_penguins <- function(req) {
    body <- req$body
    as.numeric(Sys.time() - req$STARTED_AT)
  }

  # JSON serializer backed by yyjsonr instead of jsonlite.
  yy_serializer <- plumber:::serializer_content_type(
    "application/json",
    function(val) {
      yyjsonr::write_json_str(val)
    }
  )

  # JSON parser backed by yyjsonr; plumber hands the raw request body to the
  # inner function.
  yy_parser <- function(...) {
    function(value, content_type = "application/json", ...) {
      yyjsonr::read_json_raw(value)
    }
  }
  register_parser("yyjson", yy_parser, fixed = "application/json")

  router <- pr() |>
    pr_get("/serialize", get_penguins) |>
    pr_get("/serialize-yy", get_penguins, serializer = yy_serializer) |>
    pr_post("/deserialize", read_penguins) |>
    pr_post("/deserialize-yy", read_penguins, parsers = "yyjson") |>
    # Filters run before every endpoint, so STARTED_AT is set for each request.
    pr_filter("set_time", function(req) {
      req$STARTED_AT <- Sys.time()
      forward()
    })

  pr_run(router, port = 3000)
})
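# Give the background plumber process a moment to finish starting up before
# the first request lands (assumes the server is up within a couple of seconds).
Sys.sleep(2)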
# Time an end-to-end request for n sampled penguins from the jsonlite-backed endpoint.
get_pengos <- function(n) {
  start <- Sys.time()
  httr2::request("http://127.0.0.1:3000/serialize") |>
    httr2::req_url_query(n = n) |>
    httr2::req_perform()
  Sys.time() - start
}

# Same request against the yyjsonr-backed endpoint.
get_pengos_yy <- function(n) {
  start <- Sys.time()
  httr2::request("http://127.0.0.1:3000/serialize-yy") |>
    httr2::req_url_query(n = n) |>
    httr2::req_perform()
  Sys.time() - start
}
benches <- bench::press(
  n = c(1e1, 1e2, 1e3, 1e4, 1e5, 1e6),
  niter = 25,
  {
    bench::mark(
      jsonlite = get_pengos(n),
      yyjsonr = get_pengos_yy(n),
      iterations = niter,
      check = FALSE
    )
  }
)
# plot(benches)
# benches[,1:9] |>
# dplyr::mutate(n = format(n, scientific = F)) |>
# knitr::kable() |>
# clipr::write_clip()
# POST n sampled penguins to the jsonlite-backed parsing endpoint.
deserialize <- function(n) {
  np <- nrow(penguins)
  idx <- sample(1:np, n, replace = TRUE)
  httr2::request("http://127.0.0.1:3000/deserialize") |>
    httr2::req_body_json(penguins[idx, ]) |>
    httr2::req_perform()
}

# Same payload against the yyjsonr-backed parsing endpoint.
deserialize_yy <- function(n) {
  np <- nrow(penguins)
  idx <- sample(1:np, n, replace = TRUE)
  httr2::request("http://127.0.0.1:3000/deserialize-yy") |>
    httr2::req_body_json(penguins[idx, ]) |>
    httr2::req_perform()
}
benches_de <- bench::press(
  n = c(1e1, 1e2, 1e3, 1e4, 1e5),
  niter = 25,
  {
    bench::mark(
      jsonlite = deserialize(n),
      yyjsonr = deserialize_yy(n),
      iterations = niter,
      check = FALSE
    )
  }
)
# plot(benches_de)
# benches_de[,1:9] |>
# dplyr::mutate(n = format(n, scientific = F)) |>
# knitr::kable() |>
# clipr::write_clip()
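When the benchmarking is done, the background plumber process can be stopped through its callr handle (a small cleanup sketch, assuming the router object created above is still in scope):

# stop the background plumber server started with callr::r_bg()
router$kill()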