
JSON (de)serialization performance #29

@JosiahParry

Description


Given that this package is a real effort to rethink plumber, I would love for the team to consider ways to improve the performance of JSON serialization and deserialization. While jsonlite is a tried and true choice for parsing and serializing JSON, the R community has created far more performant alternatives, most notably yyjsonr, RcppSimdJson, and jsonify.

As motivation, below I compare yyjsonr against jsonlite in the existing {plumber} package.

Serialization

For smaller data, jsonlite is a very solid performer. However, as the size of the object being serialized grows, it is clearly outperformed.
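For context, the swap being benchmarked is just the serialization call itself. A minimal standalone sketch (separate from the plumber benchmark at the end, and assuming R >= 4.5 for the built-in penguins data):

# Sample the built-in penguins data (R >= 4.5) to a given size.
x <- penguins[sample(nrow(penguins), 1e4, replace = TRUE), ]

# jsonlite: roughly what plumber's default JSON serializer does.
json_jl <- jsonlite::toJSON(x, dataframe = "rows", auto_unbox = TRUE)

# yyjsonr equivalent: serialize the same data frame to a JSON string.
json_yy <- yyjsonr::write_json_str(x)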

Below is the output of bench::press(), where the penguins dataset (shout out to R 4.5) is sampled with replacement to sizes of 10, 100, 1_000, 10_000, 100_000, and 1_000_000 rows. Each benchmark runs for 25 iterations.

Once we hit ~10,000 rows, the difference between the two packages becomes quite stark:

| expression | n | niter | min | median | itr/sec | mem_alloc | gc/sec | n_itr |
|---|---|---|---|---|---|---|---|---|
| jsonlite | 10 | 25 | 3.96ms | 4.56ms | 165.1419049 | 50.38KB | 0.0000000 | 25 |
| yyjsonr | 10 | 25 | 3.24ms | 3.34ms | 295.1780670 | 50.02KB | 12.2990861 | 24 |
| jsonlite | 100 | 25 | 4.38ms | 4.84ms | 204.9297927 | 62.88KB | 0.0000000 | 25 |
| yyjsonr | 100 | 25 | 3.42ms | 3.69ms | 269.8536134 | 61.67KB | 0.0000000 | 25 |
| jsonlite | 1000 | 25 | 7.61ms | 8.07ms | 123.3828804 | 190.94KB | 0.0000000 | 25 |
| yyjsonr | 1000 | 25 | 5.34ms | 5.5ms | 177.4808128 | 178.15KB | 0.0000000 | 25 |
| jsonlite | 10000 | 25 | 42.59ms | 45.1ms | 22.0107531 | 1.45MB | 0.9171147 | 24 |
| yyjsonr | 10000 | 25 | 26.12ms | 27.19ms | 36.5426133 | 1.31MB | 1.5226089 | 24 |
| jsonlite | 100000 | 25 | 458.22ms | 477.87ms | 2.0399509 | 14.13MB | 3.0599263 | 10 |
| yyjsonr | 100000 | 25 | 225.47ms | 255.6ms | 3.9622886 | 12.69MB | 3.6574972 | 13 |
| jsonlite | 1000000 | 25 | 5.71s | 6.18s | 0.1617885 | 141.83MB | 0.1617885 | 25 |
| yyjsonr | 1000000 | 25 | 2.45s | 2.73s | 0.3680920 | 126.49MB | 0.3680920 | 25 |

Deserialization

For deserialization, I do the same thing but for sample sizes of 10, 100, 1_000, 10_000, and 100_000 rows.
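The underlying parsing calls being compared are roughly the following (a standalone sketch; json is assumed to be a JSON string like the ones produced above, whereas the plumber parser in the repro code receives the body as a raw vector, hence read_json_raw()):

df_jl <- jsonlite::fromJSON(json)      # jsonlite parser
df_yy <- yyjsonr::read_json_str(json)  # yyjsonr parser

The bench::press() output for deserialization: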

| expression | n | niter | min | median | itr/sec | mem_alloc | gc/sec | n_itr |
|---|---|---|---|---|---|---|---|---|
| jsonlite | 10 | 25 | 5.29ms | 5.99ms | 162.3904318 | 51.07KB | 0.0000000 | 25 |
| yyjsonr | 10 | 25 | 5.09ms | 5.42ms | 184.3409827 | 51KB | 0.0000000 | 25 |
| jsonlite | 100 | 25 | 6.46ms | 7.17ms | 139.5966556 | 115.97KB | 0.0000000 | 25 |
| yyjsonr | 100 | 25 | 5.65ms | 6.1ms | 163.2072773 | 114.72KB | 0.0000000 | 25 |
| jsonlite | 1000 | 25 | 16.49ms | 17.3ms | 57.8857935 | 705.23KB | 0.0000000 | 25 |
| yyjsonr | 1000 | 25 | 9.4ms | 10.2ms | 96.3886582 | 688.04KB | 0.0000000 | 25 |
| jsonlite | 10000 | 25 | 108.73ms | 113.81ms | 8.6441140 | 6.92MB | 0.3601714 | 24 |
| yyjsonr | 10000 | 25 | 44.97ms | 46.59ms | 20.8253656 | 6.38MB | 0.8677236 | 24 |
| jsonlite | 100000 | 25 | 1.09s | 1.16s | 0.8444085 | 80.15MB | 0.1151466 | 22 |
| yyjsonr | 100000 | 25 | 409.87ms | 435.56ms | 2.2831995 | 63.51MB | 0.3113454 | 22 |

If you've made it this far, I would like to say Thank You For Coming To My TED Talk™. I think quality-of-life enhancements like this can make R an even more viable tool for backend development. I'd like to prove all of the haters wrong.

Repro code

# Run the plumber API in a background R session so we can benchmark it from this one.
router <- callr::r_bg(\() {
  library(plumber)

  # Endpoint handler: return n rows sampled (with replacement) from penguins.
  get_penguins <- function(n = 100) {
    n <- as.integer(n) # query parameters arrive as strings
    np <- nrow(penguins)
    idx <- sample(1:np, n, replace = TRUE)
    penguins[idx, ]
  }

  # Endpoint handler: the request body is parsed before the handler runs,
  # so returning the elapsed time captures the parsing cost.
  read_penguins <- function(req) {
    body <- req$body
    as.numeric(Sys.time() - req$STARTED_AT)
  }

  # JSON serializer backed by yyjsonr instead of jsonlite.
  yy_serializer <- plumber::serializer_content_type(
    "application/json",
    function(val) {
      yyjsonr::write_json_str(val)
    }
  )

  # Body parser backed by yyjsonr; plumber hands the raw request body to the parser.
  yy_parser <- function(...) {
    function(value, content_type = "application/json", ...) {
      yyjsonr::read_json_raw(value)
    }
  }

  register_parser("yyjson", yy_parser, fixed = "application/json")

  router <- pr() |>
    pr_get("/serialize", get_penguins) |>
    pr_get(
      "/serialize-yy",
      get_penguins,
      serializer = yy_serializer
    ) |>
    pr_post("/deserialize", read_penguins) |>
    pr_post("/deserialize-yy", read_penguins, parser = "yyjson") |>
    # Filter: stamp the request arrival time so handlers can report elapsed time.
    pr_filter("set_time", function(req) {
      req$STARTED_AT <- Sys.time()
      forward()
    })

  pr_run(router, port = 3000)
})
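As an aside, instead of passing the serializer function per route, it could also be registered under an alias via plumber::register_serializer() so annotated endpoints can opt in with #* @serializer yyjson. A sketch (the "yyjson" alias is my own naming):

# Sketch: register a yyjsonr-backed serializer under the alias "yyjson".
# register_serializer() expects a factory that returns the serializer function.
plumber::register_serializer("yyjson", function(...) {
  plumber::serializer_content_type(
    "application/json",
    function(val) yyjsonr::write_json_str(val)
  )
})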


# Time a full request/response round trip against the jsonlite-serialized endpoint.
get_pengos <- function(n) {
  start <- Sys.time()
  httr2::request("http://127.0.0.1:3000/serialize") |>
    httr2::req_url_query(n = n) |>
    httr2::req_perform()
  Sys.time() - start
}

# Same round trip against the yyjsonr-serialized endpoint.
get_pengos_yy <- function(n) {
  start <- Sys.time()
  httr2::request("http://127.0.0.1:3000/serialize-yy") |>
    httr2::req_url_query(n = n) |>
    httr2::req_perform()
  Sys.time() - start
}


benches <- bench::press(
  n = c(1e1, 1e2, 1e3, 1e4, 1e5, 1e6),
  niter = 25,
  {
    bench::mark(
      jsonlite = get_pengos(n),
      yyjsonr = get_pengos_yy(n),
      iterations = niter,
      check = FALSE
    )
  }
)

# plot(benches)


# benches[,1:9] |> 
#   dplyr::mutate(n = format(n, scientific = F)) |> 
#   knitr::kable() |> 
#   clipr::write_clip()


# POST n sampled penguins rows to the jsonlite-parsed endpoint.
# Note: req_body_json() encodes the body with jsonlite on the client side in both helpers.
deserialize <- function(n) {
  np <- nrow(penguins)
  idx <- sample(1:np, n, replace = TRUE)
  httr2::request("http://127.0.0.1:3000/deserialize") |>
    httr2::req_body_json(penguins[idx, ]) |>
    httr2::req_perform()
}

# Same request against the yyjsonr-parsed endpoint.
deserialize_yy <- function(n) {
  np <- nrow(penguins)
  idx <- sample(1:np, n, replace = TRUE)
  httr2::request("http://127.0.0.1:3000/deserialize-yy") |>
    httr2::req_body_json(penguins[idx, ]) |>
    httr2::req_perform()
}
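One possible refinement, shown as a sketch only (deserialize_raw() is a hypothetical helper and is not part of the numbers above): since httr2::req_body_json() always encodes with jsonlite on the client side, the body could be pre-serialized once with yyjsonr and sent raw, keeping client-side encoding out of the measurement:

# Sketch: pre-serialize the body with yyjsonr and send it as-is.
deserialize_raw <- function(n) {
  np <- nrow(penguins)
  idx <- sample(1:np, n, replace = TRUE)
  body <- yyjsonr::write_json_str(penguins[idx, ])
  httr2::request("http://127.0.0.1:3000/deserialize-yy") |>
    httr2::req_body_raw(body, type = "application/json") |>
    httr2::req_perform()
}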


benches_de <- bench::press(
  n = c(1e1, 1e2, 1e3, 1e4, 1e5),
  niter = 25,
  {
    bench::mark(
      jsonlite = deserialize(n),
      yyjsonr = deserialize_yy(n),
      iterations = niter,
      check = FALSE
    )
  }
)


# plot(benches_de)
# benches_de[,1:9] |> 
#   dplyr::mutate(n = format(n, scientific = F)) |> 
#   knitr::kable() |> 
#   clipr::write_clip()
