
Writing batch loaders


Some applications end up fetching the same data over and over during the course of a single query. As with ORMs, it is also easy to write inefficient code that loads child resources one at a time. This is known as the N+1 queries problem.

So we need a mechanism of batching and caching data requests.

For example, if we want to load articles and their authors, a naive implementation might look like this:

(defn author
  "Fetch the author of an article"
  [node opt ctx info]
  (get-article-author ctx (:id node)))
    
(defn articles
  "Fetch all articles"
  [node opt ctx info]
  (map #(assoc % :author #'author) (get-all-articles ctx)))

We can see that the get-article-author function is called for every article. If we have ten articles, we run eleven queries, even when several articles share the same author. What we can do instead is load all the authors in one go and reuse the cached result as needed. For that we define a batch loader:

(require '[specialist-server.batch-loader :as b])

(def author-batch
  (b/batch-loader
    (fn [article-id-set ctx]
      (let [authors (get-articles-authors ctx article-id-set)]
        (reduce
          (fn [coll a] (assoc coll (:article-id a) a)) {} authors)))))

The general idea is that you define a function that takes a set of ids as its first argument and returns a map of data keyed by those same ids. You can pass in additional arguments as well, for example your resolver context map. Note that this function might get called multiple times when processing a query.
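For illustration only, with made-up data: if query execution has collected article ids 1, 2 and 3, the function you pass to b/batch-loader receives that set and returns a map keyed by those same ids. batch-fn below is just a stand-in name for that function.

;; Illustration with made-up data; batch-fn stands for the fn given to b/batch-loader.
(batch-fn #{1 2 3} ctx)
;;=> {1 {:article-id 1 :name "Ada Lovelace"}
;;    2 {:article-id 2 :name "Alan Turing"}
;;    3 {:article-id 3 :name "Ada Lovelace"}}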

The author-batch function can be used in your resolvers like this:

(defn author
  "Fetch the author of an article"
  [node opt ctx info]
  (author-batch (:req-cache ctx) (:id node) ctx))

Instead of running a query and returning the result directly, we call our batch loader with the id of the item we want. The first argument to author-batch is the request cache atom, which is shared between all your resolvers and batch loaders. The second argument is the cache key, in this case the article id. Further arguments can be added as needed; here we pass the context map as well so it is available inside the author-batch function.

Our modified example only needs to run two queries: one for the list of articles and one for all the authors we need.
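The get-articles-authors function itself is not part of specialist-server; it is whatever batched lookup your data layer provides. As a rough sketch, assuming a SQL database, a :db-conn in the context and clojure.java.jdbc, it could issue a single query with an IN clause:

(require '[clojure.java.jdbc :as jdbc]
         '[clojure.string :as string])

;; Sketch only, assuming articles and authors tables and a jdbc connection in ctx.
;; Adjust column names and the mapping to :article-id to match your own schema.
(defn get-articles-authors
  [ctx article-id-set]
  (let [placeholders (string/join ", " (repeat (count article-id-set) "?"))]
    (jdbc/query (:db-conn ctx)
                (into [(str "SELECT ar.id AS article_id, au.*"
                            " FROM authors au"
                            " JOIN articles ar ON ar.author_id = au.id"
                            " WHERE ar.id IN (" placeholders ")")]
                      article-id-set))))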

The final piece to make this work is the per-request cache atom. This is created as an atom holding an empty map, (atom {}), when calling the query executor:

(POST "/graphql" req
      (let [opt {:query     (get-in req [:body :query])
                 :variables (get-in req [:body :variables])
                 :root      (my-query-root)
                 :context   {:config (get-config)
                             :db-conn (get-db)
                             :req-cache (atom {})}}] ; you can also use (b/cache) for convenience
        (response (my-executor opt))))

In this example the :req-cache is initialized for every request. This is the recommended approach for most cases, but if you are feeling adventurous you can share the cache atom between requests. Just keep in mind that you then need to handle cache invalidation yourself.
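As a rough sketch of that shared-cache variant (nothing the library does for you), you could hold the atom in a var that outlives individual requests and reset it yourself whenever the underlying data changes:

;; Sketch only: one cache atom shared by every request.
(defonce shared-cache (atom {})) ; or (b/cache)

(POST "/graphql" req
      (let [opt {:query     (get-in req [:body :query])
                 :variables (get-in req [:body :variables])
                 :root      (my-query-root)
                 :context   {:config (get-config)
                             :db-conn (get-db)
                             :req-cache shared-cache}}]
        (response (my-executor opt))))

;; Cache invalidation is your responsibility, e.g. after writes or on a timer:
(reset! shared-cache {})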

As always, you should benchmark everything and use the solution that makes the most sense for you.
