
Commit bfcc3f1 (1 parent: 51e8a29)

Add section on "How complexity scoring works"

guides/queries/complexity_and_depth.md

Lines changed: 184 additions & 27 deletions

GraphQL-Ruby ships with some validations based on {% internal_link "query analysis", "/queries/ast_analysis" %}. You can customize them as needed, too.

## Prevent deeply-nested queries

You can also reject queries based on the depth of their nesting. You can define `max_depth` at the schema level or the query level:

```ruby
# Schema-level:
class MySchema < GraphQL::Schema
  # ...
  max_depth 15
end

# Query-level, which overrides the schema-level setting:
MySchema.execute(query_string, max_depth: 20)
```

By default, **introspection fields are counted**. The default introspection query requires at least `max_depth 13`. You can also configure your schema not to count introspection fields with `max_depth ..., count_introspection_fields: false`.
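
For example, a schema that enforces a depth limit but skips introspection fields when counting might look like this (a minimal sketch; the limit of `15` is arbitrary):

```ruby
class MySchema < GraphQL::Schema
  # Allow up to 15 levels of nesting, but don't count
  # introspection fields (like `__schema`) toward the limit:
  max_depth 15, count_introspection_fields: false
end
```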

You can use `nil` to disable the validation:

```ruby
# This query won't be validated:
MySchema.execute(query_string, max_depth: nil)
```

To get a feel for the depth of queries in your system, you can extend {{ "GraphQL::Analysis::AST::QueryDepth" | api_doc }}. Hook it up to log the depth of each query:

```ruby
class LogQueryDepth < GraphQL::Analysis::AST::QueryDepth
  def result
    query_depth = super
    message = "[GraphQL Query Depth] #{query_depth} || staff? #{query.context[:current_user].staff?}"
    Rails.logger.info(message)
  end
end

class MySchema < GraphQL::Schema
  query_analyzer(LogQueryDepth)
end
```

## Prevent complex queries

Fields have a "complexity" value which can be configured in their definition. It can be a constant (numeric) value or a proc. If no `complexity` is defined for a field, it defaults to `1`. It can be defined as a keyword _or_ inside the configuration block. For example:
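
Here's a minimal sketch of both forms (the type and field names are hypothetical):

```ruby
class Types::QueryType < Types::BaseObject
  # Keyword form: a constant complexity value.
  field :featured_post, Types::PostType, complexity: 5

  # Block form: the same value set inside the configuration block.
  field :newest_post, Types::PostType do
    complexity 5
  end
end
```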

## How complexity scoring works

GraphQL-Ruby's complexity scoring algorithm is biased towards selection fairness. While highly accurate, its results are not always intuitive. Here's an example query performed on the [Shopify Admin API](https://shopify.dev/docs/api/admin-graphql):

```graphql
query {
  node(id: "123") { # interface Node
    id
    ...on HasMetafields { # interface HasMetafields
      metafield(key: "a") {
        value
      }
      metafields(first: 10) {
        nodes {
          value
        }
      }
    }
    ...on Product { # implements HasMetafields
      title
      metafield(key: "a") {
        definition {
          description
        }
      }
    }
    ...on PriceList {
      name
      catalog {
        id
      }
    }
  }
}
```

First, GraphQL-Ruby allows field definitions to specify a `complexity` attribute that provides a complexity score (or a proc that computes a score) for each field. Let's say that this schema defines a system where:

- Leaf fields cost `0`
- Composite fields cost `1`
- Connection fields cost `children * input size`

Given these parameters, we get an itemized scoring distribution of:

```graphql
query {
  node(id: "123") { # 1, composite
    id # 0, leaf
    ...on HasMetafields {
      metafield(key: "a") { # 1, composite
        value # 0, leaf
      }
      metafields(first: 10) { # 1 * 10, connection
        nodes { # 1, composite
          value # 0, leaf
        }
      }
    }
    ...on Product {
      title # 0, leaf
      metafield(key: "a") { # 1, composite
        definition { # 1, composite
          description # 0, leaf
        }
      }
    }
    ...on PriceList {
      name # 0, leaf
      catalog { # 1, composite
        id # 0, leaf
      }
    }
  }
}
```
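
The connection rule in this cost model could be implemented with a field-level proc, which receives the query context, the field's arguments, and the summed complexity of the child selections. A minimal sketch (`Types::Product` and `Types::MetafieldConnection` are hypothetical stand-ins):

```ruby
class Types::Product < Types::BaseObject
  field :metafields, Types::MetafieldConnection, null: false do
    argument :first, Integer, required: true
    # children * input size: scale the cost of child selections by the page size.
    complexity ->(_ctx, args, child_complexity) { args[:first] * child_complexity }
  end
end
```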

However, we cannot naively tally these itemized scores without over-costing the query. Consider:

- The `node` scope makes many _possible_ selections on an abstract type, so we need the maximum among the concrete possibilities for a fair representation.
- A `node.metafield` selection path is duplicated across the `HasMetafields` and `Product` selection scopes. This path will only resolve once, so it should also only cost once.

To reconcile these possibilities, the [complexity algorithm](https://github.com/rmosolgo/graphql-ruby/blob/master/lib/graphql/analysis/ast/query_complexity.rb) breaks the selection down into a tree of types mapped to possible selections, across which lexical selections can be coalesced and deduplicated (pseudocode):

```ruby
{
  Schema::Query => {
    "node" => {
      Schema::Node => {
        "id" => nil,
      },
      Schema::HasMetafields => {
        "metafield" => {
          Schema::Metafield => {
            "value" => nil,
          },
        },
        "metafields" => {
          Schema::Metafield => {
            "nodes" => { ... },
          },
        },
      },
      Schema::Product => {
        "title" => nil,
        "metafield" => {
          Schema::Metafield => {
            "definition" => { ... },
          },
        },
      },
      Schema::PriceList => {
        "name" => nil,
        "catalog" => {
          Schema::Catalog => {
            "id" => nil,
          },
        },
      },
    },
  },
}
```

This aggregation provides a new perspective on the scoring, where _possible typed selections_ have costs rather than individual fields. In this normalized view, `Product` acquires the `HasMetafields` interface costs and ignores the duplicated path. Ultimately, the maximum of the possible typed costs is used, making this query cost `12`:

```graphql
query {
  node(id: "123") { # max(11, 12, 1) = 12
    id
    ...on HasMetafields { # 1 + 10 = 11
      metafield(key: "a") { # 1
        value
      }
      metafields(first: 10) { # 10
        nodes {
          value
        }
      }
    }
    ...on Product { # 1 + 11 from HasMetafields = 12
      title
      metafield(key: "a") { # duplicated in HasMetafields
        definition { # 1
          description
        }
      }
    }
    ...on PriceList { # 1 = 1
      name
      catalog { # 1
        id
      }
    }
  }
}
```
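
Once you have a sense of typical scores, you can cap them with `max_complexity`, which can be set at the schema level or the query level, like `max_depth` above. A minimal sketch (the limits shown are arbitrary):

```ruby
# Schema-level:
class MySchema < GraphQL::Schema
  # ...
  max_complexity 100
end

# Query-level, which overrides the schema-level setting:
MySchema.execute(query_string, max_complexity: 50)
```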
